Redshift

Amazon Web Services
Analytics
Amazon Redshift is a fully managed, cloud-based data warehousing service that lets businesses and organizations store and analyze large amounts of data in a scalable and cost-effective way. Redshift uses columnar storage and massively parallel processing to deliver fast query performance on large datasets.

With Redshift, you can store petabytes of structured and semi-structured data and run complex SQL queries across your data warehouse. It integrates with various data sources, including Amazon S3, Amazon DynamoDB, and Amazon EMR, which makes it easy to load data from multiple sources and analyze it with popular business intelligence tools such as Tableau and Power BI.

Redshift provides high availability and automatic backup and recovery capabilities, and clusters can be scaled up or down to handle changes in demand. This makes it a good fit for businesses and organizations that need to store and analyze large volumes of data, including e-commerce websites, financial institutions, healthcare providers, and media and entertainment companies.
Terraform Name
aws_redshift_cluster
Redshift attributes:

The following arguments are supported:

  • cluster_identifier - (Required) The Cluster Identifier. Must be a lower case string.
  • database_name - (Optional) The name of the first database to be created when the cluster is created. If you do not provide a name, Amazon Redshift will create a default database called dev.
  • default_iam_role_arn - (Optional) The Amazon Resource Name (ARN) for the IAM role that was set as default for the cluster when the cluster was created.
  • node_type - (Required) The node type to be provisioned for the cluster.
  • cluster_type - (Optional) The cluster type to use. Either single-node or multi-node.
  • master_password - (Required unless a snapshot_identifier is provided) Password for the master DB user. Note that this may show up in logs, and it will be stored in the state file. The password must contain at least 8 characters, including at least one uppercase letter, one lowercase letter, and one number.
  • master_username - (Required unless a snapshot_identifier is provided) Username for the master DB user.
  • cluster_security_groups - (Optional) A list of security groups to be associated with this cluster.
  • vpc_security_group_ids - (Optional) A list of Virtual Private Cloud (VPC) security groups to be associated with the cluster.
  • cluster_subnet_group_name - (Optional) The name of a cluster subnet group to be associated with this cluster. If this parameter is not provided, the resulting cluster will be deployed outside a virtual private cloud (VPC).
  • availability_zone - (Optional) The EC2 Availability Zone (AZ) in which you want Amazon Redshift to provision the cluster. For example, if you have several EC2 instances running in a specific Availability Zone, then you might want the cluster to be provisioned in the same zone in order to decrease network latency. Can only be changed if availability_zone_relocation_enabled is true.
  • availability_zone_relocation_enabled - (Optional) If true, the cluster can be relocated to another availability zone, either automatically by AWS or when requested. Default is false. Available for use on clusters from the RA3 instance family.
  • preferred_maintenance_window - (Optional) The weekly time range (in UTC) during which automated cluster maintenance can occur. Format: ddd:hh24:mi-ddd:hh24:mi
  • cluster_parameter_group_name - (Optional) The name of the parameter group to be associated with this cluster.
  • automated_snapshot_retention_period - (Optional) The number of days that automated snapshots are retained. If the value is 0, automated snapshots are disabled. Even if automated snapshots are disabled, you can still create manual snapshots when you want with create-cluster-snapshot. Default is 1.
  • port - (Optional) The port number on which the cluster accepts incoming connections. Valid values are between 1115 and 65535. The cluster is accessible only via the JDBC and ODBC connection strings. Part of the connection string requires the port on which the cluster will listen for incoming connections. Default port is 5439.
  • cluster_version - (Optional) The version of the Amazon Redshift engine software that you want to deploy on the cluster. The version selected runs on all the nodes in the cluster.
  • allow_version_upgrade - (Optional) If true, major version upgrades can be applied during the maintenance window to the Amazon Redshift engine that is running on the cluster. Default is true.
  • apply_immediately - (Optional) Specifies whether any cluster modifications are applied immediately, or during the next maintenance window. Default is false.
  • aqua_configuration_status - (Optional) The value represents how the cluster is configured to use AQUA (Advanced Query Accelerator) after the cluster is restored. Possible values are enabled, disabled, and auto. Requires Cluster reboot.
  • number_of_nodes - (Optional) The number of compute nodes in the cluster. This parameter is required when cluster_type is set to multi-node. Default is 1.
  • publicly_accessible - (Optional) If true, the cluster can be accessed from a public network. Default is true.
  • encrypted - (Optional) If true, the data in the cluster is encrypted at rest.
  • enhanced_vpc_routing - (Optional) If true, enhanced VPC routing is enabled.
  • kms_key_id - (Optional) The ARN for the KMS encryption key. When specifying kms_key_id, encrypted needs to be set to true.
  • elastic_ip - (Optional) The Elastic IP (EIP) address for the cluster.
  • skip_final_snapshot - (Optional) Determines whether a final snapshot of the cluster is created before Amazon Redshift deletes the cluster. If true, a final cluster snapshot is not created. If false, a final cluster snapshot is created before the cluster is deleted. Default is false.
  • final_snapshot_identifier - (Optional) The identifier of the final snapshot that is to be created immediately before deleting the cluster. If this parameter is provided, skip_final_snapshot must be false.
  • snapshot_identifier - (Optional) The name of the snapshot from which to create the new cluster.
  • snapshot_cluster_identifier - (Optional) The name of the cluster the source snapshot was created from.
  • owner_account - (Optional) The AWS customer account used to create or copy the snapshot. Required if you are restoring a snapshot you do not own, optional if you own the snapshot.
  • iam_roles - (Optional) A list of IAM Role ARNs to associate with the cluster. A maximum of 10 can be associated with the cluster at any time.
  • logging - (Optional) Logging, documented below.
  • maintenance_track_name - (Optional) The name of the maintenance track for the restored cluster. When you take a snapshot, the snapshot inherits the MaintenanceTrack value from the cluster. The snapshot might be on a different track than the cluster that was the source for the snapshot. For example, suppose that you take a snapshot of a cluster that is on the current track and then change the cluster to be on the trailing track. In this case, the snapshot and the source cluster are on different tracks. Default value is current.
  • manual_snapshot_retention_period - (Optional) The default number of days to retain a manual snapshot. If the value is -1, the snapshot is retained indefinitely. This setting doesn't change the retention period of existing snapshots. Valid values are between -1 and 3653. Default value is -1.
  • snapshot_copy - (Optional) Configuration of automatic copy of snapshots from one region to another. Documented below.
  • tags - (Optional) A map of tags to assign to the resource. If configured with a provider default_tags configuration block present, tags with matching keys will overwrite those defined at the provider-level.
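
To illustrate the snapshot-related arguments above, here is a minimal sketch of restoring a cluster from an existing snapshot. The resource label, cluster identifier, snapshot name, and source cluster name are hypothetical placeholders, and the node type is an assumption. master_username and master_password are omitted because the list above marks them as required only when no snapshot_identifier is provided.

```hcl
# Sketch only: restore a new cluster from an existing snapshot.
# Every identifier below is a hypothetical placeholder.
resource "aws_redshift_cluster" "restored" {
  cluster_identifier = "tf-redshift-restored"
  node_type          = "ra3.xlplus" # assumed; pick a type compatible with the snapshot

  # Name of the snapshot and of the cluster it was taken from.
  snapshot_identifier         = "my-cluster-snapshot"
  snapshot_cluster_identifier = "my-source-cluster"

  # Keep the restored cluster off the public network (the default is true).
  publicly_accessible = false

  # Acceptable for throwaway environments; set to false and provide
  # final_snapshot_identifier for anything you need to keep.
  skip_final_snapshot = true
}
```

If the snapshot belongs to another account, owner_account would also need to be set, as described in the list above.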

Nested Blocks

logging

  • enable - (Required) Enables logging information such as queries and connection attempts, for the specified Amazon Redshift cluster.
  • bucket_name - (Optional, required when enable is true and log_destination_type is s3) The name of an existing S3 bucket where the log files are to be stored. The bucket must be in the same region as the cluster, and the cluster must have read bucket and put object permissions. For more information on the permissions required for the bucket, see the AWS documentation.
  • s3_key_prefix - (Optional) The prefix applied to the log file names.
  • log_destination_type - (Optional) The log destination type. An enum with possible values of s3 and cloudwatch.
  • log_exports - (Optional) The collection of exported log types. Log types include the connection log, user log and user activity log. Required when log_destination_type is cloudwatch. Valid log types are connectionlog, userlog, and useractivitylog.

snapshot_copy

  • destination_region - (Required) The destination region that you want to copy snapshots to.
  • retention_period - (Optional) The number of days to retain automated snapshots in the destination region after they are copied from the source region. Defaults to 7.
  • grant_name - (Optional) The name of the snapshot copy grant to use when snapshots of an AWS KMS-encrypted cluster are copied to the destination region.
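
The two nested blocks described above could be combined as in the following sketch. It assumes a provider version that still accepts logging and snapshot_copy as nested blocks on aws_redshift_cluster (newer provider releases may expect separate resources instead, so check the documentation for your version). The bucket name, destination region, variable name, and node type are hypothetical.

```hcl
# Hypothetical input variable so the password is not hardcoded in the configuration.
variable "redshift_master_password" {
  type      = string
  sensitive = true
}

# Sketch only: a cluster with audit logging to S3 and cross-region snapshot copy.
resource "aws_redshift_cluster" "logged" {
  cluster_identifier = "tf-redshift-logged"
  node_type          = "ra3.xlplus" # assumed node type
  master_username    = "exampleuser"
  master_password    = var.redshift_master_password

  logging {
    enable               = true
    log_destination_type = "s3"
    bucket_name          = "example-redshift-logs" # must already exist in the cluster's region
    s3_key_prefix        = "redshift/"
  }

  snapshot_copy {
    destination_region = "us-west-2" # assumed to differ from the cluster's region
    retention_period   = 7           # days; matches the documented default
  }
}
```

For a KMS-encrypted cluster, grant_name would be added to the snapshot_copy block, as noted above.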

Associating resources with a Redshift cluster
Resources do not "belong" to a Redshift cluster. Rather, one or more Security Groups are associated with a resource.
Create Redshift via Terraform:
The following HCL provides a Redshift Cluster resource.

Syntax:

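resource "aws_redshift_cluster" "example" {
  cluster_identifier = "tf-redshift-cluster" # must be lower case
  database_name      = "mydb"
  master_username    = "exampleuser"
  # Placeholder only. The password is stored in the state file, so avoid
  # hardcoding a real one here.
  master_password    = "Mustbe8characters"
  # dc1 nodes are a retired generation; dc2.large is used here instead.
  # Check which node types are available in your region.
  node_type          = "dc2.large"
  cluster_type       = "single-node"
}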
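
Building on the single-node example, the following sketch shows how a multi-node cluster with a few of the optional arguments from the list above might look. The variable name, resource label, identifiers, and node type are illustrative assumptions, not values taken from the original example.

```hcl
# Hypothetical input variable; marking it sensitive hides it in plan output,
# but the value is still written to the state file.
variable "redshift_master_password" {
  type      = string
  sensitive = true
}

# Sketch only: a two-node cluster, encrypted at rest, not publicly accessible,
# that takes a final snapshot when destroyed.
resource "aws_redshift_cluster" "analytics" {
  cluster_identifier = "tf-redshift-analytics"
  database_name      = "analytics"
  master_username    = "exampleuser"
  master_password    = var.redshift_master_password
  node_type          = "ra3.xlplus" # assumed node type

  # number_of_nodes is required when cluster_type is multi-node.
  cluster_type    = "multi-node"
  number_of_nodes = 2

  encrypted           = true
  publicly_accessible = false # the default is true

  # final_snapshot_identifier requires skip_final_snapshot to be false.
  skip_final_snapshot       = false
  final_snapshot_identifier = "tf-redshift-analytics-final"
}
```

The password could then be supplied at apply time, for example through a TF_VAR_redshift_master_password environment variable, rather than committed to the configuration.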

Create Redshift via CLI:
Parameters:

create-cluster
[--db-name <value>]
--cluster-identifier <value>
[--cluster-type <value>]
--node-type <value>
--master-username <value>
--master-user-password <value>
[--cluster-security-groups <value>]
[--vpc-security-group-ids <value>]
[--cluster-subnet-group-name <value>]
[--availability-zone <value>]
[--preferred-maintenance-window <value>]
[--cluster-parameter-group-name <value>]
[--automated-snapshot-retention-period <value>]
[--manual-snapshot-retention-period <value>]
[--port <value>]
[--cluster-version <value>]
[--allow-version-upgrade | --no-allow-version-upgrade]
[--number-of-nodes <value>]
[--publicly-accessible | --no-publicly-accessible]
[--encrypted | --no-encrypted]
[--hsm-client-certificate-identifier <value>]
[--hsm-configuration-identifier <value>]
[--elastic-ip <value>]
[--tags <value>]
[--kms-key-id <value>]
[--enhanced-vpc-routing | --no-enhanced-vpc-routing]
[--additional-info <value>]
[--iam-roles <value>]
[--maintenance-track-name <value>]
[--snapshot-schedule-identifier <value>]
[--availability-zone-relocation | --no-availability-zone-relocation]
[--aqua-configuration-status <value>]
[--default-iam-role-arn <value>]
[--load-sample-data <value>]
[--cli-input-json <value>]
[--generate-cli-skeleton <value>]
[--debug]
[--endpoint-url <value>]
[--no-verify-ssl]
[--no-paginate]
[--output <value>]
[--query <value>]
[--profile <value>]
[--region <value>]
[--version <value>]
[--color <value>]
[--no-sign-request]
[--ca-bundle <value>]
[--cli-read-timeout <value>]
[--cli-connect-timeout <value>]

Example:

aws redshift create-cluster --node-type dc2.large --number-of-nodes 2 --master-username adminuser --master-user-password TopSecret1 --cluster-identifier mycluster

Costs
The cost of using Amazon Redshift depends on several factors, including the size of the cluster, the type of nodes used, and the amount of data stored and transferred. The key cost factors are:

  • Cluster size: The size of the Redshift cluster you choose affects the overall cost. You can choose from different node types and sizes, and the number of nodes can be scaled up or down based on your needs.
  • Node type: The cost of each node depends on its specifications, such as CPU, memory, and storage capacity. The more powerful the node, the higher the cost.
  • Data storage: You are charged for the amount of data you store in Redshift per month.
  • Data transfer: You are charged for data transfer into and out of Redshift, including data transfers between regions.
  • Reserved nodes: If you commit to a certain number of nodes for a 1-year or 3-year term, you can save up to 75% of the hourly rate compared to On-Demand pricing.

AWS provides a free trial of Redshift that lets you explore the service and test it with your data for two months, with up to 750 hours of usage per month. AWS also offers pricing calculators and detailed documentation to help you estimate and manage your Redshift costs.
Direct Cost

<Region>-Node:<Node_Type>

Indirect Cost
No items found.
Best Practices for Redshift

Categorized by Availability, Security & Compliance and Cost

  • Low - Access allowed from VPN
  • Low - Auto Scaling Group not in use
  • Medium - Connections towards DynamoDB should be via VPC endpoints
  • Medium - Container in CrashLoopBackOff state
  • Low - EC2 with GPU capabilities
  • Medium - EC2 with high privileged policies
  • Medium - ECS cluster delete alarm
  • Critical - ECS task with Admin access (*:*)
  • Medium - ECS task with high privileged policies
  • Critical - EKS cluster delete alarm
  • Medium - ElastiCache cluster delete alarm
  • Medium - Ensure Container liveness probe is configured
  • Medium - Ensure ECS task definition has memory limit
  • Critical - Ensure EMR cluster master nodes are not publicly accessible