Glue

AWS Glue is a fully managed, serverless extract, transform, and load (ETL) service for moving data between data stores. Its crawlers discover and catalog your data, infer schemas for it, and keep the catalog up to date as the data changes over time. You can create and run an ETL job with a few clicks in the AWS Management Console, and the service handles provisioning, monitoring, and maintaining the resources those jobs need. AWS Glue integrates closely with Amazon S3 and Amazon Redshift and can also connect to other data stores. Because it is serverless, you pay only for the resources your jobs consume and do not need to provision or manage infrastructure.

Terraform Name

aws_glue_catalog_database

Glue attributes:

The following arguments are supported:

catalog_id - (Optional) ID of the Glue Catalog to create the database in. If omitted, this defaults to the AWS Account ID.
create_table_default_permission - (Optional) Creates a set of default permissions on the table for principals. See create_table_default_permission below.
description - (Optional) Description of the database.
location_uri - (Optional) Location of the database (for example, an HDFS path).
name - (Required) Name of the database. The acceptable characters are lowercase letters, numbers, and the underscore character.
parameters - (Optional) Map of key-value pairs that define parameters and properties of the database.
target_database - (Optional) Configuration block for a target database for resource linking. See target_database below.
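To show how the optional descriptive arguments fit together, here is a minimal sketch of a database that sets description, location_uri, and parameters. The resource label, S3 path, and parameter key/value are hypothetical placeholders. catalog_id is omitted, so the database is created in the current account's catalog.

```hcl
# Sketch: a catalog database with the optional descriptive arguments set.
# The label, S3 path, and parameter key/value are hypothetical placeholders.
resource "aws_glue_catalog_database" "analytics" {
  # Lowercase letters, numbers, and underscores only
  name         = "analytics_db"
  description  = "Curated analytics tables"
  location_uri = "s3://example-bucket/analytics/"

  # Free-form key-value metadata stored with the database
  parameters = {
    owner = "data-team"
  }
}
```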

target_database

catalog_id - (Required) ID of the Data Catalog in which the database resides.
database_name - (Required) Name of the catalog database.
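A target_database block turns the database into a resource link: an entry in your catalog that points at a database in another Data Catalog, for example one shared with your account. The sketch below assumes a shared database named sales in account 111122223333; the account ID, database names, and resource label are all hypothetical.

```hcl
# Sketch: a resource link to a database that lives in another Data Catalog.
# The account ID and database names are hypothetical placeholders.
resource "aws_glue_catalog_database" "shared_link" {
  # Name of the link as it appears in this account's catalog
  name = "shared_sales_link"

  target_database {
    # Catalog (account) that owns the real database
    catalog_id    = "111122223333"
    database_name = "sales"
  }
}
```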

create_table_default_permission

permissions - (Optional) The permissions that are granted to the principal.
principal - (Optional) The principal who is granted permissions. See principal below.

principal

data_lake_principal_identifier - (Optional) An identifier for the Lake Formation principal.
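The principal does not have to be IAM_ALLOWED_PRINCIPALS. The sketch below assumes you want new tables in the database to grant default permissions to a single IAM role instead. The role ARN, database name, and resource label are hypothetical placeholders, and SELECT and INSERT are just two of the Lake Formation permission values the block accepts.

```hcl
# Sketch: new tables in this database grant default permissions to one IAM role.
# The role ARN, database name, and label are hypothetical placeholders.
resource "aws_glue_catalog_database" "restricted" {
  name = "restricted_db"

  create_table_default_permission {
    # Permissions granted on each new table to the principal below
    permissions = ["SELECT", "INSERT"]

    principal {
      # A Lake Formation principal identified by an IAM role ARN
      data_lake_principal_identifier = "arn:aws:iam::111122223333:role/example-analyst"
    }
  }
}
```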


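Associating resources with a Glue database

Resources do not "belong" to a Glue database in the way files belong to a folder. Rather, a database groups table definitions (metadata), and each table points to data that remains in its own data store, such as an Amazon S3 location.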

Create Glue via Terraform:

The following HCL creates a Glue Catalog Database resource whose new tables grant SELECT to IAM_ALLOWED_PRINCIPALS by default.

Syntax:

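resource "aws_glue_catalog_database" "aws_glue_catalog_database" {
  # Lowercase letters, numbers, and underscores only
  name = "my_catalog_database"

  # Default permissions applied to every table created in this database
  create_table_default_permission {
    permissions = ["SELECT"]

    principal {
      # Grant to all IAM principals allowed by IAM policies (no Lake Formation restriction)
      data_lake_principal_identifier = "IAM_ALLOWED_PRINCIPALS"
    }
  }
}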

Create Glue via CLI:

Parameters:

create-database
[--catalog-id <value>]
--database-input <value>
[--tags <value>]
[--cli-input-json | --cli-input-yaml]
[--generate-cli-skeleton <value>]
[--debug]
[--endpoint-url <value>]
[--no-verify-ssl]
[--no-paginate]
[--output <value>]
[--query <value>]
[--profile <value>]
[--region <value>]
[--version <value>]
[--color <value>]
[--no-sign-request]
[--ca-bundle <value>]
[--cli-read-timeout <value>]
[--cli-connect-timeout <value>]
[--cli-binary-format <value>]
[--no-cli-pager]
[--cli-auto-prompt]
[--no-cli-auto-prompt]

Example:

aws glue create-database \
    --database-input "{\"Name\":\"tempdb\"}" \
    --profile my_profile \
    --endpoint-url https://glue.us-east-1.amazonaws.com

Costs

The cost of using Glue depends mainly on how much compute your ETL jobs consume and how heavily you use the Data Catalog. ETL jobs are billed in Data Processing Units (DPUs): each DPU provides 4 vCPUs and 16 GB of memory, and you pay an hourly rate per DPU (a DPU-hour) for the time your jobs run. The rate varies by region. For the Data Catalog, you are charged for the number of objects stored (such as databases, tables, and partitions) and the number of API requests made, beyond a monthly free tier.
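As a rough illustration of the DPU-hour arithmetic, the worked example below assumes a job that runs on 10 DPUs for 30 minutes and an illustrative rate of $0.44 per DPU-hour. The rate is an assumption, so check current pricing for your region; the example also ignores Data Catalog charges and any minimum billing duration.

```latex
\text{job cost} = N_{\text{DPU}} \times t_{\text{hours}} \times r
= 10 \times 0.5 \times \$0.44
= 5\ \text{DPU-hours} \times \$0.44
= \$2.20
```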
