SageMaker Notebook

Amazon Web Services
Compute
A SageMaker notebook instance is a machine learning (ML) compute instance running the Jupyter Notebook App, a web-based interactive computing platform for editing and running notebook documents in a web browser. SageMaker is a fully managed machine learning service that lets data scientists and developers build and train machine learning models and deploy them directly into a production-ready hosted environment.
Terraform Name
aws_sagemaker_notebook_instance
SageMaker Notebook attributes:
  • name - (Required) The name of the notebook instance (must be unique).
  • role_arn - (Required) The ARN of the IAM role to be used by the notebook instance which allows SageMaker to call other services on your behalf.
  • instance_type - (Required) The ML compute instance type, for example ml.t2.medium.
  • platform_identifier - (Optional) The platform identifier of the notebook instance runtime environment. This value can be either notebook-al1-v1, notebook-al2-v1, or notebook-al2-v2, depending on which version of Amazon Linux you require.
  • volume_size - (Optional) The size, in GB, of the ML storage volume to attach to the notebook instance. The default value is 5 GB.
  • subnet_id - (Optional) The VPC subnet ID.
  • security_groups - (Optional) The associated security groups.
  • accelerator_types - (Optional) A list of Elastic Inference (EI) instance types to associate with this notebook instance. See Elastic Inference Accelerator for more details. Valid values: ml.eia1.medium, ml.eia1.large, ml.eia1.xlarge, ml.eia2.medium, ml.eia2.large, ml.eia2.xlarge.
  • additional_code_repositories - (Optional) An array of up to three Git repositories to associate with the notebook instance. These can be either the names of Git repositories stored as resources in your account, or the URL of Git repositories in AWS CodeCommit or in any other Git repository. These repositories are cloned at the same level as the default repository of your notebook instance.
  • default_code_repository - (Optional) The Git repository associated with the notebook instance as its default code repository. This can be either the name of a Git repository stored as a resource in your account, or the URL of a Git repository in AWS CodeCommit or in any other Git repository.
  • direct_internet_access - (Optional) Set to Disabled to disable internet access to the notebook. Requires security_groups and subnet_id to be set. Supported values: Enabled (Default) or Disabled. If set to Disabled, the notebook instance can access only resources in your VPC, and cannot connect to Amazon SageMaker training and endpoint services unless you configure a NAT Gateway in your VPC.
  • instance_metadata_service_configuration - (Optional) Information on the IMDS configuration of the notebook instance. See details below.
  • kms_key_id - (Optional) The AWS Key Management Service (AWS KMS) key that Amazon SageMaker uses to encrypt the model artifacts at rest using Amazon S3 server-side encryption.
  • lifecycle_config_name - (Optional) The name of a lifecycle configuration to associate with the notebook instance.
  • root_access - (Optional) Whether root access is Enabled or Disabled for users of the notebook instance. The default value is Enabled.
  • tags - (Optional) A map of tags to assign to the resource. If configured with a provider default_tags configuration block present, tags with matching keys will overwrite those defined at the provider-level.

instance_metadata_service_configuration

  • minimum_instance_metadata_service_version - (Optional) The minimum IMDS version that the notebook instance supports. Valid values are 1 and 2. When "1" is passed, both IMDSv1 and IMDSv2 are supported.
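
The optional attributes above can be combined into a VPC-only notebook. The following is a minimal sketch rather than a recommended configuration: the resource names aws_iam_role.role, aws_subnet.private and aws_security_group.notebook are hypothetical and assumed to be defined elsewhere, and the volume size and platform identifier are illustrative values.

```hcl
resource "aws_sagemaker_notebook_instance" "private" {
  name          = "my-private-notebook"
  role_arn      = aws_iam_role.role.arn # hypothetical role, defined elsewhere
  instance_type = "ml.t2.medium"

  platform_identifier = "notebook-al2-v2"
  volume_size         = 20 # GB; the default is 5

  # Disabling direct internet access requires subnet_id and security_groups.
  direct_internet_access = "Disabled"
  subnet_id              = aws_subnet.private.id              # hypothetical subnet
  security_groups        = [aws_security_group.notebook.id]   # hypothetical security group

  # "2" restricts the instance to IMDSv2; "1" allows both IMDSv1 and IMDSv2.
  instance_metadata_service_configuration {
    minimum_instance_metadata_service_version = "2"
  }
}
```

With direct internet access disabled, the notebook reaches SageMaker training and endpoint services only if the VPC provides a route to them, such as the NAT Gateway mentioned above.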

Associating resources with a SageMaker Notebook
Resources do not "belong" to a SageMaker Notebook. Rather, one or more Security Groups are associated with a resource.
Create SageMaker Notebook via Terraform:
The following HCL creates a SageMaker Notebook Instance resource.
Syntax:

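resource "aws_sagemaker_notebook_instance" "ni" {
  # The name must be unique.
  name = "my-notebook-instance"
  # IAM role used by the notebook instance; aws_iam_role.role must be defined elsewhere in the configuration.
  role_arn      = aws_iam_role.role.arn
  instance_type = "ml.t2.medium"

  tags = {
    Name = "foo"
  }
}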
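
The example references aws_iam_role.role without defining it. One possible definition is sketched below, assuming a role that SageMaker is allowed to assume; the role name is a placeholder, and the permission policies the notebook actually needs (for example, access to S3) are not shown and would have to be attached separately.

```hcl
resource "aws_iam_role" "role" {
  name = "my-notebook-role" # placeholder name

  # Trust policy: lets the SageMaker service assume this role on your behalf.
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = "sts:AssumeRole"
      Principal = { Service = "sagemaker.amazonaws.com" }
    }]
  })
}
```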

Create SageMaker Notebook via CLI:
Parameters:

create-notebook-instance
--notebook-instance-name <value>
--instance-type <value>
[--subnet-id <value>]
[--security-group-ids <value>]
--role-arn <value>
[--kms-key-id <value>]
[--tags <value>]
[--lifecycle-config-name <value>]
[--direct-internet-access <value>]
[--volume-size-in-gb <value>]
[--accelerator-types <value>]
[--default-code-repository <value>]
[--additional-code-repositories <value>]
[--root-access <value>]
[--platform-identifier <value>]
[--instance-metadata-service-configuration <value>]
[--cli-input-json | --cli-input-yaml]
[--generate-cli-skeleton <value>]
[--debug]
[--endpoint-url <value>]
[--no-verify-ssl]
[--no-paginate]
[--output <value>]
[--query <value>]
[--profile <value>]
[--region <value>]
[--version <value>]
[--color <value>]
[--no-sign-request]
[--ca-bundle <value>]
[--cli-read-timeout <value>]
[--cli-connect-timeout <value>]
[--cli-binary-format <value>]
[--no-cli-pager]
[--cli-auto-prompt]
[--no-cli-auto-prompt]

Costs
The cost of using SageMaker Notebook instances depends on the number and type of notebook instances you run, the storage attached to them, and the amount of data stored in Amazon S3. Notebook instances are charged at an hourly rate for each instance you run, and the rate varies with the instance type and the region. CPU and memory are priced through the instance type: an instance type with more CPU and memory costs more per hour. For data stored in Amazon S3, you are charged for the amount of data stored and the number of requests made to access it, and this cost also varies by region.
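
As a rough sketch of how the instance and volume charges combine, the monthly notebook cost can be estimated as follows. The symbols are assumptions introduced here for illustration: N is the number of identical notebook instances, h the hours each one runs in the month, r_instance the hourly rate for the instance type in your region, V the provisioned volume size in GB, and r_volume the per-GB monthly storage rate, assuming the volume is provisioned for the whole month. Amazon S3 charges are not included.

```latex
C_{\text{month}} \approx N \left( h \cdot r_{\text{instance}} + V \cdot r_{\text{volume}} \right)
```

Actual rates should be taken from the current SageMaker pricing for your region.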
Direct Cost

<Region>-Notebk:<Instance_Type>

<Region>-Notebk:VolumeUsage.gp2

Indirect Cost
No items found.
Best Practices for SageMaker Notebook

Categorized by Availability, Security & Compliance and Cost

  • Low - Access allowed from VPN
  • Low - Auto Scaling Group not in use
  • Medium - Connections towards DynamoDB should be via VPC endpoints
  • Medium - Container in CrashLoopBackOff state
  • Low - EC2 with GPU capabilities
  • Medium - EC2 with high privileged policies
  • Medium - ECS cluster delete alarm
  • Critical - ECS task with Admin access (*:*)
  • Medium - ECS task with high privileged policies
  • Critical - EKS cluster delete alarm
  • Medium - ElastiCache cluster delete alarm
  • Medium - Ensure Container liveness probe is configured
  • Medium - Ensure ECS task definition has memory limit
  • Critical - Ensure EMR cluster master nodes are not publicly accessible