AWS S3

IMPORTANT: This article covers setting up a warehouse as a destination for data loaded from Improvado, not a customer data warehouse from which data is extracted. It also does not cover setting up a customer data warehouse for Data Prep.

Required information

  • Title
  • AWS Access Key ID
  • AWS Secret Access Key
  • AWS Region
  • Bucket Name
  • ~Bucket Name can only contain letters, numbers, dashes, and dots and must start and end with a letter or a number
  • ~Bucket Name length must be between 3 and 63 characters
  • Folder
  • ~Enter a forward slash ( / ) to use the root path of the bucket
  • File format
  • File name
  • Separator (optional)
  • ~The maximum length of the separator is 2 characters
  • Partition by (how data is split across the uploaded files)
  • Encryption
  • Encryption Key
  • ~Required only for SSE-KMS with customer managed key encryption
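
The bucket name and separator constraints above can be checked before the form is submitted. The following is a minimal sketch that encodes only the rules stated in this list; the function names are hypothetical, and AWS itself applies further restrictions (for example, lowercase letters only) that this sketch does not check.

```python
import re

# Rules as stated in the list above: 3-63 characters; letters, digits,
# dashes and dots only; first and last character must be a letter or digit.
_BUCKET_NAME_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9.-]{1,61}[A-Za-z0-9]")


def is_valid_bucket_name(name: str) -> bool:
    """Return True if `name` satisfies the bucket name rules listed above."""
    return bool(_BUCKET_NAME_RE.fullmatch(name))


def is_valid_separator(separator: str) -> bool:
    """Return True if the optional separator is at most 2 characters long."""
    return len(separator) <= 2
```

For example, `is_valid_bucket_name("my-bucket.01")` returns `True`, while `is_valid_bucket_name("-bucket")` returns `False` because the name starts with a dash.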

Folder

Possible parameters:

  • /{{ data_source }}/{{ data_table_title }}/{{ report_type }}/{{ YYYY }}/{{ MM }}/{{ DD }}
  • ~{{ data_source }} is a data provider, integration, or connector
  • ~{{ data_table_title }} is the title of the data table, an object that contains all extraction orders with the same granularity (dimensional schema)
  • ~{{ report_type }} is a set of fields such as metrics, properties, dimensions, etc.

If you use the /{{ YYYY }}/{{ MM }}/{{ DD }} parameters, the data is added to a new folder each day. A new record does not delete the previous one, even for data that contains no date.
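
To illustrate how such a folder template expands into a path, here is a minimal sketch. It is not Improvado's implementation: the `render_template` helper, the zero-padded date parts, and the example values (`facebook`, `ads_daily`, `ads`) are assumptions made for illustration only.

```python
import re
from datetime import date

# Matches placeholders such as {{ YYYY }} or {{report_type}}.
_PLACEHOLDER_RE = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_template(template: str, values: dict, day: date) -> str:
    """Fill {{ name }} placeholders from `values` plus the date parts of `day`."""
    context = {
        **values,
        "YYYY": f"{day.year:04d}",
        "MM": f"{day.month:02d}",   # assumption: months are zero-padded
        "DD": f"{day.day:02d}",     # assumption: days are zero-padded
    }
    # An unknown placeholder raises KeyError, which surfaces typos early.
    return _PLACEHOLDER_RE.sub(lambda m: context[m.group(1)], template)


folder = render_template(
    "/{{ data_source }}/{{ data_table_title }}/{{report_type}}/{{ YYYY }}/{{ MM }}/{{ DD }}",
    {"data_source": "facebook", "data_table_title": "ads_daily", "report_type": "ads"},
    date(2024, 3, 7),
)
print(folder)  # /facebook/ads_daily/ads/2024/03/07
```

Because the date is part of the path, each day's load lands in its own folder, which is why earlier records are left in place.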

Data structure of S3 storage by Improvado

On request, our support team can set up a different folder structure in a bucket.

File format

Possible formats:

  • csv
  • csv+gzip
  • json
  • json+gzip
  • parquet

File name

Possible parameters:

  • {{filename}}-{{YYYY}}-{{MM}}-{{DD}}
  • ~{{ filename }} is the same as the destination table name

IMPORTANT: You cannot use {{ DD }} when partitioning by month.

  • ~{{filename}}-{{YYYY}}-{{MM}}-{{DD}} – for partition by day
  • ~{{filename}}-{{YYYY}}-{{MM}} – for partition by month

You can also use “_” instead of “-”, or no separator at all, for example:

  • {{filename}}_{{YYYY}}-{{MM}}-{{DD}}
  • {{filename}}{{YYYY}}{{MM}}{{DD}}
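
The rule that {{ DD }} cannot be combined with monthly partitioning can be expressed as a small check. This is a sketch only; the function name and the `"day"` / `"month"` string values are hypothetical and do not reflect how Improvado validates the setting internally.

```python
import re

# Matches the day placeholder with or without inner spaces.
_DAY_PLACEHOLDER_RE = re.compile(r"\{\{\s*DD\s*\}\}")


def check_file_name_template(template: str, partition_by: str) -> None:
    """Raise ValueError if the template conflicts with the partition setting."""
    if partition_by not in ("day", "month"):
        raise ValueError(f"unknown partition: {partition_by!r}")
    if partition_by == "month" and _DAY_PLACEHOLDER_RE.search(template):
        raise ValueError("{{ DD }} cannot be used when partitioning by month")


# Both calls pass silently; a monthly template that contains {{DD}} would raise.
check_file_name_template("{{filename}}-{{YYYY}}-{{MM}}", "month")
check_file_name_template("{{filename}}_{{YYYY}}-{{MM}}-{{DD}}", "day")
```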

Partition by

Possible ways of splitting data:

  • Day
  • Month

Encryption

  • No storage encryption - not recommended unless the S3 bucket has default encryption
  • SSE-S3 - recommended server-side encryption
  • SSE-KMS - make sure to give the required permissions for the AWS managed key
  • SSE-KMS with customer managed key - make sure to give the required permissions for the encryption key

How to provide credentials (3 options)

1st option

SSE-S3 and SSE-KMS (with AWS managed and customer managed keys) encryption types can be used.

Create a user for Improvado in your AWS account and provide the following information:

  • Bucket Name
  • Access Key Id
  • Secret Access Key
  • Region

The created user should have the following permissions for the S3 bucket:

  • s3:GetObject
  • s3:PutObject
  • s3:ListBucket
  • s3:DeleteObject
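
One way to grant these four permissions is an IAM policy attached to the user. The sketch below builds such a policy document; the `build_user_policy` helper and the `Sid` values are hypothetical, and the split into two statements reflects that `s3:ListBucket` applies to the bucket itself while the other three actions apply to objects in it.

```python
import json


def build_user_policy(bucket_name: str) -> dict:
    """Build an IAM policy document granting the four permissions listed above."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # s3:ListBucket is a bucket-level action, so it targets the bucket ARN.
                "Sid": "ListBucket",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket_name}"],
            },
            {
                # Object-level actions target the objects inside the bucket.
                "Sid": "ReadWriteObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": [f"arn:aws:s3:::{bucket_name}/*"],
            },
        ],
    }


# Print the policy as JSON, ready to paste into the IAM console.
print(json.dumps(build_user_policy("your-bucket-name"), indent=2))
```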

2nd option

SSE-S3 and SSE-KMS (with customer managed keys) encryption types can be used.

Share Read and Write access to the S3 bucket with Improvado’s AWS account (account ID 716309063777).

Required information:

  • Bucket Name
  • Region

Instruction

  1. Create a bucket
  2. Open the bucket permissions
  3. Click to edit the bucket policy
  4. Paste the policy example presented below
  5. Change your-bucket-name to your real bucket name
  6. Save changes

Policy example:

{% code-block language="json" %}
{
   "Id": "Policy1569503459134",
   "Version": "2012-10-17",
   "Statement": [
       {
           "Sid": "S3Access",
           "Action": [
               "s3:GetObject*",
               "s3:DeleteObject",
               "s3:ListBucket*",
               "s3:PutObject*",
               "s3:ListBucketMultipartUploads",
               "s3:ListMultipartUploadParts",
               "s3:AbortMultipartUpload"
           ],
           "Effect": "Allow",
           "Resource": [
               "arn:aws:s3:::your-bucket-name",
               "arn:aws:s3:::your-bucket-name/*"
           ],
           "Principal": {
               "AWS": [
                   "716309063777"
               ]
           }
       }
   ]
}
{% code-block-end %}

For SSE-KMS, you have to share access to your KMS key using one of the methods below.

Note that AWS allows sharing only customer managed KMS keys (keys that you created). An AWS managed KMS key (a key that AWS created automatically) cannot be shared.

Method 1

The KMS Key ID is required.

  1. Create a KMS key
  2. Go to “Other AWS accounts” in the key settings and click “Add other AWS account”
  3. Paste Improvado’s account ID (716309063777)
  4. Save changes

Method 2

The KMS Key ID is required.

  1. Create a KMS key
  2. In “Key policy”, click “Switch to policy view”
  3. Click “Edit”
  4. Paste the policy example presented below
  5. Change example-region-1 to the real key region
  6. Change 123456789098 to your real account ID
  7. Change 111aa2bb-333c-4d44-5555-a111bb2c33dd to the real key ID
  8. Save changes

Policy example:

{% code-block language="json" %}
{
   "Version": "2012-10-17",
   "Statement": [
       {
           "Sid": "Enable IAM User Permissions",
           "Effect": "Allow",
           "Principal": {
               "AWS": "arn:aws:iam::123456789098:root"
           },
           "Action": "kms:*",
           "Resource": "*"
       },
       {
           "Sid": "Allow use of the key",
           "Effect": "Allow",
           "Principal": {
               "AWS": "arn:aws:iam::716309063777:root"
           },
           "Action": [
               "kms:Encrypt",
               "kms:Decrypt",
               "kms:ReEncrypt*",
               "kms:GenerateDataKey*",
               "kms:DescribeKey"
           ],
           "Resource": "arn:aws:kms:example-region-1:123456789098:key/111aa2bb-333c-4d44-5555-a111bb2c33dd"
       }
   ]
}
{% code-block-end %}
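
Editing the three placeholders by hand is error-prone, so the key policy can also be generated. The sketch below is one possible approach, not an official tool: the `build_key_policy` helper is hypothetical, and the statements mirror the policy example above. Serializing with `json.dumps` ensures the result is valid JSON.

```python
import json

IMPROVADO_ACCOUNT_ID = "716309063777"  # Improvado's AWS account ID from this article


def build_key_policy(region: str, account_id: str, key_id: str) -> dict:
    """Build the KMS key policy from the example above with real values filled in."""
    key_arn = f"arn:aws:kms:{region}:{account_id}:key/{key_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Keeps full control of the key with the owning account.
                "Sid": "Enable IAM User Permissions",
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{account_id}:root"},
                "Action": "kms:*",
                "Resource": "*",
            },
            {
                # Lets Improvado's account use the key for encryption and decryption.
                "Sid": "Allow use of the key",
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{IMPROVADO_ACCOUNT_ID}:root"},
                "Action": [
                    "kms:Encrypt",
                    "kms:Decrypt",
                    "kms:ReEncrypt*",
                    "kms:GenerateDataKey*",
                    "kms:DescribeKey",
                ],
                "Resource": key_arn,
            },
        ],
    }


# The placeholder values from the steps above; replace them with your own.
policy = build_key_policy(
    "example-region-1", "123456789098", "111aa2bb-333c-4d44-5555-a111bb2c33dd"
)
print(json.dumps(policy, indent=2))
```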

3rd option

SSE-S3 encryption type can be used.

Share access with Improvado’s account using our canonical ID:

  • 5f2dfe7db1abc67daeea58fea77fb0c399ed924a2f88cf36d28738b7e1c838ef

Required information:

  • Bucket Name
  • Region

Instruction

  1. Create a bucket
  2. Open the bucket permissions
  3. Click Edit in the Access control list (ACL) section
  4. Click the Add grantee button under Access for other AWS accounts
  5. Paste the canonical ID 5f2dfe7db1abc67daeea58fea77fb0c399ed924a2f88cf36d28738b7e1c838ef
  6. Set List and Write permissions for Objects
  7. Click the Save changes button

If you use the 2nd or 3rd option, notify our support team or your CSM, and we will create specific users to load data and provide support.
