AWS S3

Updated on Sep 5, 2023

IMPORTANT: This article covers setting up a warehouse to load data from Improvado. It does not cover setting up a customer data warehouse from which data is extracted, or a customer data warehouse for Data Prep.

Required information

  • Title
  • AWS Access Key ID
  • AWS Secret Access Key
  • AWS Region
  • Bucket Name
      • The bucket name can only contain letters, numbers, dashes, and dots, and must start and end with a letter or a number
      • The bucket name length must be between 3 and 63 characters
  • Folder
      • A forward slash ( / ) denotes the root path
  • File format
  • File name
  • Separator
  • Partition by (how data is split into files for upload)
  • Encryption
  • Encryption Key
      • Only for KMS with customer managed key encryption
  • Use load by account
      • If enabled, the File name field must include the ```{{account}}``` variable.
        You must enable this field if you want to use a specific account for data load.
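
The two Bucket Name rules above can be checked before the form is submitted. The following is a minimal Python sketch that covers only those two rules. The function name ```is_valid_bucket_name``` is illustrative and not part of Improvado or AWS. AWS itself applies further naming rules (for example, lowercase letters only) that this check does not cover.

```python
import re

# Rules from the list above: letters, digits, dashes, and dots only;
# first and last character must be a letter or digit; length 3 to 63.
_BUCKET_NAME_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9.-]{1,61}[A-Za-z0-9]")


def is_valid_bucket_name(name: str) -> bool:
    """Return True if `name` satisfies the two Bucket Name rules above."""
    return _BUCKET_NAME_RE.fullmatch(name) is not None


print(is_valid_bucket_name("my-bucket.01"))  # True
print(is_valid_bucket_name("-my-bucket"))    # False: starts with a dash
print(is_valid_bucket_name("ab"))            # False: shorter than 3 characters
```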

Folder

Possible parameters:

```/{{workspace_id}}/{{workspace_title}}/{{data_source}}/{{data_table_title}}/{{report_type}}/{{YYYY}}/{{MM}}/{{DD}}/{{timestamp}}```

  • ```{{workspace_id}}``` and ```{{workspace_title}}``` are optional parameters that provide additional information about the workspace used for a destination connection
  • ```{{data_source}}``` is the data provider (integration or connector)
  • ```{{data_table_title}}``` is an object that contains all extraction orders with the same granularity (dimensional schema)
  • ```{{report_type}}``` is a set of fields such as metrics, properties, and dimensions
  • ```{{timestamp}}``` is the date and time when data load started

If you use ```/{{YYYY}}/{{MM}}/{{DD}}``` in the folder path, data is added to a new folder each day. A new load does not delete previously loaded data, even for data that contains no date.
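
To see how a folder template expands into a path, the sketch below substitutes each ```{{name}}``` placeholder with a value. It is an illustration only: the substitution is performed by Improvado, the function ```render_template``` is hypothetical, and the sample values (```facebook```, zero-padded month and day) are assumptions rather than documented output.

```python
import re
from datetime import datetime, timezone
from typing import Dict

_PLACEHOLDER_RE = re.compile(r"\{\{(\w+)\}\}")


def render_template(template: str, values: Dict[str, str]) -> str:
    """Replace each {{name}} in `template` with values[name]."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError("no value supplied for placeholder: " + name)
        return values[name]

    return _PLACEHOLDER_RE.sub(substitute, template)


# Hypothetical load start time and data source name.
load_started = datetime(2023, 9, 5, 14, 30, tzinfo=timezone.utc)
folder = render_template(
    "/{{data_source}}/{{YYYY}}/{{MM}}/{{DD}}",
    {
        "data_source": "facebook",
        "YYYY": f"{load_started:%Y}",
        "MM": f"{load_started:%m}",
        "DD": f"{load_started:%d}",
    },
)
print(folder)  # /facebook/2023/09/05
```

With a daily path like this one, a load on the next day goes to ```/facebook/2023/09/06``` and leaves the earlier folder in place.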

Data structure of S3 storage by Improvado

On request, the support team can set up a different folder structure in a bucket.

File format

Possible formats:

  • csv
  • csv+gzip
  • json
  • json+gzip
  • parquet
  • avro

File name

Possible parameters:

```{{filename}}-{{account}}-{{YYYY}}-{{MM}}-{{DD}}```

  • ```{{filename}}``` - is the same as the destination table name
  • ```{{account}}``` - is an optional parameter that lets you add a specific account for the data load.
      • You must enable the Use load by account field to add this parameter to the File name

IMPORTANT: You cannot use ```{{DD}}``` when partitioning by month.

  • ```{{filename}}-{{YYYY}}-{{MM}}-{{DD}}``` – for partition by day
  • ```{{filename}}-{{YYYY}}-{{MM}}``` – for partition by month

You can also use “_” instead of “-”, or omit separators entirely, for example:

  • ```{{filename}}_{{YYYY}}-{{MM}}-{{DD}}```
  • ```{{filename}}{{YYYY}}{{MM}}{{DD}}```
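
The File name rules in this article (no ```{{DD}}``` when partitioning by month, and ```{{account}}``` tied to the Use load by account setting) can be expressed as a small check. This is a sketch under the assumption that those are the only constraints. The function name and the ```"day"```/```"month"``` values are illustrative, not Improvado identifiers.

```python
from typing import List


def check_file_name_template(template: str, partition_by: str, load_by_account: bool) -> List[str]:
    """Return a list of problems; an empty list means the template passes."""
    problems = []
    if partition_by == "month" and "{{DD}}" in template:
        problems.append("{{DD}} cannot be used when partitioning by month")
    if load_by_account and "{{account}}" not in template:
        problems.append("{{account}} is required when Use load by account is enabled")
    if not load_by_account and "{{account}}" in template:
        problems.append("enable Use load by account to use {{account}}")
    return problems


print(check_file_name_template("{{filename}}-{{YYYY}}-{{MM}}", "month", False))
# []
print(check_file_name_template("{{filename}}-{{YYYY}}-{{MM}}-{{DD}}", "month", False))
# ['{{DD}} cannot be used when partitioning by month']
```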

Separator

Possible delimiters that can separate data in your file:

  • comma
  • semicolon
  • tab
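
When reading a delivered file, the reader must use the same separator and compression that were configured for the destination. The sketch below reads a csv or csv+gzip file with the Python standard library. It assumes the file is UTF-8 and has a header row, neither of which is stated in this article. The function ```read_rows``` and any file path passed to it are hypothetical.

```python
import csv
import gzip
from typing import Dict, List

# Separator names as listed above, mapped to the actual delimiter character.
SEPARATORS = {"comma": ",", "semicolon": ";", "tab": "\t"}


def read_rows(path: str, separator: str, gzipped: bool) -> List[Dict[str, str]]:
    """Read a delimited file (optionally gzip-compressed) into a list of dicts."""
    opener = gzip.open if gzipped else open
    with opener(path, mode="rt", encoding="utf-8", newline="") as handle:
        return list(csv.DictReader(handle, delimiter=SEPARATORS[separator]))
```

For example, ```read_rows("report.csv.gz", "semicolon", gzipped=True)``` would return one dictionary per data row, keyed by the column names in the header.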

Partition by

Possible ways of splitting data:

  • Day
  • Month

Encryption

  • No storage encryption - not recommended unless the S3 bucket has default encryption
  • SSE-S3 - recommended server-side encryption
  • SSE-KMS - make sure to give the required permissions for the AWS managed key
  • SSE-KMS with customer managed key - make sure to give the required permissions for the encryption key
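
For orientation, each option above corresponds to standard server-side encryption parameters of the S3 PutObject request (```ServerSideEncryption``` and ```SSEKMSKeyId```, as named in the AWS SDKs). The sketch below shows that mapping. How Improvado sets these parameters internally is not described in this article, so treat the function ```encryption_parameters``` as a hypothetical illustration.

```python
from typing import Dict, Optional


def encryption_parameters(encryption: str, kms_key_id: Optional[str] = None) -> Dict[str, str]:
    """Map an Encryption option from the list above to S3 PutObject parameters."""
    if encryption == "No storage encryption":
        return {}  # the bucket's default encryption, if any, applies
    if encryption == "SSE-S3":
        return {"ServerSideEncryption": "AES256"}
    if encryption == "SSE-KMS":
        return {"ServerSideEncryption": "aws:kms"}  # AWS managed key
    if encryption == "SSE-KMS with customer managed key":
        if not kms_key_id:
            raise ValueError("an Encryption Key is required for a customer managed key")
        return {"ServerSideEncryption": "aws:kms", "SSEKMSKeyId": kms_key_id}
    raise ValueError(f"unknown encryption option: {encryption}")


print(encryption_parameters("SSE-S3"))  # {'ServerSideEncryption': 'AES256'}
```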

How to provide credentials (3 options)

1st option

SSE-S3 and SSE-KMS (with AWS managed and customer managed keys) encryption types can be used.

Create a user in your AWS account for Improvado and provide the following information:

  • Bucket Name
  • Access Key Id
  • Secret Access Key
  • Region

The created user should have the following permissions for the S3 bucket:

  • ```s3:GetObject```
  • ```s3:PutObject```
  • ```s3:ListBucket```
  • ```s3:DeleteObject```
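
One way to grant exactly these four permissions is an IAM policy attached to the created user. The sketch below builds such a policy document in Python. It is an assumption about how you might structure the policy, not a policy supplied by Improvado. ```s3:ListBucket``` applies to the bucket ARN, while the object actions apply to the objects under it, hence the two statements. The Sid values and the function name are illustrative.

```python
import json


def user_policy(bucket_name: str) -> dict:
    """Build an IAM policy granting the four permissions listed above."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # ListBucket is a bucket-level action.
                "Sid": "ListBucket",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket_name}"],
            },
            {
                # Get/Put/Delete are object-level actions.
                "Sid": "ReadWriteObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": [f"arn:aws:s3:::{bucket_name}/*"],
            },
        ],
    }


print(json.dumps(user_policy("your-bucket-name"), indent=2))
```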

2nd option

SSE-S3 and SSE-KMS (with customer managed keys) encryption types can be used.

Share read and write access to the S3 bucket with Improvado’s AWS account ID: ```716309063777```.

Required information:

  • Bucket Name
  • Region

Instructions

  1. Create a bucket.
  2. Open the bucket permissions.
  3. Click to edit the bucket policy.
  4. Paste the policy example shown below.
  5. Change ```your-bucket-name``` to your real bucket name.
  6. Save the changes.

Policy example:

{
  "Id": "Policy1569503459134",
  "Version": "2012-10-17",
  "Statement": [
      {
          "Sid": "S3Access",
          "Action": [
              "s3:GetObject*",
              "s3:DeleteObject",
              "s3:ListBucket*",
              "s3:PutObject*",
              "s3:ListBucketMultipartUploads",
              "s3:ListMultipartUploadParts",
              "s3:AbortMultipartUpload"
          ],
          "Effect": "Allow",
          "Resource": [
              "arn:aws:s3:::your-bucket-name",
              "arn:aws:s3:::your-bucket-name/*"
          ],
          "Principal": {
              "AWS": [
                  "716309063777"
              ]
          }
      }
  ]
}
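
A common mistake is saving the policy with the ```your-bucket-name``` placeholder still in place. The sketch below checks an edited policy before you paste it: it confirms that the JSON parses, that every statement's resources point at your bucket, and that Improvado's account ID appears as a principal. The function ```check_bucket_policy``` is hypothetical, and it assumes ```Statement``` is a list as in the example above.

```python
import json
from typing import List

IMPROVADO_ACCOUNT_ID = "716309063777"  # account ID quoted in this article


def check_bucket_policy(policy_text: str, bucket_name: str) -> List[str]:
    """Return a list of problems; an empty list means the checks passed."""
    policy = json.loads(policy_text)  # raises json.JSONDecodeError on malformed JSON
    expected = {f"arn:aws:s3:::{bucket_name}", f"arn:aws:s3:::{bucket_name}/*"}
    problems = []
    for statement in policy.get("Statement", []):
        sid = statement.get("Sid", "<no Sid>")
        resources = statement.get("Resource", [])
        if isinstance(resources, str):
            resources = [resources]
        if set(resources) != expected:
            problems.append(f"{sid}: Resource does not match bucket {bucket_name}")
        principal = statement.get("Principal", {})
        accounts = principal.get("AWS", []) if isinstance(principal, dict) else []
        if isinstance(accounts, str):
            accounts = [accounts]
        # Accept both the bare account ID and an ARN that contains it.
        if not any(IMPROVADO_ACCOUNT_ID in account for account in accounts):
            problems.append(f"{sid}: Improvado account is not listed as a principal")
    return problems
```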

For SSE-KMS you have to share access to your KMS key using one of the methods below.

Note that AWS only allows sharing customer managed KMS keys (keys that you created). An AWS managed KMS key (a key that AWS created automatically) cannot be shared.

Method 1

The KMS Key ID is required.

  1. Create a KMS key.
  2. Go to “Other AWS accounts” in the key settings and click “Add other AWS account”.
  3. Paste Improvado’s AWS account ID: ```716309063777```.
  4. Save the changes.

Method 2

The KMS Key ID is required.

  1. Create a KMS key.
  2. In “Key policy”, click “Switch to policy view”.
  3. Click “Edit”.
  4. Paste the policy example shown below.
  5. Change ```example-region-1``` to the real key region.
  6. Change ```123456789098``` to your real account ID.
  7. Change ```111aa2bb-333c-4d44-5555-a111bb2c33dd``` to the real key ID.
  8. Save the changes.

Policy example:

{
    "Version": "2012-10-17",
    "Statement":
    [
        {
            "Sid": "Enable IAM User Permissions",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::123456789098:root"
            },
            "Action": "kms:*",
            "Resource": "*"
        },
        {
            "Sid": "Allow use of the key",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::716309063777:root"
            },
            "Action": [
                "kms:Encrypt",
                "kms:Decrypt",
                "kms:ReEncrypt*",
                "kms:GenerateDataKey*",
                "kms:DescribeKey"
            ],
            "Resource": "arn:aws:kms:example-region-1:123456789098:key/111aa2bb-333c-4d44-5555-a111bb2c33dd"
        }
    ]
}
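
The ```Resource``` value in the second statement is the key ARN, assembled from the three values that you replace in the steps above. The sketch below builds it from its parts so that the region, account ID, and key ID cannot drift apart. The function ```kms_key_arn``` is illustrative, and the only validation it performs is that the account ID has 12 digits.

```python
import re


def kms_key_arn(region: str, account_id: str, key_id: str) -> str:
    """Assemble a KMS key ARN from the key's region, owning account ID, and key ID."""
    if not re.fullmatch(r"\d{12}", account_id):
        raise ValueError("an AWS account ID is 12 digits")
    return f"arn:aws:kms:{region}:{account_id}:key/{key_id}"


# Placeholder values from the policy example above.
print(kms_key_arn("example-region-1", "123456789098", "111aa2bb-333c-4d44-5555-a111bb2c33dd"))
# arn:aws:kms:example-region-1:123456789098:key/111aa2bb-333c-4d44-5555-a111bb2c33dd
```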

3rd option

SSE-S3 encryption type can be used.

Share access with the Improvado account using our canonical ID:

  • ```5f2dfe7db1abc67daeea58fea77fb0c399ed924a2f88cf36d28738b7e1c838ef```

Required information:

  • Bucket Name
  • Region

Instructions

  1. Create a bucket.
  2. Open the bucket permissions.
  3. Click to edit the Access control list (ACL).
  4. Click the Add grantee button under Access for other AWS accounts.
  5. Paste the canonical ID ```5f2dfe7db1abc67daeea58fea77fb0c399ed924a2f88cf36d28738b7e1c838ef```.
  6. Set the List and Write permissions for Objects.
  7. Click the Save changes button.

If you used the 2nd or 3rd option, notify our support team or your CSM, and we will create specific users to load data and provide support.

Questions?

The Improvado team is always happy to help with any other questions you might have. Send us an email.

Contact your Customer Success Manager or raise a request in Improvado Service Desk.