AWS S3 Destination

Updated on Sep 23, 2024

Note: This article covers the AWS S3 Destination setup process. It doesn’t cover the DataPrep setup for AWS S3.

You can learn how to extract data from AWS S3 Data source here.

Description

AWS S3 is an object storage service that offers companies industry-leading scalability, data availability, security, and performance. This means that companies of all sizes can use it to store and protect data for a range of use cases, no matter the amount of data available.

Setup guide

Follow our setup guide to connect AWS S3 to Improvado.

Choose an encryption option

Choose a server-side encryption option:

  • SSE-S3 (recommended) - server-side encryption with AWS S3 managed keys. Learn more.
  • SSE-KMS - server-side encryption with AWS KMS keys. Learn more.
    Make sure to give the required permissions for the AWS-managed key.
  • SSE-KMS with customer-managed key - server-side encryption with customer-managed AWS KMS keys. Learn more.
    Make sure to give the required permissions for an encryption key.
  • No storage encryption - not recommended unless the AWS S3 bucket has default encryption.
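
If you also write to the bucket with your own tooling, the options above correspond to standard S3 upload parameters. The sketch below is illustrative only and is not Improvado's implementation: the function name ```encryption_args``` is hypothetical, and the returned keys are the ```ServerSideEncryption``` and ```SSEKMSKeyId``` parameters that the S3 ```PutObject``` operation accepts (for example, through boto3).

```python
from typing import Dict, Optional


def encryption_args(option: str, kms_key_id: Optional[str] = None) -> Dict[str, str]:
    """Map an encryption option from the list above to S3 PutObject parameters."""
    if option == "SSE-S3":
        return {"ServerSideEncryption": "AES256"}
    if option == "SSE-KMS":
        # Without a key ID, S3 uses the AWS-managed KMS key for S3.
        return {"ServerSideEncryption": "aws:kms"}
    if option == "SSE-KMS with customer-managed key":
        if not kms_key_id:
            raise ValueError("A KMS key ID is required for a customer-managed key")
        return {"ServerSideEncryption": "aws:kms", "SSEKMSKeyId": kms_key_id}
    if option == "No storage encryption":
        # No explicit header: the bucket's default encryption (if any) applies.
        return {}
    raise ValueError(f"Unknown encryption option: {option}")


print(encryption_args("SSE-S3"))
```

The returned dictionary can be passed as keyword arguments to an upload call, which keeps the choice of encryption in one place.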

Important: Make sure to provide your AWS S3 bucket information by following our guide.

Permissions

Enable the following permissions for your AWS S3 bucket:

  • ```s3:GetObject```
  • ```s3:PutObject```
  • ```s3:ListBucket```
  • ```s3:DeleteObject```
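
One way to grant these four permissions is an IAM policy attached to the user or role that Improvado uses. The following is a minimal sketch under that assumption; the function name ```bucket_permissions_policy``` is hypothetical and ```your-bucket-name``` is a placeholder. It splits the actions into object-level and bucket-level statements, because ```s3:ListBucket``` applies to the bucket itself while the other three apply to objects inside it.

```python
import json


def bucket_permissions_policy(bucket_name: str) -> dict:
    """Build an IAM policy document granting the four permissions listed above."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Object-level actions apply to objects inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            },
            {
                # s3:ListBucket is a bucket-level action.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket_name}",
            },
        ],
    }


print(json.dumps(bucket_permissions_policy("your-bucket-name"), indent=2))
```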

Complete configuration

On the AWS S3 connection page, fill in the following fields:

  1. Enter a name for your Destination connection in the Title field.
  2. Choose the Connection Option.
  3. Enter the AWS Access Key ID (only for Options #1, #2, and #3).
  4. Enter the AWS Secret Access Key (only for Options #1, #2, and #3).
  5. Enter the Assume Role ARN (only for Option #4).
  6. Enter the AWS Region.
  7. Enter the Bucket Name. {%dropdown-button name="bucket-name"%}

{%dropdown-body name="bucket-name"%}

  • Bucket Name can only contain letters, numbers, dashes, and dots and must start and end with a letter or a number.
  • Bucket Name length must be between 3 and 63 characters.

{%dropdown-end%}

  8. Enter the Folder. {%dropdown-button name="folder"%}

{%dropdown-body name="folder"%}

A forward slash (```/```) stands for the root path of the bucket.

{%dropdown-end%}

  9. Select the File format option from the dropdown.
  10. Enter the File name.
  11. Select the Separator option from the dropdown.
  12. Select the Partition by option from the dropdown. {%dropdown-button name="partition-by"%}

{%dropdown-body name="partition-by"%}

Partition by defines how data is split when it is uploaded to files.

{%dropdown-end%}

  13. Select the Encryption option from the dropdown. Learn more about all available encryption options here.
  14. (SSE-KMS with customer-managed key) Enter the Encryption Key.
  15. Select whether you want to Use load by account for this Destination. {%dropdown-button name="use-load-by-account"%}

{%dropdown-body name="use-load-by-account"%}

If enabled, the File name field must include the ```{{account}}``` variable.
You must enable this option if you want to use a specific account for the data load.

{%dropdown-end%}
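
The Bucket Name rules from the list above can be checked before you save the connection. The sketch below implements only the two rules stated in this guide (allowed characters with an alphanumeric first and last character, and a length of 3 to 63 characters); the function name ```is_valid_bucket_name``` is hypothetical. AWS's own naming rules are stricter, for example bucket names must be lowercase, and this sketch does not enforce them.

```python
import re

# Letters, digits, dashes, and dots; first and last character alphanumeric;
# total length between 3 and 63 characters (1 + 1..61 + 1).
_BUCKET_NAME = re.compile(r"[A-Za-z0-9][A-Za-z0-9.-]{1,61}[A-Za-z0-9]")


def is_valid_bucket_name(name: str) -> bool:
    """Return True if the name satisfies the two rules stated in this guide."""
    return _BUCKET_NAME.fullmatch(name) is not None


for candidate in ("my-bucket.1", "ab", "-bucket", "bucket_name"):
    print(candidate, is_valid_bucket_name(candidate))
```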

How to provide credentials to Improvado

There are four ways to provide your credentials. Choose one depending on your security requirements and the selected Server-Side Encryption type:

Option #1 (Create a user in your AWS account for Improvado)

{%docs-informer info%}

Available for SSE-S3 and SSE-KMS (with AWS-managed and customer-managed keys) only.

{%docs-informer-end%}

Create a user in your AWS account for Improvado and enter the following information in Complete configuration:

  • Bucket Name
  • AWS Access Key ID
  • AWS Secret Access Key
  • AWS Region

Make sure to enable the permissions listed in the Permissions section for your AWS S3 bucket.

Option #2 (Share Read and Write access with Improvado’s AWS account)

{%docs-informer info%}

Available for SSE-S3 and SSE-KMS (with customer-managed keys) only.

{%docs-informer-end%}

Important: If you plan to use this option, notify our Support team or your CSM. We will create specific users to load data and provide support, and we’ll create a Destination connection for you.

Share Read and Write access with Improvado’s AWS Account ID:

  1. Create an AWS S3 bucket.
  2. Select the Permissions tab on the Bucket Settings page.
  3. In the Bucket policy, click the Edit button.
  4. Copy & paste the Policy example below.
    • Change ```your-bucket-name``` to your real Bucket name.
{
  "Id": "Policy1569503459134",
  "Version": "2012-10-17",
  "Statement": [
      {
          "Sid": "S3Access",
          "Action": [
              "s3:GetObject*",
              "s3:DeleteObject",
              "s3:ListBucket*",
              "s3:PutObject*",
              "s3:ListBucketMultipartUploads",
              "s3:ListMultipartUploadParts",
              "s3:AbortMultipartUpload"
          ],
          "Effect": "Allow",
          "Resource": [
              "arn:aws:s3:::your-bucket-name",
              "arn:aws:s3:::your-bucket-name/*"
          ],
          "Principal": {
              "AWS": [
                  "716309063777"
              ]
          }
      }
  ]
}

  5. Save changes.
  6. If you use SSE-S3, provide your AWS S3 bucket information to Improvado (Option #2).
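
The copy-and-edit step above amounts to substituting your bucket name into the policy. If you prefer to generate the JSON rather than edit it by hand, a minimal sketch could look like this; the function name ```improvado_bucket_policy``` and the bucket name ```acme-exports``` are hypothetical, while the policy content is copied from the example above.

```python
import json

# Improvado's AWS account ID, as given in the policy example above.
IMPROVADO_ACCOUNT_ID = "716309063777"


def improvado_bucket_policy(bucket_name: str) -> str:
    """Return the bucket policy from this guide with the bucket name filled in."""
    policy = {
        "Id": "Policy1569503459134",
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "S3Access",
                "Action": [
                    "s3:GetObject*",
                    "s3:DeleteObject",
                    "s3:ListBucket*",
                    "s3:PutObject*",
                    "s3:ListBucketMultipartUploads",
                    "s3:ListMultipartUploadParts",
                    "s3:AbortMultipartUpload",
                ],
                "Effect": "Allow",
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}",
                    f"arn:aws:s3:::{bucket_name}/*",
                ],
                "Principal": {"AWS": [IMPROVADO_ACCOUNT_ID]},
            }
        ],
    }
    return json.dumps(policy, indent=2)


print(improvado_bucket_policy("acme-exports"))
```

Generating the document with ```json.dumps``` keeps it syntactically valid, so the only value you need to review is the bucket name.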

{%docs-informer info title="Important"%}

If you use SSE-KMS, you have to share access to your KMS key using one of the methods below:

{%docs-informer-end%}

Note: AWS allows sharing only customer-managed KMS keys (keys that you created). The AWS-managed KMS key (the key that was created by AWS automatically) cannot be shared.

Method 1 (Add Improvado’s AWS account to KMS Key settings)

The KMS Key ID is required.

  1. Create a KMS key.
  2. Open the Key settings.
  3. Go to Other AWS accounts and click the Add other AWS account button.
  4. Paste Improvado’s AWS account ID: ```716309063777```.
  5. Save changes.
  6. Provide your AWS S3 bucket information to Improvado (Option #2).

Method 2 (Add Improvado’s AWS account ID to your KMS Key policy)

The KMS Key ID is required.

  1. Create a KMS key.
  2. Open the Key settings.
  3. In the Key policy tab, click the Switch to policy view button.
  4. Click the Edit button.
  5. Copy & paste the Policy example below.
    • Change ```example-region-1``` to your key’s Region.
    • Change ```123456789098``` to your real Account ID.
    • Change ```111aa2bb-333c-4d44-5555-a111bb2c33dd``` to your real Key ID.
{
    "Version": "2012-10-17",
    "Statement":
    [
        {
            "Sid": "Enable IAM User Permissions",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::123456789098:root"
            },
            "Action": "kms:*",
            "Resource": "*"
        },
        {
            "Sid": "Allow use of the key",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::716309063777:root"
            },
            "Action": [
                "kms:Encrypt",
                "kms:Decrypt",
                "kms:ReEncrypt*",
                "kms:GenerateDataKey*",
                "kms:DescribeKey"
            ],
            "Resource": "arn:aws:kms:example-region-1:123456789098:key/111aa2bb-333c-4d44-5555-a111bb2c33dd"
        }
    ]
}

  6. Save changes.
  7. Provide your AWS S3 bucket information to Improvado (Option #2).

Provide your AWS S3 bucket information to Improvado (Option #2)

Provide our Support Team or your CSM with the following information:

  • Bucket Name
  • AWS Region
  • KMS Key ID

Our team will create specific users to load data and provide support.
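
Before you send these details, it can help to confirm that the key policy you edited in Method 2 is still valid JSON and that no placeholder values remain. The sketch below is one possible check, not an Improvado tool; the function name ```check_key_policy``` is hypothetical, and the placeholder strings are the ones used in the key policy example above.

```python
import json

# Placeholder values used in the key policy example of Method 2.
PLACEHOLDERS = (
    "example-region-1",
    "123456789098",
    "111aa2bb-333c-4d44-5555-a111bb2c33dd",
)


def check_key_policy(text: str) -> list:
    """Return a list of problems found in an edited key policy (empty if none)."""
    try:
        policy = json.loads(text)
    except json.JSONDecodeError as exc:
        # A missing comma between two statements is a typical cause.
        return [f"Invalid JSON at line {exc.lineno}: {exc.msg}"]
    problems = [f"Placeholder not replaced: {p}" for p in PLACEHOLDERS if p in text]
    if not isinstance(policy, dict) or not isinstance(policy.get("Statement"), list):
        problems.append('Policy must be an object with a "Statement" list')
    return problems


# Two statements with no comma between them are reported as invalid JSON.
print(check_key_policy('{"Statement": [{"Sid": "a"} {"Sid": "b"}]}'))
```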

Option #3 (Share access with Improvado account using our Canonical ID)

{%docs-informer info%}

Available for SSE-S3 only.

{%docs-informer-end%}

Important: If you plan to use this option, notify our Support team or your CSM. We will create specific users to load data and provide support, and we’ll create a Destination connection for you.

Share access with the Improvado account using our Canonical ID:

  1. Create an AWS S3 bucket.
  2. Select the Permissions tab on the Bucket Settings page.
  3. In the Access control list (ACL), click the Edit button.
  4. Click the Add grantee button in the Access for other AWS accounts section.
  5. Paste the Canonical ID ```5f2dfe7db1abc67daeea58fea77fb0c399ed924a2f88cf36d28738b7e1c838ef``` into the Grantee field.
  6. Set List and Write permissions for Objects.
  7. Click the Save changes button.
  8. Provide your AWS S3 bucket information to Improvado (Option #3).

Provide your AWS S3 bucket information to Improvado (Option #3)

Provide our Support Team or your CSM with the following information:

  • Bucket Name
  • AWS Region

Our team will create specific users to load data and provide support.

Option #4 (Provide access via Cross-Account AWS IAM Role Chaining)

This option uses a chain of AWS IAM roles: the Improvado role assumes the customer’s role, which has access to S3 and (optionally) to KMS.

Supported encryption types:

  • SSE-S3 - server-side encryption with AWS S3 managed keys.
  • SSE-KMS - server-side encryption with AWS-managed KMS keys.
  • SSE-KMS with customer-managed key - server-side encryption with customer-managed AWS KMS keys.
  • No storage encryption - not recommended unless the AWS S3 bucket has default encryption.

Implementation steps:

  1. You notify our Support team or your CSM.
    • Improvado creates a specific AWS IAM role for you and sends you the role ARN.
    • The Improvado role ARN has the following format: ```arn:aws:iam::<Improvado AWS account id>:role/uls_managed/file_sender_<some string>_<some number>```.
  2. You create an AWS S3 bucket and a KMS Key (if needed).
  3. You create a role in AWS IAM and set its Trust policy and Permissions.
    • The role should have the following Trust policy (Assume role policy):
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "Improvado role ARN"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}

      • Replace ```Improvado role ARN``` with the value provided by Improvado.
    • The role should have read and write permissions on the S3 bucket and, if you use KMS encryption, usage permissions on the KMS key:
{
    "Statement": [
        {
            "Sid": "RWPolicy",
            "Action": [
                "s3:GetObject*",
                "s3:DeleteObject",
                "s3:PutObject*"
            ],
            "Effect": "Allow",
            "Resource": "arn:aws:s3:::your-bucket-name/*"
        },
        {
            "Sid": "ListPolicy",
            "Action": [
                "s3:ListBucket"
            ],
            "Effect": "Allow",
            "Resource": "arn:aws:s3:::your-bucket-name"
        },
        {
            "Sid": "OptionalKMSUsagePolicy",
            "Action": [
                "kms:Encrypt",
                "kms:Decrypt",
                "kms:ReEncrypt*",
                "kms:GenerateDataKey*",
                "kms:DescribeKey"
            ],
            "Effect": "Allow",
            "Resource": "arn:aws:kms:region:123456789:key/111aa2bb-2c33dd"
        }
    ],
    "Version": "2012-10-17"
}

      • Replace ```your-bucket-name``` with your S3 bucket name.
      • You can allow putting, getting, and deleting objects only under a prefix (path or directory) of the S3 bucket instead of the whole bucket. In this case, the Resource in the ```"RWPolicy"``` statement has the format ```"arn:aws:s3:::your-bucket-name/path_prefix/*"```.
      • Replace ```region``` with your KMS key Region (e.g., ```us-east-1```).
      • Replace ```123456789``` with your AWS Account ID.
      • Replace ```111aa2bb-2c33dd``` with your KMS Key ID.
  4. You notify Improvado of your role ARN. Improvado makes configuration adjustments.
  5. You create a new connection on the Destinations page and set up data loads.
    • Choose the Connection Option “Provide access via Cross-Account AWS IAM Role Chaining”.
    • Provide your role ARN in the “Assume Role ARN” field.
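
The two policy documents in the steps above differ only in a few substituted values, so they can also be generated. The following is a minimal sketch under that assumption; the function names ```trust_policy``` and ```permissions_policy``` are hypothetical, and the ARNs, bucket name, and prefix in the usage lines are placeholders that you would replace with the value sent by Improvado and your own resources.

```python
import json
from typing import Optional


def trust_policy(improvado_role_arn: str) -> dict:
    """Trust policy that lets the Improvado role assume your role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": improvado_role_arn},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def permissions_policy(bucket: str, path_prefix: str = "",
                       kms_key_arn: Optional[str] = None) -> dict:
    """Read/write policy for the bucket (or one prefix), plus optional KMS usage."""
    prefix = path_prefix.strip("/")
    objects = f"arn:aws:s3:::{bucket}/{prefix}/*" if prefix else f"arn:aws:s3:::{bucket}/*"
    statements = [
        {
            "Sid": "RWPolicy",
            "Action": ["s3:GetObject*", "s3:DeleteObject", "s3:PutObject*"],
            "Effect": "Allow",
            "Resource": objects,
        },
        {
            "Sid": "ListPolicy",
            "Action": ["s3:ListBucket"],
            "Effect": "Allow",
            "Resource": f"arn:aws:s3:::{bucket}",
        },
    ]
    if kms_key_arn:
        # Only needed when the bucket uses KMS encryption.
        statements.append({
            "Sid": "OptionalKMSUsagePolicy",
            "Action": ["kms:Encrypt", "kms:Decrypt", "kms:ReEncrypt*",
                       "kms:GenerateDataKey*", "kms:DescribeKey"],
            "Effect": "Allow",
            "Resource": kms_key_arn,
        })
    return {"Version": "2012-10-17", "Statement": statements}


# Placeholder values for illustration only.
print(json.dumps(trust_policy("arn:aws:iam::111111111111:role/example-improvado-role"), indent=4))
print(json.dumps(permissions_policy("your-bucket-name", path_prefix="exports"), indent=4))
```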

Additional information

Folder

Possible parameters:

```/{{workspace_id}}/{{workspace_title}}/{{data_source}}/{{data_table_title}}/{{report_type}}/{{filename}}/{{account}}/{{YYYY}}/{{MM}}/{{DD}}```

  • ```{{workspace_id}}``` and ```{{workspace_title}}``` are optional parameters that provide additional information about the workspace used for a destination connection.
  • ```{{data_source}}``` is a data provider, integration, or connector.
  • ```{{data_table_title}}``` is an object that contains all extraction orders with the same granularity (dimensional schema).
  • ```{{report_type}}``` is a set of fields such as metrics, properties, dimensions, etc.
  • ```{{account}}``` is an optional parameter that allows you to add a specific account for the data load.
    • You must enable the Use load by account field to add this parameter to the File name.
  • If you use ```/{{YYYY}}/{{MM}}/{{DD}}``` settings, the data will be added to folders daily. Each new record will not delete the previous one, even for data that contains no date.
    • You can use ```DD_today``` instead of ```DD``` to use today’s date in the folder name. E.g., ```/{{workspace_id}}/{{workspace_title}}/{{data_source}}/{{report_type}}/{{YYYY}}/{{MM}}/{{DD_today}}``` will be resolved to ```/ws1/main_group/ds1/rt1/2024/06/18```
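
To preview how a Folder pattern resolves, the placeholders can be substituted locally. The sketch below is an illustration only and does not describe how Improvado resolves the values internally; the function name ```render_template``` is hypothetical, and the values are taken from the example above.

```python
import re

# Matches {{name}} and tolerates spaces inside the braces, e.g. {{ name }}.
_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_template(template: str, values: dict) -> str:
    """Replace each {{name}} placeholder with values[name].

    A placeholder that has no entry in values raises KeyError.
    """
    return _PLACEHOLDER.sub(lambda m: str(values[m.group(1)]), template)


path = render_template(
    "/{{workspace_id}}/{{workspace_title}}/{{data_source}}/{{report_type}}/{{YYYY}}/{{MM}}/{{DD_today}}",
    {
        "workspace_id": "ws1",
        "workspace_title": "main_group",
        "data_source": "ds1",
        "report_type": "rt1",
        "YYYY": "2024",
        "MM": "06",
        "DD_today": "18",
    },
)
print(path)  # /ws1/main_group/ds1/rt1/2024/06/18
```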

Data structure of S3 storage by Improvado. You can create a request to the Support Team to add support for different folder structures in a bucket.

File format

Possible formats:

  • csv
  • csv+gzip
  • json
  • json+gzip
  • parquet
  • avro

File name

Possible parameters:

```{{filename}}-{{account}}-{{YYYY}}-{{MM}}-{{DD}}```

  • ```{{filename}}``` is the same as the destination table name.
  • ```{{account}}``` is an optional parameter that allows you to add a specific account for the data load.
    • You must enable the Use load by account field to add this parameter to the File name.

Important: You cannot use ```{{DD}}``` when partitioning by month.

  • ```{{filename}}-{{YYYY}}-{{MM}}-{{DD}}``` – for partition by day
    • You can use ```DD_today``` instead of ```DD``` to use today’s date in the final file name. E.g., ```{{ filename }}-{{ YYYY }}-{{ MM }}-{{ DD_today }}T{{ H }}{{ M }}{{ S }}``` will be resolved to ```some_name-2024-06-20T121517```
  • ```{{filename}}-{{YYYY}}-{{MM}}``` – for partition by month

You can also use “_” instead of “-”, or omit the separators entirely, for example:

  • ```{{filename}}_{{YYYY}}-{{MM}}-{{DD}}```
  • ```{{filename}}{{YYYY}}{{MM}}{{DD}}```
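
The File name rules above interact with the Partition by and Use load by account settings. A minimal sketch of a pre-check, assuming only the rules stated in this guide; the function name ```check_file_name``` is hypothetical, and the check is not part of Improvado's product.

```python
import re

_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def check_file_name(template: str, partition_by: str, load_by_account: bool) -> list:
    """Return a list of rule violations for a File name pattern (empty if none)."""
    names = set(_PLACEHOLDER.findall(template))
    problems = []
    if partition_by == "Month" and "DD" in names:
        problems.append("{{DD}} cannot be used with partition by month")
    if load_by_account and "account" not in names:
        problems.append("{{account}} is required when Use load by account is enabled")
    if not load_by_account and "account" in names:
        problems.append("{{account}} requires Use load by account to be enabled")
    return problems


print(check_file_name("{{filename}}-{{YYYY}}-{{MM}}-{{DD}}", "Month", False))
print(check_file_name("{{filename}}-{{account}}-{{YYYY}}-{{MM}}", "Month", True))
```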

Separator

Possible delimiters that can separate data in your file:

  • comma
  • semicolon
  • tab
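
When you read an exported file downstream, the parser has to use the same separator that was selected here. The sketch below shows one way to read ```csv``` or ```csv+gzip``` content with Python's standard library. It assumes UTF-8 text with a header row, which this guide does not state; the names ```SEPARATORS``` and ```read_rows``` and the sample data are hypothetical.

```python
import csv
import gzip
import io

# Separator options from the list above mapped to delimiter characters.
SEPARATORS = {"comma": ",", "semicolon": ";", "tab": "\t"}


def read_rows(data: bytes, separator: str, gzipped: bool = False) -> list:
    """Parse csv or csv+gzip file contents into a list of dict rows.

    Assumes UTF-8 text whose first line is a header row.
    """
    if gzipped:
        data = gzip.decompress(data)
    text = data.decode("utf-8")
    reader = csv.DictReader(io.StringIO(text), delimiter=SEPARATORS[separator])
    return list(reader)


# Build a small gzip-compressed sample in memory and read it back.
sample = gzip.compress(b"date;clicks\n2024-06-18;10\n")
print(read_rows(sample, "semicolon", gzipped=True))
```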

Partition by

Possible ways of splitting data:

  • Day
  • Month
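
As an illustration of the difference between the two options, the sketch below groups dated rows under a day key or a month key. It rests on the assumption that each partition ends up in its own file, which matches the File name patterns above but is not spelled out in this guide; the function name ```partition_rows``` and the sample rows are hypothetical.

```python
from collections import defaultdict
from datetime import date


def partition_rows(rows, partition_by):
    """Group (date, payload) rows under a day key (YYYY-MM-DD) or a month key (YYYY-MM)."""
    fmt = {"Day": "%Y-%m-%d", "Month": "%Y-%m"}[partition_by]
    groups = defaultdict(list)
    for day, payload in rows:
        groups[day.strftime(fmt)].append(payload)
    return dict(groups)


rows = [
    (date(2024, 6, 18), "a"),
    (date(2024, 6, 19), "b"),
    (date(2024, 7, 1), "c"),
]
print(partition_rows(rows, "Day"))
print(partition_rows(rows, "Month"))
```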

Troubleshooting

Troubleshooting guides

Check out the troubleshooting guides for AWS S3 Destination here.


Questions?

The Improvado team is always happy to help with any other questions you might have! Send us an email.

Contact your Customer Success Manager or raise a request in Improvado Service Desk.