Note: This article covers the AWS S3 Destination setup process. It doesn’t cover the DataPrep setup for AWS S3.
You can learn how to extract data from the AWS S3 Data source here.
Description
AWS S3 is an object storage service that offers companies industry-leading scalability, data availability, security, and performance. This means that companies of all sizes can use it to store and protect data for a range of use cases, no matter the amount of data available.
Setup guide
Follow our setup guide to connect AWS S3 to Improvado.
Choose an encryption option
Choose a server-side encryption option:
SSE-S3 (recommended) - server-side encryption with AWS S3 managed keys. Learn more.
SSE-KMS - server-side encryption with AWS KMS keys. Learn more. Make sure to give the required permissions for the AWS-managed key.
SSE-KMS with customer-managed key - server-side encryption with customer-managed AWS KMS keys. Learn more. Make sure to give the required permissions for an encryption key.
No storage encryption - not recommended unless the AWS S3 bucket has default encryption.
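As a rough illustration of how these options differ at the S3 API level, the sketch below maps each option to the server-side encryption parameters that an S3 `PutObject` request accepts (`ServerSideEncryption` and `SSEKMSKeyId`). The option labels (`"SSE-S3"`, `"SSE-KMS"`, `"SSE-KMS-CMK"`, `"NONE"`) and the function name are hypothetical, and this is not a description of how Improvado applies encryption internally.

```python
from typing import Optional


def sse_put_object_params(option: str, kms_key_id: Optional[str] = None) -> dict:
    """Map an encryption option label (hypothetical names) to S3 PutObject parameters."""
    if option == "SSE-S3":
        # S3-managed keys use the AES256 server-side encryption value.
        return {"ServerSideEncryption": "AES256"}
    if option == "SSE-KMS":
        # No key ID given: S3 falls back to the AWS-managed KMS key.
        return {"ServerSideEncryption": "aws:kms"}
    if option == "SSE-KMS-CMK":
        # Customer-managed key: the key ID (or ARN) must be supplied.
        if not kms_key_id:
            raise ValueError("a customer-managed key requires a KMS key ID")
        return {"ServerSideEncryption": "aws:kms", "SSEKMSKeyId": kms_key_id}
    if option == "NONE":
        # Nothing is requested; only the bucket's default encryption applies.
        return {}
    raise ValueError(f"unknown encryption option: {option!r}")


print(sse_put_object_params("SSE-S3"))
print(sse_put_object_params("SSE-KMS-CMK", kms_key_id="111aa2bb-2c33dd"))
```

The returned dictionary could be passed as keyword arguments to an S3 client's upload call. The customer-managed variant is the only one that needs the Encryption Key value.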
Important: Make sure to provide your AWS S3 bucket information by following our guide.
Permissions
Enable the following permissions for your AWS S3 bucket:
Select the Partition by option from the dropdown. {%dropdown-button name="partition-by"%}
{%dropdown-body name="partition-by"%}
Partition by defines how data is split into files when it is uploaded.
{%dropdown-end%}
Select the Encryption option from the dropdown. Learn more about all available encryption options here.
(SSE-KMS with customer-managed key) Enter the Encryption Key.
Select whether you want to Use load by account for this Destination. {%dropdown-button name="use-load-by-account"%}
{%dropdown-body name="use-load-by-account"%}
If enabled, the File name field must include the ```{{account}}``` variable. You must enable this field if you want to use a specific account for data load.
{%dropdown-end%}
How to provide credentials to Improvado
There are four ways to provide your credentials. Choose one depending on your security requirements and the type of Server-Side Encryption you selected:
Option #2 (Share Read and Write access with Improvado’s AWS account)
{%docs-informer info%}
Available for SSE-S3 and SSE-KMS (with customer-managed keys) only.
{%docs-informer-end%}
Important: If you plan to use this option, notify our Support team or your CSM. We will create specific users to load data and provide support, and we’ll create a Destination connection for you.
Share Read and Write access with Improvado’s AWS Account ID:
Create an AWS S3 bucket.
Select the Permissions tab on the Bucket Settings page.
In the Bucket policy, click the Edit button.
Copy & paste the Policy example below.
~Change ```your-bucket-name``` to your real Bucket name.
Note: AWS allows sharing only customer-managed KMS keys (keys that you created). The AWS-managed KMS key (the key that was created by AWS automatically) cannot be shared.
Method 1 (Add Improvado’s AWS account to KMS Key settings)
The KMS Key ID is required.
Create a KMS key.
Open the Key settings.
Go to Other AWS accounts and click the Add other AWS account button.
Provide your AWS S3 bucket information to Improvado (Option #2)
Provide our Support Team or your CSM with the following information:
Bucket Name
AWS Region
KMS Key ID
Our team will create specific users to load data and provide support.
Option #3 (Share access with Improvado account using our Canonical ID)
{%docs-informer info%}
Available for SSE-S3 only.
{%docs-informer-end%}
Important: If you plan to use this option, notify our Support team or your CSM. We will create specific users to load data and provide support, and we’ll create a Destination connection for you.
Share access with the Improvado account using our Canonical ID:
Create an AWS S3 bucket.
Select the Permissions tab on the Bucket Settings page.
In the Access control list (ACL), click the Edit button.
Click the Add grantee button in the Access for other AWS accounts section.
Paste the Canonical ID ```5f2dfe7db1abc67daeea58fea77fb0c399ed924a2f88cf36d28738b7e1c838ef``` into the Grantee field.
~~Replace ```your-bucket-name``` with your S3 bucket name.
~~You can allow put/get/delete operations on a single prefix (path or directory) of the S3 bucket instead of the whole S3 bucket. In this case, the Resource in the ```"RWPolicy"``` statement is ```"arn:aws:s3:::your-bucket-name/path_prefix/*"```.
~~Replace ```region``` with your KMS key Region (e.g., ```us-east-1```).
~~Replace 123456789 with your AWS Account ID.
~~Replace 111aa2bb-2c33dd with your KMS Key ID.
Send your role ARN to Improvado. Improvado then makes the required configuration adjustments.
Create a new connection on the Destinations page and set up data loads.
~Choose the Connection Option “Provide access via Cross-Account AWS IAM Role Chaining”.
~Provide your role ARN in the “Assume Role ARN” field.
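The placeholder substitutions described in the steps above can be sketched as two small helpers that assemble the resulting ARNs. The function names are hypothetical. The S3 resource form with a prefix comes from the text above, the whole-bucket form `arn:aws:s3:::bucket/*` is an assumption about what the unprefixed policy uses, and the KMS key ARN follows the standard `arn:aws:kms:region:account-id:key/key-id` layout. The sample values are the placeholders from this guide, not real identifiers.

```python
def s3_resource_arn(bucket: str, path_prefix: str = "") -> str:
    """Build the object-level Resource ARN for a bucket, optionally limited to a prefix."""
    prefix = path_prefix.strip("/")
    if prefix:
        return f"arn:aws:s3:::{bucket}/{prefix}/*"
    # Assumption: without a prefix, the policy covers every object in the bucket.
    return f"arn:aws:s3:::{bucket}/*"


def kms_key_arn(region: str, account_id: str, key_id: str) -> str:
    """Build a KMS key ARN from the Region, AWS Account ID, and KMS Key ID."""
    return f"arn:aws:kms:{region}:{account_id}:key/{key_id}"


print(s3_resource_arn("your-bucket-name", "path_prefix"))
print(kms_key_arn("us-east-1", "123456789", "111aa2bb-2c33dd"))
```

Building the strings in one place makes it easier to check that the bucket name, Region, Account ID, and Key ID were all replaced before the policy is saved.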
```{{workspace_id}}``` and ```{{workspace_title}}``` are optional parameters that provide additional information about the workspace used for a destination connection.
```{{data_source}}``` is the data provider (integration or connector).
```{{data_table_title}}``` is an object that contains all extraction orders with the same granularity (dimensional schema).
```{{report_type}}``` is a set of fields such as metrics, properties, and dimensions.
```{{account}}``` is an optional parameter that allows you to add a specific account for the data load.
~You must enable the Use load by account field to add this parameter to the File name.
If you use ```/{{YYYY}}/{{MM}}/{{DD}}``` settings, the data will be added to folders daily. Each new record will not delete the previous one, even for data that contains no date.
You can use ```DD_today``` instead of ```DD``` to use today’s date in the folder name. E.g., ```/{{workspace_id}}/{{workspace_title}}/{{data_source}}/{{report_type}}/{{YYYY}}/{{MM}}/{{DD_today}}``` will be resolved to ```/ws1/main_group/ds1/rt1/2024/06/18```
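To make the substitution concrete, here is a minimal sketch of how such a folder template could be resolved. It is not Improvado's implementation: `resolve_template` and `date_values` are hypothetical helpers, and the sketch assumes that ```{{YYYY}}```, ```{{MM}}```, and ```{{DD}}``` come from the date of the data while ```{{DD_today}}``` comes from the day the load runs.

```python
import re
from datetime import date

# Matches {{name}} and also {{ name }} with optional inner spaces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def resolve_template(template: str, values: dict) -> str:
    """Replace every {{name}} placeholder with its value; fail on unknown names."""
    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value for placeholder {name}")
        return str(values[name])
    return PLACEHOLDER.sub(substitute, template)


def date_values(data_date: date, today: date) -> dict:
    """Zero-padded date parts (assumed source of each part, see lead-in)."""
    return {
        "YYYY": f"{data_date.year:04d}",
        "MM": f"{data_date.month:02d}",
        "DD": f"{data_date.day:02d}",
        "DD_today": f"{today.day:02d}",
    }


template = "/{{workspace_id}}/{{workspace_title}}/{{data_source}}/{{report_type}}/{{YYYY}}/{{MM}}/{{DD_today}}"
values = {
    "workspace_id": "ws1",
    "workspace_title": "main_group",
    "data_source": "ds1",
    "report_type": "rt1",
}
values.update(date_values(date(2024, 6, 18), today=date(2024, 6, 18)))
print(resolve_template(template, values))  # /ws1/main_group/ds1/rt1/2024/06/18
```

Raising an error for an unknown placeholder, rather than leaving it in the path, is a design choice of this sketch so that typos in variable names surface early.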
```{{filename}}``` is the same as the destination table name.
```{{account}}``` is an optional parameter that allows you to add a specific account for the data load.
~You must enable the Use load by account field to add this parameter to the File name.
Important: You cannot use ```{{DD}}``` for partition by month.
```{{filename}}-{{YYYY}}-{{MM}}-{{DD}}``` – for partition by day
You can use ```DD_today``` instead of ```DD``` to use today’s date in the final file name. E.g., ```{{ filename }}-{{ YYYY }}{{ MM }}{{ DD_today }}T{{ H }}{{ M }}{{ S }}``` will be resolved to ```some_name-20240620T121517```.
```{{filename}}-{{YYYY}}-{{MM}}``` – for partition by month
You can also use “_” instead of “-”, or omit the separator entirely, for example:
```{{filename}}_{{YYYY}}-{{MM}}-{{DD}}```
```{{filename}}{{YYYY}}{{MM}}{{DD}}```
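The File name rules above can be checked before saving a connection. The following is a minimal sketch under stated assumptions: `check_file_name` is a hypothetical helper, the `partition_by` values `"day"` and `"month"` are illustrative labels, and only the two rules stated in this guide are covered (no ```{{DD}}``` with partition by month, and ```{{account}}``` tied to the Use load by account setting).

```python
import re

# Matches {{name}} and also {{ name }} with optional inner spaces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def check_file_name(pattern: str, partition_by: str, load_by_account: bool) -> list:
    """Return a list of human-readable problems found in a File name pattern."""
    names = set(PLACEHOLDER.findall(pattern))
    problems = []
    if partition_by == "month" and "DD" in names:
        problems.append("{{DD}} cannot be used with partition by month")
    if load_by_account and "account" not in names:
        problems.append("{{account}} is required when Use load by account is enabled")
    if "account" in names and not load_by_account:
        problems.append("enable Use load by account to use {{account}}")
    return problems


print(check_file_name("{{filename}}-{{YYYY}}-{{MM}}-{{DD}}", "day", False))
print(check_file_name("{{filename}}-{{YYYY}}-{{MM}}-{{DD}}", "month", False))
print(check_file_name("{{filename}}_{{YYYY}}{{MM}}", "month", True))
```

An empty list means the pattern passes both checks. The separator between variables ("-", "_", or none) does not affect the result.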
Separator
Possible delimiters that can separate data in your file: