IMPORTANT: This article covers setting up a warehouse to load data from Improvado, not the customer data warehouse from which data is extracted. It also does not cover setting up a customer data warehouse for Data Prep.
Required information
- Title
- Bucket Name
- ~Preferred bucket for GCS uploading
- ~Bucket Name can only contain letters, numbers, dots and underscores and must start and end with a letter or number
- ~Bucket Name length must be between 3 and 222 characters
- Authorization JSON Data
- ~To work with your GCS storage, Improvado needs only a JSON file with your GCS credentials
- Filename
- File format
- Separator (optional)
- ~The maximum length of the separator is 2 characters
- GCS Region
- Partition by
- Encryption
- Encryption Key (optional)
- Root Name for GCS uploading (optional)
- ~Root Name can only contain letters and numbers and must be between 1 and 64 characters long
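The Bucket Name and Separator rules listed above can be checked before you submit the form. The following is a minimal sketch, not Improvado's actual validation: the function name `validate_settings` and the regular expression are assumptions that simply mirror the rules as they are written in this article. Google Cloud Storage applies its own naming requirements on top of these.

```python
import re

# Assumed interpretation of the rule above: letters, numbers, dots and
# underscores only, starting and ending with a letter or number.
BUCKET_PATTERN = r"[A-Za-z0-9](?:[A-Za-z0-9._]*[A-Za-z0-9])?"


def validate_settings(bucket_name, separator=""):
    """Return a list of problems found; an empty list means no problems."""
    problems = []
    if not 3 <= len(bucket_name) <= 222:
        problems.append("Bucket Name length must be between 3 and 222 characters")
    if not re.fullmatch(BUCKET_PATTERN, bucket_name):
        problems.append(
            "Bucket Name can only contain letters, numbers, dots and underscores "
            "and must start and end with a letter or number"
        )
    if len(separator) > 2:
        problems.append("The maximum length of the separator is 2 characters")
    return problems


if __name__ == "__main__":
    for problem in validate_settings("_my.bucket", separator=";;;"):
        print(problem)
```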
File name
Possible parameters:
- {{filename}}-{{YYYY}}-{{MM}}-{{DD}}
- ~{{filename}} is the same as the destination table name
IMPORTANT: You cannot use {{DD}} when partitioning by month.
- ~{{filename}}-{{YYYY}}-{{MM}}-{{DD}} – for partition by day
- ~{{filename}}-{{YYYY}}-{{MM}} – for partition by month
You can also use “_” instead of “-”, or omit the separator entirely, for example:
- {{filename}}_{{YYYY}}-{{MM}}-{{DD}}
- {{filename}}{{YYYY}}{{MM}}{{DD}}
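To preview what a file name pattern produces for a given date, the substitution can be sketched as follows. This is an illustration only: the function `render_filename`, the `partition_by` values "day" and "month", and the zero-padding of month and day are assumptions, not a description of how Improvado renders names internally.

```python
import re
from datetime import date

# Matches {{name}} and tolerates spaces inside the braces, e.g. {{ DD }}.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_filename(template, table_name, day, partition_by="day"):
    """Fill in a file name pattern for one date (hypothetical helper)."""
    used = set(PLACEHOLDER.findall(template))
    if partition_by == "month" and "DD" in used:
        raise ValueError("{{DD}} cannot be used when partitioning by month")
    values = {
        "filename": table_name,  # same as the destination table name
        "YYYY": f"{day.year:04d}",
        "MM": f"{day.month:02d}",
        "DD": f"{day.day:02d}",
    }
    # Unknown placeholders are left untouched rather than guessed.
    return PLACEHOLDER.sub(lambda m: values.get(m.group(1), m.group(0)), template)


if __name__ == "__main__":
    print(render_filename("{{filename}}_{{YYYY}}-{{MM}}-{{DD}}", "ads_report", date(2024, 3, 5)))
```

With the example table name `ads_report` (a made-up value), the pattern `{{filename}}-{{YYYY}}-{{MM}}` and `partition_by="month"` would yield one name per month, while adding `{{DD}}` to that pattern raises an error.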
File format
Possible formats:
- csv
- csv+gzip
- json
- json+gzip
- parquet
Partition by
Possible ways of splitting data:
Encryption
Possible options:
- Default Cloud Storage encryption
- Customer-managed encryption keys
- Customer-supplied encryption keys
Encryption Key
If you have selected the Default Cloud Storage encryption type, you will not be able to edit this field; the default value is stub.
Otherwise, enter either your AES-256 key encoded in standard Base64, or the resource name of the Cloud KMS key used to encrypt the blob’s contents. For more info, see Google Cloud Storage encryption docs.
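If you need to create a customer-supplied key, one way to produce a random AES-256 key in standard Base64 is sketched below. The helper names `generate_aes256_key` and `classify_key` are hypothetical. The Cloud KMS pattern follows the usual `projects/.../locations/.../keyRings/.../cryptoKeys/...` resource name layout, and how Improvado itself validates this field is not covered here.

```python
import base64
import os
import re

# Usual layout of a Cloud KMS key resource name.
KMS_NAME = re.compile(r"projects/[^/]+/locations/[^/]+/keyRings/[^/]+/cryptoKeys/[^/]+")


def generate_aes256_key():
    """Return 32 random bytes (256 bits) encoded in standard Base64."""
    return base64.b64encode(os.urandom(32)).decode("ascii")


def classify_key(value):
    """Return 'kms', 'aes256' or 'invalid' for an Encryption Key value."""
    if KMS_NAME.fullmatch(value):
        return "kms"
    try:
        raw = base64.b64decode(value, validate=True)
    except ValueError:  # binascii.Error is a subclass of ValueError
        return "invalid"
    return "aes256" if len(raw) == 32 else "invalid"


if __name__ == "__main__":
    key = generate_aes256_key()
    print(key, classify_key(key))
```

Store a generated key securely: Google does not keep customer-supplied keys, so objects encrypted with a lost key cannot be read.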
Root Name
Possible parameters:
- /{{ data_source }}/{{ data_table_title }}/{{report_type}}/{{ YYYY }}/{{ MM }}/{{ DD }}
- ~{{ data_source }} is a data provider, integration, or connector
- ~{{ data_table_title }} is the title of the data table, an object that contains all extraction orders with the same granularity (dimensional schema)
- ~{{report_type}} is a set of fields such as metrics, properties, dimensions, etc.
If you use the /{{ YYYY }}/{{ MM }}/{{ DD }} settings, data is added to a new folder each day. New records do not delete previous ones, even for data that contains no date.
On request, the support team can set up a different root structure in the bucket.
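The effect of the date segments in the Root Name can be illustrated with a short sketch: each load date maps to its own folder, so a later load never lands in an earlier day's folder. The function `render_root` and the example values (`facebook`, `ads_daily`, `ads_insights`) are made up for illustration, and zero-padded months and days are an assumption.

```python
import re
from datetime import date

ROOT = "/{{ data_source }}/{{ data_table_title }}/{{report_type}}/{{ YYYY }}/{{ MM }}/{{ DD }}"


def render_root(template, day, **fields):
    """Fill in a Root Name pattern for one load date (hypothetical helper)."""
    values = dict(fields)
    values.update(YYYY=f"{day.year:04d}", MM=f"{day.month:02d}", DD=f"{day.day:02d}")
    # Placeholders without a supplied value are left as they are.
    return re.sub(
        r"\{\{\s*(\w+)\s*\}\}",
        lambda m: str(values.get(m.group(1), m.group(0))),
        template,
    )


if __name__ == "__main__":
    for day in (date(2024, 3, 5), date(2024, 3, 6)):
        print(render_root(ROOT, day, data_source="facebook",
                          data_table_title="ads_daily", report_type="ads_insights"))
```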
How to connect
Share access to your Google Cloud Storage bucket with our service account, improvado-gcs-loader@green-post-223109.iam.gserviceaccount.com, by granting it the Storage Object Admin role on the bucket.
More info here
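If you prefer to grant the role from a script rather than the Cloud Console, the step above could look roughly like this with the `google-cloud-storage` Python client. This is a sketch under assumptions: it expects the library to be installed and your own credentials to be configured, `my-bucket` is a placeholder, and the helper names are hypothetical. `roles/storage.objectAdmin` is the role ID behind the Storage Object Admin role.

```python
IMPROVADO_MEMBER = (
    "serviceAccount:improvado-gcs-loader@green-post-223109.iam.gserviceaccount.com"
)
ROLE = "roles/storage.objectAdmin"  # role ID for "Storage Object Admin"


def improvado_binding():
    """Build the IAM binding that gives the Improvado account the role."""
    return {"role": ROLE, "members": {IMPROVADO_MEMBER}}


def grant_access(bucket_name):
    """Add the binding to the bucket's IAM policy (needs google-cloud-storage)."""
    # Imported here so the rest of the sketch loads without the library.
    from google.cloud import storage

    client = storage.Client()
    bucket = client.bucket(bucket_name)
    policy = bucket.get_iam_policy(requested_policy_version=3)
    policy.bindings.append(improvado_binding())
    bucket.set_iam_policy(policy)


if __name__ == "__main__":
    grant_access("my-bucket")  # placeholder: replace with your bucket name
```

The binding is applied at the bucket level only, which matches the instruction to grant the role on the GCS bucket rather than on the whole project.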