
Google Cloud Storage

IMPORTANT: This article covers the setup of a warehouse for loading data from Improvado, not the customer data warehouse from which data is extracted. It also does not cover the setup of a customer data warehouse for Data Prep.

Required information

  • Title
  • Bucket Name
  • ~Preferred bucket for GCS uploading
  • ~Bucket Name can only contain letters, numbers, dots, and underscores, and must start and end with a letter or number
  • ~Bucket Name length must be between 3 and 222 characters
  • Authorization JSON Data
  • ~To work with your GCS storage, Improvado only needs a JSON file with your GCS credentials (see the verification sketch after this list)
  • Filename
  • File format
  • Separator (optional)
  • ~The maximum length of the separator is 2 characters
  • GCS Region
  • Partition by
  • Encryption
  • Encryption Key (optional)
  • Root Name for GCS uploading (optional)
  • ~Root Name can only contain letters and numbers and must be between 1 and 64 characters in length
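
If you want to sanity-check the credentials JSON before entering it in Improvado, the following is a minimal Python sketch using the google-cloud-storage client; the file name gcs-credentials.json and the bucket name my-improvado-exports are placeholders, not values from Improvado.

    # A minimal sketch for checking that a service-account JSON key can reach the
    # bucket; "gcs-credentials.json" and "my-improvado-exports" are placeholders.
    from google.cloud import storage

    client = storage.Client.from_service_account_json("gcs-credentials.json")

    # Raises google.api_core.exceptions.Forbidden if the key lacks access.
    for blob in client.list_blobs("my-improvado-exports", max_results=5):
        print(blob.name)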

File name

Possible parameters:

  • {{filename}}-{{YYYY}}-{{MM}}-{{DD}}
  • ~{{filename}} is the same as the destination table name

IMPORTANT: you cannot use {{DD}} when partitioning by month

  • ~{{filename}}-{{YYYY}}-{{MM}}-{{DD}} – for partitioning by day
  • ~{{filename}}-{{YYYY}}-{{MM}} – for partitioning by month

You can also use “_” instead of “-”, or omit separator symbols entirely, for example:

  • {{filename}}_{{YYYY}}-{{MM}}-{{DD}}
  • {{filename}}{{YYYY}}{{MM}}{{DD}}
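
As a hypothetical illustration only (the actual substitution happens on Improvado's side), this sketch shows how the documented placeholders resolve; the table name facebook_ads_insights is made up.

    # A hypothetical illustration of how the file name placeholders resolve.
    from datetime import date

    def render_filename(template: str, filename: str, day: date) -> str:
        """Substitute the documented {{filename}}/{{YYYY}}/{{MM}}/{{DD}} placeholders."""
        return (template
                .replace("{{filename}}", filename)
                .replace("{{YYYY}}", f"{day.year:04d}")
                .replace("{{MM}}", f"{day.month:02d}")
                .replace("{{DD}}", f"{day.day:02d}"))

    print(render_filename("{{filename}}-{{YYYY}}-{{MM}}-{{DD}}",
                          "facebook_ads_insights", date(2024, 1, 31)))
    # facebook_ads_insights-2024-01-31 (partition by day)

    print(render_filename("{{filename}}_{{YYYY}}-{{MM}}",
                          "facebook_ads_insights", date(2024, 1, 31)))
    # facebook_ads_insights_2024-01 (partition by month)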

File format

Possible formats:

  • csv
  • csv+gzip
  • json
  • json+gzip
  • parquet
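
For reference, here is a minimal sketch of reading exported files back with pandas. It assumes the gcsfs package is installed so pandas can read gs:// paths directly, and that the JSON export is newline-delimited; the bucket, file names, and extensions are placeholders.

    # A minimal sketch for reading exported files back; paths are placeholders.
    import pandas as pd

    # csv or csv+gzip: compression is inferred from the .gz extension.
    # If you configured a custom separator, pass it via sep=.
    df_csv = pd.read_csv("gs://my-improvado-exports/report-2024-01-31.csv.gz")

    # json or json+gzip, assuming newline-delimited records.
    df_json = pd.read_json("gs://my-improvado-exports/report-2024-01-31.json.gz",
                           lines=True)

    # parquet: requires pyarrow or fastparquet.
    df_parquet = pd.read_parquet("gs://my-improvado-exports/report-2024-01-31.parquet")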

Partition by

Possible ways of splitting data:

  • Day (default value)
  • Month

Encryption

Possible options:

  • Default Cloud Storage encryption
  • Customer-managed encryption keys
  • Customer-supplied encryption keys

Encryption Key

If you have selected the Default Cloud Storage encryption type, you will not be able to edit this field; it is filled with a stub value.

Otherwise, enter your AES-256 key encoded in standard Base64, or the resource name of the Cloud KMS key used to encrypt the blob’s contents. For more info, see the Google Cloud Storage encryption docs.
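
A minimal sketch of what the two non-default key formats look like, using the google-cloud-storage client; the bucket, object, project, and KMS key names are placeholders.

    # A minimal sketch of the two non-default encryption options; all names are
    # placeholders.
    import base64
    import os
    from google.cloud import storage

    client = storage.Client.from_service_account_json("gcs-credentials.json")
    bucket = client.bucket("my-improvado-exports")

    # Customer-supplied encryption key: a raw 32-byte AES-256 key. Its standard
    # Base64 form is what goes into the Encryption Key field.
    raw_key = os.urandom(32)
    print("Base64 key:", base64.b64encode(raw_key).decode())
    blob_csek = bucket.blob("report.csv", encryption_key=raw_key)

    # Customer-managed encryption key: the full Cloud KMS key resource name.
    kms_key = "projects/my-project/locations/us/keyRings/my-ring/cryptoKeys/my-key"
    blob_cmek = bucket.blob("report.csv", kms_key_name=kms_key)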

Root Name

Possible parameters:

  • /{{data_source}}/{{data_table_title}}/{{report_type}}/{{YYYY}}/{{MM}}/{{DD}}
  • ~{{data_source}} is a data provider (integration, connector)
  • ~{{data_table_title}} is the title of the data table, an object that contains all extraction orders with the same granularity (dimensional schema)
  • ~{{report_type}} is a set of fields such as metrics, properties, dimensions, etc.

If you use the /{{YYYY}}/{{MM}}/{{DD}} setting, data will be added to a new folder each day. New records do not delete previous ones, even for data that contains no date.
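
A minimal sketch of listing a single day's exports under such a root; the bucket name and prefix segments are placeholders.

    # A minimal sketch of listing one day's exports under the date-partitioned
    # root; the bucket name and prefix segments are placeholders.
    from google.cloud import storage

    client = storage.Client.from_service_account_json("gcs-credentials.json")

    prefix = "facebook/ads_insights/default/2024/01/31/"
    for blob in client.list_blobs("my-improvado-exports", prefix=prefix):
        print(blob.name)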

Upon request to the support team, we can support a different root structure in a bucket.

How to connect

Share access to your Google Cloud Storage bucket with our service account improvado-gcs-loader@green-post-223109.iam.gserviceaccount.com, granting it the Storage Object Admin role on the bucket.
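
If you prefer to grant the role programmatically rather than in the Cloud Console, here is a minimal sketch using the google-cloud-storage client; run it with your own administrator credentials, and note that the bucket name is a placeholder.

    # A minimal sketch of granting Storage Object Admin on the bucket to the
    # Improvado service account; the bucket name is a placeholder.
    from google.cloud import storage

    client = storage.Client()  # uses your own administrator credentials
    bucket = client.bucket("my-improvado-exports")

    policy = bucket.get_iam_policy(requested_policy_version=3)
    policy.bindings.append({
        "role": "roles/storage.objectAdmin",
        "members": {"serviceAccount:improvado-gcs-loader@green-post-223109.iam.gserviceaccount.com"},
    })
    bucket.set_iam_policy(policy)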

More info here
