Google Cloud Storage

Updated on Apr 17, 2024

Note: Google Cloud Storage is currently supported only as a Destination. This guide doesn’t cover the DataPrep setup for GCS.

Description

Google Cloud Storage is a highly available and durable object storage service offered by Google Cloud Platform, designed to store and access large, unstructured data sets with high reliability, scalability, and performance.

Schema information

Setup guide

Follow our setup guide to connect Google Cloud Storage to Improvado.

Generate a Service Account Key JSON file

To use Service Account Key authentication, first generate a JSON key file in Google Cloud Console by following the official documentation or the interactive step-by-step guide provided by Google.

Alternatively, you can follow the instructions below:

  1. In Google Cloud Console, go to IAM & Admin → Service Accounts.
  2. Click the Actions button for your service account and select Manage keys.
  3. On the KEYS tab, click ADD KEY → Create new key. Choose JSON as the key type and click Create.
  4. In the downloaded JSON file, copy your Project ID (the value of the "project_id" field).
{
  "type": "service_account",
  "project_id": "{{PROJECT_ID}}",
  "private_key_id": "{{KEY_ID}}",
  "private_key": "-----BEGIN PRIVATE KEY-----\n{{PRIVATE_KEY}}\n-----END PRIVATE KEY-----\n",
  "client_email": "{{SERVICE_ACCOUNT_EMAIL}}",
  "client_id": "{{CLIENT_ID}}",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://accounts.google.com/o/oauth2/token",
  "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
  "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/{{SERVICE_ACCOUNT_EMAIL}}"
}
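If you prefer not to copy the Project ID by hand, it can be read from the key file programmatically. The following is a minimal sketch, assuming the file has the layout shown above; the function name `read_project_id` is hypothetical and not part of any Improvado or Google tooling.

```python
import json


def read_project_id(key_path: str) -> str:
    """Return the project_id field from a service account key JSON file."""
    with open(key_path, encoding="utf-8") as fh:
        key = json.load(fh)
    # Guard against picking up a different kind of credentials file by mistake.
    if key.get("type") != "service_account":
        raise ValueError("not a service account key file")
    return key["project_id"]
```

Call it with the path to the downloaded key, for example `read_project_id("key.json")`, and paste the result into the Project ID field.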

Complete configuration

On the Google Cloud Storage connection page, fill in the following fields:

  1. Enter a name for your Destination connection in the Title.
  2. Enter the Bucket Name. {%dropdown-button name="bucket-name"%}

{%dropdown-body name="bucket-name"%}

Preferred bucket for GCS uploading.

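Bucket Name must follow the Google Cloud Storage naming rules: it can contain only lowercase letters, numbers, dashes, underscores, and dots, and must start and end with a letter or number.

Bucket Name length must be between 3 and 63 characters; names that contain dots can be up to 222 characters long.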

{%dropdown-end%}

  3. Enter the Filename. {%dropdown-button name="filename"%}

{%dropdown-body name="filename"%}

Possible parameters:

```{{filename}}-{{dataclass}}-{{YYYY}}-{{MM}}-{{DD}}```

  • ```{{filename}}``` is the same as the destination table name
  • ```{{dataclass}}``` is an optional parameter that describes how the data is updated in the destination.
    • Possible data class values:
      • ```daily```
      • ```monthly```
      • ```weekly```
      • ```last_day```
      • ```last_day_incremental```
      • ```unknown```
  • Note: you cannot use ```{{DD}}``` for partition by month.
    • ```{{filename}}-{{YYYY}}-{{MM}}-{{DD}}``` – for partition by day
    • ```{{filename}}-{{YYYY}}-{{MM}}``` – for partition by month
  • You can also use “_” instead of “-”, or omit the separators entirely, for example:
    • ```{{filename}}_{{YYYY}}-{{MM}}-{{DD}}```
    • ```{{filename}}{{YYYY}}{{MM}}{{DD}}```

{%dropdown-end%}
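To preview what a Filename template resolves to, the substitution can be sketched in a few lines of Python. This is an illustration only, not Improvado's implementation: the function `render_filename` is hypothetical, and zero-padded month and day values are an assumption.

```python
from datetime import date


def render_filename(template: str, filename: str, day: date,
                    dataclass: str = "", partition_by: str = "day") -> str:
    """Substitute the Filename placeholders for one load date."""
    # The guide states that {{DD}} cannot be combined with partition by month.
    if partition_by == "month" and "{{DD}}" in template:
        raise ValueError("{{DD}} cannot be used with partition by month")
    values = {
        "{{filename}}": filename,
        "{{dataclass}}": dataclass,
        "{{YYYY}}": f"{day.year:04d}",
        "{{MM}}": f"{day.month:02d}",   # assumed zero-padded
        "{{DD}}": f"{day.day:02d}",     # assumed zero-padded
    }
    for placeholder, value in values.items():
        template = template.replace(placeholder, value)
    return template
```

For example, `render_filename("{{filename}}_{{YYYY}}-{{MM}}-{{DD}}", "ads", date(2024, 4, 7))` produces a name that starts with `ads_2024`, while a template containing ```{{DD}}``` together with `partition_by="month"` is rejected.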

  4. Select the necessary File Format option from the dropdown. {%dropdown-button name="file-format"%}

{%dropdown-body name="file-format"%}

Possible formats:

  • csv
  • csv+gzip
  • json
  • json+gzip
  • parquet
  • avro

{%dropdown-end%}

  5. Select the necessary Separator option from the dropdown. {%dropdown-button name="separator"%}

{%dropdown-body name="separator"%}

Possible delimiters that can separate data in your file:

  • comma
  • semicolon
  • tab

{%dropdown-end%}
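The File Format and Separator choices determine how a downstream consumer has to parse the exported files. As a rough sketch, the following reads a csv or csv+gzip payload with the selected delimiter using only the Python standard library. The function `read_rows`, the UTF-8 encoding, and the presence of a header row are assumptions made for illustration.

```python
import csv
import gzip
import io

# Separator option names from the list above, mapped to the actual characters.
DELIMITERS = {"comma": ",", "semicolon": ";", "tab": "\t"}


def read_rows(payload: bytes, file_format: str, separator: str) -> list:
    """Parse a csv or csv+gzip payload into a list of row dictionaries."""
    if file_format == "csv+gzip":
        payload = gzip.decompress(payload)
    elif file_format != "csv":
        raise ValueError(f"format not covered by this sketch: {file_format}")
    text = payload.decode("utf-8")  # assumed encoding
    reader = csv.DictReader(io.StringIO(text, newline=""),
                            delimiter=DELIMITERS[separator])
    return list(reader)
```

The json, parquet, and avro formats need their own readers and are not covered here.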

  6. Select the necessary GCS Region option from the dropdown.
  7. Select the necessary Partition by option from the dropdown. {%dropdown-button name="partition-by"%}

{%dropdown-body name="partition-by"%}

Possible ways of splitting data:

  • Day (default value)
  • Month

{%dropdown-end%}

  8. Select the necessary Encryption option from the dropdown. {%dropdown-button name="encryption"%}

{%dropdown-body name="encryption"%}

Possible options:

  • Default Cloud Storage encryption;
  • Customer-managed encryption keys;
  • Customer-supplied encryption keys.

{%dropdown-end%}

  9. (Optional) Enter the Root Name. {%dropdown-button name="root-name"%}

{%dropdown-body name="root-name"%}

Possible parameters:

```/{{data_source}}/{{data_table_title}}/{{report_type}}/{{YYYY}}/{{MM}}/{{DD}}/{{timestamp}}```

  • ```{{data_source}}``` is a data provider, integration, connector
  • ```{{data_table_title}}``` is an object that contains all extraction orders with the same granularity (dimensional schema)
  • ```{{report_type}}``` is a set of such fields as metrics, properties, dimensions, etc.
  • ```{{timestamp}}``` is the date and time when the data load started

If you use the ```/{{YYYY}}/{{MM}}/{{DD}}``` settings, data is added to a new folder each day. New records do not delete previous ones, even for data that contains no date. On request, the support team can set up a different root structure in the bucket.

{%dropdown-end%}
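The daily folder behavior described for Root Name can be illustrated with a short sketch that fills the template for two load dates. The function `render_root_name` is hypothetical, and both the zero-padded date parts and the timestamp layout are assumptions, since the guide does not specify them.

```python
from datetime import datetime


def render_root_name(template: str, values: dict, started_at: datetime) -> str:
    """Fill a Root Name template for a load that started at started_at."""
    mapping = {
        "{{YYYY}}": f"{started_at.year:04d}",
        "{{MM}}": f"{started_at.month:02d}",
        "{{DD}}": f"{started_at.day:02d}",
        # Assumed timestamp layout; the real format is not documented here.
        "{{timestamp}}": started_at.strftime("%Y%m%dT%H%M%S"),
    }
    # Caller-supplied values such as data_source or report_type.
    for key, value in values.items():
        mapping["{{" + key + "}}"] = value
    for placeholder, value in mapping.items():
        template = template.replace(placeholder, value)
    return template
```

Rendering the same template for loads on consecutive days gives two different folder paths, which is why a new load does not overwrite the previous one.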

  10. Select the necessary Use static IP option from the dropdown. {%dropdown-button name="use-static-ip"%}

{%dropdown-body name="use-static-ip"%}

Select Yes for the Use static IP option if you allow Improvado to connect to your bucket from the static IPs listed on the Destination connection page.

Select No if you permit access to your bucket from any IP. In this case, Improvado connects using dynamic IPs that are not listed on the Destination connection page.

{%dropdown-end%}

  11. Select the Authentication type: Service Account Key (recommended) or Workload Identity Federation.
  12. Upload your Service Account Key JSON file in the Service account key field.
  13. Enter the Project ID.
  14. (Workload Identity Federation only) Enter the GCP Project Number.
  15. (Workload Identity Federation only) Enter the Workload Pool ID. {%dropdown-button name="workload-pool-id"%}

{%dropdown-body name="workload-pool-id"%}

Pool IDs are used as identifiers in IAM.

{%dropdown-end%}

  16. (Workload Identity Federation only) Enter the AWS Provider ID. {%dropdown-button name="aws-provider-id"%}

{%dropdown-body name="aws-provider-id"%}

Providers manage and verify identities.

{%dropdown-end%}

  17. (Workload Identity Federation only) Enter the Service Account Email. {%dropdown-button name="aws-provider-id"%}

{%dropdown-body name="aws-provider-id"%}

A service account is identified by its email address, which is unique to the account.

{%dropdown-end%}

  18. Select the necessary Use load by accounts option from the dropdown.

Secondary Authentication Option (Workload Identity Federation)

Note: We recommend using the Service Account Key as an authentication method.

With identity federation, you can use Identity and Access Management (IAM) to grant external identities IAM roles, including the ability to impersonate service accounts. This approach eliminates the maintenance and security burden associated with service account keys.

Learn more in the Google Cloud IAM documentation: Workload identity federation.

  1. Set up a workload identity pool and provider for your Google Cloud project.
  2. Specify the Improvado AWS account ID, which you can find in the Improvado UI.
  3. Configure attribute mapping and conditions to allow only the AWS IAM role named "workload_identity_federation".
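The attribute condition from the last step is a CEL expression set on the provider. One possible way to restrict access to a single role is to match the assumed-role ARN prefix, as sketched below. The helper `build_attribute_condition` is hypothetical, and the use of `assertion.arn.startsWith(...)` is an assumption based on Google's AWS federation guidance, so verify the resulting expression against the Google Cloud documentation before applying it.

```python
def build_attribute_condition(aws_account_id: str,
                              role_name: str = "workload_identity_federation") -> str:
    """Build a CEL condition that only accepts sessions of one AWS IAM role."""
    # AWS account IDs are 12-digit numbers.
    if not (aws_account_id.isdigit() and len(aws_account_id) == 12):
        raise ValueError("AWS account ID must be a 12-digit number")
    # Assumed-role session ARNs look like:
    #   arn:aws:sts::ACCOUNT_ID:assumed-role/ROLE_NAME/SESSION_NAME
    prefix = f"arn:aws:sts::{aws_account_id}:assumed-role/{role_name}/"
    return f"assertion.arn.startsWith('{prefix}')"
```

Pass the Improvado AWS account ID shown in the Improvado UI as `aws_account_id`, then use the returned string as the provider's attribute condition.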

How to connect

Grant the Improvado Google service account improvado-gcs-loader@green-post-223109.iam.gserviceaccount.com access to your Google Cloud Storage bucket by assigning it the Storage Object Admin role on the bucket.

Learn more here.
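If you manage bucket permissions as an IAM policy document rather than through the console, the grant amounts to one extra binding. The sketch below edits a policy dictionary in the standard IAM JSON shape (a `bindings` list of `role` and `members` entries). The function `add_binding` is hypothetical, and `roles/storage.objectAdmin` is the role ID that corresponds to Storage Object Admin. Fetching and writing the policy back to the bucket is left to whatever tooling you already use.

```python
IMPROVADO_MEMBER = (
    "serviceAccount:improvado-gcs-loader@green-post-223109.iam.gserviceaccount.com"
)
OBJECT_ADMIN_ROLE = "roles/storage.objectAdmin"  # Storage Object Admin


def add_binding(policy: dict, role: str = OBJECT_ADMIN_ROLE,
                member: str = IMPROVADO_MEMBER) -> dict:
    """Add member to role in an IAM policy dict, without creating duplicates."""
    bindings = policy.setdefault("bindings", [])
    for binding in bindings:
        # Reuse an existing unconditional binding for the same role.
        if binding.get("role") == role and "condition" not in binding:
            members = binding.setdefault("members", [])
            if member not in members:
                members.append(member)
            return policy
    bindings.append({"role": role, "members": [member]})
    return policy
```

Running the function twice on the same policy leaves a single binding with a single member entry, so it is safe to re-apply.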

Troubleshooting

Troubleshooting guides

Check out troubleshooting guides for Google Cloud Storage here:


Questions?

The Improvado team is always happy to help with any other questions you might have! Send us an email.

Contact your Customer Success Manager or raise a request in Improvado Service Desk.