Documentation

Google Big Query

Updated on

Jun 28, 2023

IMPORTANT: This article covers setup of warehouse for load data from Improvado, not customer data warehouse from which data is being extracted. This article doesn't cover setup of a customer data warehouse for Data Prep as well.

BigQuery is Google's serverless, highly scalable enterprise data warehouse designed for your data analysts. Improvado can load all data gathered from dozens of available data sources to this storage.

Select how you want to authenticate to Google Big Query

There are two ways you can get authenticated to GBQ using Improvado UI:

You can learn how to use any of these methods by following the instructions below.

Workload Identity Federation authentication

With identity federation, you can use Identity and Access Management (IAM) to grant external identities IAM roles, including the ability to impersonate service accounts. This approach eliminates the maintenance and security burden associated with service account keys.
Learn more about Identity Federation here: Workload identity federation  |  IAM Documentation  |  Google Cloud.

Configure Workload Identity Federation

  1. Setup a Workload pool and Provider for your Google Cloud project
  2. Specify the Improvado AWS account ID that you can find on Improvado UI:
  1. Configure attribute mapping and conditions to allow only one AWS IAM role that is called: "workload_identity_federation"

Learn more about Workload Identity Federation configuration here.

Required information

  • Title
  • Authentication type
  • ~Select Workload Identity Federation
  • Transitional GCS bucket
  • ~Improvado-owned bucket
  • ~Customer-owned bucket. Unless you have specific security requirements, it is recommended to use Improvado-owned bucket. Google Cloud Storage bucket must belong to the same Google Cloud Platform project
  • Transitional GCS Bucket Name
  • Dataset Name
  • Use static IP
  • Project ID - GCP project ID
  • GBQ Project Number
  • ~To locate it go to Google Cloud Developers Console → Cloud Overview → Dashboards
  • Workload Pool ID - Pool IDs are used as identifiers in IAM
  • AWS Provider ID - providers manage and verify identities
  • Service Account Email - a service account is identified by its email address, which is unique to the account

Service Account Key authentication

Generate a Service Account Key JSON file

In order to use Service Account Key authentication, first, you need to generate a JSON file via Google Cloud Console using official documentation or interactive step-by-step guide provided by Google. Alternatively, you can follow the instruction below:

  1. In Google Cloud Console, go to IAM & AdminService Accounts.
  1. Click on the Actions button for your Service account and select Manage keys.
  1. In the KEYS tab section, click ADD KEY Create new key. Choose JSON as a key type and click Create.
  1. In the JSON file, make a note of your Project ID.

Required information

  • Title
  • Authentication type
  • ~Select Service Account Key
  • Transitional GCS bucket
  • ~Improvado-owned bucket
  • ~Customer-owned bucket. Unless you have specific security requirements, it is recommended to use Improvado-owned bucket. Google Cloud Storage bucket must belong to the same Google Cloud Platform project
  • Transitional GCS Bucket Name
  • Dataset Name
  • Use static IP
  • Service account key - upload your JSON file here
  • Project ID - this field is needed for unique constraint

How to connect

Don't forget to set up the ```jobUser``` & ```dataEditor```/```dataOwner``` roles given to the Service account or a custom role with needed permissions. A detailed list of needed permissions should look as follows:

Learn more about roles and permissions in GBQ documentation system.

Use static IP

You can learn how to restrict IP addresses that are allowed to access Google BigQuery here: Restrict IP addresses allowed to access Google BigQuery.
Learn more about VPC Service Controls in the official documentation here.

Select Yes for Use static IP option if you allow Improvado to connect your database by the static IPs mentioned on the Destination connection page.

Select No if you have permitted access to your database from any IP. In this case, Improvado will connect your database using dynamic IPs not listed on the Destination connection page.

Data formats and breakdowns

Improvado stores data GBQ in "sharding" format in tables named like ```TABLE_NAME_YYYYMMDD```.Typically we break the data down by platform, account, dimension, and month, but there are some alternatives. We can implement a day-by-day division if you need it.

Additional options

If you want to have your accounts divided into groups and load those groups into different datasets (and even different GBQ accounts), we can handle this. Don't hesitate to reach our support for additional details.

Schema information

Setup guide

Settings

No items found.

Troubleshooting

Failing to Connect Google Big Query

When establishing a connection to your Google Big Query, the following error message is displayed:

{%docs-informer error%} Checking connection failed: Access denied: Dataset <your-project-name>:<your-dataset>: Permission bigquery.tables.create denied on dataset <your-project-name>:<your-dataset> (or it may not exist). {%docs-informer-end%}

{%docs-accordion title="Solution"%}

You can follow our step-by-step guide below to make sure that your Service account has sufficient permissions and is created correctly.

Step 1. Check if the correct principal for the project is specified.

Step 2. To re-create the service account, you need to access the corresponding section in the Google Cloud Console using the left navigation panel:

Step 3. There would be an option to create a new Service account:

Note: Please, do not use the “improvado-gcs-loader“ name since it would be confusing. We use this account in our internal green-post project.

Step 4. Assign necessary permissions to the created service account in the IAM section. You can find a list of required permissions in the Google Big Query guide in How to connect section.

Step 5. Once the service account is created, the new JSON key will be automatically generated.

Note: If there’s a need to reissue the key, it could be done on the same screen using the three dots menu on the newly created service account:

{%docs-accordion-end%}

Troubleshooting guides

Check out troubleshooting guides for
Google Big Query
here

Limits

Frequently asked questions

No items found.
☶ On this page
Description
Related articles
No items found.
No items found.

Questions?

Improvado team is always happy to help with any other questions you might have! Send us an email.

Contact your Customer Success Manager or raise a request in Improvado Service Desk.