Updated on
Jun 28, 2023
IMPORTANT: This article covers setting up a warehouse that Improvado loads data into, not a customer data warehouse from which data is extracted. It also does not cover setting up a customer data warehouse for Data Prep.
BigQuery is Google's serverless, highly scalable enterprise data warehouse designed for your data analysts. Improvado can load all data gathered from dozens of available data sources to this storage.
There are two ways to authenticate to Google BigQuery (GBQ) in the Improvado UI: Workload Identity Federation and a Service Account Key.
You can learn how to use either method by following the instructions below.
With identity federation, you can use Identity and Access Management (IAM) to grant external identities IAM roles, including the ability to impersonate service accounts. This approach eliminates the maintenance and security burden associated with service account keys.
Learn more about Identity Federation here: Workload identity federation | IAM Documentation | Google Cloud.
Learn more about Workload Identity Federation configuration here.
To use Service Account Key authentication, you first need to generate a JSON key file in the Google Cloud Console, using either the official documentation or the interactive step-by-step guide provided by Google. Alternatively, you can follow the instructions below:
Make sure the Service account is granted the ```jobUser``` and ```dataEditor```/```dataOwner``` roles, or a custom role with the needed permissions. The detailed list of needed permissions is as follows:
Learn more about roles and permissions in the GBQ documentation.
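The role requirement above can be expressed as a small check. This is a minimal sketch, assuming the short names ```jobUser```, ```dataEditor```, and ```dataOwner``` map to the predefined BigQuery role IDs shown in the code; the `missing_roles` helper is hypothetical and does not cover the custom-role option.

```python
# Predefined BigQuery role IDs that the short names in this article are
# assumed to refer to. The helper below is illustrative only.
JOB_ROLE = "roles/bigquery.jobUser"
DATA_ROLES = {"roles/bigquery.dataEditor", "roles/bigquery.dataOwner"}


def missing_roles(granted):
    """Return descriptions of required roles that are absent from `granted`."""
    granted = set(granted)
    missing = []
    # The job role is always required.
    if JOB_ROLE not in granted:
        missing.append(JOB_ROLE)
    # Either one of the data roles is sufficient.
    if not (granted & DATA_ROLES):
        missing.append("roles/bigquery.dataEditor or roles/bigquery.dataOwner")
    return missing
```

An empty result means the role combination described above is present; a custom role would have to be compared permission by permission instead.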
You can learn how to restrict IP addresses that are allowed to access Google BigQuery here: Restrict IP addresses allowed to access Google BigQuery.
Learn more about VPC Service Controls in the official documentation here.
Select Yes for the Use static IP option if you allow Improvado to connect to your database from the static IPs listed on the Destination connection page.
Select No if you have permitted access to your database from any IP. In this case, Improvado will connect to your database using dynamic IPs that are not listed on the Destination connection page.
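If you maintain your own allowlist, the following sketch shows one way to check whether a given address would pass it. The addresses are placeholders from a documentation-reserved range, not Improvado's real static IPs (take those from the Destination connection page), and `is_allowed` is a hypothetical helper.

```python
import ipaddress


def is_allowed(ip, allowlist):
    """Return True if `ip` falls inside any address or CIDR block in `allowlist`."""
    addr = ipaddress.ip_address(ip)
    # strict=False tolerates entries such as "203.0.113.10/24" with host bits set.
    return any(addr in ipaddress.ip_network(entry, strict=False) for entry in allowlist)


# Placeholder allowlist: one single address and one CIDR block.
EXAMPLE_ALLOWLIST = ["203.0.113.10", "198.51.100.0/24"]
```

For example, `is_allowed("198.51.100.7", EXAMPLE_ALLOWLIST)` evaluates to `True`, while an address outside both entries evaluates to `False`.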
Improvado stores data in GBQ in a "sharded" format, in tables named like ```TABLE_NAME_YYYYMMDD```. Typically we break the data down by platform, account, dimension, and month, but there are alternatives: for example, we can implement a day-by-day division if you need it.
If you want to have your accounts divided into groups and load those groups into different datasets (and even different GBQ accounts), we can handle this. Don't hesitate to reach out to our support team for additional details.
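To illustrate the naming pattern, here is a minimal sketch that builds a shard name and a query over a range of shards. The dataset and table names are hypothetical; the query relies on BigQuery's wildcard tables and the `_TABLE_SUFFIX` pseudo-column, and it assumes the suffix of each shard is a date in `YYYYMMDD` form as described above.

```python
from datetime import date


def shard_table_name(table_name, day):
    """Build a date-sharded table name of the form TABLE_NAME_YYYYMMDD."""
    return f"{table_name}_{day:%Y%m%d}"


def wildcard_query(dataset, table_name, start, end):
    """Build a standard SQL query that reads every shard dated start..end inclusive."""
    return (
        f"SELECT * FROM `{dataset}.{table_name}_*` "
        f"WHERE _TABLE_SUFFIX BETWEEN '{start:%Y%m%d}' AND '{end:%Y%m%d}'"
    )


# Hypothetical names, for illustration only.
name = shard_table_name("facebook_ads", date(2023, 6, 1))
sql = wildcard_query("marketing", "facebook_ads", date(2023, 6, 1), date(2023, 6, 30))
```

Filtering on `_TABLE_SUFFIX` limits the scan to the matching shards rather than reading every table that shares the prefix.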
When establishing a connection to your Google BigQuery, the following error message is displayed:
{%docs-accordion title="Solution"%}
You can follow our step-by-step guide below to make sure that your Service account has sufficient permissions and is created correctly.
Step 1. Check if the correct principal for the project is specified.
Step 2. To re-create the service account, you need to access the corresponding section in the Google Cloud Console using the left navigation panel:
Step 3. Use the option to create a new Service account:
Note: Please do not use the “improvado-gcs-loader” name, since it would be confusing. We use this account in our internal green-post project.
Step 4. Assign the necessary permissions to the created service account in the IAM section. You can find the list of required permissions in the How to connect section of the Google BigQuery guide.
Step 5. Once the service account is created, the new JSON key will be automatically generated.
Note: If you need to reissue the key, you can do so on the same screen using the three-dot menu of the newly created service account:
{%docs-accordion-end%}
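Before uploading a newly generated JSON key, a quick local sanity check can catch a truncated or wrong file. This is a minimal sketch, assuming the key follows the usual Google service account key layout with the fields `type`, `project_id`, `private_key`, and `client_email`; the `check_key` helper is hypothetical and does not verify permissions or contact Google.

```python
import json

# Fields this sketch assumes every service account key file contains.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_key(text):
    """Return a list of problems found in the JSON text of a service account key."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level JSON value is not an object"]
    # Report every required field that is absent or empty.
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if not data.get(name)]
    # A present but different type means this is not a service account key.
    if data.get("type") and data["type"] != "service_account":
        problems.append(f"unexpected type: {data['type']}")
    return problems
```

To use it, read the downloaded file as text and pass the contents to `check_key`; an empty list means the basic structure looks right.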
The Improvado team is always happy to help with any other questions you might have! Send us an email.
Contact your Customer Success Manager or raise a request in the Improvado Service Desk.