IMPORTANT: This article covers setting up a warehouse that receives data loaded from Improvado, not the customer data warehouse from which data is extracted. It also does not cover setting up a customer data warehouse for Data Prep.
Required information
- Title
- Server hostname
- ~Azure / AWS / Google Cloud Databricks
- Databricks Access Token
- Filepath
- Partition by (how the data is split across uploaded files)
- File format
- Separator (optional)
- ~the maximum length of the separator is 2 characters
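Before saving the destination, you can sanity-check the server hostname and access token with a direct call to the Databricks DBFS REST API. This is a minimal sketch for verification only, not part of the Improvado setup flow; the hostname and token below are placeholders.

```python
import requests

# Placeholder values; substitute your own workspace hostname and token.
SERVER_HOSTNAME = "https://adb-XXXX.XX.azuredatabricks.net"
ACCESS_TOKEN = "dapiXXXXXXXXXXXXXXXX"

# List /FileStore via the DBFS API; a 200 response confirms the
# hostname resolves and the token is accepted.
resp = requests.get(
    f"{SERVER_HOSTNAME}/api/2.0/dbfs/list",
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
    params={"path": "/FileStore"},
)
resp.raise_for_status()
for entry in resp.json().get("files", []):
    print(entry["path"])
```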
Server hostname
- Azure Databricks - https://adb-XXXX.XX.azuredatabricks.net
- AWS Databricks - https://dbc-XXXX.cloud.databricks.com
- Google Cloud Databricks - https://XXXX.X.gcs.databricks.com
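To catch typos in the hostname early, it can be checked against the patterns above. The regular expressions below are approximations derived from the examples, not an official Databricks specification.

```python
import re

# Patterns derived from the example hostnames above (an approximation,
# not an official specification).
HOSTNAME_PATTERNS = [
    r"^https://adb-[\w.-]+\.azuredatabricks\.net$",    # Azure
    r"^https://dbc-[\w.-]+\.cloud\.databricks\.com$",  # AWS
    r"^https://[\w.-]+\.gcs\.databricks\.com$",        # Google Cloud
]

def looks_like_databricks_host(url: str) -> bool:
    return any(re.match(p, url) for p in HOSTNAME_PATTERNS)

print(looks_like_databricks_host("https://dbc-1234.cloud.databricks.com"))  # True
```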
Filepath
Possible parameters:
- /FileStore/{{ filename }}-{{ YYYY }}-{{ MM }}-{{ DD }}
- ~{{ filename }} is the same as the destination table name
IMPORTANT: you cannot use {{ DD }} when partitioning by month
- {{filename}}-{{YYYY}}-{{MM}}-{{DD}} – for partition by day
- {{filename}}-{{YYYY}}-{{MM}} – for partition by month
You can also use “_” instead of “-”, or omit separators entirely, for example:
- {{filename}}_{{YYYY}}-{{MM}}-{{DD}}
- {{filename}}{{YYYY}}{{MM}}{{DD}}
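To see how these placeholders expand, here is a small sketch that renders a filepath template for a given date. The render_filepath helper is hypothetical and only illustrates the substitution implied by the examples above.

```python
from datetime import date

def render_filepath(template: str, filename: str, d: date) -> str:
    """Expand the {{ filename }}, {{ YYYY }}, {{ MM }}, {{ DD }} placeholders.

    Hypothetical helper: it mirrors the substitution implied by the
    examples above, with and without spaces inside the braces.
    """
    values = {
        "filename": filename,
        "YYYY": f"{d.year:04d}",
        "MM": f"{d.month:02d}",
        "DD": f"{d.day:02d}",
    }
    for key, value in values.items():
        template = template.replace("{{ " + key + " }}", value)
        template = template.replace("{{" + key + "}}", value)
    return template

print(render_filepath("/FileStore/{{ filename }}-{{ YYYY }}-{{ MM }}-{{ DD }}",
                      "ad_performance", date(2024, 7, 1)))
# /FileStore/ad_performance-2024-07-01
```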
Partition by
Possible ways of splitting data:
- By day
- By month
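As an illustration of the difference between the two options, the sketch below groups sample rows by day and by month. With monthly partitioning, rows from different days of the same month end up in one file, which is why {{ DD }} cannot appear in a monthly template.

```python
from collections import defaultdict

rows = [
    {"event_date": "2024-07-01", "clicks": 10},
    {"event_date": "2024-07-02", "clicks": 7},
    {"event_date": "2024-08-01", "clicks": 3},
]

# Partition by day: one file per distinct date.
by_day = defaultdict(list)
for row in rows:
    by_day[row["event_date"]].append(row)

# Partition by month: one file per distinct year-month, so a
# {{ DD }} placeholder would be ambiguous for these rows.
by_month = defaultdict(list)
for row in rows:
    by_month[row["event_date"][:7]].append(row)

print(sorted(by_day))    # ['2024-07-01', '2024-07-02', '2024-08-01']
print(sorted(by_month))  # ['2024-07', '2024-08']
```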
File format
Possible formats:
- csv
- csv+gzip
- json
- json+gzip
- parquet
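For reference, the sketch below produces each of the listed formats locally with pandas (parquet additionally requires pyarrow). It only illustrates what the uploaded files can look like, not how Improvado generates them.

```python
import pandas as pd  # parquet output additionally requires pyarrow

df = pd.DataFrame({"campaign": ["a", "b"], "clicks": [10, 7]})

# csv and csv+gzip; sep="|" shows a custom separator (max 2 characters).
df.to_csv("data.csv", index=False, sep="|")
df.to_csv("data.csv.gz", index=False, sep="|", compression="gzip")

# json and json+gzip, one record per line.
df.to_json("data.json", orient="records", lines=True)
df.to_json("data.json.gz", orient="records", lines=True, compression="gzip")

# parquet (columnar, compressed by default).
df.to_parquet("data.parquet")
```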