
Databricks

IMPORTANT: This article covers setting up the warehouse that receives data loaded from Improvado, not the customer data warehouse from which data is extracted. It also does not cover setting up a customer data warehouse for Data Prep.

Required information

  • Title
  • Server hostname
    • the format depends on the cloud: Azure, AWS, or Google Cloud Databricks
  • Databricks Access Token
  • Filepath
  • Partition by (how data is split into separate files when uploading)
  • File format
  • Separator (optional)
    • maximum length: 2 characters
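To make the required settings concrete, here is a minimal sketch that collects them in one place and checks the constraints stated in this article (partition by day or month, the five file formats, a separator of at most 2 characters). The class name `DatabricksDestination`, its field names, and the `validate` method are hypothetical and only illustrate the rules; they are not part of an Improvado API.

```python
from dataclasses import dataclass
from typing import Optional

# Allowed values taken from this article; all names below are hypothetical.
PARTITIONS = {"day", "month"}
FILE_FORMATS = {"csv", "csv+gzip", "json", "json+gzip", "parquet"}
MAX_SEPARATOR_LEN = 2


@dataclass
class DatabricksDestination:
    """Hypothetical container for the destination settings listed above."""
    title: str
    server_hostname: str
    access_token: str
    filepath: str
    partition_by: str  # "day" or "month"
    file_format: str
    separator: Optional[str] = None  # optional setting

    def validate(self) -> list:
        """Return a list of problems; an empty list means the settings look consistent."""
        errors = []
        # Every setting except the separator is required.
        for name in ("title", "server_hostname", "access_token", "filepath"):
            if not getattr(self, name):
                errors.append(f"{name} is required")
        if self.partition_by not in PARTITIONS:
            errors.append(f"partition_by must be one of {sorted(PARTITIONS)}")
        if self.file_format not in FILE_FORMATS:
            errors.append(f"file_format must be one of {sorted(FILE_FORMATS)}")
        # The separator may be omitted, but if given it is limited to 2 characters.
        if self.separator is not None and len(self.separator) > MAX_SEPARATOR_LEN:
            errors.append("separator must be at most 2 characters")
        return errors
```

Returning a list of messages instead of raising on the first error lets all problems be reported at once, which suits a settings form.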

Server hostname

  • Azure Databricks - https://adb-XXXX.XX.azuredatabricks.net
  • AWS Databricks - https://dbc-XXXX.cloud.databricks.com
  • Google Cloud Databricks - https://XXXX.X.gcp.databricks.com
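The hostname formats differ by cloud, so a quick check can tell which one a given value resembles. This is a minimal sketch, assuming the example patterns above (digits in the Azure ID, letters, digits, and hyphens in the AWS `dbc-` ID) and assuming Google Cloud hosts end in `gcp.databricks.com`, the domain Databricks uses for GCP workspaces. The function `detect_cloud` is hypothetical, and the regular expressions are illustrative rather than an official specification.

```python
import re
from typing import Optional

# Illustrative patterns modeled on the example hostnames; the "https://" prefix
# is optional here, which is an assumption about what users might paste.
HOST_PATTERNS = {
    "Azure": re.compile(r"^(?:https://)?adb-[0-9]+\.[0-9]+\.azuredatabricks\.net/?$"),
    "AWS": re.compile(r"^(?:https://)?dbc-[0-9a-z-]+\.cloud\.databricks\.com/?$"),
    "Google Cloud": re.compile(r"^(?:https://)?[0-9]+\.[0-9]+\.gcp\.databricks\.com/?$"),
}


def detect_cloud(hostname: str) -> Optional[str]:
    """Return the cloud whose example format the hostname matches, or None."""
    for cloud, pattern in HOST_PATTERNS.items():
        if pattern.match(hostname.strip()):
            return cloud
    return None
```

For example, `detect_cloud("https://adb-1234567890123456.7.azuredatabricks.net")` returns `"Azure"`.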

Filepath

The filepath can contain placeholder parameters, for example:

  • /FileStore/{{ filename }}-{{ YYYY }}-{{ MM }}-{{ DD }}
    • {{ filename }} is the destination table name

IMPORTANT: you cannot use {{ DD }} when partitioning by month.

  • {{filename}}-{{YYYY}}-{{MM}}-{{DD}} – for partitioning by day
  • {{filename}}-{{YYYY}}-{{MM}} – for partitioning by month

You can also use “_” instead of “-”, or omit separators entirely, for example:

  • {{filename}}_{{YYYY}}-{{MM}}-{{DD}}
  • {{filename}}{{YYYY}}{{MM}}{{DD}}
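The template rules above can be sketched as a small substitution function. It is a minimal illustration, assuming that `{{ YYYY }}` is a 4-digit year, that `{{ MM }}` and `{{ DD }}` are zero-padded to 2 digits, and that spaces inside the braces are optional (the article writes both `{{ DD }}` and `{{DD}}`). The helper `render_filepath` is hypothetical; it also enforces the rule that `{{ DD }}` cannot be used with monthly partitions.

```python
import datetime as dt
import re

# Matches {{ filename }}, {{YYYY}}, etc., with optional spaces inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(filename|YYYY|MM|DD)\s*\}\}")


def render_filepath(template: str, filename: str, day: dt.date, partition_by: str) -> str:
    """Hypothetical helper: fill a filepath template for one partition."""
    used = {m.group(1) for m in PLACEHOLDER.finditer(template)}
    # A monthly file covers many days, so a day placeholder makes no sense.
    if partition_by == "month" and "DD" in used:
        raise ValueError("{{ DD }} cannot be used when partitioning by month")
    values = {
        "filename": filename,  # the destination table name
        "YYYY": f"{day.year:04d}",
        "MM": f"{day.month:02d}",  # zero-padding is an assumption
        "DD": f"{day.day:02d}",
    }
    return PLACEHOLDER.sub(lambda m: values[m.group(1)], template)
```

With a hypothetical table named `ads_stats`, `render_filepath("/FileStore/{{ filename }}-{{ YYYY }}-{{ MM }}-{{ DD }}", "ads_stats", dt.date(2024, 3, 5), "day")` returns `"/FileStore/ads_stats-2024-03-05"`.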

Partition by

Possible ways of splitting data:

  • Day
  • Month
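Partitioning means each output file holds the rows for one day or one month. A minimal sketch of that grouping follows, assuming rows are dictionaries with a date column; the column name `date` and the functions `partition_key` and `split_rows` are hypothetical, since this article does not say which date field Improvado partitions on.

```python
import datetime as dt
from collections import defaultdict


def partition_key(day: dt.date, partition_by: str) -> str:
    """Return 'YYYY-MM-DD' for daily partitions or 'YYYY-MM' for monthly ones."""
    if partition_by == "day":
        return day.strftime("%Y-%m-%d")
    if partition_by == "month":
        return day.strftime("%Y-%m")
    raise ValueError("partition_by must be 'day' or 'month'")


def split_rows(rows, partition_by: str, date_field: str = "date") -> dict:
    """Group rows (dicts) into one bucket per output file.

    `date_field` is a hypothetical column name used only for illustration.
    """
    buckets = defaultdict(list)
    for row in rows:
        buckets[partition_key(row[date_field], partition_by)].append(row)
    return dict(buckets)
```

Each key of the returned dictionary corresponds to one file, whose name would come from the filepath template for that day or month.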

File format

Possible formats:

  • csv
  • csv+gzip
  • json
  • json+gzip
  • parquet
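The `+gzip` variants are the same content compressed with gzip. The sketch below serializes rows for the four text-based formats, assuming (not stated in this article) that JSON files contain one object per line, that the optional separator applies only to CSV, that CSV lines end with `\n`, and that the default separator is a comma. Python's `csv` module only accepts 1-character delimiters, so fields are joined manually to support 2-character separators. Parquet is left out because it needs a third-party library such as pyarrow. The function `serialize` is hypothetical.

```python
import gzip
import json

TEXT_FORMATS = {"csv", "csv+gzip", "json", "json+gzip"}


def _csv_field(value, separator: str) -> str:
    """Quote a field if it contains the separator, a quote, or a line break."""
    text = "" if value is None else str(value)
    if separator in text or '"' in text or "\n" in text or "\r" in text:
        text = '"' + text.replace('"', '""') + '"'
    return text


def serialize(rows: list, file_format: str, separator: str = ",") -> bytes:
    """Sketch: turn a list of dicts into file bytes for a text-based format."""
    if file_format not in TEXT_FORMATS:
        raise ValueError(f"unsupported format in this sketch: {file_format}")
    base, _, compression = file_format.partition("+")
    if base == "csv":
        if not 1 <= len(separator) <= 2:
            raise ValueError("separator must be 1 or 2 characters")
        columns = list(rows[0]) if rows else []
        lines = [separator.join(_csv_field(c, separator) for c in columns)]
        lines += [separator.join(_csv_field(row.get(c), separator) for c in columns) for row in rows]
        text = "\n".join(lines) + "\n"
    else:
        # One JSON object per line; non-JSON types such as dates become strings.
        text = "".join(json.dumps(row, default=str) + "\n" for row in rows)
    data = text.encode("utf-8")
    return gzip.compress(data) if compression == "gzip" else data
```

For example, `serialize([{"id": 1, "name": "a;b"}], "csv", ";")` returns `b'id;name\n1;"a;b"\n'`, quoting the field that contains the separator.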