
Azure Data Lake

Updated on

Sep 5, 2024

Note: Azure Data Lake is currently supported only as a Destination. This guide doesn’t cover the DataPrep setup for Azure Data Lake.

Description

Azure Data Lake is a scalable cloud storage solution that enables the efficient storage, processing, and analysis of large amounts of structured and unstructured data. It supports big data analytics by integrating with Azure's analytics services, offering seamless data ingestion, high-throughput processing, and advanced security features.

Schema information

Setup guide

Follow our setup guide to connect Azure Data Lake to Improvado.

Complete configuration

On the Azure Data Lake connection page, fill in the following fields:

  1. Enter a name for your Destination connection in the Title.
  2. Enter the Account URL. {%dropdown-button name="account-url"%}

{%dropdown-body name="account-url"%}

The Account URL must match the following regular expression: ```https://[a-z0-9]*.blob.core.windows.net$```.

The ```[a-z0-9]*``` part of the Account URL (the storage account name) must be between 3 and 24 characters long.
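As a rough pre-check before filling in the form, the two rules above can be sketched in Python. The function name `is_valid_account_url` is hypothetical, and the pattern used here is an assumption that is slightly stricter than the quoted expression: it escapes the dots and folds the 3 to 24 character rule into the quantifier.

```python
import re

# Stricter variant of the documented pattern (an assumption, not the exact
# expression used by the connection page): dots are escaped and the
# 3-24 character rule is expressed as a quantifier.
ACCOUNT_URL_RE = re.compile(r"https://[a-z0-9]{3,24}\.blob\.core\.windows\.net")


def is_valid_account_url(url: str) -> bool:
    """Return True if url looks like an acceptable Account URL."""
    # fullmatch requires the whole string to match, so trailing slashes
    # or paths are rejected.
    return ACCOUNT_URL_RE.fullmatch(url) is not None
```

For example, `is_valid_account_url("https://mystorage01.blob.core.windows.net")` passes, while a URL with uppercase letters or a trailing `/` does not.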

{%dropdown-end%}

  3. Enter the SAS Token. {%dropdown-button name="sas-token"%}

{%dropdown-body name="sas-token"%}

Learn how to create a SAS Token with this guide.
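A SAS token is a URL query string, so it can be inspected locally before you paste it into the form. The sketch below reads the permissions (`sp`), expiry (`se`), and signature (`sig`) fields using only the Python standard library. The helper names `describe_sas_token` and `is_expired` are hypothetical, and which permissions the connection needs is not covered here.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs


def describe_sas_token(token: str) -> dict:
    """Parse a SAS token (a URL query string) into its main fields."""
    fields = parse_qs(token.lstrip("?"))
    # parse_qs returns a list per key; a SAS token carries each field once.
    flat = {key: values[0] for key, values in fields.items()}
    return {
        "permissions": flat.get("sp"),
        "expires": flat.get("se"),
        "has_signature": "sig" in flat,
    }


def is_expired(token: str, now: datetime) -> bool:
    """Return True if the token's expiry time (se) is not after `now`."""
    expiry = describe_sas_token(token)["expires"]
    if expiry is None:
        # No expiry in the token itself; nothing to compare against.
        return False
    expires_at = datetime.fromisoformat(expiry.replace("Z", "+00:00"))
    if expires_at.tzinfo is None:
        # Date-only values carry no offset; treat them as UTC.
        expires_at = expires_at.replace(tzinfo=timezone.utc)
    return now >= expires_at
```

Calling `is_expired(token, datetime.now(timezone.utc))` gives a quick signal on whether the token needs to be regenerated.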

{%dropdown-end%}

  4. Enter the File System Name. {%dropdown-button name="file-system-name"%}

{%dropdown-body name="file-system-name"%}

The File System Name must be between 3 and 63 characters long and must match the following regular expression: ```^(?!.*--.*)[a-z0-9][a-z0-9]*[a-z0-9]$```
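The length and pattern rules can be combined into a small check, sketched below. The function name `is_valid_file_system_name` is hypothetical. The pattern is taken from the text above as quoted; note that its character classes admit only lowercase letters and digits, so this sketch rejects names that contain hyphens.

```python
import re

# Pattern as quoted in the documentation above.
FILE_SYSTEM_NAME_RE = re.compile(r"^(?!.*--.*)[a-z0-9][a-z0-9]*[a-z0-9]$")


def is_valid_file_system_name(name: str) -> bool:
    """Return True if name satisfies the documented length and pattern rules."""
    if not 3 <= len(name) <= 63:
        return False
    # fullmatch avoids accepting a name that ends with a stray newline.
    return FILE_SYSTEM_NAME_RE.fullmatch(name) is not None
```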

{%dropdown-end%}

  5. Select the necessary Encryption type option from the dropdown. {%dropdown-button name="encryption-type"%}

{%dropdown-body name="encryption-type"%}

Possible options:

  • No encryption (default cloud storage encryption is still enabled)
  • Customer-provided key

{%dropdown-end%}

  6. (Customer-provided keys only) Enter the Encryption key. {%dropdown-button name="encryption-key"%}

{%dropdown-body name="encryption-key"%}

If you selected the No encryption option (default cloud storage encryption), you cannot edit this field.

Otherwise, enter your AES-256 key encoded in standard Base64, or the resource name of the Cloud KMS key used to encrypt the blob’s contents. For more information, see the Azure Data Lake encryption docs.
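If you need to produce a key in the expected shape, the following sketch generates 32 random bytes (256 bits) and encodes them in standard Base64, and also checks that an existing value decodes to exactly 32 bytes. The function names are hypothetical, and how you store and rotate the key is outside the scope of this example.

```python
import base64
import secrets


def generate_encryption_key() -> str:
    """Return a random 256-bit key encoded in standard Base64."""
    return base64.b64encode(secrets.token_bytes(32)).decode("ascii")


def is_valid_encryption_key(value: str) -> bool:
    """Return True if value is standard Base64 that decodes to 32 bytes."""
    try:
        raw = base64.b64decode(value, validate=True)
    except ValueError:
        # binascii.Error (a ValueError subclass) is raised for characters
        # outside the Base64 alphabet or for bad padding.
        return False
    return len(raw) == 32
```

A 32-byte key always encodes to a 44-character Base64 string, which is a quick visual check when copying the value.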

{%dropdown-end%}

  7. Enter the Folder. {%dropdown-button name="folder"%}

{%dropdown-body name="folder"%}

Possible parameters:

```/data_source/data_table_title/report_type/YYYY/MM/DD/timestamp```

  • ```data_source``` is a data provider, integration, connector
  • ```data_table_title``` is an object that contains all extraction orders with the same granularity (dimensional schema)
  • ```report_type``` is a set of such fields as metrics, properties, dimensions, etc.
  • ```timestamp``` is the date and time when data load started

If you use the ```/YYYY/MM/DD``` parameters, data is written to a separate folder for each day. Newly loaded data does not delete previously loaded data, even when the data contains no date.

The maximum length is 254 characters.
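To illustrate how a Folder value built from these parameters might resolve to a concrete path, here is a minimal sketch. It is not the actual substitution logic: the function `render_folder`, the segment-by-segment replacement, the format chosen for `timestamp`, and applying the 254-character limit to the entered value are all assumptions made for the example.

```python
from datetime import datetime

MAX_FOLDER_LENGTH = 254  # limit stated above


def render_folder(template: str, *, data_source: str, data_table_title: str,
                  report_type: str, load_started: datetime) -> str:
    """Replace each known path segment of template with a concrete value."""
    if len(template) > MAX_FOLDER_LENGTH:
        raise ValueError("Folder must not exceed 254 characters")
    replacements = {
        "data_source": data_source,
        "data_table_title": data_table_title,
        "report_type": report_type,
        "YYYY": f"{load_started.year:04d}",
        "MM": f"{load_started.month:02d}",
        "DD": f"{load_started.day:02d}",
        # The timestamp format below is an assumption for illustration.
        "timestamp": load_started.strftime("%Y%m%dT%H%M%S"),
    }
    # Replace whole segments only, so a value that happens to contain
    # "MM" or "DD" is never rewritten.
    segments = [replacements.get(segment, segment) for segment in template.split("/")]
    return "/".join(segments)
```

With a load that started on Sep 5, 2024, the value `/data_source/report_type/YYYY/MM/DD` would resolve to a path ending in `/2024/09/05` under these assumptions.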

{%dropdown-end%}

  8. Select the necessary File format option from the dropdown. {%dropdown-button name="file-format"%}

{%dropdown-body name="file-format"%}

Possible formats:

  • csv
  • csv+gzip
  • json
  • json+gzip
  • parquet
  • avro

{%dropdown-end%}

  9. Enter the Filename. {%dropdown-button name="filename"%}

{%dropdown-body name="filename"%}

Possible parameters:

```filename-YYYY-MM-DD```

  • ```filename``` is the destination table name

Note: you cannot use ```DD``` for partition by month.

  • ```filename-YYYY-MM-DD``` – for partition by day
  • ```filename-YYYY-MM``` – for partition by month

You can also use “_” instead of “-”, or omit the separator entirely, for example:

  • ```filenameYYYY-MM-DD```
  • ```filenameYYYYMMDD```
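The naming rules above can be sketched as a small helper that expands a Filename pattern for a given date and rejects `DD` when partitioning by month. The function `render_filename` and its arguments are hypothetical; the exact expansion performed during a load may differ.

```python
from datetime import date


def render_filename(pattern: str, table_name: str, day: date, partition_by: str) -> str:
    """Expand a Filename pattern such as 'filename-YYYY-MM-DD'."""
    if partition_by == "Month" and "DD" in pattern:
        raise ValueError("DD cannot be used when partitioning by month")
    # Expand the date tokens first, then the table name, so a table name
    # containing 'MM' or 'DD' is left untouched.
    rendered = (
        pattern.replace("YYYY", f"{day.year:04d}")
        .replace("MM", f"{day.month:02d}")
        .replace("DD", f"{day.day:02d}")
    )
    return rendered.replace("filename", table_name, 1)
```

The same helper covers the `-`, `_`, and no-separator variants, because only the tokens themselves are replaced.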

{%dropdown-end%}

  10. Select the necessary Separator option from the dropdown. {%dropdown-button name="separator"%}

{%dropdown-body name="separator"%}

Possible delimiters that can separate data in your file:

  • comma
  • semicolon
  • tab

{%dropdown-end%}

  11. Select the necessary Partition by option from the dropdown. {%dropdown-button name="partition-by"%}

{%dropdown-body name="partition-by"%}

Possible ways of splitting data:

  • Day
  • Month

{%dropdown-end%}


Troubleshooting

Troubleshooting guides

Check out the troubleshooting guides for Azure Data Lake.

Limits


Questions?

The Improvado team is always happy to help with any other questions you might have. Send us an email.

Contact your Customer Success Manager or raise a request in Improvado Service Desk.