
Data Extraction - Best practices

Learn how to set up your initial reports to ensure ongoing data integrity and performance optimization

This document provides comprehensive best practices for creating, managing, and optimizing reports in Improvado. It covers everything from setting up your initial reports to maintaining ongoing data integrity and optimizing performance.

  1. Simplification of Reports: Emphasize clarity by reducing the number of report elements and focusing on essential data only.
  2. Avoiding High Cardinality Dimensions: Limit the use of dimensions with a large number of unique values to enhance report performance and readability.
  3. Optimizing Report Structures for Similar Accounts: Avoid creating duplicate or highly similar report structures for the same accounts, because this increases the number of data processing orders. Instead, find a balance between highly granular reports and a small set of 2-3 reports that differ by only one or two dimensions.
  4. Balancing Data Granularity and Aggregation: Determine the right level of detail versus aggregation to meet your analysis needs without compromising performance.
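To spot high cardinality dimensions before building a report, you can count the unique values each dimension takes in a sample export. The Python sketch below does this and also computes a rough upper bound on report size: the number of days multiplied by the product of each dimension's unique-value count. The sample rows, field names, and threshold are hypothetical, and this is a generic illustration, not an Improvado feature. Real reports usually contain fewer rows than the bound, because not every combination of values occurs.

```python
from math import prod

# Hypothetical sample rows from a report export; field names are illustrative only.
rows = [
    {"date": "2024-05-01", "campaign": "spring_sale", "country": "US", "ad_id": "a1"},
    {"date": "2024-05-01", "campaign": "spring_sale", "country": "DE", "ad_id": "a2"},
    {"date": "2024-05-01", "campaign": "brand", "country": "US", "ad_id": "a3"},
    {"date": "2024-05-02", "campaign": "brand", "country": "US", "ad_id": "a4"},
]

def cardinality(rows, dimension):
    """Number of unique values a dimension takes in the sample."""
    return len({row[dimension] for row in rows})

def high_cardinality(rows, dimensions, threshold):
    """Dimensions whose unique-value count exceeds the threshold."""
    return [d for d in dimensions if cardinality(rows, d) > threshold]

def worst_case_rows(rows, dimensions, days):
    """Upper bound on report rows: every combination of dimension values on every day."""
    return days * prod(cardinality(rows, d) for d in dimensions)

dims = ["campaign", "country", "ad_id"]
print(high_cardinality(rows, dims, threshold=3))           # ['ad_id']
print(worst_case_rows(rows, ["campaign", "country"], 30))  # 120
print(worst_case_rows(rows, dims, 30))                     # 480
```

In this toy sample, dropping `ad_id` cuts the bound for a 30-day range from 480 to 120 rows, which is the kind of trade-off between granularity and performance described above.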
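To review report structures built for the same accounts, you can compare their dimension sets. The sketch below flags pairs of reports on the same account whose dimensions differ by at most one or two, so you can decide whether to consolidate them or keep them separate. The report definitions, account names, and field names are hypothetical and do not reflect Improvado's actual configuration format.

```python
from itertools import combinations

# Hypothetical report definitions; this structure is illustrative, not an Improvado format.
reports = [
    {"name": "fb_daily", "account": "acme_fb", "dimensions": {"date", "campaign", "ad_set"}},
    {"name": "fb_daily_geo", "account": "acme_fb", "dimensions": {"date", "campaign", "ad_set", "country"}},
    {"name": "fb_daily_device", "account": "acme_fb", "dimensions": {"date", "campaign", "ad_set", "device"}},
    {"name": "gads_daily", "account": "acme_gads", "dimensions": {"date", "campaign"}},
]

def near_duplicates(reports, max_diff=2):
    """Return pairs of reports on the same account whose dimension sets differ by at most max_diff."""
    pairs = []
    for a, b in combinations(reports, 2):
        if a["account"] != b["account"]:
            continue  # only reports on the same account are compared
        diff = a["dimensions"] ^ b["dimensions"]  # dimensions present in one report but not the other
        if len(diff) <= max_diff:
            pairs.append((a["name"], b["name"], sorted(diff)))
    return pairs

for first, second, diff in near_duplicates(reports):
    print(f"{first} vs {second}: differ only by {diff}")
```

Flagged pairs are candidates for review, not automatic merges; whether to combine them into one more granular report depends on the balance between granularity and the number of processing orders.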
  1. Separate granular and aggregated data: If you need both granular and aggregated values, create separate reports for the granular data and for the aggregated values.
  2. Utilize atomic granularity for precise comparisons: Use the smallest level of detail for critical data comparisons to ensure accuracy.
  3. Replicate data structures from the data provider’s platform: Follow structured steps to accurately mirror the provider's data setup in Improvado, ensuring consistency and reliability in reports.
  4. Avoid overly granular reports: Identify the key metrics and dimensions that directly relate to your reporting goals, and limit your report to the elements that provide actionable insights.
  5. Optimize data extraction schedules: Schedule data extractions during off-peak hours to improve performance and reduce system load.
  6. Streamline reports for efficiency: Avoid creating identical or very similar structures for the same accounts, as this increases the number of processing operations for the same data.
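Keeping atomic rows in a granular report lets you derive aggregated values reliably. The Python sketch below sums additive metrics from hypothetical ad-level rows up to the campaign level, then shows why a ratio metric such as CTR (clicks divided by impressions) should be recomputed from the summed parts rather than averaged across rows. The field names and numbers are illustrative only.

```python
from collections import defaultdict

# Hypothetical atomic rows: one row per ad (field names are illustrative).
atomic = [
    {"campaign": "brand", "ad_id": "a1", "impressions": 1000, "clicks": 10},
    {"campaign": "brand", "ad_id": "a2", "impressions": 100, "clicks": 5},
    {"campaign": "promo", "ad_id": "a3", "impressions": 400, "clicks": 8},
]

def rollup(rows, key, metrics):
    """Sum additive metrics from atomic rows up to one dimension."""
    totals = defaultdict(lambda: {m: 0 for m in metrics})
    for row in rows:
        for m in metrics:
            totals[row[key]][m] += row[m]
    return dict(totals)

def ctr_from_totals(totals):
    """Correct CTR: total clicks divided by total impressions."""
    return totals["clicks"] / totals["impressions"]

def ctr_naive_average(rows):
    """Misleading CTR: unweighted mean of per-row CTRs."""
    return sum(r["clicks"] / r["impressions"] for r in rows) / len(rows)

by_campaign = rollup(atomic, "campaign", ["impressions", "clicks"])
brand_rows = [r for r in atomic if r["campaign"] == "brand"]

print(by_campaign["brand"])                             # {'impressions': 1100, 'clicks': 15}
print(round(ctr_from_totals(by_campaign["brand"]), 4))  # 0.0136
print(round(ctr_naive_average(brand_rows), 4))          # 0.03

# Reconciliation check: the aggregated values must add up to the atomic totals.
assert sum(t["clicks"] for t in by_campaign.values()) == sum(r["clicks"] for r in atomic)
```

The naive average more than doubles the brand CTR because it gives the low-volume ad the same weight as the high-volume one; comparisons made at atomic granularity avoid this kind of error.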
  • Case: A user attempted to extract every available dimension and metric from their sales data across multiple regions, including detailed transaction times, customer demographics, and individual product IDs. This exhaustive data capture significantly increased extraction times and processing load.
  • Solution:
    • Review your reporting requirements and identify the necessary dimensions that are essential for achieving your analysis objectives. Focus on those that directly impact decision-making or performance measurement.
    • Avoid dimensions with a high number of unique values (high cardinality), as they significantly increase the complexity and processing time of your data extractions. Determine whether high cardinality dimensions can be aggregated into broader categories without losing critical insights.
  • Case: A user processed extracted data from a global marketing campaign covering multiple metrics, such as click rates, engagement rates, and demographic information, from several platforms. The complexity and volume of the data exceeded the system's optimal performance threshold.
  • Solution:
    • Set realistic expectations based on data volume and granularity. Assess the total number of rows and the complexity of the data, including the number of dimensions and metrics. Larger datasets and more complex queries generally take longer to process.
  • Case: A user set up an automated extraction from a financial reporting platform without considering the time zone or the source system's data generation cycle. Variations in extraction times, combined with this timing misalignment, led to inconsistent data retrieval.
  • Solution:
    • The time zone you set affects data availability and the accuracy of your reports, so select the time zone for your extraction carefully. Delays are common and should be anticipated.
    • Schedule data extractions during low-activity periods to optimize system performance and minimize delays.
    • If consistent delays occur, consider adjusting the schedule or frequency of the extractions to better align with system capabilities and data needs.
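The sketch below illustrates why the extraction time zone matters. The same event falls on different calendar dates depending on the reporting time zone, and a day is only complete in the source system after midnight in that time zone. The function names, the America/Los_Angeles example, and the 3-hour delay buffer are assumptions for illustration, not Improvado settings. The code uses Python's standard `zoneinfo` module (Python 3.9+), which needs a time zone database on the system or the `tzdata` package.

```python
from datetime import date, datetime, time, timedelta
from zoneinfo import ZoneInfo

UTC = ZoneInfo("UTC")

def report_date(event_utc, tz_name):
    """Calendar date an event falls on in the given reporting time zone."""
    return event_utc.astimezone(ZoneInfo(tz_name)).date()

def earliest_safe_run(day, tz_name, buffer_hours):
    """UTC time when `day` has closed in tz_name, plus a hypothetical delay buffer."""
    midnight_after = datetime.combine(day + timedelta(days=1), time(0), tzinfo=ZoneInfo(tz_name))
    # Convert to UTC first so the buffer is measured in real elapsed hours.
    return midnight_after.astimezone(UTC) + timedelta(hours=buffer_hours)

event = datetime(2024, 5, 2, 6, 30, tzinfo=UTC)
print(report_date(event, "UTC"))                  # 2024-05-02
print(report_date(event, "America/Los_Angeles"))  # 2024-05-01
print(earliest_safe_run(date(2024, 5, 1), "America/Los_Angeles", 3))  # 2024-05-02 10:00:00+00:00
```

An extraction that runs at, say, 08:00 UTC would treat May 1 as complete in UTC terms, yet in Los Angeles time the source system may still be finalizing that day's data; scheduling after the buffered time reduces the chance of pulling partial data.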

Questions?

The Improvado team is always happy to help with any other questions you might have! Send us an email.

Contact your Customer Success Manager or raise a request in the Improvado Service Desk.