How Improvado Uses ClickHouse to Get Superb Processing Speed
Every day, Improvado processes and stores billions of data rows to provide our customers with actionable insights on their marketing performance. When working with large data volumes, the first priority is choosing a fast and efficient database management system. Each solution on the market claims to have the most reliable and efficient infrastructure.
For example, SaaS data warehouses like Amazon Redshift and Snowflake are some of the best-known solutions for storing data for further analysis. However, when we dug deeper, we realized that the most popular option isn’t necessarily the best one. So, our engineers conducted in-depth research and decided to go with what we found to be the fastest analytical DBMS available today: ClickHouse. In this post, we’ll explain why Improvado picked ClickHouse and why we consider it a better fit than the other database management solutions on the market.
What is ClickHouse?
ClickHouse is an open-source columnar database management system for real-time analytics that uses SQL to process queries. Thanks to columnar storage and compression, ClickHouse achieves some of the best processing performance among its competitors. ClickHouse’s data processing speed reaches up to 30Gb/s and increases linearly when using distributed processing.
Another advantage of ClickHouse is its scalability. It easily scales from a single-server deployment to a large cluster with hundreds of nodes. ClickHouse can use all available CPU cores and disks to execute a single query, and this applies not only to a single server but also to all CPUs and disks in the cluster. That’s one of the reasons why ClickHouse tears opponents apart when it comes to performance.
On top of these benefits, ClickHouse is also a reliable and highly secure storage system. Enterprise-grade security algorithms and mechanisms protect data against corruption, malicious actions, and user mistakes. The DBMS is built for high availability. Thanks to its support for multi-master replication, ClickHouse performs effectively in multi-region configurations.
It’s no wonder that such a powerful database management tool has been recognized by the world’s leading tech giants. For example, Uber moved its logging system to ClickHouse to achieve a 10x performance increase and cut hardware costs in half.
eBay also uses ClickHouse for its real-time OLAP events infrastructure. Adopting ClickHouse allowed eBay to reduce the effort required from its DevOps department, use ten times less hardware, and achieve better results in visualization and analytics with Grafana.
Spotify, Deutsche Bank, Cloudflare, and other well-known brands also use ClickHouse to accelerate their data operations.
Data Warehouse Pricing Model: What Do You Pay For?
Now, it’s time to find out how popular data warehouses charge money for their services and why ClickHouse is more cost-efficient than any alternative on the market. Improvado analyzed pricing models of all popular data warehouse vendors to compare them with ClickHouse. We’ll consider Snowflake and Redshift.
Snowflake Pricing Model
The price for Snowflake’s virtual warehouse depends on three main factors:
- Number of servers you use
- The warehouse’s uptime
- The volume of data you’re storing
To use Snowflake’s services, you have to purchase so-called Snowflake credits. These credits are used to keep your servers up and running and to analyze the data in your storage. The price for compute credits starts at $2 per credit, but it depends on the plan you’re using. For example, enterprise plans that ensure sensitive data safety (HIPAA-, GDPR-, and CCPA-compliant plans) cost $4 per credit. The price also depends on the preferred cloud provider and region.
Here’s a table that displays how many credits your warehouse will consume based on the number of servers.
Besides, Snowflake charges a flat rate of $23 per TB of data per month for storing it in dedicated storage. Keep in mind that you don’t spend Snowflake credits 24/7. The platform charges credits only while it is executing queries. Idle time is free of charge, so during those periods you pay only for data storage.
However, marketing analysts have to think twice before running new queries. Since the platform charges per second of query execution, marketers and analysts are limited in how much they can experiment, especially with large datasets. It’s risky to pour the marketing budget into analytical experiments with unpredictable results. So, even though Snowflake is a great solution in terms of performance, it ties marketers’ hands and puts the marketing budget at risk.
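The pricing rules described above can be turned into a rough monthly estimate. The sketch below is only an illustration: the function name and its parameters are hypothetical, the credits-per-hour figure has to come from Snowflake’s own table for your warehouse size, and the $2 credit price and $23 per TB storage rate are simply the figures quoted in this post.

```python
def snowflake_monthly_cost(credits_per_hour, active_hours,
                           price_per_credit=2.0, stored_tb=0.0,
                           storage_price_per_tb=23.0):
    """Rough monthly Snowflake bill in USD (hypothetical helper).

    credits_per_hour     -- credits the warehouse burns per hour of activity
                            (assumption: look it up for your warehouse size)
    active_hours         -- hours per month the warehouse spends running queries
    price_per_credit     -- $2 on the entry plan, $4 on the compliance-focused
                            plan, as quoted in this post
    stored_tb            -- terabytes kept in storage
    storage_price_per_tb -- flat monthly storage rate quoted in this post
    """
    compute = credits_per_hour * active_hours * price_per_credit
    storage = stored_tb * storage_price_per_tb
    return compute + storage


# Same workload on the two plans: 4 credits/hour, 100 active hours, 5 TB stored.
entry_plan = snowflake_monthly_cost(4, 100, price_per_credit=2.0, stored_tb=5)
enterprise_plan = snowflake_monthly_cost(4, 100, price_per_credit=4.0, stored_tb=5)
```

The point of the exercise is that the compute term grows with every extra hour of querying, which is exactly what makes open-ended experiments hard to budget for.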
Redshift Pricing Model
In line with its competitors, Amazon Redshift also has two pricing models:
- Reserved Instance pricing
- On-demand pricing
We’ll consider on-demand pricing first because the reserved model is the same as the on-demand model but with additional discounts.
With the on-demand model, Amazon charges for processing capacity on an hourly basis. Depending on the type and number of nodes in your cluster, you’ll pay different hourly rates for query execution. The price varies considerably, from $0.33 per hour for two virtual CPUs and 15 GiB of memory to $14.424 per hour for 48 virtual CPUs and 384 GiB of memory.
Amazon bills partial hours in one-second increments. You can also pause processing, and the system won’t charge for compute while the cluster is idle.
The price becomes more affordable if you pay in advance for a reserved plan. Amazon’s pricing documentation explains in detail how the price is calculated and what processing capacity you can get.
Besides, you still need a place to store all of your data. Redshift is ready to help you with that for $24 per TB per month.
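As a rough illustration of the on-demand arithmetic described above, here is a minimal sketch. The function and parameter names are hypothetical, and it assumes the hourly rate applies per node; the $24 per TB storage rate and the per-second billing come from the figures quoted in this post.

```python
def redshift_on_demand_cost(hourly_rate_per_node, nodes, active_seconds,
                            stored_tb=0.0, storage_price_per_tb=24.0):
    """Rough Redshift on-demand bill in USD (hypothetical helper).

    hourly_rate_per_node -- on-demand rate for the chosen node type
                            (assumption: the rate is charged per node)
    nodes                -- number of nodes in the cluster
    active_seconds       -- billed seconds; partial hours are billed
                            in one-second increments
    stored_tb            -- terabytes kept in storage for the month
    """
    compute = hourly_rate_per_node * nodes * active_seconds / 3600
    storage = stored_tb * storage_price_per_tb
    return round(compute + storage, 2)


# Two of the smallest nodes running for 90 minutes, plus 1 TB stored for a month.
estimate = redshift_on_demand_cost(0.33, nodes=2, active_seconds=90 * 60, stored_tb=1)
```

As with Snowflake, the compute part of the bill scales with how long the cluster is kept busy.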
However, pricing isn’t the platform’s biggest problem. The bigger issue is performance. According to recent benchmarks, ClickHouse outperforms Redshift in most scenarios. Even when the same query is distributed across three Redshift servers, ClickHouse still executes it faster in most cases. Depending on the query, ClickHouse is 1.5x to 5x faster than Redshift. That’s the reason why we chose it as our database management solution.
As for Redshift, considering its price and the performance gap with ClickHouse, it made no sense for us to use this platform. It not only limits marketing analysts in terms of resources but also costs them extra time on data processing.
ClickHouse Pricing Model
So, what’s so special about ClickHouse apart from its performance? Its enormous execution speed can be achieved at almost no cost. ClickHouse is open source, so deploying it on your own physical machines costs nothing in license fees. But if you were considering Snowflake or Redshift, an on-premises solution is most likely not what you’re looking for.
"The key benefit of ClickHouse lies in its reasonable pricing terms. Unlike other data warehouses, ClickHouse allowed us to build a predictable pricing model that doesn't charge money for each operation with data. Analysts can focus on pure analysis without thinking about rational usage of credits, tokens, or whatever currency your platform has with limitless access to data and queries. Furthermore, ClickHouse showed the best performance results among competitors, allowing analysts to make more complex queries with lower latency." — Dmitry Nasikanov, Chief Technical Officer at Improvado.
That’s why Improvado utilizes ClickHouse to provide clients with high-performance cloud-based storage for all of their marketing data. We deploy ClickHouse databases on virtual instances and help our clients seamlessly connect their BI tools to the data warehouse. In this way, marketing analysts don’t have to pay for the query execution time or the amount of processed data.
This fact allows marketers and analysts to experiment with accumulated data without the risk of spending too many resources. With an unlimited number of available queries, companies can identify new trends, analyze customer behavior, and keep an eye on all insights that previously were overlooked.
Why Is ClickHouse the Best Choice for Improvado?
As a fully automated marketing ETL platform, Improvado offers managed warehouse services to our clients. Since we are marketers ourselves, we understand how much data matters when it comes to building and adjusting campaigns. Artificial constraints like per-query pricing, combined with low performance, hurt the outcomes of analysis and campaign optimization. That’s why we searched for a solution that processes data quickly and doesn’t put any limitations on the analysis process. ClickHouse turned out to be the best candidate.
Unlike other columnar databases, ClickHouse not only stores data in columns but also processes it in columns. This leads to far more balanced and efficient CPU cache utilization and allows the use of SIMD CPU instructions. Besides, ClickHouse is a very scalable solution. It can utilize all CPU cores to execute a single SQL query.
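To make the difference between the two layouts more concrete, here is a toy sketch in plain Python. It is not how ClickHouse is implemented, and the sample data and function names are made up for illustration. It only shows why an aggregate over one field touches far less data when each column is stored as its own contiguous array.

```python
from array import array

# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"campaign": "brand", "clicks": 120, "cost": 30.0},
    {"campaign": "search", "clicks": 80, "cost": 22.5},
    {"campaign": "social", "clicks": 45, "cost": 9.0},
]

# Column-oriented layout: each field is stored as its own contiguous array.
columns = {
    "campaign": [row["campaign"] for row in rows],
    "clicks": array("q", (row["clicks"] for row in rows)),  # 64-bit integers
    "cost": array("d", (row["cost"] for row in rows)),      # 64-bit floats
}


def total_clicks_row_store(records):
    # Has to visit every record and pick one field out of each.
    return sum(record["clicks"] for record in records)


def total_clicks_column_store(cols):
    # Scans a single contiguous array and never touches the other columns.
    return sum(cols["clicks"])
```

Both functions return the same total, but the column version reads only the values it needs, stored back to back. That is the property that makes cache-friendly, SIMD-style processing possible in a real columnar engine.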
What’s more, with third-party solutions like Redshift you have to set up dashboards for various metrics on your own. If you’re using a marketing ETL solution such as Improvado or Adverity that calculates custom metrics beforehand, you still have to recalculate them in your own warehouse because any data filtering may ruin the granularity of your insights.
Thanks to ClickHouse, Improvado offers seamless integration with any business intelligence tool. Our clients can easily connect their BI tool of choice to the managed data warehouse and monitor performance indicators on their dashboards. ClickHouse is a great solution for working with data in real time because of its input/output speed. Moreover, you can switch visualization tools (for example, between Tableau and Google Data Studio) without spending time on data adjustments.
As is clear by now, ClickHouse is a versatile tool that, combined with an automated data pipeline, opens up a wide range of possibilities for marketing analysts. Outstanding performance, cost-effectiveness, and interoperability with business intelligence tools make ClickHouse a strong alternative to popular solutions. Now, marketers don’t have to worry about spending too many resources on experiments and can fully dedicate themselves to marketing analysis.