AdTech custom development: Simpals Data Architecture Optimization Case
The Admixer Development Team helps advertisers and publishers overcome technical difficulties in developing their in-house solutions. The most common problem for media companies is handling large amounts of data. Building the right solution in this area can save a lot of money and resources and significantly increase operational efficiency. With extensive experience in big data projects, Admixer helps partners with this kind of complex development.
The Big Data Problem
Today, most companies face the problem of storing, processing, and analyzing their data. The amount of data is constantly growing, and handling it becomes more and more difficult. Some companies solve this problem by reducing the amount of data they store, which risks losing important data points and can hurt the quality of analysis. Others increase computing capacity and the number of servers, which inevitably leads to a significant increase in overall costs.
A few years ago, Admixer faced a similar problem: we had to store and manage tens of terabytes of data while keeping access to it fast. Having tried various solutions and approaches, we discovered ClickHouse, a column-oriented database management system (DBMS) for online analytical processing (OLAP). ClickHouse is a high-performance DBMS with an easily scalable, fault-tolerant architecture. Based on this product, we built a highly efficient system for storing, processing, and analyzing data for Admixer.
A few months ago, our partners from Simpals approached us with a similar problem. Simpals is a large media holding that runs the largest eCommerce project in Moldova, 999.md. This marketplace generates a huge volume of statistics every day, and the main problem was the constantly growing amount of data to process.
The data architecture of Simpals’ main project was built on Elasticsearch, a RESTful distributed search and analytics engine. It is a suitable store and a good engine if you work only with strings and need proper text search. For many real-world cases, however, that is not enough, and a more optimized storage system is needed.
At some point, Simpals’ existing architecture ceased to be efficient in terms of both storage space and query execution speed. Given that query complexity was constantly growing, they needed a more powerful solution.
Admixer’s Data Engineers audited the current architecture and catalogued the main problems the company faced when working with its data, along with the typical chains of interaction with that data. ClickHouse was the right choice for this case because the data structure contained many non-text values, which meant storage could be greatly optimized. Furthermore, Simpals needed data to be returned very quickly.
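To illustrate why non-text values leave so much room for optimization, here is a minimal Go sketch that compares one record kept as a JSON document of strings with the same record kept in fixed-width typed fields. The field names and sample values are hypothetical and are not the Simpals schema. Real savings also depend on compression and indexing, which this sketch ignores.

```go
package main

import (
	"encoding/binary"
	"encoding/json"
	"fmt"
)

// textEvent keeps every value as a string, as a text-oriented store might.
// All field names here are hypothetical.
type textEvent struct {
	Timestamp  string `json:"timestamp"`
	AdID       string `json:"ad_id"`
	CategoryID string `json:"category_id"`
	EventType  string `json:"event_type"`
}

// typedEvent keeps the same values in fixed-width fields, similar in spirit
// to ClickHouse column types such as DateTime, UInt64, UInt32, and Enum8.
type typedEvent struct {
	Timestamp  uint32 // seconds since the Unix epoch
	AdID       uint64
	CategoryID uint32
	EventType  uint8 // small numeric code instead of a repeated string
}

// textSize returns the size in bytes of the JSON encoding of e.
func textSize(e textEvent) int {
	b, err := json.Marshal(e)
	if err != nil {
		panic(err) // cannot happen for a struct of plain strings
	}
	return len(b)
}

// typedSize returns the size in bytes of the packed binary encoding of e.
func typedSize(e typedEvent) int {
	return binary.Size(e) // 4 + 8 + 4 + 1 bytes, no padding
}

func main() {
	text := textEvent{
		Timestamp:  "2020-05-01 12:00:00",
		AdID:       "68123456",
		CategoryID: "659",
		EventType:  "view",
	}
	typed := typedEvent{Timestamp: 1588334400, AdID: 68123456, CategoryID: 659, EventType: 1}

	fmt.Println("as JSON text, bytes:", textSize(text))
	fmt.Println("as typed fields, bytes:", typedSize(typed))
}
```

The typed record occupies 17 bytes regardless of the values it holds, while the text form repeats every field name and spells out every number as characters.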
We moved the data structure from Elasticsearch to ClickHouse, selected the right data types, and created an optimal data processing chain. In general terms, the chain works as follows.
This structure can be described as a waterfall. Data first lands in the fastest and most efficient buffers, which live in RAM. If necessary, it is then split across several large raw tables (for example, when the data structures are completely or substantially different). Next comes a further stage of splitting into smaller tables, each tuned for a specific type of query to maximize retrieval speed. This layering is done with materialized views, which allow data to be processed in chains.
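The waterfall described above could be expressed with ClickHouse's Buffer engine, a MergeTree raw table, and a materialized view feeding a smaller aggregate table. The Go sketch below only holds the DDL statements in the order they would need to run. Every database, table, and column name, as well as the buffer thresholds, is an assumption made for illustration, not the schema built for Simpals.

```go
package main

import "fmt"

// WaterfallDDL returns ClickHouse statements in the order they must run:
// each table has to exist before the object that points at it.
// All names and thresholds are hypothetical.
func WaterfallDDL() []string {
	return []string{
		// 1. Large raw table on disk that keeps every event.
		`CREATE TABLE IF NOT EXISTS stats.events_raw (
    event_time  DateTime,
    ad_id       UInt64,
    category_id UInt32,
    event_type  LowCardinality(String)
) ENGINE = MergeTree
PARTITION BY toYYYYMM(event_time)
ORDER BY (event_type, event_time)`,

		// 2. In-memory buffer in front of the raw table. Writers insert here;
		//    ClickHouse flushes to events_raw when time, row, or byte limits are hit.
		`CREATE TABLE IF NOT EXISTS stats.events_buffer AS stats.events_raw
ENGINE = Buffer(stats, events_raw, 16, 10, 100, 10000, 1000000, 10000000, 100000000)`,

		// 3. Small table shaped for one kind of query: daily counts per category.
		`CREATE TABLE IF NOT EXISTS stats.events_daily (
    day         Date,
    category_id UInt32,
    event_type  LowCardinality(String),
    events      UInt64
) ENGINE = SummingMergeTree
ORDER BY (day, category_id, event_type)`,

		// 4. Materialized view that fills the small table on every insert
		//    into the raw table, forming the next step of the chain.
		`CREATE MATERIALIZED VIEW IF NOT EXISTS stats.events_daily_mv
TO stats.events_daily AS
SELECT toDate(event_time) AS day, category_id, event_type, count() AS events
FROM stats.events_raw
GROUP BY day, category_id, event_type`,
	}
}

func main() {
	// Print the statements as a script that could be reviewed before use.
	for i, stmt := range WaterfallDDL() {
		fmt.Printf("-- step %d\n%s;\n\n", i+1, stmt)
	}
}
```

Because SummingMergeTree collapses rows in the background, a query against the daily table in this sketch would still use sum(events) with GROUP BY to get exact totals.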
Our next step was to build a service that receives data and inserts it into ClickHouse, and that also acts as a proxy for reading data directly from the servers. The service is written in Go, a high-performance and widely used language. All data exchange between ClickHouse and the service takes place directly over TCP.
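The insert side of such a service can be sketched as a small batcher. This is a minimal illustration under stated assumptions, not the actual Admixer service: the Event type, the Sink interface, and the batch size are all hypothetical, and the in-memory sink stands in for a real ClickHouse client so that the example runs on its own.

```go
package main

import (
	"fmt"
	"sync"
)

// Event is a hypothetical statistics record; the real schema is not public.
type Event struct {
	AdID      uint64
	EventType string
}

// Sink abstracts the database connection. A real service would implement it
// with a ClickHouse client; here it is an assumption made for illustration.
type Sink interface {
	InsertBatch(events []Event) error
}

// Batcher collects events in memory and hands them to the Sink in batches,
// since ClickHouse works best with few large inserts rather than many small ones.
type Batcher struct {
	mu      sync.Mutex
	pending []Event
	size    int
	sink    Sink
}

// NewBatcher creates a Batcher that flushes after size events.
func NewBatcher(sink Sink, size int) *Batcher {
	return &Batcher{sink: sink, size: size, pending: make([]Event, 0, size)}
}

// Add queues one event and flushes when the batch is full.
func (b *Batcher) Add(e Event) error {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.pending = append(b.pending, e)
	if len(b.pending) >= b.size {
		return b.flushLocked()
	}
	return nil
}

// Flush sends whatever is pending, for example on a timer or at shutdown.
func (b *Batcher) Flush() error {
	b.mu.Lock()
	defer b.mu.Unlock()
	return b.flushLocked()
}

// flushLocked must be called with b.mu held. On error the batch is dropped;
// a production service would retry or persist it instead.
func (b *Batcher) flushLocked() error {
	if len(b.pending) == 0 {
		return nil
	}
	batch := b.pending
	b.pending = make([]Event, 0, b.size)
	return b.sink.InsertBatch(batch)
}

// memorySink records batch sizes so the example runs without a database.
type memorySink struct{ batches []int }

func (m *memorySink) InsertBatch(events []Event) error {
	m.batches = append(m.batches, len(events))
	return nil
}

func main() {
	sink := &memorySink{}
	b := NewBatcher(sink, 3)
	for i := 0; i < 7; i++ {
		if err := b.Add(Event{AdID: uint64(i), EventType: "view"}); err != nil {
			fmt.Println("insert failed:", err)
		}
	}
	if err := b.Flush(); err != nil {
		fmt.Println("final flush failed:", err)
	}
	fmt.Println(sink.batches) // prints [3 3 1]
}
```

Keeping the database behind an interface also makes it straightforward to test the batching logic without a running ClickHouse server.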
Having built the new structure, we migrated data from Elasticsearch to ClickHouse to test and analyze the capabilities of the new system. As a result of the transition, data retrieval became several times faster, and in some cases dozens of times faster, even for complex queries. Simple queries complete within a couple of milliseconds.
The transition also reduced the storage footprint more than tenfold (note that this was measured on a fairly small dataset of several terabytes). The system scales easily to larger volumes, lets Simpals save on infrastructure costs, and speeds up data processing significantly.
All this made it possible to optimize the project's resources, speed up both writing and reading data, and reduce the number of routine tasks, giving the business the opportunity to focus on more important challenges.
Want to know more about Admixer Custom Development Solutions?