Tarantool: an analyst's view

Hi all! I'm Andrey Kapustin. I work as a system analyst at Mail.ru Group. Our products form a unified ecosystem in which many independent services generate data: taxi and food delivery, email, social networks, and so on. The faster and more precisely we can predict a client's needs, the sooner and more accurately we can offer our products.

Many system analysts and engineers are keen to know: 

  1. How to design the architecture of a trigger platform for real-time marketing?
  2. How to arrange a data structure that would be in line with the requirements of a marketing strategy for interacting with clients?
  3. How to ensure stable operation of the system under heavy workloads?

Such systems are built on high-load processing and Big Data analysis technologies. We have accumulated considerable experience in these areas, and our expertise is in high demand on the market. I'm going to show how we help our customers switch from offline to online interaction with clients using Real-Time Marketing solutions based on Tarantool.

One day, a major telecom operator asked us for help.

Their objective was as follows: 
We have over 100 million subscribers. 
We need to keep track of metrics such as current balance, traffic volume, connected services, and movements.

This is a huge array of data. It can only be processed in the background. We run data handlers every night when the workload is minimal, generate advertising campaigns by morning and send out offers. It turns out that our data is one day behind, but we want to interact with clients in real-time! 

We realized that marketing must work 24/7.
Why? Because the faster a telecom operator handles data, the more money it can earn. Take an impulse purchase, for example: a user passes a cafe and receives a discount notification. That's it – we "merely" need to offer the right product at the right time and make it easy to respond to the offer.


Real-Time Marketing


This is what we need to reach these business goals:

  • Identify a client's needs – through the client's profile.
  • Identify the right time – by events in a person's life. 
  • Encourage responses – by choosing the optimum channel for communication.

This is called Real-Time Marketing. For a telecom operator, it means sending relevant personalized messages to subscribers at the right time, with the possibility of responding to an offer IMMEDIATELY. Offers can be addressed to a target group or to a specific user, and the request should be processed in real-time.

The technical objectives are:

  • Keep the data of over 100 million subscribers up to date;
  • Handle a stream of events in real-time at up to 30,000 requests per second (RPS);
  • Generate and route offers to subscribers while meeting non-functional requirements (response time, availability, etc.);
  • Seamlessly connect new sources of miscellaneous subscriber data.

Now let's make clear what "real-time" means here. There are two key parameters:

Processing time – how much time we have to handle a request. For situational marketing, this window is around 30 seconds.

Delay time – how much time we actually spend handling a request. Within 30 seconds it's possible both to process a request and to communicate with the subscriber: to send an offer through a communication channel that is convenient for them, as a push notification or an SMS.

Taking longer makes no sense – the opportunity is missed, the client is gone. The worst part is that in this situation we don't know why: did we make the wrong offer, or did we miss the right moment?

The answer to this question is crucial for product development. 

  1. Promoting products to market: verifying hypotheses and boosting revenues.
  2. Winning potential clients: investing in advertising and capturing the market.
  3. Connecting additional services or features: extending the product range.

It's easy to make a mistake at each stage, and the cost of a mistake is high. We have to be quick and accurate, and we have to have complete and up-to-date client information – here, information is money!

Stratification


Under such conditions, segmenting the client base is a very effective approach. We decided to use stratification – a multi-factor classification of subscribers.

Imagine a cube divided into cells in which a ball can move. The cells are strata, and the ball is a subscriber.

Place attributes on the axes: age, average check, frequency of service usage, occupation, monthly income, and place of residence.

The strata then represent marketing groups, and we can distribute subscribers among them in real-time. Add business logic that generates offers when a subscriber enters or leaves a stratum – and there you have it: real-time marketing.

(Figure: an example of a 3D stratification model.)
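To make this concrete, here is a minimal Lua sketch (Lua is Tarantool's embedded language) of the stratification idea. The attribute names, axes, and bucket boundaries are illustrative assumptions, not the operator's real model:

    -- Map an attribute value to a discrete coordinate on its axis.
    local function bucket(value, boundaries)
        for i, limit in ipairs(boundaries) do
            if value < limit then return i end
        end
        return #boundaries + 1
    end

    -- A stratum is simply the tuple of coordinates along all axes.
    local function stratum(profile)
        return {
            age    = bucket(profile.age, {25, 40, 60}),
            check  = bucket(profile.avg_check, {300, 1000}),
            income = bucket(profile.monthly_income, {30000, 100000}),
        }
    end

    -- Fire offer logic the moment a subscriber crosses a stratum boundary.
    local function on_profile_update(old_profile, new_profile)
        local old_s, new_s = stratum(old_profile), stratum(new_profile)
        for axis, coord in pairs(new_s) do
            if old_s[axis] ~= coord then
                print(('offer trigger: %s axis, bucket %d'):format(axis, coord))
            end
        end
    end

    -- Example: a rise in income moves the subscriber to another stratum.
    on_profile_update({age = 30, avg_check = 200, monthly_income = 25000},
                      {age = 30, avg_check = 200, monthly_income = 35000})

Detecting entry to or exit from a stratum thus reduces to comparing two small tables, which is cheap enough to run on every profile update.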


Customer digital profile


The function of marketing is to constantly verify new hypotheses. For each client we can determine how much we spent to win them, how much we earned, and how. The more we know about our subscribers, the more we earn: targeting precision increases with every new parameter added to a subscriber's profile. It means we know how much information costs and how much we lose if we don't keep it up to date.

So we realized that we needed constant updates, and new challenges arose: it's never enough. Every new project brings new requirements, which conflict with the terms of reference, with the architecture, with each other, and... with common sense. Maintaining data integrity gets harder day by day. New information sources emerge with new attributes that we don't know where to store or how to process. This is an ongoing process, because:

  1. The client base keeps growing; and 
  2. The range of services increases.

It should be noted, however, that the more normalized the data is, the more constraints, references, and checks it carries. When you add a couple of fields to a table 'on the fly', they often don't fit the current data model! How do you tell the customer that adding a new field means rewriting half of the project's code?! So we merge or reject 'redundant' processing at input, and in the end we are unable to build relevant offers.

This effect is known as 'garbage in – garbage out'.

As a result, data occupies more space and becomes difficult to process. Transaction processing speed decreases as information volume grows. 

Conclusion: normalization is not suitable for real-time marketing with 100+ million subscribers. 

How should we arrange the data structure?

To achieve real-time performance, we should receive all the information in a single request. This is possible if we store all the attributes in one place, which means the data structure should be flexible and expandable.

We decided to create a unified customer profile stored in a key-value storage, which lets us avoid freezing the data structure. Each attribute is a key-value pair, and the value can be anything.

We had a combination of:  

  • Static attributes which are rarely updated (name, passport data, address, etc.), plus a mandatory block with the client ID; and
  • A dynamic 'tail' of arbitrary length: frequently updated data whose shape depends on the source, stored as several independent blocks, one per source.

This approach is called denormalization (a Tarantool sketch follows the list below). Why is it convenient?

  1. The 'tail' doesn't need to be validated.
  2. We save 'raw' data as is, without processing.
  3. We save all the incoming information, so we lose nothing.
  4. To load data from a source, it's enough to know the client ID to which the block is linked.
  5. The data is stored compactly (taking 2–3 times less space), which is critical for large data volumes.
  6. Data access is simpler: we can update and request only the necessary block of information.
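Here is a sketch of such a denormalized profile in Tarantool. The space name, field layout, and sample values are assumptions for the example, not the project's actual schema:

    -- A denormalized client profile: a mandatory ID, a static block,
    -- and a free-form 'tail' of per-source blocks.
    box.cfg{}  -- start the instance with default settings

    local profiles = box.schema.space.create('profiles', {if_not_exists = true})
    profiles:format({
        {name = 'id',     type = 'unsigned'},  -- mandatory client ID
        {name = 'static', type = 'map'},       -- rarely updated attributes
        {name = 'tail',   type = 'map'},       -- raw per-source blocks, any shape
    })
    profiles:create_index('primary',
        {parts = {{field = 1, type = 'unsigned'}}, if_not_exists = true})

    -- The 'tail' is saved as-is, one block per source, without validation.
    profiles:replace{42,
        {name = 'Ivan', city = 'Moscow'},
        {billing = {balance = 250.0}, geo = {lat = 55.75, lon = 37.61}}}

    -- A single request by ID returns the whole profile...
    local p = profiles:get(42)
    print(p.static.name, p.tail.billing.balance)

    -- ...and updating one block (field 3, the tail) doesn't touch the others.
    profiles:update(42, {{'=', 3, {billing = {balance = 120.0}}}})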

Big Data


Now we need to choose a tool for implementation. This is usually done by the architect based on requirements gathered by analysts. It is essential to figure out the expected data volume and workload level, because they determine the data storage and processing methods.

Our service will process a lot of data. How much exactly? Let's find out. 

Data can be considered big if the relationships within it can't be seen with the naked eye.

We handle over 100 million different client profiles. They contain unstructured information, and they are frequently updated and used. This is true Big Data. 

How should we accelerate data processing?

We need to cache current client profiles to decrease the workload on production systems. You can't achieve real-time processing without keeping hot data in memory.

High load


The term 'high load' describes a situation where equipment is no longer able to sustain the workload.

We process different types of events which occur continuously at rates of 10,000 to 30,000 RPS. At the same time, complex business logic is involved, and response speed is critical. We are designing a high-load service that scales dynamically with the workload.

Tarantool as an accelerator


At Mail.ru Group we have an out-of-the-box solution for building high-load applications: Tarantool, an in-memory DBMS with a built-in application server and support for scaling.

Tarantool is originally an open-source solution which we develop extensively together with the community. We released the most popular functionality as the Enterprise version.

Tarantool is a full-featured analyst's tool which enables setting up data handling rules with minimum effort:

  1. Design the data model.
  2. Set up conversion rules: predicate computation, decision trees (sketched after this list).
  3. Select the data which needs caching.
  4. Monitor the workload, enable balancing.
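For instance, a conversion rule from step 2 can be sketched as an ordered list of predicates over a profile; the campaign names and thresholds below are hypothetical:

    -- Conversion rules: each rule pairs a predicate with an offer.
    local rules = {
        {
            predicate = function(p) return p.balance < 50 and p.traffic_left == 0 end,
            offer     = 'top_up_discount',
        },
        {
            predicate = function(p) return p.roaming and not p.roaming_pack end,
            offer     = 'roaming_pack_promo',
        },
    }

    -- Walk the rules in order; the first matching rule wins.
    function decide(profile)
        for _, rule in ipairs(rules) do
            if rule.predicate(profile) then
                return rule.offer
            end
        end
        return nil  -- no relevant offer for this profile right now
    end

    print(decide({balance = 20, traffic_left = 0}))  --> top_up_discount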

When dealing with large data volumes, we advise using it:

  1. As a Data Mart for caching data in memory for quick access.
  2. As an application server which lets you implement business logic next to the data instead of transferring data over the network.

Business logic is stored near the data, which is vital for high-load services. In our project, we used Tarantool as a 'smart' Data Mart with built-in business logic: the incoming stream of events and information is processed 'on the fly'.
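A minimal sketch of what 'business logic near the data' means, reusing the hypothetical profiles space and decide() rules from the sketches above: a stored Lua procedure runs inside Tarantool, so only the small result crosses the network:

    -- Runs inside Tarantool, next to the data.
    function get_offer(client_id)          -- global, so it is callable remotely
        local p = box.space.profiles:get(client_id)
        if p == nil then return nil end
        return decide(p.tail)              -- only the offer name leaves the server
    end

    -- On the client side, call the procedure and receive just the result:
    -- local conn = require('net.box').connect('localhost:3301')
    -- print(conn:call('get_offer', {42}))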

Why Tarantool is effective for RTM:

The purpose of RTM is to come up with a personalized offer quickly, which requires processing a large amount of data in real-time. We did this in several steps: we ran tests, determined where performance declined, and figured out how to decrease the workload on production systems.

  1. First, we moved the subscriber profile and strata to Tarantool.
  2. Then we added the product catalog and connected services.
  3. Further, we added marketing campaign management, promo codes, and communications history.
  4. We gradually moved critical data and business logic to the cache. 
  5. Finally, we built a fail-safe and scalable solution.

There are two obvious risks in our project: 

  1. Every new connected service requests information from the client profile, which can considerably increase the database read workload. Here replication helps: create the required number of Tarantool instances with a copy of the database and balance the read workload among them (see the sketch after this list).
  2. We have a huge client base which is constantly growing. Meanwhile, several services may run on a user's smartphone in parallel, each updating the client's profile in real-time, so we always have a high database write workload. Replication will not help here, because changes in a user profile need to be replicated to all servers. What's required is sharding: distributing the 100 million records of the client profile table among several shards. This parallelizes request processing and decreases the write workload. The simplest example, sketched below, is to divide the client profile table by ranges of ID values. Tarantool offers horizontal scaling tools for this, which are discussed in detail in the article «Tarantool Cartridge: Sharding Lua Backend in Three Lines».
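Both mitigations can be sketched as follows; the hosts, credentials, and shard layout are placeholders, and a production setup would use vshard / Tarantool Cartridge rather than hand-rolled routing:

    -- Read scaling: a replica joins the master and serves read requests.
    box.cfg{
        listen      = 3302,
        read_only   = true,   -- this instance only balances reads
        replication = {'replicator:password@master-host:3301'},
    }

    -- Write scaling: the simplest routing by ID ranges mentioned above.
    local SHARD_SIZE = 25 * 1000 * 1000   -- 25M profiles per shard
    local shards = {'shard1:3301', 'shard2:3301',
                    'shard3:3301', 'shard4:3301'}

    local function shard_for(client_id)
        local n = math.floor(client_id / SHARD_SIZE) + 1
        return shards[math.min(n, #shards)]
    end

    print(shard_for(42))                 --> shard1:3301
    print(shard_for(99 * 1000 * 1000))   --> shard4:3301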

Conclusion


Tarantool is not a substitute for Oracle or another analytical storage, but it is effective for handling large amounts of data in real-time. It is an excellent tool for a wide range of applications: Data Marts, queues, authorization systems, and so on. We successfully fulfilled the customer's task and stayed within the agreed time frame and budget. I recommend trying Tarantool for building high-load services.

At this point, the question of which technologies to use becomes broader in scope and importance.

For business, going online equals survival.

It is of vital importance to build relationships with external parties, to promote our services and to make them more user-friendly.

Cloud-based real-time data processing technologies make these goals achievable.

We need to share experiences to develop them. Together, we can make life better!