System design study: How Flipkart Scaled TiDB to 1M QPS
This blog post dives deep into why Flipkart switched to TiDB and how they scaled their database to 1M QPS.
Problem
As per Flipkart’s blog, most microservices relied on the MySQL cluster for data storage, which gave them ACID properties and low latency for high-throughput applications. As user traffic grew, Flipkart tried vertically scaling their existing resources, but they needed a long-term solution that is horizontally scalable.
The blog doesn’t mention clear reasons for moving away from MySQL, but ensuring ACID properties and low latency across a large number of nodes is a tough challenge in MySQL. Read this blog for why it is hard to horizontally scale SQL databases.
Solution
While considering alternatives to MySQL, they turned to TiDB, which was already in use for some use cases at Flipkart: around 60k QPS for reads and 15k QPS for writes.
As per the official website of PingCAP (the company that develops TiDB), TiDB is a great fit for SQL storage for a lot of reasons. One of them is support for automatic sharding, which provides built-in horizontal scalability across nodes without any manual sharding.
There’s a blog (linked in the Resources below) that tests TiDB performance against MySQL on a single server. The observation is that TiDB is quite fast for OLAP reads (complex SELECT queries that do not benefit from indexes) because it supports parallel query execution and utilizes multiple CPU cores, whereas MySQL is limited to a single core per SELECT query.
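To make “OLAP reads” concrete, here is a minimal sketch of the kind of full-scan aggregation being compared, timed against both engines over the MySQL wire protocol (which TiDB also speaks). The hosts, tables, and credentials are hypothetical, and this is not the benchmark code from either blog.

```python
# A hedged illustration of the kind of "OLAP read" being compared: a full-scan
# aggregation that no single-column index can satisfy. Hosts, tables, and
# credentials are hypothetical; TiDB speaks the MySQL wire protocol, so the
# same query and driver work against both engines.
import time
import mysql.connector

OLAP_QUERY = """
    SELECT o.status, DATE(o.created_at) AS day,
           COUNT(*) AS orders, SUM(oi.quantity * oi.unit_price) AS revenue
    FROM orders o
    JOIN order_items oi ON oi.order_id = o.id
    GROUP BY o.status, DATE(o.created_at)
"""

def time_query(host: str, port: int) -> float:
    conn = mysql.connector.connect(host=host, port=port, user="bench",
                                    password="secret", database="benchdb")
    cur = conn.cursor()
    start = time.monotonic()
    cur.execute(OLAP_QUERY)
    cur.fetchall()  # force the full result set to be materialized
    elapsed = time.monotonic() - start
    cur.close()
    conn.close()
    return elapsed

# TiDB can spread the scan and aggregation across multiple cores (and TiKV
# nodes), while MySQL runs a single SELECT on one core.
print("mysql:", time_query("mysql.example.internal", 3306))
print("tidb: ", time_query("tidb.example.internal", 4000))
```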
Setup
There’s a lot of detail about the deployed resources, which you can find in the main blog. In a nutshell, for reads, a mix of 12 SELECT statements used in production was chosen. For writes, a single database transaction that executes 10 INSERT statements across 5 different tables was chosen. I guess this production-like mix was used to measure the true throughput of the database.
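To make the write workload more concrete, here is a minimal sketch of what such a transaction could look like, assuming a MySQL-compatible Python driver (TiDB listens on the MySQL protocol, port 4000 by default). The table names and schema are made up for illustration; Flipkart’s actual load was generated with Sysbench, as described in the next section.

```python
# Hypothetical sketch of the write workload: one transaction that performs
# 10 INSERT statements spread across 5 tables. Table/column names are made up.
import mysql.connector

conn = mysql.connector.connect(
    host="tidb.example.internal", port=4000,  # 4000 is TiDB's default SQL port
    user="bench", password="secret", database="benchdb",
)

def write_transaction(order_id: int) -> None:
    cur = conn.cursor()
    try:
        conn.start_transaction()
        # 2 inserts into each of 5 tables = 10 INSERT statements per transaction
        for table in ("orders", "order_items", "payments", "shipments", "audit_log"):
            for seq in (1, 2):
                cur.execute(
                    f"INSERT INTO {table} (order_id, seq, payload) VALUES (%s, %s, %s)",
                    (order_id, seq, "sample-payload"),
                )
        conn.commit()  # all 10 inserts commit (or roll back) atomically
    except Exception:
        conn.rollback()
        raise
    finally:
        cur.close()

write_transaction(42)
```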
Scaling the database
After the initial setup, the teams made a series of improvements. The P99 read latency stood at 7.4 ms at 1.04 million QPS. For the write workload, the P99 latency at 123K QPS stood at 13.1 ms.
Beyond the above-mentioned throughput, CPU contention (multiple processes fighting for CPU resources) caused an increase in latency.
But the journey to 1M read QPS also presented some interesting challenges, which the teams at Flipkart solved by doing the following:
Using custom load generators and prepared statements: Flipkart used a tool called Sysbench with Lua scripts to generate the custom load, and these scripts used prepared statements. Prepared statements are pre-compiled queries that reduce the overhead of parsing the query on every execution: the database knows exactly what the query is asking for and can reuse the execution plan, with only the parameter values changing per call. This allowed them to generate more load on the database for testing (see the sketch after this list for the general idea).
Replacing the TCP ELB with a custom HAProxy: They found that the standard TCP Elastic Load Balancer (ELB) was a bottleneck, so they replaced it with a custom HAProxy configuration. HAProxy is a load balancer that can distribute traffic across multiple servers; Flipkart’s custom configuration likely improved performance by reducing CPU utilization on the load balancer and lowering request latency.
Distributing TiKV pods across K8s nodes: TiKV is the distributed key-value storage component of TiDB. The blog mentions that packing the TiKV pods onto the same K8s nodes created a noisy neighbor problem, where one process uses a disproportionate amount of resources and slows down the other processes on the same node. By distributing the TiKV pods across multiple K8s nodes, Flipkart was able to isolate the TiKV processes and improve the overall performance of the system.
Dedicating more cores to packet handling on the K8s nodes: The blog also mentions that Flipkart increased the number of CPU cores dedicated to network packet handling on the K8s nodes, which likely improved throughput by letting the nodes handle more network traffic without exceeding latency targets.
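To illustrate the prepared-statement point from the first item above, here is a minimal sketch in Python; Flipkart’s actual load generator was Sysbench driving Lua scripts, so treat the driver, hosts, and table names here as assumptions. The idea is that the query is prepared once and each iteration only sends new parameter values, skipping the per-execution parsing overhead.

```python
# Minimal prepared-statement sketch (not Flipkart's Sysbench Lua scripts).
# The statement is parsed/prepared once; each execute() only ships parameters.
import mysql.connector

conn = mysql.connector.connect(
    host="tidb.example.internal", port=4000,
    user="bench", password="secret", database="benchdb",
)

# prepared=True asks the driver for a server-side prepared-statement cursor
cur = conn.cursor(prepared=True)
query = "SELECT id, status FROM orders WHERE customer_id = %s AND status = %s"

for customer_id in range(1, 10_001):
    # Only the parameter values travel on each execution.
    cur.execute(query, (customer_id, "SHIPPED"))
    cur.fetchall()

cur.close()
conn.close()
```

Sysbench does essentially the same thing, but with many concurrent connections and a configurable request rate, which is how the load was pushed high enough for this test.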
Conclusion
Here are a few concluding remarks:
TiDB can be horizontally scaled to achieve high throughput and low latency, delivering 1 million read QPS with a latency of around 5 milliseconds (P95). This compares favorably with MySQL, where reads at that scale typically come from replicas that are only eventually consistent, whereas this experiment was conducted with strongly consistent reads.
TiDB can also deliver high write throughput, achieving 100,000 QPS with a latency of 6.1 milliseconds (P95).
These benchmarks were achieved using real data and queries, and reflect the potential of TiDB for real-world applications.
The top learning for me from this article was the TiDB database and how it can act as an alternative to MySQL for storage. Kudos to the Flipkart team and the author Sachin Japate for sharing their learnings.
That’s it, folks, for this edition of the newsletter. Please consider liking and sharing it with your friends, as it motivates me to bring you good content for free. If you think I am doing a decent job, share this article, along with a short summary, with your network. Connect with me on LinkedIn or Twitter for more technical posts in the future!
Resources
Scaling TiDB to 1 Million QPS by Flipkart
A quick look into TiDB performance on a Single Server by Alexander Rubin