System Design Study: Flipkart's Autosuggestion & Discover More Feature
This article discusses how Flipkart designed the Autosuggestion and Discover More features for its customers.
What is Autosuggestion?
When you type in Flipkart’s search bar, suggestions appear after a few letters that predict what you are trying to search for. This feature is called autocomplete or autosuggestion, and it is common across the industry in products such as Google Maps, Netflix, and YouTube.
Autosuggestion improves the user experience because users can find the product they want on Flipkart without typing the full query. It also helps when users misspell a query, which would otherwise lead them to wrong results: the feature proposes similar, relevant searches and so compensates for the typo.
In e-commerce, helping users discover relevant products among thousands of product types is very important, and autosuggestion is the bridge that connects the user to the right product.
When a user types in Flipkart’s search bar, they see autosuggestions based on their past searches. Below the autosuggestions, there is a “Discover More” section that presents search queries for the most popular categories, personalized for the user. Let’s first discuss the Autosuggestion feature and then talk briefly about the “Discover More” feature.
Current Design
The following image represents the current design for supporting the autosuggestion feature. Let’s discuss each component in detail:
Shopping Journey store: This component captures all user activity on the Flipkart website. It contains events such as what the user searched for, which products they viewed, and which products they purchased. This data is prepared through ETL (Extract-Transform-Load) pipelines and becomes the input to CGF.
CGF (Content Generation Framework): The framework establishes the relationship between user search queries and multiple possible destination queries, and then publishes them into a distributed key-value store.
AutoSuggest Service: When users type in the Flipkart search bar, the client (web or app) sends a request to the autosuggest service, which proposes relevant destination queries by querying the distributed key-value store.
What is Content Generation Framework (CGF)?
The content generation framework is responsible for matching user-typed search queries with the destination queries that eventually help in finding the product. The CGF computes a relatedness score that measures the similarity between the user-typed query and a destination query. The higher the relatedness score, the more similar the two queries are.
There are three main components of CGF:
CGF Component 1: Finding related queries
This CGF component groups similar user-typed search queries that lead to the same products on Flipkart. This is implemented using two approaches:
1) Immediate query-to-query reformulations: given that a user has searched for a “source query”, this approach estimates the probability that the user will reformulate the search to a “destination query”. It computes this probability for every <source query, destination query> pair within a single category.
2) Co-clicked products: co-clicked products are products that users reach through different search queries. The extent of overlap between the product clicks for two queries represents how related the queries are. For example, user 1 searched “Harry Potter” and user 2 searched “JK Rowling books”, but both users reached the same products because J. K. Rowling is the author of the Harry Potter books.
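To make the two approaches concrete, here is a minimal Python sketch. It is not Flipkart's implementation: the function names, the toy session data, and the choice of Jaccard overlap for co-clicks are all assumptions made for illustration. The reformulation probability is also estimated only over sessions where the user actually changed the query, which is a simplification.

```python
from collections import Counter

def reformulation_probabilities(sessions):
    """Estimate P(destination | source) from consecutive queries in a session.

    Assumption: each session is an ordered list of queries from one category.
    """
    pair_counts = Counter()
    source_counts = Counter()
    for queries in sessions:
        for source, destination in zip(queries, queries[1:]):
            if source != destination:
                pair_counts[(source, destination)] += 1
                source_counts[source] += 1
    return {
        pair: count / source_counts[pair[0]]
        for pair, count in pair_counts.items()
    }

def co_click_relatedness(clicks_by_query, query_a, query_b):
    """Jaccard overlap of clicked product sets (an assumed overlap measure)."""
    a = clicks_by_query.get(query_a, set())
    b = clicks_by_query.get(query_b, set())
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

# Hypothetical toy data
sessions = [
    ["harry potter", "harry potter books"],
    ["harry potter", "harry potter books"],
    ["harry potter", "jk rowling books"],
    ["jk rowling books"],
]
clicks_by_query = {
    "harry potter": {"p1", "p2", "p3"},
    "jk rowling books": {"p2", "p3", "p4"},
}

probs = reformulation_probabilities(sessions)
overlap = co_click_relatedness(clicks_by_query, "harry potter", "jk rowling books")
print(probs)
print(overlap)
```

In this toy data, two of the three reformulations from "harry potter" go to "harry potter books", and the two queries share two of four clicked products, so the overlap is 0.5.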
CGF Component 2: Blending of related queries
This CGF component blends the related queries obtained from both approaches: immediate query-to-query reformulations and co-clicked products. As per Flipkart’s blog in 2021, the current blending function weighs both sources equally, but it is expected to move towards a weighted function computed using machine learning.
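An equal-weight blend can be sketched as below. The blog only states that both sources are weighed equally, so the function name, the treatment of a missing score as 0, and the example scores are assumptions.

```python
def blend_scores(reformulation_scores, co_click_scores, weight=0.5):
    """Blend two {destination_query: score} maps into one relatedness score.

    weight=0.5 weighs both sources equally; a learned weight could replace it.
    Assumption: a query missing from one source contributes 0 from that source.
    """
    blended = {}
    for query in set(reformulation_scores) | set(co_click_scores):
        r = reformulation_scores.get(query, 0.0)
        c = co_click_scores.get(query, 0.0)
        blended[query] = weight * r + (1.0 - weight) * c
    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(blended.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical scores for the source query "harry potter"
reformulation_scores = {"harry potter books": 0.6, "jk rowling books": 0.2}
co_click_scores = {"jk rowling books": 0.5, "fantasy novels": 0.1}

ranked = blend_scores(reformulation_scores, co_click_scores)
for query, score in ranked:
    print(query, round(score, 3))
```

Here "jk rowling books" ranks first because it appears in both sources, even though neither source ranks it highest by a wide margin on its own.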
CGF Component 3: Ensuring Candidate Quality
The third and final CGF component checks the quality of the candidate queries just before they are presented to the Flipkart customer. This consists of various steps. For example:
Flipkart uses its Search’s Semantic Spell module to correct misspelled queries typed by users.
Flipkart also uses deduplication to eliminate duplicate or near-duplicate query suggestions. For example, “shirts for boys” is the same as “shirts for boy”. This is achieved using stemming, which is the process of reducing inflected (or sometimes derived) words to their word stem, base, or root form.
Flipkart uses historic performance numbers (like click-through rate, popularity, etc.) on all the destination queries and eliminates the ones that do not meet the threshold. This ensures that every destination query shown to the customer is one that users have actually searched for or clicked on, so no odd suggestions slip through.
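The deduplication and threshold steps could look roughly like the sketch below. The stemmer here is a toy that only strips a trailing plural "s"; it stands in for a real stemming library and is not what Flipkart uses. The CTR values, the threshold, and the rule of keeping the best-performing variant are also assumptions. Spell correction is left out.

```python
def naive_stem(word):
    """Toy stand-in for a real stemmer: strips a trailing plural 's'."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word

def dedup_key(query):
    """Queries with the same stemmed form are treated as duplicates."""
    return " ".join(naive_stem(word) for word in query.lower().split())

def filter_candidates(candidates, min_ctr):
    """candidates: list of (query, ctr) pairs.

    Drops queries below the CTR threshold, then keeps the highest-CTR
    query for each stemmed key.
    """
    best = {}
    for query, ctr in candidates:
        if ctr < min_ctr:
            continue
        key = dedup_key(query)
        if key not in best or ctr > best[key][1]:
            best[key] = (query, ctr)
    return sorted(best.values(), key=lambda item: -item[1])

# Hypothetical candidates with made-up click-through rates
candidates = [
    ("shirts for boys", 0.12),
    ("shirts for boy", 0.08),
    ("shirt boys xyz", 0.001),
    ("kurta for men", 0.10),
]

kept = filter_candidates(candidates, min_ctr=0.01)
print(kept)
```

"shirts for boys" and "shirts for boy" collapse to the same key, so only the better-performing one survives, and the low-CTR query is dropped entirely.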
Why Distributed Key-Value Store?
The CGF stores its output data in HDFS. The data is on the order of tens of millions of rows. It needs to be moved from HDFS to a data store layer that lets the autosuggest service serve suggestions with single-digit millisecond latency.
This is because Flipkart autosuggest is a user-facing feature and can easily receive thousands of requests per second. Also, if the autosuggestions are slow, customers may leave the site and move to Flipkart’s competitors for their purchase. Flipkart needed a solution that could scale with time as the size of the dataset continued to grow.
For all the above reasons, Flipkart selected an in-memory distributed key-value store that could offer high throughput, low latency, and no scalability concerns in the short term.
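The blog does not name the store or its partitioning scheme, so the following is only a toy illustration of the idea: suggestions are keyed by source query and spread across in-memory shards by a hash of the key. The class name, the shard count, and the use of MD5 for a stable hash are assumptions.

```python
import hashlib

class ShardedSuggestionStore:
    """Toy in-memory key-value store partitioned across shards by key hash."""

    def __init__(self, num_shards=4):
        # Each shard is a plain dict; in a real system each would be a node.
        self.shards = [{} for _ in range(num_shards)]

    def _shard_for(self, key):
        # A stable hash so the same key always maps to the same shard.
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return self.shards[int(digest, 16) % len(self.shards)]

    def put(self, source_query, suggestions):
        self._shard_for(source_query)[source_query] = suggestions

    def get(self, source_query):
        # A lookup is a single hash plus one dict access.
        return self._shard_for(source_query).get(source_query, [])

store = ShardedSuggestionStore()
store.put("harry potter", [("jk rowling books", 0.35), ("harry potter books", 0.30)])

result = store.get("harry potter")
missing = store.get("unknown query")
print(result)
print(missing)
```

Because each lookup touches exactly one shard and stays in memory, adding shards increases capacity without making individual reads slower.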
How is the “Discover More” feature designed?
The “Discover More” feature provides personalized suggestions to the user. It aggregates results from two sources: the Personalized suggestion provider and the Popular suggestion provider.
Personalized suggestion provider: This approach takes the user’s past search queries and fetches the corresponding related searches, along with their relatedness scores, from the distributed key-value store. All the results are blended and then ranked together using a function of the relatedness score and the recency of the query.
Popular suggestion provider: The blog does not say much about this approach. It could be a generic flow in which Flipkart passes the user-typed queries to an internal or external provider and fetches the most popular suggestions for that category.
The results from both providers are mixed, and the top k results are selected for rendering to the user.
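One way to picture this flow is the sketch below. The blog only says that ranking is a function of relatedness and recency, so the exponential decay with a one-day half-life, the max-based merge, and the assumption that popular and personalized scores share one scale are all illustrative choices.

```python
import heapq

def personalized_candidates(past_queries, related_store, now, half_life_s=86400.0):
    """Score related searches for a user's past queries.

    past_queries: list of (query, unix_timestamp).
    related_store: {query: [(related_query, relatedness), ...]}.
    Assumed scoring: relatedness multiplied by an exponential recency decay.
    """
    scores = {}
    for query, ts in past_queries:
        decay = 0.5 ** ((now - ts) / half_life_s)
        for suggestion, relatedness in related_store.get(query, []):
            score = relatedness * decay
            scores[suggestion] = max(scores.get(suggestion, 0.0), score)
    return scores

def discover_more(personal, popular, k):
    """Mix both providers' scores and return the top k suggestions."""
    merged = dict(popular)
    for suggestion, score in personal.items():
        merged[suggestion] = max(merged.get(suggestion, 0.0), score)
    return heapq.nlargest(k, merged.items(), key=lambda item: item[1])

# Hypothetical data: one query made just now, one made a day ago
now = 172800.0
past_queries = [("harry potter", now), ("running shoes", now - 86400.0)]
related_store = {
    "harry potter": [("jk rowling books", 0.8)],
    "running shoes": [("sports socks", 0.8)],
}
popular = {"mobile phones": 0.5}

personal = personalized_candidates(past_queries, related_store, now)
top = discover_more(personal, popular, k=2)
print(top)
```

Both related searches start with the same relatedness, but the day-old query is decayed to half its score, so the popular suggestion outranks it in the top 2.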
Resources
“Predicting your next query even before you type!” by Flipkart