Cleo

Cleo is a flexible software library for enabling rapid development of partial, out-of-order and real-time typeahead and autocomplete services. It is suitable for data sets of various sizes from different domains. The Cleo software library is published under the terms of the Apache Software License version 2.0, a copy of which has been included in the LICENSE file shipped with the Cleo distribution.

Cleo should not be mistaken for query autocomplete: it does not suggest search terms or queries. Cleo is a library for developing applications that perform real typeahead queries and deliver typeahead results (objects or elements) instantaneously as you type.

Cleo also differs from general-purpose search libraries because 1) it evaluates the prefixes of search terms rather than the terms themselves, and 2) it performs search by means of Bloom filters and forward indexes rather than inverted indexes.
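To make the idea of prefix matching with a Bloom filter and a forward index more concrete, here is a minimal sketch. It is not Cleo's API: the class `PrefixBloomSketch`, its methods, the 64-bit filter width, and the single hash function are all assumptions chosen for brevity. Each element stores its terms (the forward index) together with a small filter built from every prefix of those terms; a query is first checked against the filter and only then verified against the actual terms.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: all names here are hypothetical, not Cleo's API.
final class PrefixBloomSketch {
    /** One forward-index entry: an element's terms plus a 64-bit filter of their prefixes. */
    private static final class Entry {
        final String name;
        final String[] terms;
        final long filter;

        Entry(String name, String[] terms, long filter) {
            this.name = name;
            this.terms = terms;
            this.filter = filter;
        }
    }

    private final List<Entry> forwardIndex = new ArrayList<>();

    /** Maps a prefix to one bit of a 64-bit filter (a single hash function, an assumption). */
    static long bit(String prefix) {
        return 1L << Math.floorMod(prefix.hashCode(), 64);
    }

    /** Builds an element's filter by setting one bit for every prefix of every term. */
    static long filterOf(String[] terms) {
        long filter = 0L;
        for (String term : terms) {
            for (int len = 1; len <= term.length(); len++) {
                filter |= bit(term.substring(0, len));
            }
        }
        return filter;
    }

    void add(String name) {
        String[] terms = name.trim().toLowerCase().split("\\s+");
        forwardIndex.add(new Entry(name, terms, filterOf(terms)));
    }

    /** Returns elements in which every query prefix matches some term, in any order. */
    List<String> search(String query) {
        List<String> hits = new ArrayList<>();
        if (query.trim().isEmpty()) {
            return hits;
        }
        String[] prefixes = query.trim().toLowerCase().split("\\s+");
        long mask = 0L;
        for (String prefix : prefixes) {
            mask |= bit(prefix);
        }
        for (Entry entry : forwardIndex) {
            // Cheap rejection: if any query bit is missing, no term has that prefix.
            if ((entry.filter & mask) != mask) {
                continue;
            }
            // The filter can give false positives, so verify against the stored terms.
            if (matchesAll(prefixes, entry.terms)) {
                hits.add(entry.name);
            }
        }
        return hits;
    }

    private static boolean matchesAll(String[] prefixes, String[] terms) {
        for (String prefix : prefixes) {
            boolean found = false;
            for (String term : terms) {
                if (term.startsWith(prefix)) {
                    found = true;
                    break;
                }
            }
            if (!found) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        PrefixBloomSketch index = new PrefixBloomSketch();
        index.add("Jeff Weiner");
        index.add("John Smith");
        index.add("Wei Jiang");
        System.out.println(index.search("j wein")); // [Jeff Weiner]
        System.out.println(index.search("wein j")); // [Jeff Weiner] (out-of-order match)
    }
}
```

The filter check costs only a bitwise AND per element, so most non-matching elements are discarded before any string comparison takes place. A production filter would likely use more bits and more hash functions than this sketch does.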

Still unsure what Cleo does? Consider a search query example. If you type a search query, say "j weiner", into Google, you are shown a list of suggested search queries as you type. This list changes automatically depending on the words in your search query. This is query autocomplete. You choose a search query from the list and then Google provides you with the corresponding search results.

Google search autocomplete for j wein

On LinkedIn, when you type "j wein", you are presented with a list of search results instead of search queries. These search results are real-time, aggregated from different search domains, and then filtered based on your 1st and 2nd degree network connections. By real-time, we mean that any new members joining LinkedIn are immediately searchable through Cleo-powered typeahead services.

LinkedIn typeahead search for j wein

Cleo @ LinkedIn

Cleo is used extensively to power LinkedIn real-time typeahead search covering different data sets, which include members (1st and 2nd degree network connections), companies, groups, questions, skills, and various site features. Its use cases fall into two broad categories: generic typeahead and network typeahead.

From an architectural perspective, LinkedIn typeahead search is composed of different layers: browser cache, web tier, results aggregator, and various typeahead backend services. Cleo powers all backend services. Each backend service runs in a cluster and presents the same API to the aggregator. Depending on the landing page, the aggregator automatically chooses and aggregates different types of typeahead search results for the web tier to consume. The browser cache is also used to cache typeahead search results for faster rendering.

LinkedIn typeahead search architecture
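The aggregator layer described above can be sketched as follows. This is a simplified illustration under stated assumptions, not LinkedIn's implementation: the interface `TypeaheadBackend`, the class `AggregatorSketch`, the page names, and the in-memory backends are all hypothetical. The browser cache, web tier, and clustering are not modeled. The sketch only shows backends sharing one API and the aggregator choosing among them by landing page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: all names here are hypothetical.
final class AggregatorSketch {
    /** The single API that every typeahead backend presents to the aggregator. */
    interface TypeaheadBackend {
        List<String> search(String prefix);
    }

    private final Map<String, TypeaheadBackend> backends = new LinkedHashMap<>();
    private final Map<String, List<String>> backendsByPage = new LinkedHashMap<>();

    void registerBackend(String type, TypeaheadBackend backend) {
        backends.put(type, backend);
    }

    /** Declares which backend types a landing page should draw results from. */
    void configurePage(String page, String... types) {
        backendsByPage.put(page, Arrays.asList(types));
    }

    /** Queries the backends configured for the page and concatenates their results. */
    List<String> search(String page, String prefix) {
        List<String> merged = new ArrayList<>();
        for (String type : backendsByPage.getOrDefault(page, Collections.emptyList())) {
            TypeaheadBackend backend = backends.get(type);
            if (backend != null) {
                merged.addAll(backend.search(prefix));
            }
        }
        return merged;
    }

    /** A toy backend over a fixed list, standing in for a real backend service. */
    static TypeaheadBackend inMemory(String... items) {
        return prefix -> {
            List<String> hits = new ArrayList<>();
            for (String item : items) {
                if (item.toLowerCase().startsWith(prefix.toLowerCase())) {
                    hits.add(item);
                }
            }
            return hits;
        };
    }

    public static void main(String[] args) {
        AggregatorSketch aggregator = new AggregatorSketch();
        aggregator.registerBackend("companies", inMemory("LinkedIn", "Lincoln Electric"));
        aggregator.registerBackend("groups", inMemory("Linux Users"));
        aggregator.configurePage("home", "companies", "groups");
        aggregator.configurePage("jobs", "companies");
        System.out.println(aggregator.search("home", "lin"));
        System.out.println(aggregator.search("jobs", "lin"));
    }
}
```

Because every backend implements the same interface, adding a new data set in this sketch only requires registering another backend and listing it for the relevant pages.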

From a performance perspective, Cleo is very fast in returning typeahead search results. Within a cluster, generic typeahead services can normally return results in less than 1 millisecond. In contrast, network typeahead services are slower when the total number of 1st and 2nd degree network connections is very large. For performance reasons, a timeout is set at 15 milliseconds, and it may be reached for LinkedIn members such as recruiters and LIONs (LinkedIn Open Networkers), who typically have very large networks. At the aggregator level, the response time is approximately 20 to 25 milliseconds on average.
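One way to bound a backend call with such a timeout is sketched below, using the standard `java.util.concurrent` API. The class `TimeoutSketch` and the method `awaitOrEmpty` are hypothetical names, and returning an empty list when the timeout is reached is an assumption: the text above does not say what a caller receives in that case.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Illustrative sketch only: all names here are hypothetical.
final class TimeoutSketch {
    /** The 15 ms limit mentioned in the text. */
    static final long TIMEOUT_MILLIS = 15;

    /**
     * Waits up to timeoutMillis for a pending backend response.
     * Assumption: on timeout or failure the caller gets an empty result list.
     */
    static List<String> awaitOrEmpty(Future<List<String>> pending, long timeoutMillis) {
        try {
            return pending.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            pending.cancel(true); // stop waiting on work that is already too late
            return Collections.emptyList();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return Collections.emptyList();
        } catch (ExecutionException e) {
            return Collections.emptyList();
        }
    }

    public static void main(String[] args) {
        // A response that is already available is returned immediately.
        Future<List<String>> fast = CompletableFuture.completedFuture(Arrays.asList("Jeff Weiner"));
        System.out.println(awaitOrEmpty(fast, TIMEOUT_MILLIS)); // [Jeff Weiner]

        // A response that never arrives is abandoned once the timeout elapses.
        Future<List<String>> slow = new CompletableFuture<>();
        System.out.println(awaitOrEmpty(slow, TIMEOUT_MILLIS)); // []
    }
}
```

A real service might instead return whatever partial results were collected before the deadline. That choice is outside what this document specifies.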

References

  1. The Life of a Typeahead Query
     This is an excellent Facebook engineering blog post on understanding the various technical aspects and challenges of real-time typeahead search in the context of a social network.

  2. Efficient type-ahead search on relational data: a TASTIER approach
     This research paper describes a relational approach to typeahead search by means of specialized index structures and algorithms for joining related tuples in the database.