Design

Cleo is designed to support typeahead search for members' social network connections and activities. Cleo relies on Adjacency List, Bloom Filter and Forward Index to enable partial out-of-order string prefix match over a vary large number of elements by means of in-memory scanning and filtering. The design of Cleo can be summarized as:

During indexing, a tiny bloom filter (e.g., 4 bytes) is calculated against the pre-processed terms (e.g., {"jeff", "weiner"}) of an element using a predefined hash function h. During search, the same function h is used to calculate the query bloom filter against the terms in a typeahead query (e.g., {"j", "wein"}). The connections of the searcher are retrieved in the form of adjacency list in memory. The connections are scanned to filter out prefix mismatch using the query bloom filter and the pre-computed bloom filter values for every member connected to the searcher. Upon each bloom filter hit, the forward index is visited to reject any potential false positive prefix matches.

The design simplicity of Cleo makes it easy to combine and/or extend the three key components for varying application purposes. The Cleo library has provided several different typeahead search implementations which we will discuss shortly.

Basic API

Similar to general-purpose search libraries, Cleo provides separate interfaces for searching and indexing. The following classes define the basic API of Cleo.

Core Typeahead Classes

There are different implementations for typeahead search in Cleo. The core Typeahead classes listed below all implement the Searcher and Indexer interfaces. Generic typeahead search is typically implemented as the subclasses of AbstractTypeahead. In contrast, NetworkTypeahead need to maintain various social network connections to support typeahead search over a person's social network. Therefore, the subclasses of NetworkTypeahead implement the ConnectionIndexer interface. The following figure shows the class inheritance hierarchy of Searcher.

Cleo Searcher API