
LinkedIn Signal – a look under the hood

On September 29th, we unveiled LinkedIn Signal, a social search application for LinkedIn shares and for tweets from members who have bound their LinkedIn and Twitter accounts.


Let’s take a look at what’s under the hood:

[Figure: Signal architecture overview]

The backend is a RESTful service written in Scala using Scalatra, a web framework modeled on Ruby's Sinatra. We chose a REST/JSON RPC model because it makes quick ad hoc data manipulation easy, which keeps iteration fast.

We chose a JRuby frontend for the same reason: fast iteration.

Because the data used for facet decoration is fairly small and static, we simply put a BDB (Berkeley DB) instance behind a service abstraction.
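A minimal sketch of such a service abstraction, written in Python for illustration: callers depend only on a small interface, and the backing store is any byte-keyed mapping. The class names, the `decorate` method, and the reading of "decoration" as mapping raw facet ids to display labels are assumptions, not Signal's actual interface. A handle returned by Python's standard `dbm.open()` fits the same slot; the Berkeley DB binding the post refers to is not shown.

```python
from abc import ABC, abstractmethod
from typing import List, Mapping, MutableMapping, Tuple


class FacetDecorationService(ABC):
    """Hypothetical interface: turns raw facet ids into display labels."""

    @abstractmethod
    def label(self, facet_id: str) -> str:
        """Return the display label for one facet id."""

    def decorate(self, facet_counts: Mapping[str, int]) -> List[Tuple[str, int]]:
        # Keep (label, count) pairs so two ids with the same label are not merged.
        return [(self.label(fid), count) for fid, count in facet_counts.items()]


class KeyValueDecorationService(FacetDecorationService):
    """Backed by any byte-keyed mapping: a dict in tests, or a dbm handle."""

    def __init__(self, store: MutableMapping[bytes, bytes]) -> None:
        self._store = store

    def label(self, facet_id: str) -> str:
        key = facet_id.encode("utf-8")
        if key in self._store:
            return self._store[key].decode("utf-8")
        return facet_id  # fall back to the raw id when no label is stored


if __name__ == "__main__":
    # Hypothetical static table; real data would be loaded once into the store.
    table = {b"industry:1": b"Internet", b"industry:2": b"Computer Software"}
    service = KeyValueDecorationService(table)
    print(service.decorate({"industry:1": 120, "industry:2": 45, "industry:9": 3}))
```

Because callers only see `FacetDecorationService`, the storage engine can be swapped without touching the search code.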

Saved searches and follows were features we added later in development. We wanted something we could get running right away that was also scalable and elastic. Voldemort was an obvious choice, as our query access pattern fits a key-value store perfectly. Furthermore, Voldemort's data rebalancing and elasticity were well suited to our expected data growth.
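The key-value access pattern can be sketched as follows, assuming saved searches are stored as one JSON list per member. `KeyValueStore`, `InMemoryStore`, and `SavedSearchRepository` are hypothetical names, and the in-memory store only stands in for Voldemort; Voldemort's real client API, including its versioned writes, is not reproduced here.

```python
import json
from typing import Dict, List, Optional, Protocol


class KeyValueStore(Protocol):
    """The only operations the feature needs: get and put by key."""

    def get(self, key: str) -> Optional[bytes]: ...

    def put(self, key: str, value: bytes) -> None: ...


class InMemoryStore:
    """Stand-in for a distributed key-value store such as Voldemort."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value


class SavedSearchRepository:
    """Every request touches exactly one key: the member's id."""

    def __init__(self, store: KeyValueStore) -> None:
        self._store = store

    @staticmethod
    def _key(member_id: int) -> str:
        return f"saved-searches:{member_id}"

    def searches_for(self, member_id: int) -> List[str]:
        raw = self._store.get(self._key(member_id))
        return json.loads(raw) if raw is not None else []

    def add(self, member_id: int, query: str) -> None:
        # Read-modify-write; a real store would need versioned (optimistic)
        # writes to avoid losing updates from concurrent requests.
        searches = self.searches_for(member_id)
        if query not in searches:
            searches.append(query)
            self._store.put(self._key(member_id), json.dumps(searches).encode("utf-8"))
```

Follows could use the same shape under a different key prefix. Since no request ever needs a scan or a join, a partitioned key-value store can spread members across nodes freely.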

The data stream aggregates LinkedIn shares, tweets from bound accounts, LinkedIn profiles, and member-derived information. It was built on our distributed messaging queue.
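One way to picture the aggregation step, as a sketch only: each source yields events already ordered by timestamp, `heapq.merge` interleaves them into a single time-ordered stream, and each event is enriched from a profile lookup. The `Event` fields, the source names, and the `industry` enrichment are illustrative assumptions; the messaging queue itself is not modeled.

```python
import heapq
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator


@dataclass(frozen=True)
class Event:
    timestamp: int   # epoch millis
    source: str      # e.g. "linkedin_share" or "tweet" (hypothetical labels)
    member_id: int
    text: str


def aggregate(streams: Iterable[Iterable[Event]],
              profiles: Dict[int, dict]) -> Iterator[dict]:
    """Merge per-source streams (each already time-ordered) and attach
    member-derived fields from a profile lookup."""
    merged = heapq.merge(*streams, key=lambda e: e.timestamp)
    for event in merged:
        profile = profiles.get(event.member_id, {})
        yield {
            "timestamp": event.timestamp,
            "source": event.source,
            "member_id": event.member_id,
            "text": event.text,
            "industry": profile.get("industry"),  # None if unknown
        }
```

`heapq.merge` is lazy, so the merged stream can be consumed as events arrive rather than after every source has been read.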

The search system, however, took a bit more work.

Let me first define the search technologies we leveraged for Signal:

  • Zoie – realtime indexing and search system.
  • Bobo – faceted search engine.

Both leverage Apache Lucene.

  • Sensei – distributed, realtime, searchable database with dynamic clustering. Leverages Zoie and Bobo.

We started by streaming share data into a Sensei cluster, and we very quickly had faceted search working on the realtime stream.
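To make "faceted search on the realtime stream" concrete, here is a toy in-memory index. It is not the Zoie, Bobo, or Sensei API: `RealtimeFacetIndex` and its methods are hypothetical. A document is searchable as soon as `add()` returns, and each query returns facet counts computed over its hits.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple


class RealtimeFacetIndex:
    """Toy realtime faceted index. Assumes each doc_id is added once
    (append-only), so postings are never rewritten."""

    def __init__(self) -> None:
        self._postings: Dict[str, Set[int]] = defaultdict(set)
        self._facets: Dict[int, Dict[str, str]] = {}

    def add(self, doc_id: int, text: str, facets: Dict[str, str]) -> None:
        # Visible to the very next search: no batch commit step.
        self._facets[doc_id] = facets
        for term in text.lower().split():
            self._postings[term].add(doc_id)

    def search(self, query: str, facet_field: str) -> Tuple[List[int], Counter]:
        terms = query.lower().split()
        if not terms:
            return [], Counter()
        # AND semantics: a hit must contain every query term.
        hits = set.intersection(*(self._postings.get(t, set()) for t in terms))
        # Facet counts are computed only over the matching documents.
        counts = Counter(self._facets[d][facet_field]
                         for d in hits if facet_field in self._facets[d])
        return sorted(hits), counts
```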

When building the Signal search nodes, we realized that Zoie's update support was not necessarily optimal for the append-only data we handle. So we built a variation of Zoie called Hourglass that is optimized for append-only data. Hourglass is analogous to a logging system: a daily index “forward-rolls”, and old rolls that fall past a retention threshold are either archived or deleted.
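The forward-rolling idea can be sketched as an ordered set of daily segments with a retention window. This illustrates the behavior described above under our own assumptions and is not Hourglass's code: `RollingIndex`, `retention_days`, and the `archive` callback are hypothetical, and each segment is a plain list rather than a Lucene index.

```python
from collections import OrderedDict
from datetime import date, timedelta
from typing import Callable, List, Optional


class RollingIndex:
    """Append-only index with one segment per day. Segments that fall outside
    the retention window are handed to `archive` (if given) and dropped."""

    def __init__(self, retention_days: int,
                 archive: Optional[Callable[[date, List[str]], None]] = None) -> None:
        if retention_days < 1:
            raise ValueError("retention_days must be at least 1")
        self.retention_days = retention_days
        self.archive = archive
        self.segments: "OrderedDict[date, List[str]]" = OrderedDict()

    def append(self, day: date, doc: str) -> None:
        if day not in self.segments:
            if self.segments and day < next(reversed(self.segments)):
                raise ValueError("append-only: cannot open a segment in the past")
            # A new day "forward-rolls" the index: open a fresh segment...
            self.segments[day] = []
            # ...and expire segments that are now too old.
            self._expire(day)
        self.segments[day].append(doc)

    def _expire(self, today: date) -> None:
        cutoff = today - timedelta(days=self.retention_days)
        for day in [d for d in self.segments if d <= cutoff]:
            docs = self.segments.pop(day)
            if self.archive is not None:
                self.archive(day, docs)

    def search(self, term: str) -> List[str]:
        # Query every live segment, oldest first (substring match for brevity).
        term = term.lower()
        return [doc for docs in self.segments.values() for doc in docs
                if term in doc.lower()]
```

Because expiry drops a whole day's segment at once, no per-document deletes are needed, which is what makes this layout a good fit for append-only data.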

We also wrote a dynamic FacetHandler for the social graph. For this, we leveraged the support in Bobo's FacetHandler API for runtime and composite FacetHandlers. The social graph FacetHandler must be a runtime handler because the connection graph is not known until query time, when the viewer ID arrives as part of the request. It must also be composite because it builds on the Member FacetHandler to get the mapping between a share and its member.
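The runtime/composite split might look like this in simplified form. The names below are hypothetical and do not mirror Bobo's Java classes: `MemberFacet` holds the index-time share-to-member mapping, while `SocialGraphFacet` is built per request from the viewer's connections (assumed here to be a map from member id to connection degree) and delegates to `MemberFacet` for each hit.

```python
from collections import Counter
from typing import Iterable, Mapping


class MemberFacet:
    """Index-time facet: which member authored each share."""

    def __init__(self, share_to_member: Mapping[int, int]) -> None:
        self.share_to_member = dict(share_to_member)

    def member_of(self, share_id: int) -> int:
        return self.share_to_member[share_id]


class SocialGraphFacet:
    """Runtime facet: built per request because it depends on the viewer.
    Composite: it reuses MemberFacet for the share -> member lookup."""

    LABELS = {1: "1st degree", 2: "2nd degree"}

    def __init__(self, member_facet: MemberFacet,
                 viewer_degrees: Mapping[int, int]) -> None:
        self.member_facet = member_facet
        self.viewer_degrees = viewer_degrees  # member_id -> 1 or 2

    def value(self, share_id: int) -> str:
        member = self.member_facet.member_of(share_id)
        # Members missing from the viewer's graph fall into the default bucket.
        return self.LABELS.get(self.viewer_degrees.get(member), "Out of network")

    def counts(self, hits: Iterable[int]) -> Counter:
        return Counter(self.value(s) for s in hits)
```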

Constructing such a FacetHandler at query time can be expensive when the connection graph is large, so we built an LRU cache that serves subsequent requests from the same user. Because our search nodes are replicated, we also built a consistent-hash routing strategy that sends the same user to the same search node, maximizing the cache hit rate.
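Both techniques are sketched below under stated assumptions: an `OrderedDict`-based LRU cache keyed by viewer, and a consistent-hash ring with virtual nodes that picks a replica for each viewer ID. The MD5 hash, the 100 virtual nodes per replica, and all class and method names are illustrative choices, not details from Signal.

```python
import bisect
import hashlib
from collections import OrderedDict
from typing import Any, Callable, Hashable, List


class LRUCache:
    """Keeps the most recently used per-viewer structures, up to `capacity`."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._items: "OrderedDict[Hashable, Any]" = OrderedDict()

    def get_or_build(self, key: Hashable, build: Callable[[], Any]) -> Any:
        if key in self._items:
            self._items.move_to_end(key)    # mark as most recently used
            return self._items[key]
        value = build()                     # the expensive per-viewer step
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used
        return value


def _hash(key: str) -> int:
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class ConsistentHashRing:
    """Maps each viewer id to one replica; repeat requests hit a warm cache."""

    def __init__(self, nodes: List[str], vnodes: int = 100) -> None:
        if not nodes:
            raise ValueError("need at least one node")
        # Each node appears at many points on the ring to even out load.
        self._ring = sorted((_hash(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))
        self._keys = [h for h, _ in self._ring]

    def node_for(self, viewer_id: int) -> str:
        # First ring point clockwise from the key's hash, wrapping around.
        idx = bisect.bisect(self._keys, _hash(str(viewer_id))) % len(self._ring)
        return self._ring[idx][1]
```

With consistent hashing, adding or removing a replica only remaps the viewers that end up on, or were on, that replica, so the caches on the other replicas stay warm.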

We also noticed that if we treat an article contained in a share as metadata on the share, we can expose article data through a FacetHandler as well. From that, we can compute popular articles dynamically for each query, which gives us dynamic article suggestions!
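Treating the article as a facet value reduces per-query article suggestion to counting over the query's hits. A sketch, assuming each share carries an optional article URL; the `share_article` map and the `popular_articles` function are hypothetical:

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional, Tuple


def popular_articles(hits: Iterable[int],
                     share_article: Dict[int, Optional[str]],
                     k: int = 3) -> List[Tuple[str, int]]:
    """Use the article URL as a facet value on each share and return the
    k most frequent articles among the shares matching the current query."""
    # Shares without an article (None or missing) are skipped.
    counts = Counter(share_article[s] for s in hits if share_article.get(s))
    return counts.most_common(k)
```

Because the counting runs over each query's own hits, the suggestions change with the query instead of coming from a global popularity list.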

After a few iterations, while the system was holding up well under load, we noticed that people spamming the Twitter stream were affecting our results. A simple example is individuals promoting themselves by ‘sharing’ a given URL thousands of times. This hurt us because our scoring scheme put too much emphasis on share counts. Simply put, a popularity measure based purely on sharing was not good enough for ranking.

An intuitive fix for share spam is to determine the number of unique individuals behind the overall pool of shares for a given article. In other words, the algorithm should promote articles based on the number of unique individuals who share them, and demote articles that are shared an excessive number of times by a single individual. With that change, the results are fantastic!
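A sketch of one scoring function with both properties. The formula, unique sharers multiplied by the ratio of unique sharers to total shares, is a hypothetical illustration of the idea, not the scoring Signal actually uses:

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def rank_articles(shares: Iterable[Tuple[str, int]]) -> List[Tuple[str, float]]:
    """shares: (article_url, member_id) pairs.
    Hypothetical score = unique_sharers * (unique_sharers / total_shares):
    the first factor rewards breadth, the second demotes repeat sharing."""
    per_article: Dict[str, Counter] = defaultdict(Counter)
    for article, member in shares:
        per_article[article][member] += 1
    scored = []
    for article, by_member in per_article.items():
        unique = len(by_member)
        total = sum(by_member.values())
        scored.append((article, unique * unique / total))
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    spam = [("http://spam.example", 7)] * 1000                  # one member, 1,000 shares
    organic = [("http://good.example", m) for m in range(50)]   # 50 members, once each
    for article, score in rank_articles(spam + organic):
        print(article, round(score, 3))
```

Under raw share counts the spammed URL (1,000 shares) would outrank the organic one (50 shares); under this score the order flips, since a single member sharing 1,000 times scores 1 × 1/1000.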

Whew! That is a very coarse rundown on how Signal was built. As you may imagine, we had a great deal of fun building it!