Distributed Search: Attention Frontier Efficiency

I have discussed the power of the Attention Frontier in a previous post. It is interesting to compare some of the numbers and assumptions from the post to actual numbers from Twitter feeds, which are really Twitter Attention Frontier (TAF).

First, there are, on average, about 2K urls/min posted on Twitter, in posts containing (shortened) links. That results in about 3M links/day, before deduplication, or any kind of quality, spamming or relevance analysis. This number should be considered in the context of total number of Twitter users, which is not published, though we do know that there are now over 40 million unique visitors/month. Considering such a large number of users, it becomes pretty clear that the fraction of Twitter users who are posting links is actually quite small - one can say that an average Twitter user posts about 2-3 links/month.

It is actually quite amazing that with such low "yield" of links/month/user the resulting feeds of updates are so informative. We at Wowd strongly believe that this is a very good indication of the power of the concept of Attention Frontier.

Our vision of Attention Frontier is designed to be (much) more efficient, requiring no explicit action by users. Instead, everything is derived transparently and implicitly, from their natural browsing actions. The number of potential user actions (clicks) for an average user ranges in thousands/month. Our system is designed to be most efficient in leveraging these actions.

Another interesting question is the speed of propagation on the Attention Frontier. Twitter provides a great example since the speed of propagation of the most popular trends is pretty amazing. This is actually not very surprising, since the whole point of real-time search is identification and analysis of fast-rising trends, by definition.

There is also another interesting question, of how to rank new pages so they are not swamped by older pages which had time to accumulate rank. This is actually a burning issue for many small publishers, of how to break into Google index when competing with already available material. There is even a well-known Google Sandbox, where new pages have to sit for a period of time (rumored in months) no-matter-what, to prevent people who are able to , through some trick, game the system by introducing suddenly enormous quantities of new content.

New pages can be assigned meaningful rank in terms of their conventional relevancy and quality scores, together with other indirect factors such as the reputation of the author. Of course, the overwhelming ranking signal for new pages is freshness i.e. how quickly they are discovered.

The key point is that real-time search is an instance of search, and as such it is of vital importance to pay close attention to the relevancy of results, in addition to freshness. In this respect, the nascent real-time search industry has ways to go but the journey should be pretty exciting :)

Distributed Search

Wednesday, September 23, 2009

Attention Frontier Efficiency

No comments:

Other Interesting Blogs

Blog Archive

About Me