Distributed Search: Real-Time Search APIs

Search APIs have always been of great interest to the developer community as well as other parties, such as researchers. In addition, the issue of huge resources required for modern search gave rise to the perception that big-time search is a province of only a few. Google used to have rather limited API which has been, sadly, discontinued. Yahoo BOSS effort has been recently the leader in the field of open search APIs. Its rate limits for developers are rather generous, but still present. On the other hand, Yahoo clearly states that they plan to introduce a fee based structure for use of BOSS APIs. Another example of a widely known search API is Twitter.

It is rather interesting that rate limiting has been rather prominently featured in all search APIs to date. The main reason for this requirement, we believe, is the perception that APIs are a cost item, almost by definition. The rate limits are hence present, to put some kind of bounds on this cost. In the case of Twitter, the issue is further exacerbated by the real-time nature of the search API which places further constraints on its computation and delivery. In simpler terms, it is considerably more expensive to operate a real-time search API than a more conventional one.

Twitter has been the leader in this field and they have been quite accomodative with developers and other parties using their service. However, even they have clearly defined rate limits e.g. the number of parties using full twitter API (so called "firehose") is very limited.

A natural question arises - does it make sense to even think of an open search API with only a few, or no, resource limitations? And how would such a service look like and who would provide for resources required to operate it?

As great fans and practitioners of search art & science, we have been asking these questions ourselves and we have come to a conclusion that it would be possible to create such and open RTS API, with very few resource limitations, by leveraging unique nature of our P2P-based distributed cloud. We would like to engage the developer community as well as other interested parties in a discussion where we would like to hear what kind of interest there would be for such a project as well as what features and modes of delivery and operation would be of most interest.

Wowd distributed network is rather unique with respect to resource contribution because users themselves are able to contribute resources. Our main idea is that by contributing to our distributed cloud users should be able to access the API in a proportional way. As an example, consider BitTorrent and their policies for controlling download speeds. It is well known that if one wants to greatly speed up download rates in BitTorrent, they need to allow faster rate of contributing back by increasing upload speeds.

The nature of Wowd system is very different from BitTorrent as Wowd is about discovering, sharing and searching new content across the entire Web as opposed to BitTorrent which is about sharing media content. However, in both cases the principal reason for using user based contributions as reciprocal measures for API usage is to ensure full scalability and availability of API resources , regardless of the levels of load placed on it.

There are several ways such an API could be operated and we are eager to hear your feedback! For instance, the level of contributions can be measured by bandwidth, RAM, disk etc that are shared. In simple terms, if one is using a bigger machine with more bandwidth and other resources, by that very fact they should be able to access the API at a higher rate.

In addition to resource contributions, there is also a natural question of contributing user attention data to Wowd Attention Frontier. Of course, a single user would contribute a given amount of attention data no matter what kind of machine they are using but we are particularly interested in cases where said user would get other users to participate attention data too. Such an user should clearly be able to access the API at (much) higher rates. There are various ways such a mechanism could be implemented e.g. by introducing "attention data keys" which would be used for labeling groups of users introduced by a source.

Furthermore, in addition to increased limits of API usage, there are additional benefits that could be provided by, for instance, processing attention data to obtain affinity group specific rankings, tags, segments of attention frontier etc. These are just initial ideas, there could be many things that could be devised based on such a contribution scheme.

We are very eager to hear back from the developers, potential partners, as well as other interested parties, about their level of interest, feedback, comments about the above ideas. We feel such a scheme would be a novel way of creating an almost limit-free search API. We would love to hear if you feel the same!

Distributed Search

Monday, March 22, 2010

Real-Time Search APIs

2 comments:

Other Interesting Blogs

Blog Archive

About Me