Monday, September 28, 2009

Distributed Cloud

As promised, we have another white paper on the topic of Distributed Clouds.

Clouds have become a fascinating topic. Of course, as with most very popular subjects, there is no clear definition of what they really are, and the concept is very broad. On the other hand, some things are starting to solidify.

Our take on clouds is a bit different. We focus on Distributed Clouds: more specifically, clouds that span very wide area networks such as the Internet and that are made up of many independent users, in contrast to the prevalent view of many machines hosted and operated by a single entity.

The emergence of computing clouds has put a renewed emphasis on the issue of scale in computing. The enormous size of the Web, together with ever more demanding requirements such as freshness (results in seconds, not weeks), means that massive resources are required to handle enormous datasets in a timely fashion. Datacenters are now considered to be the new units of computing power, as in Google's Warehouse-Scale Computer. The number of organizations able to deploy such resources is ever shrinking. Wowd aims to demonstrate that there is an even bigger scale of computing than any yet imagined: planetary-sized distributed clouds. Such clouds can be deployed by motivated collections of users, instead of a handful of gigantic organizations.

The definition of cloud is still not firmly established, so let us start with ours. We consider a cloud to be a collection of computing resources, where it is possible to allocate and provision additional resources in an incremental and seamless way, with no disruption to the deployed applications.

In this key respect, a cloud is not simply a group of servers co-located at some data center, since with such a collection it is neither simple nor very clear how to deploy additional machines for many tasks. Consider, for example, a server supporting a Relational Database Management System. A large increase in the number of records in the database cannot be handled simply by adding machines, since the underlying database needs to be partitioned so that all operations and queries perform satisfactorily across all of the machines. The solution in this situation requires significant re-engineering of the database application.
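To make the partitioning problem concrete, here is a minimal Python sketch. It assumes the simplest possible scheme, where each record is assigned to a machine by hashing its key modulo the number of machines. The scheme and the names `shard_for` and `fraction_moved` are ours for illustration; they are not taken from the white paper or from any particular database product.

```python
import hashlib


def shard_for(key: str, num_machines: int) -> int:
    """Assign a record key to a machine with naive hash-mod-N partitioning (an assumed scheme)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_machines


def fraction_moved(keys, old_n: int, new_n: int) -> float:
    """Fraction of keys that land on a different machine when the cluster grows from old_n to new_n."""
    moved = sum(1 for k in keys if shard_for(k, old_n) != shard_for(k, new_n))
    return moved / len(keys)


# Hypothetical record keys, only for illustration.
keys = [f"record-{i}" for i in range(10_000)]

# Growing from 4 to 5 machines relocates most of the records.
print(f"moved: {fraction_moved(keys, 4, 5):.0%}")
```

Under this scheme a key keeps its machine only when its hash gives the same remainder modulo 4 and modulo 5, which happens for about one key in five, so roughly 80% of the records have to move when a single machine is added. That data movement, and the query routing that has to change along with it, is the kind of re-engineering referred to above.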

Clouds are considered to be collections of machines where it is possible to dynamically scale and provision additional resources for the underlying application(s) with no change or disruption to the operation. Some, such as Google, consider datacenters, which are the basis for clouds, to be a new form of "warehouse-scale computer" (source: "The Datacenter as a Computer", Google Inc. 2009). Clearly, the number of organizations capable of deploying such resources is small, and getting smaller, due to prohibitive cost.

Consider, as an example, P2P networks. For the longest time, indeed since the very inception of P2P, these networks have been associated with a rather narrow scope of activities: principally, the sharing of media content. The scale of computing occurring in such networks at every moment is truly staggering. However, there is a common (mis-)perception that such massive distributed systems are good only for a very limited set of activities, specifically the sharing of (often illicit) content. Our goal is to demonstrate that distributed networks can be a basis for tremendously powerful distributed clouds, quite literally of planetary scale. At that scale, the power provided by such a cloud actually dwarfs the power of even the biggest proprietary clouds.

I am posting the preceding part of our white paper as a preview; if you like it, you can read the rest at Wowd Distributed Cloud.


Thomas Costick said...

The Wowd definition of "the cloud" is, of course, very much in line with what Wowd has set out to do. What we used to call "distributed computing" is not new, as anyone familiar with the SETI project will know. Calling it "cloud" gives it a new, attractive buzz.

As for Google's cloud and others like it, it doesn't matter where and what the processing resource is. Right now, the Google cloud lives in data centres around the globe. The model can evolve while the API remains [largely] unchanged. Who knows, it could already be moving out there onto a peer-level network a la Wowd.

Borislav Agapiev said...

SETI is indeed a great example of distributed computing, also known as grid computing.

OTOH there are some key differences between clouds and grid computing, the principal one IMHO being the issue of latency. In grid computing there is no, or very little, communication among the chunks of computation sent off to the grid. As a result, it does not really matter how long it takes to complete them, as long as the results are assembled properly.

In clouds, latency matters a lot; it is actually the key. For instance, if you are doing search on a distributed cloud, the latency bounds are very strict, to guarantee a 1 sec overall response.
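A rough way to picture the difference, as a Python sketch: a query fanned out to many nodes in parallel is only as fast as its slowest node, unless the caller cuts off at a deadline and accepts partial results. The function name, the per-node latencies, and the 1000 ms deadline below are made up for illustration; they are not measurements from any real system.

```python
def fan_out_response(latencies_ms, deadline_ms):
    """Model a query sent to all nodes in parallel.

    Returns (response_time_ms, answered), where answered lists the indices
    of the nodes that replied within the deadline.
    """
    if not latencies_ms:
        return 0, []
    answered = [i for i, t in enumerate(latencies_ms) if t <= deadline_ms]
    # The caller waits for the slowest node, but never past the deadline.
    response_time = min(max(latencies_ms), deadline_ms)
    return response_time, answered


# Hypothetical per-node latencies in milliseconds; node 3 is a straggler.
latencies = [120, 340, 95, 2500]

print(fan_out_response(latencies, deadline_ms=1000))  # (1000, [0, 1, 2])
```

In a grid setting the straggler just delays the final assembly. With a strict response bound, its result has to be dropped, or the work has to be sent to another node in time.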

As for Google's cloud, the unit of computation is a single datacenter. There is no migration between datacenters; you get assigned one and you have to stick with it.

I consider the next step to be a cloud where operations, in particular I/O operations, can move between machines with no geographical constraints and are not confined to a single datacenter.