The premise is sound: the traditional approach of “one application – one server” is inefficient. Through virtualization, multiple operating systems and applications can share the same CPU, memory, network, and disk interfaces. Application “A” is shoehorned into the 97% of hardware time that Application “B” isn't using.
Typically, servers can be consolidated at a ratio of 9:1. The math doesn't work out perfectly, but it means, roughly, that a traditional datacenter is 9 times larger than it needs to be, draws 9 times the electricity it needs, and carries a climate control system 9 times larger than it needs to be. Consolidating servers into virtual machines represents a huge cost savings in the physical plant alone.
However, the benefits of virtualization don't stop there. Because there are fewer physical machines running, and they are similar if not identical, service agility is dramatically increased. Spare parts are far easier to stock, and the technicians working with the physical servers can develop deeper, more specialized knowledge. When something does break, the application is not lost, as it would be in the traditional model; the cluster merely loses those resources until they can be brought back online.
Because of these advantages, virtual datacenters can achieve true five-nines (99.999%) uptime. If virtualization had a tagline, it would be, “come for the savings, stay for the service!”
And, it's not theoretical. Every organization of any size either has virtualized its datacenters or is in the process of doing so.
Virtual machines and virtual networks follow all the rules of physical servers as far as administration and security policies, but two additional concerns are introduced. First, every virtual machine must be no-kidding, fail-proof, 100% isolated from all other virtual machines; even though they share physical resources, no guest can be allowed to observe or interfere with another's use of them. Second, since multiple applications are relying on each set of physical hardware, no single point of failure is acceptable: everything from power supplies to network and disk connections must have redundant components.
Enter the Cloud
If consolidating physical servers as virtual machines makes a good business case, consolidating datacenters into one big datacenter makes a great business case. Physical security, power requirements, hardware concerns, and climate control are all offloaded to a servicer who benefits from economies of scale.
As datacenters are consolidated, however, the value of the data being processed grows geometrically, which amplifies the need for isolation and redundancy. And since the data has moved out from under the physical control of the individual organizations, there is an additional, overriding concern: data security.
Ideally, all data should be encrypted before transmission to the cloud, stored encrypted, and decrypted only at the authorized destination after transmission from the cloud. In reality, though, a lot of data is transmitted, stored, and retransmitted in the clear for various reasons. This has a couple of effects that should concern any organization contemplating or using a cloud service.
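The encrypt-before-transmit workflow can be sketched as follows. This is an illustrative toy, not a production cipher: it derives a keystream from HMAC-SHA256 in counter mode purely to keep the example dependency-free, and a real deployment should use an authenticated cipher such as AES-GCM from an audited library. All names (`keystream`, `encrypt`, `decrypt`) are hypothetical.

```python
import hashlib
import hmac
import os

def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Derive a keystream from HMAC-SHA256 in counter mode.
    Illustrative only -- use a vetted AEAD cipher in practice."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(8, "big"),
                        hashlib.sha256).digest()
        counter += 1
    return out[:length]

def encrypt(key: bytes, plaintext: bytes) -> bytes:
    # A fresh random nonce per message; prepended so decrypt can find it.
    nonce = os.urandom(16)
    ks = keystream(key, nonce, len(plaintext))
    return nonce + bytes(p ^ k for p, k in zip(plaintext, ks))

def decrypt(key: bytes, blob: bytes) -> bytes:
    nonce, ciphertext = blob[:16], blob[16:]
    ks = keystream(key, nonce, len(ciphertext))
    return bytes(c ^ k for c, k in zip(ciphertext, ks))

# The cloud only ever sees `blob`; the key never leaves the client.
key = os.urandom(32)
blob = encrypt(key, b"quarterly sales projections")
assert decrypt(key, blob) == b"quarterly sales projections"
```

The point of the sketch is the boundary: encryption and decryption happen on the participant's side, so the cloud servicer holds only opaque bytes.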
First, unencrypted data makes encrypted data stand out like a sore thumb. An analogy would be a textbook in which only certain chapters are encrypted: even if every character in those chapters were randomized before “printing,” the surrounding clear text offers clues about where each block of encrypted text starts and ends.
Second, the unencrypted data gives clues about the ownership of the encrypted data through a process of elimination.
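The "stands out" effect is easy to demonstrate: well-encrypted data is statistically indistinguishable from random noise, so a simple per-byte entropy measure separates it sharply from natural-language cleartext. The sketch below (hypothetical names; `os.urandom` stands in for ciphertext) illustrates the gap.

```python
import math
import os
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Bits of entropy per byte: near 8.0 for ciphertext or random
    bytes, much lower for natural-language text."""
    counts = Counter(data)
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

clear = b"The quick brown fox jumps over the lazy dog. " * 200
cipher = os.urandom(len(clear))  # stands in for well-encrypted data

# Cleartext clusters well below 5 bits/byte; ciphertext approaches 8.
assert shannon_entropy(clear) < 5.0
assert shannon_entropy(cipher) > 7.5
```

An observer scanning stored blocks with exactly this kind of test can flag the high-entropy ones as "interesting," and elimination against the readable blocks narrows down whose they are.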
This suggests, at a minimum, two rules that any cloud participant should insist on: (1) data stored by the participant should be anonymized (indistinguishable from any other data stored in the cloud), and (2) indexing of owned data blocks should be accomplished by the participant (or a third-party service) and should not be available to the cloud servicer.
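Both rules can be sketched with a content-addressed store in which blocks are identified only by the hash of their (already-encrypted) payload, while the name-to-block index lives entirely on the participant's side. This is a minimal mock, not any vendor's API; all class and method names are hypothetical.

```python
import hashlib

class CloudStore:
    """Mock cloud back end. It sees only opaque block IDs and opaque
    (assumed already-encrypted) payloads -- never filenames, owners,
    or an index. Rule 1: every block looks like every other block."""
    def __init__(self):
        self._blocks = {}

    def put(self, payload: bytes) -> str:
        block_id = hashlib.sha256(payload).hexdigest()
        self._blocks[block_id] = payload
        return block_id

    def get(self, block_id: str) -> bytes:
        return self._blocks[block_id]

class ClientIndex:
    """Rule 2: the participant (or a third-party service) keeps the
    mapping from meaningful names to block IDs; the cloud never holds it."""
    def __init__(self, store: CloudStore):
        self.store = store
        self.index = {}

    def save(self, name: str, encrypted_payload: bytes) -> None:
        self.index[name] = self.store.put(encrypted_payload)

    def load(self, name: str) -> bytes:
        return self.store.get(self.index[name])

store = CloudStore()
client = ClientIndex(store)
client.save("payroll.db", b"\x9f\x02 already-encrypted bytes")
assert client.load("payroll.db") == b"\x9f\x02 already-encrypted bytes"
```

Because the store's keys are content hashes and its values are ciphertext, nothing on the servicer's side reveals which blocks belong together or to whom; only the client-held index can reassemble them.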
Wikileaks and Amazon as Object Lesson
Politics aside, the decision by Amazon Web Services (AWS) to evict Wikileaks data should alarm any cloud participant. You and your organization may be “respectable,” follow all the laws, and adhere strictly to the AWS terms of service, but your opinion doesn't count. The only opinion that counts is that of the legal department at AWS.
That makes AWS a single point of failure.
Because AWS knows where your data is on their disks, it can concern itself with what your data is. While that may be wholly appropriate for Amazon's business model, any user of the AWS service should consciously evaluate the wisdom of outsourcing the availability of their data to the opinion of the AWS legal team.
Also, because AWS knows where your data is, your data is vulnerable to political pressure. Although AWS denies political pressure had anything to do with removing the Wikileaks data, the fact remains that it could have played a role. In such a case, AWS would have to balance the cost of protecting your business against the potential of losing all its other business. No matter how big your organization is or how much money you spend with Amazon, you will be on the short end of that equation.
AWS should have the tagline, “come for the savings, hope for the service!”
Ominous Dark Clouds
Note that this is not a condemnation of AWS (or Google or Microsoft or ...) or of cloud computing in general. Rather, this incident should act as a bright, flashing indicator that something is wrong with the cloud resource protection model. The availability of an organization's data should not be held hostage to the whims of a vendor's staff.
Nor should domain lookup be held hostage by the hierarchical model of domain name servicing in common use. This model is widely recognized by professionals and totalitarian governments alike as something of an Achilles' heel, and the pervasive reliance on DNS within common software should be alarming in and of itself.
These are not overwhelming technical problems; really, they are not even especially challenging ones. Solutions to these problems already power darknets, and the cypherpunks who create darknets do so precisely because they recognize these weaknesses as problems.
As just one instance, Freenet combines dispersed, redundant, anonymized, encrypted storage with distributed name resolution and indexing. It's probably slower than the public network, but incredibly robust. An AWS-type cloud could run Freenet (or a similar darknet) and completely avoid both the need and the ability to evaluate customer data, thus removing itself as a single point of failure (although DNS would remain as a known hole, one that would either resolve itself or become irrelevant if organizations were to take the Wikileaks/Amazon lesson seriously).
While there will almost certainly remain a demand for AWS-type clouds, organizations should (and will) be looking closely in the coming weeks at realizing the benefits of cloud computing while eliminating another single point of failure.