
Cloud computing has brought along the promise of easy-to-scale-and-yet-affordable computer clusters. There are various clouds out there that provide Infrastructure as a Service, such as Amazon EC2, Google App Engine, Mosso, and the newcomer Force.com Sites to name a few. I personally have experience as a developer only with Amazon EC2, and I am a devoted fan and user of the entire AWS stack. Nonetheless, I believe that what I have to say here is relevant to all other platforms.
While the cloud and IaaS model have indeed many significant advantages over traditional physical hosting, there is one major annoyance still to overcome in this space, and that is: your virtual host is still connected to a physical machine. And that machine is non-redundant, it doesn’t have any hot backup, and there’s no way to transparently and hassle-free fail over from it once its malfunctioning. And this is why, from time to time I get this email from Amazon:
Hello,
We have noticed that one or more of your instances are running on a host degraded due to hardware failure.
i-XXXXXXXX
The host needs to undergo maintenance and will be taken down at XX:XX GMT on XXXX-XX-XX. Your instances will be terminated at this point.
The risk of your instances failing is increased at this point. We cannot determine the health of any applications running on the instances. We recommend that you launch replacement instances and start migrating to them.
Feel free to terminate the instances with the ec2-terminate-instance API when you are done with them.
Let us know if you have any questions.
Sincerely,
The Amazon EC2 Team
At this stage, this is one of the greatest shortcomings of EC2 from my point of view. As a customer of EC2, I don’t want to care if a host has hardware failure. Why can’t my instance just be mirrored somewhere else, consistent hot-backup style, and upon failure of host hardware be transparently switched to the backup host? I don’t care paying the extra buck for this service.
In my vision, in a true IaaS cloud there is no connection between the virtual machine and the physical host. The virtual machine is truly floating in the cloud, unbound to the physical realm by means of some consistent mirroring across physical hosts.
And you might be thinking “you can implement this on your own on the existing infrastucture that EC2 offers”, and “you should be prepared for any instance going poof”. And you are correct, at the current offering of EC2, this is the case. You always have to be prepared for an instance failure (in the last month, I had 2 physical hosts failure out of about 20, that’s about a monthly 10% (!!) ), and you always have to build your architecture so that a single host failure can fail over gracefully. But were my vision a reality, I wouldn’t have to worry about these things, and wouldn’t have to spend time and money on the overhead that they incur.
I am not certain that this is the situation in the other clouds, but if it is not, it might come with the price of less flexibility, which is a major part of EC2 on which I am not willing to give up. If that flexibility can be maintained, I would love to see my vision become a reality on EC2.
