In the recent Amazon AWS Newsletter, they asked the following:
Some customers have asked us about ways to easily convert virtual machines from VMware vSphere, Citrix Xen Server, and Microsoft Hyper-V to Amazon EC2 instances - and vice versa. If this is something that you're interested in, we would like to hear from you. Please send an email to firstname.lastname@example.org describing your needs and use case.
I'll share my reply here for comment!
This is a killer feature that allows a number of important activities.
1. Product VMs. Many suppliers are starting to provide third-party products in the form of VMs instead of software to ease install complexity, or in an attempt to move from a hardware appliance approach to a more software-based approach. This pretty much prevents their use in EC2. <cue sad music> As opposed to "Hey, if you can VM-ize your stuff then you're pretty close to being able to offer it as an Amazon AMI or even SaaS offering." <schwing!>
2. Leveraging VM Investments. For any organization that already has a VM infrastructure, being able to manage images the same way reduces cost and complexity. It also enables the much-promised but under-delivered "cloud bursting" model, where you run the same systems locally and use Amazon for excess capacity. In the current scheme I could make some AMIs "mostly" like my local VMs - but "close" is not good enough to use in production.
3. Local testing. I'd love to be able to bring my AMIs "down to me" for rapid redeploy. I often find myself having to transfer 2.5 gigs of software up to the cloud, install it, find a problem, have our devs fix it and cut another release, transfer it up again (2 hour wait time again, plus paying $$ for the transfer)...
4. Local troubleshooting. We get an app installed up in the cloud and it's not acting quite right and we need to instrument it somehow to debug. This process is much easier on a local LAN with the developers' PCs with all their stuff installed.
5. Local development. A lot of our development exercises the Amazon APIs. This is one area where Azure has a distinct advantage and can be a threat; in Visual Studio there is a "local Azure fabric" and a dev can write their app and have it running "in Azure" but on their machine, and then when they're ready deploy it up. This is slightly more than VM consumption, it's VMs plus Eucalyptus or similar porting of the Amazon API to the client side, but it's a killer feature.
Xen or VMware would be fine - frankly, this would be big enough for us that I'd change virtualization solutions to whichever one worked with EC2.
I just asked one of our developers for his take on value for being able to transition between VMs and EC2 to include in this email, and his response is "Well, it's just a no-brainer, right?" Right.
Here are a couple of useful tidbits I've gleaned.
When you start an "instance-store" Amazon EC2 instance, you get a certain amount of ephemeral storage allocated and mounted automatically. The amount of space varies by instance size, as do the storage location and format; both are spelled out in the EC2 instance types documentation.
The upshot is that if you start an "instance-store" small Linux EC2 instance, it automagically has a free 150 GB /mnt disk and a 1 GB swap partition up and runnin' for ya. (mount points vary by image, but that's where they are in the Amazon Fedora starter.)
[root@domU-12-31-39-00-B2-01 ~]# df -k
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/sda1             10321208   1636668   8160252  17% /
/dev/sda2            153899044    192072 145889348   1% /mnt
none                    873828         0    873828   0% /dev/shm
[root@domU-12-31-39-00-B2-01 ~]# free
             total       used       free     shared    buffers     cached
Mem:       1747660      84560    1663100          0       4552      37356
-/+ buffers/cache:      42652    1705008
Swap:       917496          0     917496
But, you say, I am not old or insane! I use EBS-backed images, just as God intended. Well, that's a good point. But when you pull up an EBS image, these ephemeral disk areas are not available to you. The good news is, that's just by default.
The ephemeral storage is still available and can be used (for free!) by an EBS-backed image. You just have to set the block devices up either explicitly when you run the instance or bake them into the image.
You refer to the ephemeral chunks as "ephemeral0", "ephemeral1", etc. - they don't tell you explicitly which is which, but basically you just count up based on your instance type (review the doc). For a small instance, you get an ephemeral0 (ext3, 150 GB) and an ephemeral1 (swap, 1 GB). To add them to an EBS instance and mount them in the "normal" places, you do:
ec2-run-instances <ami id> -k <your key> --block-device-mapping '/dev/sda2=ephemeral0' --block-device-mapping '/dev/sda3=ephemeral1'
On the instance you have to mount them - add these to /etc/fstab and mount -a or do whatever else it is you like to do:
/dev/sda3 swap swap defaults 0 0
/dev/sda2 /mnt ext3 defaults 0 0
And if you want to turn the swap on immediately, "swapon /dev/sda3".
You can also bake them into an image. Add an fstab like the one above, and when you create the image, do it like this, using the exact same --block-device-mapping flag:
ec2-register -n <ami name> -d "AMI Description" --block-device-mapping '/dev/sda2=ephemeral0' --block-device-mapping '/dev/sda3=ephemeral1' --snapshot your-snapname --architecture i386 --kernel <aki id> --ramdisk <ari id>
Ta da. Free storage that doesn't persist. Very useful as /tmp space. Opinion is split among the Linuxerati about whether you want swap space nowadays or not; some people say some mix of "if you're using more than 1.8 GB of RAM you're doing it wrong" and "swapping is horrid, just let bad procs die due to lack of memory and fix them." YMMV.
As another helpful tip, let's say you're adding an EBS to an image that you don't want to be persistent when the instance dies. By default, all EBSes are persistent and stick around muddying up your account till you clean them up. If you don't want certain EBS-backed drives to persist, what you do is of the form:
ec2-modify-instance-attribute --block-device-mapping "/dev/sdb=vol-f64c8e9f:true" i-e2a0b08a
Where 'true' means "yes, please, delete me when I'm done." This command throws a stack trace to the tune of
Unexpected error: java.lang.ClassCastException: com.amazon.aes.webservices.client.InstanceBlockDeviceMappingDescription cannot be cast to com.amazon.aes.webservices.client.InstanceBlockDeviceMappingResponseDescription
But it works, that's just a lame API tools bug.
What Is Microsoft Azure?
I'm going to attempt to explain Microsoft Azure in "normal Web person" language. Like many of you, I am more familiar with Linux/open source type solutions, and like many of you, my first forays into cloud computing have been with Amazon Web Services. It can often be hard for people not steeped in Redmondese to understand exactly what the heck they're talking about when Microsoft people try to explain their offerings. (I remember a time some years ago I was trying to get a guy to explain some new Microsoft data access thing with the usual three letter acronym name. I asked, "Is it a library? A language? A protocol? A daemon? Branding? What exactly is this thing you're trying to get me to uptake?" The reply was invariably "It's an innovative new way to access data!" Sigh. I never did get an answer and concluded "Never mind.")
Microsoft has released their new cloud offering, Azure. Our company is a close Microsoft partner since we use a lot of their technologies in developing our desktop software products, so as "cloud guy" I've gotten some in-depth briefings and even went to PDC this year to learn more (some of my friends who have known me over the course of my 15 years of UNIX administration were horrified). "Cloud computing" is an overloaded enough term that it's not highly descriptive, and it took a while to cut through the explanations to understand what Azure really is. Let me break it down for you and explain the deal.
Point of Comparison: Amazon (IaaS)
In Amazon EC2, as hopefully everyone knows by now, you are basically given entire dynamically-provisioned, hourly-billed virtual machines that you load OSes on and install software and all that. "Like servers, but somewhere out in the ether." Those kinds of cloud offerings (e.g. Amazon, Rackspace, most of them really) are called Infrastructure As A Service (IaaS). You're responsible for everything you normally would be, except for the data center work. Azure is not an IaaS offering but still bears a lot of similarities to Amazon; I'll get into details later.
Point of Comparison: Google App Engine (PaaS)
Take Google's App Engine as another point of comparison. There, you just upload your Python or Java application to their portal and "it runs on the Web." You don't have access to the server or OS or disk or anything. And it "magically" scales for you. This approach is called Platform as a Service (PaaS). They provide the full platform stack, you only provide the end application. On the one hand, you don't have to mess with OS level stuff - if you are just a Java programmer, you don't have to know a single UNIX (or Windows) command to transition your app from "But it works in Eclipse!" to running on a Web server on the Internet. On the other hand, that comes with a lot of limitations that the PaaS providers have to establish to make everything play together nicely. One of our early App Engine experiences was sad - one of our developers wrote a Java app that used a free XML library to parse some XML. Well, that library had functionality in it (that we weren't using) that could write XML to disk. You can't write to disk in App Engine, so its response was to disallow the entire library. The app didn't work and had to be heavily rewritten. So it's pretty good for code that you are writing EVERY SINGLE LINE OF YOURSELF. Azure isn't quite as restrictive as App Engine, but it has some of that flavor.
Windows Azure falls between the two. First of all, Azure is a real "hosted cloud" like Amazon Web Services, like most of us really think about when we think cloud computing; it's not one of these on premise things that companies are branding as "cloud" just for kicks. That's important to say because it seems like nowadays the larger the company, the more they are deliberately diluting the term "cloud" to stick their products under its aegis. Microsoft isn't doing that, this is a "cloud offering" in the classical (where classical means 2008, I guess) sense.
However, in a number of important ways it's not like Amazon. I'd definitely classify it as a PaaS offering. You upload your code to "Roles" which are basically containers that run your application in a Windows 2008(ish) environment. (There are two types - a "Web role" has a stripped down IIS provided on it, a "Worker role" doesn't - the only real difference between the two.) You do not have raw OS access, and cannot do things like write to the registry. But, it is less restrictive than App Engine. You can bundle up other stuff to run in Azure - even run Java apps using Apache Tomcat. You have to be able to install whatever you want to run "xcopy only" - in other words, no fancy installers, it needs to be something you could just copy the files to a Windows PC, without administrative privilege, and run a command from the command line and have it work. Luckily, Tomcat/Java fits that description. They have helper packs to facilitate doing this with Tomcat, memcached, and Apache/PHP/MediaWiki. At PDC they demoed Domino's Pizza running their Java order app on it and a WordPress blog running on it. So it's not only for .NET programmers. Managed code is easier to deploy, but you can deploy and run about anything that fits the "copy and run command line" model.
I find this approach a little ironic, actually. It's been a lot easier for us to get the Java and open source (well, the ones with Windows ports) parts of our infrastructure running on Azure than the Windows parts! Everybody provides Windows stuff with an installer, of course, and you can't run installers on Azure. Anyway, in its core computing model it's like Google App Engine - it's more flexible than that (good) but it doesn't do automatic scaling (bad). If it did autoscaling I'd be willing to say "It's better than App Engine in every way."
In other ways, it's a lot like Amazon. They offer a variety of storage options - blobs (like S3), tables (like SimpleDB), queues (like SQS), drives (like EBS). They have an integral CDN. They do hourly billing. Pricing is pretty similar to Amazon - it's hard to compare apples to apples exactly, but Azure compute is $0.12/hr and an Amazon small Windows image compute is $0.12/hr (Coincidence? I think not.). And you have to figure out scaling and provisioning yourself on Amazon too - or pay a lot of scratch to one of the provisioning companies like RightScale.
What's Unique and Different
Well, the largest thing that I've already mentioned is the PaaS approach. If you need OS level access, you're out of luck; if you don't want to have to mess with OS management, you're in luck! So to the first order of magnitude, you can think of Azure as "like Amazon Web Services, but the compute uses more of a Google App Engine model."
But wait, there's more!
One of the biggest things that Azure brings to the table is that, using Visual Studio, you can run a local Azure "fabric" on your PC, which means you can develop, test, and run cloud apps locally without having to upload to the cloud and incur usage charges. This is HUGE. One of the biggest pains about programming for Amazon, for instance, is that if you want to exercise any of their APIs, you have to do it "up there." Also, you can't move images back and forth between Amazon and on premise. Now, there are efforts like Eucalyptus that try to overcome some of this problem, but in the end you pretty much just have to throw in the towel and do all dev and test up in the cloud. Amazon and Eclipse (and maybe Xen) - get together and make it happen!!!!
Here's something else interesting. In a move that seems more like a decision from a typical cranky cult-of-personality open source project, they have decided that proper Web apps need to be asynchronous and message-driven, and by God that's what you're going to do. Their load balancers won't do sticky sessions (only round robin) and time out all connections between all tiers after 60 seconds without exception. If you need more than that, tough - rewrite your app to use a multi-tier message queue/event listener model. Now on the one hand, it's hard for me to disagree with that - I've been sweating our developers, telling them that's the correct best-practice model for scalability on the Web. But again you're faced with the "Well what if I'm using some preexisting software and that's not how it's architected?" problem. This is the typical PaaS pattern of "it's great, if you're writing every line of code yourself."
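To make the multi-tier queue/event-listener model concrete, here's a minimal sketch in plain Python stdlib - not the actual Azure APIs, and all the names here (submit, poll, the job payloads) are made up for illustration. The point is the shape: the front end never holds a connection open waiting on slow work, so a 60-second load balancer timeout stops mattering.

```python
# Sketch of the async queue/worker pattern: a "Web role" enqueues work and
# returns immediately; a "Worker role" drains the queue; the client polls
# for results later. Stand-ins: queue.Queue for an Azure Queue/service bus,
# a dict for a results table.
import queue
import threading

jobs = queue.Queue()      # stands in for a hosted message queue
results = {}              # stands in for a results table / blob store
results_lock = threading.Lock()

def worker():
    # "Worker role": pull jobs off the queue and process them asynchronously.
    while True:
        job_id, payload = jobs.get()
        with results_lock:
            results[job_id] = payload.upper()   # placeholder for slow work
        jobs.task_done()

def submit(job_id, payload):
    # "Web role": accept the request, enqueue it, return right away.
    jobs.put((job_id, payload))

def poll(job_id):
    # The client checks back instead of holding a connection open.
    with results_lock:
        return results.get(job_id)

threading.Thread(target=worker, daemon=True).start()
submit("order-1", "pepperoni pizza")
jobs.join()                # wait for the worker to drain the queue
print(poll("order-1"))     # prints PEPPERONI PIZZA
```

Swap the stdlib queue for a durable hosted one and this same shape survives any per-connection timeout, which is exactly the rewrite Azure is forcing on you.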
In many ways, Azure is meant to be very developer friendly. In a lot of ways that's good. As a system admin, however, I wince every time they go on about "You can deploy your app to Azure just by right clicking in Visual Studio!!!" Of course, that's not how anyone with a responsibly controlled production environment would do it, but it certainly does make for fast easy adoption in development. The curve for a developer who is "just" a C++/Java/.NET/whatever wrangler to get up and going on an IaaS solution like Amazon is pretty large comparatively; here, it's "go sign up for an account and then click to deploy from your IDE, and voila it's running on the Intertubes." So it's a qualified good - it puts more pressure on you as an ops person to go get the developers to understand why they need to utilize your services. (In a traditional server environment, they have to go through you to get their code deployed.) Often, for good or ill, we use the release process as a touchstone to also engage developers on other aspects of their code that need to be systems engineered better.
Now, that's my view of the major differences. I think the usual Azure sales pitch would say something different - I've forgotten two of their huge differentiators, their service bus and access control components. They are branded under the name "AppFabric," which as usual is a name Microsoft is also using for something completely different (a new true app server for Windows Server, including the projects formerly code-named Dublin and Velocity - think of it as a real WebLogic/WebSphere-type app server plus memcache).
Their service bus is an ESB. As alluded to above, you're going to want to use it to do messaging. You can also use Azure Queues, which is a little confusing because the ESB is also a message queue - I'm not clear on their intended differentiation really. You can of course just load up an ESB yourself in any other IaaS cloud solution too, so if you really want one you could do e.g. Apache ServiceMix hosted on Amazon. But, they are managing this one for you which is a plus. You will need to use it to do many of the common things you'd want to do.
Their access control - is a mess. Sorry, Microsoft guys. The whole rest of the thing, I've managed to cut through the "Microsoft acronyms versus the rest of the world's terms and definitions" factor, but not here. "You see, you use ACS's WIF STS to generate a SWT," says our Microsoft rep with a straight face. They seem to be excited that it will use people's Microsoft Live IDs, so if you want people to have logins to your site and you don't want to manage any of that, it is probably nice. It takes SAML tokens too, I think, though I'm not sure if the caveats around that end up equating to "Well, not really." Anyway, their explanations have been incoherent so far and I'm not smelling anything I'm really interested in behind it. But there's nothing to prevent you from just using LDAP and your own Internet SSO/federation solution. I don't count this against Microsoft because no one else provides anything like this, so even if I ignore the Azure one it doesn't put it behind any other solution.
Microsoft has said they plan to add on some kind of VM/IaaS offering eventually because of the demand. For us, the PaaS approach is a bit of a drawback - we want to do all kinds of things like "virus scan uploaded files," "run a good load balancer," "run an LDAP server", and other things that basically require more full OS access. I think we may have an LDAP direction with the all-Java OpenDS, but it's a pain point in general.
I think a lot of their decisions that are a short term pain in the ass (no installs, no synchronous) are actually good in the long term. If all developers knew how to develop async and did it by default, and if all software vendors, even Windows based ones, provided their product in a form that could just be "copy and run without admin privs" to install, the world would be a better place. That's interesting in that "Sure it's hard to use now but it'll make the world better eventually" is usually heard from the other side of the aisle.
Azure's a pretty legit offering! And I'm very impressed by their velocity. I think it's fair to say that overall Azure isn't quite as good as Amazon except for specific use cases (you're writing it all in .NET by hand in Visual Studio) - but no one else is as good as Amazon either (believe me, I evaluated them) and Amazon has years of head start; Azure is brand new but already at about 80%! That puts them into the top 5 out of the gate.
Without an IaaS component, you still can't do everything under the sun in Azure. But if you're not depending on much in the way of big third party software chunks, it's feasible; if you're doing .NET programming, it's very compelling.
Do note that I haven't focused too much on the attributes and limitations of cloud computing in general here - that's another topic - this article is meant to compare and contrast Azure to other cloud offerings so that people can understand its architecture.
I hope that was clear. Feel free and ask questions in the comments and I'll try to clarify!
For those of us getting into Amazon's Elastic Compute Cloud (EC2), this is a really cool idea. The idea is that as your load grows, a new node spins up to handle the additional capacity; once load lessens, boxes are turned off. Integrating this with box stats, response times, and per-service monitoring makes sense.
I wanted everyone to be thinking of the consumable computing model. Paying as you go for what you use is really attractive. No more do you have to have 10 boxes in your www cluster all day long if your spike is only from 8am to 3pm. Now you can run the 10 boxes during those times and use fewer boxes during non-peak times... Pretty cool. And cheap!
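The savings math above is easy to sketch. This is back-of-the-envelope only: the $0.12/hr rate is the one cited earlier for a small instance, the off-peak count of 3 boxes is an assumption, and the function name is made up.

```python
# Back-of-the-envelope pay-as-you-go math for the 10-box example above.
HOURLY_RATE = 0.12   # assumed $/hour per box (small-instance rate cited earlier)

def daily_cost(boxes_by_hour):
    """Cost of one day, given how many boxes run in each of the 24 hours."""
    return sum(n * HOURLY_RATE for n in boxes_by_hour)

# Old way: 10 boxes running 24 hours a day.
always_on = daily_cost([10] * 24)

# Elastic way: 10 boxes during the 8am-3pm spike, an assumed 3 otherwise.
elastic = daily_cost([10 if 8 <= h < 15 else 3 for h in range(24)])

print(f"always on: ${always_on:.2f}/day, elastic: ${elastic:.2f}/day")
# prints always on: $28.80/day, elastic: $14.52/day
```

Roughly half the compute bill in this toy case - and the gap only widens as the peak-to-trough ratio grows.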