Web Admin Blog: Real Web Admins. Real World Experience.

16 Apr 2010

Amazon Web Services – Convert To/From VMs?

In the recent Amazon AWS Newsletter, they asked the following:

Some customers have asked us about ways to easily convert virtual machines from VMware vSphere, Citrix Xen Server, and Microsoft Hyper-V to Amazon EC2 instances - and vice versa. If this is something that you're interested in, we would like to hear from you. Please send an email to aws-vm@amazon.com describing your needs and use case.

I'll share my reply here for comment!

This is a killer feature that allows a number of important activities.

1.  Product VMs.  Many suppliers are starting to ship third-party products as VMs instead of installable software, to ease install complexity or to move from a hardware appliance approach to a more software-based one.  Right now that pretty much prevents their use in EC2.  <cue sad music>  As opposed to "Hey, if you can VM-ize your stuff then you're pretty close to being able to offer it as an Amazon AMI or even a SaaS offering."  <schwing!>

2.  Leveraging VM Investments.  For any organization that already has a VM infrastructure, being able to manage images the same way reduces cost and complexity.  It also enables the much-promised but under-delivered "cloud bursting" model, where you run the same systems locally and use Amazon for excess capacity.  In the current scheme I could make some AMIs "mostly" like my local VMs - but "close" is not good enough to use in production.

3.  Local testing.  I'd love to be able to bring my AMIs "down to me" for rapid redeploy.  I often find myself having to transfer 2.5 gigs of software up to the cloud, install it, find a problem, have our devs fix it and cut another release, transfer it up again (2 hour wait time again, plus paying $$ for the transfer)...

4.  Local troubleshooting. We get an app installed up in the cloud and it's not acting quite right and we need to instrument it somehow to debug.  This process is much easier on a local LAN with the developers' PCs with all their stuff installed.

5.  Local development. A lot of our development exercises the Amazon APIs.  This is one area where Azure has a distinct advantage and can be a threat; in Visual Studio there is a "local Azure fabric" and a dev can write their app and have it running "in Azure" but on their machine, and then when they're ready deploy it up.  This is slightly more than VM consumption, it's VMs plus Eucalyptus or similar porting of the Amazon API to the client side, but it's a killer feature.

Xen or VMware would be fine - frankly, this would be big enough for us that I'd change virtualization solutions to whichever one worked with EC2.

I just asked one of our developers for his take on value for being able to transition between VMs and EC2 to include in this email, and his response is "Well, it's just a no-brainer, right?"  Right.

23 Mar 2010

Amazon EC2 EBS Instances and Ephemeral Storage

Here are a couple of tidbits I've gleaned that are useful.

When you start an "instance-store" Amazon EC2 instance, you get a certain amount of ephemeral storage allocated and mounted automatically.  The amount of space varies by instance size, as do the storage locations and format; both are laid out in the EC2 instance type documentation.

The upshot is that if you start an "instance-store" small Linux EC2 instance, it automagically has a free 150 GB /mnt disk and a 1 GB swap partition up and runnin' for ya.  (mount points vary by image, but that's where they are in the Amazon Fedora starter.)

[root@domU-12-31-39-00-B2-01 ~]# df -k
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/sda1             10321208   1636668   8160252  17% /
/dev/sda2            153899044    192072 145889348   1% /mnt
none                    873828         0    873828   0% /dev/shm
[root@domU-12-31-39-00-B2-01 ~]# free
             total       used       free     shared    buffers     cached
Mem:       1747660      84560    1663100          0       4552      37356
-/+ buffers/cache:      42652    1705008
Swap:       917496          0     917496

But, you say, I am not old or insane!  I use EBS-backed images, just as God intended.  Well, that's a good point.  But when you pull up an EBS image, these ephemeral disk areas are not available to you.  The good news is, that's just by default.

The ephemeral storage is still available and can be used (for free!) by an EBS-backed image.  You just have to set the block devices up either explicitly when you run the instance or bake them into the image.

Runtime:

You refer to the ephemeral chunks as "ephemeral0", "ephemeral1", etc. - they don't tell you explicitly which is which, but basically you just count up based on your instance type (review the doc).  A small image has an ephemeral0 (ext3, 150 GB) and an ephemeral1 (swap, 1 GB).  To add them to an EBS instance and mount them in the "normal" places, you do:

ec2-run-instances <ami id> -k <your key> --block-device-mapping '/dev/sda2=ephemeral0' \
    --block-device-mapping '/dev/sda3=ephemeral1'

On the instance you have to mount them - add these to /etc/fstab and run mount -a, or do whatever else it is you like to do:

/dev/sda3                 swap                    swap    defaults 0 0
/dev/sda2                 /mnt                    ext3    defaults 0 0

And if you want to turn the swap on immediately, "swapon /dev/sda3".
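
If you want to sanity-check what actually got attached, the instance metadata service will tell you.  A quick sketch (169.254.169.254 is the standard metadata endpoint; the device name that comes back is an example):

# List the ephemeral mappings this instance was launched with
curl -s http://169.254.169.254/latest/meta-data/block-device-mapping/
# See where a specific chunk landed - prints a device name like "sda2"
curl -s http://169.254.169.254/latest/meta-data/block-device-mapping/ephemeral0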

Image:

You can also bake them into an image.  Add an fstab like the one above, and when you create the image, do it like this, using the exact same --block-device-mapping flag:

ec2-register -n <ami name> -d "AMI Description" --block-device-mapping '/dev/sda2=ephemeral0' \
    --block-device-mapping '/dev/sda3=ephemeral1' --snapshot your-snapname --architecture i386 \
    --kernel <aki id> --ramdisk <ari id>

Ta da. Free storage that doesn't persist.  Very useful as /tmp space.  Opinion is split among the Linuxerati about whether you want swap space nowadays or not; some people say some mix of  "if you're using more than 1.8 GB of RAM you're doing it wrong" and "swapping is horrid, just let bad procs die due to lack of memory and fix them."  YMMV.

Ephemeral EBS?

As another helpful tip, let's say you're attaching an EBS volume to an instance and you don't want it to persist after the instance dies.  By default, all EBSes are persistent and stick around muddying up your account till you clean them up.  If you don't want certain EBS volumes to persist, what you do is of the form:

ec2-modify-instance-attribute --block-device-mapping "/dev/sdb=vol-f64c8e9f:true" i-e2a0b08a

Where 'true' means "yes, please, delete me when I'm done."  This command throws a stack trace to the tune of

Unexpected error: java.lang.ClassCastException: com.amazon.aes.webservices.client.InstanceBlockDeviceMappingDescription
cannot be cast to com.amazon.aes.webservices.client.InstanceBlockDeviceMappingResponseDescription

But it works, that's just a lame API tools bug.
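
If you want to verify that the flag took, the API tools have a describe counterpart; a sketch using the same instance ID as above (output format may vary by tools version):

# The mapping should come back with "true" for delete-on-termination
ec2-describe-instance-attribute i-e2a0b08a --block-device-mapping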

5 Mar 2010

Microsoft Azure for Dummies – or for Smarties?

What Is Microsoft Azure?

I'm going to attempt to explain Microsoft Azure in "normal Web person" language.  Like many of you, I am more familiar with Linux/open source type solutions, and like many of you, my first forays into cloud computing have been with Amazon Web Services.  It can often be hard for people not steeped in Redmondese to understand exactly what the heck they're talking about when Microsoft people try to explain their offerings.  (I remember a time some years ago I was trying to get a guy to explain some new Microsoft data access thing with the usual three letter acronym name.  I asked, "Is it a library?  A language?  A protocol?  A daemon?  Branding?  What exactly is this thing you're trying to get me to uptake?"  The reply was invariably "It's an innovative new way to access data!"  Sigh.  I never did get an answer and concluded "Never mind.")

Microsoft has released their new cloud offering, Azure.  Our company is a close Microsoft partner since we use a lot of their technologies in developing our company's desktop software products, so as "cloud guy" I've gotten some in-depth briefings and even went to PDC this year to learn more (some of my friends who have known me over the course of my 15 years of UNIX administration were horrified).  "Cloud computing" is an overloaded enough term that it's not highly descriptive and it took a while to cut through the explanations to understand what Azure really is.  Let me break it down for you and explain the deal.

Point of Comparison: Amazon (IaaS)

In Amazon EC2, as hopefully everyone knows by now, you are basically given entire dynamically-provisioned, hourly-billed virtual machines that you load OSes on and install software and all that.  "Like servers, but somewhere out in the ether."  Those kinds of cloud offerings (e.g. Amazon, Rackspace, most of them really) are called Infrastructure As A Service (IaaS).  You're responsible for everything you normally would be, except for the data center work.  Azure is not an IaaS offering but still bears a lot of similarities to Amazon; I'll get into details later.

Point of Comparison: Google App Engine (PaaS)

Take Google's App Engine as another point of comparison.  There, you just upload your Python or Java application to their portal and "it runs on the Web."  You don't have access to the server or OS or disk or anything.  And it "magically" scales for you.  This approach is called Platform as a Service (PaaS).   They provide the full platform stack, you only provide the end application.  On the one hand, you don't have to mess with OS level stuff - if you are just a Java programmer, you don't have to know a single UNIX (or Windows) command to transition your app from "But it works in Eclipse!" to running on a Web server on the Internet.  On the other hand, that comes with a lot of limitations that the PaaS providers have to establish to make everything play together nicely.  One of our early App Engine experiences was sad - one of our developers wrote a Java app that used a free XML library to parse some XML.  Well, that library had functionality in it (that we weren't using) that could write XML to disk.  You can't write to disk in App Engine, so its response was to disallow the entire library.  The app didn't work and had to be heavily rewritten.  So it's pretty good for code that you are writing EVERY SINGLE LINE OF YOURSELF.  Azure isn't quite as restrictive as App Engine, but it has some of that flavor.

Azure's Model

Windows Azure falls between the two.  First of all, Azure is a real "hosted cloud" like Amazon Web Services, like most of us really think about when we think cloud computing; it's not one of these on premise things that companies are branding as "cloud" just for kicks. That's important to say because it seems like nowadays the larger the company, the more they are deliberately diluting the term "cloud" to stick their products under its aegis.  Microsoft isn't doing that, this is a "cloud offering" in the classical (where classical means 2008, I guess) sense.

However, in a number of important ways it's not like Amazon.  I'd definitely classify it as a PaaS offering.  You upload your code to "Roles" which are basically containers that run your application in a Windows 2008(ish) environment.  (There are two types - a "Web role" has a stripped down IIS provided on it, a "Worker role" doesn't - the only real difference between the two.)  You do not have raw OS access, and cannot do things like write to the registry.  But, it is less restrictive than App Engine.  You can bundle up other stuff to run in Azure - even run Java apps using Apache Tomcat.  You have to be able to install whatever you want to run "xcopy only" - in other words, no fancy installers, it needs to be something you could just copy the files to a Windows PC, without administrative privilege, and run a command from the command line and have it work.  Luckily, Tomcat/Java fits that description. They have helper packs to facilitate doing this with Tomcat, memcached, and Apache/PHP/MediaWiki.  At PDC they demoed Domino's Pizza running their Java order app on it and a WordPress blog running on it.  So it's not only for .NET programmers.  Managed code is easier to deploy, but you can deploy and run about anything that fits the "copy and run command line" model.

I find this approach a little ironic actually.  It's been a lot easier for us to get the Java and open source (well, the ones with Windows ports) parts of our infrastructure running on Azure than Windows parts!  Everybody provides Windows stuff with an installer, of course, and you can't run installers on Azure.  Anyway, in its core computing model it's like Google App Engine - it's more flexible than that (good) but it doesn't do automatic scaling (bad).  If it did autoscaling I'd be willing to say "It's better than App Engine in every way."

In other ways, it's a lot like Amazon.  They offer a variety of storage options - blobs (like S3), tables (more like SimpleDB than a relational database), queues (like SQS), drives (like EBS).  They have an integral CDN.  They do hourly billing.  Pricing is pretty similar to Amazon - it's hard to totally equate apples to apples, but Azure compute is $0.12/hr and an Amazon small Windows image compute is $0.12/hr (Coincidence?  I think not.).  And you have to figure out scaling and provisioning yourself on Amazon too - or pay a lot of scratch to one of the provisioning companies like RightScale.

What's Unique and Different

Well, the largest thing that I've already mentioned is the PaaS approach.  If you need OS level access, you're out of luck;  if you don't want to have to mess with OS management, you're in luck!  So to the first order of magnitude, you can think of Azure as "like Amazon Web Services, but the compute uses more of a Google App Engine model."

But wait, there's more!

One of the biggest things that Azure brings to the table is that, using Visual Studio, you can run a local Azure "fabric" on your PC, which means you can develop, test, and run cloud apps locally without having to upload to the cloud and incur usage charges.  This is HUGE.  One of the biggest pains about programming for Amazon, for instance, is that if you want to exercise any of their APIs, you have to do it "up there."  Also, you can't move images back and forth between Amazon and on premise.  Now, there are efforts like EUCALYPTUS that try to overcome some of this problem but in the end you pretty much just have to throw in the towel and do all dev and test up in the cloud.  Amazon and Eclipse (and maybe Xen) - get together and make it happen!!!!

Here's something else interesting.  In a move that seems more like a decision from a typical cranky cult-of-personality open source project, they have decided that proper Web apps need to be asynchronous and message-driven, and by God that's what you're going to do.  Their load balancers won't do sticky sessions (only round robin) and time out all connections between all tiers after 60 seconds without exception.  If you need more than that, tough - rewrite your app to use a multi-tier message queue/event listener model.  Now on the one hand, it's hard for me to disagree with that - I've been sweating our developers, telling them that's the correct best-practice model for scalability on the Web.  But again you're faced with the "Well what if I'm using some preexisting software and that's not how it's architected?" problem.  This is the typical PaaS pattern of "it's great, if you're writing every line of code yourself."

In many ways, Azure is meant to be very developer friendly.  In a lot of ways that's good.  As a system admin, however, I wince every time they go on about "You can deploy your app to Azure just by right clicking in Visual Studio!!!"  Of course, that's not how anyone with a responsibly controlled production environment would do it, but it certainly does make for fast easy adoption in development.   The curve for a developer who is "just" a C++/Java/.NET/whatever wrangler to get up and going on an IaaS solution like Amazon is pretty large comparatively; here, it's "go sign up for an account and then click to deploy from your IDE, and voila it's running on the Intertubes."  So it's a qualified good - it puts more pressure on you as an ops person to go get the developers to understand why they need to utilize your services.  (In a traditional server environment, they have to go through you to get their code deployed.)  Often, for good or ill, we use the release process as a touchstone to also engage developers on other aspects of their code that need to be systems engineered better.

Now, that's my view of the major differences.  I think the usual Azure sales pitch would say something different - I've forgotten two of their huge differentiators, their service bus and access control components.  They are branded under the name "AppFabric," which as usual is a name Microsoft is also using for something else completely different (a new true app server for Windows Server, including projects formerly code named Dublin and Velocity - think of it as a real WebLogic/WebSphere type app server plus memcache.)

Their service bus is an ESB.  As alluded to above, you're going to want to use it to do messaging.   You can also use Azure Queues, which is a little confusing because the ESB is also a message queue - I'm not clear on their intended differentiation really.  You can of course just load up an ESB yourself in any other IaaS cloud solution too, so if you really want one you could do e.g. Apache ServiceMix hosted on Amazon.  But, they are managing this one for you which is a plus.  You will need to use it to do many of the common things you'd want to do.

Their access control - is a mess.  Sorry, Microsoft guys.  The whole rest of the thing, I've managed to cut through the "Microsoft acronyms versus the rest of the world's terms and definitions" factor, but not here.   "You see, you use ACS's WIF STS to generate a SWT," says our Microsoft rep with a straight face.   They seem to be excited that it will use people's Microsoft Live IDs, so if you want people to have logins to your site and you don't want to manage any of that, it is probably nice.  It takes SAML tokens too, I think, though I'm not sure if the caveats around that end up equating to "Well, not really."  Anyway, their explanations have been incoherent so far and I'm not smelling anything I'm really interested in behind it.  But there's nothing to prevent you from just using LDAP and your own Internet SSO/federation solution.  I don't count this against Microsoft because no one else provides anything like this, so even if I ignore the Azure one it doesn't put it behind any other solution.

The Future

Microsoft has said they plan to add on some kind of VM/IaaS offering eventually because of the demand.  For us, the PaaS approach is a bit of a drawback - we want to do all kinds of things like "virus scan uploaded files," "run a good load balancer," "run an LDAP server", and other things that basically require more full OS access.  I think we may have an LDAP direction with the all-Java OpenDS, but it's a pain point in general.

I think a lot of their decisions that are a short term pain in the ass (no installers, no synchronous connections) are actually good in the long term.  If all developers knew how to develop async and did it by default, and if all software vendors, even Windows based ones, provided their product in a form that could just be "copy and run without admin privs" to install, the world would be a better place.  That's interesting in that "Sure it's hard to use now but it'll make the world better eventually" is usually heard from the other side of the aisle.

Conclusion

Azure's a pretty legit offering!  And I'm very impressed by their velocity.  I think it's fair to say that overall Azure isn't quite as good as Amazon except for specific use cases (you're writing it all in .NET by hand in Visual Studio) - but no one else is as good as Amazon either (believe me, I evaluated them) and Amazon has years of head start; Azure is brand new but already at about 80%! That puts them into the top 5 out of the gate.

Without an IaaS component, you still can't do everything under the sun in Azure.  But if you're not depending on much in the way of big third party software chunks, it's feasible; if you're doing .NET programming, it's very compelling.

Do note that I haven't focused too much on the attributes and limitations of cloud computing in general here - that's another topic - this article is meant to compare and contrast Azure to other cloud offerings so that people can understand its architecture.

I hope that was clear.  Feel free to ask questions in the comments and I'll try to clarify!

1 Jul 2009

Velocity 2009 – Hadoop Operations: Managing Big Data Clusters

Hadoop Operations: Managing Big Data Clusters (see link on that page for preso) was given by Jeff Hammerbacher of Cloudera.

Other good references -
book: "Hadoop: The Definitive Guide"
preso: hadoop cluster management from USENIX 2009

Hadoop is an Apache project inspired by Google's infrastructure; it's software for programming warehouse-scale computers.

It has recently been split into three main subprojects - HDFS, MapReduce, and Hadoop Common - and sports an ecosystem of various smaller subprojects (hive, etc.).

Usually a hadoop cluster is a mess of stock 1 RU servers with 4x1TB SATA disks in them.  "I like my servers like I like my women - cheap and dirty," Jeff did not say.

HDFS:

  • Pools servers into a single hierarchical namespace
  • It's designed for large files, written once/read many times
  • It does checksumming, replication, compression
  • Access is from Java, C, command line, etc.  Not usually mounted at the OS level.

MapReduce:

  • Is a fault tolerant data layer and API for parallel data processing
  • Has a key/value pair model
  • Access is via Java, C++, streaming (for scripts), SQL (Hive), etc
  • Pushes work out to the data

Subprojects:

  • Avro (serialization)
  • HBase (like Google BigTable)
  • Hive (SQL interface)
  • Pig (language for dataflow programming)
  • zookeeper (coordination for distrib. systems)

Facebook used scribe (a log aggregation tool) to pull a big wad of info into hadoop, then published it out to mysql for the user dash and to oracle rac for internal use...
Yahoo! uses it too.

Sample projects hadoop would be good for - log/message warehouse, database archival store, search team projects (autocomplete), targeted web crawls...
As boxes you can use unused desktops, retired db servers, amazon ec2...

Tools they use to make hadoop include subversion/jira/ant/ivy/junit/hudson/javadoc/forrest
It uses an Apache 2.0 license

Good configs for hadoop:

  • use 7200 rpm sata, ecc ram, 1U servers
  • use linux, ext3 or maybe xfs filesystem, with noatime (an example fstab entry follows this list)
  • JBOD disk config, no raid
  • java6_14+
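
For example, the noatime bit is just a mount option in fstab; a sketch with made-up device names and mount points:

/dev/sdb1    /data/1    ext3    defaults,noatime    0 0
/dev/sdc1    /data/2    ext3    defaults,noatime    0 0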

To manage it -

unix utes: sar, iostat, iftop, vmstat, nfsstat, strace, dmesg, friends

java utes: jps, jstack, jconsole
Get the rpm!  www.cloudera.com/hadoop

config: my.cloudera.com
modes - standalone, pseudo-distrib, distrib
"It's nice to use dsh, cfengine/puppet/bcfg2/chef for config management across a cluster; maybe use scribe for centralized logging"

I love hearing what tools people are using, that's mainly how I find out about new ones!

Common hadoop problems:

  • "It's almost always DNS" - use hostnames
  • open ports
  • distrib ssh keys (expect, or the loop sketched after this list)
  • write permissions
  • make sure you're using all the disks
  • don't share NFS mounts for large clusters
  • set JAVA_HOME to new jvm (stick to sun's)
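
On the ssh key point: expect works, but if you can already log in once per box, a dumb loop does it too.  A minimal sketch, assuming ssh-copy-id is installed and a slaves file with one hostname per line (the path here is made up):

# Push the master's public key out to every worker in the slaves file
for host in $(cat /etc/hadoop/conf/slaves); do
    ssh-copy-id -i ~/.ssh/id_rsa.pub $host
done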

HDFS In Depth

1.  NameNode (master)
VERSION file shows data structs, filesystem image (in memory) and edit log (persisted) - if they change, painful upgrade

2.  Secondary NameNode (aka checkpoint node) - checkpoints the FS image and then truncates edit log, usually run on a sep node
New backup node in .21 removes need for NFS mount write for HA

3.  DataNode (workers)
stores data in local fs
stores data in blk_<id> files, round-robins through dirs
heartbeat to namenode
raw socket to serve to client

4.  Client (Java HDFS lib)
other stuff (libhdfs) more unstable

hdfs operator utilities (sample invocations after the list)

  • safe mode - when it starts up
  • fsck - hadoop version
  • dfsadmin
  • block scanner - runs every 3 wks, has web interface
  • balancer - examines ratio of used to total capacity across the cluster
  • har (like tar) archive - bunch up smaller files
  • distcp - parallel copy utility (uses mapreduce) for big loads
  • quotas
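
A few sample invocations of those, in case they're unfamiliar - paths and hostnames are made up, and the flags are the stock Apache Hadoop ones as best I know them:

hadoop fsck / -files -blocks                  # HDFS's own fsck, here on the whole namespace
hadoop dfsadmin -report                       # capacity and usage per datanode
hadoop dfsadmin -setQuota 100000 /user/bob    # cap the file/dir count under a path
hadoop archive -archiveName logs.har /user/bob/logs /user/bob    # har up a pile of small files
hadoop distcp hdfs://nn1:8020/src hdfs://nn2:8020/dest           # parallel copy via mapreduce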

has users, groups, and permissions - including x, though there's no real execution; it's used for dirs
hadoop has some access trust issues - use it through a gateway cluster or in a trusted env
audit logs - turn on in log4j.properties

has loads of Web UIs - on namenode go to /metrics, /logLevel, /stacks
non-hdfs access - HDFS proxy to http, or thriftfs
has trash (.Trash in home dir) - turn it on
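
To poke at a few of those (the port and property names below are the stock defaults of that era, so treat this as a sketch):

# NameNode web UI servlets - 50070 is the default web port
curl http://namenode:50070/metrics
curl http://namenode:50070/stacks
# Audit logs: in log4j.properties, bump the HDFS audit logger to INFO, e.g.
#   log4j.logger.org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit=INFO
# Trash: set fs.trash.interval (minutes) to a nonzero value in your site config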

includes benchmarks - testdfsio, nnbench

Common HDFS problems

  • disk capacity, esp due to log file sizes - crank up reserved space
  • slow-but-not-dead disks and NICs flapping down to a slower mode
  • checkpointing and backing up metadata - monitor that it happens hourly
  • losing write pipeline for long lived writes - redo every hour is recommended
  • upgrades
  • many small files

MapReduce

use Fair Share or Capacity scheduler
distributed cache
jobcontrol for ordering

Monitoring - They use ganglia, jconsole, nagios and canary jobs for functionality

Question - how much admin resource would you need for hadoop?  Answer - Facebook ops team had 20% of 2 guys hadooping, estimate you can use 1 person/100 nodes

He also notes that this preso and maybe more are on slideshare under "jhammerb."

I thought this presentation was very complete and bad ass, and I may have some use cases that hadoop would be good for coming up!

25 Jun 2009

Cloud Computing Panel Discussion

Next up at the Cloud Computing and Virtualization Security half-day seminar was a Cloud Computing Panel moderated by Rich Mogull (Analyst/CEO at Securosis) with Josh Zachary (Rackspace), Jim Rymarczyk (IBM), and Phil Agcaoili (Dell) participating in the panel.  My notes from the panel discussion are below:

Phil: Little difference between the outsourcing of the past and today's Cloud Computing.  All of that stuff is sitting outside of your environment, and we've been evolving toward that for a long time.

Rich: My impression is that there are benefits to outsourced hosting, but there are clearly areas that make sense and areas that don't.  This is fundamentally different from shared computing resources.  Very different applications for this.  Complexity goes up very quickly for security controls.  Where do you see the most value today?  Where do people need to be most cautious?

Jim: Internal virtualization is almost necessary, but it impacts almost every IT process.  Technology is still evolving and is far from advanced state.  Be pragmatic and find particular applications with a good ROI.

Josh: Understand what you are putting into a cloud environment.  Have a good understanding of what a provider can offer you in terms of sensitive data.  Otherwise you're putting yourself in a very bad situation.  A lot of promise.  Great for social networking and web development.  Not appropriate for enterprises with large amounts of IP and sensitive data.

Jim: We'll get there in 4-5 years.

Phil: Let supply chain experts do it for you and then interact with them.  Access their environment from anywhere.  Use a secure URL with a federated identity.  Your business will come back to you and say "We need to do this" and IT will be unable to assist them.  Use it as an opportunity to mobilize compliance and InfoSec and get involved.  It's going to come to us and we're just going to have to deal with it.  There's a long line of people with a "right to audit".  Don't assume that someone is doing the right thing in this space; you have to ask.

Audience: What is the most likely channel for standards?

Phil: The Cloud Security Alliance is a step in the right direction.  They want to come up with PCI DSS-like checklists.  The CSA is working alongside IEEE and NIST; the goal is to feed the standards process, not become a standards body.

Rich: The market is anti-standards based.  If we get standardized, then all of the providers are only competing based on cost.

Jim: I think it'll happen.  We will see ISO groups for standards on cloud quality.

Audience: Moving data between multiple clouds.  How do you determine who gets paid?

Jim: There are proposals for doing that.  All of the resource parameters.

Phil: Should see standards based on federated identity.  Who is doing what and where.  That's where I've seen the most movement.  There is no ISO for SaaS.  Remapping how 27001 and 27002 apply to us as a software provider.

Audience: Two things drive standards: the market, or a monopoly (Betamax).

Rich: We will have monopolistic ones and then 3rd parties that say they use those standards.

Audience: How can you really have an objective body create standards without being completely embedded in the technology?

Jim: You create a reference standard and the market drives that.

Phil: Gravity pulls us to things that work.  Uses SAML as an example.  It's the way the internet has always worked.  The strongest will survive and the right standards will manifest themselves.

Rich: What are some of things that you're dealing with internally (as consumers and providers) and the top suggestions for people stuck in this situation?

Jim: People who don't have all of the  requirements do public clouds.  If what you want is available (salesforce.com), it may be irresistible.

Josh: The solution needs to be appropriate to the need.  Consult with your attorney to make sure your contract is in line with what you're leveraging the provider for.  It's really about what you agree to with that provider and their responsibilities.

Phil: The hurricane is coming.  You can't scream into the wind, you gotta learn to run for cover.  Find the safe spot.

Audience: What industries do you see using this?  I don't see it with healthcare.

Phil: Mostly providers for us.  Outsourcing service desks.  Government.  Large states/local.

Josh: Small and medium retail businesses.  Get products out there at a significantly reduced cost.

Jim: Lots of financial institutions looking for ways to cut costs.  Healthcare industry as well (Mayo Clinic).  Broad interest across the whole market, but especially anywhere they're under extreme cost measures.

Rich: I run a small business that couldn't afford a full virtual hosting provider, so it picked an elastic provider.  Doing shared hosting right now, but capable of growing to a virtual private server.  Have redundancy.  Able to go full colocation if they need it.  Able to support growth, but starting with the same instance to get there.

Audience: How does 3rd party transparency factor into financial uses?

Jim: Almost exclusively private clouds.  There are use cases playing out right now that will become repeatable patterns.

Phil: When the volume isn't there, offload to someone like Rackspace and they'll help you to grow.

Audience: Are there guidelines to contracts to make sure information doesn't just get outsourced to yet another party?

Phil: Steal contracts from your largest partners/vendors and use them as templates.

Audience: What recourse do you have that an audit is used to verify that security is not an issue?

Rich: Contracts.

Phil: Third-party assessment (i.e. the right to audit).  It's in our interest to verify they are secure.  It's a trend, and we now have a long list of people looking to audit us as a provider.  Hoping for an ISO standard to come along that's truly for the cloud.

Audience: Is cloud computing just outsourcing?

Rich: It's more than that.  For example, companies have internal clouds that aren't outsourced at all.

Josh: Most of the time it's leveraging resources more efficiently at hopefully a reduced cost.

Audience: How do I know you're telling me the truth about the resources I'm using?  What if I'm a bad guy who wants to exploit a competitor using the cloud?

Josh: We've seen guys create botnets using stolen credit cards.  What you're billed for is in your contract.

Jim: We've had this solved for decades on mainframes.  Precious resources propagated amongst users.  There's no technical reason we're not doing it today.

Rich: It depends what type of cloud you're using.  Some will tell you.

Josh: If you're worried about someone abusing you, why are you there in the first place?

Phil: For our service desk we meter this by how many calls, by location.  Monitor servers that were accessed/patched/etc.  Different service providers will have different levels.

Audience: Seeing some core issues at the heart of this.  For businesses, an assessment of core competencies.  Can you build a better data center with the cloud?  Second issue involves risk assessment.  Can you do a technical audit?  Can you pay for it legally?  How much market presence does the vendor have?  Who has responsibility for what?  Notion of transparency of control.  Seems like it distills down to those core basics.

Jim: I agree.

Rich: Well said.

Phil: Yes, yes, yes.

Audience: How do you write a contract for failed nation states, volatility, etc?  Do we say you can't put our stuff in these countries?

Phil: This is the elephant in the room.  How can you ensure that my data is being protected the way I'd protect it myself?  It's amazing what other people do when they get a hold of that stuff.  This is the underlying problem that we have to solve.  "We're moving from a single-family home to a multi-tenant condo.  How do we build that now?"

Rich: You need to be comfortable with what you're putting out there.

Audience: To what extent is the military or federal government using cloud computing?

Jim: They're interested in finding ways, but they don't talk about how they're using it.

Audience - Vern: They're doing cloud computing using an internal private cloud already.  They bill back to the appropriate agency based on use.

Phil: Government is very wary of what's going on.

25 Jun 2009

Introduction to Cloud Computing and Virtualization Security

Today the Austin ISSA and ISACA chapters held a half-day seminar on Cloud Computing and Virtualization Security.  The introduction on cloud computing was given by Vern Williams.  My notes on this topic are below:

5 Key Cloud Characteristics

  • On-demand self-service
  • Ubiquitous network access
  • Location independent resource pooling
  • Rapid elasticity
  • Pay per use

3 Cloud Delivery Models

  • Software as a Service (SaaS): The provider's applications, delivered over a network
  • Platform as a Service (PaaS): Deploy customer-created apps to a cloud
  • Infrastructure as a Service (IaaS): Rent processing, storage, etc

4 Cloud Deployment Models

  • Private cloud: Enterprise owned or leased
  • Community cloud: Shared infrastructure for a specific community
  • Public cloud: Sold to the public, Mega-scale infrastructure
  • Hybrid cloud: Composition of two or more clouds
  • Two types: internal and external
  • http://csrc.nist.gov/groups/SNS/cloud-computing/index.html

Common Cloud Characteristics

  • Massive scale
  • Virtualization
  • Free software
  • Autonomic computing
  • Multi-tenancy
  • Geographically distributed systems
  • Advanced security technologies
  • Service oriented software

Pros

  • Lower central processing unit (CPU) density
  • Flexible use of resources
  • Rapid deployment of new servers
  • Simplified recovery
  • Virtual network connections

Cons

  • Complexity
  • Potential impact of a single component failure
  • Hypervisor security issues
  • Keeping virtual machine (VM) images current
  • Virtual network connections

Virtualization Security Concerns

  • Protecting the virtual fabric
  • Patching off-line VM images
  • Configuration Management
  • Firewall configurations
  • Complicating Audit and Forensics

1 Oct 2008

Amazon Web Services S3, EC2 and other AWS services

First Speaker: VP of Amazon Web Services - Adam Selipsky

Motivation for building AWS - Scaling Amazon.com through the 90's was really rough.  10 years of growth caused a lot of headaches.

What if you could outsource IT Infrastructure?  What would this look like?
Needs:
Storage
Compute abilities
Database
Transactions
Middleware

Core Services:
Reliability
Scalability - Lots of companies have spiky business periods
Performance - CoLo facilities and other silos in the past have shown that developers do not want slowness and won't accept it
Simplicity - No learning curve or as little as possible
Cost Effective - Prices are public and pay as you go.  No hidden fees.  Capital expenses cut way down for startups

Initial Suite of services: S3, EC2, SimpleDB, FPS, DevPay, SQS, Mechanical Turk

Cloud computing is a buzzword for letting your infrastructure be managed by someone else.  Time to market is huge since you don't have to buy boxes, CoLo hosting, bandwidth, and more.

Second Speaker:  Jinesh Varia, Evangelist of AWS
Promised to show their roadmap for the next 2 years.
Amazon has 3 business units
Amazon.com, Amazon Services for Sellers, and Amazon Web Services
Amazon has already spent $2 billion on infrastructure for AWS

Analogy - electricity: where it's generated doesn't really add any value.  There is a certain amount of undifferentiated services: server hosting, bandwidth, hardware, contracts, moving facilities, ...  The idea-to-product delay is huge.

Example of Animoto.com

They own no hardware.  None.  Serverless startup.

They went from 40 servers to 5000 in 3 days.  Facebook app.  Signed 25,000 users up every hour

Use Cases
Media Sharing and Distribution
Batch and Parallel Processing
Backup and Archive and Recovery
Search Engines
Social Networking Apps
Financial Applications and Simulations

What do you need?
S3, EC2, SimpleDB, FPS, DevPay, SQS, Mechanical Turk

S3
50,000 Transactions Per Second is what S3 is running right now.
99.9% Uptime

EC2
Unlimited Compute power
Scale capacity up or down.  Linux and OpenSolaris (uggh, Solaris) are supported
Elastic Block Store is finally here!  Yay!

SimpleDB
Not relational, no SQL, but highly available and highly accessible.  Indexes data...

SQS
Acts as glue to tie all the services together.  A transient buffer?  Not sure how I feel about that.

DevPay and FPS
Developers get to use Amazon's billing infrastructure.  Sounds lame and sort of pyramid-scheme-y

Mechanical Turk
Allows you to get people on demand.  Perfect for high-volume micro tasks.  Human Intelligence Tasks.  Outsource dummy work, I guess...  Not sure.

Sample Architecture
Podango

He wrote a Cloud Architecture PDF

Future Roadmap

Focus on security features and certifications
Continued focus and operational excellence
US and international expansion
Localization of technical resources
Amazon EC2 GA and SLA - out of beta and SLA delivered << This is really good for us!  Now if gmail would get out of beta after 5 years!
Windows Server Support
Additional services

Amazon Start-Up Challenge is open.  100K

aws.amazon.com/blog

Jinesh Varia, jvaria@amazon.com

Customer Testimonials
Splunk used AWS to host a development camp and start an instance.  Emailed instructions and SSH keys.  Free, open source.  DevCamp.
Fabulatr (on Google Code): it starts up an instance, gets it ready, and sends an email with the SSH key to the user.
Another use case - sales engineering: POCs, joint work with support, a place to play, Splunk live demos.
There are some videos on the Splunk blog.
Put splunk in your cloud

Resources
download.splunk.com

blogs.splunk.com/thewilde  -> Inside the Cloud Video

code.google.com/p/fabulatr

RightScale - you can't use ElasticFox from an iPhone, but you can use RightScale

OtherInbox

Launched on Monday.  Helps users manage their inbox.  Emails from OnStar, receipts from Apple.  OtherInbox allows me to give out different addresses.
facebook@james.otherinbox.com
Seems like a cool app.
Use Google Docs to grab information ad hoc.
They use DBs on EBS in a master/slave relationship for SQL; formerly on EC2 w/o EBS - now EBS is awesome.
Built on Ruby on Rails (MVC) and SproutCore (a JavaScript framework)

austincloudcomputing.com

MyBaby Our Baby
Share, Organize, Save all of the videos and pictures for kids
Invite friends and family to your site, they get emails about your kids when you add content
Other people can add photos of your children and pictures from other parents (at the park, babysitter, ...)
Uses S3 only

Architecture for LB

Two front-end load-balancing proxy servers that hit the right app servers.
Need to read up on Scalr (which uses Pound); HAProxy was also recommended.  He also mentioned that Scalr is cool, but AWS is coming out with a load balancer and tooling for us to use.  He said to give it some time, but they would have something for us!
http://aws.typepad.com/aws/2008/04/scalr-.html
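
For flavor, a front-end proxy like the one described is only a few lines of HAProxy config.  A minimal sketch, with made-up server names and IPs:

# Round-robin two app servers behind one front end
listen webfarm 0.0.0.0:80
    mode http
    balance roundrobin
    option httpchk GET /
    server app1 10.0.0.11:80 check
    server app2 10.0.0.12:80 check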

GoDaddy vs AWS.  GoDaddy sucks...  but under all circumstances, "you need a geek" to get this running.

You need a Linux system administrator under all circumstances, and a lot of people seemed miffed by this.  I don't see what the big deal is: under the AWS scenario, you don't need all the infrastructure (hardware) you needed before, and you need a lot fewer people than with the traditional model.  You always still need someone who knows how to work the systems, but now you need fewer of them, and you really need people who are Linux admins but also web admins who know traditional web services and applications.  There will never be a magic button that just spins up servers ready to go for your unique app.  Amazon makes it easier, but you still need a geek...  They make the world work...

Amazon has a long track record for success and there is a lot of trust from Other Inbox.

5 Aug 2008

Cloud Headaches?

The industry is abuzz with people who are freaked out about the outages that Amazon and other cloud vendors have had.  "Amazon S3 Crash Raises Doubts Among Cloud Customers," says InformationWeek!

This is because people are going into cloud computing with absurdly high expectations.  This year at Velocity, Interop, etc., I've seen people just totally in love with cloud computing - Amazon's specifically, but in general as well.  And it's a good concept for certain applications.  However, it is a computing system just like every other computing system devised previously by man.  And it has, and will have, problems.

Whether you are using in house systems, or a SaaS vendor, or building "in the cloud," you have the same general concerns.  Am I monitoring my systems?  What is my SLA?  What is my recourse if my system is not hitting it?  What's my DR plan?

Cloud computing is also being called "PaaS," or Platform as a Service.  It's a special case of SaaS.  And if you're a company relying on it, when you contract with a SaaS vendor you get SLAs established and figure out what the remedy is if they breach it.  If you are going into a relationship where you are just paying money for a cloud VM, storage, etc. and there is no enforceable SLA in the relationship, then you need to build the risk of likely and unremediable outages into your business plan.

I hate to break it to you, but the IT people working at Amazon, Google, etc. are not all that much smarter than the IT people working with you.  So an unjustified faith in a SaaS or cloud vendor - "Oh, it's Amazon, I'm sure they'll never have an outage of any sort - their entire system or localized to my part - and if they do I'm sure the $100/month I'm paying them will cause them to give a damn about me" - is unreasonable on its face.

Clouds and cloud vendors are a good innovation.  But they're like every other computing innovation and vendor selling it to you.  They'll have bugs and failures.  Expecting them to be different is a failure on your part, not theirs.

12 Jun 2008

Scalr project and AWS

http://code.google.com/p/scalr/

For those of us getting into Amazon's Elastic Compute Cloud (EC2), this is a really cool idea.  The idea is that as your load grows, a new node is spun up to handle the additional capacity.  Once load lessens, boxes are turned off.  Integrating this with box stats, response times, and per-service monitoring makes sense.

I wanted everyone to be thinking of the consumable computing model.  Paying as you go for what you use is really attractive.  No more do you have to have 10 boxes in your www cluster all day long if your spike is only from 8am to 3pm.  Now you can run the 10 boxes during those times and use fewer boxes during non-peak times...  Pretty cool.  And cheap!
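
To make the model concrete, here's a toy sketch of the core loop.  This is not Scalr's implementation, just the idea; the AMI ID, hostname, key name, and thresholds are all made up, and it assumes the ec2-api-tools are installed and configured:

#!/bin/sh
# Toy scale-up/scale-down loop - the Scalr idea, not the Scalr code
AMI=ami-12345678            # hypothetical web-tier image
FRONTEND=www1.example.com   # box whose load we watch

while true; do
    # 1-minute load average on the front end
    LOAD=$(ssh $FRONTEND cat /proc/loadavg | cut -d' ' -f1)
    if [ "$(echo "$LOAD > 4.0" | bc)" -eq 1 ]; then
        ec2-run-instances $AMI -k mykey    # load is high - add a box
    elif [ "$(echo "$LOAD < 0.5" | bc)" -eq 1 ]; then
        # load is low - find a running instance of our AMI and turn one off
        ID=$(ec2-describe-instances | awk -v a=$AMI '$1=="INSTANCE" && $3==a && $6=="running" {print $2; exit}')
        [ -n "$ID" ] && ec2-terminate-instances $ID
    fi
    sleep 300
done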