Web Admin Blog: Real Web Admins. Real World Experience.

16 Apr 2010

Amazon Web Services – Convert To/From VMs?

In the recent Amazon AWS Newsletter, they asked the following:

Some customers have asked us about ways to easily convert virtual machines from VMware vSphere, Citrix Xen Server, and Microsoft Hyper-V to Amazon EC2 instances - and vice versa. If this is something that you're interested in, we would like to hear from you. Please send an email to aws-vm@amazon.com describing your needs and use case.

I'll share my reply here for comment!

This is a killer feature that allows a number of important activities.

1.  Product VMs.  Many suppliers are starting to provide third-party products in the form of VMs instead of software to ease install complexity, or in an attempt to move from a hardware appliance approach to a more-software approach.  This pretty much prevents their use in EC2.  <cue sad music>  As opposed to "Hey, if you can VM-ize your stuff then you're pretty close to being able to offer it as an Amazon AMI or even SaaS offering."  <schwing!>

2.  Leveraging VM Investments.  For any organization that already has a VM infrastructure, it allows for reduction of cost and complexity to be able to manage images in the same way.  It also allows for the much promised but under-delivered "cloud bursting" theory where you can run the same systems locally and use Amazon for excess capacity.  In the current scheme I could make some AMIs "mostly" like my local VMs - but "close" is not good enough to use in production.

3.  Local testing.  I'd love to be able to bring my AMIs "down to me" for rapid redeploy.  I often find myself having to transfer 2.5 gigs of software up to the cloud, install it, find a problem, have our devs fix it and cut another release, transfer it up again (2 hour wait time again, plus paying $$ for the transfer)...

4.  Local troubleshooting. We get an app installed up in the cloud and it's not acting quite right and we need to instrument it somehow to debug.  This process is much easier on a local LAN with the developers' PCs with all their stuff installed.

5.  Local development. A lot of our development exercises the Amazon APIs.  This is one area where Azure has a distinct advantage and can be a threat; in Visual Studio there is a "local Azure fabric," so a dev can write their app, have it running "in Azure" but on their machine, and then deploy it up when they're ready.  This is slightly more than VM consumption - it's VMs plus Eucalyptus or a similar porting of the Amazon API to the client side - but it's a killer feature.
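To make #5 concrete - and this is only a sketch of what I'm asking for, not something Amazon ships today - here's roughly what "develop against the EC2 API, but run it locally" could look like if you pointed an AWS SDK (boto3 here, purely for illustration) at a local EC2-compatible endpoint such as a Eucalyptus install.  The endpoint URL and credentials are placeholders:

    # Point an EC2 client at a local EC2-compatible endpoint (e.g. a Eucalyptus-style
    # cloud on the LAN) instead of Amazon, so the same code runs locally or "for real."
    import boto3

    LOCAL_EC2_ENDPOINT = "http://localhost:8773/services/compute"  # hypothetical local endpoint

    ec2 = boto3.client(
        "ec2",
        endpoint_url=LOCAL_EC2_ENDPOINT,   # drop this argument to talk to Amazon proper
        region_name="us-east-1",
        aws_access_key_id="LOCAL_KEY",     # placeholder credentials
        aws_secret_access_key="LOCAL_SECRET",
    )

    # The application code is identical either way.
    for reservation in ec2.describe_instances()["Reservations"]:
        for instance in reservation["Instances"]:
            print(instance["InstanceId"], instance["State"]["Name"])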

Xen or VMware would be fine - frankly, this would be big enough for us that I'd change virtualization solutions to whichever one worked with EC2.

I just asked one of our developers for his take on value for being able to transition between VMs and EC2 to include in this email, and his response is "Well, it's just a no-brainer, right?"  Right.

24 Feb 2010

A Case For Images

After speaking with Luke Kanies at OpsCamp, and reading his good and oft-quoted article "Golden Image or Foil Ball?", I was thinking pretty hard about the use of images in our new automated infrastructure.  He's pretty against them.  After careful consideration, however, I think judicious use of images is the right thing to do.

My top-level thoughts on why to use images:

  1. Speed - Starting a prebuilt image is faster than reinstalling everything on an empty one.  In the world of dynamic scaling, there's a meaningful difference between a "couple minute spinup" and a "fifteen minute spinup."
  2. Reliability - The more work you are doing at runtime, the more there is to go wrong.  I bet I'm not the only person who has run the same compile and install on three allegedly identical Linux boxen and had it go wrong somehow on one of 'em.  And the more stuff you're pulling to build your image, the more failure points you have.
  3. Flexibility - Dynamically building from a stem cell kinda makes sense if you're using 100% free open source and have everything automated.  What if, however, you have something that you need to install that just hasn't been scripted - or is very hard to script?  Like an install of some half-baked Windows software that doesn't have a command line installer and you don't have a tool that can do it?  In that case, you really need to do the manual install in non-realtime as part of an image build.  And of course many suppliers are providing software as images themselves nowadays.
  4. Traceability - What happens if you need to replicate a past environment?  Having the image is going to be a 100% effective solution to that, even likely to be sufficient for legal reasons.  "I keep a bunch of old software repo versions so I can mostly build a machine like it" - somewhat less so.

In the end, it's a question of using intermediate deliverables.  Do you recompile all the code and every third party package every time you build a server?  No, you often use binaries - it's faster and more reliable.  Binaries are the app guys' equivalent of "images."

To address Luke's three concerns from his article specifically:

  1. Image sprawl - if you use images, you eventually have a large library of images you have to manage.  This is very true - but you have to manage a lot of artifacts all up and down the chain anyway.  Given the "manual install" and "vendor supplied image" scenarios noted above, if you can't manage images as part of your CM system, then it's just not a complete CM system.
  2. Updating your images - Here, I think Luke makes some not entirely valid assumptions.  He notes that once you're done building your images, you're still going to have to make changes in the operational environment ("bootstrapping").  True.  But he assumes you're not going to use the same tool to do it.  I'm not sure why not - our approach is to use automated tooling to build the images (you don't *want* to do it manually, for sure), and Puppet/Chef/etc. work just fine for that.  So if you have to update something at the OS level, you do that, let your CM system blow everything on top, and then burn the image (see the sketch after this list).  Image creation and automated CM aren't mutually exclusive - the only reason people don't use automation to build their images is the same reason they don't always use automation on their live servers, which is "it takes work."  But to me, since you DO have to have some amount of dynamic CM for the runtime bootstrap as well, it's a good conservation of work to use the same package for both.  (Besides bootstrapping, there's other stuff, like moving content, that shouldn't go on images.)
  3. Image state vs running state - This one puzzles me.  With images, you do need to do restarts to pull in image-based changes.  But with virtually all software and app changes you have to as well - maybe not a "reboot," but a "service restart," which is virtually as disruptive.  Whether you "reboot your database server" or "stop and start your database server, which still takes a couple minutes," you are planning for downtime or have redundancy in place.  And in general you need to orchestrate the changes (rolling restarts, etc.) in a way that "oh, pull that change whenever you want, Mr. Application Server" doesn't really support.
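Just to make that concrete, here's a rough sketch of the "CM builds the box, then burn the image" flow.  It uses boto3 against EC2 purely for illustration; the base AMI, instance type, bootstrap script, and the wait time are placeholders, not our actual build tooling:

    # Launch a stock base image, let the CM tool (puppet here, as an example) lay
    # everything on top via user-data, then snapshot the result as a new AMI.
    import time
    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    BOOTSTRAP = """#!/bin/bash
    # Install and run the same CM agent that manages the live servers.
    yum install -y puppet
    puppet agent --test --server puppet.example.com
    """

    # 1. Launch a base instance and let CM blow everything on top.
    instance_id = ec2.run_instances(
        ImageId="ami-BASEIMAGE",          # placeholder base AMI
        InstanceType="m1.small",
        MinCount=1, MaxCount=1,
        UserData=BOOTSTRAP,
    )["Instances"][0]["InstanceId"]

    ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])
    time.sleep(600)  # crude: give the CM run time to finish (better: poll for a "done" marker)

    # 2. Burn the configured instance into a new image.
    image_id = ec2.create_image(InstanceId=instance_id, Name="app-image-20100224")["ImageId"]
    print("New AMI:", image_id)

    # 3. Clean up the build instance once the image is available.
    ec2.get_waiter("image_available").wait(ImageIds=[image_id])
    ec2.terminate_instances(InstanceIds=[instance_id])

The point is that the image is never hand-built - it's just a cached output of the same automated CM run you'd otherwise do at boot time.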

In closing, I think images are useful.  You shouldn't treat them as a replacement for automated CM - they should be interim deliverables usually generated by, and always managed by, your automated CM.  If you just use images in an uncoordinated way, you do end up with a foil ball.  With sufficient automation, however, they're more like Russian nesting dolls, and have advantages over starting from scratch with every box.

25 Jun 2009

Virtualization Security Best Practices from a Customer’s and Vendor’s Perspective

The next session during the ISSA half-day seminar on Virtualization and Cloud Computing Security was on security best practices from a customer and vendor perspective.  It featured Brian Engle, CIO of Temple Inland, and Rob Randell, CISSP and Senior Security Specialist at VMware, Inc.  My notes from the presentation are below:

Temple Inland Implementation - Stage 1

Overcome Hurdles

  • Management skeptical of Windows virtualization

Don't Fear the Virtual World

  • First year:
    • Built out development only environment
    • Trained staff
    • Developed support processes
    • Showed hard dollar savings

Temple Inland - Stage 2

  • Build QA environment
  • Improve processes
  • Develop rapid provisioning
  • Demonstrate advanced functions
    • VMotion
    • P2V Conversions

Temple Inland - Stage 3

First production environment

Temple-Inland Implementation

  • Prior to VMware: typical remote facility
    • Physical domain controller
    • Physical application/file server
    • Physical tape drive
  • New architecture
    • Single VMWare server
    • No tape drive
  • Desktops
    • Virtualize desktops through VMWare
    • No application issues like Citrix Metaframe
    • Quick deployment and repair

How Virtualization Affects Datacenter Security

  • Abstraction and Consolidation
    • +Capital and Operational Cost Savings
    • -New infrastructure layer to be secured
    • -Greater impact of attack or misconfiguration
  • Collapse of Switches and servers into one device
    • +Flexibility
    • +Cost-savings
    • -Lack of virtual network visibility
    • -No separation-by-default of administration

Temple-Inland split the teams so that there was a virtual network administration team within the server administration team.

How Virtualization Affects Datacenter Security

  • Faster deployment of servers
    • + IT responsiveness
    • -Lack of adequate planning
    • -Incomplete knowledge of current state of infrastructure
  • VM Mobility
    • +Improved Service Levels
    • -Identity divorced from physical location
  • VM Encapsulation
    • +Ease of business continuity
    • +Consistency of deployment
    • +Hardware Independence
    • -Outdated offline systems

Build anti-virus, client firewalls, etc. into the offline images so that servers are up-to-date right when they are installed.

If something happens to a system, you can't just pull the plug anymore.  You have to have policies and processes in place.

With virtualization you can have a true "gold image" instead of having different images for all of the different types of hardware.

Security Advantages of Virtualization

  • Allows automation of many manual error prone processes
  • Cleaner and easier disaster recovery/business continuity
  • Better forensics capabilities
  • Faster recovery after an attack
  • Patching is safer and more effective
  • Better control over desktop resources
  • More cost effective security devices
  • App virtualization allows de-privileging of end users
  • Better lifecycle controls
  • Future: Security through VM Introspection

Gartner: "Like their physical counterparts, most security vulnerabilities will be introduced through misconfiguration"

What Not to Worry About

  • Hypervisor Attacks
    • ALL theoretical, highly complex attacks
    • Widely recognized by security community as being only of academic interest
  • Irrelevant Architectures
    • Apply only to hosted architectures (i.e. Workstation), not bare-metal (i.e. ESX)
    • Hosted architecture generally suitable only when you can trust the guest VM
  • Contrived Scenarios
    • Involve exploits where best practices around hardening, lockdown, design, etc. for virtualization are not followed, or
    • Poor general IT infrastructure security is assumed

Are there any Hypervisor Attack Vectors?

To date, there are no known hypervisor attack vectors that have led to "VM Escape."

  • Architecture Vulnerability
    • Designed specifically with isolation in mind
  • Software Vulnerability - Possible like with any code written by humans
    • Mitigating Circumstances:
      • Small Code Footprint of Hypervisor (~21MB) is easier to audit
      • If a software vulnerability is found, exploit difficulty will be very high
        • Purpose-built for virtualization only
        • Non-interactive environment
        • Less code for hackers to leverage
    • Ultimately depends on VMware's security response and patching

Concern: Virtualizing the DMZ/Mixing Trust Zones

Three Primary Configurations

  • Physical separation of trust zones
  • Virtual separation of trust zones with physical security devices
  • Fully collapsing all servers and security devices into a VI3 infrastructure

This also applies to PCI requirements.

Physical Separation of Trust Zones

Advantages

  • Simpler, less complex configuration
  • Less change to physical environment
  • Little change to separation of duties
  • Less change in staff knowledge requirements
  • Smaller chance of misconfiguration

Disadvantages

  • Lower consolidation and utilization of resources
  • Higher cost

Virtual Separation of Trust Zones with Physical Security Devices

Advantages

  • Better utilization of resources
  • Take full advantage of virtualization benefits
  • Lower cost

Disadvantages (can be mitigated)

  • More complexity
  • Greater chance of misconfiguration

Getting more toward "the cloud" where web zone, app zone, and DB zone are all virtualized on the same system, but still using physical firewalls.

Fully Collapsed Trust Zones Including Security Devices

Advantages

  • Full utilization of resources, replacing physical security devices with virtual
  • Lowest-cost option
  • Management of entire DMZ and network from a single management workstation

Disadvantages

  • Greatest complexity, which in turn creates highest chance of misconfiguration
  • Requirement for explicit configuration to define separation of duties to help mitigate the risk of misconfiguration; also requires regular audits of configurations
  • Potential loss of certain functionality, such as VMotion (being mitigated by vendors and VMsafe)

How do we secure our Virtual Infrastructure?

Use the principles of Information Security

  • Hardening and lockdown
  • Defense in depth
  • Authorization, authentication, and accounting
  • Separation of duties and least privileges
  • Administrative controls

Protect your management interfaces (vCenter)!  They are the keys to the kingdom.

Fundamental Design Principles

  • Isolate all management networks
  • Disable all unneeded services
  • Tightly regulate all administrative access

Summary

  • Define requirements and ensure vendor/product can deliver
    • Consider culture, capability, maturity, architecture and security needs
  • Implement under controlled conditions using a defined methodology
    • Use the opportunity to improve control deficiencies in existing physical server areas if possible
    • Implement processes for review and validation of controls to prevent the introduction of weaknesses
  • Round corners where your control environment allows
    • Sustain sound practices that maintain required controls
    • Leverage the technology to achieve efficiency and improve scale

25 Jun 2009

Introduction to Cloud Computing and Virtualization Security

Today the Austin ISSA and ISACA chapters held a half-day seminar on Cloud Computing and Virtualization Security.  The introduction on cloud computing was given by Vern Williams.  My notes on this topic are below:

5 Key Cloud Characteristics

  • On-demand self-service
  • Ubiquitous network access
  • Location-independent resource pooling
  • Rapid elasticity
  • Pay per use

3 Cloud Delivery Models

  • Software as a Service (SaaS): Provider's applications delivered over a network
  • Platform as a Service (PaaS): Deploy customer-created apps to a cloud
  • Infrastructure as a Service (IaaS): Rent processing, storage, etc

4 Cloud Deployment Models

  • Private cloud: Enterprise owned or leased
  • Community cloud: Shared infrastructure for a specific community
  • Public cloud: Sold to the public, Mega-scale infrastructure
  • Hybrid cloud: Composition of two or more clouds
  • Two types: internal and external
  • http://csrc.nist.com/groups/SNS/cloud-computing/index.html

Common Cloud Characteristics

  • Massive scale
  • Virtualization
  • Free software
  • Autonomic computing
  • Multi-tenancy
  • Geographically distributed systems
  • Advanced security technologies
  • Service oriented software

Pros

  • Lower central processing unit (CPU) density
  • Flexible use of resources
  • Rapid deployment of new servers
  • Simplified recovery
  • Virtual network connections

Cons

  • Complexity
  • Potential impact of a single component failure
  • Hypervisor security issues
  • Keeping virtual machine (VM) images current
  • Virtual network connections

Virtualization Security Concerns

  • Protecting the virtual fabric
  • Patching off-line VM images
  • Configuration Management
  • Firewall configurations
  • Complicating Audit and Forensics

12 Jun 2008

Scalr project and AWS

http://code.google.com/p/scalr/

For those of us getting into Amazon's Elastic Compute Cloud (EC2), this is a really cool idea.  The idea is that as your load grows, a new node is spun up to handle the additional capacity, and once load lessens, boxes are turned off.  Integrating this with box stats, response times, and per-service monitoring makes sense.
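To sketch the idea (this is not Scalr's actual code - the AMI ID, tags, thresholds, and the load-metric hookup are all placeholders), the control loop looks something like this:

    # Watch a cluster-wide load metric and keep only as many web boxes running as
    # the load requires: start one when load is high, turn one off when load drops.
    import time
    import boto3

    ec2 = boto3.resource("ec2", region_name="us-east-1")

    MIN_NODES, MAX_NODES = 2, 10
    SCALE_UP_LOAD, SCALE_DOWN_LOAD = 0.75, 0.25   # assumed load thresholds (0.0-1.0)


    def web_nodes():
        """Running instances tagged as part of the www cluster."""
        return list(ec2.instances.filter(Filters=[
            {"Name": "tag:role", "Values": ["www"]},
            {"Name": "instance-state-name", "Values": ["running"]},
        ]))


    def current_load(nodes):
        """Placeholder: average load across the cluster, from your monitoring/box stats."""
        raise NotImplementedError


    while True:
        nodes = web_nodes()
        load = current_load(nodes)

        if load > SCALE_UP_LOAD and len(nodes) < MAX_NODES:
            ec2.create_instances(
                ImageId="ami-WWWIMAGE", InstanceType="m1.small",
                MinCount=1, MaxCount=1,
                TagSpecifications=[{"ResourceType": "instance",
                                    "Tags": [{"Key": "role", "Value": "www"}]}],
            )
        elif load < SCALE_DOWN_LOAD and len(nodes) > MIN_NODES:
            nodes[-1].terminate()   # turn off one box as load lessens

        time.sleep(60)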

I wanted everyone to be thinking of the consumable computing model.  Pay as you go for what you use is really attractive.  No more do you have to have 10 boxes in your www cluster all day long if your spike is only from 8am to 3pm.  Now you can run the 10 boxes during those hours and use fewer boxes during non-peak times...  Pretty cool.  And cheap!
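Back-of-the-envelope, with a made-up hourly rate just to show the shape of the savings (not real EC2 pricing):

    # 10 boxes 24x7 vs. 10 boxes for the 8am-3pm spike and 3 boxes the rest of the day.
    hourly_rate = 0.10  # assumed per-instance hourly rate, for illustration only

    always_on = 10 * 24 * hourly_rate               # 10 instances all day
    scaled    = (10 * 7 + 3 * 17) * hourly_rate     # 10 for the 7-hour spike, 3 off-peak

    print(f"always on: ${always_on:.2f}/day  scaled: ${scaled:.2f}/day")
    # always on: $24.00/day  scaled: $12.10/day - roughly half the cost in this example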