After speaking with Luke Kanies at OpsCamp, and reading his good and oft-quoted article “Golden Image or Foil Ball?“, I was thinking pretty hard about the use of images in our new automated infrastructure.  He’s pretty against them.  After careful consideration, however, I think judicious use of images is the right thing to do.

My top level thoughts on why to use images.

  1. Speed – Starting a prebuilt image is faster than reinstalling everything on an empty one.  In the world of dynamic scaling, there’s a meaningful difference between a “couple minute spinup” and a “fifteen minute spinup.”
  2. Reliability – The more work you are doing at runtime, the more there is to go wrong.  I bet I’m not the only person who has run the same compile and install on three allegedly identical Linux boxen and had it go wrong somehow on one of ’em.  And the more stuff you’re pulling to build your image, the more failure points you have.
  3. Flexibility – Dynamically building from stem cell kinda makes sense if you’re using 100% free open source and have everything automated.  What if, however, you have something that you need to install that just hasn’t been scripted – or is very hard to script?  Like an install of some half-baked Windows software that doesn’t have a command line installer and you don’t have a tool that can do it?  In that case, you really need to do the manual install in non-realtime as part of a image build.  And of course many suppliers are providing software as images themselves nowadays.
  4. Traceability – What happens if you need to replicate a past environment?  Having the image is going to be a 100% effective solution to that, even likely to be sufficient for legal reasons.  “I keep a bunch of old software repo versions so I can mostly build a machine like it” – somewhat less so.

In the end, it’s a question of using intermediate deliverables.  Do you recompile all the code and every third party package every time you build a server?  No, you often use binaries – it’s faster and more reliable.  Binaries are the app guys’ equivalent of “images.”

To address Luke’s three concerns from his article specifically:

  1. Image sprawl – if you use images, you eventually have a large library of images you have to manage.  This is very true – but you have to manage a lot of artifacts all up and down the chain anyway.  Given the “manual install” and “vendor supplied image” scenarios noted above, if you can’t manage images as part of your CM system than it’s just not a complete CM system.
  2. Updating your images – Here, I think Luke makes some not entirely valid assumptions.  He notes that once you’re done building your images, you’re still going to have to make changes in the operational environment (“bootstrapping”).  True.  But he thinks you’re not going to use the same tool to do it.  I’m not sure why not – our approach is to use automated tooling to build the images – you don’t *want* to do it manually for sure – and Puppet/Chef/etc. works just fine to do that.  So if you have to update something at the OS level, you do that and let your CM system blow everything on top – and then burn the image.  Image creation and automated CM aren’t mutually exclusive – the only reason people don’t use automation to build their images is the same reason they don’t always use automation on their live servers, which is “it takes work.”  But to me, since you DO have to have some amount of dynamic CM for the runtime bootstrap as well, it’s a good conservation of work to use the same package for both. (Besides bootstrapping, there’s other stuff like moving content that shouldn’t go on images.)
  3. Image state vs running state – This one puzzles me.  With images, you do need to do restarts to pull in image-based changes.  But with virtually all software and app changes you have to as well – maybe not a “reboot,” but a “service restart,” which is virtually as disruptive.  Whether you “reboot  your database server” or “stop and start your database server, which still takes a couple minutes”, you are planning for downtime or have redundancy in place.  And in general you need to orchestrate the changes (rolling restarts, etc.) in a manner that “oh, pull that change whenever you want to Mr. Application Server” doesn’t really work for.

In closing, I think images are useful.  You shouldn’t treat them as a replacement for automated CM – they should be interim deliverables usually generated by, and always managed by, your automated CM.  If you just use images in an uncoordinated way, you do end up with a foil ball.  With sufficient automation, however, they’re more like Russian nesting dolls, and have advantages over starting from scratch with every box.