Web Admin Blog (http://www.webadminblog.com)
Real Web Admins. Real World Experience.

Before DevOps, Don’t You Need OpsOps?
http://www.webadminblog.com/index.php/2010/03/12/before-devops-dont-you-need-opsops/
Fri, 12 Mar 2010 19:57:40 +0000, by Ernest

From the "sad but true" files comes an extremely insightful point apparently discussed over beer by the UK devops crew recently - that we are talking about dev and ops collaboration but the current state of collaboration among ops teams is pretty crappy.

This resonates deeply with me.  I've seen that problem in spades.  I think in general that a lot of the discussion about the agile ops space is too simplistic in that it seems tuned to organizations of "five guys, three of whom are coders and two of whom are operations" and there's no differentiation.  In real life, there's often larger orgs and a lot of differentiation that causes various collaboration challenges.  Some people refer to this as Web vs Enterprise, but I don't think that's strictly true; once your Web shop grows from 5 guys to 200 it runs afoul of this too - it's a simple scalability and organizational engineering problem.

As an aside, I don't even like the "Ops" term - a sysadmin team can split into subgroups that do systems engineering, release management, and operational support...  Just saying "Ops" seems to me to create implications of not being a partner in the initial design and development of the overall system/app/service/site/whatever you want to call it.

Ops Verticals

Here, we have a large Infrastructure department.  Originally, it was completely siloed by technology verticals, and there's a lot of subgroups.  Network, UNIX, Windows, DBA, Lotus Notes, Telecom, Storage, Data Center...  Some ten plus years ago when the company launched their Web site in earnest, they quickly realized that wasn't going to work out.  You had the buck-passing behavior described in the blog posts above that made issues impossible to solve in a timely fashion, plus it made collaboration with devs/business nearly impossible.  Not only did you need like 8 admins to come involve themselves in your project, but they did not speak similar enough languages - you'd have some crusty UNIX admin yelling "WHAT ABOUT THE INODES" until the business analyst started to cry.

Dev Silos

But are our developers here better off?  They are siloed by business unit.  Just among the Web developers there's the eCommerce developers, eCRM, Product Advisors, Community, Support, Content Management...  On the one hand, they are able to be very agile in creating solutions inside their specific niche.  On the other hand, they are all working within the same system environment, and they don't always stay on the same page in terms of what technologies they are using. "Well, I'm sure THAT team bought a lovely million dollar CMS, but we're going to buy our own different million dollar CMS.   No, you don't get more admin resource."  Over time, they tried to produce architecture groups and other cross-team initiatives to try to rein in the craziness, with mixed but overall positive results.

Plugging the Dike

What we did was create a Web Administration group (Web Ops, whatever you want to call it) that was holistically responsible for Web site uptime, performance, and security.  Running that team was my previous gig, did it for five years.  That group was more horizontally focused and would serve as an interface to the various technology verticals; it worked closely with developers in system design during development, coordinated the release process, and involved devs in troubleshooting during the production phase.

BizOps?

In fact, we didn't just partner with the developers - we partnered with the business owners of our Web site too, instead of tolerating the old model of "Business collaborates with the developers, who then come and tell ops what to do."  This was a remarkably easy sell really.  The company lost money every minute the Web site was down, and it was clear that the dev silos weren't going to be able to fix that any more than the ops silos were.  So we quickly got a seat at the same table.

Results

This was a huge success.  To this day, our director of Web Marketing is one of the biggest advocates of the Web operations team.  Since then, other application administration (our word for this cross-disciplinary ops) teams have formed along the same model.  The DevOps collaboration has been good overall - with certain stresses coming from the Web Ops team's role as gatekeeper and process enforcer.  Ironically, the biggest issues and worst relationships were within Infrastructure between the ops teams!

OpsOps - The Fly In The Ointment

The ops team silos haven't gone down quietly.  To this day the head DBA still says "I don't see a good reason for you guys [WebOps] to exist."  I think there's a common "a thing is just the sum of its parts" mindset among admins for whatever reason.  There are also turf wars arising from the technology silo division and the blurring of technology lines by modern tech.  I tried again and again to pitch "collaborative system administration."  But the default sysadmin behavior is to say "these systems are mine and I have root on them.  Those are your systems and you have root on them.  Stay on your side of the line and I'll stay on mine."

Fun specific Catch-22 situations we found ourselves in:

  • Buying a monitoring tool that correlates events across all the different tiers to help root-cause production problems - but the DBAs refusing to allow it on "their" databases.
  • Buying a hardware load balancer - we were going to manage it, not the network team, and it wasn't a UNIX or Windows server, so we couldn't get anyone to rack and jack it (and of course we weren't allowed to because "Why would a webops person need server room access, that's what the other teams are for").

Some of the problem is just attitude, pure and simple.  We had problems even with collaboration inside the various ops teams!  We'd work with one DBA to design a system and then later need to get support from another DBA, who would gripe that "no one told/consulted them!"  Part of the value of the agile principles that "DevOps" tries to distill is just a generic "get it into your damn head you need to be communicating and working together and that needs to be your default mode of operation." I think it's great to harp on that message because it's little understood among ops.  For every dev group that deliberately ostracizes their ops team, there's two ops teams who don't think they need to talk to the devs - in the end, it's mostly our fault.

Part of the problem is organizational.  I also believe (and ITIL, I think, agrees with me) that the technology-silo model has outlived its usefulness.  I'd like to see admin teams organized by service area with integral DBAs, OS admins, etc.  But people are scared of this for a couple reasons.  One is that those admins might do things differently from area to area (the same problem we have with our devs) - this could be mitigated by "same tech" cross-org standards/discussions.  The other is that this model is not the cheapest.  You can squeeze every last penny out if you only have 4 Windows admins and they're shared by 8 functional areas.  Of course, you are cutting off your nose to spite your face because you lose lots more in abandoned agility, but frankly corporate finance rules (minimize G&A spending) are a powerful driver here.

If nothing else, there's not "one right organization" - I'd be tempted to reorg everyone from verticals into horizontals, let that run for 5 years, and then reorg back the other way, just to keep the stratification from setting in.

Specialist vs Generalist

One other issue.  The Web Ops team we created required us to hire generalists - but generalists that knew their stuff in a lot of different areas.  It became very hard to hire for that position and training took months before someone was at all effective.  Being a generalist doesn't scale well.  Specialization is inevitable and, indeed, desirable (as I think pretty much anything in the history of anything demonstrates).  You can mitigate that with some cross-training and having people be generalists in some areas, but in the end, once you get past that "three devs, two ops, that's the company" model, specialization is needed.

That's why I think one of the common definitions of DevOps - all ops folks learning to be developers or vice versa - is fundamentally flawed.  It's not sustainable.  You either need to hire all expensive superstars that can be good at both, or you hire people that suck at both.

What you do is have people with varying mixes.  In my current team we have a continuum of pure ops people, ops folks doing light dev, devs doing light ops, and pure devs.  It's good to have some folks who are generalizing and some who are specializing.  It's not specializing that is bad, it's specialists who don't collaborate that are bad.

Conclusion

So I've shared a lot of experiences and opinions above but I'm not sure I have a brilliant solution to the problem.  I do think we need to recognize that Ops/Ops collaboration is an issue that arises with scale and one potentially even harder to overcome than Dev/Ops collaboration.  I do think stressing collaboration as a value and trying to break down organizational silos may help.  I'd be happy to hear other folks' experiences and thoughts!

Defining Agile Operations and DevOps
http://www.webadminblog.com/index.php/2010/03/11/defining-agile-operations-and-devops/
Thu, 11 Mar 2010 19:55:22 +0000, by Ernest

I recently read a great blog post by Scott Wilson that was talking about the definitions of Agile Operations, DevOps, and related terms.  (Read the comments too, there's some good discussion.)  From what I've heard so far, there are a bunch of semi-related terms people are using around this whole "new thing of ours."

The first is DevOps, which has two totally different frequently used definitions.

1.  Developers and Ops working closely together - the "hugs and collaboration" definition

2.  Operations folks uptaking development best practices and writing code for system automation

The second is Agile Operations, which also has different meanings.

1.  Same as DevOps, whichever definition of that I'm using

2.  Using agile principles to run operations - process techniques, like iterative development or even kanban/TPS kinds of process stuff.  Often with a goal of "faster!"

3.  Using automation - version control, automatic provisioning/control/monitoring.  Sometimes called "Infrastructure Automation" or similar.

This leads to some confusion, as most of these specific elements can be implemented in isolation.  For example, I think the discussion at OpsCamp about "Is DevOps an antipattern" was predicated on an assumption that DevOps meant only DevOps definition #2, "ops guys trying to be developers," and made the discussion somewhat odd to people with other assumed definitions.

I have a proposed set of definitions.  To explain it, let's look at Agile Development and see how it's defined.

Agile development, according to Wikipedia and the Agile Manifesto, consists of a couple of different "levels" of things.  To sum up the Wikipedia breakdown:

  • Agile Principles - like "business/users and developers working together."  These are the core values that inform agile, like collaboration, people over process, software over documentation, and responding to change over planning.
  • Agile Methods - specific process types.  Iterations, Lean, XP, Scrum.  "As opposed to waterfall."
  • Agile Practices - techniques often found in conjunction with agile development, not linked to a given method flavor, like test driven development, continuous integration, etc.

I believe the different parts of Agile Operations that people are talking about map directly to these three levels.

  • Agile Operations Principles includes things like dev/ops collaboration (DevOps definition 1 above); things like James Turnbull's 4-part model seem to be spot on examples of trying to define this arena.
  • Agile Operations Methods includes process you use to conduct operations - iterations, kanban, stuff you'd read in Visible Ops; Agile Operations definition #2 above.
  • Agile Operations Practices includes specific techniques like automated build/provisioning, monitoring, anything you'd have a "toolchain" for.  This contains DevOps definition #2 and Agile Operations definition #3 above.

I think it's helpful to break them up along the same lines as agile development, however, because in the end some of those levels should merge once developers understand ops is part of system development too...  There shouldn't be a separate "user/dev collaboration" and "dev/ops collaboration," in a properly mature model it should become a "user/dev/ops collaboration," for example.

I think the dev2ops guys' "People over Process over Tools" diagram mirrors this about exactly - the people being one of the important agile principles, process being a large part of the methods, and tools being used to empower the practices.

What I like about that diagram, and why I want to bring this all back to the Agile Manifesto discussion, is that having various sub-definitions increases the risk that people will implement the processes or tools without the principles in mind, which is definitely an antipattern.  The Agile guys would tell you that iterations without collaboration are not likely to work out real well.

And it happens in agile development too - there are some teams here at my company that have adopted the methods and/or tools of agile but not its principles, and the results are suboptimal.

Therefore I propose that "Agile Operations" is an umbrella term for all these things, and we keep in mind the principles/methods/practices differentiation.

If we want to call the principles "devops" for short and some of the practices "infrastructure automation" for short I think that would be fine...   Although dev/ops collaboration is ONE of the important principles - but probably not the entirety; and infrastructure automation is one of the important practices, but there are probably others.

Upcoming Free Velocity WebOps Web Conference
http://www.webadminblog.com/index.php/2010/03/11/upcoming-free-velocity-webops-web-conference/
Thu, 11 Mar 2010 14:34:49 +0000, by Ernest

O'Reilly's Velocity conference is the only generalized Web ops and performance conference out there.  We really like it; you can go to various other conferences and have 10-20% of the content useful to you as a Web Admin, or you can go here and have most of it be relevant!

They've been doing some interim freebie Web conferences and there's one coming up.  Check it out.  They'll be talking about performance functionality in Google Webmaster Tools, MySQL, Show Slow, provisioning tools, and dynaTrace's new AJAX performance analysis tool.

O'Reilly Velocity Online Conference: "Speed and Stability"
Thursday, March 17; 9:00am PST
Cost: Free

Microsoft Azure for Dummies – or for Smarties?
http://www.webadminblog.com/index.php/2010/03/05/microsoft-azure-for-dummies-or-for-smarties/
Fri, 05 Mar 2010 14:49:19 +0000, by Ernest

What Is Microsoft Azure?

I'm going to attempt to explain Microsoft Azure in "normal Web person" language.  Like many of you, I am more familiar with Linux/open source type solutions, and like many of you, my first forays into cloud computing have been with Amazon Web Services.  It can often be hard for people not steeped in Redmondese to understand exactly what the heck they're talking about when Microsoft people try to explain their offerings.  (I remember a time some years ago I was trying to get a guy to explain some new Microsoft data access thing with the usual three letter acronym name.  I asked, "Is it a library?  A language?  A protocol?  A daemon?  Branding?  What exactly is this thing you're trying to get me to uptake?"  The reply was invariably "It's an innovative new way to access data!"  Sigh.  I never did get an answer and concluded "Never mind.")

Microsoft has released their new cloud offering, Azure.  Our company is a close Microsoft partner since we use a lot of their technologies in developing our company's desktop software products, so as "cloud guy" I've gotten some in depth briefings and even went to PDC this year to learn more (some of my friends who have known me over the course of my 15 years of UNIX administration were horrified).  "Cloud computing" is an overloaded enough term that it's not highly descriptive and it took a while to cut through the explanations to understand what Azure really is.  Let me break it down for you and explain the deal.

Point of Comparison: Amazon (IaaS)

In Amazon EC2, as hopefully everyone knows by now, you are basically given entire dynamically-provisioned, hourly-billed virtual machines that you load OSes on and install software and all that.  "Like servers, but somewhere out in the ether."  Those kinds of cloud offerings (e.g. Amazon, Rackspace, most of them really) are called Infrastructure As A Service (IaaS).  You're responsible for everything you normally would be, except for the data center work.  Azure is not an IaaS offering but still bears a lot of similarities to Amazon; I'll get into details later.

Point of Comparison: Google App Engine (PaaS)

Take Google's App Engine as another point of comparison.  There, you just upload your Python or Java application to their portal and "it runs on the Web."  You don't have access to the server or OS or disk or anything.  And it "magically" scales for you.  This approach is called Platform as a Service (PaaS).   They provide the full platform stack, you only provide the end application.  On the one hand, you don't have to mess with OS level stuff - if you are just a Java programmer, you don't have to know a single UNIX (or Windows) command to transition your app from "But it works in Eclipse!" to running on a Web server on the Internet.  On the other hand, that comes with a lot of limitations that the PaaS providers have to establish to make everything play together nicely.  One of our early App Engine experiences was sad - one of our developers wrote a Java app that used a free XML library to parse some XML.  Well, that library had functionality in it (that we weren't using) that could write XML to disk.  You can't write to disk in App Engine, so its response was to disallow the entire library.  The app didn't work and had to be heavily rewritten.  So it's pretty good for code that you are writing EVERY SINGLE LINE OF YOURSELF.  Azure isn't quite as restrictive as App Engine, but it has some of that flavor.

Azure's Model

Windows Azure falls between the two.  First of all, Azure is a real "hosted cloud" like Amazon Web Services, like most of us really think about when we think cloud computing; it's not one of these on premise things that companies are branding as "cloud" just for kicks. That's important to say because it seems like nowadays the larger the company, the more they are deliberately diluting the term "cloud" to stick their products under its aegis.  Microsoft isn't doing that, this is a "cloud offering" in the classical (where classical means 2008, I guess) sense.

However, in a number of important ways it's not like Amazon.  I'd definitely classify it as a PaaS offering.  You upload your code to "Roles" which are basically containers that run your application in a Windows 2008(ish) environment.  (There are two types - a "Web role" has a stripped down IIS provided on it, a "Worker role" doesn't - the only real difference between the two.)  You do not have raw OS access, and cannot do things like write to the registry.  But, it is less restrictive than App Engine.  You can bundle up other stuff to run in Azure - even run Java apps using Apache Tomcat.  You have to be able to install whatever you want to run "xcopy only" - in other words, no fancy installers; it needs to be something where you could just copy the files to a Windows PC, without administrative privilege, run a command from the command line, and have it work.  Luckily, Tomcat/Java fits that description.  They have helper packs to facilitate doing this with Tomcat, memcached, and Apache/PHP/MediaWiki.  At PDC they demoed Domino's Pizza running their Java order app on it and a WordPress blog running on it.  So it's not only for .NET programmers.  Managed code is easier to deploy, but you can deploy and run about anything that fits the "copy and run command line" model.
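To make the "xcopy only" test concrete, here is a minimal sketch in Python.  It is not how Azure actually deploys anything; the function names and the one-file demo app are made up for illustration.  The point is just the shape of the constraint: copy a directory tree, run one command in it without elevated privileges, and that has to be enough.

```python
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path


def xcopy_deploy(app_dir: Path, target_dir: Path, command: list[str]) -> str:
    """Copy an application tree as-is, then start it with one command.

    No installer, no registry writes, no admin rights: if this is not
    enough to run the software, it does not fit the "xcopy only" model.
    """
    shutil.copytree(app_dir, target_dir)
    result = subprocess.run(
        command, cwd=target_dir, capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def demo() -> str:
    # Hypothetical one-file "application", used only for illustration.
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "src"
        src.mkdir()
        (src / "app.py").write_text("print('hello from a copied app')\n")
        return xcopy_deploy(src, Path(tmp) / "deployed", [sys.executable, "app.py"])
```

Anything that needs a setup wizard, a service registration, or a registry key fails this test, which is why so much packaged Windows software is harder to move than a Tomcat directory.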

I find this approach a little ironic actually.  It's been a lot easier for us to get the Java and open source (well, the ones with Windows ports) parts of our infrastructure running on Azure than Windows parts!  Everybody provides Windows stuff with an installer, of course, and you can't run installers on Azure.  Anyway, in its core computing model it's like Google App Engine - it's more flexible than that (good) but it doesn't do automatic scaling (bad).  If it did autoscaling I'd be willing to say "It's better than App Engine in every way."

In other ways, it's a lot like Amazon.  They offer a variety of storage options - blobs (like S3), tables (like MySQL), queues (like SQS), drives (like EBS).  They have an integral CDN.  They do hourly billing.  Pricing is pretty similar to Amazon - it's hard to totally equate apples to apples, but Azure compute is $0.12/hr and an Amazon small Windows image compute is $0.12/hr (Coincidence?  I think not.).  And you have to figure out scaling and provisioning yourself on Amazon too - or pay a lot of scratch to one of the provisioning companies like RightScale.
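For a rough sense of what hourly billing adds up to, here is a back-of-the-envelope sketch.  The only number taken from above is the $0.12/hr compute rate; the 30-day month and the function name are my assumptions, and storage, bandwidth, and CDN charges are left out entirely.

```python
from decimal import Decimal

HOURLY_RATE = Decimal("0.12")  # quoted compute price per instance-hour


def monthly_compute_cost(instances: int, hours: int = 24 * 30) -> Decimal:
    """Cost of running `instances` around the clock for `hours` hours.

    Assumes a 30-day month by default and compute charges only.
    """
    return HOURLY_RATE * hours * instances
```

Under those assumptions one always-on instance comes to $86.40 a month, and four come to $345.60, on either provider.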

What's Unique and Different

Well, the largest thing that I've already mentioned is the PaaS approach.  If you need OS level access, you're out of luck;  if you don't want to have to mess with OS management, you're in luck!  So to a first approximation, you can think of Azure as "like Amazon Web Services, but the compute uses more of a Google App Engine model."

But wait, there's more!

One of the biggest things that Azure brings to the table is that, using Visual Studio, you can run a local Azure "fabric" on your PC, which means you can develop, test, and run cloud apps locally without having to upload to the cloud and incur usage charges.  This is HUGE.  One of the biggest pains about programming for Amazon, for instance, is that if you want to exercise any of their APIs, you have to do it "up there."  Also, you can't move images back and forth between Amazon and on premise.  Now, there are efforts like EUCALYPTUS that try to overcome some of this problem but in the end you pretty much just have to throw in the towel and do all dev and test up in the cloud.  Amazon and Eclipse (and maybe Xen) - get together and make it happen!!!!

Here's something else interesting.  In a move that seems more like a decision from a typical cranky cult-of-personality open source project, they have decided that proper Web apps need to be asynchronous and message-driven, and by God that's what you're going to do.  Their load balancers won't do sticky sessions (only round robin) and time out all connections between all tiers after 60 seconds without exception.  If you need more than that, tough - rewrite your app to use a multi-tier message queue/event listener model.  Now on the one hand, it's hard for me to disagree with that - I've been sweating our developers, telling them that's the correct best-practice model for scalability on the Web.  But again you're faced with the "Well what if I'm using some preexisting software and that's not how it's architected?" problem.  This is the typical PaaS pattern of "it's great, if you're writing every line of code yourself."
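If the message queue/event listener model is unfamiliar, here is a minimal sketch of the shape, using only the Python standard library rather than any Azure API.  All of the names (`submit`, `poll`, `worker`) are hypothetical, and the in-process queue stands in for a real hosted queue.  The web tier never holds a connection open for the slow part: it hands back a ticket right away, and a later short request checks on it.

```python
import queue
import threading
import uuid
from typing import Optional

jobs: "queue.Queue[tuple[str, str]]" = queue.Queue()
results: dict[str, str] = {}
results_lock = threading.Lock()


def submit(payload: str) -> str:
    """Web tier: enqueue the work and return a ticket immediately."""
    ticket = uuid.uuid4().hex
    jobs.put((ticket, payload))
    return ticket


def poll(ticket: str) -> Optional[str]:
    """Web tier: a later short request checks whether the work is done."""
    with results_lock:
        return results.get(ticket)


def worker() -> None:
    """Worker tier: drain the queue; slow work happens here, not in a request."""
    while True:
        ticket, payload = jobs.get()
        outcome = payload.upper()  # stand-in for a long-running task
        with results_lock:
            results[ticket] = outcome
        jobs.task_done()


# One background worker is enough for the sketch.
threading.Thread(target=worker, daemon=True).start()
```

Because no request waits on the worker, a 60 second connection timeout and round-robin balancing stop mattering; any web instance can answer the poll.  The catch is exactly the one described above: preexisting software that expects a long synchronous call has to be rewritten to fit.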

In many ways, Azure is meant to be very developer friendly.  In a lot of ways that's good.  As a system admin, however, I wince every time they go on about "You can deploy your app to Azure just by right clicking in Visual Studio!!!"  Of course, that's not how anyone with a responsibly controlled production environment would do it, but it certainly does make for fast easy adoption in development.   The curve for a developer who is "just" a C++/Java/.NET/whatever wrangler to get up and going on an IaaS solution like Amazon is pretty large comparatively; here, it's "go sign up for an account and then click to deploy from your IDE, and voila it's running on the Intertubes."  So it's a qualified good - it puts more pressure on you as an ops person to go get the developers to understand why they need to utilize your services.  (In a traditional server environment, they have to go through you to get their code deployed.)  Often, for good or ill, we use the release process as a touchstone to also engage developers on other aspects of their code that need to be systems engineered better.

Now, that's my view of the major differences.  I think the usual Azure sales pitch would say something different - I've forgotten two of their huge differentiators, their service bus and access control components.  They are branded under the name "AppFabric," which as usual is a name Microsoft is also using for something else completely different (a new true app server for Windows Server, including projects formerly code named Dublin and Velocity - think of it as a real WebLogic/WebSphere type app server plus memcache.)

Their service bus is an ESB.  As alluded to above, you're going to want to use it to do messaging.   You can also use Azure Queues, which is a little confusing because the ESB is also a message queue - I'm not clear on their intended differentiation really.  You can of course just load up an ESB yourself in any other IaaS cloud solution too, so if you really want one you could do e.g. Apache ServiceMix hosted on Amazon.  But, they are managing this one for you which is a plus.  You will need to use it to do many of the common things you'd want to do.

Their access control - is a mess.  Sorry, Microsoft guys.  The whole rest of the thing, I've managed to cut through the "Microsoft acronyms versus the rest of the world's terms and definitions" factor, but not here.   "You see, you use ACS's WIF STS to generate a SWT," says our Microsoft rep with a straight face.   They seem to be excited that it will use people's Microsoft Live IDs, so if you want people to have logins to your site and you don't want to manage any of that, it is probably nice.  It takes SAML tokens too, I think, though I'm not sure if the caveats around that end up equating to "Well, not really."  Anyway, their explanations have been incoherent so far and I'm not smelling anything I'm really interested in behind it.  But there's nothing to prevent you from just using LDAP and your own Internet SSO/federation solution.  I don't count this against Microsoft because no one else provides anything like this, so even if I ignore the Azure one it doesn't put it behind any other solution.

The Future

Microsoft has said they plan to add on some kind of VM/IaaS offering eventually because of the demand.  For us, the PaaS approach is a bit of a drawback - we want to do all kinds of things like "virus scan uploaded files," "run a good load balancer," "run an LDAP server", and other things that basically require more full OS access.  I think we may have an LDAP direction with the all-Java OpenDS, but it's a pain point in general.

I think a lot of their decisions that are a short term pain in the ass (no installs, no synchronous) are actually good in the long term.  If all developers knew how to develop async and did it by default, and if all software vendors, even Windows based ones, provided their product in a form that could just be "copy and run without admin privs" to install, the world would be a better place.  That's interesting in that "Sure it's hard to use now but it'll make the world better eventually" is usually heard from the other side of the aisle.

Conclusion

Azure's a pretty legit offering!  And I'm very impressed by their velocity.  I think it's fair to say that overall Azure isn't quite as good as Amazon except for specific use cases (you're writing it all in .NET by hand in Visual Studio) - but no one else is as good as Amazon either (believe me, I evaluated them) and Amazon has years of head start; Azure is brand new but already at about 80%! That puts them into the top 5 out of the gate.

Without an IaaS component, you still can't do everything under the sun in Azure.  But if you're not depending on much in the way of big third party software chunks, it's feasible; if you're doing .NET programming, it's very compelling.

Do note that I haven't focused too much on the attributes and limitations of cloud computing in general here - that's another topic - this article is meant to compare and contrast Azure to other cloud offerings so that people can understand its architecture.

I hope that was clear.  Feel free to ask questions in the comments and I'll try to clarify!

A Case For Images
http://www.webadminblog.com/index.php/2010/02/24/a-case-for-images/
Wed, 24 Feb 2010 21:39:19 +0000, by Ernest

After speaking with Luke Kanies at OpsCamp, and reading his good and oft-quoted article "Golden Image or Foil Ball?", I was thinking pretty hard about the use of images in our new automated infrastructure.  He's pretty against them.  After careful consideration, however, I think judicious use of images is the right thing to do.

My top level thoughts on why to use images.

  1. Speed - Starting a prebuilt image is faster than reinstalling everything on an empty one.  In the world of dynamic scaling, there's a meaningful difference between a "couple minute spinup" and a "fifteen minute spinup."
  2. Reliability - The more work you are doing at runtime, the more there is to go wrong.  I bet I'm not the only person who has run the same compile and install on three allegedly identical Linux boxen and had it go wrong somehow on one of 'em.  And the more stuff you're pulling to build your image, the more failure points you have.
  3. Flexibility - Dynamically building from stem cell kinda makes sense if you're using 100% free open source and have everything automated.  What if, however, you have something that you need to install that just hasn't been scripted - or is very hard to script?  Like an install of some half-baked Windows software that doesn't have a command line installer and you don't have a tool that can do it?  In that case, you really need to do the manual install in non-realtime as part of an image build.  And of course many suppliers are providing software as images themselves nowadays.
  4. Traceability - What happens if you need to replicate a past environment?  Having the image is going to be a 100% effective solution to that, even likely to be sufficient for legal reasons.  "I keep a bunch of old software repo versions so I can mostly build a machine like it" - somewhat less so.

In the end, it's a question of using intermediate deliverables.  Do you recompile all the code and every third party package every time you build a server?  No, you often use binaries - it's faster and more reliable.  Binaries are the app guys' equivalent of "images."

To address Luke's three concerns from his article specifically:

  1. Image sprawl - if you use images, you eventually have a large library of images you have to manage.  This is very true - but you have to manage a lot of artifacts all up and down the chain anyway.  Given the "manual install" and "vendor supplied image" scenarios noted above, if you can't manage images as part of your CM system then it's just not a complete CM system.
  2. Updating your images - Here, I think Luke makes some not entirely valid assumptions.  He notes that once you're done building your images, you're still going to have to make changes in the operational environment ("bootstrapping").  True.  But he thinks you're not going to use the same tool to do it.  I'm not sure why not - our approach is to use automated tooling to build the images - you don't *want* to do it manually for sure - and Puppet/Chef/etc. works just fine to do that.  So if you have to update something at the OS level, you do that and let your CM system blow everything on top - and then burn the image.  Image creation and automated CM aren't mutually exclusive - the only reason people don't use automation to build their images is the same reason they don't always use automation on their live servers, which is "it takes work."  But to me, since you DO have to have some amount of dynamic CM for the runtime bootstrap as well, it's a good conservation of work to use the same package for both. (Besides bootstrapping, there's other stuff like moving content that shouldn't go on images.)
  3. Image state vs running state - This one puzzles me.  With images, you do need to do restarts to pull in image-based changes.  But with virtually all software and app changes you have to as well - maybe not a "reboot," but a "service restart," which is virtually as disruptive.  Whether you "reboot  your database server" or "stop and start your database server, which still takes a couple minutes", you are planning for downtime or have redundancy in place.  And in general you need to orchestrate the changes (rolling restarts, etc.) in a manner that "oh, pull that change whenever you want to Mr. Application Server" doesn't really work for.

In closing, I think images are useful.  You shouldn't treat them as a replacement for automated CM - they should be interim deliverables usually generated by, and always managed by, your automated CM.  If you just use images in an uncoordinated way, you do end up with a foil ball.  With sufficient automation, however, they're more like Russian nesting dolls, and have advantages over starting from scratch with every box.

]]>
http://www.webadminblog.com/index.php/2010/02/24/a-case-for-images/feed/ 0
A XSS Vulnerability in Almost Every PHP Form I’ve Ever Written http://www.webadminblog.com/index.php/2010/02/23/a-xss-vulnerability-in-almost-every-php-form-ive-ever-written/ http://www.webadminblog.com/index.php/2010/02/23/a-xss-vulnerability-in-almost-every-php-form-ive-ever-written/#comments Wed, 24 Feb 2010 02:30:16 +0000 Josh http://www.webadminblog.com/?p=401 I've spent a lot of time over the past few months writing an enterprise application in PHP.  Despite what some people may say, I believe that PHP is as secure or insecure as the developer who is writing the code.  Anyway, I'm at the point in my development lifecycle where I decided that it was ready to run an application vulnerability scanner against it.  What I found was interesting and I think it's worth sharing with you all.

Let me preface this by saying that I'm the guy who gives the training to our developers on the OWASP Top 10, writing secure code, etc.  I'd like to think that I have a pretty good handle on programming best practices, input validation, and HTML encoding.  I built all kinds of validation into this application and thought that the vulnerability scan would come up empty.  For the most part I was right, but there was one vulnerability, one flaw in particular, that found its way into every form in my application.  In fact, I realized that I've made this exact same mistake in almost every PHP form that I've ever written.  Talk about a humbling experience.

So here's what happened.  I created a simple page with a form where the results of that form are submitted back to the page itself for processing.  Let's assume it looks something like this:

<html>
 <body>
  <?php
  if (isset($_REQUEST['submitted']) && $_REQUEST['submitted'] == '1') {
    echo "Form submitted!";
  }
  ?>
  <form action="<?php echo $_SERVER['PHP_SELF']; ?>">
   <input type="hidden" name="submitted" value="1" />
   <input type="submit" value="Submit!" />
  </form>
 </body>
</html>

It looks fairly straightforward, right? The problem has to do with that $_SERVER['PHP_SELF'] variable. The intent here is that PHP will display the path and name of the current page so that the form knows to submit back to the same page.  The problem is that $_SERVER['PHP_SELF'] can actually be manipulated by the user.  Let's say as the user I change the URL from http://www.webadminblog.com/example.php to http://www.webadminblog.com/example.php/"><script>alert('xss');</script>.  The web server still runs example.php and treats everything after the script name as extra path info, which PHP includes in PHP_SELF.  This will end the form action part of the code and inject a JavaScript alert into the page.  This is the very definition of cross site scripting.  I can't believe that with as long as I've been writing in PHP and as long as I've been studying application security, I've never realized this.  Fortunately, there are a couple of different ways to fix this.  First, you could use the HTML entities or HTML special character functions to sanitize the user input like this:

htmlentities($_SERVER['PHP_SELF']);

htmlspecialchars($_SERVER['PHP_SELF']);

This fix would still allow the user to manipulate the URL, and thus, what is displayed on the page, but it would render the javascript invalid.  The second way to fix this is to use the script name variable instead like this:

$_SERVER['SCRIPT_NAME'];

This fix would just echo the full path and filename of the current file.  Yes, there are other ways to fix this.  Yes, my code example above for the XSS exploit doesn't do anything other than display a JavaScript alert.  I just wanted to draw attention to this issue because if it's found its way into my code, then perhaps it's found its way into yours as well.  Happy coding!
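As a rough sketch of the two fixes above side by side: the helper names here (escaped_php_self, script_name_action) are made up for illustration and are not part of PHP or any library, and the malicious request is simulated with a plain array rather than a live $_SERVER.  It assumes PHP 7 or later.

```php
<?php
// Hypothetical helpers for illustration; the names are not from any library.

// Fix 1: keep PHP_SELF but HTML-encode it before it goes into the attribute.
function escaped_php_self(array $server): string
{
    return htmlspecialchars($server['PHP_SELF'] ?? '', ENT_QUOTES, 'UTF-8');
}

// Fix 2: use SCRIPT_NAME, which omits any extra path info, and encode it anyway.
function script_name_action(array $server): string
{
    return htmlspecialchars($server['SCRIPT_NAME'] ?? '', ENT_QUOTES, 'UTF-8');
}

// Simulated request values; in a real page you would pass $_SERVER instead.
$attack = [
    'PHP_SELF'    => '/example.php/"><script>alert(\'xss\');</script>',
    'SCRIPT_NAME' => '/example.php',
];

// With fix 1 the injected markup shows up only as harmless encoded text.
echo '<form action="' . escaped_php_self($attack) . '">' . "\n";

// With fix 2 the injected part of the URL never reaches the page at all.
echo '<form action="' . script_name_action($attack) . '">' . "\n";
```

Passing ENT_QUOTES matters here because the value lands inside a quoted attribute; encoding both quote styles keeps it from closing the attribute early.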

]]>
http://www.webadminblog.com/index.php/2010/02/23/a-xss-vulnerability-in-almost-every-php-form-ive-ever-written/feed/ 6
Agile Operations http://www.webadminblog.com/index.php/2010/02/17/agile-operations/ http://www.webadminblog.com/index.php/2010/02/17/agile-operations/#comments Wed, 17 Feb 2010 19:59:42 +0000 Ernest http://www.webadminblog.com/?p=398 It's funny.  When we recently started working on an upgrade of our Intranet social media platform, and we were trying to figure out how to meld the infrastructure-change-heavy operation with the need for devs, designers, and testers to be able to start working on the system before "three months from now," we broached the idea of "maybe we should do that in iterations!"  First, get the new wiki up and working.  Then, worry about tuning, switching the back end database, etc.  Very basic, but it got me thinking about the problem in terms of "hey, Infrastructure still operates in terms of waterfall, don't we."

Then when Peco and I moved over to NI R&D and started working on cloud-based systems, we quickly realized the need for our infrastructure to be completely programmable - that is, not manually tweaked and controlled, but run in a completely automated fashion.  Also, since we were two systems guys embedded in a large development org that's using agile, we were heavily pressured to work in iterations along with them.  This was initially a shock - my default project plan has, in traditional fashion, months' worth of evaluating, installing, and configuring various technology components before anything's up and running.  But as we began to execute in that way, I started to see that no, really, agile is possible for infrastructure work - at least "mostly."  Technologies like cloud computing help; there's still a little more up-front work required than with programming, but you can get most of the way to an agile methodology (and mindset!).

Then at OpsCamp last month, we discovered that there's been this whole Agile Operations/Automated Infrastructure/devops movement thing already in progress we hadn't heard about.  I don't keep in touch with The Blogosphere (tm) enough I guess.  Anyway, turns out a bunch of other folks have suddenly come to the exact same conclusion and there's exciting work going on re: how to make operations agile, automate infrastructure, and meld development and ops work.

So if  you also hadn't been up on this, here's a roundup of some good related core thoughts on these topics for your reading pleasure!

]]>
http://www.webadminblog.com/index.php/2010/02/17/agile-operations/feed/ 0
Enterprise Systems vs. Agility http://www.webadminblog.com/index.php/2010/02/09/enterprise-systems-vs-agility/ http://www.webadminblog.com/index.php/2010/02/09/enterprise-systems-vs-agility/#comments Tue, 09 Feb 2010 22:06:18 +0000 Ernest http://www.webadminblog.com/?p=396 I was recently reading a good Cameron Purdy post where he talks about his eight theses regarding why startups or students can pull stuff off that large enterprise IT shops can't.

My summary/trenchant restatement of his points:

  1. Changing existing systems is harder than making a custom-built new one (version 2 is harder)
  2. IT veterans overcomplicate new systems
  3. The complexity of a system increases exponentially the work needed to change it (versions 3 and 4 are way way harder)
  4. Students/startups do fail a lot, you just don't see those
  5. Risk management steps add friction
  6. Organizational overhead (paperwork/meetings) adds friction
  7. Only overconservative goons work in enterprise IT anyway
  8. The larger the org, the more conflict

Though I suspect #1 and #3 are the same, #2 and #5 are the same, and #6 and #8 are the same, really.

I've been thinking about this lately with my change from our enterprise IT Web site to a new greenfield cloud-hosted SaaS product in our R&D organization.  It's definitely a huge breath of fresh air to be able to move fast.  My observations:

Complexity

The problem of systems complexity (theses #1 and #3) is a very real one.  I used to describe our Web site as having reached "system gridlock."  There were hundreds of apps running dozens to a server with poorly documented dependencies on all kinds of stuff.  You would go in and find something that looked "wrong" - an Apache config, script, load balancer rule, whatever - but if you touched it some house of cards somewhere would come tumbling down.  Since every app developer was allowed to design their own app in its own tightly coupled way, we had to implement draconian change control and release processes in an attempt to stem the tide of people lining up to crash the Web site.

We have a new system design philosophy for our new gig which I refer to as "sharing is the devil."  All components are separated and loosely coupled.  Using cloud computing for hardware and open source for software makes it easy and affordable to have a box that does "only one thing."  In traditional compute environments there's pressure to "use up all that CPU before you add more", which results in a penny wise, pound foolish strategy of consolidation.  More and more apps and functions get crunched closer together and when you go back to pull them out you discover that all kinds of new connections and dependencies have formed unbidden.

Complication

Overcomplicating systems (#2 and #5) can be somewhat overcome by using agile principles.  We've been delving heavily into doing not just our apps but also our infrastructure according to an agile methodology.  It surfaces your requirements - frankly, systems people often get away with implementing whatever they want, without having a spec let alone one open to review.  Also, it makes you prioritize.  "Whatever you can get done in this two week iteration, that's what you'll have done, and it should be working."  It forces focus on what is required to get things to work and delays more complex niceties till later as there's time.

Conservatism

Both small and large organizations can suffer from #6 and #8.  That's mostly a mindset issue.  I like to tell the story about how we were working on a high level joint IT/business vision for our Web site.  We identified a number of "pillars" of the strategy we were developing - performance, availability, TCO, etc.  I had identified agility as one, but one of the application directors just wasn't buying into it.  "Agility, that's weird, how do we measure that, we should just forget about it."  I finally had to take all the things we had to the business head of the Web and say "of these, which would you say is the single most important one?"  "Agility, of course," he said, as I knew he would.  I made it a point to train my staff that "getting it done" was the most important thing, more important than risk mitigation or crossing all the t's and dotting all the i's.  That can be difficult if the larger organization doesn't reward risk and achievement over conservatism, but you can work on it.

]]>
http://www.webadminblog.com/index.php/2010/02/09/enterprise-systems-vs-agility/feed/ 0
OpsCamp Debrief http://www.webadminblog.com/index.php/2010/02/05/opscamp-debrief/ http://www.webadminblog.com/index.php/2010/02/05/opscamp-debrief/#comments Fri, 05 Feb 2010 15:14:13 +0000 Ernest http://www.webadminblog.com/?p=381 I went to OpsCamp this last weekend here in Austin, a get-together for Web operations folks specifically focusing on the cloud, and it was a great time!  Here's my after action report.

The event invite said it was in the Spider House, a cool local coffee bar/normal bar.  I hadn't been there before, but other people that had said "That's insane!  They'll never fit that many people!  There's outside seating but it's freezing out!"  That gave me some degree of trepidation, but I still racked out in time to get downtown by 8 AM on a Saturday (sigh!).  Happily, it turned out that the event was really in the adjacent music/whatnot venue also owned by Spider House, the United States Art Authority, which they kindly allowed us to use for free!  There were a lot of people there; we weren't overfilling the place but it was definitely at capacity, there were near 100 people there.

I had just heard of OpsCamp through word of mouth, and figured it was just going to be a gathering of local Austin Web ops types.  Which would be entertaining enough, certainly.  But as I looked around the room I started recognizing a lot of guys from Velocity and other major shows; CEOs and other high ranked guys from various Web ops related tool companies.  Sponsors included John Willis and Adam Jacob (creator of Chef) from Opscode, Luke Kanies from Reductive Labs (creator of Puppet), Damon Edwards and Alex Honor from DTO Solutions (formerly ControlTier), Mark Hinkle and Matt Ray from Zenoss, Dave Nielsen (CloudCamp), Michael Coté (Redmonk), Bitnami, Spiceworks, and Rackspace Cloud.  Other than that, there were a lot of random Austinites and some guys from big local outfits (Dell, IBM).

You can read all the tweets about the event if you swing that way.

OpsCamp kinda grew out of an earlier thing, BarCampESM, also in Austin two years ago.  I never heard about that, wish I had.

How It Went

I had never been to an "unconference" before.  Basically there's no set agenda, it's self-emergent.  It worked pretty well.  I'll describe the process a bit for other noobs.

First, there was a round of lightning talks.  Brett from Rackspace noted that "size matters," Bill from Zenoss said "monitoring is important," and Luke from Reductive claimed that "in 2-4 years 'cloud' won't be a big deal, it'll just be how people are doing things - unless you're a jackass."

Then it was time for sessions.  People got up and wrote a proposed session name on a piece of paper and then went in front of the group and pitched it, a hand-count of "how many people find this interesting" was taken.

Candidates included:

  • service level to resolution
  • physical access to your cloud assets
  • autodiscovery of systems
  • decompose monitoring into tool chain
  • tool chain for automatic provisioning
  • monitoring from the cloud
  • monitoring in the cloud - widely dispersed components
  • agent based monitoring evolution
  • devops is the debil - change to the role of sysadmins
  • And more

We decided that so many of these touched on two major topics that we should do group discussions on them before going to sessions.  They were:

  • monitoring in the cloud
  • config mgmt in the cloud

This seemed like a good idea; these are indeed the two major areas of concern when trying to move to the cloud.

Sadly, the whole-group discussions, especially the monitoring one, were unfruitful.  For a long ass time people threw out brilliant quips about "Why would you bother monitoring a server anyway" and other such high-theory wonkery.  I got zero value out of these, which was sad because the topics were crucially interesting - just too unfocused; you had people coming at the problem 100 different ways in sound bites.  The only note I bothered to write down was that "monitoring porn" (too many metrics) makes it hard to do correlation.  We had that problem here, and invested in a (horrors) non open-source tool, Opnet Panorama, that has an advanced analytics and correlation engine that can make some sense of tens of thousands of metrics for exactly that reason.

Sessions

There were three sessions.  I didn't take many notes in the first one because, being a Web ops guy, I was having to work a release simultaneously with attending OpsCamp :-P

The second was interesting.  Adam Jacob from Opscode moderated a talk on "DevOps - Is It The Devil?"  That's my version of the title, I think he said "anti-pattern".  Anyway, opinions on devops were mixed, as were opinions on what it means exactly.  Is it business alignment?  Sysadmins getting into the product code?  Better automation on the sysadmin side?  I have lots of opinions on this for later blog posts.  Also see the "dev2ops" blog for related info.

Adam says there'll be more on this at Velocity, they're planning an unconference the day after on this topic.

The third session was kinda cool.  I forget what it was supposed to be about, but what it turned into was a Mafia-style sitdown between all the major players that came including Luke, Adam, and Damon to talk about how to work together, in that a comprehensive model for automated infrastructure would be of joint value to everyone.  Big thoughts:

ControlTier did a previous diagram and white paper showing how some of the tools fit together - it is well regarded and it helped me personally when I first started trying to figure out the CM landscape.

It's really here where big companies like HP and IBM beat out open source.  Their software isn't better by any stretch of the imagination.  I've personally used HP Deploy Management, for example, and it's really not as good as some of the open source offerings.  But when they come in, they are able to provide you a comprehensive picture of how everything fits together, what you need, etc. that makes doing business with them easier.

My corollary - guys, work together to get better.  You shouldn't be worried about your piece of the miniature current pie, you should be looking to cut into the IBM/CA/HP business and get part of the huge pie.

The products need APIs so they can be integrated.  Especially, people need to be able to integrate their system provisioning and monitoring off the same configs.

In the end, everyone told DTO they trust them to take a first shot at an architecture diagram and would be happy to edit.  Woot!

Random Tips

"DevOps" - new buzzword for the new role Ops folks are finding themselves in, this resonated with us as we're having to combine ops and dev in our new cloud projects.

Visible Ops - A good book on ops recommended highly to us.  This, I think?

git - I wasn't all that enthused about this new revision control system ("yet another one," I thought), but Luke went on about it for a long time and I think I see some of its cooler points now.

One of the organizers (forget which one) will start an "opsforum" google group so we can further collaborate online.  This is great - one of the biggest problems in the Web ops space is that there's no good single place to go to bring all this under one umbrella.  We've had the Velocity conference for a couple years and now we have OpsCamp, but between events it's all following people's blogs, no real community.

Here in Austin there's other semi related entities like the LOPSA Austin chapter, cactus (unix), and geekaustin.org but they're more Austin only and not focused quite on point for Web ops.

cobbler, a Linux install server - I hadn't heard of it before.  (I'll be honest, I try not to stay down at the OS level too much any more...)  But it sounds cool.

ControlTier, a cool open source automation tool/company we met at Velocity and liked, has changed focus somewhat and corporately has become DTO Solutions, more of a consultancy around the whole automation area.  Seems like a good move for them.

People are getting a little disgruntled with Velocity - it seems to be leaning real heavy towards the front end performance thing and losing any meaningful focus on operations.  I tend to agree - it needs a wider focus - performance overall, not just front end, and more ops stuff.  And I love open source but let's get the Splunks etc. of the world there too.

After Party

We tried to pay Spider House back adequately for providing the venue gratis by drinking the rest of the OpsCamp budget away there.  Mmm, Jameson.

Notable open source evangelist whurley showed up for dinner at Ruby's later; he even got Luke to loosen up a bit.  Talking over drinks and dinner revealed that for many of them, it was their first time in Texas "besides the airports." I thought it was a little funny that Texas still provokes a somewhat-joking fearfulness amongst visitors.  Being a native Texan, I have to admit on some level that pleases me.  Allow me to quote from Ulysses S. Grant's memoir:

"The journey was hazardous on account of Indians, and there were white men in Texas whom I would not have cared to meet in a secluded place."

I hope all the visitors had a good time in Austin, and I am excited to have some more OpsCamps!  I think they're planning for it to be yearly, but I'd be happy to have an "Austin only" thing more frequently.  A number of admins I know couldn't make it but would totally be down for such a thing.

]]>
http://www.webadminblog.com/index.php/2010/02/05/opscamp-debrief/feed/ 6
Come To OpsCamp! http://www.webadminblog.com/index.php/2010/01/25/come-to-opscamp/ http://www.webadminblog.com/index.php/2010/01/25/come-to-opscamp/#comments Tue, 26 Jan 2010 00:05:29 +0000 Ernest http://www.webadminblog.com/?p=366 Next weekend, Jan 30 2010, there's a Web Ops get-together here in Austin called OpsCamp!  It'll be a Web ops "unconference" with a cloud focus.  Right up our alley!  We hope to see you there.

]]>
http://www.webadminblog.com/index.php/2010/01/25/come-to-opscamp/feed/ 0