Web Admin Blog Real Web Admins. Real World Experience.

9Feb/10

Enterprise Systems vs. Agility

I was recently reading a good Cameron Purdy post where he talks about his eight theses regarding why startups or students can pull stuff off that large enterprise IT shops can't.

My summary/trenchant restatement of his points:

  1. Changing existing systems is harder than making a custom-built new one (version 2 is harder)
  2. IT veterans overcomplicate new systems
  3. The work needed to change a system increases exponentially with its complexity (versions 3 and 4 are way way harder)
  4. Students/startups do fail a lot, you just don't see those
  5. Risk management steps add friction
  6. Organizational overhead (paperwork/meetings) adds friction
  7. Only overconservative goons work in enterprise IT anyway
  8. The larger the org, the more conflict

Though I suspect #1 and #3 are the same, #2 and #5 are the same, and #6 and #8 are the same, really.

I've been thinking about this lately with my change from our enterprise IT Web site to a new greenfield cloud-hosted SaaS product in our R&D organization.  It's definitely a huge breath of fresh air to be able to move fast.  My observations:

Complexity

The problem of systems complexity (theses #1 and #3) is a very real one.  I used to describe our Web site as having reached "system gridlock."  There were hundreds of apps running dozens to a server with poorly documented dependencies on all kinds of stuff.  You would go in and find something that looked "wrong" - an Apache config, script, load balancer rule, whatever - but if you touched it some house of cards somewhere would come tumbling down.  Since every app developer was allowed to design their own app in its own tightly coupled way, we had to implement draconian change control and release processes in an attempt to stem the tide of people lining up to crash the Web site.

We have a new system design philosophy for our new gig which I refer to as "sharing is the devil."  All components are separated and loosely coupled.  Using cloud computing for hardware and open source for software makes it easy and affordable to have a box that does "only one thing."  In traditional compute environments there's pressure to "use up all that CPU before you add more", which results in a penny wise, pound foolish strategy of consolidation.  More and more apps and functions get crunched closer together and when you go back to pull them out you discover that all kinds of new connections and dependencies have formed unbidden.

Complication

Overcomplicating systems (#2 and #5) can be somewhat overcome by using agile principles.  We've been delving heavily into doing not just our apps but also our infrastructure according to an agile methodology.  It surfaces your requirements - frankly, systems people often get away with implementing whatever they want, without having a spec let alone one open to review.  Also, it makes you prioritize.  "Whatever you can get done in this two week iteration, that's what you'll have done, and it should be working."  It forces focus on what is required to get things to work and delays more complex niceties till later as there's time.

Conservatism

Both small and large organizations can suffer from #6 and #8.  That's mostly a mindset issue.  I like to tell the story about how we were working on a high level joint IT/business vision for our Web site.  We identified a number of "pillars" of the strategy we were developing - performance, availability, TCO, etc.  I had identified agility as one, but one of the application directors just wasn't buying into it.  "Agility, that's weird, how do we measure that, we should just forget about it."  I finally had to take all the things we had to the business head of the Web and say "of these, which would you say is the single most important one?"  "Agility, of course," he said, as I knew he would.  I made it a point to train my staff that "getting it done" was the most important thing, more important than risk mitigation or crossing all the t's and dotting all the i's.  That can be difficult if the larger organization doesn't reward risk and achievement over conservatism, but you can work on it.

5Feb/10

OpsCamp Debrief

I went to OpsCamp this last weekend here in Austin, a get-together for Web operations folks specifically focusing on the cloud, and it was a great time!  Here's my after action report.

The event invite said it was in the Spider House, a cool local coffee bar/normal bar.  I hadn't been there before, but other people who had been said "That's insane!  They'll never fit that many people!  There's outside seating but it's freezing out!"  That gave me some degree of trepidation, but I still racked out in time to get downtown by 8 AM on a Saturday (sigh!).  Happily, it turned out that the event was really in the adjacent music/whatnot venue also owned by Spider House, the United States Art Authority, which they kindly allowed us to use for free!  There were a lot of people there; we weren't overfilling the place but it was definitely at capacity, with nearly 100 people on hand.

I had just heard of OpsCamp through word of mouth, and figured it was just going to be a gathering of local Austin Web ops types.  Which would be entertaining enough, certainly.  But as I looked around the room I started recognizing a lot of guys from Velocity and other major shows; CEOs and other high ranked guys from various Web ops related tool companies.  Sponsors included John Willis and Adam Jacob (creator of Chef) from Opscode, Luke Kanies from Reductive Labs (creator of Puppet), Damon Edwards and Alex Honor from DTO Solutions (formerly ControlTier), Mark Hinkle and Matt Ray from Zenoss, Dave Nielsen (CloudCamp), Michael Coté (Redmonk), Bitnami, Spiceworks, and Rackspace Cloud.  Other than that, there were a lot of random Austinites and some guys from big local outfits (Dell, IBM).

You can read all the tweets about the event if you swing that way.

OpsCamp kinda grew out of an earlier thing, BarCampESM, also in Austin two years ago.  I never heard about that, wish I had.

How It Went

I had never been to an "unconference" before.  Basically there's no set agenda, it's self-emergent.  It worked pretty well.  I'll describe the process a bit for other noobs.

First, there was a round of lightning talks.  Brett from Rackspace noted that "size matters," Bill from Zenoss said "monitoring is important," and Luke from Reductive claimed that "in 2-4 years 'cloud' won't be a big deal, it'll just be how people are doing things - unless you're a jackass."

Then it was time for sessions.  People got up and wrote a proposed session name on a piece of paper and then went in front of the group and pitched it.  After each pitch, a hand-count of "how many people find this interesting" was taken.

Candidates included:

  • service level to resolution
  • physical access to your cloud assets
  • autodiscovery of systems
  • decompose monitoring into tool chain
  • tool chain for automatic provisioning
  • monitoring from the cloud
  • monitoring in the cloud - widely dispersed components
  • agent based monitoring evolution
  • devops is the debil - change to the role of sysadmins
  • And more

We decided that so many of these touched on two major topics that we should do group discussions on them before going to sessions.  They were:

  • monitoring in the cloud
  • config mgmt in the cloud

This seemed like a good idea; these are indeed the two major areas of concern when trying to move to the cloud.

Sadly, the whole-group discussions, especially the monitoring one, were unfruitful.  For a long ass time people threw out brilliant quips about "Why would you bother monitoring a server anyway" and other such high-theory wonkery.  I got zero value out of these, which was sad because the topics were crucially interesting; the discussions were just too unfocused, with people coming at the problem 100 different ways in sound bites.  The only note I bothered to write down was that "monitoring porn" (too many metrics) makes it hard to do correlation.  We had that problem here, and invested in a (horrors) non open-source tool, Opnet Panorama, that has an advanced analytics and correlation engine that can make some sense of tens of thousands of metrics for exactly that reason.
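
To put a rough number on why "too many metrics" makes correlation hard, here's a minimal Python sketch of the brute-force approach: compare every metric against every other one and keep the pairs that move together.  This is only an illustration and has nothing to do with how Opnet Panorama actually works; the metric names, sample values, and the 0.9 threshold are all made up for the example.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length series; 0.0 if either is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def correlated_pairs(metrics, threshold=0.9):
    """Return (name_a, name_b, r) for every pair whose |r| meets the threshold."""
    pairs = []
    for (a, xs), (b, ys) in combinations(sorted(metrics.items()), 2):
        r = pearson(xs, ys)
        if abs(r) >= threshold:
            pairs.append((a, b, round(r, 3)))
    return pairs


def pair_count(n):
    """Number of distinct metric pairs to compare for n metrics."""
    return n * (n - 1) // 2


# Hypothetical sample data: three made-up metrics over six intervals.
metrics = {
    "web.cpu_pct": [20, 35, 50, 65, 80, 95],
    "web.requests_per_sec": [100, 180, 250, 330, 400, 480],
    "db.disk_free_gb": [50, 52, 49, 51, 50, 52],
}

print(correlated_pairs(metrics))
print(pair_count(len(metrics)), pair_count(20_000))
```

With three metrics there are only 3 pairs to check, and only the CPU/requests pair clears the threshold.  With 20,000 metrics the same loop has to chew through 199,990,000 pairs, which is why the naive approach stops being practical at the scale described above.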

Sessions

There were three sessions.  I didn't take many notes in the first one because, being a Web ops guy, I was having to work a release simultaneously with attending OpsCamp :-P