Want to hear me spout off more about DevOps? Well, here's your chance; I did an interview with Damon Edwards of DTO and they've posted it on the dev2ops blog!
"I say this as somebody who about 15 years ago chose system administration over development. But system administration and system administrators have allowed themselves to lag in maturity behind what the state of the art is. These new technologies are finally causing us to be held to account to modernize the way we do things. And I think that’s a welcome and healthy challenge."
From the "sad but true" files comes an extremely insightful point apparently discussed over beer by the UK devops crew recently - that we are talking about dev and ops collaboration but the current state of collaboration among ops teams is pretty crappy.
- Internal Borders by Graham Bleach
- DevOps is a good cause, but what about OpsOps? by the Build Doctor
- DevOps, SecOps, DBAOps, NetOps by Kris Buytaert
This resonates deeply with me. I've seen that problem in spades. I think in general that a lot of the discussion about the agile ops space is too simplistic, in that it seems tuned to organizations of "five guys, three of whom are coders and two of whom are operations" where there's no differentiation. In real life, there are often larger orgs with a lot of differentiation that causes various collaboration challenges. Some people refer to this as Web vs Enterprise, but I don't think that's strictly true; once your Web shop grows from 5 guys to 200 it runs afoul of this too - it's a simple scalability and organizational engineering problem.
As an aside, I don't even like the "Ops" term - a sysadmin team can split into subgroups that do systems engineering, release management, and operational support... Just saying "Ops" seems to me to imply that we aren't partners in the initial design and development of the overall system/app/service/site/whatever you want to call it.
Here, we have a large Infrastructure department. Originally, it was completely siloed by technology verticals, and there are a lot of subgroups: Network, UNIX, Windows, DBA, Lotus Notes, Telecom, Storage, Data Center... Some ten-plus years ago, when the company launched its Web site in earnest, they quickly realized that wasn't going to work out. You had the buck-passing behavior described in the blog posts above, which made issues impossible to solve in a timely fashion and made collaboration with devs/business nearly impossible. Not only did you need like 8 admins to come involve themselves in your project, but they did not speak similar enough languages - you'd have some crusty UNIX admin yelling "WHAT ABOUT THE INODES" until the business analyst started to cry.
But are our developers here better off? They are siloed by business unit. Just among the Web developers there's the eCommerce developers, eCRM, Product Advisors, Community, Support, Content Management... On the one hand, they are able to be very agile in creating solutions inside their specific niche. On the other hand, they are all working within the same system environment, and they don't always stay on the same page in terms of what technologies they are using. "Well, I'm sure THAT team bought a lovely million dollar CMS, but we're going to buy our own different million dollar CMS. No, you don't get more admin resource." Over time, they tried to produce architecture groups and other cross-team initiatives to try to rein in the craziness, with mixed but overall positive results.
Plugging the Dike
What we did was create a Web Administration group (Web Ops, whatever you want to call it) that was holistically responsible for Web site uptime, performance, and security. Running that team was my previous gig; I did it for five years. That group was more horizontally focused and served as an interface to the various technology verticals; it worked closely with developers on system design during development, coordinated the release process, and involved devs in troubleshooting during the production phase.
In fact, we didn't just partner with the developers - we partnered with the business owners of our Web site too, instead of tolerating the old model of "Business collaborates with the developers, who then come and tell ops what to do." This was a remarkably easy sell, really. The company lost money every minute the Web site was down, and it was clear that the dev silos weren't going to be able to fix that any more than the ops silos were. So we quickly got a seat at the same table.
This was a huge success. To this day, our director of Web Marketing is one of the biggest advocates of the Web operations team. Since then, other application administration teams (our term for this cross-disciplinary ops) have formed along the same model. The DevOps collaboration has been good overall - with certain stresses coming from the Web Ops team's role as gatekeeper and process enforcer. Ironically, the biggest issues and worst relationships were within Infrastructure, between the ops teams!
OpsOps - The Fly In The Ointment
The ops team silos haven't gone down quietly. To this day the head DBA still says "I don't see a good reason for you guys [WebOps] to exist." I think there's a common "a thing is just the sum of its parts" mindset among admins for whatever reason. There are also turf wars arising from the technology silo division and the blurring of technology lines by modern tech. I tried again and again to pitch "collaborative system administration." But the default sysadmin behavior is to say "these systems are mine and I have root on them. Those are your systems and you have root on them. Stay on your side of the line and I'll stay on mine."
Fun specific Catch-22 situations we found ourselves in:
- Buying a monitoring tool that correlates events across all the different tiers to help root-cause production problems - but the DBAs refusing to allow it on "their" databases.
- Buying a hardware load balancer - we were going to manage it, not the network team, and it wasn't a UNIX or Windows server, so we couldn't get anyone to rack and jack it (and of course we weren't allowed to because "Why would a webops person need server room access, that's what the other teams are for").
Some of the problem is just attitude, pure and simple. We had problems even with collaboration inside the various ops teams! We'd work with one DBA to design a system and then later need to get support from another DBA, who would gripe that "no one told/consulted them!" Part of the value of the agile principles that "DevOps" tries to distill is just a generic "get it into your damn head that you need to be communicating and working together, and that needs to be your default mode of operation." I think it's great to harp on that message because it's little understood among ops. For every dev group that deliberately ostracizes their ops team, there are two ops teams who don't think they need to talk to the devs - in the end, it's mostly our fault.
Part of the problem is organizational. I also believe (and ITIL, I think, agrees with me) that the technology-silo model has outlived its usefulness. I'd like to see admin teams organized by service area with integral DBAs, OS admins, etc. But people are scared of this for a couple of reasons. One is that those admins might do things differently from area to area (the same problem we have with our devs) - this could be mitigated by "same tech" cross-org standards/discussions. The other is that this model is not the cheapest. You can squeeze out every last penny if you only have 4 Windows admins shared by 8 functional areas. Of course, you are cutting off your nose to spite your face, because you lose a lot more in abandoned agility, but frankly corporate finance rules (minimize G&A spending) are a powerful driver here.
If nothing else, there's not "one right organization" - I'd be tempted to reorg everyone from verticals into horizontals, let that run for 5 years, and then reorg back the other way, just to keep the stratification from setting in.
Specialist vs Generalist
One other issue. The Web Ops team we created required us to hire generalists - but generalists that knew their stuff in a lot of different areas. It became very hard to hire for that position and training took months before someone was at all effective. Being a generalist doesn't scale well. Specialization is inevitable and, indeed, desirable (as I think pretty much anything in the history of anything demonstrates). You can mitigate that with some cross-training and having people be generalists in some areas, but in the end, once you get past that "three devs, two ops, that's the company" model, specialization is needed.
That's why I think one of the common definitions of DevOps - all ops folks learning to be developers or vice versa - is fundamentally flawed. It's not sustainable. Either you hire nothing but expensive superstars who are good at both, or you hire people who suck at both.
What you do is have people with varying mixes. In my current team we have a continuum of pure ops people, ops folks doing light dev, devs doing light ops, and pure devs. It's good to have some folks who are generalizing and some who are specializing. It's not specializing that is bad, it's specialists who don't collaborate that are bad.
So I've shared a lot of experiences and opinions above but I'm not sure I have a brilliant solution to the problem. I do think we need to recognize that Ops/Ops collaboration is an issue that arises with scale and one potentially even harder to overcome than Dev/Ops collaboration. I do think stressing collaboration as a value and trying to break down organizational silos may help. I'd be happy to hear other folks' experiences and thoughts!
I recently read a great blog post by Scott Wilson that talked about the definitions of Agile Operations, DevOps, and related terms. (Read the comments too; there's some good discussion.) From what I've heard so far, there are a bunch of semi-related terms people are using around this whole "new thing of ours."
The first is DevOps, which has two totally different frequently used definitions.
1. Developers and Ops working closely together - the "hugs and collaboration" definition
2. Operations folks adopting development best practices and writing code for system automation
The second is Agile Operations, which also has different meanings.
1. Same as DevOps, whichever definition of that I'm using
2. Using agile principles to run operations - process techniques, like iterative development or even kanban/TPS kinds of process stuff. Often with a goal of "faster!"
3. Using automation - version control, automatic provisioning/control/monitoring. Sometimes called "Infrastructure Automation" or similar.
This leads to some confusion, as most of these specific elements can be implemented in isolation. For example, I think the discussion at OpsCamp about "Is DevOps an antipattern" was predicated on an assumption that DevOps meant only DevOps definition #2, "ops guys trying to be developers," which made the discussion somewhat odd to people with other assumed definitions.
I have a proposed set of definitions. To explain it, let's look at Agile Development and see how it's defined.
- Agile Principles - like "business/users and developers working together." These are the core values that inform agile, like collaboration, people over process, software over documentation, and responding to change over planning.
- Agile Methods - specific process types. Iterations, Lean, XP, Scrum. "As opposed to waterfall."
- Agile Practices - techniques often found in conjunction with agile development, not linked to a given method flavor, like test driven development, continuous integration, etc.
I believe the different parts of Agile Operations that people are talking about map directly to these three levels.
- Agile Operations Principles includes things like dev/ops collaboration (DevOps definition 1 above); things like James Turnbull's 4-part model seem to be spot on examples of trying to define this arena.
- Agile Operations Methods includes process you use to conduct operations - iterations, kanban, stuff you'd read in Visible Ops; Agile Operations definition #2 above.
- Agile Operations Practices includes specific techniques like automated build/provisioning, monitoring, anything you'd have a "toolchain" for. This contains DevOps definition #2 and Agile Operations definition #3 above.
I think it's helpful to break them up along the same lines as agile development, because in the end some of those levels should merge once developers understand that ops is part of system development too... There shouldn't be a separate "user/dev collaboration" and "dev/ops collaboration"; in a properly mature model it should become a single "user/dev/ops collaboration," for example.
I think the dev2ops guys' "People over Process over Tools" diagram mirrors this almost exactly - people being one of the important agile principles, process being a large part of the methods, and tools being used to empower the practices.
What I like about that diagram, and why I want to bring this all back to the Agile Manifesto discussion, is that having various sub-definitions increases the risk that people will implement the processes or tools without the principles in mind, which is definitely an antipattern. The Agile guys would tell you that iterations without collaboration are not likely to work out real well.
And it happens in agile development too - there are some teams here at my company that have adopted the methods and/or tools of agile but not its principles, and the results are suboptimal.
Therefore I propose that "Agile Operations" is an umbrella term for all these things, and we keep in mind the principles/methods/practices differentiation.
If we want to call the principles "devops" for short and some of the practices "infrastructure automation" for short, I think that would be fine... Keep in mind, though, that dev/ops collaboration is ONE of the important principles but probably not the entirety, and infrastructure automation is one of the important practices, but there are probably others.
It's funny. We recently started working on an upgrade of our Intranet social media platform, and we were trying to figure out how to meld the infrastructure-change-heavy operation with the need for devs, designers, and testers to be able to start working on the system before "three months from now." So we broached the idea of "maybe we should do that in iterations!" First, get the new wiki up and working. Then, worry about tuning, switching the back-end database, etc. Very basic, but it got me thinking about the problem in terms of "hey, we in Infrastructure still operate in terms of waterfall, don't we?"
Then when Peco and I moved over to NI R&D and started working on cloud-based systems, we quickly realized the need for our infrastructure to be completely programmable - that is, not manually tweaked and controlled, but run in a completely automated fashion. Also, since we were two systems guys embedded in a large development org that's using agile, we were heavily pressured to work in iterations along with them. This was initially a shock - my default project plan has, in traditional fashion, months' worth of evaluating, installing, and configuring various technology components before anything's up and running. But as we began to execute in that way, I started to see that no, really, agile is possible for infrastructure work - at least "mostly." Technologies like cloud computing help; there's still a little more up-front work required than with programming, but you can get most of the way towards an agile methodology (and mindset!).
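To make "completely programmable" a bit more concrete, here's a minimal Python sketch of the declarative, convergent style that configuration management tools such as Puppet are built around: you describe the desired state in code, and a converge step computes and applies only the changes needed, so running it twice is a no-op. This is an illustration under my own assumptions, not how we actually built our systems; the package names are made up, and a plain dict stands in for a server (a real tool would call the package manager).

```python
# Hypothetical sketch of idempotent, declarative configuration.
# A dict of {package: version} stands in for a real server.


def plan(desired: dict, current: dict) -> list[tuple[str, str, str]]:
    """Return the (action, package, version) steps needed to reach the desired state."""
    steps = []
    for pkg, version in sorted(desired.items()):
        if current.get(pkg) != version:
            steps.append(("install", pkg, version))
    for pkg in sorted(current):
        if pkg not in desired:
            steps.append(("remove", pkg, current[pkg]))
    return steps


def converge(desired: dict, current: dict) -> list[tuple[str, str, str]]:
    """Apply the plan to `current` in place and return the steps taken."""
    steps = plan(desired, current)
    for action, pkg, version in steps:
        if action == "install":
            current[pkg] = version
        else:
            del current[pkg]
    return steps


if __name__ == "__main__":
    desired = {"httpd": "2.2", "mysql-client": "5.1"}
    host = {"httpd": "2.0", "telnetd": "0.17"}
    print(converge(desired, host))  # first run upgrades, installs, and removes
    print(converge(desired, host))  # second run changes nothing: []
```

The point of the design is that the same code can be rerun at any time, on new or existing boxes, without anyone hand-tweaking a server; that's what makes iterating on infrastructure practical.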
Then at OpsCamp last month, we discovered that there's been this whole Agile Operations/Automated Infrastructure/devops movement thing already in progress we hadn't heard about. I don't keep in touch with The Blogosphere (tm) enough I guess. Anyway, turns out a bunch of other folks have suddenly come to the exact same conclusion and there's exciting work going on re: how to make operations agile, automate infrastructure, and meld development and ops work.
So if you also hadn't been up on this, here's a roundup of some good related core thoughts on these topics for your reading pleasure!
- Automated Infrastructure enables Agile Operations
- Virtualized/Abstracted Administration
- Agile Manifesto Co-Author On Agile Operations
- Agile Web Operations Blog
- DevOpsdays - one coming to the US in 2010, they say.
- Building an Automated Infrastructure
- Agile Manifesto - if you're not a developer and only have a vague impression of what "agile" is
- Extreme Automated Infrastructure
This presentation, entitled "Security in Agile Development: Breaking the Waterfall Mindset of the Security Industry," was by Dave Wichers, member of the OWASP board and cofounder and COO of Aspect Security.
Manifesto for Agile Software Development
Individuals and interactions over processes and tools. Working software over comprehensive documentation. Customer collaboration over contract negotiation. Responding to change over following a plan.
- Agile practices: test-driven development, pair programming, and doing the simplest thing.
- Planning Sprint (Sprint 0) - define user stories
- Develop in sprints and focus on what the customer wants first in short iterative development cycles
Assurance is the Goal
- "Assurance is the level of confidence that software functions as intended and is free of vulnerabilities, either intentionally or unintentionally designed or inserted as part of the software" - DOD
- Can agile software development methods generate assurance?
- "test-driven development places (functional) assurance squarely at the heart of development" - Johan Peters
Waterfall Security is "Breadth First"
- Build assurance layer-by-layer
- Challenges: the problem space is very large, difficult to prioritize, ...
Agile vs Security
- Where to insert security activities?
Security in Agile (nice chart here)
- Add Threat Modeling and Stakeholder Security Stories at the beginning, between Story Finding and Initial Estimation
- Do periodic security sprints (if needed) between writing the story and scenario and implementing functionality and acceptance tests
- Do some independent expert testing and security architecture review support in the quality assurance phase
- Add Application Security Assurance Review between system testing and release phases
Key Agile Security Enablers
- Standard Security Controls: See the OWASP Enterprise Security API (ESAPI) Project
- Secure Coding Standards: How to properly use your standard security controls. How to avoid common security flaws. Automated code analysis.
- Developer Security Training: How to use your standard controls and avoid common flaws
- Support from Security Experts: Even with training and standard controls, security is hard. Access to security experts and independent testing/analysis is key. Ideally, a security expert would be on the team (but usually not possible).
Planning Sprint (Sprint 0)
- Identify Stakeholders: Ask them what their most important security concerns are. Work with them on the basic security controls required based on system purpose, environment, existence of such mechanisms, etc.
- Confidentiality: Who is allowed to access what data and how? How important is protecting this data? Regulatory requirements?
- Integrity: What data must be protected and to what degree?
- Availability: How important is system availability? Can we define an SLA?
Planning Sprint: Capture Risks in Stakeholder Security Stories
- As a User...I want to be the only one who can access my account so that I can keep my information private.
- As a User...I want my personal information encrypted in storage and transit so that it doesn't get stolen by attackers.
- As a Manager...I want to be the only one who can edit Employee salaries so that I can prevent fraud.
- As a Business Owner...I want all security critical actions logged, so that attacks can be noticed and diagnosed.
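As a rough illustration of how a story like the last one can turn into something testable, here's a minimal Python sketch. The `grant_admin()` function, the `app.security` logger name, and the admin set are all hypothetical stand-ins of mine, not from the talk; the tests check that both successful and denied attempts leave a log record, since the denied ones are what let an attack be noticed.

```python
import logging
import unittest

# Hypothetical application code: the logger name and grant_admin() are illustrative.
security_log = logging.getLogger("app.security")


def grant_admin(actor: str, target: str, admins: set) -> bool:
    """Grant admin rights if `actor` is already an admin; log the attempt either way."""
    if actor not in admins:
        # Denied attempts are logged too, so attacks can be noticed and diagnosed.
        security_log.warning("DENIED grant_admin actor=%s target=%s", actor, target)
        return False
    admins.add(target)
    security_log.warning("grant_admin actor=%s target=%s", actor, target)
    return True


class SecurityLoggingStoryTest(unittest.TestCase):
    """Story: as a Business Owner, I want all security critical actions logged."""

    def test_successful_grant_is_logged(self):
        admins = {"alice"}
        with self.assertLogs("app.security", level="WARNING") as captured:
            self.assertTrue(grant_admin("alice", "bob", admins))
        self.assertEqual(
            captured.output,
            ["WARNING:app.security:grant_admin actor=alice target=bob"],
        )

    def test_denied_attempt_is_logged(self):
        admins = {"alice"}
        with self.assertLogs("app.security", level="WARNING") as captured:
            self.assertFalse(grant_admin("mallory", "mallory", admins))
        self.assertNotIn("mallory", admins)
        self.assertIn("DENIED", captured.output[0])
```

Run it with a test runner such as `python -m unittest`.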
Building Assurance "Depth First"
- Identify most important security concerns and their required security mechanisms
- Within sprints, or in periodic security sprints, develop test methods for them and their use; configure/implement/analyze these security mechanisms; and run the tests
Implement Stakeholder Security Stories
- Security stories are implemented just like other stories. Test-driven development (unit test cases come before the code). Continuous reviews and inspection (pair programming/constant informal reviews).
Test Cases for Security Controls
- Security "requirements" are defined by developing test cases. Unit tests can test both positive (functional) and negative (not broken) aspects of security mechanisms. Tests are repeatable, providing full regression testing. But not true penetration testing or analysis.
- Real experience with test-driven development: the OWASP Enterprise Security API.
- Results in significant increase in assurance
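Here's a minimal Python sketch of the positive/negative idea, assuming a simple whitelist validator for account IDs. The ID format (3 to 16 lowercase letters, digits, or underscores) and the function name are my own illustrative assumptions, not ESAPI's API. The positive test checks that legitimate input gets through; the negative test checks that hostile or malformed input is refused.

```python
import re
import unittest

# Hypothetical whitelist: 3-16 lowercase letters, digits, or underscores.
_ACCOUNT_ID = re.compile(r"[a-z0-9_]{3,16}")


def is_valid_account_id(value: str) -> bool:
    """Accept only values that match the whitelist pattern in full."""
    return _ACCOUNT_ID.fullmatch(value) is not None


class AccountIdValidationTest(unittest.TestCase):
    def test_positive_accepts_well_formed_ids(self):
        # Functional aspect: legitimate input must pass.
        for value in ("bob", "alice_01", "a" * 16):
            self.assertTrue(is_valid_account_id(value), value)

    def test_negative_rejects_malicious_or_malformed_ids(self):
        # "Not broken" aspect: hostile or malformed input must be refused.
        for value in ("", "ab", "a" * 17, "bob' OR '1'='1", "<script>", "bob\n"):
            self.assertFalse(is_valid_account_id(value), repr(value))
```

Using `fullmatch` rather than `match` with a trailing `$` matters here: `$` also matches just before a trailing newline, so `"bob\n"` would slip through. That's exactly the kind of pitfall the negative cases are meant to catch, and because the tests are repeatable they keep catching it as regression tests.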
Test Cases for Security Stories
- Functional test cases. Typical unit testing by developers. Verify presence and proper function of security control. May include simple tests with a browser.
- Security test cases. Check for best practices. Test for common pitfalls. Hopefully, most come with your standard security controls.
- Test cases provide strong assurance evidence
- Independent security testing. Verifies that functional and security tests were performed. Provides additional specialized security testing expertise.
Periodic Security Sprints
- As necessary, build/integrate related security controls. Implement the highest-priority related security controls first. Leveraging your standard security components is key. Building significant new security controls is hard. Security sprints may even be completely avoided if sufficient standard components are available.
- Examples: Authentication, sessions, authorization, validation, canonicalization, encoding, error handling, logging, intrusion detection
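Canonicalization is probably the least self-explanatory item in that list. Here's a rough Python sketch under my own assumptions; it isn't ESAPI's implementation, and `canonicalize()` and `MultipleEncodingError` are hypothetical names. It reduces percent-encoded input to its simplest form and rejects input that was encoded more than once. Attackers use double encoding to slip payloads past filters that decode only once.

```python
from urllib.parse import unquote


class MultipleEncodingError(ValueError):
    """Raised when input appears to be percent-encoded more than once."""


def canonicalize(value: str) -> str:
    """Decode percent-encoding once; reject input that still decodes further.

    If a second decode still changes the string, the original was encoded at
    least twice, which legitimate clients rarely do.
    """
    decoded = unquote(value)
    if unquote(decoded) != decoded:
        raise MultipleEncodingError(f"multiple encoding detected in {value!r}")
    return decoded


if __name__ == "__main__":
    print(canonicalize("hello%20world"))  # single encoding: decoded normally
    try:
        canonicalize("%253Cscript%253E")  # "<script>" encoded twice
    except MultipleEncodingError as err:
        print("rejected:", err)
```

Validation and output encoding should then work on the canonical value. The trade-off is that a legitimate value containing a literal `%XX` sequence gets flagged too, so a real control would need a policy for that case.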
Perform Agile Security Reviews
- Security reviews: verify all are in place and complete. Threat model, security stories, security controls, test cases, test results. Notice: Most are standard agile artifacts, not just add-on security deliverables.
- Application code review and penetration testing. Added for critical applications to increase assurance. Manual (tool supported), automated, or both. Within security sprints and/or predeployment testing.
Example: Agile Access Control
- With standard access control components, just make sure isAuthorized() is called where needed, both in the presentation layer and in the business logic. Stay focused on implementing the functionality.
- Define user stories around who can do what. Configure your policy for what is most important first. Define and restrict what normal users can do. Policy can be both declarative and programmatic.
- How do you test proper implementation? Develop policy specific test cases to make sure policy is enforced properly.
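Here's a minimal Python sketch of those three bullets applied to the earlier salary story. It pairs a declarative role-to-action policy, an `is_authorized()` check (a Python stand-in for the isAuthorized() call the talk mentions) made inside the business logic, and policy-specific test cases. The role names, action names, and classes are all hypothetical.

```python
import unittest
from dataclasses import dataclass

# Hypothetical declarative policy: action -> roles allowed to perform it.
POLICY = {
    "view_salary": {"manager"},
    "edit_salary": {"manager"},
    "view_directory": {"employee", "manager"},
}


class AccessDenied(Exception):
    """Raised when a user attempts an action the policy does not allow."""


@dataclass(frozen=True)
class User:
    name: str
    role: str


def is_authorized(user: User, action: str) -> bool:
    """Default deny: an action missing from the policy is allowed for no one."""
    return user.role in POLICY.get(action, set())


def edit_salary(user: User, salaries: dict, employee: str, amount: int) -> None:
    # Enforced in the business logic, not only by hiding a button in the UI.
    if not is_authorized(user, "edit_salary"):
        raise AccessDenied(f"{user.name} ({user.role}) may not edit salaries")
    salaries[employee] = amount


class SalaryPolicyTest(unittest.TestCase):
    def test_manager_can_edit_salary(self):
        salaries = {"bob": 50_000}
        edit_salary(User("carol", "manager"), salaries, "bob", 55_000)
        self.assertEqual(salaries["bob"], 55_000)

    def test_employee_cannot_edit_salary(self):
        salaries = {"bob": 50_000}
        with self.assertRaises(AccessDenied):
            edit_salary(User("bob", "employee"), salaries, "bob", 1_000_000)
        self.assertEqual(salaries["bob"], 50_000)

    def test_unknown_action_is_denied(self):
        self.assertFalse(is_authorized(User("carol", "manager"), "delete_payroll"))
```

Keeping the policy in one table makes it easy to configure "what is most important first" and extend it story by story. The tests pin down both what a manager can do and what a normal user can't.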
Security in Agile Summary
- Agile can generate assurance as well as traditional approaches, possibly better
- Approach is depth-first, not breadth-first
- Getting the right stakeholder security stories is key
- In traditional security, assurance comes primarily from expert security reviews at successive stages of development. In agile security, assurance comes from managing the key risks to the security stakeholders.