The Velocity 2008 Conference Experience – Part III

Jun.23, 2008 in Application Performance Management, Conferences, High Availability, Velocity 2008

In the afternoon, we move into full session mode. There are two tracks, and I can only cover one, but that’s what I have Peco and Robert around for! Well, that and to have someone to outdrink. (Ooo burn!) They’ll be posting their writeups at some point as well – you can go to the Velocity schedule page to see the other sessions and to the presentations page to get slides where they exist.

First afternoon session: My panel! I am on the “Measuring Performance” panel with Steve Souders, Ryan Breen of Gomez, Bill Scott of Netflix, and Scott Ruthfield from whitepages.com (a fellow Rice U/Lovetteer!). It went well. We talked about end user performance monitoring, all the other kinds of tools you can use and their drawbacks, and about “newfangled” monitoring of perf w/AJAX, SOA, RIAs, etc. No questions; not sure if the audience liked it or not. But I did get a number of people saying “good work” later so I’ll declare victory. 🙂

“Actionable Logging for Smoother Operation and Faster Recovery,” by Mandi Walls of AOL. It’s a quick 30 minute session. Logging should be actionable – concise, express symptoms. Anything logged is something fixable. It should be giving you less downtime – shorter time to resolution. Logging takes resources, so make it worth it.

Filter down your logs to be concise and actionable. Production logging has different goals from dev/QA logging. You’re looking for problem diagnosis and recovery, and then statistics and monitoring. Insight into what the app’s doing.

You need a standard log file location. On our UNIX servers, the UNIX team gives us “/opt/apps” as the place where we can put stuff and gets cranky about any files outside of that. We make everyone log to one place – /opt/apps/logs/<appname> for this reason. Makes it easy to manage disk space, rotate logs, run “find”s, etc.

Roll your logs and have a standard file naming format. We prefer log.YYYYMMDD[HHMMSS] because it’s then sorted in date order.
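To see why that naming format sorts so nicely, here is a minimal Python sketch. The helper name rolled_log_name and the sample dates are made up for illustration; this is not the script we actually use.

```python
from datetime import datetime

def rolled_log_name(rolled_at: datetime, include_time: bool = False) -> str:
    """Return a rolled log name such as log.20080623 or log.20080623143000."""
    fmt = "%Y%m%d%H%M%S" if include_time else "%Y%m%d"
    return "log." + rolled_at.strftime(fmt)

# Hypothetical roll times, deliberately out of order.
names = [
    rolled_log_name(datetime(2008, 6, 24)),
    rolled_log_name(datetime(2007, 12, 31)),
    rolled_log_name(datetime(2008, 6, 23)),
]

# A plain string sort is also a date sort, because the most significant
# field (the year) comes first and every field is zero-padded.
print(sorted(names))
```

Put the day or month first instead and a plain "ls" no longer lists the files in date order.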

You want standard, good timestamps, formats, etc. Ideally. Hard to do in practice, which is why at NI we use Splunk for log file management – it can detect/be told about different formats, timestamps, etc. and it’ll do this for you. Have a standard, that’s fine, but most 3p software and some of your programmers won’t follow it.

Use log levels. Don’t log too much or not enough, and standards for levels help with that. Log lines should be helpful – what program module? What were the variables at hand?
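As a rough illustration of those points (not something shown in the session), here is a sketch using Python's standard logging module. The logger name orders.checkout, the charge function, and its fields are hypothetical.

```python
import logging

# Put a timestamp, the level, and the logger (module) name on every line.
logging.basicConfig(
    level=logging.INFO,  # production setting: skip DEBUG chatter
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
log = logging.getLogger("orders.checkout")  # hypothetical module name

def charge(order_id: int, amount_cents: int) -> bool:
    log.debug("charge() called")  # not emitted when the level is INFO
    if amount_cents <= 0:
        # Say what went wrong and include the values needed to act on it.
        log.error("rejecting charge: order_id=%s amount_cents=%s",
                  order_id, amount_cents)
        return False
    log.info("charged order_id=%s amount_cents=%s", order_id, amount_cents)
    return True

charge(1001, 2599)
charge(1002, 0)
```

The logger name answers “what program module?” and the key=value pairs answer “what were the variables at hand?”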

Don’t log passwords, usernames, etc. Splunk has facilities to automatically suppress these by the way. I don’t own stock in them or anything, I’m just sayin’.
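If you aren't running a tool that masks these for you, you can scrub them at the source. Below is a minimal sketch of that idea using a filter from Python's standard logging module. It is not how Splunk does it, and the regex and key names (password, passwd, user, username) are assumptions you would tune for your own log formats.

```python
import logging
import re

# Hypothetical pattern: key=value pairs whose key looks sensitive.
SENSITIVE = re.compile(r"\b(password|passwd|user(?:name)?)=\S+", re.IGNORECASE)

def redact(text: str) -> str:
    """Replace the value of any sensitive key=value pair with a marker."""
    return SENSITIVE.sub(lambda m: m.group(1) + "=[REDACTED]", text)

class RedactingFilter(logging.Filter):
    """Scrub the fully formatted message before a handler writes it."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.msg = redact(record.getMessage())
        record.args = ()  # the message is already formatted
        return True       # keep the record, just cleaned up

handler = logging.StreamHandler()
handler.addFilter(RedactingFilter())

log = logging.getLogger("auth.login")  # hypothetical module name
log.addHandler(handler)
log.setLevel(logging.INFO)

log.info("login failed username=%s password=%s", "alice", "hunter2")
```

A pattern like this only catches what it knows to look for, so the better fix is still not to pass those values to the logger in the first place.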

Logs are often the first line of information for troubleshooting, so the better they are, the faster you can recover.

My take on this session – all pretty basic, but solid. Logging 101.

Third session, another 30 minute quickie, is by Goranka Bjedov from Google, on stress, load, and performance testing in QA. She focuses on the back end, as opposed to Steve’s client side focus. She analyzes scalability, bottlenecks, probable issues, etc. and feeds them to ops.

QA is not brain surgery, she says, and QA should be expected to provide this kind of information. And you don’t have to perfectly reproduce the production environment for it. You can learn 80% of it on a modest server under modest load. She totally eliminates the network, which “someone else should be looking at” (who?).

Tests aren’t 100% reproducible. You have to go statistical – run the tests several times and look at the averages and deviation. She prefers JMeter, The Grinder, and FunkLoad – consider OpenSTA on Windows. She finds they are as good as LoadRunner etc. They use log replay, not sure with what tool.
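The “go statistical” part is easy to sketch. The Python below is only an illustration of repeating a run and reporting mean and standard deviation; it is not her tooling, and sample_workload is a made-up stand-in for a real load test driven by something like JMeter.

```python
import statistics
import time
from typing import Callable, Dict, List

def time_runs(workload: Callable[[], object], runs: int = 5) -> List[float]:
    """Run the workload several times and return each elapsed time in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        workload()
        samples.append(time.perf_counter() - start)
    return samples

def summarize(samples: List[float]) -> Dict[str, float]:
    """Report the average and the spread instead of trusting a single run."""
    return {
        "runs": len(samples),
        "mean": statistics.mean(samples),
        "stdev": statistics.stdev(samples),  # sample std dev; needs >= 2 runs
    }

def sample_workload() -> int:
    # Made-up CPU-bound stand-in for "one test run".
    return sum(i * i for i in range(100_000))

print(summarize(time_runs(sample_workload, runs=5)))
```

If the deviation is large relative to the mean, a difference between two builds may just be noise, and you need more runs before drawing conclusions.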

And that’s it! She writes about performance on the Google blog. I’ll check it out!

This session needed slides – “performance testing is easy” and “use open source” aren’t much to get out of one of these sessions.

Next, another longer 45-minute session – “Incident Command for IT: What We Can Learn From The Fire Department,” by Brent Chapman from Great Circle.

The core idea is that public safety agencies all deal with emergencies all the time. What are some best practices we can glean from them? They organize on the fly, coordinate efforts of multiple agencies, and evolve the organization as the incident progresses.

Example: a car hits a fire hydrant. You have fire, ambulance, police, water, power people all involved and in a specific order, and it’s a time critical event. Another example is SoCal wildfires. Obvious IT analogies (data center outage…).

So an “Incident Command System” was developed to address questions like this. It’s a set of standard tools for command, control, and coordination of incidents. Started in SoCal but has evolved into a national standard.

ICS recommends a modular, scalable org structure, consisting of command, ops, logistics, planning, and admin sections. Can be one person until more folks show up. Command section plans. Operations section does the work, and assists command in development of a consolidated action plan. It’s usually the largest. Planning maintains status & plans. Logistics section gets stuff. Admin/finance pays, tracks costs, etc. Sections are created/grown as needed.

The senior-most first responder is usually incident commander and transfer of command is explicit. Delegates work as necessary and possible.

Maintain a manageable span of control. Each supervisor should have 3-7 subordinates (5 ideal). New levels are created as needed.

Unity of command. In an incident each person has one boss, period. Matrixes have to be avoided in an emergency.

Transfers of responsibility are always explicit, and a more senior person arriving doesn’t necessarily take over.

Clear communications. All comms have to be clear and complete (no code). Talk directly to resources when possible, traversing the tree to get to them (keeping management informed).

Consolidated action plans. Command communicates high level action plan per operational period (hour to shift to day to whatever). Write it down, especially if it crosses organizational or specialty boundaries.

Management by objective. Tell people what to accomplish, not how.

Comprehensive resource management. All assets & personnel tracked via Admin section. Sign in and be assigned.

Designated incident facilities – a command post. And a staging area for resources.

Then he walks through a case study involving one of two data centers going offline. Hopefully the slides’ll be available because this is a lot of typing. It’s engaging though. We have tended to “roughly” follow this model in practice just by instinct – like I always make sure there’s one person who “has the ball” during an incident (command). I think one of the biggest takeaways is to understand that as the first person on the scene, you’re mainly Command – and Ops, and Status – until you explicitly spin those off. Too many ops folks just do the ops and don’t do command or status.

In closing, you should practice ICS and use it for planned events like moves/upgrades. Download preso from www.greatcircle.com.

We do some things like that on our team. I am disappointed that this is basically a “what if” preso, not something he’s implemented in IT organizations… Seems like more of an Ignite candidate.

Now to try to hunt down treats… Apparently the Marriott staff brought some snacks out into the hall during the session, and quickly took them away before the break started. Boo.

Tags: Conferences, performance, velocity, velocity08, velocityconf08


Welcome to WebAdminBlog!

This blog site is run by Josh Sokol, the Founder and CEO of SimpleRisk, a free tool for Governance, Risk Management, and Compliance. Josh is a former Web Admin and Information Security Program Owner of National Instruments.


Web Admin Blog

Real Web Admins. Real World Experience.