Let's say that you go to the same restaurant at least once a week for an entire year. The staff is always friendly, the menu always has something that sounds appealing, and the food is always good enough to keep you coming back for more. The only real drawback is that it usually takes a solid half-hour to get your food, but you've learned to find something else to do while you're waiting because it's always been worth the wait. Today you go into the same restaurant, but now the staff goes out of their way to serve you, the menu has twice as much selection as before, the food is literally the best thing you've ever tasted, and it was on your table just the way you like it within 30 seconds of placing your order. This is my initial impression of the newly released version of 21CT's LYNXeon software (version 2.29).
I'll be honest. Before we upgraded to the new version I had mixed feelings. On one hand, I loved the data that the LYNXeon platform was giving me. The ability to comb through NetFlow data and find potentially malicious patterns in it was unlike any other security tool that I've experienced. On the other hand, the queries sometimes ran for half an hour or more before I had any results to analyze. I learned to save my queries for when I knew my computer would be sitting idle for a while. It was a burden that I was willing to undertake for the results, but a burden nonetheless. We upgraded to LYNXeon 2.29 less than a week ago, but already I can tell that this is a huge leap in the right direction for 21CT's flagship network pattern analysis software. Those same queries that used to take 30 minutes now take 30 seconds or less to complete. The reason is a massive overhaul of the platform's database layer. By switching to a grid-based, column-oriented database structure for storing and querying data, 21CT transformed the product from a pack mule into a thoroughbred.
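To illustrate why a column-oriented layout helps with this kind of query, here's a toy sketch in Python. It is not how LYNXeon's database works internally (21CT hasn't published that, and the "grid" side, spreading data across nodes, isn't modeled at all); the flow field names and sample values are made up. The point is that an aggregation like "bytes per destination port" only has to read the two columns it needs instead of every field of every record:

```python
from collections import defaultdict

# Hypothetical flow records in row-oriented form: every field of a record sits together,
# so a row store reads all of them even when a query needs only two.
rows = [
    {"src": "10.0.0.5", "dst": "192.0.2.10", "dport": 443, "bytes": 1200},
    {"src": "10.0.0.7", "dst": "192.0.2.10", "dport": 443, "bytes": 800},
    {"src": "10.0.0.5", "dst": "198.51.100.3", "dport": 53, "bytes": 90},
]

def to_columns(records):
    """Pivot row records into one list per field (a column-oriented layout)."""
    columns = defaultdict(list)
    for record in records:
        for field, value in record.items():
            columns[field].append(value)
    return dict(columns)

def bytes_by_port(columns):
    """Sum bytes per destination port, touching only the 'dport' and 'bytes' columns."""
    totals = defaultdict(int)
    for port, nbytes in zip(columns["dport"], columns["bytes"]):
        totals[port] += nbytes
    return dict(totals)

cols = to_columns(rows)
print(bytes_by_port(cols))  # {443: 2000, 53: 90}
```

In a real columnar engine each column is also stored contiguously and compressed, which is where most of the speedup on large scans comes from.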
Enhanced performance wasn't the only improvement that found its way into the 2.29 release. 21CT also refactored the way LYNXeon consumes data. While the old platform did a fairly good job of consuming NetFlow data, adding other data sources to your analytics was a challenge, to say the least, usually requiring custom integration work to make it happen. The new platform adds the concept of a connector, with new data types and a framework around how to ingest them. It may still require some assistance from support in order to consume data types other than NetFlow, but it's nowhere near the level of effort it was before the upgrade. We were up and running with the new version of LYNXeon, consuming NetFlow, IPS alerts, and alerts from our FireEye malware prevention system, in a few hours. The system is capable of adding DNS queries, HTTP queries, and so much more. What this amounts to is that LYNXeon is now a flexible platform that lets you consume data from many different security tools and then visualize and correlate them in one place. Kinda like a SIEM, but actually useful.
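To make the connector idea concrete, here's a minimal Python sketch of the general pattern: each data type registers a parser that normalizes its raw records into one common event shape, so different tools can be correlated afterwards. This is my own illustration, not LYNXeon's connector API; the `Event` fields, the `connector` decorator, and the raw field names (`srcaddr`, `attacker`, and so on) are all hypothetical:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Event:
    source: str   # which tool produced the record
    src_ip: str
    dst_ip: str
    detail: str

# Hypothetical registry: data type name -> parser that normalizes that type's raw records.
CONNECTORS: Dict[str, Callable[[dict], Event]] = {}

def connector(data_type: str):
    """Decorator that registers a parser as the connector for one data type."""
    def register(parser: Callable[[dict], Event]) -> Callable[[dict], Event]:
        CONNECTORS[data_type] = parser
        return parser
    return register

@connector("netflow")
def parse_netflow(raw: dict) -> Event:
    return Event("netflow", raw["srcaddr"], raw["dstaddr"], f"{raw['bytes']} bytes")

@connector("ips_alert")
def parse_ips(raw: dict) -> Event:
    return Event("ips", raw["attacker"], raw["target"], raw["signature"])

def ingest(data_type: str, raw_records: List[dict]) -> List[Event]:
    parser = CONNECTORS[data_type]  # KeyError means no connector exists for this type yet
    return [parser(r) for r in raw_records]

# Once everything is normalized, correlating across tools is a simple filter.
events = ingest("netflow", [{"srcaddr": "10.0.0.5", "dstaddr": "192.0.2.10", "bytes": 1200}])
events += ingest("ips_alert", [{"attacker": "10.0.0.5", "target": "192.0.2.10",
                                "signature": "example beacon signature"}])
suspect = [e for e in events if e.src_ip == "10.0.0.5"]
```

Adding a new source then means writing one parser rather than reworking the whole ingestion path, which is roughly the difference in effort described above.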
As with any tool, I'm sure that LYNXeon 2.29 won't be without its share of bugs, but overall the new platform is a huge improvement over the old one, and from what I've seen so far I gotta say that I'm impressed. 21CT is undoubtedly moving in the right direction and I'm excited to see what these guys do with the platform going forward. That's my first impression of the 21CT LYNXeon 2.29 release.
O'Reilly's Velocity conference is the only generalized Web ops and performance conference out there. We really like it; you can go to various other conferences and have 10-20% of the content useful to you as a Web Admin, or you can go here and have most of it be relevant!
They've been doing some interim freebie Web conferences and there's one coming up. Check it out. They'll be talking about performance functionality in Google Webmaster Tools, MySQL, Show Slow, provisioning tools, and dynaTrace's new AJAX performance analysis tool.
O'Reilly Velocity Online Conference: "Speed and Stability"
Thursday, March 17; 9:00am PST
Dave Artz has put together a simple Webcast tutorial on how to use webpagetest.org to measure and fix up your Web site. If all this talk about Web performance is a bit overwhelming, it's a great novice tutorial. He walks through the entire process visually and explains each metric. Great job Dave!
We've reached the last couple of sessions at Velocity 2008. Read me! Love me!
We hear about Capacity Planning with John Allspaw of Flickr. He says: No benchmarks! Use real production data. (How? We had to develop a program called WebReplay to do this because no one had anything. We're open sourcing it soon, stay tuned.)
Use "safety factors" (from traditional engineering). Aka a reserve, overhead, etc.
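Allspaw's two points combine naturally: take the ceiling you've actually seen a server hit in production, then pad it with a safety factor. Here's a minimal Python sketch, assuming a simple linear model where load spreads evenly across identical servers; the function name, the 1.25 default, and the sample numbers are hypothetical, not Flickr's:

```python
import math

def servers_needed(peak_requests_per_sec: float,
                   per_server_capacity: float,
                   safety_factor: float = 1.25) -> int:
    """Servers required to absorb the measured peak load plus a reserve.

    per_server_capacity should be the ceiling observed in production, not a
    synthetic benchmark; safety_factor > 1 adds headroom (1.25 is an arbitrary example).
    """
    if per_server_capacity <= 0 or safety_factor < 1:
        raise ValueError("capacity must be positive and safety_factor >= 1")
    return math.ceil(peak_requests_per_sec * safety_factor / per_server_capacity)

print(servers_needed(4000, 350))  # 4000 * 1.25 / 350 = 14.29, rounded up to 15
```

Rounding up rather than to the nearest integer matters here: 14 servers would leave you slightly under your own safety margin.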
They use squid a bunch. At NI we've been looking at Oracle's WebCache - mainly because it supports ESIs (Edge Side Includes) and we're thinking that may be a good way to go. There's a half-assed ESI plugin for squid, but we hear it doesn't work; apparently Zope paid for ESI support in squid 3.0, but as best we can tell there's been no traction on that in four years. But I'd be happy not to spend the money.
OK, now we're to the final stretch of presentations for Day One.
"Cadillac or Nascar: A Non-Religious Investigation of Modern Web Technologies," by Akara and Shanti from Sun.
Web20kit is a new reference architecture from Sun to evaluate modern Web technologies. It's implemented in PHP, JavaEE, and Ruby. It'll be open sourced in the fall.
It uses a web/app server (Apache, GlassFish, or Mongrel) with a cache (memcached), a db (MySQL), an object store (NFS/MogileFS), a driver, and a geocoder. The sample app is a social event calendar with a good bit of AJAX frontend.
I apologize for any lack of coherence in this writeup, but I was at the back of the hall, the mike wasn't turned up enough, and there were accents to drill through.
In the afternoon, we move into full session mode. There are two tracks, and I can only cover one, but that's what I have Peco and Robert around for! Well, that and to have someone to outdrink. (Ooo burn!) They'll be posting their writeups at some point as well - you can go to the Velocity schedule page to see the other sessions and to the presentations page to get slides where they exist.
First afternoon session: My panel! I am on the "Measuring Performance" panel with Steve Souders, Ryan Breen of Gomez, Bill Scott of Netflix, and Scott Ruthfield from whitepages.com (a fellow Rice U/Lovetteer!) It went well. We talked about end user performance monitoring, all the other kinds of tools you can use and their drawbacks, and about "newfangled" monitoring of perf w/AJAX, SOA, RIAs, etc. No questions; not sure if the audience liked it or not. But I did get a number of people saying "good work" later so I'll declare victory. 🙂
"Actionable Logging for Smoother Operation and Faster Recovery," by Mandi Walls of AOL. It's a quick 30 minute session. Logging should be actionable - concise, express symptoms. Anything logged is something fixable. It should be giving you less downtime - shorter time to resolution. Logging takes resources, so make it worth it.
Filter down your logs to be concise and actionable. Production logging has different goals from dev/QA logging. You're looking for problem diagnosis and recovery, and then statistics and monitoring. Insight into what the app's doing.
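One way to act on "production logging has different goals from dev/QA logging" is to configure the two environments differently. Here's a minimal sketch using Python's standard `logging` module; the logger name `app`, the environment names, and the WARNING cutoff for production are my assumptions, not recommendations from the talk:

```python
import logging

def configure_logging(env: str) -> logging.Logger:
    """Return the 'app' logger set up for the given environment."""
    logger = logging.getLogger("app")
    logger.handlers.clear()  # avoid stacking handlers if called more than once
    handler = logging.StreamHandler()
    if env == "production":
        # Production: only what an operator can act on, one concise line per event.
        logger.setLevel(logging.WARNING)
        handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    else:
        # Dev/QA: verbose output with source location, for debugging.
        logger.setLevel(logging.DEBUG)
        handler.setFormatter(logging.Formatter(
            "%(asctime)s %(levelname)s %(name)s %(filename)s:%(lineno)d %(message)s"))
    logger.addHandler(handler)
    return logger

log = configure_logging("production")
log.debug("cache lookup for key user:42")  # suppressed in production
log.warning("DB connection pool exhausted (50/50 in use); requests are queuing")
```

Note that the warning names a symptom an operator can fix (the pool is full), which is the "actionable" part; the level filter only controls volume.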
You need a standard log file location. On our UNIX servers, the UNIX team gives us "/opt/apps" as the place where we can put stuff and gets cranky about any files outside of that. We make everyone log to one place - /opt/apps/logs/<appname> for this reason. Makes it easy to manage disk space, rotate logs, run "find"s, etc.
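Here's a sketch of the one-place convention, again with Python's standard `logging` module. The /opt/apps/logs/<appname> layout is ours; the `app_log_handler` name, the `<appname>.log` file name, and the 10 MB / 5-backup rotation settings are arbitrary example values:

```python
import logging
import logging.handlers
import os

LOG_ROOT = "/opt/apps/logs"  # the one directory every app logs under

def app_log_handler(appname: str, root: str = LOG_ROOT) -> logging.Handler:
    """Return a rotating handler that writes to <root>/<appname>/<appname>.log."""
    log_dir = os.path.join(root, appname)
    os.makedirs(log_dir, exist_ok=True)
    return logging.handlers.RotatingFileHandler(
        os.path.join(log_dir, f"{appname}.log"),
        maxBytes=10 * 1024 * 1024,  # rotate at 10 MB (example value)
        backupCount=5,              # keep five rotated files (example value)
    )
```

Because every app's logs live under one root, disk-space checks and cleanup can be a single `find /opt/apps/logs ...` instead of a hunt across the filesystem.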
Just two more keynotes till lunch, but these are larger ones (the previous speakers were 15 minutes apiece; these are 45). I'll try to take good notes; every conference always says they're going to make all the slides available afterwards but at best they usually get a 50% success rate on that.
First, Luiz Barroso from Google speaks on energy-efficient operations. Server usage is only about 1% of total electricity consumption, but it doubled between 2000 and 2005. Measuring computing energy efficiency is harder than measuring the efficiency of a refrigerator or similar appliance. In physics terms, efficiency is work done divided by energy used. Efficiency for IT can be broken down into computing efficiency (work done/chip energy), server efficiency (chip energy/server energy) and server room efficiency (server energy/server room energy). Surveys show an average PUE (1/server room efficiency) of 1.83, and power supplies dissipate 25% of the power going to servers uselessly, more in PCs. Servers have poor (computing) energy efficiency in their most common usage range.
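The three-way breakdown is a telescoping product, which makes it clear that the factors multiply (the notation below is mine, not from his slides):

```latex
\underbrace{\frac{\text{work done}}{\text{server room energy}}}_{\text{overall efficiency}}
= \underbrace{\frac{\text{work done}}{\text{chip energy}}}_{\text{computing}}
\times \underbrace{\frac{\text{chip energy}}{\text{server energy}}}_{\text{server}}
\times \underbrace{\frac{\text{server energy}}{\text{server room energy}}}_{\text{server room} \,=\, 1/\mathrm{PUE}}
```

With the survey's average PUE of 1.83, the server room factor alone is 1/1.83, or about 0.55, so roughly 45% of the energy entering the room goes to something other than the servers before any of the other losses are counted.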
How do we address this? First, the power provisioning problem in the data center. Energy isn't the largest cost - building the center itself takes $10-$22 per watt, but the 10-year power cost is $9/watt. Efficiency saves on both. According to the Uptime Institute, the average cost breakdown is datacenter - 28%, electricity - 22%, hardware - 50%. (Software dwarfs this in many shops, I'll note.)
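To put the per-watt figures side by side: over ten years, power is a large but minority share of build-plus-power cost at either end of the construction range. A quick check using only the numbers from the talk (treating the two per-watt costs as directly additive is my simplification, and `power_share` is just an illustrative name):

```python
def power_share(build_cost_per_watt: float, ten_year_power_per_watt: float = 9.0) -> float:
    """Fraction of (build cost + 10-year power cost) per watt that is power."""
    return ten_year_power_per_watt / (build_cost_per_watt + ten_year_power_per_watt)

for build in (10.0, 22.0):
    print(f"${build:.0f}/W build: power is {power_share(build):.0%} of the total")
# $10/W build: power is 47% of the total
# $22/W build: power is 29% of the total
```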
I'm starting out the first year of Velocity, the new O'Reilly-sponsored Web Performance and Operations Conference, watching robots dance to Beck on a video screen. The conference's tagline is "fast, scalable, resilient, available," which is just about identical to our Web Systems team's charter. (And our reputation with the ladies!)
For a long time, we've had to bottom-feed off of developer conferences, general interest conferences, etc. to address Web site operational issues; it's great to see a conference specifically targeted at this growing area. The conference staff noted that the demand was way above what was expected, and were scurrying about to ensure they had enough materials. By rough headcount in the first keynote I'd estimate 400 attendees, with more arriving over time as West Coast standard wakeup time (10 AM, for the record) comes along.