<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Web Admin Blog &#187; performance</title>
	<atom:link href="http://www.webadminblog.com/index.php/tag/performance/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.webadminblog.com</link>
	<description>Real Web Admins.  Real World Experience.</description>
	<lastBuildDate>Wed, 25 May 2011 03:02:28 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Upcoming Free Velocity WebOps Web Conference</title>
		<link>http://www.webadminblog.com/index.php/2010/03/11/upcoming-free-velocity-webops-web-conference/</link>
		<comments>http://www.webadminblog.com/index.php/2010/03/11/upcoming-free-velocity-webops-web-conference/#comments</comments>
		<pubDate>Thu, 11 Mar 2010 14:34:49 +0000</pubDate>
		<dc:creator>Ernest</dc:creator>
				<category><![CDATA[Application Performance Management]]></category>
		<category><![CDATA[Conferences]]></category>
		<category><![CDATA[Operations]]></category>
		<category><![CDATA[ops]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[velocity]]></category>

		<guid isPermaLink="false">http://www.webadminblog.com/?p=421</guid>
		<description><![CDATA[O'Reilly's Velocity conference is the only generalized Web ops and performance conference out there.  We really like it; you can go to various other conferences and have 10-20% of the content useful to you as a Web Admin, or you can go here and have most of it be relevant! They've been doing some interim [...]]]></description>
			<content:encoded><![CDATA[<p>O'Reilly's <a href="http://en.oreilly.com/velocity2010">Velocity conference</a> is the only generalized Web ops and performance conference out there.  We really like it; you can go to various other conferences and have 10-20% of the content useful to you as a Web Admin, or you can go here and have most of it be relevant!</p>
<p>They've been doing some interim freebie Web conferences and there's one coming up.  Check it out.  They'll be talking about performance functionality in Google Webmaster Tools, mySQL, Show Slow, provisioning tools, and dynaTrace's new AJAX performance analysis tool.</p>
<p><a href="http://conferences.oreilly.com/velocityonline">O'Reilly Velocity Online Conference: "Speed and Stability"</a><br />
Thursday, March 17; 9:00am PST<br />
Cost: Free</p>
]]></content:encoded>
			<wfw:commentRss>http://www.webadminblog.com/index.php/2010/03/11/upcoming-free-velocity-webops-web-conference/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Optimizing Web Performance with AOL Pagetest</title>
		<link>http://www.webadminblog.com/index.php/2008/07/11/optimizing-web-performance-with-aol-pagetest/</link>
		<comments>http://www.webadminblog.com/index.php/2008/07/11/optimizing-web-performance-with-aol-pagetest/#comments</comments>
		<pubDate>Fri, 11 Jul 2008 20:51:58 +0000</pubDate>
		<dc:creator>Ernest</dc:creator>
				<category><![CDATA[Application Performance Management]]></category>
		<category><![CDATA[performance]]></category>

		<guid isPermaLink="false">http://webadminblog.com/?p=28</guid>
		<description><![CDATA[Dave Artz has put together a simple Webcast tutorial on how to use webpagetest.org to measure and fix up your Web site.  If all this talk about Web performance is a bit overwhelming, it's a great novice tutorial.  He walks through the entire process visually and explains each metric.  Great job Dave!]]></description>
			<content:encoded><![CDATA[<p>Dave Artz has put together a <a href="http://www.artzstudio.com/2008/07/optimizing-web-performance-with-aol-pagetest/" target="_blank">simple Webcast tutorial</a> on how to use <a href="http://test.webpagetest.org:8080/" target="_blank">webpagetest.org</a> to measure and fix up your Web site.  If all this talk about Web performance is a bit overwhelming, it's a great novice tutorial.  He walks through the entire process visually and explains each metric.  Great job Dave!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.webadminblog.com/index.php/2008/07/11/optimizing-web-performance-with-aol-pagetest/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>The Velocity 2008 Conference Experience &#8211; Part VII</title>
		<link>http://www.webadminblog.com/index.php/2008/06/24/the-velocity-2008-conference-experience-part-vii/</link>
		<comments>http://www.webadminblog.com/index.php/2008/06/24/the-velocity-2008-conference-experience-part-vii/#comments</comments>
		<pubDate>Wed, 25 Jun 2008 00:28:48 +0000</pubDate>
		<dc:creator>Ernest</dc:creator>
				<category><![CDATA[Application Performance Management]]></category>
		<category><![CDATA[Automation]]></category>
		<category><![CDATA[Conferences]]></category>
		<category><![CDATA[Velocity 2008]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[velocity]]></category>
		<category><![CDATA[velocity08]]></category>
		<category><![CDATA[velocityconf08]]></category>

		<guid isPermaLink="false">http://webadminblog.com/?p=24</guid>
		<description><![CDATA[We've reached the last couple sessions at Velocity 2008. Read me! Love me! We hear about Capacity Planning with John Allspaw of Flickr. He says: No benchmarks! Use real production data. (How? We had to develop a program called WebReplay to do this because no one had anything. We're open sourcing it soon, stay tuned.) [...]]]></description>
			<content:encoded><![CDATA[<p>We've reached the last couple sessions at Velocity 2008.  Read me!  Love me!</p>
<p>We hear about <strong><a href="http://en.oreilly.com/velocity2008/public/schedule/detail/3208" target="_blank">Capacity Planning</a> </strong>with John Allspaw of Flickr.  He says: No benchmarks!  Use real production data.   (How?   We had to develop a program called WebReplay to do this because no one had anything.  We're open sourcing it soon, stay tuned.)</p>
<p>Use "safety factors" (from traditional engineering).  Aka a reserve, overhead, etc.</p>
<p>They use squid a bunch.  At NI we've been looking at Oracle's WebCache - mainly because it supports ESIs and we're thinking that may be a good way to go.  There's a half assed ESI plugin for squid but we hear it doesn't work; apparently Zope paid for ESI support in squid 3.0 but no traction on that in 4 years best as we can tell.  But I'd be happy not to spend the money.</p>
<p><span id="more-24"></span></p>
<p>Anyway.  You should do forecasting.  Assuming it's linear which it never is.  But you can take output from your stuff (ganglia, whatever) and <a href="http://www.unipress.waw.pl/fityk/" target="_blank">fityk</a> can give you a curve fit.</p>
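<p>To make the forecasting idea concrete, here's a rough sketch of a straight-line fit in plain Python. This is not fityk, just ordinary least squares; the function names and the sample numbers are made up for illustration, and real traffic will rarely be this linear.</p>

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def day_ceiling_reached(xs, ys, ceiling):
    """Day index where the fitted line hits the ceiling, or None if usage is flat or falling."""
    slope, intercept = fit_line(xs, ys)
    if not slope > 0:
        return None
    return (ceiling - intercept) / slope


# Hypothetical data: day number vs. peak requests/sec on one tier.
days = [0, 1, 2, 3, 4]
peak_rps = [100, 110, 120, 130, 140]
print(day_ceiling_reached(days, peak_rps, ceiling=200))
```

<p>With growth of 10 req/s per day from a base of 100, the fitted line reaches a ceiling of 200 on day 10. Swap in your own exported numbers and a better curve when the line stops fitting.</p>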
<p>They use lots of <a href="http://www.nagios.org/" target="_blank">nagios</a> for monitoring - about 10 checks per host.</p>
<p>Determine a ceiling, and high/low water marks, and alert outside the water marks.</p>
<p>Then have a simple capacity dashboard for everything - how close to ceiling you are.</p>
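<p>A minimal sketch of the ceiling and water mark idea, assuming one numeric metric per tier. The thresholds, status labels, and function names here are hypothetical, not anything Flickr showed:</p>

```python
def capacity_status(value, low_water, high_water, ceiling):
    """Classify a metric reading against water marks and a hard ceiling."""
    if value >= ceiling:
        return "at-ceiling"
    if value > high_water:
        return "above-high-water"
    if low_water > value:
        return "below-low-water"
    return "ok"


def percent_of_ceiling(value, ceiling):
    """Dashboard number: how close to the ceiling we are, as a percentage."""
    return 100.0 * value / ceiling


# Hypothetical tier: ceiling of 400 req/s, water marks at 50 and 300.
for reading in (20, 180, 320, 410):
    print(reading, capacity_status(reading, 50, 300, 400), percent_of_ceiling(reading, 400))
```

<p>Anything other than "ok" is what you alert on; the percentage is what goes on the dashboard.</p>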
<p>Horizontal scaling is all well and good, but sometimes you should do some vertical by upgrading.  He calls it "diagonal."  By upgrading image proc servers, they got same CPU usage but 3x more work out of them.  (We saw the same when we upgraded our Java app servers from Sun V440s to Dell 2850s a year or two ago - 50% performance improvement.)  In their case, they also got faster processing time, less power usage, and less rack space.</p>
<p>Memcached.  You turn it on, and the DBs go idle!   Yay.  But then your Web servers heat up as they become the bottleneck.   So beware the wandering bottleneck.</p>
<p>Stupid Capacity Tricks!   Before Puppet and Capistrano there was dsh (distributed shell).  Ooo, I want it.</p>
<p>Shut Shit Off - they have software switches to disable various features when needed.  (We have a lot of those switches at NI, but they're not documented and under the control of business units not ops - sad.)  Their programmers are good, they put flags in config files in order of importance to turn things on/off, read on the fly.</p>
<p>Host an outage page NOT in your datacenter, and use it - users appreciate knowing what's up.</p>
<p>Bake dynamic into static.  Some Yahoo! properties have a big red button to bake/unbake at will.  Bye to DDoS attacks.</p>
<p>And at the end, a plaintive "We're Hiring..."  Like everyone else here.  Man I need some good Web ops people.  I have two open spots.  We're hiring too!!!</p>
<p>Question: You do lots of mini-code pushes (20/day).  How the heck do you manage that and keep the site up?   He says - culture is the biggest thing.  They have devs that think like ops and don't do retarded things.  They're ganglia addicted and they're the ones hitting the big red buttons.  Then less important are some technical parts, like a one button deploy and verbose logging of changes.</p>
<p>He uses more dirty words than I do.  Boss.</p>
<p>Artur Bergman of Wikia speaks again, on  <strong>Squid vs Varnish</strong>.</p>
<p>PHP is a pig and wikitext is hard to parse, so they need caching.  A hit is 8 ms and a miss is 200 ms, and they have a 75% hit rate.  You have to get the cache hits up, by making more cachable.  Ooo, they're playing around with ESI he says!</p>
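<p>To see why the hit rate matters so much, here's the back-of-the-envelope math as a small sketch. The 8 ms, 200 ms, and 75% figures are from the talk; the 90% case and the function name are my own assumptions.</p>

```python
def expected_latency_ms(hit_rate, hit_ms, miss_ms):
    """Average response time given a cache hit rate between 0 and 1."""
    return hit_rate * hit_ms + (1.0 - hit_rate) * miss_ms


# Figures from the talk: 8 ms hit, 200 ms miss, 75% hit rate.
print(expected_latency_ms(0.75, 8, 200))  # 56.0
# Hypothetical improvement to a 90% hit rate.
print(expected_latency_ms(0.90, 8, 200))
```

<p>At 75% the average is 56 ms, and nearly all of that comes from the misses, so every point of hit rate you gain takes roughly 2 ms off the average.</p>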
<p>They decided to force caching for anonymous users.  They've only gone up to 30 seconds, but no complaints.  They ignore if-modified-since and purge. Be careful about vary-accept on encoding because there's an annoying browser bug with misplaced commas.</p>
<p>Mediawiki lives and dies by squid and puts cache control in the code, which is bad because developers are stupid.</p>
<p>Squid - the slide actually says "Me hates it" and "Still a piece of shit."  Awesome.</p>
<p>Varnish - he loves varnish.  He nearly cried when he read the source code (C).  But it's a little unstable.  He got it up to 65k hps (squid doing 2800).</p>
<p>Varnish has some "novel" techniques.  Its control language is VCL.  (Side note, they monitor with lvs).  This gets compiled down to C at runtime.  So you can put assembly in if you want.  Lawdy.  It segfaults from time to time under load and they're helping fix it.  In a month or two he'll have it crackin'!</p>
<p>And the last one - <strong>Puppet,</strong> by Luke Kanies, Puppet developer.</p>
<p>Automation tools are old and bad, and especially because they're SSH based.  (Agree!)  And also because there's not many people who cross the chasm between sysadmin and developer.  They decided they had to solve the problem and create something a billion times better than anything (where anything is cfengine).  Either you can manage many machines with little effort, or you can't.  You want to be able to.  So this required abstraction.  He's using the analogy of C scaring the bejeezus out of assembly programmers - a good analogy.</p>
<p>It's sad you have to do it, but he goes into why a more powerful tool should not scare people and put them all out of work.  Developers seem to have gotten over this but not sysadmins.  It's stupid especially because "we're understaffed" is the #1 thing I hear out of all of ours.</p>
<p>So they implemented Puppet with the metaphor of resources and resource providers, hiding all the file/command/UNIX admin stuff.  (Well, kinda.)  It's easily extensible.</p>
<p>The Web 2.0 crowd has made "microformats."  Your infrastructure can use that idea too.  Catch up with the times - if you're proud of doing something developers have been doing for 10 years (like moving to version control in Subversion) then you're behind.  (Use <a href="http://git.or.cz/" target="_blank">git</a>!)  Anyway, you have to use polymorphism (overloading) to make a system like Puppet understand ssh on system 1 vs 2 vs 3.</p>
<p>Also, have one solution per problem.  Not multiple.  And most of the problems you face are NOT unique to you or your organization - so using a common tool like this can benefit from the network effect.</p>
<p>And the third big principle (were there only two before?) is completeness.  Everything that matters in your config should be in the config.  Not some minimal set.  Relationships are important (dependencies).  You can do things like have a service subscribe to a file and restart when it changes, for example.</p>
<p>Puppet is mainly used as a central config management tool.  Each host gets a resource catalog.  Machines get put in classes and they get lists of resources.</p>
<p>Puppet clients retrieve their resource catalog, determine order, check em, fix, and repeat every 30 minutes.  "Like cfengine but sexier!"  The completeness approach means clean management through the lifecycle - a freshly kickstarted box doesn't end up different.  You just kickstart enough to run puppet and use it to do everything.  So all boxes are kept 100% up to date without artifacts.</p>
<p>And it has reporting underway too!  They're planning to charge for that to make some mooonay!  Google, Stanford, Sony, Rackspace all use Puppet.</p>
<p>Why Puppet vs Capistrano?  Cap is SSH in Ruby. Not something for your whole infrastructure.</p>
<p>Why Puppet vs cfengine?  More open dev community and better.</p>
<p>What about Puppet slowness?  It scales like HTTPS.</p>
<p>Puppet: Is XMLRPC but moving to REST.  Uses certs and SSL, not keypairs.  It's in Ruby.  He's had to learn to be a developer in the process.  It's also an API to the systems.  It supports VMs well and can get into the guts of the VMs unlike pure VM provisioning tools.  Buy me!  It's open source but he sells support/training/addons.  Discovery to come!  There's nagios integration of some sort.  Vertebra, like capistrano, is an ad hoc change tool - Puppet isn't (though you can use relsh for that).</p>
<p>That's the last session - wrapup later once I power up my laptop and get some booze in me!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.webadminblog.com/index.php/2008/06/24/the-velocity-2008-conference-experience-part-vii/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>The Velocity 2008 Conference Experience &#8211; Part IV</title>
		<link>http://www.webadminblog.com/index.php/2008/06/23/the-velocity-2008-conference-experience-part-iv/</link>
		<comments>http://www.webadminblog.com/index.php/2008/06/23/the-velocity-2008-conference-experience-part-iv/#comments</comments>
		<pubDate>Tue, 24 Jun 2008 04:55:28 +0000</pubDate>
		<dc:creator>Ernest</dc:creator>
				<category><![CDATA[Application Performance Management]]></category>
		<category><![CDATA[Conferences]]></category>
		<category><![CDATA[Velocity 2008]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[velocity]]></category>
		<category><![CDATA[velocity08]]></category>
		<category><![CDATA[velocityconf08]]></category>

		<guid isPermaLink="false">http://webadminblog.com/?p=20</guid>
		<description><![CDATA[OK, now we're to the final stretch of presentations for Day One. "Cadillac or Nascar: A Non-Religious Investigation of Modern Web Technologies," by Akara and Shanti from Sun. Web20kit is a new reference architecture from Sun to evaluate modern Web technologies. It's implemented in PHP, JavaEE, and Ruby. It'll be open sourced in the fall. [...]]]></description>
			<content:encoded><![CDATA[<p>OK, now we're to the final stretch of presentations for Day One.</p>
<p>"<a href="http://en.oreilly.com/velocity2008/public/schedule/detail/21978/public/schedule/detail/2197" target="_blank"><strong>Cadillac or Nascar: A Non-Religious Investigation of Modern Web Technologies</strong></a>," by Akara and Shanti from Sun.</p>
<p>Web20kit is a new reference architecture from Sun to evaluate modern Web technologies. It's implemented in PHP, JavaEE, and Ruby. It'll be open sourced in the fall.</p>
<p>It uses a web/app server - apache, glassfish, and mongrel - with a cache (memcached), a db (mySQL), an object store (NFS/MogileFS), a driver, and a geocoder. The sample app is a social event calendar with a good bit of AJAX frontend.</p>
<p>I apologize for any lack of coherence in this writeup, but I was at the back of the hall, the mike wasn't turned up enough, and there were accents to drill through.</p>
<p><span id="more-20"></span></p>
<p>The Java EE implementation used servlets, JSPs, JPA, the Whalin memcached client, and localFS/NFS/distributed FS on GlassFish. PHP used UnixODBC/PDO, the PECL memcache client, and same storage on apache/lighttpd. The Rails framework isn't using memcached or distributed FS yet. The AJAX used was JSON. (I have no idea what JAMM or AMMP are, I think he's making them up.)</p>
<p>Admittedly, PHP is more of a language, RoR more of a framework, with Java in between, with increasing amounts of object-relational mapping provided, with the obvious pros &amp; cons of any abstraction layer.</p>
<p>Damn, it's hot in here. And packed. And I can't understand half of what this guy is saying.</p>
<p>The lady comes up for the testing results. Ah, louder and more legible. For PHP, throughput vs users grows linearly, but the network usage outstrips CPU usage; it's the bottleneck. For Java, scaling is linear with a single process. The Java Persistence API eased development with its O/R mapping and built in caching. Rails is underway, but prelim results are that thin is better than mongrel; JRuby is better than Ruby, and on Solaris Ruby in Cool Stack 1.3 (Sun compiled open source) gives 40% improvement.</p>
<p>There are some memcached results in pretty graphs but I'm not clear what they mean. In fact, larger memcacheds were faster. There are performance issues in the client libraries - memcached server is good but clients don't scale. For Java, Whalin and Spy both suck but Spy might be better. In PHP, the PECL client is most common but it's unstable so people roll their own.</p>
<p>mySQL scaling is good and 5.1 is way ass better than 5.0. As in 75% CPU usage reduction.</p>
<p>Apache/PHP Tuning Tips - tune TCP time_wait, don't load apache modules you don't need, tune listenbacklog (8092), serverlimit (2048), maxclients (2048), PHP - turn off safe mode and increase realpath_cache_size if you have lots of files.</p>
<p>Glassfish tuning - heap up to 3 GB for 32bit, GC use parallel, increase http threads to 128, for a JPA provider use eclipselink not toplink, and run your web container in production mode (you have to redeploy when you change this).</p>
<p>Memcached tuning - ensure network processing is distributed across CPUs. Bind memcached to CPUs not processing interrupts. Run memcached 1.2.5 with 4 threads and use in 64-bit mode for large cache sizes (preferential to more memcached procs). Needs horizontal scaling.</p>
<p>mySQL Tuning - tune your queries. Use joins over subqueries (unlike oracle), use limits. Innodb - avoid too frequent cache flushes (innodb_flush_log_at_trx_commit = 2). Use separate read/write dbs to avoid trashing your query cache.</p>
<p>Conclusion - network is starting to be a bottleneck at 1 Gb. Use link aggregation to get around till 10 Gb is there.</p>
<p>Also note Faban, an open source benchmark development kit.</p>
<p>More in the Sun blogs.</p>
<p>I'm a little dubious about this whole thing - not sure what the goal is.  A lot of performance has to do with how you code things, so I'm not sure you can do a fair compare between languages/frameworks by just coding a similar app in all of them...  I think it's more useful to compare sub-parts (so not Rails vs PHP, but Whalin vs Spy.)</p>
<p><a href="http://en.oreilly.com/velocity2008/public/schedule/detail/3632" target="_blank"><strong>Improving Netflix Performance</strong></a>, with Bill Scott. (Yes, from Netflix.) Rather than do a "big bang" they put a good measurement framework in place and did incremental improvements. There's a bunch of specific points you can measure, from the unload() of the previous page, to other events, to when something appears... They put these together into a bunch of measurement intervals. The slides are very interesting here.</p>
<p>They also correct for server vs client clocks. They segment by various metrics - browser, bandwidth, etc. They made a firebug panel to show these values.</p>
<p>Then they did analysis. They made some changes that "should" have been good, like changing images to css sprites, and that degraded performance due to old event handlers - they moved those and then it was good.  The lesson is test and verify.</p>
<p>Gzip was a win, 13-25% user perf improvement and halved outbound network traffic. Scared the crap out of their network team as a result, heh.</p>
<p>They refactored the Netflix queue with mixed results. He has an interesting graph on browser speed - Safari fastest, IE slowest.</p>
<p>In conclusion - use YSlow optimizations but test them!</p>
<p>Next, it's back to back new browser wars. Mike Connor from Mozilla about Firefox 3 and Christian Stockwell from Microsoft on IE8!</p>
<p><strong>Firefox3</strong>! They put in a lot of performance enhancements (and "human performance" enhancements). Goals of 3: safer, faster, better. Enforces plugin security more. Faster JS execution. Awesomebar, aka the new location bar with typeahead, learning, search. Download manager. Poor guy's nervous. OK, I'm bored.</p>
<p><a href="http://en.oreilly.com/velocity2008/public/schedule/detail/3290" target="_blank"><strong>IE8</strong></a>! Navigating to the top 100 sites in IE8 shows that most of the work is done in layout and rendering (70%) - less so in marshalling, DOM, and JScript, and very little in CSS and HTML. So they couldn't just make "the JScript engine" or "the HTML render" faster. So they did work on the JScript engine, but also unblocked script downloads, increased the connection limit, reduced marshalling costs, and decreased memory usage. Tried to fix "known bad" issues like 1x1 transparent pngs and hover effects. Also, they have dev tools included in Beta 1 (unclear what this means). "Performance Analyzer Tools" as part of the SDK will give you the time-spent breakdown on your own site!</p>
<p>OK, at this point I am giving up on the small, hot room with bad sound and going back to the big ballroom. Which is fine, because it's time for the <a href="http://en.oreilly.com/velocity2008/public/schedule/detail/2401" target="_blank"><strong>Performance Metrics Panel</strong></a>!</p>
<p>John Rauser is moderating, with Peter Sevcik (NetForecast), Eric Goldsmith (AOL), Eric Schurman (Microsoft), and Vik Chaudary (Keynote).</p>
<p>What metrics are best for end user experience? Use percentiles. The distribution of performance times is not a normal distribution - it has a long tail. Use median rather than mean in all cases. You need to see and capture the tail. How do you settle on a specific metric? Some are using a metric that's a "munged together" combo of the percentiles, because you don't want to miss effects - like by doing something that benefits the 99th percentile time kinds of users but hoses the 20th percentile types of users.</p>
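<p>Here's a quick sketch of why the mean misleads on a long-tailed distribution. The load times are invented for illustration, and the percentile uses the simple nearest-rank method, which is only one of several common definitions:</p>

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list (pct between 0 and 100)."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100.0))
    return ordered[rank - 1]


# Hypothetical page load times in seconds, with a long tail.
load_times = [0.8, 0.9, 1.0, 1.0, 1.1, 1.2, 1.3, 1.5, 6.0, 15.0]
mean = sum(load_times) / len(load_times)
print("mean", round(mean, 2))
print("p50", percentile(load_times, 50))
print("p90", percentile(load_times, 90))
print("p99", percentile(load_times, 99))
```

<p>Two slow samples drag the mean to nearly 3 seconds even though the typical user sees about 1.1; the p90 and p99 are what show you the tail.</p>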
<p>Some slides on Apdex, the application performance index (see apdex.org). You bucketize user experiences into those in good time T that are satisfied, then classify up to 4T as tolerating. Beyond that is unacceptable. Apdex = (satisfied + (tolerating/2)) / total samples. But is this too simplistic? It's certainly easy to calculate. But it isn't very sensitive to changes due to the bucketizing. Natty shirt guy (unclear which guy he is) prefers the munged percentiles to something as simplistic as Apdex. Oh, it's Eric from Microsloth.</p>
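<p>The Apdex formula above is simple enough to sketch in a few lines. The sample times and T = 2 are hypothetical, and I'm assuming samples exactly at T or 4T fall in the better bucket:</p>

```python
def apdex(samples, t):
    """Apdex score for a list of response times, given the target threshold t."""
    satisfied = sum(1 for s in samples if t >= s)
    tolerating = sum(1 for s in samples if s > t and 4 * t >= s)
    return (satisfied + tolerating / 2.0) / len(samples)


# Hypothetical response times in seconds with T = 2.
times = [0.5, 1.2, 1.9, 2.5, 3.0, 7.9, 8.5, 12.0]
print(apdex(times, 2.0))  # 0.5625
```

<p>Three satisfied, three tolerating, and two beyond 4T gives (3 + 1.5) / 8. Note that a 2.5 second sample and a 7.9 second sample count exactly the same, which is the insensitivity complaint in a nutshell.</p>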
<p>My personal feel is that one number's not enough. Even our current NI SLA "2 seconds, global" is getting too simplistic. Apdex seems like more of a management "number to keep them quiet," and that's how they're describing it too. Although Apdex guy makes a good point that the number's portable across companies so you can use it to have discussions outside with advertisers etc. - but that begs the question of "T tampering."  Hell, SPEC benchmarks are BS and they're a lot more rigorous.</p>
<p>There's an entertaining discussion of the old "8 second rule" (useful standard! hogwash!) and the newer "2 second rule" (useful standard!  hogwash!). There's comments that "but lots of people don't hit 2s, and Amazon didn't used to hit 8s, so it must not be true" but I'm not sure that's relevant.</p>
<p>Anyway, being an EE and having a token amount of statistics and visualization experience this whole discussion makes me sad. Peco leans over to me and says "If Tufte were here he'd slap these guys." (Referring to <a href="http://en.wikipedia.org/wiki/Edward_tufte" target="_blank">Edward Tufte</a>, author of <a href="http://www.edwardtufte.com/tufte/books_vdqi" target="_blank">The Visual Display of Quantitative Information</a> - and he would.)  Man, we need just one decent statistician and visualization guy to come to the Web performance world and set everybody right.  Whenever I see something other than a simplistic line chart in the Web world I get a chubby.  (<a href="http://www.opnet.com/solutions/application_performance/panorama_qa.html" target="_blank">Opnet Panorama</a>, with its deviations and histograms, is about as good as it gets for ITers.)</p>
<p>The Keynote guy says there's an iGoogle index widget that'll show your numbers. He's showing a nice Google Maps mashup too... I'm not sure where you go to see this.</p>
<p>Question: What's the relationship between performance and availability?  Should you commingle them? Well, everyone has some kind of timeout... Turning poor performance into an availability hit. Yeah, we had a problem with that once, Keynote changed their cutoff and we were in a frenzy of trying to figure out why the hell our perf numbers changed.  There's discussion of "well does it really matter, do ops people really care..." Hell yeah we do. Every tool we have has a timeout that hits availability. Whether you combo the metrics or not, the two are related - it's dangerous to hide the relation.</p>
<p>Question: Metrics for active vs passive. Passive gives you breadth and you can find classes of problem synthetic can't give you. For some reason everyone's talking Jiffy and probes rather than network RUM. Though our network RUM is causing us to tear our hair out right now (long story) I'm not sure JavaScript beacons are safe/accurate.</p>
<p>Question (Peco!): What about errors? Well, 404s don't count towards performance, 500s do.  Not really satisfying...  Shouldn't things like 404 be somehow factored in?</p>
<p>I hear there's an open bar. I'm off like a prom dress!</p>
<p>After some booze and a trip to the In-N-Out burger (My first! Double double, animal style!) was the traditional O'Reilly Ignite superfast presentation session.</p>
<ul>
<li>Animoto scaling from 8 to 3500 servers in 3.5 days using <a href="http://www.rightscale.com/m/" target="_blank">RightScale</a>, an Amazon EC2 provisioning manager.</li>
<li>Porn scaling!  Gamelink does adult video hosting.  Streaming servers are hard to cache.  Do it locally, doing it on EC2/S3 etc. ends up costing a lot for network xfer.  Moving a TB per week - how do you get it there?  Metro Ethernet in their case.  And for storage - SANs too expensive, etc.  And CDNs shy from adult content.  Windows Media but moving to Flash, but that's copyable.  Network all gigabit backplane.  Video streams reset when a stream fails.  See also <a href="http://www.retina.net/tech">www.retina.net/tech</a>.</li>
<li>Freebase, a free creative commons database.  Infochimps.org, Project Gutenberg, etc.  Open data!  Public.resource.org.</li>
<li>Buy SAN, make profit.  whitepages.com does 15MM searches/day.  Nasty data, but read only, and little caching.  Database sprawl threatened.  Wanted to move to a SAN but heard expensive, tricky.  iSCSI?  They used <a href="http://www.equallogic.com/" target="_blank">EqualLogic</a> iSCSI.  Cheap and fast.  55% TCO win.  Snapshots, replication.</li>
<li>John Bryce of <a href="http://www.mosso.com/" target="_blank">Mosso</a> about rewriting the plane in flight.  Management web app customers use to create stuff in a cloud.  To a distributed provisioning system.  Staff was increasing faster than functionality.  Planned for new provisioning then new panel then new features.  Fix things as they come up (don't keep high interest technical debt).  Estimating a complete overhaul is hard.  Refactor in each release.  Release in parallel/beta sites.  Fight for your users.</li>
<li><a href="http://merbivore.com/" target="_blank">merb</a>.  Rails is easy but sucks scalability-wise.  Merb is easy and better for enterprise.  Modular, easily tested, stable interface.  Very similar to rails...</li>
<li><a href="http://www.slideshare.net/dmc500hats/startup-metrics-for-pirates-long-version" target="_blank">Startup Metrics for Pirates</a>!  Focus on a small set of good conversion metrics. Big ass preso - check it out at the Slideshare link provided.   Web 2.0 model - 1.  Drive traffic.  3.  Profit.</li>
<li>Jos from RIPE NCC, a regional internet registry.  ARIN for Europe/Russia.  IPv4 D-Day is in a couple years.  IPv6 is not coming in time.  But it needs to.  Get your IPv6 shit together.  No first mover incentive, so need to create demand.  Like free porn on IPv6.  IPv6experiment.com!</li>
</ul>
<p>That was interesting, and I got to hobnob with my favorite CEO cutie from <a href="http://www.slideshare.com">slideshare.com</a>.  Now, to the Bat-Bed!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.webadminblog.com/index.php/2008/06/23/the-velocity-2008-conference-experience-part-iv/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Velocity 2008 Conference Experience &#8211; Part III</title>
		<link>http://www.webadminblog.com/index.php/2008/06/23/the-velocity-2008-conference-experience-part-iii/</link>
		<comments>http://www.webadminblog.com/index.php/2008/06/23/the-velocity-2008-conference-experience-part-iii/#comments</comments>
		<pubDate>Mon, 23 Jun 2008 23:27:02 +0000</pubDate>
		<dc:creator>Ernest</dc:creator>
				<category><![CDATA[Application Performance Management]]></category>
		<category><![CDATA[Conferences]]></category>
		<category><![CDATA[High Availability]]></category>
		<category><![CDATA[Velocity 2008]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[velocity]]></category>
		<category><![CDATA[velocity08]]></category>
		<category><![CDATA[velocityconf08]]></category>

		<guid isPermaLink="false">http://webadminblog.com/?p=19</guid>
		<description><![CDATA[In the afternoon, we move into full session mode.  There's two tracks, and I can only cover one, but that's what I have Peco and Robert around for!  Well, that and to have someone to outdrink.  (Ooo burn!)  They'll be posting their writeups at some point as well - you can go to the Velocity [...]]]></description>
			<content:encoded><![CDATA[<p>In the afternoon, we move into full session mode.  There's two tracks, and I can only cover one, but that's what I have Peco and Robert around for!  Well, that and to have someone to outdrink.  (Ooo burn!)  They'll be posting their writeups at some point as well - you can go to the Velocity <a href="http://en.oreilly.com/velocity2008/public/schedule/grid" target="_blank">schedule page</a> to see the other sessions and to the <a href="http://en.oreilly.com/velocity2008/public/schedule/proceedings" target="_blank">presentations page</a> to get slides where they exist.</p>
<p>First afternoon session: My panel! I am on the <strong>"<a href="http://en.oreilly.com/velocity2008/public/schedule/detail/3639" target="_blank">Measuring Performance</a>"</strong> panel with Steve Souders, Ryan Breen of Gomez, Bill Scott of Netflix, and Scott Ruthfield from whitepages.com (a fellow Rice U/Lovetteer!) It went well.  We talked about end user performance monitoring, all the other kinds of tools you can use and their drawbacks, and about "newfangled" monitoring of perf w/AJAX, SOA, RIAs, etc.  No questions; not sure if the audience liked it or not.  But I did get a number of people saying "good work" later so I'll declare victory. <img src='http://www.webadminblog.com/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </p>
<p><strong>"<a href="http://en.oreilly.com/velocity2008/public/schedule/detail/4386" target="_blank">Actionable Logging for Smoother Operation and Faster Recovery</a>,</strong>" by Mandi Walls of AOL.  It's a quick 30 minute session. Logging should be actionable - concise, express symptoms. Anything logged is something fixable. It should be giving you less downtime - shorter time to resolution. Logging takes resources, so make it worth it.</p>
<p>Filter down your logs to be concise and actionable. Production logging has different goals from dev/QA logging. You're looking for problem diagnosis and recovery, and then statistics and monitoring. Insight into what the app's doing.</p>
<p>You need a standard log file location. On our UNIX servers, the UNIX team gives us "/opt/apps" as the place where we can put stuff and gets cranky about any files outside of that. We make everyone log to one place - /opt/apps/logs/&lt;appname&gt; for this reason. Makes it easy to manage disk space, rotate logs, run "find"s, etc.</p>
<p><span id="more-19"></span></p>
<p>Roll your logs and have a standard file naming format. We prefer log.YYYYMMDD[HHMMSS] because it's then sorted in date order.</p>
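<p>To show why that naming format sorts cleanly, here's a minimal Python sketch. The function name and the sample roll times are made up for illustration; nothing here comes from the talk itself.</p>
<pre>
```python
from datetime import datetime

def rolled_log_name(rolled_at, with_time=True):
    """Build a rolled log file name such as log.20080623 or log.20080623140500."""
    pattern = "%Y%m%d%H%M%S" if with_time else "%Y%m%d"
    return "log." + rolled_at.strftime(pattern)

# Hypothetical roll times, deliberately listed out of order.
roll_times = [
    datetime(2008, 6, 23, 14, 5, 0),
    datetime(2007, 12, 31, 23, 59, 59),
    datetime(2008, 1, 2, 3, 4, 5),
]
names = [rolled_log_name(t) for t in roll_times]

# Fixed-width stamps with the most significant field first sort as plain
# strings into the same order as the timestamps themselves.
names_by_time = [rolled_log_name(t) for t in sorted(roll_times)]
print(sorted(names) == names_by_time)
```
</pre>
<p>The same property is what makes a plain "ls" or "find" listing come out in date order.</p>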
<p>You want standard, good timestamps, formats, etc. Ideally. Hard to do in practice, which is why at NI we use Splunk for log file management - it can detect/be told about different formats, timestamps, etc. and it'll do this for you. Have a standard, that's fine, but most 3p software and some of your programmers won't follow it.</p>
<p>Use log levels. Don't log too much or not enough, and standards for levels help with that. Log lines should be helpful - what program module? What were the variables at hand?</p>
<p>Don't log passwords, usernames, etc. Splunk has facilities to automatically suppress these by the way.  I don't own stock in them or anything, I'm just sayin'.</p>
<p>Logs are often the first line of information for troubleshooting, so the better they are, the more quickly you can recover.</p>
<p>My take on this session - all pretty basic, but solid. Logging 101.</p>
<p>Third session, another 30 minute quickie, is by Goranka Bjedov from Google, on <strong>stress, load, and performance testing in QA</strong>. She focuses on the back end, as opposed to Steve's client side focus. She analyzes scalability, bottlenecks, probable issues, etc. and feeds them to ops.</p>
<p>QA is not brain surgery, she says, and it should be expected for them to provide this kind of information. And you don't have to perfectly reproduce the production environment for it. You can learn 80% of it on a modest server under modest load. She totally eliminates the network, which "someone else should be looking at" (who?).</p>
<p>Tests aren't 100% reproducible. You have to go statistical - run the tests several times and see averages and deviation. She prefers <a href="http://jakarta.apache.org/jmeter/index.html" target="_blank">JMeter</a>, <a href="http://grinder.sourceforge.net/" target="_blank">The Grinder</a>, and <a href="http://funkload.nuxeo.org/" target="_blank">FunkLoad</a> - consider <a href="http://www.opensta.org/" target="_blank">OpenSTA</a> in Windows. She finds they are as good as LoadRunner etc. They use log replay, not sure with what tool.</p>
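<p>As a rough illustration of "going statistical," here's a small Python sketch that summarizes repeated runs of the same test. The numbers are invented, and this isn't tied to any of the tools she named; it only shows the average-and-deviation idea.</p>
<pre>
```python
import statistics

def summarize_runs(response_times_ms):
    """Return (mean, sample standard deviation) for repeated runs of one test."""
    mean = statistics.mean(response_times_ms)
    stdev = statistics.stdev(response_times_ms)
    return mean, stdev

# Made-up average response times (ms) from five runs of the same load test.
runs = [212.0, 198.0, 205.0, 240.0, 201.0]
mean, stdev = summarize_runs(runs)
print(f"mean={mean:.1f} ms, stdev={stdev:.1f} ms over {len(runs)} runs")
```
</pre>
<p>A large deviation relative to the mean is a hint that you need more runs before trusting a comparison between two builds.</p>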
<p>And that's it!  She <a href="http://googletesting.blogspot.com/2007/10/performance-testing.html" target="_blank">writes about performance on the Google blog</a>. I'll check it out!</p>
<p>This session needed slides - "performance testing is easy" and "use open source" aren't much to get out of one of these sessions.</p>
<p>Next, another longer 45 minute session - <strong>"<a href="http://en.oreilly.com/velocity2008/public/schedule/detail/1525" target="_blank">Incident Command for IT: What We Can Learn From The Fire Department</a></strong>," by Brent Chapman from Great Circle.</p>
<p>The core idea is that public safety agencies all deal with emergencies all the time. What are some best practices we can glean from them? They organize on the fly, coordinate efforts of multiple agencies, and evolve the organization as the incident progresses.</p>
<p>Example: a car hits a fire hydrant. You have fire, ambulance, police, water, power people all involved and in a specific order, and it's a time critical event. Another example is SoCal wildfires. Obvious IT analogies (data center outage...).</p>
<p>So an "Incident Command System" was developed to address questions like this. It's a set of standard tools for command, control, and coordination of incidents. Started in SoCal but has evolved into a national standard.</p>
<p>ICS recommends a modular, scalable org structure, consisting of command, ops, logistics, planning, and admin sections. Can be one person until more folks show up. Command section plans. Operations section does the work, and assists command in development of a consolidated action plan. It's usually the largest. Planning maintains status &amp; plans. Logistics section gets stuff. Admin/finance pays, tracks costs, etc. Sections are created/grown as needed.</p>
<p>The senior-most first responder is usually incident commander and transfer of command is explicit. Delegates work as necessary and possible.</p>
<p>Maintain a manageable span of control. Each supervisor should have 3-7 subordinates (5 ideal). New levels are created as needed.</p>
<p>Unity of command. In an incident each person has one boss, period. Matrixes have to be avoided in an emergency.</p>
<p>Transfers of responsibility are always explicit, and a more senior person arriving doesn't necessarily take over.</p>
<p>Clear communications. All comms have to be clear and complete (no code). Talk directly to resources when possible, traversing the tree to get to them (keeping management informed).</p>
<p>Consolidated action plans. Command communicates high level action plan per operational period (hour to shift to day to whatever). Write it down, especially if it crosses organizational or specialty boundaries.</p>
<p>Management by objective. Tell people what to accomplish, not how.</p>
<p>Comprehensive resource management. All assets &amp; personnel tracked via Admin section. Sign in and be assigned.</p>
<p>Designated incident facilities - a command post. And a staging area for resources.</p>
<p>Then he walks through a case study involving one of two data centers going offline. Hopefully the slides'll be available because this is a lot of typing. It's engaging though. We have tended to "roughly" follow this model in practice just by instinct - like I always make sure there's one person who "has the ball" during an incident (command). I think one of the biggest takeaways is to understand that as the first one on the scene you're mainly Command - and Ops, and Status - until you spin those off explicitly. Too many ops folks just do the ops and don't do command or status.</p>
<p>In closing, you should practice ICS and use it for planned events like moves/upgrades. Download preso from <a href="http://www.greatcircle.com">www.greatcircle.com</a>.</p>
<p>We do some things like that on our team.  I am disappointed that this is basically a "what if" preso, not something he's implemented in IT organizations...  Seems like more of an Ignite candidate.</p>
<p>Now to try to hunt down treats...  Apparently the Marriott staff brought some snacks out in the hall during the session, and quickly took them away before the break started.  Boo.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.webadminblog.com/index.php/2008/06/23/the-velocity-2008-conference-experience-part-iii/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>The Velocity 2008 Conference Experience &#8211; Part II</title>
		<link>http://www.webadminblog.com/index.php/2008/06/23/the-velocity-2008-conference-experience-part-ii/</link>
		<comments>http://www.webadminblog.com/index.php/2008/06/23/the-velocity-2008-conference-experience-part-ii/#comments</comments>
		<pubDate>Mon, 23 Jun 2008 21:06:11 +0000</pubDate>
		<dc:creator>Ernest</dc:creator>
				<category><![CDATA[Application Performance Management]]></category>
		<category><![CDATA[Conferences]]></category>
		<category><![CDATA[Green Computing]]></category>
		<category><![CDATA[Velocity 2008]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[velocity]]></category>
		<category><![CDATA[velocity08]]></category>
		<category><![CDATA[velocityconf08]]></category>

		<guid isPermaLink="false">http://webadminblog.com/?p=18</guid>
		<description><![CDATA[Just two more keynotes till lunch, but these are larger ones (the previous speakers were 15 minutes apiece; these are 45).  I'll try to take good notes; every conference always says they're going to make all the slides available afterwards but at best they usually get a 50% success rate on that. First, Luiz Barroso [...]]]></description>
			<content:encoded><![CDATA[<p>Just two more keynotes till lunch, but these are larger ones (the previous speakers were 15 minutes apiece; these are 45).  I'll try to take good notes; every conference always says they're going to make all the slides available afterwards but at best they usually get a 50% success rate on that.</p>
<p>First, Luiz Barroso from Google speaks on <strong>energy efficient operations</strong>. Now, server usage is only about 1% of total electricity consumption, but it doubled between 2000 and 2005.  Measuring computing energy efficiency is harder than measuring a refrigerator or the like.  Efficiency is defined as work done/energy used in physics terms. Efficiency for IT can be broken down into computing efficiency (work done/chip energy), server efficiency (chip energy/server energy) and server room efficiency (server energy/server room energy). Surveys show an average PUE (1/server room efficiency) of 1.83, and power supplies dissipate 25% of the power going to servers uselessly, more in PCs. Servers have poor (computing) energy efficiency in their most common usage range.</p>
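<p>To make the arithmetic concrete, here's a minimal Python sketch of that efficiency chain. The 1.83 PUE is the survey figure quoted above; the 0.75 server efficiency is my own assumption that treats the 25% power supply loss as the only loss inside the server, and the function names are just for illustration.</p>
<pre>
```python
def facility_share(pue):
    """Fraction of server room energy that actually reaches the servers (1 / PUE)."""
    return 1.0 / pue

def energy_at_chips(room_energy, pue, server_efficiency):
    """Energy reaching the chips, where server_efficiency = chip energy / server energy."""
    return room_energy * facility_share(pue) * server_efficiency

pue = 1.83                  # survey average quoted in the talk
server_efficiency = 0.75    # assumption: only the 25% power supply loss is counted

print(f"server room efficiency: {facility_share(pue):.3f}")
print(f"of 1000 units into the room, chips get: "
      f"{energy_at_chips(1000.0, pue, server_efficiency):.1f}")
```
</pre>
<p>Under those assumptions, well under half of the energy entering the room reaches the chips, before computing efficiency even enters the picture.</p>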
<p>How do we address this?  First, the power provisioning problem in the data center. Energy isn't the largest cost - building the center itself takes $10-$22 per watt, but the 10 year power cost is $9/watt.  Efficiency saves on both. According to the Uptime Institute, the average cost breakdown is datacenter - 28%, electricity - 22%, hardware - 50%. (Software dwarfs this in many shops, I'll note.)</p>
<p><span id="more-18"></span></p>
<p>To provision efficiently, consolidate to the minimum number of servers (duh). Also, measure power use, don't trust nameplates. Study trends and investigate oversubscription potential (ISCA '07 article on this), so you can provision more tightly.</p>
<p>They did a six month study at Google. It was a model that measured power at the rack, PDU (500-800 machines), and cluster (5k machines) levels and characterized four different workloads over 5k servers. They wanted to find the potential of various energy saving techniques. Because of scaling, they found that the larger the group, the more room for oversubscription - a given rack may be at peak power 50% of the time, but a PDU only 20%, and a cluster 10%. Also, different workloads have different power consumption requirements and thus mixing workloads is more efficient. So don't fix oversubscription at the small grain (rack) level, but at the datacenter level. In other words, you don't need to provision enough power to run everything at peak usage - do less.  Profile app power usage and mix workloads, and manage the risk of overload by having some lower-priority "victim" workload.</p>
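<p>The reason bigger groups can be provisioned more tightly is that racks rarely peak at the same moment. Here's a toy Python sketch of that effect; the power samples are invented, not Google's data, and the function names are mine.</p>
<pre>
```python
def sum_of_peaks(rack_traces):
    """Power to provision if every rack gets its own peak budget."""
    return sum(max(trace) for trace in rack_traces)

def peak_of_sum(rack_traces):
    """Peak actually seen at the aggregate (PDU or cluster) level."""
    return max(sum(samples) for samples in zip(*rack_traces))

# Made-up power samples in kW for three racks over four intervals;
# the racks peak at different times, as mixed workloads tend to.
racks = [
    [8, 10, 6, 5],
    [5, 6, 10, 7],
    [6, 5, 7, 10],
]
print(sum_of_peaks(racks), peak_of_sum(racks))
```
</pre>
<p>The gap between the two numbers is the headroom you could reclaim by budgeting power at the aggregate level instead of per rack.</p>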
<p>This is of interest to us at NI; we're even now building out more data center space in our HQ and will be building one in our new third manufacturing site.</p>
<p>Now, he switches to talking about "energy-proportional computing." Servers aren't often very idle in real structures. High performance and high availability require load balancing and wide data distribution, which means no "idle," but lots of "low activity." And you have to overprovision; you can't target 90% utilization on the Web. They created GFS to distribute data, which is replica based. Reads are load balanced but writes have to go to all replicas. So "sleep" or "power down" functionality is not real useful for servers. Don't focus on efficiency at peak - you're seldom at peak. Power efficiency is generally worse when a server is underutilized. There's a new SPECpower benchmark, interesting, and it shows performance to power ratios dropping sharply with lowered target load.</p>
<p>So Luiz wants machines that scale power use linearly! Basically, current server power usage scales less than linearly with workload. So at low workload, it's still using a buttload of power. Of the components (CPU, RAM, disk, other) the CPUs are actually doing OK at scaling. But that means that CPU power schemes (DVS) are hitting diminishing returns. Idle CPUs consume less than 30% of their peak energy, but RAM - 50%; disks - 75%; networking - 85%. Energy proportionality would save them lots (doesn't affect peak).  Now there's nothing *you* can do about proportionality, unless you're making computers.  But you can harass your suppliers.  (Easy to say if you're someone like Google; for most of us when we talk to our suppliers about stuff like this they just chuckle and give us a swirlie.)</p>
<p>In conclusion - write fast code! All the infrastructure work in the world can have about a 50% effect, but software engineering impact is almost without bound. (I actually drill this point into our new programmers in new hire training.)  Consider reduction of all energy-related costs and bug suppliers about proportionality. And join up at <a href="http://www.climatesaverscomputing.org/" target="_blank">climatesaverscomputing.org</a>!</p>
<p>The second big keynote is by Javier Soltero of <a href="http://www.hyperic.com/" target="_blank">Hyperic</a>. <strong>Cloud computing </strong>is nice, but you're not going to move over to it 100%. And clouds add complexity, like any abstraction. So you are faced with questions - is the problem my app, or is it the cloud? If you can't get the visibility, you can't trust it. Hyperic started to try to solve this with HypericHQ. They put up <a href="http://www.cloudstatus.com/" target="_blank">http://www.cloudstatus.com/</a>, a status of the AWS cloud, and will be adding clouds as they go. You can go see metrics like EC2 instance deployment latency (about a minute on average, for the record). So the site is kinda like Keynote for the cloud. Spiffy enough. Not too much more to write about it though.</p>
<p>A humorous note: their bank cut them off because their payment system monitoring was transferring a penny back and forth between accounts every minute. Synthetic monitoring can often have "unintended side effects" of this sort.  Caveat monitor!</p>
<p>And now - lunch.  More to come!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.webadminblog.com/index.php/2008/06/23/the-velocity-2008-conference-experience-part-ii/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Velocity 2008 Conference Experience &#8211; Part I</title>
		<link>http://www.webadminblog.com/index.php/2008/06/23/the-velocity-2008-conference-experience-part-i/</link>
		<comments>http://www.webadminblog.com/index.php/2008/06/23/the-velocity-2008-conference-experience-part-i/#comments</comments>
		<pubDate>Mon, 23 Jun 2008 17:57:33 +0000</pubDate>
		<dc:creator>Ernest</dc:creator>
				<category><![CDATA[Application Performance Management]]></category>
		<category><![CDATA[Velocity 2008]]></category>
		<category><![CDATA[Conferences]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[velocity]]></category>
		<category><![CDATA[velocity08]]></category>
		<category><![CDATA[velocityconf08]]></category>

		<guid isPermaLink="false">http://webadminblog.com/?p=17</guid>
		<description><![CDATA[I'm starting out the first year of Velocity, the new O'Reilly-sponsored Web Performance and Operations Conference, watching robots dance to Beck on a video screen. The conference's tagline is "fast, scalable, resilient, available," which is just about identical to our Web Systems' team's charter.  (And our reputation with the ladies!) For a long time, we've [...]]]></description>
			<content:encoded><![CDATA[<p>I'm starting out the first year of <a href="http://en.oreilly.com/velocity2008/public/content/home" target="_blank">Velocity</a>, the new O'Reilly-sponsored Web Performance and Operations Conference, watching robots dance to Beck on a video screen. The conference's tagline is "fast, scalable, resilient, available," which is just about identical to our Web Systems' team's charter.  (And our reputation with the ladies!)</p>
<p>For a long time, we've had to bottom-feed off of developer conferences, general interest conferences, etc. to address Web site operational issues; it's great to see a conference specifically targeted at this growing area. The conference staff noted that the demand was way above what was expected, and were scurrying about to ensure they had enough materials. By rough headcount in the first keynote I'd estimate 400 attendees, with more arriving over time as West Coast standard wakeup time (10 AM, for the record) comes along.</p>
<p><span id="more-17"></span></p>
<p>Steve Souders and Jesse Robbins, the conference chairs, kick us off with a brief pep talk, and quickly introduce the first speaker - Bill Coleman, the "B" in BEA and the man who invented Solaris, who is talking about <a href="http://en.oreilly.com/velocity2008/public/schedule/detail/4596" target="_blank"><strong>green data centers</strong></a>.  His talk starts off about the huge and increasing complexity of computer systems. And then a semi-pointless digression into "Web 2.0!" Then he talks about getting to a true "cloud" or "dial tone" computing model where resources are there when needed, but not consuming resources when not. Dynamic provisioning/powering down based on utilization. On the mainframe, there's a policy manager called LPARS that lets you manage different priority jobs and run at a high, set utilization. We need the same thing in the distributed world. And... he's done. Hrm, I was hoping for some "here's the solution."</p>
<p>Next, two Keynote guys talking about their new tool, <a href="http://en.oreilly.com/velocity2008/public/schedule/detail/4586" target="_blank"><strong>KITE 2.0</strong></a>. We have used KITE 1.0, being Keynote customers. It's a nice Web page performance analysis tool. Now, I know there's a bunch of those, but the huge benefit for us is that our performance SLAs are defined by Keynote monitors and KITE uses the exact same technology and can upload the scripts to Keynote, so in terms of a tool to distribute to random internal programmers and designers, it's perfect. KITE 1.0 was very polished, and in 2.0 they're adding some fun things like quick instant tests for free from 5 global cities, and bursting (running the same test back to back many times). Basically, in KITE you can record a transaction, play it (via IE integration) and it gives you a lovely waterfall of each page. You can record it as a script and save it to replay. KITE was Keynote customer only, but now 2.0 (out early August) will be free to all, which is awesome.</p>
<p>Following this is a pretty anticipated announcement from Scott Ruthfield of whitepages.com, an open source performance tool called "<strong><a href="http://en.oreilly.com/velocity2008/public/schedule/detail/4404" target="_blank">Jiffy</a>.</strong>" They do about 500 searches per second for personal information there. They use Gomez, and their graphs have shown the same thing that Souders' book is about, which is that the server time is by far the smallest part of their performance - it's all front end. Their search results page integrates mapping, targeted ads, etc. "You can't manage what you can't measure." The problem with synthetic measurements is the granularity - Gomez hits them a couple times in 20 minutes, but that misses 99% of their traffic. So they want something akin to real user monitoring, but they came at it from a different perspective than the existing network-heavy RUM vendors do.  They want to measure everything, with no page performance impact, in near real time.</p>
<p><a href="http://billwscott.com/jiffyext/" target="_blank">Jiffy</a> (linked at code.whitepages.com as of NOW) consists of JavaScript page tagging, Apache config to log the hits, database schema and reporting, and a Firebug plugin for consuming the data. With Jiffy you "mark" where you want timing to start, and then "measure" the elapsed time since the mark. In this way it's like any other page tagging (WebTrends, etc.) solution. This is my one concern with it, however - when we implemented page tagging at NI we went through a substantial process to validate tag logs versus our server logs, and there ended up being some very large bodies of omitted data that could not be attributed to any of the about a dozen "known" reasons why page tagging and logs should show different information; in fact WebTrends ended up missing a large percentage of traffic.  But the "trends" are still there, just not all of your data.  That may be OK or may not be depending on how you try to use it (and what exactly is causing the missing data).  More on this later.</p>
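<p>For anyone who hasn't seen the mark/measure pattern, here's a rough Python sketch of the idea. To be clear, this is not Jiffy's actual JavaScript API; the class and method names are hypothetical and only illustrate "mark a start point, then measure elapsed time since that mark."</p>
<pre>
```python
import time

class PageTimer:
    """Toy mark/measure timer; a hypothetical illustration, not Jiffy itself."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._marks = {}
        self.measurements = []   # (mark name, event name, elapsed seconds)

    def mark(self, name):
        """Remember the current time under the given mark name."""
        self._marks[name] = self._clock()

    def measure(self, event, mark_name):
        """Record and return the time elapsed since the named mark."""
        elapsed = self._clock() - self._marks[mark_name]
        self.measurements.append((mark_name, event, elapsed))
        return elapsed

timer = PageTimer()
timer.mark("search_results")
sum(range(100000))  # stand-in for the work being timed
elapsed = timer.measure("render", "search_results")
print(f"render took {elapsed:.6f} s since mark")
```
</pre>
<p>In a real page tagging setup the recorded measurements would be shipped back to the server and logged, which is where the missing-data concern above comes in.</p>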
<p>Next is conference circuit fave Artur Bergman from Wikia, who heads up their Web operations there. He is talking about the <strong>value of performance and reliability </strong>to the customer and your brand. He has a good point about user expectations - World of Warcraft has a lot of downtime and it's an expectation set with users. The guy who runs WoWWiki has a much lower tolerance for downtime! So the power of setting expectations is strong.</p>
<p>Operations is about efficient use of resources, end user performance, and reliability. Bad operations wastes R&amp;D money and cost of sale. Why do we not all know cost per page, or per page view? Isn't your margin based on that? How do we make sound business decisions about operations?</p>
<p>Wikia [NB: changed from "Wikipedia" per comment below] was having performance problems and spawned a project to address them. The ad networks were a big problem ("Ad networks suck! You should be ashamed!" &lt;crowd applauds&gt;). They fixed this by overloading document.write, and discovered that a good percentage of the time the ads either timed out or the user left before they got there. He said some other things about the performance case but none of us could make them out. In closing - keep it simple and loosely coupled.</p>
<p>Another funny interstitial video from the Richter Scales.  It reminds me that I need to update the funny videos in my new hire training!</p>
<p>Now, John Fowler, EVP of systems for Sun, talking about <a href="http://en.oreilly.com/velocity2008/public/schedule/detail/4582" target="_blank"><strong>infrastructure driving innovation</strong></a>. He cites horizontal scaling, universal communication, and openness as endemic trends. They're working on a "Web20kit" open source package that has squid/varnish/apache/mongrels/rails/glassfish/php/java, memcached, storage: mysql, mogile, hadoop, local FS... Not sure what that means. He's moving fast.</p>
<p>More threads and thus more cores are faster and more efficient. So they see more cores, more memory as being the solution to computing challenges. And Open Storage, which is an OpenSolaris-based storage management solution.  They have new ZFS flash "SSDs" to replace hard drives - more reliable, faster, but more expensive. They see a new server memory/storage hierarchy emerging which ranges from cache to RAM to flash to disk in a "hybrid" storage pool.</p>
<p>I love Sun in my heart as I'm an old school UNIX guy. We were on Suns at Rice when I was there at the turn of the '90s. But I'm not convinced. We had to move off Sun to Dell for all our app servers because they were just plain faster. You can have a zillion cores but on the Web, user performance often equates to the fastest line of performance of one thread. Thread-safe programming is rare in IT and not that much more common among the people making the app servers, tools, etc. that our apps depend on.  Hell, the OAS app server we use isn't even certified to work with 64-bit JVMs.  For many slower chips to outperform fewer faster ones, you have to be able to successfully spread the workload across them in parallel, and most people can't do that yet.  Also, many of our disk performance issues are from huge ass database instances - I'm not sure how this new storage solution caches those.</p>
<p>And that's it for part one of the morning keynotes! We're moving fast...  More in Part II!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.webadminblog.com/index.php/2008/06/23/the-velocity-2008-conference-experience-part-i/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
	</channel>
</rss>

