{"id":243,"date":"2009-06-24T10:36:17","date_gmt":"2009-06-24T15:36:17","guid":{"rendered":"http:\/\/www.webadminblog.com\/?p=243"},"modified":"2009-06-24T10:49:40","modified_gmt":"2009-06-24T15:49:40","slug":"velocity-2009-death-of-a-web-server","status":"publish","type":"post","link":"https:\/\/www.webadminblog.com\/index.php\/2009\/06\/24\/velocity-2009-death-of-a-web-server\/","title":{"rendered":"Velocity 2009 &#8211; Death of a Web Server"},"content":{"rendered":"<p>The first workshop on Monday morning was called <a href=\"http:\/\/en.oreilly.com\/velocity2009\/public\/schedule\/detail\/7871\" target=\"_blank\">Death of a Web Server: A Crisis in Caching<\/a>.\u00a0 The presentation itself is downloadable from that link, so follow along!\u00a0 I took a lot of notes, though, because much of this was coding and testing, not pure presentation.\u00a0 (As with all these session writeups, the presenter or other attendees are welcome to chime in and correct me!)\u00a0 I will italicize my thoughts to differentiate them from the presenter&#8217;s.<\/p>\n<p>It was given by <a href=\"http:\/\/www.campbellassociates.ca\/blog\/\" target=\"_blank\">Richard Campbell<\/a> from <a href=\"http:\/\/www.strangeloopnetworks.com\/\" target=\"_blank\">Strangeloop Networks<\/a>, which makes a hardware device that sits in front of and accelerates <a href=\"http:\/\/www.dotnetrocks.com\/\" target=\"_blank\">.NET<\/a> sites.<\/p>\n<p>Richard started by outing himself as a Microsoft guy.\u00a0\u00a0 He asks, &#8220;Who&#8217;s developing on the Microsoft stack?&#8221;\u00a0 Only one hand goes up out of the hundreds of people in the room.\u00a0 &#8220;Well, this whole demo is in MS, so strap in.&#8221;\u00a0 Grumbling begins to either side of me.\u00a0 <em>I think that in the end, the talk has takeaway points useful to anyone, not just .NET folks, but the all-Microsoft demo is a little off-putting to many.<\/em><\/p>\n<p>&#8220;Scaling is about operations and development working hand in 
hand.&#8221;\u00a0\u00a0 <em>We&#8217;ll hear this same refrain later from other folks, especially Facebook and Flickr.\u00a0 If only developers weren&#8217;t all dirty hippies&#8230; \ud83d\ude42<\/em><\/p>\n<p>He has a hardware setup with a batch of cute lil&#8217; <a href=\"http:\/\/www.aopen.com\/\" target=\"_blank\">AOpen boxes<\/a>: a four-server farm in a rolling suitcase.\u00a0 He starts up a load test machine, a web server, and a database, all on IIS7 and Visual Studio 2008.<\/p>\n<p>We start with an MS reference app, a car classifieds site.\u00a0 Jack the data set up to about 10k rows and the developer still says &#8220;it works fine on my machine.&#8221;\u00a0 Once you deploy it, however, not so much.<\/p>\n<p>He makes a load test using MS Visual Studio 2008.\u00a0 Really?\u00a0 Yep &#8211; you can record and play back.\u00a0 That&#8217;s a nice &#8220;for free&#8221; feature, and it&#8217;s fairly capable, not super basic; it can simulate different browsers and connection speeds. 
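<\/p>
<p>The step load he mentions next ramps virtual users up in fixed increments until the site falls over.  As a rough sketch of what such a schedule looks like, here it is in Python; this is not the Visual Studio load test API, and the function and parameter names are hypothetical.<\/p>

```python
def step_load_schedule(start_users, step_users, step_seconds, max_users, total_seconds):
    # Hypothetical helper: returns (offset_seconds, virtual_users) pairs for a
    # step load. The user count rises by step_users every step_seconds, is
    # capped at max_users, and the run lasts total_seconds.
    # These names are illustrative, not Visual Studio load test settings.
    if not step_seconds > 0:
        raise ValueError('step_seconds must be positive')
    schedule = []
    offset = 0
    users = start_users
    while total_seconds > offset:
        schedule.append((offset, min(users, max_users)))
        users += step_users
        offset += step_seconds
    return schedule


if __name__ == '__main__':
    # A 3.5-hour step load: start at 10 users, add 10 every 5 minutes.
    plan = step_load_schedule(10, 10, 300, 700, int(3.5 * 3600))
    print(len(plan), plan[0], plan[-1])  # prints: 42 (0, 10) (12300, 420)
```

<p>With these example numbers the ramp only reaches 420 users by the end of the run, so the step size, not just the duration, decides whether a multi-hour step load actually gets to failure.<\/p>
<p>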
\u00a0He likes to run two kinds of load tests, and neither should be short:<\/p>\n<ul>\n<li>Step load for 3-4 hours, to test to failure<\/li>\n<li>Soak test for 24 hours, to hunt for memory leaks<\/li>\n<\/ul>\n<p>What does IIS have for built-in instrumentation?\u00a0 Perfmon.\u00a0 We also get the full perfmon experience, where every time he restarts the test he has to remove and re-add some counters to get them to collect.\u00a0 Which metrics matter most?<\/p>\n<ul>\n<li>Requests\/Sec (ASP.NET Applications) &#8211; your main measure of how much you&#8217;re serving<\/li>\n<li>Requests Queued (ASP.NET) &#8211; rises when the app runs out of threads or is garbage collecting<\/li>\n<li>% Processor Time &#8211; keep an eye on it<\/li>\n<li># Bytes in all Heaps (.NET CLR Memory) &#8211; also keep an eye on it<\/li>\n<\/ul>\n<p>So we see throughput drop to 12 pages\/sec at 200 users in the step load, but the web server is fine; the bottleneck is the db.\u00a0 But &#8220;fix the db&#8221; is often not feasible.\u00a0 We run ANTS to find the slow queries and narrow the problem down to one stored proc, but for the demo we assume we can&#8217;t do anything about it.\u00a0 So let&#8217;s look at caching.<\/p>\n<p>You can cache in your code; he shows us how, using the built-in .NET cache class (HttpContext.Current.Cache.Get) guarded by a _cachelockObject lock.<\/p>\n<p>Say you have a 5 s initial load, and caching makes subsequent hits fast.\u00a0 But multiple simultaneous first hits all miss the cache and contend with each other, so you have to add cache locking.\u00a0 There are subtle right and wrong ways to do that.\u00a0 A common best-practice pattern he shows is check, lock, check: check the cache, take the lock, then check the cache again before running the slow query, because another request may have filled it while you waited for the lock.<\/p>\n<p>We run the load test again.\u00a0 &#8220;If you do not see benefit of a change you make, TAKE THE CODE BACK OUT,&#8221; he notes.\u00a0 The harder part comes next: deciding how long to cache for and when to clear the cache.\u00a0 That is hard and error-prone, whether invalidation is based on content changes or on time.<\/p>\n<p>Now we are able to get the 
app up to 700 users and 300 req\/sec, and the web server CPU is almost but not quite pegged (probably because the load test machine is out of capacity).\u00a0 Half-second page response time.\u00a0 Nice!\u00a0 But it turns out that users don&#8217;t use the app the way the load test does, and they still say it&#8217;s slow.\u00a0 What&#8217;s wrong?\u00a0 We built code to the test.\u00a0 Users are doing various things, not the one single (and easily cacheable) operation our test does.<\/p>\n<p>You can take logs and run them through webtrace to generate sessions\/scenarios.\u00a0 But there&#8217;s not quite enough info in the logs to reproduce the hits, so you have to hand-craft the requests further after that.<\/p>\n<p>Now we make a load test with a variety of data (a data-driven load test with parameter variation), running the same kinds of searches customers are.\u00a0 Whoops: suddenly the web server CPU is low and we see a steady queue of requests, at 200 req\/sec.\u00a0 Give it some time; the caches build up for 45 minutes, and heap memory grows until it gets garbage collected.<\/p>\n<p>As a side note, he says &#8220;We love Dell 1950s, and one of those should do 50-100 req per sec.&#8221;<\/p>\n<p>How much memory &#8220;should&#8221; an app server consume for .NET?\u00a0 Well, out of the gate, 4 GB of RAM is really only 3.3 GB usable, and then Windows and IIS want some&#8230;\u00a0 In the end you&#8217;re left with less than 1 GB of usable heap on a 32-bit box.\u00a0 Once you get to a certain level (about 800 MB), garbage collection panics.\u00a0 You can mark cached items as disposable in a crisis, but that still causes problems when your cache suddenly flushes.<\/p>\n<ul>\n<li>64-bit OS with 4 GB yields 1.3 GB of usable heap<\/li>\n<li>64-bit OS with 8 GB, app in 32-bit mode, yields 4 GB of usable heap (best case)<\/li>\n<\/ul>\n<p>So now what?\u00a0 Instrumentation: we need more visibility. 
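<\/p>
<p>Here is a minimal Python sketch of the check, lock, check caching described above.  The demo itself is C# against HttpContext.Current.Cache; this is only an illustrative translation, and the class, method, and parameter names are hypothetical.  It assumes one lock for every key (in the spirit of the single _cachelockObject), an absolute (flat) expiry window, and a per-key hit counter similar to the Dictionary instrumentation he adds next.<\/p>

```python
import threading
import time
from collections import Counter


class CheckLockCheckCache:
    # Hypothetical in-process cache illustrating check, lock, check.
    # One lock guards all loads (like a single shared lock object), entries
    # expire ttl_seconds after they are stored (a flat, absolute window),
    # and per-key counters record cache hits and slow loads.

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._items = {}                  # key -> (value, expires_at)
        self._load_lock = threading.Lock()
        self._stats_lock = threading.Lock()
        self.hits = Counter()             # key -> times served from cache
        self.loads = Counter()            # key -> times the slow loader ran

    def _lookup(self, key):
        # Single dict reads are atomic in CPython, so this is safe without the lock.
        entry = self._items.get(key)
        if entry is not None and entry[1] > self._clock():
            return True, entry[0]
        return False, None

    def _count_hit(self, key):
        with self._stats_lock:
            self.hits[key] += 1

    def get_or_load(self, key, loader):
        # Check 1: the common fast path takes no lock at all.
        found, value = self._lookup(key)
        if found:
            self._count_hit(key)
            return value
        with self._load_lock:
            # Check 2: another thread may have loaded this key while
            # we were waiting for the lock, so look again before loading.
            found, value = self._lookup(key)
            if found:
                self._count_hit(key)
                return value
            value = loader(key)           # the slow query runs only here
            self.loads[key] += 1          # only ever updated under _load_lock
            self._items[key] = (value, self._clock() + self._ttl)
            return value


if __name__ == '__main__':
    def slow_search(key):
        time.sleep(0.2)                   # stand-in for the slow stored proc
        return 'results for ' + key

    cache = CheckLockCheckCache(ttl_seconds=30)
    workers = [threading.Thread(target=cache.get_or_load, args=('used cars', slow_search))
               for _ in range(10)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(cache.loads['used cars'], cache.hits['used cars'])  # prints: 1 9
```

<p>Under that burst, only the first request runs the slow search; the other nine are served from the cache, which is exactly what the second check buys you.  Locking per key instead of globally would stop one slow search from blocking unrelated ones, at the cost of more bookkeeping.<\/p>
<p>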
He adds a Dictionary object to log how many times a given cache object gets used: just increment a counter on the key.\u00a0 You can then log it, make a Web page to dump the dictionary on demand, etc.\u00a0 All of these affect performance, however.<\/p>\n<p>They had a problem with an app with intermittent deadlocks and turned on profiling, and then there were no deadlocks, because of the observer effect.\u00a0 &#8220;Don&#8217;t turn it off!&#8221;\u00a0 They ended up altering the order of some operations to change the timing.<\/p>\n<p>We run the instrumented version and check the stats to make sure the instrumentation itself causes no major change.\u00a0 Looking at the cache page, the app is caching a lot of content that never gets reused.\u00a0 There are enough unique searches that they&#8217;re polluting the cache.\u00a0 Digging into the logs and the cached items to find out why, we find an advanced search that sets different price ranges, etc.\u00a0 You can add logic to try to keep &#8220;uncacheable&#8221; items out of the cache.\u00a0 This removes memory waste but doesn&#8217;t make the app any faster.<\/p>\n<p>We try a new cache approach.\u00a0 .NET caching has various options, such as duration and priority.\u00a0 Short-duration caching can be a good approach; you get the majority of the benefit.\u00a0 Even 30 s of caching for something hit several times a second means one query serves dozens of requests.\u00a0 So we switch from a 90-minute to a 30-second cache expiry to get better (more controlled) memory consumption.\u00a0 That is a &#8220;flat&#8221; (absolute) time window; what about a sliding window that resets each time the content is hit?\u00a0 Well, you get longer caching, but then you hit the &#8220;content changed&#8221; invalidation issue, since frequently hit items may never expire.<\/p>\n<p>He asks the Microsoft-code-stunned room what stacks they use instead of .NET, and whether those have similar facilities&#8230;\u00a0 <em>Speaking for ourselves, I know our programmers have custom 
implemented a cache like this in Java, and we are also looking at &#8220;front side&#8221; proxy caching.<\/em><\/p>\n<p>Anyway, we still have our performance problem in the sample app.\u00a0 Adding another Web server won&#8217;t help, as the bottleneck is still the db.\u00a0 Often our fixes create new problems (like caching trading query load for memory pressure).\u00a0 And here we end, a little anticlimactically.<\/p>\n<p>Class questions\/comments:<br \/>\nWhat about multi-server caching?\u00a0 So far this cache is read-only and not synced across servers.\u00a0 The default .NET cache is not all that smart.\u00a0 MS is working on a new library called, ironically, &#8220;Velocity&#8221; that looks a lot like memcached and will do cross-server caching.<\/p>\n<p>What about read\/write caching?\u00a0 You can do asynchronous cache swapping for some things, but it&#8217;s memory intensive.\u00a0 Read-write caches are rarer; Oracle\/Tangosol Coherence and Terracotta are the big boys there.<\/p>\n<p>Root speed: at some point you also have to address the core query.\u00a0 If it takes 10 seconds, even caching can&#8217;t save you.\u00a0 Prepopulating the cache can help, but you still have to handle invalidations, cache-clearing events, etc.<\/p>\n<p>Four-step APM process:<\/p>\n<ol>\n<li>Diagnosis is the most challenging part of performance optimization<\/li>\n<li>Use facts &#8211; instrument your application to know exactly what&#8217;s up<\/li>\n<li>Theorize a probable cause, then prove it<\/li>\n<li>Consider a variety of solutions<\/li>\n<\/ol>\n<p>Peco has a bigger, more detailed twelve-step APM process that he should post about here sometime.<\/p>\n<p>Another side note: sticky sessions suck&#8230;\u00a0 Try never to use them.<\/p>\n<p>What tools do people use?<\/p>\n<ul>\n<li>Hand-written log replayers<\/li>\n<li>Spirent Avalanche<\/li>\n<li>WCAT (free MS tool)<\/li>\n<\/ul>\n<p>I note that we use LoadRunner and a custom log 
replayers, which is stupid; we&#8217;ve been telling every one of our suppliers in even remotely related fields to build one.\u00a0 One attendee records traffic with a proxy, then replays it from EC2 instances with a tool called &#8220;siege&#8221; (by Joe Dog).\u00a0 There&#8217;s more discussion on this point; everyone agrees we need someone to make this damn product.<\/p>\n<p>&#8220;What about Ajax?&#8221;\u00a0 Well, MS has a &#8220;fake&#8221; Ajax that really does it all server side.\u00a0 It makes for horrid performance.\u00a0 Don&#8217;t use that.\u00a0 Real Ajax keeps the user entertained, but the server does more work overall.<\/p>\n<p>An ending quip repeats an earlier point: you should not be proud of 5 req\/sec; 50-100 should be possible with a dynamic application.<\/p>\n<p><em>And that&#8217;s the workshop.\u00a0 A little Microsofty, but it had some decent takeaways, I thought.<\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>The first workshop on Monday morning was called Death of a Web Server: A Crisis in Caching.\u00a0 The presentation itself is downloadable from that link, so follow along!\u00a0 I took a lot of notes though because much of this was coding and testing, not pure presentation.\u00a0 (As with all these session writeups, the presenter or 
[&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[77,259],"tags":[264,263,626,262,79,260,261],"class_list":["post-243","post","type-post","status-publish","format-standard","hentry","category-conferences","category-velocity-2009","tag-net","tag-caching","tag-conferences","tag-strangeloop","tag-velocity","tag-velocityconf","tag-velocityconf09"],"aioseo_notices":[],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/pfI0c-3V","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/www.webadminblog.com\/index.php\/wp-json\/wp\/v2\/posts\/243","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.webadminblog.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.webadminblog.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.webadminblog.com\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.webadminblog.com\/index.php\/wp-json\/wp\/v2\/comments?post=243"}],"version-history":[{"count":6,"href":"https:\/\/www.webadminblog.com\/index.php\/wp-json\/wp\/v2\/posts\/243\/revisions"}],"predecessor-version":[{"id":246,"href":"https:\/\/www.webadminblog.com\/index.php\/wp-json\/wp\/v2\/posts\/243\/revisions\/246"}],"wp:attachment":[{"href":"https:\/\
/www.webadminblog.com\/index.php\/wp-json\/wp\/v2\/media?parent=243"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.webadminblog.com\/index.php\/wp-json\/wp\/v2\/categories?post=243"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.webadminblog.com\/index.php\/wp-json\/wp\/v2\/tags?post=243"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}