{"id":276,"date":"2009-07-06T21:51:42","date_gmt":"2009-07-07T02:51:42","guid":{"rendered":"http:\/\/www.webadminblog.com\/?p=276"},"modified":"2009-07-06T21:55:46","modified_gmt":"2009-07-07T02:55:46","slug":"velocity-2009-scalable-internet-architectures","status":"publish","type":"post","link":"https:\/\/www.webadminblog.com\/index.php\/2009\/07\/06\/velocity-2009-scalable-internet-architectures\/","title":{"rendered":"Velocity 2009 &#8211; Scalable Internet Architectures"},"content":{"rendered":"<p>OK, I&#8217;ll be honest.\u00a0 I started out attending &#8220;<a href=\"http:\/\/en.oreilly.com\/velocity2009\/public\/schedule\/detail\/9025\" target=\"_blank\">Metrics that Matter &#8211; Approaches to Managing High Performance Web Sites<\/a>&#8221; (presentation available!) by Ben Rushlo, Keynote proserv.\u00a0 I bailed after a half hour to the other one, not because the info in that one was bad but because I knew what he was covering and wanted to get the less familiar information from the other workshop.\u00a0 Here&#8217;s my brief notes from his session:<\/p>\n<ul>\n<li>Online apps are complex systems<\/li>\n<li>A siloed approach of deciding to improve midtier vs CDN vs front end engineering results in suboptimal experience to the end user &#8211; have to take holistic view.\u00a0 <em>I totally agree with this, in our own caching project we took special care to do an analysis project first where we evaluated impact and benefit of each of these items not only in isolation but together so we&#8217;d know where we should expend effort.<\/em><\/li>\n<li>Use top level\/end user metrics, not system metrics, to measure performance.<\/li>\n<li>There are other metrics that correlate to your performance &#8211; &#8220;key indicators.&#8221;<\/li>\n<li>It&#8217;s hard to take low level metrics and take them &#8220;up&#8221; into a meaningful picture of user experience.<\/li>\n<\/ul>\n<p><em>He&#8217;s covering good stuff but it&#8217;s nothing I don&#8217;t know.\u00a0 We 
see the differences and benefits in point in time tools, Passive RUM, tagging RUM, synthetic monitoring, end user\/last mile synthetic monitoring&#8230;\u00a0 If you don&#8217;t, read the presentation, it&#8217;s good.\u00a0 As for me, it&#8217;s off to the scaling session.<br \/>\n<\/em><br \/>\nI hopped into this session a half hour late.\u00a0 It&#8217;s <a href=\"http:\/\/en.oreilly.com\/velocity2009\/public\/schedule\/detail\/8859\" target=\"_blank\">Scalable Internet Architectures<\/a> (again, go get the presentation) by <a href=\"http:\/\/lethargy.org\/~jesus\/\" target=\"_blank\">Theo Schlossnagle<\/a>, CEO of <a href=\"http:\/\/omniti.com\/\" target=\"_blank\">OmniTI<\/a> and author of the similarly named book.<\/p>\n<p><em>I like his talk, it starts by getting to the heart of what Web Operations &#8211; what we call &#8220;Web Admin&#8221; hereabouts &#8211; is.\u00a0 It kinda confuses architecture and operations initially but maybe that&#8217;s because I came in late. <\/em><\/p>\n<p>He talks about knowledge, tools, experience, and discipline, and mentions that discipline is the most lacking element in the field.<em> Like him, I&#8217;m a &#8220;real engineer&#8221; who went into IT so I agree vigorously.<\/em><\/p>\n<p>What specifically should you do?<\/p>\n<ul>\n<li>Use version control<\/li>\n<li>Monitor<\/li>\n<li>Serve static content using a CDN, and behind that a reverse proxy and behind that peer based HA.\u00a0 Distribute DNS for global distribution.<\/li>\n<li>Dynamic content &#8211; now it&#8217;s time for optimization.<\/li>\n<\/ul>\n<h3><strong>Optimizing Dynamic Content<br \/>\n<\/strong><\/h3>\n<p>Don&#8217;t pay to generate the same content twice &#8211; use caching.\u00a0 Generate content only when things change and break the system into components so you can cache appropriately.<\/p>\n<p>example: a php news site &#8211; articles are in oracle, personalization on each page, top new forum posts in a sidebar.<\/p>\n<p>Why abuse oracle by 
hitting it every page view?\u00a0 Updates are controlled.\u00a0 The page should pull user prefs from a cookie.\u00a0 (p.s. rewrite your query strings)<br \/>\nBut it&#8217;s still slow to pull from the db vs hardcoding it.<br \/>\nAll blog sw does this, for example<br \/>\nCheck for a hardcoded PHP page &#8211; if it&#8217;s not there, run something that puts it there.\u00a0 Still dynamically puts in user personalization from the cookie.\u00a0 In the preso he provides details on how to do this.<br \/>\nDo cache invalidation on content change, use a message queuing system like OpenAMQ for async writes.<br \/>\nApache is now the bottleneck &#8211; use APC (Alternative PHP Cache)<br \/>\nor use memcached &#8211; he says no timeouts!\u00a0 Or&#8230; be careful about them!\u00a0 Or something.<\/p>\n<h3>Scaling Databases<\/h3>\n<p>1. shard them<br \/>\n2. shoot yourself<\/p>\n<p>Sharding, or breaking your data up by range across many databases, means you throw away relational constraints and that&#8217;s sad.\u00a0 Get over it.<\/p>\n<p>You may not need relations &#8211; use files fool!\u00a0 Or other options like CouchDB, etc.\u00a0 <em>Or Hadoop, from the previous workshop!<\/em><\/p>\n<p>Vertically scale first by:<\/p>\n<ul>\n<li> not hitting the damn db!<\/li>\n<li> run a good db.\u00a0 Postgres!\u00a0 not MySQL boo-yah!<\/li>\n<\/ul>\n<p>When you have to go horizontal, partition right &#8211; more than one shard shouldn&#8217;t answer an OLTP question.\u00a0\u00a0 If that&#8217;s not possible, consider duplication.<\/p>\n<p>IM example.\u00a0 Store messages sharded by recipient.\u00a0 But then the sender wants to see them too and that&#8217;s an expensive operation &#8211; so just store them twice!!!<\/p>\n<p>But if it&#8217;s not that simple, partitioning can hose you.<\/p>\n<p>Do math and simulate it before you do it fool!\u00a0\u00a0 Be an engineer!<\/p>\n<p>Multi-master replication doesn&#8217;t work right.\u00a0 But it&#8217;s getting 
closer.<\/p>\n<h3>Networking<\/h3>\n<p>The network&#8217;s part of it, can&#8217;t forget it.<\/p>\n<p>Of course if you&#8217;re using Ruby on Rails the network will never make your app suck more.\u00a0 <em>Heh, the random drive-by disses rile the crowd up.<\/em><\/p>\n<p>A single machine can push a gig.\u00a0 More isn&#8217;t hard with aggregated ports.\u00a0 Apache too, serving static files.\u00a0 Load balancers too.\u00a0 How to get to 10 or 20 Gbps though?\u00a0 All the drivers and firmware suck.\u00a0 Buy an expensive LB?<\/p>\n<p>Use routing.\u00a0 It supports naive LB&#8217;ing.\u00a0 Or routing protocol on front end cache\/LBs talking to your edge router.\u00a0 Use hashed routes upstream.\u00a0 User caches use same IP.\u00a0 Fault tolerant, distributed load, free.<\/p>\n<p>Use isolation for floods.\u00a0 Set up a surge net.\u00a0 Route out based on MAC.\u00a0 Used vs DDoSes.<\/p>\n<h3>Service Decoupling<\/h3>\n<p>One of the most overlooked techniques for scalable systems.\u00a0 Why do now what you can postpone till later?<\/p>\n<p>Break transaction into parts.\u00a0 Queue info.\u00a0 Process queues behind the scenes.\u00a0 Messaging!\u00a0 There are different options &#8211; AMQP, Spread, JMS.\u00a0 Specifically good message queuing options are:<\/p>\n<ul>\n<li><a href=\"http:\/\/activemq.apache.org\/\" target=\"_blank\"> ActiveMQ (Java)<\/a><\/li>\n<li> OpenAMQ (C)<\/li>\n<li> RabbitMQ (Erlang)<\/li>\n<\/ul>\n<p>Most common &#8211; <a href=\"http:\/\/stomp.codehaus.org\/\" target=\"_blank\">STOMP<\/a>, sucks but universal.<\/p>\n<p>Combine a queue and a job dispatcher to make this happen.\u00a0 Side note &#8211; <a href=\"http:\/\/www.danga.com\/gearman\/\" target=\"_blank\">Gearman<\/a>, while cool, doesn&#8217;t do this &#8211; it dispatches work but it doesn&#8217;t decouple action from outcome &#8211; should be used to scale work that can&#8217;t be decoupled.\u00a0 (Yes it does, says dude in crowd.)<\/p>\n<h3>Scalability Problems<\/h3>\n<p>It often 
boils down to &#8220;don&#8217;t be an idiot.&#8221;\u00a0 <em>His words not mine.\u00a0 I like this guy.<\/em> Performance is easier than scaling.\u00a0 Extremely high perf systems tend to be easier to scale because they don&#8217;t have to scale as much.<\/p>\n<p>e.g. An email marketing campaign with a URL not ending in a trailing slash.\u00a0 Guess what, you just doubled your hits.\u00a0 Use the damn trailing slash to avoid 302s.<\/p>\n<p><em>How do you stop everyone from being an idiot though?\u00a0 Every person who sends a mass email from your company?\u00a0 That&#8217;s our problem\u00a0 &#8211; with more than fifty programmers and business people generating apps and content for our Web site, there is always a weakest link.<\/em><\/p>\n<p>Caching should be controlled, not prevented, in nearly every circumstance.<\/p>\n<p>Understand the problem.\u00a0 Going from 100k to 10MM users &#8211; don&#8217;t just bucketize in small chunks and assume it will scale.\u00a0 Allow a margin for error.\u00a0 Designing for 100x or 1000x requires a profound understanding of the problem.<\/p>\n<p>Example &#8211; I plan for a traffic spike of 3000 new visitors\/sec.\u00a0 My page is about 300k.\u00a0 CPU bound.\u00a0 8ms service time.\u00a0 Calculate servers needed.\u00a0 If I varnish the static assets, the calculation says I need 3-4 machines.\u00a0 But do the math and it&#8217;s about 8 Gbps of throughput.\u00a0 No way.\u00a0 At 1.5MM packets\/sec &#8211; the firewall dies.\u00a0 You have to keep the whole system in mind.<\/p>\n<p>So spread out static resources across multiple datacenters, agg&#8217;d pipes.<br \/>\nThe rest is only 350 Mbps, 75k packets per second, doable &#8211; except the 302 adds 50% overage in packets per sec.<\/p>\n<p>Last bonus thought &#8211; use ZFS\/DTrace for dbs, so run them on Solaris!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>OK, I&#8217;ll be honest.\u00a0 I started out attending &#8220;Metrics that Matter &#8211; Approaches to Managing 
High Performance Web Sites&#8221; (presentation available!) by Ben Rushlo, Keynote proserv.\u00a0 I bailed after a half hour to the other one, not because the info in that one was bad but because I knew what he was covering and wanted [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[3,77,259],"tags":[303,79,260,261],"class_list":["post-276","post","type-post","status-publish","format-standard","hentry","category-apm","category-conferences","category-velocity-2009","tag-scalability","tag-velocity","tag-velocityconf","tag-velocityconf09"],"aioseo_notices":[],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/pfI0c-4s","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/www.webadminblog.com\/index.php\/wp-json\/wp\/v2\/posts\/276","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.webadminblog.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.webadminblog.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.webadminblog.com\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.webadminblog.com\/index.php\/wp-json\/wp\/v2\/comments?post=276"}],"version-history":[{"count":5,"href":"https:\/\/www.webadminblog.com\/index.php\/wp-json\/w
p\/v2\/posts\/276\/revisions"}],"predecessor-version":[{"id":280,"href":"https:\/\/www.webadminblog.com\/index.php\/wp-json\/wp\/v2\/posts\/276\/revisions\/280"}],"wp:attachment":[{"href":"https:\/\/www.webadminblog.com\/index.php\/wp-json\/wp\/v2\/media?parent=276"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.webadminblog.com\/index.php\/wp-json\/wp\/v2\/categories?post=276"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.webadminblog.com\/index.php\/wp-json\/wp\/v2\/tags?post=276"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}