Web Admin Blog Real Web Admins. Real World Experience.


Velocity 2009 – Best Tidbits

Besides all the sessions, which were pretty good, a lot of the good info you get from conferences is by networking with other folks there and talking to vendors.  Here are some of my top-value takeaways.

Aptimize is a New Zealand-based company that has developed software to automatically do the most high value front end optimizations (image spriting, CSS/JS combination and minification, etc.).  We predict it'll be big.  On a site like ours, going back and doing all this across hundreds of apps will never happen - we can engineer new ones and important ones better, but something like this which can benefit apps by the handful is great.

I got some good info from the MySpace people.  We've been talking about whether to run our back end as Linux/Apache/Java or Windows/IIS/.NET for some of our newer stuff.  In the first workshop, I was impressed when the guy asked who all runs .NET and only one guy raised his hand.   MySpace is one of the big .NET sites, but when I talked with them about what they felt the advantage was, they looked at each other and said "Well...  It was the most expeditious choice at the time..."  That's damning with faint praise, so I asked about what they saw the main disadvantage being, and they cited remote administration - even with the new PowerShell stuff it's just still not as easy as remote admin/CM of Linux.  That's top of my list too, but often Microsoft apologists will say "You just don't understand because you don't run it..."  But apparently running it doesn't necessarily sell you either.

Our friends from Opnet were there.  It was probably a tough show for them, as many of these shops are of the "I never pay for software" camp.  However, you end up wasting far more in skilled personnel time if you don't have the right tools for the job.  We use the heck out of their Panorama tool - it pulls metrics from all tiers of your system, including deep in the JVM, and does dynamic baselining, correlation and deviation.  If all your programmers are 3l33t maybe you don't need it, but if you're unsurprised when one of them says "Uhhh... What's a thread leak?" then it's money.

ControlTier is nice, they're a commercial open source CM tool for app deploys - it works at a higher level than chef/puppet, more like capistrano.

EngineYard was a really nice cloud provisioning solution (sits on top of Amazon or whatever).  The reality of cloud computing as provided by the base IaaS vendors isn't really the "machines dynamically spinning up and down and automatically scaling your app" they say it is without something like this (or lots of custom work).  Their solution is, sadly, Rails only right now.  But it is slick, very close to the blue-sky vision of what cloud computing can enable.

And also, I joined the EFF!  Cyber rights now!

You can see most of the official proceedings from the conference (for free!):


Velocity 2009 – Monday Night

After a hearty trip to Gordon Biersch, Peco went to the Ignite battery of five-minute presentations, which he said was very good.  I went to two Birds of a Feather sessions, which were not.  The first was a general cloud computing discussion which covered well-trod ground.  The second was by a hapless Sun guy on Olio and Faban.  No, you don't need to know about them.  It was kinda painful, but I want to commend that guy from Google for diplomatically continuing to try to guide the discussion into something coherent without just rolling over the Sun guy.  Props!

And then - we were lame and just turned in.  I'm getting old, can't party every night like I used to.  (I don't know what Peco's excuse is!)


Velocity 2009 – Hadoop Operations: Managing Big Data Clusters

Hadoop Operations: Managing Big Data Clusters (see link on that page for preso) was given by Jeff Hammerbacher of Cloudera.

Other good references -
book: "Hadoop: The Definitive Guide"
preso: hadoop cluster management from USENIX 2009

Hadoop is an Apache project inspired by Google's infrastructure; it's software for programming warehouse-scale computers.

It has recently been split into three main subprojects - HDFS, MapReduce, and Hadoop Common - and sports an ecosystem of various smaller subprojects (hive, etc.).

Usually a hadoop cluster is a mess of stock 1 RU servers with 4x1TB SATA disks in them.  "I like my servers like I like my women - cheap and dirty," Jeff did not say.


HDFS:

  • Pools servers into a single hierarchical namespace
  • It's designed for large files, written once/read many times
  • It does checksumming, replication, compression
  • Access is from Java, C, command line, etc.  Not usually mounted at the OS level.


MapReduce:

  • Is a fault tolerant data layer and API for parallel data processing
  • Has a key/value pair model
  • Access is via Java, C++, streaming (for scripts), SQL (Hive), etc.
  • Pushes work out to the data
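In miniature, that key/value model works like this sketch (plain Python standing in for Hadoop's Java API; the function names and local "shuffle" are my illustration, not Hadoop's actual interface):

```python
from collections import defaultdict

# Toy MapReduce: word count. In real Hadoop the framework shuffles
# mapper output to reducers spread across the cluster; here we just
# simulate all three phases in-process.

def mapper(line):
    # Emit (key, value) pairs: one ("word", 1) per word in the line.
    for word in line.split():
        yield (word, 1)

def reducer(key, values):
    # Sum all the counts emitted for a given key.
    return (key, sum(values))

def run_job(lines):
    # Shuffle phase: group mapper output by key.
    groups = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            groups[key].append(value)
    # Reduce phase: one reducer call per distinct key.
    return dict(reducer(k, v) for k, v in groups.items())

counts = run_job(["the cat", "the hat"])
# counts == {"the": 2, "cat": 1, "hat": 1}
```

The "pushes work out to the data" bullet is the part this sketch can't show: Hadoop runs the mapper on the node that already holds the data block, rather than moving the data to the code.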


The ecosystem subprojects:

  • Avro (serialization)
  • HBase (like Google BigTable)
  • Hive (SQL interface)
  • Pig (language for dataflow programming)
  • ZooKeeper (coordination for distributed systems)

Facebook used Scribe (a log aggregation tool) to pull a big wad of info into Hadoop, then published it out to MySQL for the user dashboard and to Oracle RAC for internal use...
Yahoo! uses it too.

Sample projects hadoop would be good for - log/message warehouse, database archival store, search team projects (autocomplete), targeted web crawls...
As boxes you can use unused desktops, retired db servers, amazon ec2...

Tools they use to make hadoop include subversion/jira/ant/ivy/junit/hudson/javadoc/forrest
It uses an Apache 2.0 license

Good configs for hadoop:

  • use 7200 rpm sata, ecc ram, 1U servers
  • use linux, ext3 or maybe xfs filesystem, with noatime
  • JBOD disk config, no raid
  • java6_14+

To manage it -

unix utes: sar, iostat, iftop, vmstat, nfsstat, strace, dmesg, friends

java utes: jps, jstack, jconsole
Get the rpm!  www.cloudera.com/hadoop

config: my.cloudera.com
modes - standalone, pseudo-distrib, distrib
"It's nice to use dsh, cfengine/puppet/bcfg2/chef for config management across a cluster; maybe use scribe for centralized logging"

I love hearing what tools people are using, that's mainly how I find out about new ones!

Common hadoop problems:

  • "It's almost always DNS" - use hostnames
  • open ports
  • distrib ssh keys (expect)
  • write permissions
  • make sure you're using all the disks
  • don't share NFS mounts for large clusters
  • set JAVA_HOME to new jvm (stick to sun's)

HDFS In Depth

1.  NameNode (master)
VERSION file shows data structs, filesystem image (in memory) and edit log (persisted) - if they change, painful upgrade

2.  Secondary NameNode (aka checkpoint node) - checkpoints the FS image and then truncates edit log, usually run on a sep node
New backup node in .21 removes need for NFS mount write for HA

3.  DataNode (workers)
stores data in the local fs
stores data in blk_<id> files, round-robins through dirs
heartbeats to the namenode
raw socket to serve to clients

4.  Client (Java HDFS lib)
other stuff (libhdfs) more unstable

hdfs operator utilities

  • safe mode - when it starts up
  • fsck - hadoop version
  • dfsadmin
  • block scanner - runs every 3 wks, has web interface
  • balancer - examines ratio of used to total capacity across the cluster
  • har (like tar) archive - bunch up smaller files
  • distcp - parallel copy utility (uses mapreduce) for big loads
  • quotas

has users, groups, and permissions - including x, though there's no execution; it's used for dirs
hadoop has some access trust issues - used through gateway cluster or in trusted env
audit logs - turn on in log4j.properties

has loads of Web UIs - on namenode go to /metrics, /logLevel, /stacks
non-hdfs access - HDFS proxy to http, or thriftfs
has trash (.Trash in home dir) - turn it on

includes benchmarks - testdfsio, nnbench

Common HDFS problems

  • disk capacity, esp due to log file sizes - crank up reserved space
  • slow-but-not-dead disks and NICs flapping down to a slow mode
  • checkpointing and backing up metadata - monitor that it happens hourly
  • losing write pipeline for long lived writes - redo every hour is recommended
  • upgrades
  • many small files


use Fair Share or Capacity scheduler
distributed cache
jobcontrol for ordering

Monitoring - They use ganglia, jconsole, nagios and canary jobs for functionality

Question - how much admin resource would you need for hadoop?  Answer - Facebook ops team had 20% of 2 guys hadooping, estimate you can use 1 person/100 nodes

He also notes that this preso and maybe more are on slideshare under "jhammerb."

I thought this presentation was very complete and bad ass, and I may have some use cases that hadoop would be good for coming up!


Velocity 2009 – Introduction to Managed Infrastructure with Puppet

Introduction to Managed Infrastructure with Puppet
by Luke Kanies, Reductive Labs

You can get the work files from git://github.com/reductivelabs/velocity_puppet_workshop_2009.git, and the presentation's available here.

I saw Luke's Puppet talk last year at Velocity 2008, but am more ready to start taking up some config management back home.  Our UNIX admins use cfengine, and puppet is supposed to be a better-newer cfengine.  Now there's also an (allegedly) better-newer one called chef I read about lately.  So this should be interesting in helping to orient me to the space.  At lunch, we sat with Luke and found out that Reductive just got their second round funding and were quite happy, though they got nervous and prickly when there was too much discussion of whether they were all buying Teslas now.  Congrats Reductive!

Now, to work along, you git the bundle and use it with puppet.  Luke assumes we all have laptops, all have git installed on our laptops, and know how to sync his bundle of goodness down.  And have puppet or can quickly install it.  Bah.  I reckon I'll just follow along.

You can get puppet support via IRC, or the puppet-users google group.

First we exercise "ralsh", the resource abstraction layer shell, which can interact with resources like packages, hosts, and users.  Check em, add em, modify em.

You define abstraction packages.  Like "ssh means ssh on debian, openssh on solaris..."  It requires less redundancy of config than cfengine.
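That abstraction might look something like this in Puppet's language (the selector syntax is real Puppet; the specific variable and package names are my illustration, not from the session):

```puppet
# Pick the right package name per OS, then manage it uniformly.
$ssh_package = $operatingsystem ? {
  "Debian"  => "ssh",
  "Solaris" => "openssh",
  default   => "openssh-server",
}

package { $ssh_package:
  ensure => installed,
}
```

The win over cfengine is that the OS-specific mapping lives in one place; every other manifest just says "I want ssh."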

"puppet"  consists of several executables - puppet, ralsh, puppetd, puppetmasterd, and puppetca.

As an aside, cft is a neat config file snapshot thing in red hat.

Anyway, you should use puppet, not ralsh, directly; the syntax is similar.  Here's an example invocation:

puppet -e 'file { "/tmp/eh": ensure => present }'

There's a file backup, or "bucket", functionality when you change/delete files.

You make a repository and can either distribute it or run it all from a server.

There is reporting.

There's a gepetto addon that helps you build a central repo.

A repo has (or should have) modules, which are basically functional groupings.  Modules have "code."  The code can be a class definition.  init.pp is the top/special one.   There's a modulepath setting for puppet.  Load the file, include the class, it runs all the stuff in the class.

It has "nodes" but he scoffs at them.  Put them in manifests/site.pp.  default, or hostname specific (can inherit default).   But you should use a different application, not puppet, to do this.

You have to be able to completely and correctly describe a task for puppet to do it.  This is a feature not a bug.

Puppet uses a client-server pull architecture.  You start a puppetmasterd on a server.  Use the SSL defaults, because that part is complicated and will hose you eventually otherwise.  Then start a puppetd on a client and it'll pull changes from the server.

This is disjointed.  Sorry about that.  The session is really just reading the slide equivalent of man pages while flipping back and forth to a command prompt to run basic examples.  I don't feel like this session gave enough of an intro to puppet; it was just "launch into the man pages and then run individual commands, many of which he tells you to never do."  I don't feel like I'm a lot more informed on puppet than when I started, which makes me sad.  I'm not sure what the target audience for this is.  If it's people totally new to puppet, like me, it starts in the weeds too much.  If it's for someone who has used puppet, it didn't seem to have many pro tips or design considerations; it was basic command execution.  Anyway, he ran out of time and flipped through the last ten slides in as many seconds.  I'm out!


Velocity 2009 – Death of a Web Server

The first workshop on Monday morning was called Death of a Web Server: A Crisis in Caching.  The presentation itself is downloadable from that link, so follow along!  I took a lot of notes though because much of this was coding and testing, not pure presentation.  (As with all these session writeups, the presenter or other attendees are welcome to chime in and correct me!)  I will italicize my thoughts to differentiate them from the presenter's.

It was given by Richard Campbell from Strangeloop Networks, which makes a hardware device that sits in front of and accelerates .NET sites.

Richard started by outing himself as a Microsoft guy.   He asks, "Who's developing on the Microsoft stack?"  Only one hand goes up out of the hundreds of people in the room.  "Well, this whole demo is in MS, so strap in."  Grumbling begins to either side of me.  I think that in the end, the talk has takeaway points useful to anyone, not just .NET folks, but it is a little off-putting to many.

"Scaling is about operations and development working hand in hand."   We'll hear this same refrain later from other folks, especially Facebook and Flickr.  If only developers weren't all dirty hippies... 🙂

He has a hardware setup with a batch of cute lil' AOpen boxes.  He has a four server farm in a rolly suitcase.  He starts up a load test machine, a web server, and a database; all IIS7, Visual Studio 2008.

We start with a MS reference app, a car classifieds site.  When you jack up the data set to about 10k rows - the developer says "it works fine on my machine."  However, once you deploy it, not so much.

He makes a load test using MS Visual Studio 2008.  Really?  Yep - you can record and playback.  That's a nice "for free" feature.  And it's pretty nice, not super basic; it can simulate browsers and connection speeds.  He likes to run two kinds of load tests, and neither should be short.

  • Step load for 3-4 hrs to test to failure
  • Soak test for 24 hrs to hunt for memory leaks

What does IIS have for built-in instrumentation?  Perfmon.  We also get the full perfmon experience, where every time he restarts the test he has to remove and re-add some metrics to get them to collect.  What metrics are the most important?

  • Requests/sec (ASP.NET applications) - your main metric of how much you're serving
  • Requests queued (ASP.NET)  - goes up when out of threads or garbage collecting
  • %processor time - to keep an eye on
  • #bytes in all heaps (.NET CLR memory) - also to keep an eye on

So we see pages served going down to 12/sec at 200 users in the step load, but the web server's fine - the bottleneck is the db.  But "fix the db" is often not feasible.  We run ANTS to find the slow queries, and narrow it to one stored proc.  But we assume we can't do anything about it.  So let's look at caching.

You can cache in your code - he shows us, using _cachelockObject/HttpContext.Current.Cache.Get, a built-in .NET cache class.

Say you have a 5s initial load but then caching makes subsequent hits fast.  But multiple first hits contend with each other, so you have to add cache locking.  There are subtle ways to do that right vs. wrong.  A common best-practice pattern he shows is check, lock, check.
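His example is .NET (the _cachelockObject around HttpContext.Current.Cache), but check-lock-check is language-neutral; here's the same idea sketched in Python, with hypothetical names:

```python
import threading

_cache = {}
_cache_lock = threading.Lock()

def get_cached(key, expensive_load):
    # First check: fast path, no lock taken if the value is cached.
    value = _cache.get(key)
    if value is None:
        with _cache_lock:
            # Second check: another thread may have filled the cache
            # while we were waiting for the lock.
            value = _cache.get(key)
            if value is None:
                value = expensive_load()   # the slow (e.g. 5s) query
                _cache[key] = value
    return value
```

Only one thread pays the 5-second load; everyone else either hits the unlocked fast path or blocks briefly and finds the value already there.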

We run the load test again.  "If you do not see benefit of a change you make, TAKE THE CODE BACK OUT," he notes.  Also, the harder part is the next steps, deciding how long to cache for, when to clear it.  And that's hard and error-prone; content change based, time based...

Now we are able to get the app up to 700 users, 300 req/sec, and the web server CPU is almost pegged but not quite (prolly out of load test capacity).  Half second page response time.  Nice!  But it turns out that users don't use this the way the load test does and they still say it's slow.  What's wrong?  We built code to the test.  Users are doing various things, not the one single (and easily cacheable) operation our test does.

You can take logs and run them through webtrace to generate sessions/scenarios.  But there's not quite enough info in the logs to reproduce the hits.  You have to craft the requests more after that.

Now we make a load test with variety of different data (data driven load test w/parameter variation), running the same kinds of searches customers are.  Whoops, suddenly the web server cpu is low and we see steady queued requests.  200 req/sec.  Give it some time - caches build up for 45 mins, heap memory grows till it gets garbage collected.

As a side note, he says "We love Dell 1950s, and one of those should do 50-100 req per sec."

How much memory "should" an app server consume for .NET?  Well, out of the gate, 4 GB RAM really = 3.3, then Windows and IIS want some...  In the end you're left with less than 1 GB of usable heap on a 32-bit box.  Once you get to a certain level (about 800 MB), garbage collection panics.  You can set stuff to disposable in a crisis but that still generates problems when your cache suddenly flushes.

  • 64 bit OS w/4 GB yields 1.3 GB usable heap
  • 64 bit OS w/8 GB, app in 32-bit mode yields 4 GB usable heap (best case)

So now what?  Instrumentation; we need more visibility. He adds a Dictionary object to log how many times a given cache object gets used.  Just increment a counter on the key.  You can then log it, make a Web page to dump the dict on demand, etc.  These all affect performance however.
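As a sketch of the idea (his code is .NET; this Counter version is mine), the instrumentation is just a per-key hit count:

```python
from collections import Counter

# Incremented on every cache access; dump cache_hits.most_common(20)
# on a debug page to see which entries actually get reused.
cache_hits = Counter()

def record_cache_use(key):
    cache_hits[key] += 1
```

As he warns, even something this cheap has a cost under load, so verify your throughput numbers don't shift once it's in.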

They had a problem with an app w/intermittent deadlocks, and turned on profiling - then there were no deadlocks because of observer effect.  "Don't turn it off!"  They altered the order of some things to change timing.

We run the instrumented version, and check stats to ensure that there's no major change from the instrumentation itself.  Looking at the cache page - the app is caching a lot of content that's not getting reused ever.  There are enough unique searches that they're messing with the cache.  Looking into the logs and content items to determine why this is, there's an advanced search that sets different price ranges etc.  You can do logic to try to exclude "uncachable" items from the cache.  This removes memory waste but doesn't make the app any faster.

We try a new cache approach.  .NET caching has various options - duration and priority.  Short duration caching can be a good approach.  You get the majority of the benefit - even 30s of caching for something getting hit several times a second is nice.  So we switch from 90 minute to 30 second cache expiry to get better (more controlled) memory consumption.  This is with a "flat" time window - now, how about a sliding window that resets each time the content is hit?  Well, you get longer caching but then you get the "content changed" invalidation issue.
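A flat (fixed) 30-second window might be sketched like so in Python; the names and TTL constant are illustrative, not the .NET cache's API:

```python
import time

_ttl_cache = {}   # key -> (value, expiry_timestamp)
TTL = 30.0        # seconds; a short TTL bounds both memory and staleness

def get(key, load, now=time.time):
    entry = _ttl_cache.get(key)
    if entry is not None and now() < entry[1]:
        return entry[0]           # still fresh
    value = load()                # expired or missing: reload
    # Flat window: expiry is fixed at store time and NOT refreshed on
    # each hit.  A sliding window would push the expiry forward here on
    # every read, trading longer caching for the invalidation problem.
    _ttl_cache[key] = (value, now() + TTL)
    return value
```

Even at 30 seconds, something hit several times a second gets nearly all the benefit of a long cache while keeping memory consumption predictable.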

He asks a Microsoft code-stunned room about what stacks they do use instead of .NET, if there's similar stuff there...  Speaking for ourselves, I know our programmers have custom implemented a cache like this in Java, and we also are looking at "front side" proxy caching.

Anyway, we still have our performance problem in the sample app.  Adding another Web server won't help, as the bottleneck is still the db.  Often our fixes create new other problems (like caching vs memory).  And here we end - a little anticlimactically.

Class questions/comments:
What about multiserver caching?  So far this is read-only, and not synced across servers.  The default .NET cache is not all that smart.  MS is working on a new library called, ironically, "velocity" that looks a lot like memcached and will do cross-server caching.

What about read/write caching?  You can do asynchronous cache swapping for some things but it's memory intensive.  Read-write caches are rarer - Oracle/Tangosol Coherence and Terracotta are the big boys there.

Root speed -  At some point you also have to address the core query; it can't take 10 seconds, or even caching can't save you.  Prepopulating the cache can help but you have to remember invalidations, cache clearing events, etc.

Four step APM process:

  1. Diagnosis is most challenging part of performance optimization
  2. Use facts - instrument your application to know exactly what's up
  3. Theorize probable cause then prove it
  4. Consider a variety of solutions

Peco has a bigger twelve-step more detailed APM process he should post about here sometime.

Another side note, sticky sessions suck...  Try not to use them ever.

What tools do people use?

  • Hand written log replayers
  • Spirent avalanche
  • wcat (MS tool, free)

I note that we use LoadRunner and a custom log replayer.  Sounds like everyone has to make custom log replayers, which is stupid; we've been telling every one of our suppliers in all related fields to build one.  One guy records with a proxy then replays with ec2 instances and a tool called "siege" (by Joe Dog).  There's more discussion on this point - everyone agrees we need someone to make this damn product.

"What about Ajax?"  Well, MS has a "fake" ajax that really does it all server side.  It makes for horrid performance.  Don't use that.  Real ajax keeps the user entertained but the server does more work overall.

An ending quip repeating an earlier point - you should not be proud of 5 req/sec - 50-100 should be possible with a dynamic application.

And that's the workshop.  A little microsofty but had some decent takeaways I thought.


The Velocity 2009 Conference Experience

Velocity 2009 is well underway and going great!  Here's my blow by blow of how it went down.

Peco, my erstwhile Bulgarian comrade, and I came in to San Jose  from Austin on Sunday.  We got situated at the fairly swank hotel, the Fairmont, and wandered out to find food.  There was some festival going on so the area was really hopping.  After a bit of wandering, we had a reasonably tasty dinner at Original Joe's.  Then we walked around the cool pedestrian part of downtown San Jose and ended up watching "Terminator:  Salvation" at a neighborhood movie theater.

We went down at 8  AM the next morning for registration.  We saw good ol' Steve Souders, and hooked up with a big crew from BazaarVoice, a local Austin startup that's doing well.  (P.S. I don't know who that hot brunette is in the lead image on their home page, but I can clearly tell that she wants me!)

This first day is an optional "workshop" day with a number of in depth 90 minute sessions.  There were two tracks, operations and performance.   Mostly I covered ops and Peco covered performance.  Next time - the first session!


Velocity 2009 – The Web Performance and Operations Conference

You're in luck!  Peco and I are attending Velocity 2009 and we'll be taking notes and blogging about the conference.  You can see what to expect by going back and reading my coverage of Velocity 2008!

As Web Admins, we love Velocity.  Usually, we have to bottom-feed at more generalized conferences looking for good relevant content on systems engineering.  This is the only conference that is targeted right at us, and has a dual focus of performance and operations.  The economy's hitting us hard this year and we could only do one conference - so this is the one we picked.

Look for full coverage on the sessions to come!


Thoughts on the TRISC 2009 Conference

This was my third consecutive year attending the TRISC Conference and it gets better and better every year.  This year, the location was outstanding, the presenters were top-notch, and the Keynotes were pretty good.  This was my first time actually presenting at the TRISC Conference and I thought they did an excellent job from the presenter point-of-view as well.  They kept the presentations on time, they had my notes all printed up and ready for attendees, and the A/V equipment worked well.  No complaints from me there.

My favorite Keynote speaker was far and away Johnny Long.  His talk was on "No Tech Hacking" and he is as entertaining as he is talented.  If you ever get a chance to see him speak, definitely do so.  Also, be sure to check out his website at IHackCharities.org.

My least favorite Keynote speaker was Ken Watson.  He spoke in a monotone, and the presentation, about the centers around the country that the government uses to team up with industry to prevent attacks on critical infrastructure, was pretty lame.  I guess I just expected more, and from talking with others it seems like I'm not alone.

My favorite presentation was Robert Hansen and Rob MacDougal's talk on "Assessing Your Web App Manually Without Hacking It".  It was a simple concept that everyone from managers to developers to IT guys can follow to get an idea as to how many vulnerabilities their application might contain.  RSnake!

My least favorite presentation was "The Importance of Log Management in Today's Insecure World" by Ricky Allen and Randy Holloway from ArcSight.  Too vendory, not technical enough, and kinda a lame presentation in general.  Maybe I'm just bitter because I heard that the other presentations that took place while I was in this session were really good.

This was the first year that TRISC had a Casino Night and it was awesome.  I played Texas Hold 'Em most of the night and took Nathan Sportsman's money and a bunch of Rob MacDougal's as well.  They had Roulette, Blackjack, and Craps tables there too; the goal was to start with $10,000 in chips, and for every $5,000 you had at the end of the night you got a raffle ticket.  I ended up with over $40,000 and 9 raffle tickets and won three different items.  Score.

Overall, TRISC 2009 was not the best conference that I've ever attended, but was certainly the best TRISC to date.  I was very impressed and am looking forward to next year.  FYI, all presentations from the conference are online and available for viewing here.


OWASP Google Hacking Project – OWASP AppSec NYC 2008

This presentation is by Christian Heinrich, the project leader for the OWASP "Google Hacking" project.  Presentation published on http://www.slideshare.net/cmlh  Dual licensed under OWASP License and AU Creative Commons 2.5.

OWASP Testing Guide v3 - Spiders/Robots/Crawlers

1. Automatically traverses hyperlinks

2. Recursively retrieves content referenced

Behavior governed by the robots exclusion protocol.  New method is <META NAME="Googlebot" CONTENT="nofollow">  Not supported by all Robots/Spiders/Crawlers.  Traditional method is robots.txt located in web root directory.  Regular expressions supported by minority only.  "User-agent: *" applies to all spiders/robots/crawlers or you can specify a specific robot name.  Can be intentionally ignored.  Not for httpd access control or digital rights management.
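For illustration, a minimal robots.txt showing both the wildcard and a robot-specific rule (the paths here are hypothetical):

```
# Applies to all spiders/robots/crawlers
User-agent: *
Disallow: /private/

# Rules for one specific robot
User-agent: Googlebot
Disallow: /search-results/
```

As noted above, this is purely advisory: a crawler can ignore it, so it's no substitute for httpd access control.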

Testing - Robots Exclusion Protocol

  1. Sign into Google Webmaster Tools
  2. On the dashboard, click the URL
  3. Click "Tools"
  4. Click "Analyze robots.txt"

Search Engine Discovery

Microsoft Remote Desktop Web Connection: intitle:Remote.Desktop.Web.Connection inurl: tsweb

VNC: "VNC Desktop" inurl:5800

Outlook Web Access: inurl:"exchange/logon.asp"

Outlook Web Access: intitle:"Microsoft Outlook Web Access - Logon"

Adobe Acrobat PDF: filetype:pdf

Google caught onto this and is now displaying a "We're sorry" message with certain searches.  To get around it, use different search queries that return overlapping results.

Google Advanced Search Operators: "site:" and "cache:".  There are two ways of using "site:".  Either as "site:www.google.com", where you get that specific subdomain's results, or "site:google.com", where you get all hostnames and subdomains.  Use "cache:www.owasp.org" to display an indexed web page from the Google cache.  There is also a link labeled "Cached" in search results which will do the same thing.

You can get updates of the latest relevant Google results (web, news, etc) using Google Alerts.

Download Indexed Cache

Google SOAP Search API.  Query limited to either 10 words or 2048 bytes.  One thousand search queries per day and limited to search results within 0-999.  Up to 10K possible results from 10 different search queries.

$Google_SOAP_Search_API -> doGoogleSearch( $key, $q, $start, $maxResults, $filter, $restricts, $safeSearch, $lr, $ie, $oe );

See presentation for response.

The proof of concept tool is "dic.pl", or "Download Indexed Cache", which downloads the search results.  Licensed under the Apache License 2.0.  The tool produces a URL and cachedSize response.

OWASP Google Hacking Project

Tools are built in Perl using the CPAN modules SOAP::Lite, Net::Google, and Perl::Critic.  The development environment is based on Eclipse with the EPIC plug-in.  The Subversion repository is at code.google.com.


Upcoming presentations at ToorCon X in San Diego, SecTor 2008 in Toronto, Canada, and RUXCON 2K8 in Sydney, Australia.

"TCP Input Text" Proof of Concept

"Speak English" Google Translate Workaround

Refactor and 3rd Project review of PoC Perl Code with public release at RUXCON 2K8 in November 2008.

Check in at code.google.com after RUXCON 2K8

4 hr "half day" training course Q1 2009


Day 1 Keynote – OWASP AppSec NYC 2008

I'm currently at the OWASP AppSec 2008 Conference in New York City and am listening to the keynote presentation shared by the board of OWASP.  Starting off is Jeff Williams, Chair of OWASP.  He talked about OWASP's mission, what we're currently working on, and offered the following suggestions on how to take OWASP into the future:

1) Prioritize

  • You can't "hack" code secure.
  • Use risk metrics.

2) Set a useful research agenda

  • Don't spend time searching for obscure vulnerabilities
  • Create tools that verify that software does the RIGHT thing instead of just looking for problems.

3) Turn application security from a black art to a science

  • OWASP in School program
  • Translating OWASP Top 10 and various books and projects into other languages.
  • Printing guides, books, and manuals for cost of printing.  Free downloads online.

4) We can enable secure coding

  • Breaking things is easy, try creating something secure and tell people how you did it.
  • Check out the OWASP Enterprise Security API Project
  • Increased visibility (software should provide info on who built it, what libraries they used, etc)

5) Make application security into a movement

  • Evangelize application security
  • Show people what an application security program looks like

Next up was Dave Wichers.  He talked about the OWASP goals of improving quality and support.  OWASP is publishing a "desk reference" guide on application security.  Community outreach is a huge focus of OWASP.  Over 100 chapters around the world.  Dave is the Conference Chair and helps to organize these conferences.  Let him know if you're interested in putting one on.

Tom Brennan, head of NY/NJ chapter and OWASP Board Member starts talking about over 10,000 members on the mailing list and over 120 chapters involved in OWASP effort.  Says you should get involved in OWASP!

Next up is Dinis Cruz, another board member, who says he comes up with all sorts of crazy ideas for OWASP.  Helped come up with the OWASP Grants idea when the Belgium chapter had extra money in the bank.  OWASP Spring of Code 2007 sponsored 26 projects at $125,000.  Summer of Code 2008 has 31 grants and they are focusing on quality with reviewers, project managers, etc.  OWASP has given out over $250,000 in grants since the Seasons of Code projects started.  Then he started talking about the OWASP EU Summit happening in Portugal in November 2008.  Nice hotel by the seafront.  Go to meet all of the guys who are influential in OWASP.  They're coming up with a bunch of training courses that are completely OWASP related and mostly done by our leaders.  Lots of working sessions to start discussing projects and set the AppSec agenda for 2009.  Five nights at a 5-star hotel for 300 Euros if you share a room or 600 Euros if you want a single.  It's a deal!  If you're at the conference, they're giving out free books.

Last up is Sebastian Deleersnyder who compares OWASP to Second Life.  A lot of people doing this as a second job, but it's also a virtual community.  Asks chapter leaders to stand up and everyone gives them a hand.  *pats self on the back*  End of keynote.