Web Admin Blog Real Web Admins. Real World Experience.

5 Mar 2010

Microsoft Azure for Dummies – or for Smarties?

What Is Microsoft Azure?

I'm going to attempt to explain Microsoft Azure in "normal Web person" language.  Like many of you, I am more familiar with Linux/open source type solutions, and like many of you, my first forays into cloud computing have been with Amazon Web Services.  It can often be hard for people not steeped in Redmondese to understand exactly what the heck they're talking about when Microsoft people try to explain their offerings.  (I remember a time some years ago I was trying to get a guy to explain some new Microsoft data access thing with the usual three letter acronym name.  I asked, "Is it a library?  A language?  A protocol?  A daemon?  Branding?  What exactly is this thing you're trying to get me to uptake?"  The reply was invariably "It's an innovative new way to access data!"  Sigh.  I never did get an answer and concluded "Never mind.")

Microsoft has released their new cloud offering, Azure.  Our company is a close Microsoft partner since we use a lot of their technologies in developing our company's desktop software products, so as "cloud guy" I've gotten some in-depth briefings and even went to PDC this year to learn more (some of my friends who have known me over the course of my 15 years of UNIX administration were horrified).  "Cloud computing" is an overloaded enough term that it's not highly descriptive, and it took a while to cut through the explanations to understand what Azure really is.  Let me break it down for you and explain the deal.

Point of Comparison: Amazon (IaaS)

In Amazon EC2, as hopefully everyone knows by now, you are basically given entire dynamically-provisioned, hourly-billed virtual machines that you load OSes on and install software and all that.  "Like servers, but somewhere out in the ether."  Those kinds of cloud offerings (e.g. Amazon, Rackspace, most of them really) are called Infrastructure as a Service (IaaS).  You're responsible for everything you normally would be, except for the data center work.  Azure is not an IaaS offering but still bears a lot of similarities to Amazon; I'll get into details later.

Point of Comparison: Google App Engine (PaaS)

Take Google's App Engine as another point of comparison.  There, you just upload your Python or Java application to their portal and "it runs on the Web."  You don't have access to the server or OS or disk or anything.  And it "magically" scales for you.  This approach is called Platform as a Service (PaaS).   They provide the full platform stack, you only provide the end application.  On the one hand, you don't have to mess with OS level stuff - if you are just a Java programmer, you don't have to know a single UNIX (or Windows) command to transition your app from "But it works in Eclipse!" to running on a Web server on the Internet.  On the other hand, that comes with a lot of limitations that the PaaS providers have to establish to make everything play together nicely.  One of our early App Engine experiences was sad - one of our developers wrote a Java app that used a free XML library to parse some XML.  Well, that library had functionality in it (that we weren't using) that could write XML to disk.  You can't write to disk in App Engine, so its response was to disallow the entire library.  The app didn't work and had to be heavily rewritten.  So it's pretty good for code that you are writing EVERY SINGLE LINE OF YOURSELF.  Azure isn't quite as restrictive as App Engine, but it has some of that flavor.

Azure's Model

Windows Azure falls between the two.  First of all, Azure is a real "hosted cloud" like Amazon Web Services, which is what most of us really think about when we think cloud computing; it's not one of these on-premises things that companies are branding as "cloud" just for kicks.  That's important to say because it seems like nowadays the larger the company, the more they are deliberately diluting the term "cloud" to stick their products under its aegis.  Microsoft isn't doing that; this is a "cloud offering" in the classical (where classical means 2008, I guess) sense.

However, in a number of important ways it's not like Amazon.  I'd definitely classify it as a PaaS offering.  You upload your code to "Roles" which are basically containers that run your application in a Windows 2008(ish) environment.  (There are two types - a "Web role" has a stripped down IIS provided on it, a "Worker role" doesn't - the only real difference between the two.)  You do not have raw OS access, and cannot do things like write to the registry.  But, it is less restrictive than App Engine.  You can bundle up other stuff to run in Azure - even run Java apps using Apache Tomcat.  You have to be able to install whatever you want to run "xcopy only" - in other words, no fancy installers, it needs to be something you could just copy the files to a Windows PC, without administrative privilege, and run a command from the command line and have it work.  Luckily, Tomcat/Java fits that description. They have helper packs to facilitate doing this with Tomcat, memcached, and Apache/PHP/MediaWiki.  At PDC they demoed Domino's Pizza running their Java order app on it and a WordPress blog running on it.  So it's not only for .NET programmers.  Managed code is easier to deploy, but you can deploy and run about anything that fits the "copy and run command line" model.

I find this approach a little ironic actually.  It's been a lot easier for us to get the Java and open source (well, the ones with Windows ports) parts of our infrastructure running on Azure than the Windows parts!  Everybody provides Windows stuff with an installer, of course, and you can't run installers on Azure.  Anyway, in its core computing model it's like Google App Engine - it's more flexible than that (good) but it doesn't do automatic scaling (bad).  If it did autoscaling I'd be willing to say "It's better than App Engine in every way."

In other ways, it's a lot like Amazon.  They offer a variety of storage options - blobs (like S3), tables (a non-relational key/attribute store, more like SimpleDB than MySQL), queues (like SQS), drives (like EBS).  They have an integral CDN.  They do hourly billing.  Pricing is pretty similar to Amazon - it's hard to totally equate apples to apples, but Azure compute is $0.12/hr and an Amazon small Windows image compute is $0.12/hr (Coincidence?  I think not.).  And you have to figure out scaling and provisioning yourself on Amazon too - or pay a lot of scratch to one of the provisioning companies like RightScale.

What's Unique and Different

Well, the largest thing, which I've already mentioned, is the PaaS approach.  If you need OS level access, you're out of luck; if you don't want to have to mess with OS management, you're in luck!  So to a first approximation, you can think of Azure as "like Amazon Web Services, but the compute uses more of a Google App Engine model."

But wait, there's more!

One of the biggest things that Azure brings to the table is that, using Visual Studio, you can run a local Azure "fabric" on your PC, which means you can develop, test, and run cloud apps locally without having to upload to the cloud and incur usage charges.  This is HUGE.  One of the biggest pains about programming for Amazon, for instance, is that if you want to exercise any of their APIs, you have to do it "up there."  Also, you can't move images back and forth between Amazon and on premise.  Now, there are efforts like EUCALYPTUS that try to overcome some of this problem but in the end you pretty much just have to throw in the towel and do all dev and test up in the cloud.  Amazon and Eclipse (and maybe Xen) - get together and make it happen!!!!

Here's something else interesting.  In a move that seems more like a decision from a typical cranky cult-of-personality open source project, they have decided that proper Web apps need to be asynchronous and message-driven, and by God that's what you're going to do.  Their load balancers won't do sticky sessions (only round robin) and time out all connections between all tiers after 60 seconds without exception.  If you need more than that, tough - rewrite your app to use a multi-tier message queue/event listener model.  Now on the one hand, it's hard for me to disagree with that - I've been sweating our developers, telling them that's the correct best-practice model for scalability on the Web.  But again you're faced with the "Well what if I'm using some preexisting software and that's not how it's architected?" problem.  This is the typical PaaS pattern of "it's great, if you're writing every line of code yourself."
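To make the "multi-tier message queue/event listener model" a bit more concrete, here's a minimal sketch in plain Python.  It is not Azure code and doesn't use any Azure API; the function names (`handle_request`, `worker`, `run_demo`) are made up for illustration, and an in-process `queue.Queue` stands in for whatever hosted queue you'd really use.  The idea is just that the front tier hands the slow work to a queue and returns a ticket right away, so no connection has to stay open anywhere near 60 seconds.

```python
import queue
import threading


def handle_request(jobs, payload):
    """Front tier: enqueue the work and return a ticket immediately."""
    ticket = f"job-{payload}"
    jobs.put((ticket, payload))
    return ticket


def worker(jobs, results):
    """Back tier: process jobs until a None sentinel arrives."""
    while True:
        item = jobs.get()
        if item is None:
            jobs.task_done()
            break
        ticket, payload = item
        results[ticket] = payload * payload  # stand-in for the slow work
        jobs.task_done()


def run_demo(payloads):
    """Wire one worker to one queue and push a few requests through it."""
    jobs = queue.Queue()
    results = {}
    thread = threading.Thread(target=worker, args=(jobs, results))
    thread.start()
    tickets = [handle_request(jobs, p) for p in payloads]
    jobs.put(None)   # tell the worker to stop once the queue drains
    jobs.join()      # wait until every queued item has been processed
    thread.join()
    return tickets, results
```

In a real deployment the client would poll (or get notified) with its ticket to pick up the result later, which is exactly the part that's painful to retrofit onto preexisting software.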

In many ways, Azure is meant to be very developer friendly.  In a lot of ways that's good.  As a system admin, however, I wince every time they go on about "You can deploy your app to Azure just by right clicking in Visual Studio!!!"  Of course, that's not how anyone with a responsibly controlled production environment would do it, but it certainly does make for fast easy adoption in development.   The curve for a developer who is "just" a C++/Java/.NET/whatever wrangler to get up and going on an IaaS solution like Amazon is pretty large comparatively; here, it's "go sign up for an account and then click to deploy from your IDE, and voila it's running on the Intertubes."  So it's a qualified good - it puts more pressure on you as an ops person to go get the developers to understand why they need to utilize your services.  (In a traditional server environment, they have to go through you to get their code deployed.)  Often, for good or ill, we use the release process as a touchstone to also engage developers on other aspects of their code that need to be systems engineered better.

Now, that's my view of the major differences.  I think the usual Azure sales pitch would say something different - I've forgotten two of their huge differentiators, their service bus and access control components.  They are branded under the name "AppFabric," which as usual is a name Microsoft is also using for something else completely different (a new true app server for Windows Server, including projects formerly code named Dublin and Velocity - think of it as a real WebLogic/WebSphere type app server plus memcache.)

Their service bus is an ESB.  As alluded to above, you're going to want to use it to do messaging.   You can also use Azure Queues, which is a little confusing because the ESB is also a message queue - I'm not clear on their intended differentiation really.  You can of course just load up an ESB yourself in any other IaaS cloud solution too, so if you really want one you could do e.g. Apache ServiceMix hosted on Amazon.  But, they are managing this one for you which is a plus.  You will need to use it to do many of the common things you'd want to do.

Their access control is a mess.  Sorry, Microsoft guys.  For the whole rest of the thing, I've managed to cut through the "Microsoft acronyms versus the rest of the world's terms and definitions" factor, but not here.   "You see, you use ACS's WIF STS to generate a SWT," says our Microsoft rep with a straight face.   They seem to be excited that it will use people's Windows Live IDs, so if you want people to have logins to your site and you don't want to manage any of that, it is probably nice.  It takes SAML tokens too, I think, though I'm not sure if the caveats around that end up equating to "Well, not really."  Anyway, their explanations have been incoherent so far and I'm not smelling anything I'm really interested in behind it.  But there's nothing to prevent you from just using LDAP and your own Internet SSO/federation solution.  I don't count this against Microsoft because no one else provides anything like this, so even if I ignore the Azure one it doesn't put it behind any other solution.

The Future

Microsoft has said they plan to add on some kind of VM/IaaS offering eventually because of the demand.  For us, the PaaS approach is a bit of a drawback - we want to do all kinds of things like "virus scan uploaded files," "run a good load balancer," "run an LDAP server", and other things that basically require more full OS access.  I think we may have an LDAP direction with the all-Java OpenDS, but it's a pain point in general.

I think a lot of their decisions that are a short term pain in the ass (no installs, no synchronous) are actually good in the long term.  If all developers knew how to develop async and did it by default, and if all software vendors, even Windows based ones, provided their product in a form that could just be "copy and run without admin privs" to install, the world would be a better place.  That's interesting in that "Sure it's hard to use now but it'll make the world better eventually" is usually heard from the other side of the aisle.

Conclusion

Azure's a pretty legit offering!  And I'm very impressed by their velocity.  I think it's fair to say that overall Azure isn't quite as good as Amazon except for specific use cases (you're writing it all in .NET by hand in Visual Studio) - but no one else is as good as Amazon either (believe me, I evaluated them) and Amazon has years of head start; Azure is brand new but already at about 80%! That puts them into the top 5 out of the gate.

Without an IaaS component, you still can't do everything under the sun in Azure.  But if you're not depending on much in the way of big third party software chunks, it's feasible; if you're doing .NET programming, it's very compelling.

Do note that I haven't focused too much on the attributes and limitations of cloud computing in general here - that's another topic - this article is meant to compare and contrast Azure to other cloud offerings so that people can understand its architecture.

I hope that was clear.  Feel free to ask questions in the comments and I'll try to clarify!

31 Oct 2008

Using Proxies to Secure Applications and More

I've been really surprised that for as long as I've been active with OWASP, I've never seen a proxy presentation.  After all, they are hugely beneficial in doing web application penetration testing and they're really not that difficult to use.  Take TamperData for example.  It's just a Firefox plugin, but it does header, cookie, GET, and POST manipulation just as well as WebScarab.  Or Google Ratproxy, which works in the background while you browse around QA'ing your web site and gives you a nice actionable report when you're done.  I decided it was time to educate my peers on the awesomeness of proxies.

This past Tuesday I presented to a crowd of about 35 people at the Austin OWASP Meeting.  The title of my presentation was "Using Proxies to Secure Applications and More".  Since so many people came up to me afterward telling me what a great presentation it was and how they learned something they can take back to the office, I decided (with a little insistence from Ernest) that it was worth putting up on SlideShare and posting to the Web Admin Blog.

The presentation starts off with a brief description of what a proxy is.  Then, I talk about the different types of proxies.  The bulk of the presentation is just me giving examples and demonstrating the various proxies: anonymizing proxies, reverse proxies, and intercepting proxies.  While my slides can't substitute for the actual demo, I did try to include in them which tool I used for each demo.  If you have any specific questions, please let me know.  All that said, here's the presentation.

24 Sep 2008

OWASP Google Hacking Project – OWASP AppSec NYC 2008

This presentation is by Christian Heinrich, the project leader for the OWASP "Google Hacking" project.  Presentation published on http://www.slideshare.net/cmlh  Dual licensed under OWASP License and AU Creative Commons 2.5.

OWASP Testing Guide v3 - Spiders/Robots/Crawlers

1. Automatically traverses hyperlinks

2. Recursively retrieves content referenced

Behavior is governed by the robots exclusion protocol.  The newer method is a META tag such as <META NAME="Googlebot" CONTENT="nofollow">, which is not supported by all robots/spiders/crawlers.  The traditional method is a robots.txt file located in the web root directory; regular expressions in it are supported by only a minority of crawlers.  "User-agent: *" applies to all spiders/robots/crawlers, or you can specify a particular robot by name.  The protocol can be intentionally ignored, so it is not a substitute for httpd access control or digital rights management.
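As a rough illustration of how a well-behaved crawler reads robots.txt (this is not from the presentation), here's a small sketch using Python's standard `urllib.robotparser`.  The sample robots.txt, the `ExampleBot` name, and the example.com URLs are all made up.  Note that it only tells you what a polite robot would do; nothing stops a client from skipping the check entirely, which is the point about it not being access control.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: everyone is kept out of /admin/,
# and a robot called "ExampleBot" is kept out of the whole site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/

User-agent: ExampleBot
Disallow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

For example, `is_allowed(ROBOTS_TXT, "SomeCrawler", "http://example.com/admin/login")` is refused under the "User-agent: *" rule, while the same crawler may fetch `http://example.com/index.html`.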

Testing - Robots Exclusion Protocol

  1. Sign into Google Webmaster Tools
  2. On the dashboard, click the URL
  3. Click "Tools"
  4. Click "Analyze robots.txt"

Search Engine Discovery

Microsoft Remote Desktop Web Connection: intitle:Remote.Desktop.Web.Connection inurl:tsweb

VNC: "VNC Desktop" inurl:5800

Outlook Web Access: inurl:"exchange/logon.asp"

Outlook Web Access: intitle:"Microsoft Outlook Web Access - Logon"

Adobe Acrobat PDF: filetype:pdf

Google caught on to this and now displays a "We're sorry" message for certain searches.  To get around it, use different search queries that return overlapping results.

Google Advanced Search Operators: "site:" and "cache:".  There are two ways of using "site:": either as "site:www.google.com", where you get that specific hostname's results, or as "site:google.com", where you get all hostnames and subdomains.  Use "cache:www.owasp.org" to display an indexed web page from the Google cache.  There is also a link labeled "Cached" in the search results which will do the same thing.
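If you want to script these searches, the operators are just part of the query string.  Here's a small sketch (mine, not from the presentation) that combines "site:" or "cache:" with other search terms and URL-encodes the result.  The `google_query_url` helper is a made-up name, and the `https://www.google.com/search?q=` endpoint is an assumption about the ordinary web search URL, not something the slides specify.

```python
from urllib.parse import urlencode


def google_query_url(terms="", site=None, cache=None):
    """Build a search URL that combines operators with free-text terms."""
    parts = []
    if cache:
        parts.append(f"cache:{cache}")
    if site:
        parts.append(f"site:{site}")
    if terms:
        parts.append(terms)
    # urlencode escapes the colons, quotes, and slashes in the operators.
    return "https://www.google.com/search?" + urlencode({"q": " ".join(parts)})
```

Restricting the Outlook Web Access query above to a single domain you are authorized to test would look like `google_query_url('inurl:"exchange/logon.asp"', site="example.com")`.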

You can get updates of the latest relevant Google results (web, news, etc) using Google Alerts.

Download Indexed Cache

Google SOAP Search API.  Query limited to either 10 words or 2048 bytes.  One thousand search queries per day and limited to search results within 0-999.  Up to 10K possible results from 10 different search queries.
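The arithmetic behind that 10K figure can be sketched as follows.  The word, byte, daily-call, and 0-999 limits come from the notes above; the figure of 10 results per call is my assumption (it is not stated in these notes), and I'm reading the "10 words or 2048 bytes" limit as both applying at once.  The function names are made up for illustration.

```python
MAX_QUERY_WORDS = 10     # from the notes above
MAX_QUERY_BYTES = 2048   # from the notes above
DAILY_CALLS = 1000       # "one thousand search queries per day"
MAX_START = 999          # results are limited to positions 0-999
RESULTS_PER_CALL = 10    # ASSUMPTION: not stated in the notes


def query_within_limits(q):
    """Check a query against the word and byte limits."""
    return (len(q.split()) <= MAX_QUERY_WORDS
            and len(q.encode("utf-8")) <= MAX_QUERY_BYTES)


def start_offsets():
    """Values of the start parameter needed to page through one query."""
    return list(range(0, MAX_START + 1, RESULTS_PER_CALL))


def max_results_per_day(distinct_queries):
    """Upper bound on results retrievable within one day's call quota."""
    calls_per_query = len(start_offsets())
    affordable = min(distinct_queries, DAILY_CALLS // calls_per_query)
    return affordable * calls_per_query * RESULTS_PER_CALL
```

Under those assumptions each distinct query costs 100 calls to page through its 1,000 results, so the daily quota covers 10 distinct queries, which is where the 10K ceiling comes from.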

$Google_SOAP_Search_API -> doGoogleSearch( $key, $q, $start, $maxResults, $filter, $restricts, $safeSearch, $lr, $ie, $oe );

See presentation for response.

Proof of concept tool is "dic.pl" or "Download Indexed Cache" that downloads the search results.  Licensed under the Apache License 2.0.  Tool produces a URL and cachedSize response.

OWASP Google Hacking Project

Tools are built in Perl using the CPAN modules SOAP::Lite, Net::Google, and Perl::Critic.  The development environment is based on Eclipse with the EPIC plug-in.  The Subversion repository is at code.google.com.

Roadmap

Upcoming presentations at ToorCon X in San Diego, SecTor 2008 in Toronto, Canada, and RUXCON 2K8 in Sydney, Australia.

"TCP Input Text" Proof of Concept

"Speak English" Google Translate Workaround

Refactor and 3rd Project review of PoC Perl Code with public release at RUXCON 2K8 in November 2008.

Check in at code.google.com after RUXCON 2K8

4 hr "half day" training course Q1 2009

24 Aug 2008

Two Simple Ways to Read Restricted Website Content

Have you ever used a search engine to try to find the solution to a problem?  Did that search bring you results from a site that then forced you to register in order to see the content?  This happened to me all of the time before I found two simple ways to display that content without having to register at all.

Let me begin by explaining the why before I tell you the how.  In order for a search engine to index a site's content, it needs to be able to see that content.  The webmasters of that site are eager to let the search engine see the content as they know it will drive additional visitors to their site.  The end result is that they have to find a way for the search engine to see the content, while at the same time obscuring it from the view of the average user.  Most of the time they do this by keying off of the browser's USER AGENT.  This creates a loophole for us to exploit: if Google is able to see the content, then so can we.  Here are my two tricks to see the restricted content:

22 Jul 2008

Google Ratproxy

If you are responsible for developing or maintaining a website and haven't checked out Ratproxy yet, you're missing out. Before I start spouting off about just how cool and useful this tool is, I suppose I should first tell you what a proxy is. In a nutshell, a proxy is an application that runs locally on your computer and intercepts requests and responses between your web browser and the web server. In almost all cases, the proxy has the ability to manipulate the conversation going on between the two. Things like modifying your cookies, changing POST and GET parameters, and finding hidden fields are made uber-easy with the assistance of a proxy.
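To give a feel for what "manipulating the conversation" means, here's a minimal sketch of just the rewriting step an intercepting proxy performs on a request before passing it along.  This is not how Ratproxy (or any particular tool) is implemented; the function names and the sample values are made up, and a real proxy also has to handle the sockets, headers, and TLS around this.

```python
from http.cookies import SimpleCookie
from urllib.parse import parse_qsl, urlencode


def rewrite_form_body(body, overrides):
    """Replace selected POST/GET parameters in a urlencoded body."""
    params = dict(parse_qsl(body, keep_blank_values=True))
    params.update(overrides)
    return urlencode(params)


def rewrite_cookie_header(header, overrides):
    """Replace selected cookie values in a Cookie request header."""
    jar = SimpleCookie()
    jar.load(header)
    values = {name: morsel.value for name, morsel in jar.items()}
    values.update(overrides)
    return "; ".join(f"{name}={value}" for name, value in values.items())
```

So a request body of `item=42&price=19.99` can leave the proxy as `item=42&price=0.01`, and a `role=user` cookie can become `role=admin`.  If your application trusts either of those values, a proxy is the quickest way to find out.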