Even though the Drobo is supposed to be a pretty rock-solid tool for backing up your files, there are still plenty of reasons to keep a copy of those files elsewhere, just in case. For example, what would happen if there were a fire and your Drobo were damaged? Are you OK with losing everything? I've even heard of the rare case where the Drobo drives get out of sync and a complete reformat is necessary, causing you to lose everything. To protect against this, it is a good idea to install the Crashplan Drobo app and ensure that a copy of your data is recoverable, even if the worst-case scenario happens to your Drobo.
If you do as I mention above, chances are that things will work well for a while and then suddenly, one day, you will find that Crashplan is no longer running on your Drobo. Despite multiple attempts to start it back up, you will inevitably find yourself staring at a message saying either "crashplan is enabled and stopped" or "crashplan is disabled and stopped" and will be clueless, as I was, about how to fix it. The good news is that after months of struggling with this, I finally came across a post on the DroboSpace forums from the guy who packages the Crashplan application for Drobo. It was a bit cryptic at first, but eventually I was able to interpret what he was saying, and I wanted to share it with everyone in more layman's terms.
The underlying issue here is that Crashplan is configured to automatically upgrade itself on the Drobo. When this happens, it downloads the replacement files and goes to run the upgrade script. Unfortunately, the Crashplan team doesn't write the upgrade script to work in the BusyBox environment (the one that runs on your Drobo), so the script breaks. By tweaking the script ever so slightly, you can get it to run the upgrade, and Crashplan will once again start up on your Drobo. Here are the steps to do it:
- SSH into your Drobo with the command "ssh -L 4201:localhost:4243 root@[your Drobo IP]"
- Take a look at the /tmp/DroboApps/crashplan/log.txt file and you'll probably see a message saying something like "Error: Could not find or load main class com.backup42.service.CPService"
- Go to the crashplan upgrade directory with the command "cd /mnt/DroboFS/Shares/DroboApps/crashplan/app/upgrade"
- Here you will see a directory with a name that looks like a random value, such as "1388642400364.1415587076262". I believe a new one of these directories is created for each version you are upgrading to. Change to that directory using the command "cd 1388642400364.1415587076262", substituting whatever directory name you see.
- Edit the upgrade.sh script inside that directory. You want to change the "rm -fv" line to "rm -f" and the "mv -fv" line to "mv -f". You will also want to search for the two lines that start with "/bin/ps -ef" and change them to use "/bin/ps w" instead. Save the file.
- Change the permissions on the upgrade.sh script to make it executable with the command "chmod +x upgrade.sh".
- Run the upgrade.sh script with the command "./upgrade.sh".
When the script completes, you should be back at the terminal prompt. From here, you can go back to the /mnt/DroboFS/Shares/DroboApps/crashplan directory and try starting Crashplan using the command "./service.sh start". Check that it is "enabled and running" by running the command "./service.sh status" to get the status. You may have to run through steps 4-7 multiple times based on how many upgrades back you are, but when all is said and done, you should be back up and running with Crashplan on your Drobo. Good luck!
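If you'd rather not make those edits by hand in vi, the same changes can be scripted with sed. This is just a sketch of the edits described in the steps above: it assumes your Drobo's BusyBox sed supports the -i flag, and the helper function name is mine, not part of the Crashplan package.

```shell
# Apply the three upgrade.sh fixes from the steps above using sed.
# BusyBox rm/mv don't support -v, and BusyBox ps wants "w" instead of "-ef".
fix_upgrade_script() {
    script="$1"
    sed -i 's/rm -fv/rm -f/g' "$script"
    sed -i 's/mv -fv/mv -f/g' "$script"
    sed -i 's|/bin/ps -ef|/bin/ps w|g' "$script"
    chmod +x "$script"   # make it executable, as in the steps above
}

# Example usage (substitute your own upgrade directory name):
# fix_upgrade_script /mnt/DroboFS/Shares/DroboApps/crashplan/app/upgrade/1388642400364.1415587076262/upgrade.sh
```

You would still run ./upgrade.sh yourself afterward, once for each pending upgrade directory.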
I absolutely love my job, and one of the coolest things about what I do is getting to do proof-of-concepts with bleeding-edge technology. I feel very privileged that many companies out there respect me enough to provide me with these opportunities, and I feel that engaging on this level makes me a better security practitioner because I routinely have my finger on the pulse of the latest and greatest tools out there. The problem I run into, however, is when vendors present me with "enterprise ready" tools that are clearly not enterprise ready. Maybe it's a cool concept, a working prototype, or even a functional system. The problem is that "enterprise ready" assumes so much more than just a product that does some stuff as advertised. To me, at least, it assumes a product that can be easily transitioned to the company's IT team for long-term support of the platform. Here are some signs to look out for that will tell you whether the tool is truly ready for the big show:
- Installation Process: This one could honestly go either way. Personally, I prefer to have a product that I can install and configure myself. I cringe every time I hear a vendor mention to me that professional services are involved in an installation. I get it, sometimes a tool is so highly customized to your environment that you need help, but the majority of the products I use on a daily basis aren't that way. If installing a product requires a day of professional services time, then this should probably be your first signal to at least start looking out for the following additional signs.
- Initialization Script: I honestly feel a bit silly even having to mention this as I would assume this to be a standard part of any enterprise product, but it's not. If I have to poke around in the installation directory looking for the right script to run to start or stop your product, then it's not enterprise ready. Even worse, if it's a more complex product that requires starting multiple pieces and you don't have a single init script to handle the startup and shutdown in the proper order, then your product is not enterprise ready. If you're trying to sell me something to make my life as a security professional easier, then I should spend my time using your tool instead of figuring out how to start and stop it.
- Release Notifications: If I buy a product from you and I'm paying you for support, then I'm typically doing so with the intention of moving to the next version once it is released. Maybe it's because there are bugs that need to be fixed or because there is new functionality, but whatever the reason, I want to know when that version becomes available. I'll talk a bit more about the upgrade process itself in the next bullet, but if the company does not have a way to notify you when a new release is available, be wary.
- Defined Upgrade Process: Have you ever used a tool that you thought was completely awesome until the first time an upgrade rolled around? They tell you to copy these files over and it breaks. Now, run this script and it fails. You engage support, spend hours on the phone with them, and then a week later they offer a WebEx where a support person will take care of the upgrade for you. I had to ditch a really interesting tool a while back for this very reason, and I'm currently dealing with another one where every upgrade requires a support person to come onsite. It's a completely ineffective use of both my time and theirs. When I designed SimpleRisk, one of the first things I considered was how to make it as simple as possible for a remote person to upgrade the tool without assistance. I've at least got it down to copying some files and running a script, which anyone can do. Even better are the companies where upgrading is the click of a button. Better still are the companies that just automatically do the upgrade for you. In any case, be wary of any upgrade process that is not well-defined.
- Backup Plan: This may not apply to all products or all scenarios, but it's a good idea when evaluating a product to ask yourself how you will back up the data and recover it if a disaster ever strikes. If the answer is "We'd just wipe and reinstall", then cool, but if the answer is "F*ck, I don't know", it may be worth having that discussion with the vendor.
- Monitoring: Nothing bothers me more than when I'm all excited to use my shiny new toy and when I go to log in it's down. In reality, I should know it's down when it happens because there's a high likelihood that the tool isn't doing what it's supposed to if it's not running. Ask your vendor what you should be monitoring in order to ensure that the tool is functioning properly. If they don't have a good answer for you, be wary.
- Product Roadmap: When you purchase a product, you purchase it not only for what it's capable of doing for you today, but also for the opportunities it will provide you with tomorrow. Ask the vendor about their product roadmap to see if it's in line with your vision of how you intend to use the product. Are there features that you can use down the line? More importantly, do they have plans to continue to invest in the platform they are selling you, or is it just major bug fixes at this point while they move on to something else? If the vendor can't give you a straight answer to this question, then you may have problems.
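Going back to the initialization script sign for a moment, here's a minimal sketch of the kind of single entry point I mean. The component names and paths are hypothetical, not from any particular product; the point is simply that one command handles the whole product in the correct order.

```shell
# Hypothetical single entry point for a product with two components,
# "db" and "web", that must be started and stopped in the correct order.
product_ctl() {
    case "$1" in
        start)
            echo "starting db"    # e.g. would run /opt/product/db/start.sh
            echo "starting web"   # e.g. would run /opt/product/web/start.sh
            ;;
        stop)
            # shut down in reverse order of startup
            echo "stopping web"
            echo "stopping db"
            ;;
        *)
            echo "usage: product_ctl {start|stop}"
            return 1
            ;;
    esac
}
```

If a vendor can't ship even this much, you'll end up writing it yourself.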
Don't get me wrong. There are plenty of tools out there that fail one or more of these signs and that doesn't mean that you should completely avoid them, but you shouldn't expect to pay a premium for them either. Hopefully the vendor is being honest with themselves and labeling it as "Beta" while they work to iron these things out. If not, you should be honest with them about your willingness to accept a product that is not "enterprise ready". Perhaps you're willing to accept a little bit of pain for a smaller price tag. Maybe you want to be able to brag to your peers that you were the first to have that product hotness. Whatever the reason, just make sure that you are aware of what you're getting into up front.
A couple of years ago I decided, along with support from my management, that Enterprise Risk Management would become a focal point for my Information Security Program. I was convinced that framing vulnerabilities in the form of risks was essential to giving management visibility into issues they currently didn't know existed and to give our staff the comfort of knowing that the issues that caused them to lose sleep at night were now being considered for mitigation by management. I couldn't have been more right.
I began by collecting the risks submitted by each team in Excel spreadsheets and Word documents. They had all of the pertinent information like a subject, owner, risk assessment, etc., but very quickly I became a victim of my own success. Before I knew it, I had more risks than I could efficiently track in this format. First off, it was extremely cumbersome to try to manually maintain the risk index in Excel. While Excel is good at calculating formulas, it sucks at maintaining links to external documents. It can be done, but it requires quite a bit of manual effort. Second, maintaining your risk details in Word documents is something best reserved for your worst enemies. They are difficult to update, difficult to track updates with, difficult to search and, well, just plain difficult. I thought to myself that there had to be a better way, yet this is what the unfortunate majority out there are stuck with today.
After some research, it turns out that many years back, my company had another security professional who was interested in Enterprise Risk Management. Apparently, they had come to similar conclusions as I did with the Word documents and Excel spreadsheets, but they were able to get some internal development time to create a Lotus Notes based risk management database. It was everything that I needed, or so I thought, so I started to manually enter all of my new risks into this old risk management database. At first, things seemed to be working well. I had some different views into my data that would allow me to see way more information than I could before. I also had the ability for management of our various teams to be able to see their risks without involving me. It was much better, but soon I began to realize the limitations of this approach. The database itself was rigid. Changes required me to go through another internal team for resources and it often took a long time to make them. Also, any updates that were made didn't modify the current risks, only the ones submitted after that point. Once, I found myself opening and re-saving hundreds of risks just because I decided to change my risk calculation formula slightly. I began looking again for another way.
Soon, my new round of research brought me to a special set of tools called Governance, Risk, and Compliance, or GRC for short. There are a number of such tools out there from well-respected companies such as EMC Archer and CA. They looked completely awesome and seemed to solve all of my problems with many more features to spare, so I started to get some SWAG quotes from a few of the vendors. Lo and behold, these tools carry a price tag of $100k to half a million dollars and beyond. A request for budget for one of these tools was dismissed immediately, with management literally laughing at my suggestion. OK, so maybe it was on me, right? Maybe I didn't do a good enough job of selling the tool? Maybe I didn't engage the right stakeholders to back my request? I guess you could call me a glutton for punishment, but I decided to keep trying. This time I gathered people I thought would be interested in risk from all different areas of our business for a demo of one of the tools: Trade Compliance, Health and Safety, Facilities, Legal, and many more. They watched the presentation, asked some fantastic questions, and ultimately left that meeting saying that they thought a GRC solution was a fantastic idea. That was until I mentioned the price tag. If it wasn't going to happen even with the budget split between half a dozen different teams, then I knew it simply wasn't going to happen.
As I began to think about the situation that I was in, I realized that I wasn't alone in all this. I talked with friends at various state agencies, friends at risk consultancies, and friends at companies large and small. They had gone through the same trials and tribulations that I had and fared no better for the most part. Having spent the better part of the last decade coding random applications and websites in PHP and MySQL, I decided that there may be something that I could do about it. I would go home from work and start coding until the wee hours of the morning. I would wake up early on my weekends and start coding again until the family awoke. After several weeks of this, I had a working prototype for a new risk management system based on some simplifications of the NIST 800-30 risk management framework and running on my LAMP (Linux Apache MySQL PHP) stack. SimpleRisk was born.
At the time of this writing, I have released seven official versions of SimpleRisk since March of this year. It has come a long way since then, but still holds true to its roots. SimpleRisk is free and open source. The methodology was designed to be as simple as possible, hence the name. A five-step process walks you through the basics of risk management:
- Submit your risks
- Plan your mitigations
- Perform management reviews
- Prioritize for project planning
- Review regularly
It has every basic feature required of an enterprise risk management system, and I'm adding new ones all the time. It has five different ways to weight classic risk calculations (i.e., likelihood and impact) and can perform CVSS scoring as well. It has its own built-in authentication system, but I've built an extra module to do LDAP authentication that I'm giving away to anyone who donates $500 or more to the cause. It also has a half-dozen different ways to report on the risks, and many more reports should be complete soon. You can check out the demo (minus the Administrator interface) using the username "user" and password "user" at http://demo.simplerisk.org. Or, if you're ready to dive right in, you can obtain the download package for free at http://www.simplerisk.org.
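For the curious, the "classic" risk calculation mentioned above boils down to likelihood times impact. Here is a toy illustration of that general formula only (not SimpleRisk's actual code), assuming both inputs are on a 1-5 scale:

```shell
# Classic risk score: likelihood x impact, each rated 1-5,
# yielding a score from 1 (negligible) to 25 (critical).
risk_score() {
    likelihood="$1"
    impact="$2"
    echo $(( likelihood * impact ))
}

risk_score 4 5   # likely and severe -> 20
```

The different weighting schemes in a real tool mostly amount to adjusting how those two inputs are scaled or combined.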
In order to make your foray into SimpleRisk as simple as possible, I've created a SimpleRisk LAMP Installation Guide that you can use to have the tool up and running in about 30-60 minutes. And if all else fails and that proves too difficult or time consuming, then you should make your way to http://www.hostedrisk.com where for a fraction of what it would cost to buy a GRC solution, you will have your own dedicated SimpleRisk instance, running on hardware dedicated to you, built with security in mind, including extra modules not part of the standard distribution, and you'll never have to worry about installing or upgrading risk management software ever again. Hopefully you won't ever need this, but the option is always there in case you do.
My frustrations with the lack of efficient and cost-effective risk management tools led me to create one of my own. My hope is that by making SimpleRisk free and open source, it will benefit the rest of the security community as much as it has already benefited me. If you have any questions or requests for features that you would like to see included in the tool, I'm always here to help. SimpleRisk is simple enterprise risk management for the masses.
Let's say that you go to the same restaurant at least once a week for an entire year. The staff is always friendly, the menu always has something that sounds appealing, and the food is always good enough to keep you coming back for more. The only real drawback is that it usually takes a solid half-hour to get your food, but you've learned to find something else to do while you're waiting because it's always been worth the wait. Today you go into the same restaurant, but now the staff goes out of their way to serve you, the menu has twice as much selection as before, the food is literally the best thing you've ever tasted, and it was on your table, just the way you like it, within 30 seconds of placing your order. This is my initial impression of the newly released version of 21CT's LYNXeon software (version 2.29).
I'll be honest. Before we upgraded to the new version, I had mixed feelings. On one hand, I loved the data that the LYNXeon platform was giving me. The ability to comb through NetFlow data and find potentially malicious patterns in it was unlike any other security tool I've experienced. On the other hand, the queries sometimes ran for half an hour or more before I had any results to analyze. I learned to save my queries for when I knew my computer would be sitting idle for a while. It was a burden I was willing to undertake for the results, but a burden nonetheless. We upgraded to LYNXeon 2.29 less than a week ago, but already I can tell that this is a huge leap in the right direction for 21CT's flagship network pattern analysis software. Those same queries that used to take 30 minutes now take 30 seconds or less to complete. The reason is a massive overhaul of the database layer of the platform. By switching to a grid-based, column-oriented database structure for storing and querying data, the product was transformed from a pack mule into a thoroughbred.
Enhanced performance wasn't the only feature that found its way into the 2.29 release. They also refactored the way LYNXeon consumes data. While the old platform did a fairly good job of consuming NetFlow data, adding other data sources to your analytics was a challenge to say the least, usually requiring custom integration work to make it happen. The new platform has added the concept of a connector, with new data types and a framework around how to ingest these different types of data. It may still require some assistance from support to consume data types other than NetFlow, but it's nowhere near the level of effort it was before the upgrade. We were up and running with the new version of LYNXeon, consuming NetFlow, IPS alerts, and alerts from our FireEye malware prevention system, in a few hours. The system is capable of adding DNS queries, HTTP queries, and so much more. What this amounts to is that LYNXeon is now a flexible platform that allows you to consume data from many different security tools and then visualize and correlate them in one place. Kinda like a SIEM, but actually useful.
As with any tool, I'm sure that LYNXeon 2.29 won't be without its share of bugs, but overall the new platform is a huge improvement over the old one, and with what I've seen so far, I gotta say that I'm impressed. 21CT is undoubtedly moving in the right direction, and I'm excited to see what these guys do with the platform going forward. That's my first impression of the 21CT LYNXeon 2.29 release.
I recently had the opportunity to play with a data analytics platform called LYNXeon by a local company (Austin, TX) called 21CT. The LYNXeon tool is billed as a "Big Data Analytics" tool that can assist you in finding answers among the flood of data that comes from your network and security devices and it does a fantastic job of doing just that. What follows are some of my experiences in using this platform and some of the reasons that I think companies can benefit from the visualizations which it provides.
Where I work, data on security events is in silos all over the place. First, there's the various security event notification systems that my team owns. This consists primarily of our IPS system and our malware prevention system. Next, there are our anti-virus and end-point management systems which are owned by our desktop security team. There's also event and application logs from our various data center systems which are owned by various teams. Lastly, there's our network team who owns the firewalls, the routers, the switches, and the wireless access points. As you can imagine, when trying to reconstruct what happened as part of a security event, the data from each of these systems can play a significant role. Even more important is your ability to correlate the data across these siloed systems to get the complete picture. This is where log management typically comes to play.
Don't get me wrong. I think that log management is great when it comes to correlating the siloed data, but what if you don't know what you're looking for? How do you find a problem that you don't know exists? Enter the LYNXeon platform.
The base of the LYNXeon platform is flow data obtained from your various network devices. Regardless of whether you use Juniper JFlow, Cisco NetFlow, or one of the many other flow data options, knowing what data is going from one place to another is crucial to understanding your network and any events that take place on it. Flow data consists of the following:
- Source IP address
- Destination IP address
- IP protocol
- Source port
- Destination port
- IP type of service
Flow data can also contain information about the size of the data on your network.
The default configuration of LYNXeon allows you to visually (and textually) analyze this flow data for issues, which is immediately useful. LYNXeon Analyst Studio comes with a bunch of pre-canned reports that allow you to quickly sort through your flow data for interesting patterns. For example, once a system has been compromised, the attacker's next step is often data exfiltration. They want to get as much information out of the company as possible before they are identified and their access is squashed. LYNXeon provides you with a report to identify the top destinations in terms of data size for outbound connections. Some other extremely useful reporting that you can do with basic flow data in LYNXeon:
- Identify DNS queries to non-corporate DNS servers.
- Identify the use of protocols that are explicitly banned by corporate policy (P2P? IM?).
- Find inbound connection attempts from hostile countries.
- Find outbound connections via internal protocols (SNMP?).
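To give a flavor of what these reports boil down to, here's a toy version of the first one, done outside of LYNXeon entirely. It assumes flow records exported as CSV in the field order listed earlier (source IP, destination IP, protocol, source port, destination port, type of service) and a corporate resolver at 10.0.0.53; both of those details are mine, purely for illustration.

```shell
# Sample flow records in the field order listed above:
# src_ip,dst_ip,proto,src_port,dst_port,tos
cat > flows.csv <<'EOF'
10.1.1.5,10.0.0.53,17,49152,53,0
10.1.1.6,8.8.8.8,17,49200,53,0
EOF

# Flag DNS queries (destination port 53) going anywhere other than
# the corporate DNS server at 10.0.0.53.
awk -F, '$5 == 53 && $2 != "10.0.0.53"' flows.csv
# -> 10.1.1.6,8.8.8.8,17,49200,53,0
```

LYNXeon's value is doing this kind of filtering at scale, across billions of records, and then visualizing the results, but the underlying question each report asks is this simple.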
It's not currently part of the default configuration of LYNXeon, but they have some very smart guys working there who can provide services around importing pretty much any data type you can think of into the visualizations as well. Think about the power of combining the data of what is talking to what along with information about anti-virus alerts, malware alerts, intrusion alerts, and so on. Now, not only do you know that there was an alert in your IPS system, but you can track every system that target talked with after the fact. Did it begin scanning the network for other hosts to compromise? Did it make a call back out to China? These questions and more can be answered with the visual correlation of events through the LYNXeon platform. This is something that I have never seen a SIEM or other log management company be able to accomplish.
LYNXeon probably isn't for everybody. While the interface itself is quite easy to use, it still requires a skilled security professional at the console to be able to analyze the data that is rendered. And while the built-in analytics help tremendously in finding the proverbial "needle in the haystack", it still takes a trained person to be able to interpret the results. But if your company has the expertise and the time to go about proactively finding problems, it is definitely worth looking into both from a network troubleshooting (something I really didn't cover) and security event management perspective.
I had a meeting yesterday with a vendor who sells a SaaS solution for binary application vulnerability testing. They tell a very interesting story of a world where dynamic testing ("black box") takes place alongside static testing ("white box") to give you a full picture of your application security posture. They even combine the results with some e-Learning aspects so that developers can research the vulnerabilities in the same place they go to find them. In concept, this sounds fantastic, but I quickly turned into a skeptic and as I dug deeper into the details I'm not sure I like what I found.
I wanted to make sure I fully understood what was going on under the hood here so I started asking questions about the static testing and how it works. They've got a nice looking portal where you name your application, give it a version, assign it to a group of developers, and point it to your compiled code (WAR, EAR, JAR, etc). Once you upload your binaries, their system basically runs a disassembler on it to get it into assembly code. It's then at this level that they start looking for vulnerabilities. They said that this process takes about 3 days initially and then maybe 2 days after the first time because they are able to re-use some data about your application. Once complete, they say they are able to provide you a report detailing your vulnerabilities and how to fix them.
The thing that immediately struck me as worth noting here was the 2-3 day turnaround. This means that our developers would need to wait a fairly substantial amount of time before getting any feedback on the vulnerability status of their code. In a world full of Agile development, 2-3 days is a lifetime. Compare that to static source code testing where you get actionable results at compile time. The edge here definitely goes to source code testing as I believe most people would prefer the near-instant gratification.
The next thing worth noting was that they are taking binary files and disassembling them in order to find vulnerabilities. This leads to one major issue: how can you determine with any accuracy the line number of a particular vulnerability written in, let's say, Java from assembly code generated by disassembling the binaries? By default, it's simply not possible. This vendor claimed that they can do so by adding in some debug strings at compile time, but even then I'd contend that you're not going to get much. I'm guessing they have some heuristics that can tell which function generated a set of assembly code, but I'm extremely skeptical that they can do anything with variable names, custom code functions, etc. I've seen some source code scanners, on the other hand, that not only tell you what line of code is affected, but are able to give you an entire list of parameters that are consequently affected by that vulnerability. The edge here definitely goes to source code testing.
The main benefit that I can see with binary testing versus source code testing is that we can test code that we didn't write. Things like APIs, third-party applications, open source, etc. are all things that we now have visibility into. The only problem is that while we can now see the vulnerabilities in this software, they are unfortunately all things that we can't directly influence change in, unless we want to send our developers off to work on somebody else's software. I'd argue that scanning for vulnerabilities in that type of code is their responsibility, not ours. Granted, it'd be nice to have validation that there aren't vulnerabilities there that we're exposing ourselves to by adopting it, but in all honesty, are we really going to take the time to scan somebody else's work? Probably not. The edge here goes to binary testing, with the caveat that it's in something I frankly don't care as much about.
This isn't the complete list of pros and cons by any means. It's just me voicing in writing some concerns that I had about the technology while talking to this particular vendor. In my opinion, the benefits of doing source code testing far outweigh any benefits that we could get from testing compiled binary files. What do you think about the benefits of one versus the other? I'd certainly love for someone to try to change my mind here and show me where the real value lies in binary testing.
A year ago I wrote about Oracle's plan to combine BEA WebLogic and OAS. A long time went by before any more information appeared, so we met with our Oracle reps last week to figure out what the deal is. The answer wasn't much clearer than it was way back last year. They certainly want some kind of money to "upgrade," but the plan seems poorly thought through.
OAS came in various versions - Java, Standard, Standard One, Enterprise, and then the SOA Suite versions. The new BEA, now "Fusion Middleware 11g" comes in different versions as well.
- WLS Standard
- WLS Enterprise - adds clustering, costs double
- WLS Suite - adds Coherence, Enterprise Manager, and JRockit realtime, costs quadruple
But they can't tell us what OAS product maps to what FMW version.
There is also an oddly stripped-down "Basic" edition which is noted as being a free upgrade from OAS SE, but it strips out a lot of JMS and WS functionality; there's an entire slide of stuff that gets removed, and it's hard to say whether this would be feasible for us.
As for SOA Suite, "We totally just don't know."
Come on Oracle, you've had a year to get this put together. It's pretty simple, there's not all that many older and newer products. I suspect they're being vague so they can feel out how much $$ they can get out of people for the upgrade. Hate to break it to you guys - the answer is $0. We didn't pay for OAS upgrades before this, we just paid you the generous 22% a year maintenance that got you your 51% profit margin this year. If you're retiring OAS for BEA in all but name, we expect to get the equivalent functionality for our continued 22%.
Oracle has two (well, three) clear to-dos:
1. Figure out what BEA product bundles give functionality equivalent to old OAS bundles
2. Give those to support-paying customers
3. Profit. You're making plenty without trying to upcharge customers. Don't try it.
I've had a couple of discussions lately about customized Apache error pages that prompted me to do a little research. What I've come up with is somewhat interesting, so I thought I'd share it with everyone. First, it is not technically possible to tell Apache to serve up a different error page for image content than for HTML or PHP content, since the only directive Apache accepts for this is of the "ErrorDocument error-code document" form. That said, if you allow .htaccess overrides on a particular directory, then you can specify your ErrorDocument directive in there as well, overriding the default error handling specified in the httpd.conf file. An example:
In my httpd.conf file I have all 404's going to errorpage.cgi with the following line:
ErrorDocument 404 /cgi-bin/errorpage.cgi
I'm a good little code monkey and put all of my images in an /images directory under the DocumentRoot. By default, if I were to hit a non-existent image in that directory, I would get the default error message defined in the httpd.conf file. If that image were referenced in an HTML page that I hit, I would now download the HTML page plus the errorpage.cgi page for the bad image reference, introducing a whole page's worth of additional overhead.
But since I was a good code monkey and put all of my images in a /images directory, the fix for this is really simple. I create a .htaccess file inside of my /images directory and add the following line to it:
ErrorDocument 404 "404 - Image does not exist <-- Note: No end quote is intentional
Now, if I hit http://www.mysite.com/badpage.html I get the errorpage.cgi page, but if I hit http://www.mysite.com/images/badimage.jpg I get a short and sweet message saying "404 - Image does not exist". I haven't tested this yet to see how it works when you are using something like mod_oc4j to send certain URLs to an application server, but it's possible that this could work there too if Apache checks for existing static URLs before passing requests to the app server. Further testing could be useful there.
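Putting the pieces together, the configuration sketch looks like this (the paths come from my example above, except /path/to/documentroot, which is a placeholder for your actual DocumentRoot; note that the directory must allow FileInfo overrides for the .htaccess ErrorDocument to take effect):

```apache
# httpd.conf - site-wide fallback: send all 404s to the CGI error page
ErrorDocument 404 /cgi-bin/errorpage.cgi

# Permit .htaccess ErrorDocument overrides in the images directory
<Directory "/path/to/documentroot/images">
    AllowOverride FileInfo
</Directory>

# /images/.htaccess - short text message for missing images only.
# A leading quote with no closing quote tells Apache this is literal
# text to return, not a URL to serve.
ErrorDocument 404 "404 - Image does not exist
```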
So there you have it. I can't tell Apache to serve up different error pages based on the URL or file type, but if I'm diligent about putting different files under different directories, I can effectively do the same thing using .htaccess files. Woot!
The 1.0 release of Google Chrome has everyone abuzz. Here at NI, loads of people are adopting it. Shortly after it went gold, we started to hear from users that they were having problems with our internal collaboration solution, based on the Atlassian Confluence wiki product. They'd hit a page and get a terse error; clicking "More Details" got you the slightly more helpful, or at least Googleable, string "Error 320 (net::ERR_INVALID_RESPONSE): Unknown error."
At first, it seemed like if people reloaded or cleared cache the problem went away. It turned out this wasn't true - we have two load balanced servers in a cluster serving this site. One server worked in Chrome and the other didn't; reloading or otherwise breaking persistence just got you the working server for a time. But both servers worked perfectly in IE and Firefox (every version we have lying around).
So we started researching. Both servers were as identical as we could make them. Was it a Confluence bug? No, we have phpBB on both servers and it showed the same behavior - so it looked like an Apache level problem.
Sure enough, I looked in the logs. The error didn't generate an Apache error - it was still considered a 200 OK response - but when I compared the log strings, the box that Chrome was erroring on showed that the cookie wasn't being passed up; that field was blank (it was populated with the cookie value on the other box, and on both boxes when hit from IE/Firefox). Both boxes had an identically compiled Apache 2.0.61. I diffed all the config files; except for box name and IP, no difference. The problem persisted for more than a week.
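If you want to scan for those blank-cookie hits in bulk rather than eyeballing log lines, a one-liner does it. This is a sketch under an assumption: that your LogFormat puts %{Cookie}i as the last quoted field on each line. The /tmp/sample_access_log file and the two log lines in it are made up for illustration.

```shell
# Assumed LogFormat (not necessarily what your server uses):
#   LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Cookie}i\"" withcookie
# Write two illustrative log lines: one with a cookie, one without.
cat > /tmp/sample_access_log <<'EOF'
1.2.3.4 - - [01/Dec/2008:10:00:00 -0600] "GET /wiki/Home HTTP/1.1" 200 5120 "JSESSIONID=abc123"
5.6.7.8 - - [01/Dec/2008:10:00:01 -0600] "GET /wiki/Home HTTP/1.1" 200 5120 "-"
EOF
# Split each line on double quotes; the cookie is the next-to-last field.
# Print the request line for any hit where no Cookie header arrived.
awk -F'"' '$(NF-1) == "-" || $(NF-1) == "" {print "no cookie:", $2}' /tmp/sample_access_log
# prints: no cookie: GET /wiki/Home HTTP/1.1
```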
We did a graceful Apache restart for kicks - no effect. Desperate, we did a full Apache stop/start - and the problem disappeared! Not sure for how long. If it recurs, I'll take a packet trace and see if Chrome is just not sending the cookie, or sending it partially, or sending it and it's Apache jacking up... But it's strange there would be an Apache-end problem that only Chrome would experience.
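For anyone following along, here's the difference between the two restarts we tried, in command form (assuming the stock apachectl wrapper; your init scripts may differ):

```shell
# Graceful restart: advises children to exit after finishing their current
# request, then re-reads the configuration - listeners stay up throughout.
apachectl graceful

# Full stop/start: tears down every child process and all listening sockets,
# discarding any wedged in-memory state - this is what cleared our problem.
apachectl stop
apachectl start
```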
I see a number of posts out there in the wide world about this issue; people have seen this Chrome behavior in YouTube, Lycos, etc. Mostly they think that reloading/clearing cache fixes it but I suspect that those services also have large load balanced clusters, and by luck of the draw they're just getting a "good" one.
Any other server admins out there having Chrome issues, and can confirm this? I'd be real interested in knowing what Web servers/versions it's affecting. And a packet trace of a "bad" hit would probably show the root cause. I suspect for some reason Chrome is partially sending the cookie or whatnot, choking the hit.
I've been really surprised that for as long as I've been active with OWASP, I've never seen a proxy presentation. After all, they are hugely beneficial in doing web application penetration testing and they're really not that difficult to use. Take TamperData, for example. It's just a Firefox plugin, but it does header, cookie, GET, and POST manipulation just as well as WebScarab. Or Google's ratproxy, which works in the background while you browse around QA'ing your web site and gives you a nice actionable report when you're done. I decided it was time to educate my peers on the awesomeness of proxies.
This past Tuesday I presented to a crowd of about 35 people at the Austin OWASP Meeting. The title of my presentation was "Using Proxies to Secure Applications and More". Since so many people came up to me afterward telling me what a great presentation it was and how they learned something they can take back to the office, I decided (with a little insistence from Ernest) that it was worth putting up on SlideShare and posting to the Web Admin Blog.
The presentation starts off with a brief description of what a proxy is, then covers the different types of proxies. The bulk of the presentation was me giving examples and demonstrating the various proxies: anonymizing proxies, reverse proxies, and intercepting proxies. While my slides can't substitute for the actual demo, I did try to include in them which tool I used for each demo. If you have any specific questions, please let me know. All that said, here's the presentation.