Auditors Just Don’t Understand Security
Part of my new role as the Information Security Program Owner at NI is taking care of our regulatory compliance concerns, which means I spend quite a bit of time dealing with auditors. Now, auditors are nice people, and I want to preface what I'll say next by saying that I think auditors perform a great service for companies. I'm sure that most of them are hard workers and probably understand compliance requirements better than I do, but they just don't understand security.
As a case in point, we're in the middle of our annual audit by one of those "Big Four" audit firms, which I won't name here to protect the innocent. I sent an email checking in with our auditors to make sure that they had everything they needed before we went into our four-day holiday weekend. They said that they had received everything they needed except for documentation on "privileged users from the current OS and Database environments" as well as "evidence of current password settings from the application servers, OS, and Database". We go through a round of translation from Auditorese to Techie and figure out that they want exports of some specific user, profile, role, and privilege tables from the database and copies of /etc/passwd, /etc/shadow, and /etc/group from the servers.
So we obtain the requested documentation and I shoot them back an email message to find out their proposed method for transferring the files. Secure FTP? No. PGP encryption? Nope. Their response back was astonishing:
How large do you think they'll be? Email should be fine.
Seriously? These are the guys that we're paying to verify that we're properly protecting our systems and they're suggesting that sending our usernames and password hashes via cleartext email is an appropriate method of file transfer. I respond back:
I'm not really concerned about the size of the files, but rather, the data that they contain. Sending files containing the users, groups, and password hashes for our financial systems via cleartext is probably not a good plan considering the point of this process is protecting that data.
And they respond with:
Whatever you'd like Josh. As long as you have the files as of today, we're good.
So now I'm convinced that auditors (or at least these auditors) view security as nothing more than a checklist. The people telling me what I need to do in order to protect my systems really have no clue about the fundamentals of security. If it's not on their checklist, then it must not be important. In this particular situation it may be easier or more convenient to send the documents via email, but any security professional worth their salt would tell you that it's neither secure nor appropriate for that data. Either our auditors hold themselves to a very different standard than the rest of us security professionals, or they just don't understand security unless it's on a checklist.
Simplifying On-call Through Alert Aggregation
One of the coolest things about working on the Web Systems Team at National Instruments is that the company has invested in a wide variety of tools to assist us with our jobs. Since we are responsible for the availability of ni.com, we have the standard URL and content monitors (Sitescope and Nagios). We also have the ability to do real user monitoring with a tool called Coradiant TrueSight. We are also responsible for the website's performance, so we have purchased tools like Panorama to diagnose code-level issues. We have Splunk for log monitoring and Gomez for third-party performance and availability monitoring. We even have a SaaS provider that does application security scanning. Having all of these tools at our disposal is quite awesome and allows us to quickly find and fix issues with the site. The problem is that every single one of those tools has its own alerting and reporting interface.
This isn't a new problem by any means. I've seen this issue at every job that I've ever had where the responsibilities included operational support. You rely on multiple tools to tell you when things aren't going quite right, but now you end up spending some non-zero portion of your time managing those tools. For example, let's say that your company has a small release that lasts a few hours once a month. You now have to log in to the control panel (GUI) for each one of those tools and disable your alerts for that time period so that your on-call device isn't going crazy. Assume that you have only four alerting tools and it takes you approximately 5 minutes to log in to each, set the maintenance window, and log back out. You just spent 20 minutes to disable alerts! Now you're getting to the end of the release and things didn't go as planned, so the release is running longer than expected. Now you have to spend another 20 minutes to extend the maintenance window. How frustrating is that?
The issue gets even more complicated when you have multiple people providing support in either an on-call rotation or a follow-the-sun type of scenario. At NI, we have an operations team that handles alerts during normal business hours, an on-call admin who handles alerts from 5 PM to 2 AM, and then a super-awesome Hungarian Web Admin who takes over responding to pages after 2 AM (9 AM in Hungary). Most of the alerting configurations that these tools provide aren't even able to handle this type of scenario, but let's suppose they could. You're still stuck logging into multiple systems every time there's a holiday, somebody goes on vacation, etc. And what happens if you don't have a dedicated on-call device to pass from person to person? Then you're stuck updating the alert configurations every time the on-call person changes in your rotation.
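To make the handoff schedule above concrete, here is a minimal sketch of time-of-day routing in Python. It is only an illustration, not how any of the tools mentioned (or iAlertYou) actually work. The function name and responder labels are made up, and the 8 AM start of business hours is an assumption, since only the 5 PM and 2 AM handoffs are stated above.

```python
from datetime import time

# Assumption: business hours start at 8 AM. Only the 5 PM and 2 AM
# handoffs come from the schedule described in the post.
BUSINESS_START = time(8, 0)
BUSINESS_END = time(17, 0)      # 5 PM: on-call admin takes over
OVERNIGHT_HANDOFF = time(2, 0)  # 2 AM: admin in Hungary takes over


def responder_for(local_time: time) -> str:
    """Return a (hypothetical) responder label for a local time of day."""
    if BUSINESS_START <= local_time < BUSINESS_END:
        return "operations-team"
    if OVERNIGHT_HANDOFF <= local_time < BUSINESS_START:
        return "hungary-web-admin"
    # Everything else is the 5 PM to 2 AM shift, which wraps past midnight.
    return "on-call-admin"
```

Even this toy version shows why the scenario is awkward: one of the shifts wraps past midnight, and the same rule would have to be re-entered in every tool that sends pages.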
This really got me thinking that there has to be a better way to do things. I searched the internet looking for a solution, but when I couldn't find anything that did exactly what I wanted, I ended up writing my own. It's now my pleasure to share with you iAlertYou. The idea is quite simple. You take all of those different tools that send alerts and you aggregate them in the same place. In this case, it's on ialertyou.com. By doing this, you gain the ability to control everything from a single, centralized management platform. Have a maintenance window? No problem. Just log in, set it once, and it affects all of your alerts. Same thing for both alert scheduling (who should get pages and when) and contact groups (used for on-call rotations). Plus, having all of your alerts go through a single aggregation point means that we can also do reporting on all of your alerts. Ever wondered how many of your alerts come from which tools? What times of the day you get the most alerts? It's all possible through alert aggregation.
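The aggregation idea can be sketched in a few lines of Python. This is a toy model under my own assumptions, not iAlertYou's actual code or API: the Alert and AlertAggregator classes and all of their method names are hypothetical. It shows one maintenance window applying to alerts from every tool, and a per-source report falling out of having everything in one place.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Alert:
    source: str            # e.g. "Nagios" or "Splunk"
    message: str
    received_at: datetime


@dataclass
class AlertAggregator:
    """Hypothetical single entry point for alerts from every tool."""
    maintenance_windows: list = field(default_factory=list)
    delivered: list = field(default_factory=list)
    suppressed: list = field(default_factory=list)

    def add_maintenance_window(self, start: datetime, end: datetime) -> None:
        # Set once here instead of once per monitoring tool.
        self.maintenance_windows.append((start, end))

    def in_maintenance(self, when: datetime) -> bool:
        return any(start <= when < end
                   for start, end in self.maintenance_windows)

    def receive(self, alert: Alert) -> bool:
        """Record the alert; return True if it should page someone."""
        if self.in_maintenance(alert.received_at):
            self.suppressed.append(alert)
            return False
        self.delivered.append(alert)
        return True

    def alerts_by_source(self) -> Counter:
        # Reporting across all tools, including suppressed alerts.
        return Counter(a.source for a in self.delivered + self.suppressed)
```

Extending a release that runs long becomes a single call to add_maintenance_window rather than a round of logins to each tool.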
Certainly there are drawbacks to this type of scenario. Most importantly, you're introducing another dependency in what is typically a mission-critical activity. While I can't eliminate this concern completely, I built the system on top of cloud technologies for scalability, and I've architected the application using best practices in availability, performance, security, and usability. Currently, the only offering is a $30/month "everything" plan, but if you spend more than 10-20 minutes a month changing alert configurations, the ROI is realized very quickly. I will also be rolling out a "free" plan (thanks to Peco) with a limited subset of the functionality. I'd like to invite you to check out http://www.ialertyou.com and see if it can help your company simplify on-call through alert aggregation.