The 2014 Verizon Data Breach Investigation Report (DBIR) is out and it paints quite the gloomy picture of the world we live in today where cyber security is concerned.  With over 63,000 security incidents and 1,367 confirmed data breaches, the question is no longer if you get popped, but rather, when.  According to the report, data export is second only to credit card theft on the list of threat actions as a result of a breach.  And with the time to compromise typically measured in days and time to discovery measured in weeks or months, Houston, we have a problem.

I’ve written in the past about all of the cool tricks we’ve been doing to find malware and other security issues by performing NetFlow analysis using the 21CT LYNXeon tool and this time I’ve found another trick around data loss detection that I thought was worth writing about.  Before I get into the trick, let’s quickly recap NetFlow for those who aren’t familiar with it.

Think of NetFlow as the cliff notes of all of the network traffic that your systems handle on a daily basis.  Instead of seeing WHAT data was transmitted (a task for deep packet inspection/DPI), we see the summary of HOW the data was transmitted.  Things like source and destination IP, source and destination port, protocol, and bytes sent and received.  Because many network devices are capable of giving you this information for free, it only makes sense to capture it and start using it for security analytics.

So, now we have our NetFlow and we know that we’re going to be breached eventually, the real question becomes how to detect it quickly and remediate before a significant data loss occurs.  Our LYNXeon tool allows us to create patterns of what to look for within NetFlow and other data sources.  So, to help detect for data loss, I’ve designed the following analytic:

LYNXeon Analytics for Data Loss

What this analytic does is it searches our NetFlow for any time an internal IP address is talking to an external IP address.  Then, it adds up the bytes sent for each of these unique sets of connections (same source, destination, and port) and presents me with a top 25 list.  Something like this:

Top 25 List

So, now we have a list of the top 25 source and destination pairs that are sending data outside of our organization.  There are also some interesting ports in this list like 12547, 22 (SSH), 443 (HTTPS), and 29234.  A system with 38.48 GB worth of data sent to a remote server seems like a bad sign and something that should be investigated.  You get the idea.  It’s just a matter of analyzing the data and separating out what is typical vs what isn’t and then digging deeper into those.

My advice is to run this report on an automated schedule at least daily so that you can quickly detect when data loss has begun in order to squash it at the source.  You could probably argue that an attacker might take a low and slow approach to remain undetected by my report, and you’d probably be right, but I’d also argue that if this were the case, then I’ve hopefully slowed them enough to catch them another way within a reasonable timespan.  Remember, security is all about defense in depth and with the many significant issues that are highlighted by the Verizon DBIR, we could use all of the defense we can muster.