The real challenges of big data

Earlier this year we looked at big data and asked if it was marketing speak or a genuine concern.

Did I reach a conclusion? Judge for yourself, though arguably I did not. The subject has persisted in research and opinion since then, however, so I decided to take another look at the thinking around it.

Among the vendors who have their heads screwed on when it comes to this topic, Splunk approached me to talk about what its customers are saying. Splunk, winner of the 2012 SC Magazine award for best enterprise security solution, has suddenly found a niche in the big data trend. DJ Skillman, its director of technical services for EMEA, said that big data is not simply about sheer volumes of data, but about "a huge amount of data and not knowing how to deal with it".

He said: “This is not about one subset of data that you need to look at, but all subsets of intelligence. Users find a problem as they look at logs, but often they look at other things too. So then they bring up scripts and other things that are living in other systems and collecting other data; they do not know where the next threat is coming from.

“There is a lot of collaboration, but the problem of big data lies in the massive set of data and the complexity that it injects into engines. With multiple web properties, we are now looking at tens of thousands of questions being asked. Years ago it was easy to be an email server, but now Exchange can be reached via a BlackBerry, an iPhone, the Outlook web client or a full client, so how we get email has become more complex.

“There is so much more information. Some clients are processing terabytes of data a day, and no matter how smart you or your technology are, there is not enough horsepower to deal with it. It is not just about asking the obvious questions, especially when it comes to security.”

I asked Skillman if he thought big data was a security problem or one of data management. He said that he felt it was a bit of both, as a lot of data has always existed, and adding in virtualisation makes it more complex.

“Many people will have a hundred servers doing various jobs, but which level are they on? We have found more data in thousands of global services; if you have IT, then you know data exists somewhere,” he said.

Speaking about customers, he admitted that their feedback often reveals they have a problem without having realised it. “There is a question to ask to help you through the known technologies; what you should be looking for is the underlying raw data, as you want the ability to see something that has piqued your interest when you do not know what has happened,” he said.

Last month, Michael Reagan, chief marketing officer at LogRhythm, said that the problem of big data comes from the fact that so much of the data collected is seen as benign.

He said that "big data is a big phrase", as most of LogRhythm's customers are generating 10,000 logs a day. Nebulous as the concept is, it can show them where the vulnerabilities are and where they are coming from, and reading logs and assessing big data can help to eliminate blind spots.

Kim Singletary, director of technical solution marketing at McAfee, said in a blog posting that "big data holds a lot of promise – from the potential to change business models to the ability to rapidly refine services and goods that traditionally took years of industry speculation". So no sitting on the fence there, then. From a security perspective, Singletary claimed, more connections must be allowed to flow into the organisation, with devices feeding data 'in near real time' into centralised repositories that analysts can access.

Conrad Constantine, research team engineer at AlienVault, said the reason everyone is so hyped over big data is "possibly because people are now realising the power of big data".

He said that because log data is seen as an incredibly rich source of information for detecting security intrusions, the industry has developed a taste for more and more logs. This has led to log correlation, in which individual log entries are placed in context against one another to reveal more than just system-level events.

He said: “Vast databases of information being mined for emergent patterns and used to process simulations over and over are hardly new to the world – the finance, medical and aerospace industries have spent years in this realm. How is it, then, that the security world has not tapped into this pool of expertise before now to help us glean the knowledge lying dormant within our vast supplies of data? Quite simply, it's because we still don't know what questions to ask in the first place.

“In security analytics, it's often the relations between the data, not the data itself, that are important. Just as detective work is a matter of connecting the dots, the true information lies in the relations between our data points.”
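To make Constantine's point about log correlation a little more concrete, here is a minimal Python sketch. The event fields, the action names and the rule itself (a successful login preceded by several failed logins within ten minutes, possibly spread across different systems) are hypothetical assumptions chosen for illustration; this is not how AlienVault or any other vendor implements correlation. No single entry is alarming on its own; the alert only appears once the entries are related to one another.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class LogEvent:
    # Hypothetical, simplified log record; real logs carry far more fields.
    timestamp: datetime
    source: str  # system that produced the entry, e.g. "vpn" or "webmail"
    user: str
    action: str  # e.g. "login_failed", "login_ok"


def correlate(events, window=timedelta(minutes=10), min_failures=3):
    """Flag successful logins preceded by several failures for the same user.

    This is one illustrative correlation rule: each entry is benign in
    isolation, and the signal only emerges from relating entries across
    sources and time.
    """
    # Group entries by user, in time order.
    by_user = defaultdict(list)
    for ev in sorted(events, key=lambda e: e.timestamp):
        by_user[ev.user].append(ev)

    alerts = []
    for user, evs in by_user.items():
        for i, ev in enumerate(evs):
            if ev.action != "login_ok":
                continue
            # Earlier failures for this user that fall inside the window.
            failures = [
                e for e in evs[:i]
                if e.action == "login_failed"
                and ev.timestamp - e.timestamp <= window
            ]
            if len(failures) >= min_failures:
                sources = sorted({e.source for e in failures} | {ev.source})
                alerts.append((user, ev.timestamp, len(failures), sources))
    return alerts


# Usage with made-up entries from two systems.
t0 = datetime(2012, 6, 1, 9, 0)
logs = [
    LogEvent(t0, "vpn", "alice", "login_failed"),
    LogEvent(t0 + timedelta(minutes=1), "webmail", "alice", "login_failed"),
    LogEvent(t0 + timedelta(minutes=2), "vpn", "alice", "login_failed"),
    LogEvent(t0 + timedelta(minutes=3), "vpn", "alice", "login_ok"),
    LogEvent(t0 + timedelta(minutes=4), "vpn", "bob", "login_ok"),
]
alerts = correlate(logs)
for alert in alerts:
    print(alert)
```

Only alice is flagged: her three failures are split between the VPN and webmail logs, so neither system's log alone would have crossed the threshold. Bob's lone successful login raises nothing.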

As is so often the case, opinion comes with research attached. A Virgin Media Business survey of UK CIOs at 5,000 companies found that they face rises of up to 80 per cent a year in the amount of big data they handle, with almost half of them (42 per cent) expecting their big data to increase by 50 per cent.

Separately, a Varonis survey of 180 IT security professionals found that more than two-thirds of respondents thought big data should be a strategic priority, with more than half expecting it to be a strategic initiative over the next five years.

However, less than half of the respondents felt there was a clear definition of big data, and even fewer felt they had adequate knowledge of big data products. When asked how they would like to use big data, the top three choices were: finding risk-sensitive data; identifying possible malicious activity; and finding users with excessive access rights.

David Gibson, vice-president of strategy at Varonis, said: “This survey validates what Varonis experiences with the organisations we engage with daily – IT is looking for practical big data solutions for data management and protection. With the explosion of data and the demand for rapid, ubiquitous digital collaboration, IT knows traditional data management methods can no longer keep pace, so they are looking for advanced solutions to protect their data.

“The key for IT with big data is to get past all the hype and to learn more about the practical benefits, like finding exposed sensitive data, flagging malicious activity and identifying excessive access.”

Responding to the Varonis research, Philip Howard, research director at Bloor, said in a blog that the survey of IT security professionals who visited the Varonis booth at Infosecurity Europe 2012 gave the expected answers. He was surprised, however, that almost 60 per cent of respondents felt that "there was a clear definition of big data and its uses for IT", and less surprised that almost three-quarters rated themselves at five or lower on a scale of one to ten for "awareness of and visibility into the big data products currently in the market".

He said: “Given that all the hype is around Hadoop with barely a nod to Cassandra and MongoDB (for example) and no mention at all of HPCC or of graph databases, then I expect that even those who think they are fully aware of the big data market (a little over five per cent) are over-estimating what they think they know.

“When asked whether big data should be a key strategic priority for IT, 69 per cent agreed with this statement and the more people thought they understood big data the more likely they were to agree with it, which is encouraging. However, the caveat is about big data being strategic for IT. When you see a question like this do you answer on the basis that IT supports the business – and if social media analysis is important for the business then it's important for IT – or are people just thinking about IT functions like security? Without getting into people's heads I don't know the answer to that.”

As with many thriving areas in this sector, people are getting down to understanding exactly what big data is, how it affects them (if at all) and what they can do to manage it.

I suspect that this is a case of reviewing how you manage log data and reconsidering that on a larger scale, and looking at external use of mobile devices and cloud services. Then you will have a clearer view of where the data is and how it can be better centralised for easier management. Easy, I say? Surely nothing is easy! Well, no, but more straightforward perhaps. Now let's get on to the next debate: is it big data or Big Data?

