Cyber-attacks are getting larger, faster and more diverse

It therefore makes sense that security information is increasingly being handled by ‘big data’ systems, which can pick up on attacks in real time, in some cases even before they happen.

Big data techniques in security are similar to those used in wider information gathering, for purposes such as marketing. Security data is collected and analysed before being converted into something meaningful that can be interpreted by the business.

And this is only the start. Taking the idea one step further, the systems themselves are becoming ‘intelligent’, able to make decisions based on the information they are analysing. Techniques known as ‘machine learning’ and ‘artificial intelligence’ (AI) are being developed that make use of big data to spot anomalous behaviour.

Trained to identify ‘unusual’ events on the network, whether they originate externally or with insiders, the intelligent algorithms are far more capable than any human could be and, crucially, they are able to deal with vast amounts of security information.

Security data is accumulating at such a fast rate that it's not human readable, says Andrew Rogoyski, VP of cyber-security services at CGI. This paves the way for big data, he says, which is about “manipulating this large amount of information and turning it into something useful”.


In machine learning, the system learns to recognise anomalies and is then able to alert a human who can take action. According to Dr Nick Kotsis, business intelligence expert, PA Consulting Group: “This happens via pattern cognition allowing the system to discriminate between a typical action and an abnormality.”

For example, he says, an unknown user might try to access a company system. A machine learning system will pick this out and raise it as an exception to the norm because it has been “trained” to separate suspicious behaviour from normal activity. “So, if you make a prediction on what might happen if a series of events come together, when the machine assumes something is wrong, it will notify someone,” says Kotsis.
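The idea Kotsis describes can be illustrated with a deliberately simple sketch. It is not how any particular product works: it assumes access logs reduce to (user, system) pairs, and the function names `train_baseline` and `flag_exceptions`, like the sample users, are hypothetical. The "training" here is nothing more than remembering what normal activity looked like, where a real machine learning system would fit a statistical model.

```python
from collections import defaultdict

# Hypothetical helper: "training" is simply recording which systems
# each user accessed during a period assumed to be normal.
def train_baseline(events):
    baseline = defaultdict(set)
    for user, system in events:
        baseline[user].add(system)
    return baseline

# Hypothetical helper: anything outside the baseline is raised as an
# exception to the norm so that a human can review it.
def flag_exceptions(baseline, new_events):
    alerts = []
    for user, system in new_events:
        if user not in baseline:
            alerts.append((user, system, "unknown user"))
        elif system not in baseline[user]:
            alerts.append((user, system, "unusual system for this user"))
    return alerts

# Invented sample data for illustration only.
history = [("alice", "payroll"), ("alice", "email"), ("bob", "email")]
today = [("alice", "payroll"), ("bob", "payroll"), ("mallory", "payroll")]

baseline = train_baseline(history)
alerts = flag_exceptions(baseline, today)
for alert in alerts:
    print(alert)  # a real system would notify an analyst instead
```

In this sketch, the access by the unknown user and the access by a known user to an unfamiliar system are both flagged, while routine activity passes without comment.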

Insiders are a growing threat, and one that the information gathered by big data systems can help to address. The market's focus is now moving from malware behaviour towards people inside organisations and how they act on a day-to-day basis, says Pete Shoard, chief architect at SecureData. “We are now looking at mechanisms to find the insider.”

To do this, he says, systems are programmed to spot certain behaviours: “We examine, for example, how many pages someone printed – how humans interact with a system.”
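As a rough illustration of the printing example, a system could compare an employee's activity today with that person's own history. The sketch below assumes daily page counts are available per user and uses a simple standard-score test. The function name `is_unusual`, the threshold of 3.0 and the sample figures are all assumptions made for illustration, not details from SecureData.

```python
from statistics import mean, stdev

# Hypothetical check: is today's page count far outside this user's
# own historical range? The threshold of 3.0 standard deviations is
# an assumed value, not an industry standard.
def is_unusual(history, today, threshold=3.0):
    mu = mean(history)
    sigma = stdev(history)  # sample standard deviation; needs 2+ values
    if sigma == 0:
        # No variation in the history: any change at all stands out.
        return today != mu
    return abs(today - mu) / sigma > threshold

# Invented daily page counts for one employee.
pages_printed = [12, 8, 15, 10, 11, 9, 14]

normal_day = is_unusual(pages_printed, 13)
bulk_print_day = is_unusual(pages_printed, 240)
print(normal_day, bulk_print_day)
```

Comparing each person against their own baseline, rather than a company-wide average, means a heavy but consistent user of the printer is not flagged simply for doing their normal job.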

Big data risk

Intelligent techniques have a bright future, but there can be issues with exploiting big data for security purposes. For example, says Scott Crawford, research director at 451 Research, the collection of high volumes of information increases the risk that an organisation is collecting ‘toxic’ data that could potentially expose it.

“It's not uncommon for security teams to find that, in addition to monitoring data, their systems have just incidentally collected confidential or personal information, that which is subject to regulatory control, or something that could expose them to regulatory non-compliance – even a data breach. Multiply that risk by the volume of data being added to security monitoring, and organisations could be facing a significant problem.”

Then there is the question of who owns the data being processed by the systems. Crawford says: “The question of whose data it is depends on the nature of the information and the organisation.”

Information collected from enterprise IT systems and employee activity is considered the organisation's data, he explains.

But at the same time, the personal information of employees must be protected. This could raise concerns, says Crawford. However, he adds: “Most of the techniques coming to market minimise these risks by looking primarily at activity in IT systems and end points that could indicate a threat.”

Even so, firms could inadvertently collect data that poses a risk if exposed. There are risks in having large volumes of data that are aggregated from different sources, Rogoyski explains. “You are pulling out inferences and data protection law is tightening up – and that poses problems when complex data sets are coming from multiple sources.”