Is Hybrid AI the future of cyber-security?

The future of cyber-security looks part human and part machine, according to MIT's Computer Science and Artificial Intelligence Laboratory, but what does the broader industry think?

When people fail, could machines do it better?

According to researchers from MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL), working with machine-learning startup PatternEx, the future of cyber-security could be part human and part bot.

A newly published paper from MIT suggests that its prototype AI2 system, which combines machine learning with human analysis, achieves an 85 per cent success rate in predicting cyber-attacks.

The MIT researchers maintain that analyst-driven security systems miss too many attacks because they rely on humans to create rules that incoming events must match. Machine-learning solutions, on the other hand, rely on anomaly detection, which is prone to triggering false positives and so breeds mistrust. The hybrid approach of mixing human intuition with machine learning is, we are told, a much better way forward.

AI2 was tested on 3.6 billion log lines generated by millions of users over a three-month period. The system combs through this data with an unsupervised machine-learning process, detecting patterns that suggest suspicious activity, so human analysts only need to be involved when presented with the most suspicious events. The analysts' feedback is then fed back into the models applied to the next set of data, and so on.
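At its core this is an active-learning loop: an unsupervised model ranks the day's events by how anomalous they look, an analyst labels only the top of that list, and those labels train a supervised model for the next batch. The sketch below illustrates the idea in Python; the IsolationForest/RandomForest pairing and the analyst_review() callback are illustrative assumptions, not the actual AI2 implementation.

```python
# A minimal sketch of the human-in-the-loop cycle described above. The model
# choices and the analyst_review() callback are assumptions for illustration.
import numpy as np
from sklearn.ensemble import IsolationForest, RandomForestClassifier

labelled_X, labelled_y = [], []      # grows as analysts give feedback
supervised_model = None              # retrained after every cycle

def daily_cycle(todays_events, analyst_review, top_k=200):
    """Flag the most anomalous events, ask the analyst, fold the answers back in."""
    global supervised_model

    # 1. Unsupervised pass: rank every event by how anomalous it looks.
    detector = IsolationForest(random_state=0).fit(todays_events)
    most_suspicious = np.argsort(detector.score_samples(todays_events))[:top_k]

    # 2. Humans only see the top of the list.
    for i in most_suspicious:
        verdict = analyst_review(todays_events[i])   # 1 = attack, 0 = benign
        labelled_X.append(todays_events[i])
        labelled_y.append(verdict)

    # 3. The accumulated labels train a supervised model for the next day's data.
    if len(set(labelled_y)) > 1:
        supervised_model = RandomForestClassifier().fit(labelled_X, labelled_y)
```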

By continuously generating new models in this way, the researchers reckon, the system can refine itself in a matter of hours, improving detection rates significantly and rapidly. So is hybrid AI, a virtual analyst if you will, the future of cyber-security?

The first question that needs to be addressed is whether machine learning still has far to go before it is fully up to spec for cyber-security. Dan Palmer, head of manufacturing at the British Standards Institution (BSI), doesn't think it's a question of whether machine learning is up to spec, but rather one of maintaining a lead over the various nefarious actors in cyber-space and ensuring that our tools to combat cyber-threats are fit for purpose.

"It's worth noting that nation states, cyber-criminals and others have consistently innovated and will use machine learning to improve their techniques" Palmer, told SCMagazineUK.com, "while autonomous systems are in their infancy it is important that those involved in their development pay close attention to the issues of ethics. BSI has led the way and this month has published BS 8611 uniting expert views into a clear standard on the consideration of ethical hazards of robots."

Adrian Sanabria, senior security analyst at 451 Research, isn't convinced that machine learning is where the problem lies. "Machine learning isn't the problem here", said Sanabria. "Advances to machine learning will likely only help in small increments. The bigger problem is the quality and understanding of the data we're pulling in."

Sanabria explained to SC that there is a lot of rethinking needed on the front end, which would greatly reduce the effort and expense of cleaning things up on the back end with machine learning and analysts. "We need to go back and revisit the approaches and assumptions used when creating the raw log, event and alert data that feeds into security systems", he told SC. "The percentage of this data that is actually useful for security is a minuscule fraction of what we're having to comb through. Higher quality data coming in can only make it faster and easier to advance the use of AI in cyber-security."

Hal Lonas, chief technical officer at Webroot, certainly sees value in machine learning: "threat researchers are free to get on with identifying new and complex threats, and don't spend time working on the 'easy' classifications that the machine can do." This means they are engaged by new challenges, which helps attract and retain talent, but it also means organisations can protect themselves against more threats earlier in a threat's lifetime, according to Lonas.

Gareth Grindal, head of analysis at Context Information Security, admits it is difficult to say how long it will be before machine learning is something security professionals would fully trust to monitor and protect networks and data. "We have two separate disciplines now converging on a common problem", he told SC. "The volume of data that can be consumed and processed for monitoring networks naturally lends itself to systems that can be leveraged to process this data. It is an important area of work, and while I don't think it is something that can replace an analyst, it is certainly a powerful mechanism for helping to identify suspicious activity that requires further investigation."

The issue of trust, and the risk of creating more false positives, threatens the adoption of any unsupervised machine-learning solution: we simply add more data that requires additional analysis to the workload. So what are the strengths and weaknesses of a hybrid approach, combining the strengths of machine learning with the expertise of human analysts?

Steven Allen, senior security consultant at Capgemini, worries that "machine learning can sift through a lot of the noise and analysts can look for the needle in the haystack, but even by combining the strengths of both, you still have many of the weaknesses of both." He sees the hybrid solution as more 'security in depth' than two complementary halves that simply address each other's shortfalls.

Michael Fimin, CEO of Netwrix, is sure that it's not a good idea to leave AI solutions running completely unattended, telling SC that "unless machine learning is supervised we will not be able to pick up on mistakes in an algorithm or understand why the machine has taught itself to respond in a particular way."

Itsik Mantin, director of security research at Imperva, agrees that the 'accuracy paradox' necessitates human interaction: a system can have 99.99 per cent accuracy, yet the remaining 0.01 per cent of mistakes can translate into thousands of security alerts, which security administrators simply cannot afford to investigate. "Human aid is essential for anomaly detection algorithms, to provide feedback to the algorithm on the correctness of the prediction and to specify override instructions", Mantin insists. "The user interface becomes much more than the pretty face of the algorithm; it has a significant role in the ongoing improvement of the algorithm's accuracy, both by giving the user the context and tools to be effective in analysing alerts and by giving the means to provide feedback to the algorithm."
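A quick back-of-the-envelope calculation shows why even a sliver of error overwhelms a security team; both figures below are illustrative assumptions rather than numbers from the MIT study or Imperva.

```python
# A minimal sketch of the 'accuracy paradox': a tiny error rate still becomes an
# unmanageable pile of alerts at network scale. Both numbers are assumptions.
events_per_day = 50_000_000   # hypothetical log events a large enterprise generates daily
error_rate = 0.0001           # 99.99 per cent accuracy leaves 0.01 per cent mistakes

false_alerts = events_per_day * error_rate
print(f"{false_alerts:,.0f} mistaken alerts per day to triage by hand")  # 5,000
```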

Garry Sidaway, SVP of security strategy at NTT Com Security, argues that we have seen great strides in machine learning recently in information and risk management, driven by the ability to analyse, and hence learn from, huge amounts of data. "Because of this we can split data sets to train the machine and reduce the number of false positives", he told SC, "which is essential when adding this capability to a managed security services offering." For now, it seems, the focus will remain on having experts leverage the reduction in data noise and then apply further analysis and advanced correlation algorithms to eliminate false positives.
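As a rough illustration of the data-splitting Sidaway describes, the sketch below trains a classifier on one slice of a labelled event set and tunes the alert threshold on a held-out slice so that benign traffic rarely triggers alerts. The synthetic dataset, model choice and false-positive budget are all assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Stand-in for labelled event features; 1 = attack, 0 = benign.
X, y = make_classification(n_samples=20_000, n_features=20, weights=[0.99], random_state=0)

# Split the data: one part trains the machine, the other tunes the alert threshold.
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

# Pick a threshold on the held-out slice so benign events almost never raise an alert.
scores = model.predict_proba(X_val)[:, 1]
threshold = np.quantile(scores[y_val == 0], 0.999)   # roughly a 0.1% false-positive budget
print(f"alert threshold = {threshold:.3f}, "
      f"alerts raised = {(scores > threshold).sum()} of {len(scores)} events")
```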

But where will that focus be in years to come? What role will there be for AI in cyber-security going forward? Arik Paran, head of algorithms at CyberArk, is cautiously optimistic about the future of AI. "Existing analytical methods and algorithms in machine learning fit into cyber-security very naturally", he said. "It is reasonable to say that machine learning has only taken its very first steps in cyber-security, and progress beyond continuous improvements in detection and profiling capabilities will probably come when new systems change the way analysts work with them."

He also expects security software vendors to develop stronger ways of combining insights from the data gathered from different customers, to produce a more complete and immediate understanding of evolving threats. "Finally, we would expect to see improvements in how easy it is to use security solutions, where more and more forensics tasks will be automated", Paran concludes, "and small findings will be jointly analysed (correlated) and served to the analyst in such a way that the big-picture insights are immediate."

Ryan Permeh, founder and chief scientist at Cylance, is totally positive about the future, telling SC: "AI and machine learning are absolutely imperative, not just in security, but in addressing many of our modern challenges. The reality is that we have exponentially growing data and ever-shortening windows of time to make value from that stream."

A great stock pick a day late is worth nothing, Permeh explains, and predicting yesterday's rain because you had too much data to process has no value either. "In security", he continues, "getting the answer later may be better than never getting it, but getting it before the event happens (actual prediction) is where the most real value is produced."

The trouble is that humans have a hard time scaling with the volume and complexity of the data available today, and that trend will not slow down, according to Permeh. "They have individual biases and high variance in how they act", he said, "not just from person to person, but the same person day to day. The number of qualified people who can operate at an expert-class level is growing, but never at the scale of demand."

So, as patterns shift (and security requires that they do, as this is an adversarial relationship), only computers can keep up. "Bad guys generally don't send memos when they bypass security systems", Permeh concludes. "We just need to ensure that we continue to measure and focus growth towards solving the hard problems that offer high value." We also need to remember that security is a risk-management exercise: you gain benefit by measuring your spend against a real, measured reduction in risk. "ML, AI, and math-based approaches can be a good way to show real-world value being produced", Permeh insists, "if they are done right..."