One petabyte of sensitive data exposed online in big data security gaffe

Organisations failing to protect information within big data projects


Poorly configured big data applications are potentially leaking over one petabyte of data, according to a new report.

The research, carried out by Swiss security firm BinaryEdge, found that over 35,000 instances of the Redis cache and store could be accessed without authentication. It also discovered that over 39,000 MongoDB NoSQL databases are similarly unprotected.
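To illustrate the kind of exposure BinaryEdge describes, the sketch below checks whether a Redis or MongoDB instance answers without credentials. The host address is a placeholder from the documentation range, the ports are the products' defaults, and the MongoDB check assumes the pymongo client library (3.6 or later) is installed; probes like this should only ever be run against systems you are authorised to test.

    import socket

    def redis_open(host, port=6379, timeout=3.0):
        """Return True if the Redis server replies to an unauthenticated PING."""
        try:
            with socket.create_connection((host, port), timeout=timeout) as sock:
                sock.sendall(b"PING\r\n")        # inline protocol command
                reply = sock.recv(64)
                # An open instance answers "+PONG"; a password-protected one
                # returns an error such as "-NOAUTH Authentication required."
                return reply.startswith(b"+PONG")
        except OSError:
            return False

    def mongo_open(host, port=27017, timeout_ms=3000):
        """Return True if MongoDB lists its databases without authentication."""
        from pymongo import MongoClient
        from pymongo.errors import PyMongoError
        try:
            client = MongoClient(host, port, serverSelectionTimeoutMS=timeout_ms)
            client.list_database_names()         # raises if auth is enforced
            return True
        except PyMongoError:
            return False

    if __name__ == "__main__":
        target = "198.51.100.10"                 # placeholder address
        print("Redis open:", redis_open(target))
        print("MongoDB open:", mongo_open(target))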

In a blog post, the firm said that over 118,000 instances of the Memcached general-purpose distributed memory caching system were easily accessible from the internet and leaking data.
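Memcached exposure is even simpler to demonstrate, because the daemon speaks a plain-text protocol with no authentication in its default configuration. A minimal sketch, assuming the default TCP port 11211 and a placeholder address:

    import socket

    def memcached_exposed(host, port=11211, timeout=3.0):
        """Return True if the server answers the plain-text 'stats' command."""
        try:
            with socket.create_connection((host, port), timeout=timeout) as sock:
                sock.sendall(b"stats\r\n")
                reply = sock.recv(1024)
                # An internet-facing instance replies with lines such as
                # "STAT pid 1234", ending with "END".
                return reply.startswith(b"STAT ")
        except OSError:
            return False

    # Example (placeholder address): memcached_exposed("198.51.100.10")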

Finally, nearly 9,000 Elasticsearch servers, a search server technology based on Lucene, were found by BinaryEdge probes. It said a number of these servers were running versions older than 1.4.3 and were therefore “vulnerable to CVE-2015-1427, which allows for attackers to use the API to gain remote code execution in these platforms”.
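An Elasticsearch node reports its version on the cluster's root HTTP endpoint (port 9200 by default), so spotting the vulnerable range is straightforward. A rough sketch, assuming an unauthenticated HTTP endpoint and treating anything older than 1.4.3, the 1.4.x release in which CVE-2015-1427 was fixed, as potentially affected:

    import json
    import urllib.request

    def es_version(host, port=9200, timeout=3.0):
        """Return the version.number string reported by an Elasticsearch node."""
        url = "http://%s:%d/" % (host, port)
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.loads(resp.read())["version"]["number"]

    def maybe_vulnerable(version):
        """Crude check: treat anything older than 1.4.3 as potentially affected."""
        parts = tuple(int(p) for p in version.split(".")[:3])
        return parts < (1, 4, 3)

    # Example (placeholder address):
    # v = es_version("198.51.100.10")
    # print(v, maybe_vulnerable(v))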

In total, 1,175 terabytes (1.1 petabytes) of data were found exposed online by the security firm.

"Versions installed are quite often old and not updated, which means that, in some cases, not only is data exposed but even servers can be compromised,” the firm said. "Companies are still figuring out how to use these technologies and by default they are not secure.”

It said the insecurely configured servers belong to organisations ranging from small companies to large top-500 firms. “Some of these technologies are used as cache servers, so their data is always changing and a multitude of client/company data can be looked at, for example, auth session information,” the company said.

The firm made clear that no specific company data or confidential data was collected by its probes, only statistical information for each technology.

“No data from this dataset will be made public. We are in the process of setting up an automated system that will alert companies of open technologies in their networks,” it said.

Jason du Preez, chief executive of Privitar, told SC Magazine UK that this is a significant and escalating problem.

“With easy, cheap access to significant computing power, it's really not that hard to find valuable or sensitive information in what might have historically seemed like enormous amounts of data,” he said.

Du Preez added that developers and data scientists are exploring next-generation open source tools for algorithm and analytics development, focusing on the big data problem; most often, the reality is that they are not focused on information security concerns.

“With these new, open-source software tools, powered by unprecedented access to cloud computing, it is clear that we need new approaches to protecting sensitive data sets,” said du Preez. “There's no excuse for poor perimeter security and best practice should be followed here – but organisations should also be taking a privacy-by-default approach to data management. This is best designed into architectures from the outset.”

Peter Shoard, SecureData's chief architect, told SC Magazine UK that most providers use big data solutions to create highly valuable ‘subsets’ or ‘metrics’ which summarise the wider dataset on a scheduled basis. “Stealing the right data sets could be hugely valuable, and small malicious changes to these subsets could have astronomic consequences, depending on how reliant a business is on the data (or the output) to manage BAU operations.”

Catalin Cosoi, chief security strategist at Bitdefender, told SC Magazine UK that big data projects should be built with a data-centric approach in mind. “They should protect sensitive data end-to-end, through proper encryption and provide security for all network communications.”
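As one illustration of that data-centric idea, the hypothetical sketch below encrypts a value inside the application before it is ever written to a cache or store, so an exposed server only holds ciphertext. It assumes the widely used Python cryptography package; key management is deliberately left out of scope.

    from cryptography.fernet import Fernet

    # In practice the key would come from a key-management service,
    # not be generated alongside the data.
    key = Fernet.generate_key()
    fernet = Fernet(key)

    # Encrypt before the value ever leaves the application...
    ciphertext = fernet.encrypt(b"session-token-for-user-42")

    # ...store `ciphertext` in Redis/Memcached/MongoDB instead of the
    # plaintext, and decrypt only inside the trusted application.
    plaintext = fernet.decrypt(ciphertext)
    assert plaintext == b"session-token-for-user-42"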

“One reason why companies are disregarding security when building their products is that they want to cook up apps quickly and look for an immediate return on investment. Another problem lies in the development cycle – requirements change often, so we're stuck on a patch-go-round that never ends,” said Cosoi.

David Gibson, VP of Strategy and Market Development at Varonis, told SC Magazine UK that one of the problems with big data, and unstructured data in general, is that it grows so quickly and, if uncontrolled, just as quickly becomes a big problem for management and security.

“Before you know it, you've got petabytes of data that you know very little about – what does it contain? Is it sensitive? Is it active? Who uses it? Who does it belong to? Is it locked down correctly? Organisations already have a hard time protecting unstructured data inside their environments from being stolen by both insiders and outside attackers that get in,” he said.

“We certainly don't need to make it easier for attackers by exposing petabytes of information on the internet without so much as a password.”

Dr. Steven Furnell, a senior member of the IEEE, told SC Magazine UK that security is probably an afterthought in these projects for the same reason it has historically been in many others.

“If it isn't seen as a main aim of what the project is trying to achieve then it risks being left on the sidelines until the main functionality is delivered.  As with our adoption of any other technologies, it requires someone to recognise security and give it appropriate priority from the outset, because successive past lessons should have taught us that it doesn't work as well as a bolt-on,” he added.
