Securing Hadoop - inflating the life rafts for big data lakes

New platforms such as Hadoop are pushing IT professionals to find innovative solutions to ensure data security, says Greg Hanson.

Greg Hanson

Big data has been a buzzword in the industry for years, but we are starting to see hard evidence of both enterprises that have been successful with technologies such as Hadoop and those that are facing challenges. Businesses are using Hadoop for a huge range of exploratory analytics, spanning energy exploration, fraud detection, predictive maintenance, manufacturing processes, security and other large-scale systems initiatives. One of the distinguishing traits of successful enterprises is the holistic way they have approached big data management, governance and security.

Hadoop is an open source software framework for storing and processing large volumes of distributed data. Rather than funnelling everything through a centralised management system, it splits data into blocks spread across many servers and pushes the processing out to wherever the data sits.
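
To make that model concrete, here is the canonical MapReduce word count in Java: each node counts words in the blocks of data it holds locally (the map step), and the partial counts are then combined across the cluster (the reduce step). The input and output paths are illustrative.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();
        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            // Runs on each node, against the data blocks stored there.
            for (String token : value.toString().split("\\s+")) {
                if (token.isEmpty()) continue;
                word.set(token);
                context.write(word, ONE); // emit (word, 1) for each token
            }
        }
    }

    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            // Combine the partial counts produced across the cluster.
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path("/data/lake/input"));   // illustrative paths
        FileOutputFormat.setOutputPath(job, new Path("/data/lake/output"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```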

Hadoop has quickly made its way from shiny newness to enterprise staple, underpinning big data projects across a myriad of industries. Global data market revenues are expected to hit $115 billion (£75 billion) by 2019, according to 451 Research, driving rapid adoption. There is clear value in using technologies such as Hadoop to build next-generation “data lakes” for collecting, preparing and analysing greater volumes and types of data. Enterprises are augmenting their traditional data warehousing architectures to include Hadoop, both as a more efficient and scalable preparation stage and as a way to offload less frequently used data into easily accessible archives.

However, as more businesses dive into data lakes for collecting, preparing and analysing greater volumes and types of data, IT is working out how to inflate the life rafts for this new approach. Compliance-sensitive industries, such as healthcare and financial services, and consumer-driven industries, such as retail or consumer packaged goods (CPG), are legally obligated to ensure strict controls on the use of data. These controls must also apply to new data platforms such as Hadoop.

With updates to EU data protection law looming, the risks of data breaches for enterprises, and indeed data service providers, are very clear. At a minimum, data breaches lead to a loss of customer trust and revenue shortfalls from consumer churn. More drastically, we hear of executives facing fines or incarceration for failing to meet government-mandated compliance policies for the collection of and access to sensitive consumer information. The risks are complicated further by the trend toward more autonomous, self-service access to data by an increasingly information-driven workforce. In a world where data is becoming progressively more strategic to the enterprise, organisations have the opportunity to either manage data as an asset or face the risk of it becoming a liability.

Spurred on by the stick and carrot of fines and business advantage, security professionals will be tasked with devising a data-centric approach to securing information. While pooling data in one place and securing it there may look like plain sailing, this view does not account for the myriad ways data gets there, the ‘prize' these lakes represent for a motivated hacker, or the risks introduced by the organisations involved in provisioning services to process this data. In our experience, enterprises have very little awareness of where their data is and the risks it may be exposed to.

Attack, compromise and exfiltration of data take place within minutes, but more often than not enterprises don't discover these breaches until weeks or months after the fact. Even then, containment and restoration can take days or weeks to complete. To move faster, businesses need better insight into the location, type and risk level of their data, so they can protect the most critical first.

Successful enterprises are taking a holistic approach to big data security by evolving beyond traditional perimeter- and endpoint-based defences and addressing the security of the data itself at multiple levels:

Authentication. Most Hadoop distributions have native Kerberos-based systems for controlling access to data. Enterprises are enabling Kerberos-based access controls and using data preparation technologies that fully integrate with these control systems.
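
As a rough sketch of what this looks like in practice, the snippet below shows a Java client authenticating to a kerberised cluster through Hadoop's UserGroupInformation API before touching HDFS. The principal name, keytab path and HDFS path are assumptions for illustration.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class KerberosClient {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Tell the Hadoop client that the cluster requires Kerberos.
        conf.set("hadoop.security.authentication", "kerberos");
        UserGroupInformation.setConfiguration(conf);

        // Authenticate with a service principal and keytab instead of a password.
        UserGroupInformation.loginUserFromKeytab(
                "etl-service@EXAMPLE.COM", "/etc/security/keytabs/etl-service.keytab");

        // All subsequent HDFS calls now carry the authenticated identity.
        FileSystem fs = FileSystem.get(conf);
        System.out.println(fs.exists(new Path("/data/lake/customers")));
    }
}
```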

Authorisation. Most Hadoop distributions also ship with fine-grained authorisation mechanisms, such as Apache Ranger and Apache Sentry. Enabling fine-grained authorisation further ensures that access to sensitive information is controlled.
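
By way of illustration, Apache Sentry expresses such policies as SQL-style role grants issued through Hive. The sketch below submits hypothetical grants over JDBC; the role, database and group names are invented for the example, and a real deployment would need the Hive JDBC driver on the classpath and a properly kerberised HiveServer2.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class GrantAnalystAccess {
    public static void main(String[] args) throws Exception {
        // Connect to a kerberised HiveServer2; URL details are illustrative.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:hive2://hive-server:10000/default;principal=hive/_HOST@EXAMPLE.COM");
             Statement stmt = conn.createStatement()) {
            // Sentry enforces these role-based grants cluster-wide.
            stmt.execute("CREATE ROLE analyst");
            stmt.execute("GRANT SELECT ON DATABASE sales TO ROLE analyst");
            stmt.execute("GRANT ROLE analyst TO GROUP analysts");
        }
    }
}
```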

Sensitive Data Protection. Data masking technologies can be used to de-identify or de-sensitise private and confidential data, controlling access in instances where data sets are shared for outsourcing, application testing or third-party analytics.
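
A minimal sketch of two common masking techniques follows: deterministic pseudonymisation, which replaces an identifier with a stable token so that joins across data sets still work, and partial masking for display. The field values are hypothetical, and a production deployment would normally rely on a dedicated masking product rather than hand-rolled code.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.HexFormat; // requires Java 17+

public class Masker {
    // Replace an identifier with a stable, irreversible token.
    static String pseudonymise(String value, String secretSalt) throws Exception {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        digest.update(secretSalt.getBytes(StandardCharsets.UTF_8)); // salt kept out of the data set
        byte[] hash = digest.digest(value.getBytes(StandardCharsets.UTF_8));
        return HexFormat.of().formatHex(hash);
    }

    // Keep only the last four characters visible, e.g. for card numbers.
    static String maskAllButLast4(String value) {
        if (value.length() <= 4) return value;
        return "*".repeat(value.length() - 4) + value.substring(value.length() - 4);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(pseudonymise("customer-42", "per-environment-salt"));
        System.out.println(maskAllButLast4("4111111111111111")); // ************1111
    }
}
```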

Data Security Intelligence. A new category of data security technology has recently emerged that improves enterprise security posture through proactive risk management. Before migrating data to Hadoop, organisations can identify where sensitive data resides and understand how it should be protected. After migration, risk can continue to be monitored and the proper data security controls applied so that data proliferation is kept under control.
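
The "know before you load" idea can be as simple as scanning candidate records for sensitive-looking patterns before migration. The sketch below flags email addresses and 16-digit card-like numbers; the patterns and sample records are illustrative only, and commercial tools classify far more data types.

```java
import java.util.List;
import java.util.regex.Pattern;

public class SensitiveDataScan {
    // Simplistic detectors for two kinds of sensitive data.
    private static final Pattern EMAIL = Pattern.compile("[\\w.+-]+@[\\w-]+\\.[\\w.]+");
    private static final Pattern CARD = Pattern.compile("\\b\\d{16}\\b");

    public static void main(String[] args) {
        List<String> records = List.of(
                "order=9913, contact=jane.doe@example.com",
                "payment card 4111111111111111 captured",
                "sensor reading: 23.4C");
        for (String record : records) {
            boolean sensitive = EMAIL.matcher(record).find() || CARD.matcher(record).find();
            // Flagged records would be masked or blocked before loading into Hadoop.
            System.out.printf("%s -> %s%n", sensitive ? "FLAG" : "ok  ", record);
        }
    }
}
```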

Advanced companies that have gained expertise in Hadoop technologies are also leveraging its large-scale data analysis capabilities to drive even broader security outcomes for the organisation. For example, financial services organisations are collecting network access data, physical building access data and other sources of human interaction data into Hadoop in real time to identify patterns of intrusion more quickly and comprehensively. The use of big data analytics as a platform to drive broader security outcomes opens up fundamentally new categories of big data driven security analytics.
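
As a hedged example of this pattern, once access logs land in the lake, even a short Spark job (Java API) can surface brute-force-style anomalies. The file layout, field names and threshold below are assumptions for illustration.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import static org.apache.spark.sql.functions.col;

public class AccessAnomalies {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("AccessAnomalies").getOrCreate();

        // Read access-log events previously collected into the data lake.
        Dataset<Row> logs = spark.read().json("hdfs:///security/access-logs/");

        // Flag source addresses with an unusually high number of failed logins.
        logs.filter(col("status").equalTo("FAILED"))
            .groupBy(col("source_ip"))
            .count()
            .filter(col("count").gt(100))
            .show();

        spark.stop();
    }
}
```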

At the heart of successful data security is understanding where applications create sensitive information in databases and how that information proliferates as it is used by line-of-business applications, cloud services and mobile apps, or pooled for analysis and processing in data lakes. Only then can businesses visualise where sensitive data resides, whether inside or outside the corporate perimeter, and secure information at its source.

There is no doubt that security is one of the key pillars of making big data ready for analytical success. As the use of Hadoop continues to grow and more sensitive information is collected, processed and analysed within it, enterprises will need holistic approaches to big data security. Successful organisations are moving beyond traditional, superficial security measures to focus on intelligent, metadata-driven data security. By building a systematic understanding of their big data, enterprises can improve their security posture holistically and ensure that big data remains an asset, not a liability.

Contributed by Greg Hanson, vice president business operations EMEA at Informatica