The effects of leaked personal health information can be exceptionally damaging.
A single breach of one individual's privacy can thrust a healthcare institution, and the victim, into international headlines. The UK's National Health Service will soon require General Practitioners to enter patient data into a centralised health database called care.data.
After being anonymised, the clinical information will be used primarily by the NHS for analysis of medical outcomes, as well as for drug research by academics and healthcare companies. Large-scale collection and distribution of health data, even when it has been de-identified, raises privacy concerns, and at a minimum demands closer attention to compliance obligations under the UK's existing Data Protection Act (DPA) and NHS rules.
DPA sets the stage for protecting health data
In April 2013, an advisory commission to the NHS approved the care.data program to make medical data more widely available to the medical research community. There are obvious big data benefits to analysing millions of health records, but this comes with considerable privacy implications. Fortunately, the NHS is not starting with a blank slate.
Since 1998, the UK's DPA has required organisations to have security controls in place when collecting and processing personal data: names, addresses and other data that relates to an identifiable individual. The NHS falls under most, but not all, of the DPA's security and personal privacy regulations.
For example, while patient data is protected and requires an explicit opt-in to share with others, in cases of epidemics and other health emergencies the NHS can access and distribute patient data to investigators.
When personal data is stripped from data sets however, the DPA requirements no longer apply and the data can be more openly used. This is the approach that the NHS has taken with its care.data repository. Allowing anonymised patient data to be available makes great sense, in that researchers and others can publish results, papers and important breakthroughs without compromising patient identities.
The DPA also defines a special category of sensitive data, which includes ethnicity, religion, political beliefs and medical information. Even when there are no identifiers in the strict sense, sensitive data still needs to be treated with great care. It turns out that sensitive and other patient medical data pose more of a problem than the DPA regulations suggest.
DPA anonymisation falls short
Though the UK's data protection regulations set a high privacy standard, advocates have openly voiced their concerns with the government's plans to make patient medical data more widely available.
Ross Anderson, professor in security engineering at the University of Cambridge Computer Laboratory, notes that conventional ideas about identifiers in personal data are outdated: “Computer scientists realised about 30 years ago that protecting privacy using anonymity is a lot harder than it looks.”
In its report on science as an open enterprise, published in June, the Royal Society said: “It had been assumed in the past that the privacy of data subjects could be protected by processes of anonymisation such as the removal of names and precise addresses of data subjects.”
The final warning was sobering: “However, a substantial body of work in computer science has now demonstrated that the security of personal records in databases cannot be guaranteed through anonymisation procedures where identities are actively sought.”
Consumer data often contains information that was formerly considered inconsequential but that, thanks to the web, public online forums, the expanding use of social media and improved computing resources, can now act as a new kind of personal identifier. The re-identification potential of this ‘grey' personal data has been known for over a decade.
In a well-publicised incident in the late 1990s, MIT graduate student Latanya Sweeney managed to identify the medical condition of William Weld, then governor of Massachusetts, from ‘anonymised' hospital records released by the state's Group Insurance Commission. By matching the zip code, date of birth and gender in the records to public voting rolls, Sweeney was able to re-identify Weld's diagnoses and prescriptions.
Sweeney has also pointed out that ethnicity when coupled with location information can act as a quasi-identifier. More recently, researchers have shown it is possible to re-identify anonymous web search histories and movie preferences, matching results to individuals.
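The linkage attack Sweeney demonstrated can be sketched in a few lines. The records and names below are invented for illustration; the point is simply that a join on shared quasi-identifiers (zip code, date of birth, gender) puts names back onto ‘anonymous' rows:

```python
# Hypothetical sketch of a linkage attack: joining an 'anonymised'
# medical extract to a public voter roll on shared quasi-identifiers.
# All records here are invented for illustration.

medical_records = [  # names stripped, but quasi-identifiers retained
    {"zip": "02138", "dob": "1945-07-31", "sex": "M", "diagnosis": "hypertension"},
    {"zip": "02139", "dob": "1972-01-15", "sex": "F", "diagnosis": "asthma"},
]

voter_roll = [  # public record: names present
    {"name": "W. Weld", "zip": "02138", "dob": "1945-07-31", "sex": "M"},
    {"name": "J. Smith", "zip": "02139", "dob": "1980-03-02", "sex": "F"},
]

def link(medical, voters):
    """Re-identify medical rows whose quasi-identifiers match a voter."""
    key = lambda r: (r["zip"], r["dob"], r["sex"])
    voters_by_key = {key(v): v["name"] for v in voters}
    return [
        {"name": voters_by_key[key(m)], "diagnosis": m["diagnosis"]}
        for m in medical
        if key(m) in voters_by_key
    ]

print(link(medical_records, voter_roll))
# One of the 'anonymous' patients is re-identified by name.
```

No field in either table is a direct identifier on its own; it is the combination, joined against an auxiliary public dataset, that does the damage.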
The larger point is that the divide between what has historically been seen as personal data and what counts as anonymous data has become blurred since we started sharing so much of it on social networking sites.
In the UK, this issue of re-identifying data came to a head in the 1990s when John Major's government built a database of hospital records with names removed but postcode, date of birth and other quasi-identifiers still present – so most patients were easy to identify.
After the British Medical Association (BMA) objected, a committee led by Dame Fiona Caldicott was established in 1997 to look into the problem. One of the results of the Caldicott committee was a set of principles focused on limiting medical data collection and usage.
Caldicott and DPA rules remain important, and we have her to thank for the ‘Caldicott guardians' who oversee patient data. However, they don't directly address the re-identification and other privacy issues now posed by the NHS's centralised databases.
More recently, Dame Caldicott was commissioned by the Chief Medical Officer to investigate how patient information should be used in the new NHS data collection and sharing system.
The report, ‘Information: to share or not to share?', was made public in March 2013. Besides updates to the original Caldicott principles and new recommendations on breach notifications, Caldicott directly addresses quasi-identifiers, which she refers to as a ‘grey area' of data identifiers.
The report doesn't necessarily supersede DPA's regulations: GPs and hospitals are still considered ‘data collectors' and are responsible for protecting patient personal data. The Caldicott report outlines in great detail an approach to share data between medical providers and the NHS that protects sensitive data and other quasi-identifiers while still enabling medical researchers to share and publish their work.
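One common way to reason about quasi-identifiers before releasing a data set (a general technique from the computer-science literature, not something the Caldicott report itself prescribes) is k-anonymity: every combination of quasi-identifier values must appear in at least k records, so no row is uniquely distinguishable. A minimal sketch, with invented data:

```python
from collections import Counter

def is_k_anonymous(rows, quasi_ids, k):
    """True if every combination of quasi-identifier values occurs
    in at least k rows, so no individual stands out on those fields."""
    counts = Counter(tuple(r[q] for q in quasi_ids) for r in rows)
    return all(c >= k for c in counts.values())

# Generalising the data - outward postcode instead of full postcode,
# birth year instead of full date of birth - trades precision for safety.
rows = [
    {"postcode": "CB2", "birth_year": 1945, "diagnosis": "hypertension"},
    {"postcode": "CB2", "birth_year": 1945, "diagnosis": "asthma"},
    {"postcode": "CB2", "birth_year": 1945, "diagnosis": "diabetes"},
]
print(is_k_anonymous(rows, ["postcode", "birth_year"], k=3))  # True
```

The check itself is trivial; the hard part, as the re-identification research above shows, is choosing which fields count as quasi-identifiers in the first place.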
Even with the positive changes to data protection and privacy brought about by the original Caldicott committee and the DPA's overall security regulations, 186 serious data breaches were notified to the Department of Health in the 12 months to the end of June 2012. Caldicott points out that these were all data losses and breaches of the Data Protection Act – not inappropriate sharing.
However, even with new Caldicott recommended controls, there's simply no foolproof system. With the wider sharing of medical data in the UK, there will be opportunities for data to be hacked, misused, misplaced and accessed by unauthorised users.
The business case for data governance is clear: if you can ensure that the right people have access to only the data they need to perform their jobs, that all use is monitored and that abuse is flagged, you will improve security, reduce risk and take a significant step toward sustainable least-privilege access compliance.
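That governance model can be sketched in a few lines. The roles, data sets and policy below are invented placeholders, not any real NHS access scheme; the point is the shape: an explicit grant table, a log of every request, and flagging of out-of-policy access rather than silent denial:

```python
# Hypothetical least-privilege sketch: invented roles and data sets.
GRANTS = {
    "gp": {"own_patients"},
    "researcher": {"anonymised_extract"},
    "auditor": {"access_log"},
}

access_log = []  # all use is monitored

def request(role, dataset):
    """Allow access only if the role has an explicit grant;
    record every request and flag denials for review."""
    allowed = dataset in GRANTS.get(role, set())
    access_log.append({"role": role, "dataset": dataset, "allowed": allowed})
    if not allowed:
        print(f"FLAG: {role!r} denied access to {dataset!r}")  # abuse is flagged
    return allowed

request("researcher", "anonymised_extract")  # permitted
request("researcher", "own_patients")        # denied and flagged
```

The default-deny stance (no entry in the grant table means no access) is what makes the scheme least-privilege rather than best-effort.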
While this may mean crossing some bureaucratic bridges, the eventual goal of preventing healthcare information from getting into the wrong hands will be reached as long as we all play our part in ensuring far fewer data breaches.
David Gibson is vice president at Varonis