Sensitive data - Services for Research
Definition of sensitive data
While it is hard to compose a definitive description of sensitive data, there are a few simple guidelines for identifying it, all derived from national and EU legislation. Personal data should always be handled with care, but the term 'sensitive personal data' needs further definition. The EU General Data Protection Regulation (GDPR) provides one by listing the following special categories of personal data (Art. 9(1) and Art. 10):
- racial or ethnic origin;
- political opinions;
- religious or philosophical beliefs;
- trade union membership;
- genetic data;
- biometric data for the purpose of uniquely identifying a natural person;
- data concerning health;
- data concerning a natural person's sex life or sexual orientation;
- data relating to criminal convictions and offences, or related security measures.
Health care data is another candidate for sensitive data. The Act on the Status and Rights of Patients (785/1992, section 13) defines it in more detail, stating that information contained in, or derived from, patient documents shall be confidential.
Closely related to health care data is biomedical data: the Biobank Act (688/2012) states that data related to human samples and to the processing of those samples shall be confidential.
Documents form another, much more diverse category: agreements, contracts, governmental documents, and documents addressed to or held by an authority may be secret, classified, confidential or otherwise deemed sensitive. The Act on the Openness of Government Activities (621/1999) covers some of these, but not all. Sensitive data can also include data that reveals the location of rare, endangered or commercially valuable species, or of other conservation efforts.
Some data obtained from Statistics Finland is also sensitive. If you have any data from Statistics Finland, consult them before processing it.
Finally, data covered by a non-disclosure agreement, such as confidential business data whose leak could harm the data owner, shall be deemed confidential.
If your data falls under one or more of the categories listed above, it should very likely be deemed sensitive and processed accordingly.
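As a rough aid, this screening rule can be sketched in Python. This is a minimal sketch under assumptions: the enum names are hypothetical labels for the categories described in this section, and a match only signals that the data is very likely sensitive. It is not a legal assessment, and an empty result does not prove the data is non-sensitive, since the guidelines above are not exhaustive.

```python
from __future__ import annotations

from enum import Enum, auto


class SensitiveCategory(Enum):
    """Hypothetical labels for the categories described in this guide."""
    SPECIAL_PERSONAL_DATA = auto()        # GDPR Art. 9(1) special categories
    CRIMINAL_DATA = auto()                # GDPR Art. 10
    PATIENT_DOCUMENTS = auto()            # Act on the Status and Rights of Patients
    BIOBANK_SAMPLES = auto()              # Biobank Act
    CONFIDENTIAL_DOCUMENTS = auto()       # e.g. Act on the Openness of Government Activities
    PROTECTED_SPECIES_LOCATIONS = auto()  # rare, endangered or valuable species
    STATISTICS_FINLAND_DATA = auto()
    NDA_DATA = auto()                     # covered by a non-disclosure agreement


def likely_sensitive(categories: set[SensitiveCategory]) -> bool:
    """Return True if the dataset matches at least one listed category."""
    return len(categories) > 0


def screening_note(categories: set[SensitiveCategory]) -> str:
    """Summarise the screening result for a project's documentation."""
    if not likely_sensitive(categories):
        return "No listed category matched; the data may still be sensitive."
    names = ", ".join(sorted(c.name for c in categories))
    return f"Likely sensitive ({names}); process accordingly."


if __name__ == "__main__":
    dataset_flags = {SensitiveCategory.SPECIAL_PERSONAL_DATA}
    print(screening_note(dataset_flags))
```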
- The EU General Data Protection Regulation (GDPR) defines personal data in Article 4 and the principles and obligations of data handling in Articles 5, 24 and 32. Article 9 defines the special categories of personal data, which include, for example, genetic and biometric data.
- In Finnish: Yleinen tietosuoja-asetus
- Read more: Finnish Social Science Data Archive, Anonymisation and Personal Data
- Also consider the ethical principles that must be taken into account when handling sensitive data.
How to manage sensitive data
Although it does not give exact technical details on how to process sensitive data, the EU General Data Protection Regulation (GDPR) outlines the principles of sensitive data processing. The list is long and the definitions are complex, but a few basic rules are easy to highlight.
First, data minimisation shall be enforced: only the data that is absolutely needed should be processed. For example, if a dataset includes information about persons' ages but the research does not need that information, it should be removed from the dataset before processing.
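A minimal sketch of data minimisation in Python, assuming the data arrives as CSV text. The column names (`participant_id`, `age`, `diagnosis_code`, `visit_year`) are hypothetical; the point is that only an explicit allow-list of needed columns is kept, so an unneeded column such as age never enters further processing.

```python
import csv
import io

# Hypothetical allow-list: only the columns the research question needs.
REQUIRED_COLUMNS = ["participant_id", "diagnosis_code", "visit_year"]


def minimise(rows, required=REQUIRED_COLUMNS):
    """Yield rows that contain only the required columns."""
    for row in rows:
        yield {col: row[col] for col in required}


def minimise_csv(text: str, required=REQUIRED_COLUMNS) -> str:
    """Return CSV text with every column outside `required` dropped."""
    reader = csv.DictReader(io.StringIO(text))
    missing = set(required) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing required columns: {sorted(missing)}")
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=required, lineterminator="\n")
    writer.writeheader()
    writer.writerows(minimise(reader, required))
    return out.getvalue()


if __name__ == "__main__":
    raw = (
        "participant_id,age,diagnosis_code,visit_year\n"
        "P001,54,E11,2019\n"
        "P002,61,I10,2020\n"
    )
    print(minimise_csv(raw))  # the 'age' column is gone
```

An allow-list is used rather than a list of columns to drop, so that any column added to the source later is excluded by default.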
Second, the data should be anonymised or pseudonymised whenever possible. Note, however, that pseudonymisation does not remove the sensitivity of the data, so all requirements for sensitive data processing still apply. The code register used to link pseudonymised data back to individuals should not be stored with the data but preferably in a completely separate system, to minimise potential damage in case of a data leak.
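The separation between pseudonymised data and its code register can be sketched as follows. This is a minimal illustration, assuming each record is a dictionary with a hypothetical `person_id` field; pseudonyms are random tokens from Python's `secrets` module, and the returned key register is meant to be stored on a separate system. Other fields may still identify individuals, so the output remains sensitive data.

```python
import secrets


def pseudonymise(records, id_field="person_id"):
    """Replace direct identifiers with random pseudonyms.

    Returns (pseudonymised_records, key_register). The key register maps
    pseudonym -> original identifier. It is the only link back to the
    individuals, so store it apart from the data (ideally another system).
    The pseudonymised records are still sensitive data.
    """
    id_to_pseudo = {}
    used = set()
    out = []
    for rec in records:
        original = rec[id_field]
        if original not in id_to_pseudo:
            # Random tokens carry no information about the original identifier.
            pseudo = "P-" + secrets.token_hex(8)
            while pseudo in used:  # guard against a (very unlikely) collision
                pseudo = "P-" + secrets.token_hex(8)
            used.add(pseudo)
            id_to_pseudo[original] = pseudo
        new_rec = dict(rec)  # copy, so the input records are left untouched
        new_rec[id_field] = id_to_pseudo[original]
        out.append(new_rec)
    key_register = {p: o for o, p in id_to_pseudo.items()}
    return out, key_register


if __name__ == "__main__":
    raw = [
        {"person_id": "A123", "score": 3},
        {"person_id": "A123", "score": 5},
        {"person_id": "B456", "score": 1},
    ]
    data, register = pseudonymise(raw)
    print(data)  # the same person always gets the same pseudonym
    # Store `register` on a separate, access-controlled system, not next to `data`.
```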
Third, sensitive data at rest (i.e. stored on any type of media) and in transit (i.e. copied over a network) should always be encrypted using commonly accepted encryption algorithms and a sufficient key length.
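For data at rest, a minimal sketch of authenticated encryption with AES-256-GCM is shown below. It assumes the third-party `cryptography` package is installed; the function names are hypothetical. Encryption in transit is normally handled by the transfer protocol itself (for example SFTP or HTTPS) rather than by application code, and in practice the key must be kept separately from the encrypted data.

```python
import os

from cryptography.exceptions import InvalidTag
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

NONCE_SIZE = 12  # 96-bit nonce, the standard size for AES-GCM


def encrypt_bytes(key: bytes, plaintext: bytes, context: bytes = b"") -> bytes:
    """Encrypt with AES-GCM; returns nonce + ciphertext (including auth tag)."""
    nonce = os.urandom(NONCE_SIZE)  # a nonce must never repeat for the same key
    return nonce + AESGCM(key).encrypt(nonce, plaintext, context)


def decrypt_bytes(key: bytes, blob: bytes, context: bytes = b"") -> bytes:
    """Reverse encrypt_bytes; raises InvalidTag if data or context was altered."""
    nonce, ciphertext = blob[:NONCE_SIZE], blob[NONCE_SIZE:]
    return AESGCM(key).decrypt(nonce, ciphertext, context)


if __name__ == "__main__":
    # 256-bit key; keep it in a key management system, not next to the data.
    key = AESGCM.generate_key(bit_length=256)
    blob = encrypt_bytes(key, b"participant P001: diagnosis E11", b"dataset-v1")
    print(decrypt_bytes(key, blob, b"dataset-v1"))
    try:
        decrypt_bytes(key, blob, b"dataset-v2")
    except InvalidTag:
        print("tampering or wrong context detected")
```

GCM is an authenticated mode, so besides hiding the content it also detects any modification of the stored ciphertext.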
Fourth, the data should be completely destroyed when it is no longer needed.
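A best-effort sketch of destroying a single file in Python is shown below: overwrite its contents with random bytes, force the write to disk, then delete it. This is an illustration under strong assumptions, not a guaranteed method. On SSDs, copy-on-write or journaling file systems, network or cloud storage, and wherever backups or other copies exist, overwriting in place does not ensure that the old contents are gone, so the storage provider's own deletion procedures still apply.

```python
import os
import tempfile
from pathlib import Path


def overwrite_and_delete(path: Path, passes: int = 1, chunk_size: int = 1 << 20) -> None:
    """Overwrite a file with random bytes, flush it to disk, then unlink it.

    Best effort only: see the caveats above about SSDs, copy-on-write
    file systems, remote storage and backups.
    """
    size = path.stat().st_size
    with path.open("r+b") as f:
        for _ in range(passes):
            f.seek(0)
            remaining = size
            while remaining > 0:
                n = min(chunk_size, remaining)
                f.write(os.urandom(n))
                remaining -= n
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push the overwrite to the device
    path.unlink()


if __name__ == "__main__":
    # Demo on a throwaway file in a temporary directory.
    demo = Path(tempfile.mkdtemp()) / "subset.csv"
    demo.write_bytes(b"P001,E11\n" * 100)
    overwrite_and_delete(demo)
    print(demo.exists())
```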
Furthermore, for any sensitive data, a Data Controller, who determines the purposes and means of processing the data, must be identified and named. For research data, this would typically be the Principal Investigator, either alone or together with another legal person or entity. A Data Processor, who processes the data on behalf of the controller, must also be identified and named. The GDPR defines 'processing' as any operation performed on personal data, such as collection, recording, organisation, structuring, storage, adaptation or alteration, retrieval, consultation, use, disclosure by transmission, dissemination or otherwise making available, alignment or combination, restriction, erasure or destruction. It follows that the data processor would be the computing facility where the data is processed, such as CSC.
If you need to process sensitive data, ePouta ( https://research.csc.fi/epouta ) might be suitable for you. Its virtual machines do not have a direct internet connection; instead, they are meant to be connected to the customer organisation's local network. Setting up such a connection requires some effort and is usually not suitable for short-term use. Storage capacity can be added to the customer's ePouta environment. Backups are not provided.
Please contact firstname.lastname@example.org to discuss possible solutions.
Read the article: CSC for sensitive data — because your data is worth it (and should be kept that way) by Jaakko Leinonen