Managing Sensitive Data

The EU General Data Protection Regulation (GDPR) does not prescribe exact technical procedures for processing sensitive data, but it does lay out the principles that such processing must follow. The full list is long and the definitions are complex, but a few basic rules stand out.

  1. Minimise the data. Process only the data that is strictly needed. For example, if a dataset includes people's ages but that information is not needed for the research, it should be removed from the dataset before any processing takes place.
  2. Anonymize or pseudonymize the data whenever possible. Anonymization means removing all identifying information from the data so that no individual can be identified from it. Pseudonymization means replacing identifying information in the data with artificial identifiers (pseudonyms) so that an individual cannot be identified from the dataset without additional information.
    Note that pseudonymized data is still sensitive, so all requirements for sensitive data processing continue to apply. The code registry used to map pseudonyms back to individuals should not be stored with the data. Preferably, it should be kept in a completely separate system to minimise the potential damage in case of a data leak.
  3. Encrypt the data. Sensitive data at rest (i.e. stored on any type of media) or in transit (i.e. being copied over the network) should always be encrypted using commonly accepted encryption algorithms and a sufficient key length.
  4. Destroy the data you do not need. The data should be completely destroyed once it is no longer needed.
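
As an illustration of rules 1 and 2, the following is a minimal Python sketch, not a recommended implementation. It assumes the records are plain dictionaries. The function name `minimise_and_pseudonymise` and the field names are hypothetical. Each identifier is replaced with a random pseudonym, only the fields that are needed are kept, and the code registry is returned as a separate object so that it can be stored apart from the data.

```python
import secrets


def minimise_and_pseudonymise(rows, id_field, keep_fields):
    """Return (pseudonymised_rows, code_registry).

    rows        -- list of dicts, one per record (assumed input format)
    id_field    -- name of the field that identifies an individual
    keep_fields -- fields actually needed for the research
    """
    # Maps original identifier -> pseudonym. This registry allows
    # re-identification, so it must be stored separately from the data.
    registry = {}
    pseudonymised = []
    for row in rows:
        original_id = row[id_field]
        if original_id not in registry:
            # Random pseudonym that cannot be derived from the identifier.
            registry[original_id] = "P-" + secrets.token_hex(8)
        # Data minimisation: copy only the fields that are needed.
        minimal = {field: row[field] for field in keep_fields}
        minimal["pseudonym"] = registry[original_id]
        pseudonymised.append(minimal)
    return pseudonymised, registry


# Hypothetical example records; "age" is not needed and is dropped.
participants = [
    {"name": "Alice Example", "age": 34, "diagnosis": "A"},
    {"name": "Bob Example", "age": 51, "diagnosis": "B"},
    {"name": "Alice Example", "age": 34, "diagnosis": "C"},
]

data, registry = minimise_and_pseudonymise(
    participants, id_field="name", keep_fields=["diagnosis"]
)
```

Records belonging to the same person receive the same pseudonym, so they can still be linked within the dataset. The output remains sensitive data: whether a field such as the diagnosis could itself identify someone has to be assessed case by case.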

Remember to identify and name a data controller for your sensitive data. A data controller determines the purposes and means of processing the data, that is, why and how the data is processed. For research data, the data controller is usually the Principal Investigator, either alone or together with another legal person or entity.

You also have to identify and name a data processor, who processes the data on behalf of the controller. The GDPR states that 'processing' means any operation which is performed on personal data, such as collection, recording, organization, structuring, storage, adaptation or alteration, retrieval, consultation, use, disclosure by transmission, dissemination or otherwise making available, alignment or combination, restriction, erasure or destruction. Because the definition is this broad, the data processor is often the computing facility where the data is processed, such as CSC.

More information