Organizing your data

All digital information is structured in some way. When organizing your research data, it is important to create coherent and intelligible entities that are easy to access and reuse.

  • Sort and classify your information
    • For instance, don't mix different types of information in the same Excel column: it is usually easier to combine datasets later than to sort out ill-structured data
  • Think about granularity (how much data each file holds, and therefore file size) and metadata
  • Decide on formats, units, codes etc. and be consistent
    • Use common file formats, preferably open
    • You can find a list of recommended file formats on the page about preservation. If you use other formats, you will need to consider adding technical documentation of the file format.
  • Write a codebook and other documentation. README files are often necessary.
  • Think about intelligibility: will others, or you yourself later on, be able to understand the data?
  • Be careful when rearranging, reformatting, sorting or copy-pasting data
  • Try to avoid including temporary or hidden system files along with actual data files
  • Have processes in place for checking the data quality and completeness
  • Be clear about which copy is the master copy and which are other copies (working copies, backups)
  • Plan carefully for sensitive data and anonymization
  • Think about security and access rights
  • Plan and agree on which versions of a dataset will be archived and/or published
  • Think about reproducibility and citing data
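Some of these points can be partly automated. For example, before archiving or sharing a folder you could scan it for hidden, system and temporary files. The Python sketch below shows one way to do this; the names and patterns it checks (.DS_Store, Thumbs.db, desktop.ini, "~$" Office lock files, ".~lock" LibreOffice lock files, "~" backup files and ".tmp" files) are assumptions about common cases, not a complete list, and the function names are hypothetical.

```python
from pathlib import Path

# Assumed, non-exhaustive patterns for files created by operating systems
# and office software; extend them for your own environment.
SYSTEM_FILE_NAMES = {".DS_Store", "Thumbs.db", "desktop.ini"}
TEMP_PREFIXES = ("~$", ".~lock")
TEMP_SUFFIXES = (".tmp", "~")


def is_unwanted(path: Path) -> bool:
    """Return True if the file name looks like a hidden, system or temporary file.

    Only the file's own name is checked, so ordinary files inside hidden
    folders (e.g. .git) are not flagged.
    """
    name = path.name
    return (
        name in SYSTEM_FILE_NAMES
        or name.startswith(".")  # hidden files on Unix-like systems
        or name.startswith(TEMP_PREFIXES)
        or name.endswith(TEMP_SUFFIXES)
    )


def find_unwanted_files(root: str) -> list[Path]:
    """List files under root that probably should not be archived with the data."""
    return sorted(p for p in Path(root).rglob("*") if p.is_file() and is_unwanted(p))
```

Calling `find_unwanted_files("my_project")` (a hypothetical folder name) returns the matching paths, so you can review them before deleting or excluding anything.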

Files and folders: structuring and naming

It is important to take some time to plan file and folder structures and naming.

  • Create and agree on a system for naming files and folders, and apply it consistently
  • Try to organize files logically using folders and subfolders rather than including all files in a single folder
    • Avoid very deep folder structures, since they can be difficult to handle
  • If your data is time-sensitive and logically organized by time period, it can be useful to organize files in folders named by date, such as YYYY-MM-DD
  • Use meaningful, unique file and folder names
  • Keep file and folder names as short as possible but still meaningful. A maximum of 25 characters is a common recommendation.
  • Dates in YYYY-MM-DD format allow you to sort files chronologically and search them easily
  • Avoid special characters such as % & / \ : ; * . ? < > ^ ! " ( ) and Scandinavian letters (å, ä, ö, æ, ø)
  • Use leading zeros so that numbered files sort correctly: three digits (or four if you have a large number of files), e.g. 001, 002 … 201, 202 (not 1, 2 … 21)
  • Use underscores (_) instead of spaces
  • If a file name includes a personal name, give the surname first, followed by the first name
    • However, be very careful with personal data in file and folder names
  • Indicate the version with "V" or "version" followed by a number, and add further digits for minor changes (e.g. V2 for a new version, V2_1 for a minor revision of it)
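Several of the naming rules above can be written down as a few lines of code, which makes it easier for a team to apply them consistently. The Python sketch below builds and checks file names under an assumed scheme (date, topic, zero-padded number, version). The function names, the scheme itself and applying the 25-character limit to the part before the file extension are illustrative choices, not a standard.

```python
import re
from datetime import date

# Assumed conventions based on the guidelines: ISO dates (YYYY-MM-DD),
# underscores instead of spaces, zero-padded numbers, a "V" version marker,
# and at most about 25 characters before the extension.
MAX_STEM_LENGTH = 25
ALLOWED = re.compile(r"^[A-Za-z0-9_-]+$")  # ASCII letters, digits, underscore, hyphen


def make_file_name(day: date, topic: str, number: int, version: int,
                   extension: str, digits: int = 3) -> str:
    """Build a name such as '2024-03-15_survey_007_V2.csv' (hypothetical scheme)."""
    topic = topic.strip().replace(" ", "_")  # underscores instead of spaces
    stem = f"{day.isoformat()}_{topic}_{number:0{digits}d}_V{version}"
    return f"{stem}.{extension}"


def check_file_name(name: str) -> list[str]:
    """Return the problems found in a file name; an empty list means it passes."""
    stem, _, _ = name.rpartition(".")  # drop the extension
    stem = stem or name                # names without an extension
    problems = []
    if not ALLOWED.match(stem):
        problems.append("contains spaces, special or non-ASCII characters")
    if len(stem) > MAX_STEM_LENGTH:
        problems.append(f"stem longer than {MAX_STEM_LENGTH} characters")
    return problems


if __name__ == "__main__":
    name = make_file_name(date(2024, 3, 15), "survey", 7, 2, "csv")
    print(name)                                     # 2024-03-15_survey_007_V2.csv
    print(check_file_name(name))                    # []
    print(check_file_name("rådata (final).xlsx"))   # ['contains spaces, special or non-ASCII characters']
```

Zero-padding matters because file names are usually sorted as text: `file_21` sorts before `file_3`, whereas `file_003` sorts before `file_021`.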

More reading

The UK Data Service: Format your data