What is GDPR? A compliance guide

GDPR – business or pleasure?
Do you remember GDPR (or, for mnemonic assistance, Gitte and Per)? This is one of three posts on the EU law that everybody feared last year: What did we think it was, what is it, and what effects has it had?

On 25 May 2018, the EU's General Data Protection Regulation (GDPR) took effect. The purpose of GDPR is to give people control over their own personal data and the power to decide who can use it and when. The regulation was written with large corporations in mind, but it also has implications for the academic world, for example when sensitive personal data is used in written assignments.

For students and researchers of linguistics, this means that there are new considerations to make when collecting and storing, for example, conversational data. (See also Researchers hiding in fear of GDPR, ed.)

Here’s a short-cut to compliance:

Is your data sensitive data?

Personal data is defined as “any information that relates to an identified or identifiable living individual. Different pieces of information, which collected together can lead to the identification of a particular person, also constitute personal data.” In other words, as few as two pieces of information that together point to the same person can be enough for your data to count as personal data.

This means your audio or video recordings, names and locations, or un-anonymized transcripts require a secure database suitable for storing sensitive data.
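
One rough way to check whether your metadata can point at a single person is to count how many participants share each combination of values. The sketch below is only an illustration: the field names and the participant records are made up, and a combination that appears once is a warning sign, not a legal test.

```python
from collections import Counter


def unique_combinations(records, fields):
    """Return the value combinations that match exactly one record."""
    counts = Counter(tuple(record[field] for field in fields) for record in records)
    return [combo for combo, n in counts.items() if n == 1]


# Hypothetical participant metadata; the field names are assumptions.
participants = [
    {"age": 34, "town": "Aarhus", "first_language": "Danish"},
    {"age": 34, "town": "Aarhus", "first_language": "Danish"},
    {"age": 61, "town": "Skagen", "first_language": "Danish"},
]

# Two fields are enough to single out one participant here.
risky = unique_combinations(participants, ["age", "town"])
print(risky)  # [(61, 'Skagen')]
```

If the list is not empty, those participants could be identified from the chosen fields alone, so the fields should be removed or made coarser before sharing.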

Know your data! :)

Text, audio, and video data require different amounts of storage space, and different databases accept different file formats.
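
To get a feel for the difference in storage space, here is a back-of-the-envelope estimate for uncompressed audio. It assumes plain PCM audio (as in a WAV file) and ignores the small file header; the default values are common CD-quality settings, not a recommendation.

```python
def wav_size_bytes(seconds, sample_rate=44_100, bit_depth=16, channels=2):
    """Approximate size of uncompressed PCM audio, ignoring the file header."""
    bytes_per_sample = bit_depth // 8
    return seconds * sample_rate * bytes_per_sample * channels


one_hour = wav_size_bytes(3600)
print(f"{one_hour / 1_000_000:.0f} MB")  # 635 MB
```

A transcript of the same hour of conversation is typically a small text file, and video is usually far larger than audio, so it helps to do this estimate before choosing a database.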

You likely also have some metadata (information about participants, languages spoken, place and time, etc.), which can be in different formats, including:

Header information in CLAN
A spreadsheet or text file
.imdi file, a standard format used with multi-media linguistic data

Who can access? And how much are you willing to share?

Decide if you want to share your data or simply organize it for your own use, and if you want to store it locally or have remote access. You can share all your data, such as metadata, transcripts, and recordings, or just some of it.

When sharing online, you can make your data publicly accessible or restrict access via a log-in. Some databases will create a log showing who has accessed your data. You can also share your data by request and create your own log.
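
If you share by request and keep your own log, a simple spreadsheet-style file can be enough. The following is a minimal sketch, assuming a CSV file and four columns that we chose for illustration (timestamp, person, dataset, purpose); the function name and file name are hypothetical.

```python
import csv
from datetime import datetime, timezone
from pathlib import Path

# Column names are an assumption; adapt them to what you need to record.
LOG_FIELDS = ["timestamp", "person", "dataset", "purpose"]


def log_access(log_path, person, dataset, purpose):
    """Append one access record to a CSV log, writing the header on first use."""
    log_path = Path(log_path)
    is_new = not log_path.exists()
    with log_path.open("a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=LOG_FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "person": person,
            "dataset": dataset,
            "purpose": purpose,
        })


# Example call (hypothetical file name and values):
# log_access("access_log.csv", "A. Researcher", "interview recordings", "MA thesis")
```

Keep in mind that the log itself contains names, so it should be stored as carefully as the data it describes.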

Know your limits

If you don’t have a lot of time to organize your data, or if you need a solution soon, look for an existing system that requires minimal metadata. If your data is already organized, and you have more metadata, you can take advantage of databases with more functionality. If you have plenty of resources and IT expertise, you can create a database from scratch.

Best practice and recommendations

Use a clear consent form
Keep a log of who has accessed sensitive personal data and for what purpose
Your subjects have the right to know what personal data you have, what it is used for, and by whom. Subjects can withdraw their consent at any time.
If there is a security breach, it must be reported to the data protection authority, and your subjects must be notified if the breach puts them at high risk
You cannot put sensitive data online (in Google Docs, Dropbox, etc.) or store it on your network-connected computer, where it can be hacked or stolen. The safest option is to put the original data on a hard drive and keep it locked up.
Look at what other people in your field are using. Contact them for more information.
If you plan to share your metadata with others, make sure it is anonymised, but keep the original information as a backup.
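
The last recommendation can be sketched in a few lines. This is only an illustration, assuming your metadata is a list of rows with a "name" column and an "address" column (both field names are made up): names are replaced with participant codes, directly identifying fields are dropped, and the key linking codes to names is returned separately.

```python
def pseudonymise(rows, name_field="name", drop_fields=("address",)):
    """Replace names with stable codes and drop the listed fields.

    Returns (shareable_rows, key), where key maps each code to the original name.
    """
    codes = {}
    shareable = []
    for row in rows:
        name = row[name_field]
        if name not in codes:
            # Codes are assigned in order of first appearance: P001, P002, ...
            codes[name] = f"P{len(codes) + 1:03d}"
        clean = {
            field: value
            for field, value in row.items()
            if field != name_field and field not in drop_fields
        }
        clean["participant"] = codes[name]
        shareable.append(clean)
    key = {code: name for name, code in codes.items()}
    return shareable, key


# Hypothetical metadata rows.
metadata = [
    {"name": "Anna", "address": "Street 1", "language": "Danish"},
    {"name": "Bo", "address": "Street 2", "language": "English"},
    {"name": "Anna", "address": "Street 1", "language": "Danish"},
]

shareable, key = pseudonymise(metadata)
print(shareable[0])  # {'language': 'Danish', 'participant': 'P001'}
```

Note that as long as the key exists, the shared rows are pseudonymised, not fully anonymous, so the key belongs with the original data on the locked-up drive and should never be shared.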

Some Linguistic Databases

TalkBank is a website with several subcorpora such as CHILDES, SamtaleBank, AphasiaBank, and others. TalkBank is a US-based member of CLARIN.
CLARIN-DK collects all kinds of anonymized linguistic data and metadata.
ELAR (The Endangered Language Archive) at SOAS University of London is “a digital repository preserving and publishing endangered language documentation materials from around the world.”

You can also choose to keep your data locally on an encrypted server or on a securely stored external drive. An example of this is DanTIN (Danish Talk In Interaction) at the University of Aarhus. CLAN also allows for metadata search if you follow specific transcription conventions.

We all get by with a little help from our friends…

DIGHUMLAB is a national group “promoting tools, digital resources, communities, and opportunities to Danish researchers in the humanities and the social sciences”. They can offer advice and guidance on your personal project.
File Sender is a Danish system to securely send large files using the WAYF identification system.
WeTransfer is a platform you can use to securely send your sensitive data files.
CMDI Maker is an online resource to easily make .imdi files from your metadata. It is part of CLASS – Cologne Language Archive Services.

Emily Melsen Jørgensen, Phoebe Berke, Giordana Meloni and Carina Ryge Andersen researched this subject as part of their MA in Linguistics in fall 2017. The information has been updated and revised in spring 2018.