Release Data from Silos…, but Triumph in the Paper War First

A guest post by Juha Pajula from VTT

The traditional way to define research projects in health (and other) domains is first to define a challenge to study and how to approach it. Then, once the idea has crystallized, a research plan is written around it, often heavily influenced by the requirements of the funding instrument or the thesis supervisor. When the plan is accepted and funding confirmed, the project team contacts data owners or starts collecting data from clinical trials. Ideally, when enough data is available, the actual analysis and research are carried out, and the results are reported. This has worked well in the past, but now, in the GDPR era, various regulations, stricter data security requirements and dedicated measures for the ethics of data usage must all be satisfied before researchers can even start to access the precious data.

What happens now with this traditional setup? In practice, when a team is due to start collecting data, they first need to write a new research plan that defines the specific variables needed from the data sources and justifies why they are needed for the research. Additionally, in the health domain and especially with personal data, they need to clarify ethical and other considerations on top of the new research plan and get all of this accepted by lawyers from their own institution as well as the data owners’ institutions. Often, there are also limitations on data transfer: the data may not be allowed to leave the data owner’s environment, or the data owner may impose significant security requirements on where and how the data may be stored and accessed. If there is more than one data source, most of these “research plan-ethics-data access” exercises must be repeated separately for each one. In some cases, an additional agreement is even needed to allow the different data sources to be combined. All of this takes months or, in worst-case scenarios, years. For a project that lasts a few years (as is typical), this is far too long and complex to be sustainable, and for many projects it poses a major risk to reaching objectives on time.

So can researchers do something to avoid falling into this pit of sheer, endless bureaucracy? Is there a better way to do this? The personal security concerns and GDPR-related regulation are here to stay; they won’t disappear. Fortunately, researchers can take the problems described above into account earlier. If research projects are planned with the GDPR and related ethics as a guideline, some of the problems can be moved out of the actual project work, saving project resources as well as time for all parties involved: researchers, data owners, regulators, lawyers, etc. What does this mean in practice? In short: once the research approach and methods are known and research planning has started, the data owners and even lawyers from all sides should be involved in writing the research plan as early as possible. In this way, the applicable regulation and the possible limitations arising from the nature of the data are considered from the beginning. When challenges are taken into account at an early phase, it also becomes possible to use the same, or almost the same, research plan when requesting the data and when seeking the necessary security clearances from the data owner’s security personnel. As for the targeted research outcomes compared with the original plan, this “GDPR-proof” approach ensures that researchers don’t promise things they cannot do with the data!

Sounds nice, so let us change the way we do this! In practice, this is still challenging. Even if researchers are ready to change the way they work, many relevant parties do not have the resources to participate in a research planning process as described above. To make all of this possible in practice, data owners should also have some basic resources of their own to support the planning work. Data owners should also know their data and how they can deliver it to researchers. During the MIDAS project, we witnessed multiple actions in Ireland, Northern Ireland, Finland and the Basque Country to support this through Honest Broker initiatives. As an example, in Finland, the new data authority named Findata will start operating under the data access regulations in April 2020 to handle all the required paperwork for all health register data in Finland.

As the COVID-19 virus spreads around the globe, many research institutions, along with the data industry, are looking at options for gathering health data related to COVID-19 patients. Given the fast pace at which the virus is spreading, it will be interesting to follow how the regulation of data sharing is adapted so that researchers all around the world get the data they need and can run their analytics as fast as possible to overcome the COVID-19 challenge.