AI and Data Access: Lessons learned in MIDAS

A guest post by Gorana Nikolic from KU Leuven

With the increased use of AI in international projects involving large and diverse consortia, there is a strong need to rethink the order of actions taken at the beginning of a project. But what are those actions, and why would we need to reconsider them?

In the traditional approach, after the project grant/definition has been written and the consortium members have been chosen, what often follows is a painstaking and, more often than not, long-drawn-out process of legal discussions, agreements, NDAs, data access agreements (DAAs) and so on. A significant amount of time is spent going back and forth between various legal, ethical and privacy committees, making sure that everything is in order before the data can be accessed. What is the reason for this?

One of the reasons is a lack of understanding of the data: not all stakeholders are aware of what their data really contains. Understanding what data they hold in private data silos, and what data they actually need, would pave the way for formulating better research questions and requirements in the phase where the project is being defined.

Another reason is the various legal restrictions imposed on sharing and using data. The recently introduced GDPR is the present-day "must-comply" rule. But rather than treating it as a constraint, the GDPR can be used as a guideline or tool for creating a safe environment without the need to aggregate the data on one local machine. Used correctly, the GDPR can be a great quality assurance tool that encourages stakeholders to take the necessary steps before the project begins. Through this process, it would be possible to raise stakeholders' awareness of their data and help them write better project research questions.

Those are only some of the topics tackled during the "AI and Data Access" panel at the MIDAS Symposium in Londonderry. The feedback received from the panel discussion and the consortium confirmed that industry and academic partners alike struggle with similar issues.

All of this seems to point to one thing: there is a need for a universal protocol shared by both industry and academia, one that connects ethical, privacy and legal committees, stimulates stakeholders to share practices, and makes use of the GDPR not as a limiting factor but as a quality assurance tool. And, most importantly, one that ensures the discussions and agreements about data access happen before the project starts. The future lies in setting such a data access process in motion and working towards a clear guideline that could become a standard protocol in the next 15-20 years. This would help bridge the trust gap between health data owners and technical data processors, making the use of data safe and beneficial for both sides.