Guest post by Joao Pita Costa (Quintelligence)
As the world starts to realise the potential global impact of the Coronavirus (COVID-19), which has just been declared a pandemic by the World Health Organisation (WHO), and as the outbreak spreads rapidly across Europe, the MIDAS project provides monitoring and research tools that can help us better understand, and potentially predict, its impact. This is in line with the global effort of the WHO and with the recent release of a news monitoring dashboard, an initiative of the UNESCO Research Centre for Artificial Intelligence (IRCAI), also hosted in Ljubljana, Slovenia.
With the aim of providing meaningful integration of data analytics and services (MIDAS) to support decision-making in Public Health institutes across Europe, the MIDAS Horizon 2020 European project can also be tuned to monitor and better understand COVID-19. It allows users to explore worldwide media by configuring the topic to be monitored over a real-time stream of more than 100,000 news articles per day. Alongside this, it enables health professionals to explore MEDLINE, the open biomedical research dataset that feeds PubMed, the well-established medical science search engine used worldwide.
The MIDAS news monitoring tool is fed by Event Registry, which collects and analyses news articles in real time and offers rich visualisations for exploring health topics of interest. Based on the same system, the UNESCO Research Centre for Artificial Intelligence (IRCAI) yesterday released a news monitoring dashboard dedicated to COVID-19. It shows news on the outbreak in real time and lets users explore it country by country. The system will also try to predict related events based on the collected data.
Taking it from here, the MIDAS news dashboard allows the user to explore the news further through its data visualisation modules, which include related concepts, entities and categories, and even the sentiment of the articles selected by the query. To make the dashboard multilingual, the search query uses Wikipedia concepts rather than plain keywords: Coronavirus (the virus family to which the COVID-19 virus belongs, available in 86 languages), Coronavirus disease 2019 (the disease COVID-19 itself, available in 67 languages), and 2019–20 Coronavirus pandemic (the pandemic itself, available in 87 languages). In the timeline of Figure 3, we can trace news articles about the Coronavirus in Italy back to January 20 this year, when they discussed the triage of passengers travelling from Wuhan, China. We can also access the article sources and, in some cases, their impact on social media. The MIDAS news dashboard further allows the user to explore related events in full, as well as related news and the timeline of news releases per country. The Event Registry technology behind the MIDAS news dashboard also allows users to build and share monitoring dashboards, such as in this example dedicated to the Coronavirus from the perspective of worldwide news.
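The point of querying by Wikipedia concept is that an article matches whatever its language, as long as it has been annotated with one of the concepts. Below is a minimal sketch of that idea, assuming a toy in-memory list of articles: it keeps articles annotated with any of the three concepts (OR semantics) and builds a per-country, per-day timeline similar in spirit to the one in Figure 3. The `Article` record, its fields and the sample data are hypothetical; they are not Event Registry's data model or API.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field
from datetime import date

# Wikipedia concept titles used by the dashboard query (from the post).
COVID_CONCEPTS = frozenset({
    "Coronavirus",
    "Coronavirus disease 2019",
    "2019–20 Coronavirus pandemic",
})


@dataclass
class Article:
    """Hypothetical minimal article record (not Event Registry's schema)."""
    lang: str
    country: str
    published: date
    concepts: set = field(default_factory=set)


def matches(article, concepts=COVID_CONCEPTS):
    """True if the article carries at least one query concept, in any language."""
    return not article.concepts.isdisjoint(concepts)


def timeline_by_country(articles, concepts=COVID_CONCEPTS):
    """Count matching articles per country and per publication day."""
    timeline = defaultdict(Counter)
    for article in articles:
        if matches(article, concepts):
            timeline[article.country][article.published] += 1
    return timeline


if __name__ == "__main__":
    # Made-up sample articles, for illustration only.
    sample = [
        Article("it", "Italy", date(2020, 1, 20), {"Coronavirus"}),
        Article("es", "Spain", date(2020, 2, 25), {"Coronavirus disease 2019"}),
        Article("en", "Italy", date(2020, 2, 24), {"Stock market"}),
    ]
    for country, days in sorted(timeline_by_country(sample).items()):
        print(country, sorted(days.items()))
```

The Italian and Spanish articles both match even though they share no keywords, because matching happens on concepts; the third article is dropped since it carries none of them.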
PubMed has been freely available since 1997, providing access to references and abstracts on life sciences and biomedical topics. MEDLINE is the underlying open database, maintained by the United States National Library of Medicine (NLM) at the National Institutes of Health (NIH). It stores structured information on more than 27 million records dating from 1946 to the present. MeSH, the comprehensive controlled vocabulary associated with the MEDLINE dataset, provides a functional system for indexing both journal articles and books in the life sciences. Most articles in MEDLINE are annotated by human indexers with MeSH descriptors and qualifiers. These descriptors let the user explore a given biomedical topic using curated information made available by the NIH. MeSH is composed of 16 major categories (covering anatomical terms, diseases, drugs, etc.), which are further subdivided, from the most general to the most specific, into up to 13 hierarchical levels.
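MeSH encodes this hierarchy in tree numbers: a category letter followed by dot-separated segments, where each prefix identifies an ancestor node. The minimal sketch below shows how the top-level category, the ancestors, the depth and subtree membership (for example, whether a heading sits under a given family) can be read off a tree number. It assumes tree numbers are already available as strings; the example tree number is illustrative and not checked against a MeSH release, and counting each segment as one level is one possible convention for the depth mentioned above.

```python
# The 16 top-level MeSH categories, keyed by the letter that starts each tree number.
MESH_CATEGORIES = {
    "A": "Anatomy",
    "B": "Organisms",
    "C": "Diseases",
    "D": "Chemicals and Drugs",
    "E": "Analytical, Diagnostic and Therapeutic Techniques, and Equipment",
    "F": "Psychiatry and Psychology",
    "G": "Phenomena and Processes",
    "H": "Disciplines and Occupations",
    "I": "Anthropology, Education, Sociology, and Social Phenomena",
    "J": "Technology, Industry, and Agriculture",
    "K": "Humanities",
    "L": "Information Science",
    "M": "Named Groups",
    "N": "Health Care",
    "V": "Publication Characteristics",
    "Z": "Geographicals",
}


def category(tree_number):
    """Top-level category of a tree number, e.g. 'B04.820' -> 'Organisms'."""
    return MESH_CATEGORIES[tree_number[0]]


def ancestors(tree_number):
    """All tree numbers on the path from the top of the branch down to this node."""
    parts = tree_number.split(".")
    return [".".join(parts[: i + 1]) for i in range(len(parts))]


def depth(tree_number):
    """Depth of a node, counting each dot-separated segment as one level."""
    return len(tree_number.split("."))


def is_under(tree_number, ancestor):
    """True if `tree_number` lies in the subtree rooted at `ancestor`."""
    # Compare whole segments so that 'B04.8201' is not mistaken for a child of 'B04.820'.
    return tree_number == ancestor or tree_number.startswith(ancestor + ".")


if __name__ == "__main__":
    node = "B04.820.578"  # illustrative tree number only
    print(category(node), depth(node), ancestors(node))
    print(is_under(node, "B04.820"), is_under("B04.8201", "B04.820"))
```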
Although a supplementary concept for COVID-19 was introduced very recently, on January 13, the MeSH heading Coronavirus was introduced in 1994 and predates the new virus, SARS-CoV-2; it is located in the MeSH tree under the Coronaviridae family. Articles hand-annotated with this MeSH heading can help researchers better understand the new virus through the available scientific literature. With this specific aim, the MIDAS platform offers an exploratory tool (see Figure 1, above) that lets the user explore published research based on a query (supporting the powerful Lucene query syntax) and a target visualisation with which the user interacts to explore the subtopics that the results relate to. An example of such a query is MeshHeadingList.desc:"coronavirus", which returns all articles in MEDLINE that were hand-annotated with the MeSH heading Coronavirus. This query can be combined with others using the AND, OR and NOT operators, as in the example
MeshHeadingList.desc:"coronavirus" NOT ChemicalList.NameOfSubstance:"Viral Proteins"
which returns the set of all articles annotated with the MeSH descriptor Coronavirus but not with the substance Viral Proteins. The MIDAS platform also provides an exploratory dashboard that gives access to all MEDLINE records so that they can be explored directly (see Figure 4). The records are stored as JSON documents in an Elasticsearch database, a robust and well-established technology.
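Because the records live in Elasticsearch, a Lucene-syntax query like the one above can be sent to it through the `query_string` query. The sketch below only builds the JSON search body. It assumes the field names used in the post (`MeshHeadingList.desc`, `ChemicalList.NameOfSubstance`) exist in the index mapping; the helper names and the index name `medline` in the comment are hypothetical.

```python
import json


def phrase(field, value):
    """Build a Lucene phrase clause field:"value", escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'


def search_body(lucene_query, size=10):
    """Wrap a Lucene-syntax string in an Elasticsearch query_string search body."""
    return {"size": size, "query": {"query_string": {"query": lucene_query}}}


# Articles annotated with Coronavirus but not with the substance Viral Proteins.
QUERY = (
    phrase("MeshHeadingList.desc", "coronavirus")
    + " NOT "
    + phrase("ChemicalList.NameOfSubstance", "Viral Proteins")
)
BODY = search_body(QUERY)

if __name__ == "__main__":
    # This body could be POSTed to the _search endpoint of the index,
    # e.g. POST /medline/_search for a hypothetical index named "medline".
    print(json.dumps(BODY, indent=2))
```

Building clauses with a helper rather than by string concatenation keeps quoting consistent, which matters once user-typed substance or heading names contain quotes.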
Using the MIDAS MEDLINE dashboard does not require much technical knowledge, so a typical health professional can explore MEDLINE on his or her own. It also enables the user to rapidly build a selection of data visualisation modules over the queried data (saved as a subset of records). These modules include a variety of charts, tag clouds, heat maps and lists, and are easily configurable from templates. They are provided by Kibana, the open-source data visualisation dashboard built for Elasticsearch, which is also a well-established technology. Users can build and share templates composed of different visualisation modules, such as in this example dedicated to the Coronavirus from the MEDLINE point of view. From the visualisation modules, the user can detect useful insights (e.g., health topics and substances related to the Coronavirus) and retrieve the related scientific articles by using precise Lucene syntax in their queries, as described above.
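Kibana tag clouds and bar charts are typically backed by Elasticsearch aggregations. As a rough sketch of what such a module computes, the code below builds a `terms` aggregation over the substances of the articles matching a Lucene query, then turns the response buckets into (term, count) pairs. The `.keyword` sub-field and the aggregation name `top_terms` are assumptions about the index mapping, and the response in the usage example is made up.

```python
def top_terms_body(lucene_query, field, size=20):
    """Search body returning only a terms aggregation (no hits) for the given query."""
    return {
        "size": 0,  # skip the hits; only the aggregation is needed
        "query": {"query_string": {"query": lucene_query}},
        "aggs": {"top_terms": {"terms": {"field": field, "size": size}}},
    }


def buckets_to_counts(response):
    """Extract (term, document count) pairs from a terms aggregation response."""
    buckets = response["aggregations"]["top_terms"]["buckets"]
    return [(b["key"], b["doc_count"]) for b in buckets]


if __name__ == "__main__":
    body = top_terms_body(
        'MeshHeadingList.desc:"coronavirus"',
        "ChemicalList.NameOfSubstance.keyword",  # assumed keyword sub-field
        size=10,
    )
    # Made-up response, shaped like an Elasticsearch terms aggregation result.
    fake_response = {
        "aggregations": {
            "top_terms": {
                "buckets": [
                    {"key": "Substance A", "doc_count": 42},
                    {"key": "Substance B", "doc_count": 17},
                ]
            }
        }
    }
    print(body)
    print(buckets_to_counts(fake_response))
```

The resulting pairs are what a tag cloud would size its words by; clicking a term in the dashboard then corresponds to adding a clause on that substance to the Lucene query.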
One of the most innovative technologies derived from research within the MIDAS project is the MeSH classifier. Given any snippet of text, it uses advanced text mining algorithms to assign MeSH classes to it, so it can be used to classify news, reports and health records with this well-accepted health taxonomy. The system learned from the hand-annotated MeSH headings of the scientific articles in MEDLINE and uses similarity reasoning to identify the most prominent classes with which to annotate the provided text. It offers a web portal (see Figure 6) and an API for a variety of uses. The web portal shows the ranking of the assigned MeSH categories, their similarity percentage and the MeSH tree branches to which each class belongs.
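The post does not detail the classifier's algorithm. To illustrate what similarity reasoning over hand-annotated articles can look like, the sketch below swaps in a simple, well-known technique: k-nearest-neighbour voting with bag-of-words cosine similarity. Each of the k most similar annotated abstracts votes for its MeSH headings with a weight equal to its similarity, and the votes are normalised to percentages. The tiny annotated corpus is made up, and this is not the MIDAS classifier.

```python
import math
import re
from collections import Counter, defaultdict


def vectorize(text):
    """Bag-of-words term counts over lower-cased alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(text, annotated, k=3, top_n=5):
    """Rank MeSH headings for `text` by similarity-weighted votes of its k nearest neighbours.

    `annotated` is a list of (abstract, [MeSH headings]) pairs.
    Returns (heading, percentage of the total vote) pairs, best first.
    """
    query = vectorize(text)
    neighbours = sorted(
        ((cosine(query, vectorize(doc)), headings) for doc, headings in annotated),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    votes = defaultdict(float)
    for similarity, headings in neighbours:
        if similarity > 0:  # ignore neighbours sharing no words with the text
            for heading in headings:
                votes[heading] += similarity
    total = sum(votes.values())
    ranked = sorted(votes.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
    return [(heading, round(100 * v / total, 1)) for heading, v in ranked]


# Made-up annotated abstracts, for illustration only.
ANNOTATED = [
    ("coronavirus infection of the respiratory tract in bats", ["Coronavirus", "Chiroptera"]),
    ("influenza virus vaccine trial in elderly patients", ["Influenza Vaccines", "Aged"]),
    ("spike protein of coronavirus binds the host receptor", ["Coronavirus", "Viral Proteins"]),
]

if __name__ == "__main__":
    print(classify("new coronavirus respiratory infection", ANNOTATED, k=2))
```

A production system would use richer text representations and far more training data, but the shape is the same: similar annotated documents lend their labels, weighted by how similar they are.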
To extend the impact of the MeSH classifier, the MIDAS project integrated it with the news dashboard. This integration allows the user to combine MeSH headings with keywords when exploring a news topic. It also affects some of the data visualisation modules provided by the MIDAS news dashboard. One example is the Article Categories option, which shows the distribution of the news articles returned by the search over the related MeSH classes. In the example of Figure 7, 6.64% of the news on the Coronavirus deals with topics related to the MeSH heading Organisms/Viruses/RNA Viruses.
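One simple way to read a figure like 6.64% is as the share of the returned articles assigned to a given MeSH class. The sketch below computes such shares from per-article class lists. The post does not say whether the dashboard normalises by articles or by category assignments, so this definition is an assumption, and the counts in the usage example are hypothetical.

```python
from collections import Counter


def category_shares(article_classes):
    """Percentage of articles assigned to each MeSH class.

    `article_classes` holds one list of MeSH classes per article. An article
    counts once per class, so the shares can add up to more than 100%.
    """
    n = len(article_classes)
    if n == 0:
        return {}
    counts = Counter(c for classes in article_classes for c in set(classes))
    return {c: round(100 * k / n, 2) for c, k in counts.most_common()}


if __name__ == "__main__":
    # Hypothetical result set: 664 of 10,000 articles tagged with the RNA Viruses branch.
    articles = [["Organisms/Viruses/RNA Viruses"]] * 664 + [[]] * 9336
    print(category_shares(articles))
```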
The MIDAS project has also worked with European Public Health initiatives such as Influenzanet. To this end, we have refocused some of the MIDAS technology (in particular, the MEDLINE dashboard) to offer an open-source, easy-to-use exploratory tool, Influenzanet Hubs, that the national Influenzanet hubs can deploy to better understand their own data with few technical requirements. The European leaders of this initiative are using their resources to track COVID-19 with the help of online volunteers, who previously provided weekly online information on influenza-like illness by reporting their symptoms. MIDAS hopes to contribute on this front, too, to the fight against the COVID-19 outbreak.