Metadata – the key to unlocking data utility

Guest blog post by Susan Campbell from Business Services Organisation

What is metadata?

Metadata is structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use or manage an information resource. Metadata describes the data characteristics such as type, length, and value ranges. It also provides information on completeness, context, reliability and quality.
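
To make this concrete, here is a minimal sketch in Python of what a variable-level metadata record might look like and how it could be used to check values and measure completeness. The class name, field names and the `age_years` example are all hypothetical, chosen for illustration only; they are not taken from any HSC dataset or metadata standard.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VariableMetadata:
    """Descriptive record for one column of a dataset (illustrative fields only)."""
    name: str
    description: str
    data_type: type
    max_length: Optional[int] = None    # maximum number of characters, if any
    min_value: Optional[float] = None   # lowest expected value, if any
    max_value: Optional[float] = None   # highest expected value, if any


def conforms(value, meta: VariableMetadata) -> bool:
    """Return True if a value matches the type, length and range in its metadata."""
    if not isinstance(value, meta.data_type):
        return False
    if meta.max_length is not None and len(str(value)) > meta.max_length:
        return False
    if meta.min_value is not None and value < meta.min_value:
        return False
    if meta.max_value is not None and value > meta.max_value:
        return False
    return True


def completeness(values) -> float:
    """Return the fraction of values that are present (not None)."""
    if not values:
        return 0.0
    return sum(v is not None for v in values) / len(values)


# Hypothetical variable, used only to exercise the functions above.
age_meta = VariableMetadata(
    name="age_years",
    description="Age in completed years",
    data_type=int,
    max_length=3,
    min_value=0,
    max_value=120,
)

print(conforms(34, age_meta))             # within type, length and range
print(conforms(150, age_meta))            # outside the expected range
print(completeness([34, None, 51, 78]))   # share of non-missing values
```

Even a record this small answers the questions a new user tends to ask first: what the column means, what values to expect, and how much of it is filled in.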

Why is it important?

This graph illustrates the phenomenon of “information entropy”. At the time of data development, the developers know the most about their dataset and the steps that were taken to create it. Over time, memory of the details begins to fade. Circumstances in life can intervene, and eventually the knowledge about the dataset is gone. Without a metadata record, information about the dataset could be lost forever, making the data unusable.

A good metadata record provides all of the critical information for discovery, understanding, and reuse.

Metadata allows data developers to:

  • Avoid data duplication
  • Share reliable information
  • Save time and resources in the long run

Metadata gives a data user the ability to:

  • Search, retrieve, and evaluate dataset information from both inside and outside an organization
  • Find data: determine what data exists for a geographic location and/or topic
  • Determine applicability: decide whether a dataset meets a particular need
  • Discover how to acquire, process and use the dataset identified
  • Understand the dataset, including definitions of column names and expected numerical ranges found in the data
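
As a rough illustration of the "find data" and "understand the dataset" points, the sketch below searches a small in-memory catalogue of metadata records by topic and geographic coverage. Everything in it is an assumption made for the example: the `DatasetRecord` fields, the `find_datasets` function and the two catalogue entries are invented, and a real catalogue would normally sit in a database or a published metadata file.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """Dataset-level metadata (hypothetical fields for illustration)."""
    title: str
    topics: set            # lower-case topic keywords
    coverage: str          # geographic area the data describes
    access: str            # how to request or obtain the dataset
    columns: dict = field(default_factory=dict)   # column name -> definition


def find_datasets(catalogue, topic=None, coverage=None):
    """Return the records matching the given topic and/or geographic coverage."""
    matches = []
    for record in catalogue:
        if topic is not None and topic.lower() not in record.topics:
            continue
        if coverage is not None and coverage.lower() != record.coverage.lower():
            continue
        matches.append(record)
    return matches


# Invented catalogue entries, not real datasets.
catalogue = [
    DatasetRecord(
        title="Example prescribing extract",
        topics={"prescribing", "primary care"},
        coverage="Region A",
        access="Apply via the data custodian",
        columns={"item_count": "Number of items dispensed"},
    ),
    DatasetRecord(
        title="Example emergency attendances",
        topics={"emergency care"},
        coverage="Region B",
        access="Open download",
    ),
]

hits = find_datasets(catalogue, topic="Prescribing", coverage="region a")
for record in hits:
    print(record.title, "|", record.access)
    for column, definition in record.columns.items():
        print("  ", column, "=", definition)
```

The point of the design is that the user never has to open the data itself to decide whether it is relevant: the search, the access route and the column definitions all come from the metadata.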

Metadata helps protect an organization’s investment in data. It:

  • Documents data processing steps, quality control, definitions, data uses, and restrictions
  • Makes it possible to use data beyond its initial intended purpose
  • Allows the organization to track data use and facilitates publication
  • Removes reliance on single members of staff
  • Advertises the organization’s willingness to facilitate research

What hinders good metadata?

Metadata does require time and effort to create. The workload can be reduced when metadata creation is incorporated into the data development process and the effort is distributed among data contributors. Metadata creation and ongoing management should be treated as a standard data development procedure, and staff and time resources should be included in project and proposal work plans and budgets.

NI Honest Broker Service & Metadata

When the NI Honest Broker Service (HBS) was established in 2014, no formal metadata existed for the Health and Social Care (HSC) data being made available through the service. Applicants to the service were often not clear on data availability, interpretation, reliability or quality. This was a barrier to potential projects and a source of delay to approved projects.

Since the launch, metadata has been developed for the Enhanced Prescribing Database, Emergency Department Activity dataset and a subset of the Hospital Inpatient Activity dataset. These documents were shared with the MIDAS data ingestion team. Metadata still varies greatly in coverage and quality across the 20+ datasets available through the HBS.

As the HBS had no internal capacity, the decision was taken in 2016 to commission the development of metadata for the Northern Ireland Maternity system (NIMATS). NIMATS records data on each birth in Northern Ireland hospitals irrespective of the residence area of the mother. The dataset had already been used on several occasions for health-related research. The work was undertaken on a part-time basis by staff in the NI Public Health Agency, one of whom had extensive experience in working with the dataset.

The metadata is presented as an Excel document and firstly provides a description of NIMATS and its data:

Variables were selected from the full dataset based on expected potential research value and then prioritised. Of these, 316 were deemed high priority and included in Phase 1, and a further 442 were included in Phase 2.

One unique aspect of the metadata is that it matches the variable with its module/screen location in the NIMATS system and with its object name and location in the reporting tool (SAP Business Objects). An example of the metadata for the “SMOKING” variable is shown below.

Once completed and reviewed, the metadata was published on the NI Honest Broker Service website and was referenced by a potential researcher on the first day it was available.

Lessons learned

Following completion of the work, a Lessons Learned review highlighted some issues and made a number of recommendations for future metadata development projects:

  • Relevant expertise – always have at least one person familiar with the application system and business area (Subject Matter Expert) for which metadata is being developed
  • System developer input – include system developer/technical resource time in budget
  • Allocation of time – need to set aside dedicated protected time to complete the work
  • Be realistic – pilot the template before moving to the main stage of the project
  • Work in parallel, not in series – agree at the outset which fields are to be completed by the system developer, and have both teams working on templates at once
  • Categorise template fields by user – consider excluding/hiding certain categories from publicly available metadata file
  • Moving goal-posts (e.g. a series of updates to the application system) – keep to a policy of making no amendments to the application system without new/updated metadata being supplied

What still needs to be done

A plan for developing more metadata needs to be agreed and resourced. The plan should focus on commonly used datasets already available through the HBS. The metadata needs to be developed by Subject Matter Experts and reviewed by HBS.

Consideration should be given to mandating production of metadata as a prerequisite to bringing datasets into the NI HSC Regional Data Warehouse.

A process for periodic review and updating of metadata also needs to be established.

Acknowledgements

With thanks to Dr Diane Anderson, Joanne Murphy and Adele Graham (Public Health Agency)

“lock” is licensed under CC0 1.0

Some material sourced from https://www.dataone.org/education-modules