
Challenges and Opportunities in Clinical Trial Registry Data
By Philipp Diesinger

Dr Philipp M Diesinger has held key roles in data science since 2009. His career includes a postdoctoral position at the Massachusetts Institute of Technology, a role as Data Science Consultant at SAP, and Head of Global Data Science at Boehringer Ingelheim. Philipp’s specialisms include predictive analytics and machine learning.
In this post, Philipp Diesinger and his team give us an overview of the current challenges in the realm of clinical trial data. High-quality data is crucial for a trial’s success, but clinical trial registry data suffers from accessibility, quality and completeness issues. The solution, Diesinger argues, lies in harnessing emerging technologies:

Clinical trials are research studies conducted to evaluate the safety, efficacy, and potential side effects of medical treatments, drugs, devices, or other interventions. Such studies are crucial for advancing medical knowledge, developing new therapies, and improving patient care. Clinical trials typically follow a structured protocol that outlines the objectives, methodology, participant eligibility criteria, treatment regimens, and data collection procedures.

Regulatory environments play a crucial role in overseeing and governing clinical trials to ensure participant safety, data integrity, and ethical conduct. Regulatory bodies such as the Food and Drug Administration (FDA) in the United States, the European Medicines Agency (EMA) in Europe, and similar agencies in other countries set forth relevant guidelines, regulations, and approval processes. These regulations cover various aspects, including trial design, participant recruitment, informed consent, data collection and reporting, and adherence to good clinical practice standards.

Trial ‘sponsors’ are individuals, organisations, companies, or institutions that take primary responsibility for initiating, managing, and funding a clinical trial. The sponsor plays a central role in all phases of the trial, from protocol development to study completion and reporting. It is the sponsor’s responsibility to ensure that every clinical trial is registered, and that its results are reported accurately and comprehensively, in the relevant clinical trial registries, such as the US NIH platform ClinicalTrials.gov [1].

Trial registries play a crucial role in the process of clinical trials. They promote transparency by providing a centralised platform for researchers and sponsors to publicly register details about their clinical trials even before they begin.

This includes information such as the study objectives, methodology, participant eligibility criteria, interventions, outcomes, and contact information. Trial registries help prevent publication bias, which occurs when studies with positive results are more likely to be published than those with negative or inconclusive results.

Trial registries serve as valuable repositories of information about ongoing, planned and completed clinical trials. Researchers, healthcare professionals, patients, and the general public can access trial registries to identify relevant studies, learn about study design and objectives, assess eligibility criteria for participation, and obtain contact information for trial investigators or sponsors. This helps to facilitate collaboration, improve awareness of available research opportunities, and support informed decision-making by stakeholders.

Trial registries help prevent unnecessary duplication of research efforts by enabling researchers to identify existing trials on similar topics or populations. This is especially important amid ever-increasing regulation and the economic considerations that influence clinical research [2].

Like many other big data sets, clinical trial registry data suffers from accessibility, quality and completeness issues. Many clinical trials are conducted across multiple countries resulting in fragmentation of relevant information across multiple separate registry databases. This fragmentation poses several challenges. With clinical trial data spread across registries, researchers, healthcare professionals, policymakers, and patients may face difficulties conducting comprehensive searches or analyses. Fragmentation can result in an incomplete picture of the overall research landscape within a particular disease area, intervention type, or population group.

The World Health Organisation (WHO) has identified 17 public registries as so-called Primary Registries. The WHO has established a set of minimum standards [3] to which these registries are required to adhere.

In addition, the WHO mandates a minimum amount of information that individual registries need to collect for a clinical study to be considered ‘registered’, known as the Trial Registration Data Set (TRDS). But whereas the TRDS specifies the ‘what’, it does not specify the ‘how’, leaving substantial room for interpretation to the registries, along with the freedom to set their own requirements for additional data. Some of these differences can be small, e.g. using Roman vs Arabic numerals for phase numbers.
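Small differences like this can often be handled with simple normalisation rules. As a minimal sketch, assuming phase labels arrive as free text such as ‘Phase II’, ‘phase 2’ or ‘Phase I/II’, the hypothetical helper `normalise_phase` below maps Roman numerals to Arabic ones and leaves anything it does not recognise untouched for manual review:

```python
import re

# Roman numerals that appear in phase labels; anything else is left for manual review.
ROMAN_TO_ARABIC = {"i": "1", "ii": "2", "iii": "3", "iv": "4"}

# Matches e.g. "Phase II", "phase 2", "Phase I/II" (case-insensitive).
PHASE_PATTERN = re.compile(r"\s*phase\s+([ivx\d]+(?:\s*/\s*[ivx\d]+)*)\s*", re.IGNORECASE)


def normalise_phase(raw: str) -> str:
    """Hypothetical helper: normalise a free-text phase label to Arabic numerals."""
    match = PHASE_PATTERN.fullmatch(raw)
    if not match:
        return raw  # not a recognisable phase label, e.g. "Not applicable"
    numbers = []
    for part in match.group(1).split("/"):
        part = part.strip().lower()
        if part.isdigit():
            numbers.append(part)
        elif part in ROMAN_TO_ARABIC:
            numbers.append(ROMAN_TO_ARABIC[part])
        else:
            return raw  # unexpected numeral such as "X"; keep the original text
    return "Phase " + "/".join(numbers)


for label in ["Phase II", "phase 3", "Phase I/II", "Not applicable"]:
    print(f"{label!r} -> {normalise_phase(label)!r}")
```

Returning the raw value instead of raising keeps the pipeline running while making unmapped labels easy to find later.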

Other differences, however, can be substantial. For instance, differing data models increase the difficulty of comparing trial information across registries. One registry might have individual ‘rows’ for study endpoints, timepoints and metrics, whereas another might simply have a large text field containing all the information at once. Registries might not be internally consistent, e.g. allowing different spellings for the same condition.
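Inconsistent spellings of the same condition can be grouped approximately with string similarity. The sketch below uses Python’s standard-library `difflib.SequenceMatcher` with a greedy clustering step; the function name `cluster_spellings` and the 0.85 threshold are illustrative assumptions, not a method used by any registry:

```python
from difflib import SequenceMatcher


def cluster_spellings(terms: list[str], threshold: float = 0.85) -> list[list[str]]:
    """Greedily group terms whose case-insensitive similarity ratio reaches `threshold`.

    Each term is compared with the first member of every existing cluster and joins the
    first cluster that is similar enough; otherwise it starts a new cluster.
    """
    clusters: list[list[str]] = []
    for term in terms:
        key = term.strip().lower()
        for cluster in clusters:
            representative = cluster[0].strip().lower()
            if SequenceMatcher(None, key, representative).ratio() >= threshold:
                cluster.append(term)
                break
        else:  # no existing cluster was similar enough
            clusters.append([term])
    return clusters


conditions = ["Alzheimer's disease", "Alzheimers disease", "Alzheimer disease",
              "Breast cancer", "Breast Cancer"]
for group in cluster_spellings(conditions):
    print(group)
```

Surface similarity is a blunt instrument: ‘Type 1 diabetes’ and ‘Type 2 diabetes’ would also be merged at this threshold, so clusters like these would still need manual review or mapping to a controlled vocabulary.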

Addressing fragmentation of clinical trial data requires efforts to promote data standardisation, interoperability, and collaboration among registries and stakeholders. Initiatives such as the World Health Organisation’s International Clinical Trials Registry Platform [4] aim to improve the accessibility and quality of clinical trial data by facilitating the harmonisation of registry standards, promoting data sharing, and enhancing transparency in clinical research. Other initiatives have outlined the advantages of combining clinical trial data into a single platform [5, 6]. A centralised repository for clinical trial data may help to identify opportunities for collaboration, share data across studies, or conduct meta-analyses to synthesise evidence from multiple trials, increasing the potential for scientific advancement and innovation in healthcare.

Clinical trial data is stored across several national registries

A wealth of data: Clinical trial data submissions by year (all registries combined)

Harmonising registry data poses a significant challenge [7]. Simply using the ‘union’ of all registries’ data models as the basis for a unified data model would be not only prohibitively complex to understand but also extremely inefficient. Instead, a unified model should be built around an understanding of how data fields in individual registries map to each other, which fields are commonly reported, and which are so uncommon or specific that they are of limited relevance to the average user.
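One way to make such a mapping explicit is a per-registry lookup from native field names to a small common model, combined with a count of how often each common field is actually populated. The registry names and field names below are hypothetical placeholders, not the real schemas of any registry:

```python
from typing import Any

# Hypothetical mappings: registry-specific field name -> common field name.
# Real registries use different schemas; these names are illustrative only.
FIELD_MAPS: dict[str, dict[str, str]] = {
    "registry_a": {"brief_title": "title", "overall_status": "status", "phase": "phase"},
    "registry_b": {"public_title": "title", "recruitment_status": "status", "trial_phase": "phase"},
}


def to_common_model(registry: str, record: dict[str, Any]) -> tuple[dict[str, Any], dict[str, Any]]:
    """Split a registry record into mapped common fields and unmapped registry-specific extras."""
    mapping = FIELD_MAPS[registry]
    common: dict[str, Any] = {}
    extras: dict[str, Any] = {}
    for field, value in record.items():
        if field in mapping:
            common[mapping[field]] = value
        else:
            extras[field] = value
    return common, extras


def field_coverage(records: list[tuple[str, dict[str, Any]]]) -> dict[str, float]:
    """Share of records in which each common field is populated (not None or empty)."""
    counts: dict[str, int] = {}
    for registry, record in records:
        common, _ = to_common_model(registry, record)
        for field, value in common.items():
            if value not in (None, ""):
                counts[field] = counts.get(field, 0) + 1
    return {field: n / len(records) for field, n in counts.items()}


records = [
    ("registry_a", {"brief_title": "Trial X", "overall_status": "Completed",
                    "phase": "", "masking": "Double"}),
    ("registry_b", {"public_title": "Trial Y", "recruitment_status": "Recruiting",
                    "trial_phase": "Phase 2"}),
]
print(to_common_model(*records[0]))
print(field_coverage(records))
```

Keeping unmapped fields in `extras` rather than discarding them makes it possible to revisit later whether a rare field deserves a place in the common model.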

Beyond that, harmonisation of terminology is required. Medical condition terminologies are currently inconsistent across registries. For instance, the NIH’s ClinicalTrials.gov uses Medical Subject Headings (MeSH) terms, while the EU Clinical Trials Register recommends the Medical Dictionary for Regulatory Activities (MedDRA). Other registries may not name or enforce a standard at all. Sponsor names may appear in various forms: pharmaceutical companies may be referred to by different variations of their legal entity names, and the names of clinical sites appear to be frequently misspelled. In addition, some information may be missing entirely.
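For sponsor names, a crude first step is to build a matching key that ignores case, punctuation and trailing legal-entity suffixes. In this sketch the suffix list and the sponsor names are illustrative assumptions; a real pipeline would also need curated mappings for subsidiaries, acquisitions and renamed companies:

```python
import re
from collections import defaultdict

# Illustrative, non-exhaustive list of legal-entity suffixes (lower case, no punctuation).
LEGAL_SUFFIXES = {"inc", "incorporated", "ltd", "limited", "llc", "plc", "corp",
                  "corporation", "co", "gmbh", "kg", "ag", "sa"}


def sponsor_key(name: str) -> str:
    """Lower-case the name, drop punctuation and strip trailing legal-entity suffixes."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def group_sponsors(names: list[str]) -> dict[str, list[str]]:
    """Group raw sponsor names that share the same matching key."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name in names:
        groups[sponsor_key(name)].append(name)
    return dict(groups)


raw_names = ["Acme Pharma GmbH & Co. KG", "ACME Pharma, Inc.", "Acme  Pharma", "Globex Biotech Ltd"]
print(group_sponsors(raw_names))
```

Because the key is purely rule-based, it is transparent and cheap to run across millions of records, but it cannot know that two differently named entities belong to the same group.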

Emerging technologies offer promising ways to address the challenges associated with clinical trial data and to unlock its full potential. A significant portion of clinical trial data exists in unstructured formats, which hinders efficient analysis and use. However, recent breakthroughs in GenAI and natural language processing create opportunities to harness this wealth of unstructured data effectively. Furthermore, rapid progress in vector databases has enabled scalable embedding, storage, and retrieval of unstructured data, paving the way for more comprehensive and insightful analyses at large scale.
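To illustrate the retrieval pattern behind vector databases, the sketch below stores one vector per trial description and returns the nearest neighbours of a query by cosine similarity. As a deliberate simplification, the ‘embedding’ here is just a bag-of-words term-frequency vector and the ‘database’ is an in-memory list; a real system would swap in a learned embedding model and a dedicated vector store. The class `InMemoryIndex` and the trial texts are hypothetical:

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': bag-of-words term frequencies (stand-in for a learned model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryIndex:
    """Minimal stand-in for a vector database: stores vectors, returns top-k by cosine similarity."""

    def __init__(self) -> None:
        self._items: list[tuple[str, Counter]] = []

    def add(self, doc_id: str, text: str) -> None:
        self._items.append((doc_id, embed(text)))

    def search(self, query: str, k: int = 3) -> list[tuple[str, float]]:
        q = embed(query)
        scored = [(doc_id, cosine(q, vec)) for doc_id, vec in self._items]
        scored.sort(key=lambda item: item[1], reverse=True)
        return scored[:k]


index = InMemoryIndex()
index.add("T1", "Randomised trial of drug A in type 2 diabetes measuring HbA1c")
index.add("T2", "Phase 3 study of chemotherapy in metastatic breast cancer")
index.add("T3", "Open-label extension study in adults with type 2 diabetes")
results = index.search("type 2 diabetes HbA1c", k=3)
for doc_id, score in results:
    print(doc_id, round(score, 3))
```

A brute-force scan like this is fine for a handful of documents; vector databases exist precisely because they replace it with approximate nearest-neighbour indexes that scale to millions of records.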

Conducting clinical trials is a resource-intensive process. Balancing cost considerations with the need to maintain rigorous scientific standards and ethical practices is a significant challenge in the design and execution of clinical trials.

New technologies and methods can help alleviate cost pressure. Digitisation of clinical trials means integrating digital technologies and tools, such as digital platforms, software applications, electronic devices, and data analytics, into the various stages of the clinical trial process. This can streamline trial operations, improve data quality, participant engagement, and regulatory compliance, and accelerate the development and evaluation of new therapies and medical interventions. Key components of digitisation in clinical trials include electronic data capture (EDC), remote patient monitoring (RPM), telemedicine and virtual visits, data analytics and artificial intelligence, and electronic informed consent (eConsent).

Advancements in data technologies and the digitisation of clinical studies are revolutionising how vast datasets, analytics, and insights are used to enhance decision-making. Within this landscape, clinical trial registry data emerges as a pivotal resource. It serves multiple crucial functions, including optimising site selection to ensure robust and timely participant recruitment, and identifying potential risks such as competitors aiming to recruit patients from the same sites. Moreover, clinical trial registry data makes it possible to evaluate a trial’s likelihood of success by comparing it with similar studies conducted in the past. Because trial data is continuously updated, it can also be used to monitor ongoing trials for potential risks. Furthermore, insights gleaned from trial data shed light on the activities and portfolios of potential competitors, aiding strategic decision-making.
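As an illustration of benchmarking against past studies, the sketch below computes the share of finished historical trials with the same condition and phase that reached ‘Completed’ status rather than being terminated or withdrawn. Defining ‘similar’ as same condition and phase, and the status vocabulary used, are simplifying assumptions; the `Trial` records are hypothetical:

```python
from dataclasses import dataclass
from typing import Optional

# Assumed, already-normalised status vocabulary; only finished trials count towards the benchmark.
FINISHED_STATUSES = {"Completed", "Terminated", "Withdrawn"}


@dataclass
class Trial:
    trial_id: str
    condition: str
    phase: str
    status: str


def completion_benchmark(history: list[Trial], condition: str,
                         phase: str) -> tuple[Optional[float], int]:
    """Return (completion rate, number of comparable finished trials) for a condition and phase."""
    comparable = [
        t for t in history
        if t.condition == condition and t.phase == phase and t.status in FINISHED_STATUSES
    ]
    if not comparable:
        return None, 0  # no basis for a benchmark
    completed = sum(t.status == "Completed" for t in comparable)
    return completed / len(comparable), len(comparable)


history = [
    Trial("T-001", "Type 2 diabetes", "Phase 3", "Completed"),
    Trial("T-002", "Type 2 diabetes", "Phase 3", "Completed"),
    Trial("T-003", "Type 2 diabetes", "Phase 3", "Terminated"),
    Trial("T-004", "Type 2 diabetes", "Phase 3", "Completed"),
    Trial("T-005", "Type 2 diabetes", "Phase 3", "Recruiting"),  # ongoing: excluded
    Trial("T-006", "Breast cancer", "Phase 3", "Withdrawn"),     # other condition: excluded
]
rate, n = completion_benchmark(history, "Type 2 diabetes", "Phase 3")
print(f"{rate:.0%} of {n} comparable trials completed")
```

Note that ‘Completed’ measures operational completion, not whether a trial met its endpoints, and that returning the sample size alongside the rate makes it easy to flag benchmarks built on too few comparable trials.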

Despite this potential, trial registry data remains underutilised by the industry. However, new technologies offer exciting ways to overcome the challenges associated with harnessing the full potential of global clinical trial registry data.

Trial data can be used to derive insights and establish benchmarks and KPIs across clinical studies

Diabetes studies are typically shorter than oncology studies. Diabetes studies: median 645 days, mean 787 days (n = 1,432 studies). Cancer/neoplasm studies: median 1,644 days, mean 1,817 days (n = 3,048 studies). For each distribution, a random sample of 1,200 studies was visualised.
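Summary statistics like these can be derived from start and completion dates. The sketch below assumes duration is measured from start date to completion date (the article does not state its exact definition) and uses hypothetical records; it computes the median and mean and draws a fixed-size random sample for plotting, mirroring the 1,200-study subsample mentioned in the caption:

```python
import random
import statistics
from datetime import date
from typing import Optional


def durations_in_days(trials: list[tuple[Optional[date], Optional[date]]]) -> list[int]:
    """Start-to-completion duration in days; records with missing or inconsistent dates are skipped."""
    return [(end - start).days for start, end in trials if start and end and end >= start]


def summarise(durations: list[int]) -> dict[str, float]:
    """Sample size, median and mean of the durations."""
    return {
        "n": len(durations),
        "median": statistics.median(durations),
        "mean": statistics.mean(durations),
    }


def sample_for_plot(durations: list[int], k: int = 1200, seed: int = 0) -> list[int]:
    """Draw up to k durations without replacement so groups of different size plot comparably."""
    rng = random.Random(seed)
    return rng.sample(durations, min(k, len(durations)))


# Hypothetical records: (start date, completion date); None marks a missing date.
records = [
    (date(2020, 1, 1), date(2020, 1, 11)),
    (date(2020, 1, 1), date(2020, 1, 21)),
    (date(2020, 1, 1), date(2020, 2, 1)),
    (date(2020, 1, 1), None),
]
durations = durations_in_days(records)
print(summarise(durations))
print(len(sample_for_plot(durations)))
```

A mean well above the median, as in both groups above, indicates a right-skewed distribution with a tail of unusually long studies.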

REFERENCES

[1] Tse T, Williams RJ, Zarin DA. Reporting “basic results” in ClinicalTrials.gov. Chest. 2009 Jul;136(1):295-303. doi: 10.1378/chest.08-3022. PMID: 19584212; PMCID: PMC2821287.

[2] Mahase E. Clinical trials: Number started in UK fell by 41% in four years, finds report. BMJ 2022; 379. doi: 10.1136/bmj.o2540

[3] World Health Organisation (2018). International Standards for Clinical Trial Registries. who.int/publications/i/item/international-standards-for-clinical-trial-registers

[4] International Clinical Trials Registry Platform (ICTRP). who.int/clinical-trials-registry-platform

[5] Venugopal N, Saberwal G (2021) A comparative analysis of important public clinical trial registries, and a proposal for an interim ideal one. PLoS ONE 16(5): e0251191. doi.org/10.1371/journal.pone.0251191

[6] Goldacre B, Gray J. OpenTrials: towards a collaborative open database of all available information on all clinical trials. Trials. 2016 Apr 8;17:164. doi: 10.1186/s13063-016-1290-8. PMID: 27056367; PMCID: PMC4825083. pubmed.ncbi.nlm.nih.gov/27056367/

[7] Fleminger J, Goldacre B (2018) Prevalence of clinical trial status discrepancies: A cross-sectional study of 10,492 trials registered on both ClinicalTrials.gov and the European Union Clinical Trials Register. PLoS ONE 13(3): e0193088. doi.org/10.1371/journal.pone.0193088

© Data Science Talent Ltd, 2024. All Rights Reserved.