
Engineering Certainty: How Data and AI Are Rewriting the Rules of Clinical Trial Success by Philipp Diesinger, Robert Lindner, Gabriell Máté, Stefan Stefanov and Peter Bärnreuther

Philipp Diesinger is a data science executive with over 15 years’ experience driving AI-powered transformation. He’s led initiatives at Boehringer Ingelheim, BCG, and Rewire, delivering value through advanced analytics, GenAI, and data strategy at scale.

Robert Lindner is an expert in AI and data-driven decision-making. He is the owner of Rewire’s TSA offering and has led multiple AI/ML product initiatives built on real-world data in the life sciences.

Gabriell Máté leads technical efforts as CTO at GoTrial. With extensive experience in addressing challenges throughout the pharmaceutical value chain, he is committed to innovation in healthcare and life sciences.

Stefan Stefanov is the chief engineer of the GoTrial clinical data platform, with over 7 years of experience leading and developing data and AI solutions for the life science sector.

Peter Bärnreuther is an expert in AI-related risks with 10 years of experience at Munich Re. He is a physicist and economist specialising in regulatory topics and emerging technologies.

In our latest post, industry experts from GoTrial, Rewire, and Munich Re discuss their recent collaboration: pioneering an innovative platform for de-risking clinical development. The platform consolidates over a million clinical trials – unified and enriched with regulatory and epidemiological information – and offers predictive insights on crucial factors including trial feasibility and protocol complexity. Sponsors who use the platform can design and manage trials with greater precision, accountability, and confidence.

Clinical trials evaluate the safety and efficacy of new medical interventions under controlled conditions. Late-phase trials are essential for regulatory approval and typically involve large patient populations across multiple geographically distributed sites. Given their complexity and cost, even modest delays can have significant financial implications: daily operational expenses can exceed $40,000, and potential revenue losses may reach $500,000 per day of delay [1]. Patient enrolment stands as the most significant bottleneck in clinical trial success, with 70-80% of trials failing to meet enrolment targets on time, necessitating protocol amendments, timeline extensions, or additional sites that dramatically increase costs and delay innovative treatments from reaching patients [2,3].

Recognising the persistent challenges in clinical trial execution, three partners – GoTrial, Rewire, and Munich Re – have come together to pioneer a new, data-driven approach to de-risking clinical development:

  • At its core is GoTrial’s unique platform, which consolidates and harmonises over one million clinical trials. This data is not only unified and cleaned, but also enriched with regulatory, epidemiological, and site-level information – and made accessible through a suite of analytical tools, including benchmarking KPIs, predictive models, and GenAI capabilities that extract structure and meaning from unstructured protocol documents.
  • Building on this foundation, Rewire contributes advanced analytics and deep operational expertise to surface predictive insights on trial feasibility, site performance, and protocol complexity. Where required, this is complemented by targeted expert input to interpret patterns and validate results.
  • Munich Re, in turn, adds a critical third layer: the ability to translate quantified operational risk into financial risk structures, enabling new forms of risk-sharing and exposure mitigation for sponsors.

Together, this approach represents a departure from conventional planning – shifting from experience-based assessments to a system of objective, evidence-led foresight that allows sponsors to design and manage trials with greater precision, accountability, and confidence than was previously possible. Because the approach draws on a complete set of global clinical study data, it removes the lingering uncertainty of ‘what might we have missed?’

This new approach to clinical development has begun to complement established trial operations by embedding deep, data-driven intelligence earlier in the process. Rather than focusing solely on the execution phase, this method brings structured analysis and predictive modelling into the design and planning stages, enabling sponsors to challenge assumptions, simulate scenarios, and anticipate risks before trials begin. Unlike traditional feasibility practices that often rely on local knowledge or prior experience, this approach leverages global insights at scale and transforms risk mitigation from a reactive task into a proactive, quantifiable discipline. In doing so, it opens up new opportunities for smarter planning – and even for risk-sharing models that were previously out of reach.

CHART 1.1:
Comparison of site-level recruitment profiles of two clinical studies
Site-level recruitment potential forecast for two clinical studies. Each axis represents an individual clinic, with the outer (red) boundary indicating theoretical site capacity and the inner (blue) area showing predicted actual enrolment performance. While Study A demonstrates alignment across most sites, Study B shows a significant performance gap across the majority of clinics – signalling elevated recruitment risk and the need for strategic site or protocol adjustments.

CHART 1.2

Site-level forecasting enables bottom-up recruitment prediction
This chart demonstrates how a data-driven approach generates a site- and indication-specific forecast of patient enrolment. By integrating multiple data layers, the model estimates recruitment potential for each site and aggregates these predictions into the cumulative enrolment trajectory (dark blue curve) with its confidence range (light blue band).

The site colour coding provides immediate insight: Red indicates sites with elevated risk, Grey marks underperforming sites, Blue highlights sites performing well, and dashed blue outlines identify available high-performing sites recommended for inclusion.

In this example, rising competition in Year 2 coincides with declining recruitment performance, allowing early intervention before significant delays occur.

The bottom row highlights the eight data dimensions feeding the model:

  1. Historic Enrolment Benchmark: enrolment performance distributions from analogous historical trials.
  2. Site Experience & Capability: therapeutic expertise, infrastructure readiness, and compliance history.
  3. Site-Specific Recruitment Profiles: site track records including past recruitment rates and activation timelines.
  4. Competitive Trial Activity: ongoing and planned studies competing for patients and site resources.
  5. Population & Demographics: density, age, and local factors influencing patient availability.
  6. Disease Prevalence & Epidemiology: regional and indication-specific prevalence estimates.
  7. Regulatory & Operational Environment: country-level approval timelines and historical startup delays.
  8. Protocol Complexity Factors: elements such as patient burden, procedure intensity, and inclusion/exclusion complexity.

These eight data sources enable objective, granular, and dynamic predictions, transforming risk into actionable foresight.

1 | Data-driven enrolment forecasting and site selection

Trial sponsors face a paradox: in an effort to reduce risk, they often fall back on familiar practices that unintentionally increase it. Frequently, sites are selected based on self-reported feasibility questionnaires [4], prior working relationships, or contracting convenience – leading to concentration at a small number of well-known institutions. For instance, leading academic centres may be running over 50 trials simultaneously, with predictable strain on patient access and investigator attention [2]. In other cases, top-tier hospitals are chosen despite lacking specific experience in the therapeutic area, as has been observed in complex oncology studies such as lung cancer trials [5].

To further manage perceived risk, protocols increasingly include a larger number of endpoints [6] and sites are selected based on historical performance – often limited to the sponsor’s or CRO’s own portfolio. While these instincts are understandable, they can overlook broader indicators of feasibility and miss higher-potential sites outside the known network.

The data-driven approach rethinks this process from the ground up. Using global clinical trial data, population density, disease prevalence, historical site performance, and real-time competition signals, it builds a bottom-up forecast of enrolment potential – modelled at the site and indication level. Rather than asking sites what they think they can recruit, this method estimates what they are likely to recruit, based on data from thousands of analogous studies and the regional context.
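To make the mechanics concrete, here is a minimal sketch of how per-site expected rates might be aggregated bottom-up into a cumulative enrolment trajectory with a confidence band, in the spirit of Chart 1.2. The Poisson enrolment model, the 80% band, and the example rates are illustrative assumptions, not the platform's actual methodology.

```python
import random

def _poisson(rng, lam):
    # Knuth's method: adequate for the small monthly rates typical of a site
    threshold = 2.718281828459045 ** (-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def cumulative_enrolment_forecast(site_rates, months=24, n_sims=2000, seed=7):
    """Aggregate per-site expected monthly enrolment rates into a cumulative
    trajectory with an empirical 80% confidence band via Monte Carlo."""
    rng = random.Random(seed)
    trajectories = []
    for _ in range(n_sims):
        total, path = 0, []
        for _ in range(months):
            # Draw each site's monthly enrolment independently and sum
            total += sum(_poisson(rng, rate) for rate in site_rates)
            path.append(total)
        trajectories.append(path)
    lo, mid, hi = [], [], []
    for m in range(months):
        col = sorted(t[m] for t in trajectories)
        lo.append(col[int(0.10 * n_sims)])       # lower band edge
        mid.append(col[n_sims // 2])             # median trajectory
        hi.append(col[int(0.90 * n_sims) - 1])   # upper band edge
    return lo, mid, hi
```

In practice the per-site rates themselves would come from the eight data dimensions described above, adjusted for competition and epidemiology, rather than being supplied by hand.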

Two advances have made this possible at scale. First, large language models specialised in biomedical contexts now enable structured interpretation of eligibility criteria, protocol complexity, and therapeutic nuance – even when embedded in unstructured texts [7,8]. Second, the GoTrial platform has aggregated and enriched clinical trial data globally, providing the statistical backbone needed to train predictive models across diverse study designs and settings.

The result is a site-level recruitment forecast that not only estimates expected enrolment at each site, but also flags geographic or operational risks – such as overlapping trials in the same indication or regulatory zone. These insights can be used not only to validate the current site plan, but also to recommend high-potential sites not yet part of the study. Sponsors can optimise their site portfolio for both efficiency and geographic reach – with a clear, data-backed view of how each region contributes to overall enrolment targets.

In one recent case, this methodology was applied to a late-phase paediatric asthma study. While the initial site plan appeared comprehensive, the eligibility criteria were significantly more restrictive than in analogous trials. Adjustments to both the site mix and protocol, guided by historical benchmarks, resulted in a more realistic and actionable enrolment strategy – increasing confidence without delaying study initiation.

CHART 2

Recruitment competition significantly reduces patient enrolment

An analysis of 9,336 global breast cancer trials shows a strong inverse correlation between site-level competition and actual enrolment. When no competing trials were present at a site, enrolment reached 70% of the planned target. However, as the number of concurrent studies increased, performance dropped sharply, falling to just 40% with more than seven overlapping trials. These findings highlight the importance of systematic competition analysis using global trial registry data to support reliable enrolment forecasting and informed site selection.

2 | Quantifying and navigating recruitment competition

Even when eligibility criteria and site selection are well-optimised, many clinical trials encounter recruitment delays due to an invisible force: competition from other studies for the same patients, sites, and investigator attention [5].

This bottleneck is especially pronounced in speciality care, where a small number of expert centres are tasked with enrolling across many overlapping protocols. Sponsors typically assess feasibility based on a snapshot of current activity, but overlook dynamic shifts in the trial landscape – including newly registered studies in the same indication, region, or at the same sites.

Compounding this, many sponsors and CROs rely heavily on established site networks, which can place investigators under significant strain. For example, leading U.S. cancer centres are involved in a median of 56 concurrent trials – creating priority conflicts, depleting local patient pools, and stretching investigator capacity [5].

The new data-driven approach addresses this by systematically and repeatedly scanning for competing studies – not only those already enrolling, but also those expected to launch in the near or mid-term. Using structured registry data enriched with GenAI-based interpretation of unstructured protocol texts, this method identifies overlapping inclusion criteria, geographic proximity, and shared therapeutic areas. It can quantify, for each site or region, the intensity of trial activity that could affect patient availability and investigator focus.

In a recent phase 3 trial feasibility analysis, over 80% of planned sites were found to be involved in multiple other studies with similar patient populations and overlapping timelines. This raised the risk of slower recruitment and site fatigue. Based on this insight, both site mix and enrolment projections required adjustment – avoiding high-competition clusters and increasing geographic diversity.

When analysed at scale, the impact of recruitment competition becomes measurable: in a study of over 9,000 global breast cancer trials since 2000, sites involved in more than seven concurrent studies experienced twice the enrolment shortfall compared to those with little or no competition. By treating recruitment competition as a measurable input – not a post hoc explanation – sponsors can preempt resource bottlenecks, stagger site activation, and gain a more accurate picture of the real operational landscape before the first patient is ever screened.
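An aggregate analysis of this kind can be sketched in a few lines. The record keys, the fulfilment-ratio definition, and the tail cap below are illustrative assumptions; a real analysis over registry data would first match trials on indication, geography, and timeline overlap.

```python
from collections import defaultdict

def enrolment_by_competition(trials, cap=8):
    """Mean fulfilment ratio (actual / planned enrolment), bucketed by the
    number of concurrent competing studies at a site; counts above `cap`
    are pooled into a single tail bucket."""
    buckets = defaultdict(list)
    for t in trials:
        if t["planned"] <= 0:
            continue  # skip malformed records
        key = min(t["concurrent_trials"], cap)
        buckets[key].append(t["actual"] / t["planned"])
    return {k: sum(v) / len(v) for k, v in sorted(buckets.items())}
```

Plotting the resulting bucket means against competition intensity yields the kind of inverse relationship shown in Chart 2.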

CHART 3

Global prevalence of trial risk factors

3 | De-risking clinical trials with data at scale

While operational excellence remains critical to clinical trial success, many in the industry are now asking a more foundational question: how can we measure trial risk in a consistent, evidence-based way – before the study even begins? Until recently, such assessments relied heavily on expert judgment and anecdotal experience. But with the availability of large-scale clinical study datasets, modern analytics and GenAI, that is starting to change.

Drawing on a uniquely large and structured clinical trial dataset – spanning over 900,000 sites across more than 200 countries, and involving upwards of 900 million participants – it is possible to systematically model risk across critical trial design parameters. These models rely on empirical data from operationally and medically comparable studies, matched by indication, phase, geography, and protocol characteristics. Rather than relying on anecdote or institutional memory, this approach builds grounded expectations for key outcomes such as enrolment performance, study duration, and operational complexity at both the site and portfolio level.

What sets this new generation of risk analytics apart is its ability to go beyond traditional structured fields. Using advanced techniques, including natural language processing, even unstructured protocol text can be mined to extract risk-relevant features: from the procedural burden on patients to the complexity of eligibility criteria. These dimensions, long considered subjective or hard to compare, can now be evaluated and benchmarked against thousands of historical studies.

Furthermore, data-driven risk quantification not only highlights elevated risk factors but often points directly to actionable mitigation strategies.

A wide range of clinical trial risk factors – more than 30 in total – can now be quantified using data-driven benchmarks. Among them, several high-impact dimensions are especially relevant during trial planning and design.

Examples include:

Recruitment Ambition Risk

This metric evaluates how aggressively a study is targeting patient enrolment by benchmarking planned enrolment figures against historical norms for similar trials –matched by indication, phase, and design. When projected enrolment significantly exceeds the typical range, it signals a higher likelihood of recruitment delays or underperformance. The risk is quantified by analysing distributions of enrolment outcomes in comparable studies and flagging plans that deviate substantially from established patterns.
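As a simple illustration of how such a benchmark might be computed, the sketch below positions a planned enrolment figure within an empirical distribution of outcomes from comparable trials. The 90th-percentile flagging threshold is an illustrative assumption, not a documented cutoff of the TSA product.

```python
from bisect import bisect_left

def recruitment_ambition_risk(planned_enrolment, historical_enrolments, high_pct=0.90):
    """Locate a planned enrolment figure within the distribution of outcomes
    from comparable historical trials; flag plans above `high_pct`.
    Returns (percentile, flagged)."""
    hist = sorted(historical_enrolments)
    percentile = bisect_left(hist, planned_enrolment) / len(hist)
    return percentile, percentile >= high_pct
```

The same percentile-positioning pattern applies to the timeline, site-count, and country-count benchmarks described below, each against its own reference distribution.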

Timeline Feasibility Risk

This factor assesses whether the planned trial duration is realistic by comparing it to timelines observed in similar historical studies. Trials with unusually compressed schedules – particularly in complex therapeutic areas – face a greater risk of delays, protocol amendments, or extensions. To quantify this risk, planned durations are positioned within the distribution of actual timelines from comparable trials. Timelines falling below typical benchmarks are flagged as high-risk.

Operational Footprint Risk

This metric evaluates whether the number of planned sites is aligned with the trial’s scope and complexity. Excessively large site networks can introduce coordination challenges, increase onboarding and training burden, and compromise data consistency. Risk is quantified by benchmarking the proposed site count against historical distributions from comparable studies. Site numbers that exceed typical ranges are flagged as indicators of elevated operational complexity.

Geographic Complexity Risk

This factor assesses the number of countries involved in a trial relative to historical benchmarks. While a broader geographic reach can expand recruitment, it also increases exposure to regulatory variability, uneven site activation, and operational fragmentation. As the number of countries exceeds typical thresholds, the likelihood of coordination and compliance challenges rises accordingly.

Design Complexity Risk

This metric assesses the number of intervention arms in a trial relative to historical benchmarks from similar studies. Multi-arm designs place greater operational demands on site staff, increasing the need for training, coordination, and oversight. Trials with more arms than typically observed are associated with a higher risk due to the added logistical and compliance complexity.

The ability to quantify trial design risk across multiple dimensions marks a shift from intuition to evidence. It enables more grounded, objective decision-making in the early phases of trial planning – when the opportunity to prevent costly challenges is greatest.

And while no two trials are the same, they are no longer incomparable. With enough data, even the most complex trial designs can be seen through the lens of experience – not just from one company or one portfolio, but from the global record of clinical research. This provides a robust foundation for more informed risk management and consistently improved trial outcomes.

To make this new approach accessible and repeatable, the three partner companies have formalised it into a standardised product: Trial Success Assurance. Built on GoTrial’s global clinical data platform, enhanced by Rewire’s analytics and modelling capabilities, and supported by Munich Re’s risk transfer expertise, the product allows sponsors to apply this methodology in a structured, modular way. It is currently being used to support study design evaluation, site strategy optimisation, and data-driven feasibility planning. For more information, inquiries can be directed to [email protected].

As clinical development continues to grow in complexity, the ability to plan with precision – rather than react under pressure – is becoming essential. By leveraging comprehensive data, modern analytics, and collaborative expertise, this new approach offers a way to bring greater objectivity, foresight, and resilience into trial planning. It doesn’t replace the need for operational excellence – it strengthens it, by ensuring that trials are set up to succeed from the very beginning. In an industry where each decision carries high stakes, the ability to move from judgment to evidence is not just an advantage – it’s a necessary evolution.

INSURANCE FOR ENROLMENT SHORTFALLS

As part of the broader collaboration, Munich Re is developing a novel insurance solution to address one of the most persistent challenges in clinical trials: recruitment shortfalls.

Informed by the data-driven risk assessments generated through the Trial Success Assurance approach, this insurance product provides financial protection to sponsors if patient enrolment falls significantly short of plan.

The coverage is tailored based on quantitative risk indicators – such as protocol complexity, site saturation, and competitive trial activity – allowing for a data-grounded underwriting process. This marks a significant first step toward integrating risk-transfer mechanisms into clinical development, offering sponsors not only greater foresight but also a financial safety net when navigating complex or high-risk trials. If actual enrolment falls below a forecast generated through the Trial Success Assurance (TSA) approach, the policy provides financial compensation for additional costs – such as site reactivation or extended trial duration.

Who it’s for:

Sponsors with tight timelines, complex protocols, or a requirement for budget certainty.

Key features:

  • Trigger: Enrolment falls short of TSA-predicted baseline
  • Coverage: Additional costs such as recruitment costs
  • Limits: $3M – $50M

This marks one of the first offerings to apply actuarial risk-transfer models to clinical trial operations – enabling smarter funding, planning, and execution decisions.

REFERENCES

[1] Getz K. How much does a day of delay in a clinical trial really cost? Appl Clin Trials. 2024 Jun 6;33(6).

[2] Fogel DB. Factors associated with clinical trials that fail and opportunities for improving the likelihood of success: a review. Contemp Clin Trials Commun. 2018 Aug 7;11.

[3] Bower P et al. Improving recruitment to health research in primary care. Fam Pract. 2009 Oct;26(5). doi:10.1093/fampra/cmp037

[4] Hurtado-Chong A et al. Improving site selection in clinical studies: a standardised, objective, multistep method and first experience results. BMJ Open. 2017 Jul 12;7(7):e014796. doi:10.1136/bmjopen-2016-014796

[5] Phesi. 2024 analysis of oncology clinical trial investigator sites [Internet]. Available from: www.phesi.com/news/global-oncology-analysis/ Accessed 2025 Jul 4.

[6] Markey N et al. Clinical trials are becoming more complex: a machine learning analysis of data from over 16,000 trials. Sci Rep. 2024 Feb 12;14(1):3514. doi:10.1038/s41598-024-53211-z

[7] Lee J et al. BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics. 2020 Feb 15;36(4):1234-1240. doi:10.1093/bioinformatics/btz682

[8] Jin Q et al. Matching patients to clinical trials with large language models. Nat Commun. 2024 Nov 18;15(1):9074. doi:10.1038/s41467-024-53081-z
