Bettina Knapp studied bioinformatics and earned a doctorate in computer science from Heidelberg University. In 2015, she started work at Boehringer Ingelheim as a team leader in biostatistics in Biopharmaceutical Manufacturing, and later as a principal statistician. Since 2018, Bettina has been leading a laboratory in cell culture development, and in February 2024 she took over the role of Product Owner in a scrum team to develop Biobase.
In this post, Bettina Knapp gives us an insight into the creation of Biobase, an innovative web application which has transformed biopharmaceutical development and production at Boehringer Ingelheim. Biobase integrates biologics development and manufacturing data to create a high-quality FAIR data landscape. Benefits of the application include enhanced data quality and consistency, and seamless data visualisation and analysis:
Boehringer Ingelheim is a family-owned pharmaceutical company founded in 1885 in Ingelheim, Germany. We focus on human pharma and animal health, and have research and development as well as commercial manufacturing sites all over the world.
Biobase is a web application developed internally at Boehringer Ingelheim for use within biopharmaceutical development and production. It provides access to a mapped and harmonised dataset of biopharmaceutical development and manufacturing data. The application was developed by an agile scrum team and uses commercial software components to run the pipeline and store the data, mainly within a Boehringer Ingelheim environment that runs on tools in the AWS (Amazon Web Services) cloud. The Biobase team consists of five developers, one scrum master, four subject matter experts (SMEs) and one product owner. The stakeholders and users of Biobase come from different departments and areas within Biopharma at Boehringer Ingelheim itself, and from other sources around the world.
The mission of Biobase is the integration of biologics development and manufacturing data to create a high-quality FAIR data landscape. Biobase aims to enable fast data and model-driven decision-making and digital innovation for biologicals development and manufacturing.
The training material for Biobase users consists of short walkthrough videos of around 5-10 minutes, documentation, and in-person training sessions given by the Biobase team.
Benefits of the Biobase approach include enhanced data quality and consistency, as well as easier, automated data visualisation and analysis.
The motivation for developing Biobase stems from the following use cases:
The value proposition of Biobase spans various aspects of the biopharmaceutical development process. Firstly, it accelerates pipeline impact, affecting each biopharmaceutical product from the start of development through to submission to health authorities, and even during the commercial production phase of a product. Specific use cases, especially at interfaces during transfers between different sites, have the most significant impact. In instances of troubleshooting, the ability to instantly access data is crucial, making Biobase integral to different concepts and necessary for fast drug development.
Secondly, Biobase contributes to cost efficiency and savings. Almost every workflow requires data for reporting, team discussions, and much more. In terms of operational excellence, Biobase ensures that all workflows use a single source of truth for data. The workflow requirements are designed and implemented by SMEs and key users, enhancing customer centricity.
From a security and compliance perspective, Biobase operates at a non-GxP compliance level (GxP = good practice quality guidelines and regulations within the pharmaceutical and food industries), with full technical implementation of data integrity measures. Data security and compliance are ensured by utilising Boehringer’s infrastructure and technology.
In the realm of sustainability, environmental, and social considerations, Biobase’s ability to find and analyse historical data avoids the need for repeating similar experiments, fostering a data-first approach. This also aids in shifting mindsets by leveraging synergies across different departments.
Finally, for business continuity, it is imperative to establish data roles and responsibilities for data standards to ensure a continuous run mode for Biobase. As more and more workflows rely on Biobase data, the need for business continuity measures built into the data pipeline becomes increasingly important.
The biggest challenges within the development of Biobase are:
How to overcome these challenges is discussed in the following chapters.
The data coming from the very heterogeneous landscape of data sources is collected within Biobase. Several automated steps of data pre-processing and data harmonisation exist, as well as data linkage tools to ensure consistent data reporting and scientifically sound representation of data categories (see Figure 1).
The data can be shown as given in the source systems, reproducing the original presentation, or in a harmonised form that allows immediate reporting and analysis of data coming from different sources.
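The two views described above can be illustrated with a minimal Python sketch. It is not the Biobase implementation: the source system names, parameter names, unit factors and the `harmonise` helper are all hypothetical, chosen only to show how a rule table can yield a harmonised record while keeping the source presentation available.

```python
from dataclasses import dataclass

# Hypothetical mapping rules (assumption, not taken from Biobase):
# (source system, source parameter name) -> (harmonised name, unit factor)
RULES = {
    ("system_a", "VCD [1e6 cells/mL]"): ("viable_cell_density", 1.0),
    ("system_b", "viable cells (1e5/mL)"): ("viable_cell_density", 0.1),
}


@dataclass(frozen=True)
class Record:
    """A raw data point exactly as delivered by a source system."""
    source: str
    parameter: str
    value: float


def harmonise(record):
    """Return the harmonised view of a record, or None if no rule exists.

    The untouched source record is kept alongside the harmonised fields,
    so the original presentation can still be reproduced.
    """
    rule = RULES.get((record.source, record.parameter))
    if rule is None:
        return None
    name, factor = rule
    return {
        "source_view": record,
        "parameter": name,
        "value": record.value * factor,
    }


raw = [
    Record("system_a", "VCD [1e6 cells/mL]", 12.5),
    Record("system_b", "viable cells (1e5/mL)", 130.0),
]
harmonised = [harmonise(r) for r in raw]
```

In this sketch, both records end up under the same parameter name and unit, which is what makes them directly comparable in reports. A record without a matching rule returns `None` rather than being guessed at.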
During the data processing, Biobase ensures documentation and traceability of the process by using the following tools and features:
Figure 1
Data processing pipeline. Raw data collection from the different source systems is followed by data pre-processing, data harmonisation and data linkage, and finally results in a standardised and consistent set of data stored in the Biobase master data.
Technical tools:
Procedures:
Biobase employs an interactive data lineage notebook to represent the flow of data, simplifying the understanding of complex data journeys (see Figure 2). Data quality checks are performed during harmonisation and then reported in so-called data drift reports. In more detail, the basic principle of ensuring data lineage is to apply a unique hash value to each ingested raw data point, which is kept throughout the data journey. As a result, it is technically possible to trace each data value in the results tables and graphs back to the raw data point that was originally ingested.
Figure 2
Data journey. Raw data tables are transferred to intermediate tables where rigorous data validation checks and quality assessments are performed and reported (data drift reports). Final tables can be tracked back to the source system via hash values.
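The hash-based lineage principle can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Biobase pipeline: the column names, the choice of SHA-256 and the helper functions `lineage_hash`, `ingest`, `transform` and `trace_back` are hypothetical. The sketch also assumes that each raw row carries enough identifying fields (here a source and a batch) for its content hash to be unique.

```python
import hashlib
import json


def lineage_hash(raw_row):
    """Deterministic SHA-256 hash of a raw row (keys sorted for stability)."""
    payload = json.dumps(raw_row, sort_keys=True, default=str).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def ingest(raw_rows):
    """Attach a lineage hash to every raw row at ingestion time."""
    return [{**row, "lineage_hash": lineage_hash(row)} for row in raw_rows]


def transform(rows):
    """Example downstream step: rename a column, carry the hash unchanged."""
    return [
        {"titer_g_per_l": row["titer"], "lineage_hash": row["lineage_hash"]}
        for row in rows
    ]


def trace_back(result_row, raw_index):
    """Look up the originally ingested row behind a result value."""
    return raw_index[result_row["lineage_hash"]]


# Hypothetical raw rows, invented for illustration only.
raw_rows = [
    {"source": "system_a", "batch": "B001", "titer": 3.2},
    {"source": "system_a", "batch": "B002", "titer": 2.9},
]
ingested = ingest(raw_rows)
raw_index = {row["lineage_hash"]: row for row in ingested}
final = transform(ingested)
origin = trace_back(final[0], raw_index)
```

Because the hash is computed once at ingestion and only carried along afterwards, later steps can rename, convert or aggregate values without breaking the link back to the raw row.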
To overcome the challenge of different vocabularies within biologicals development, Biobase harmonises the data coming from different data sources. Biobase uses a harmonisation tool which provides a clear and complete view of the mapping rules defined by the SMEs, further enhancing transparency and traceability throughout the data transformation process. The developers of Biobase have a specific account type that allows them to influence the harmonisation rules for the respective data pool they are working on. The release and the changes are documented. The users of the Biobase productive system do not have access to the harmonisation rules as such. The standard user cannot change the data pipeline and cannot change any data in the source systems. The Biobase team does not change any raw data value, as the data is ingested from a data warehouse only. An audit trail records data transformation, source details, and any changes made during integration. Version control is implemented to track modifications and ensure transparency.
With these measures, the segregation of duties is assured (see Table 1) and the workflow of the data harmonisation is separated across the two Biobase environments, i.e. the development (Dev) and productive (Prod) environments:
Biobase Dev Environment – Harmonisation rules are settled, defined and verified
Biobase Dev Environment – Data testing by SMEs
Biobase Prod Environment – Documented implementation and release
Table 1
Segregation of duties
Standard user – Read-only, no access to harmonisation rules
Biobase SME – Harmonisation and verification of harmonisation rules
Developer – Generation of harmonisation rules in Biobase Dev
Administrator – Releases new features and harmonisation rules into Biobase Prod
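The role split in Table 1 can be expressed as a simple permission lookup. The Python sketch below is illustrative only: the role keys, the `Action` names and the `is_allowed` helper are hypothetical, and it assumes that every role may read data, which the table states explicitly only for the standard user.

```python
from enum import Enum, auto


class Action(Enum):
    READ_DATA = auto()
    VERIFY_RULES = auto()      # harmonisation and verification of rules
    CREATE_RULES_DEV = auto()  # generation of rules in the Dev environment
    RELEASE_TO_PROD = auto()   # release of features and rules into Prod


# Assumed permission sets derived from Table 1 (read access for the
# non-standard roles is an assumption of this sketch).
PERMISSIONS = {
    "standard_user": {Action.READ_DATA},
    "sme": {Action.READ_DATA, Action.VERIFY_RULES},
    "developer": {Action.READ_DATA, Action.CREATE_RULES_DEV},
    "administrator": {Action.READ_DATA, Action.RELEASE_TO_PROD},
}


def is_allowed(role, action):
    """Return True only if the role has been explicitly granted the action."""
    return action in PERMISSIONS.get(role, set())


can_user_release = is_allowed("standard_user", Action.RELEASE_TO_PROD)
can_admin_release = is_allowed("administrator", Action.RELEASE_TO_PROD)
can_developer_release = is_allowed("developer", Action.RELEASE_TO_PROD)
```

The lookup denies by default: an unknown role or an action that has not been granted returns `False`, so in this sketch creating a rule and releasing it always require two different roles.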
The following data-sharing terms are given in Biobase:
Biobase applies the diligence measures of the ALCOA principles, both in the source systems and in good scientific practice:
In Biobase, all data is processed, as no pre-selection of data occurs. Biobase resides in an IT-controlled environment with a backup and lifetime concept, so the data in Biobase is protected from the loss or copy-paste errors which can arise in manual data processing.
Finally, the availability and shareability of Biobase analysis makes a peer-to-peer review of findings much more transparent, and one can always refer to the dataset used in the analysis directly on Biobase.
Digitalisation is not a different discipline (such as IT) but should be accessible to everyone on all hierarchical levels. It is well known that a datapoint without the necessary metadata is more or less useless in modelling approaches. Thus, not only the amount of data matters, but also the description of the data, to overcome the inconsistency problem which often occurs in data analysis. Biobase has proven in many use cases to be the solution to this problem: having good, documented, harmonised data is worth the investment! Besides the allocation of budget, efficient planning of resources and prioritisation of different tasks is most important to achieve cross-functional and global stakeholder management during the Biobase development. A constant recruitment of new users and use cases is essential to meet and align to the needs of the whole biologicals organisation.
Biobase is an internally developed web application by Boehringer Ingelheim that provides access to a harmonised dataset of biological development and manufacturing data. The application, developed by a team of developers, SMEs, a scrum master, and a product owner, aims to integrate biologics data to create a high-quality FAIR data landscape.
The application serves various departments within Biopharma of Boehringer Ingelheim worldwide, and offers benefits such as enhanced data quality, consistency, and credibility. Major use cases include data-sharing, reporting and visualisation, data-driven experimental planning, faster decision-making, troubleshooting, data transfer, submission and filing, and automated modelling.
Despite the heterogeneous landscape of data sources and the cultural concerns related to digitalisation, Biobase manages to combine different data sources through a series of automated data pre-processing, harmonisation, and linkage steps. The application ensures data lineage and transparency through robust methods and well-documented specifications.
Biobase harmonises data to create a joint data ‘vocabulary’ and maintains segregation of duties to ensure data integrity. The data-sharing terms are designed to ensure data ownership, privacy, security, access, usage, retention, quality, and integrity.
The application adheres to the ALCOA principles, ensuring that each data point can be traced back to the original data entry and is legible, contemporaneous, original, and accurate. Biobase emphasises that digitalisation should be part of everyone’s work, not a separate discipline, and that the value of data lies not only in its quantity but also in its description. With that, we are very pleased to pave the way for any data analysis tools such as AI or machine learning to be used by everyone within Boehringer Ingelheim Biopharma in an easy and user-friendly manner.