Cover Story

Digital twins become the new battleground for CDMOs

CDMOs are now gaining a competitive advantage by converting manufacturing data into reliable, higher-yield operations using interpretable, knowledge-graph digital twins.

By Bernard Banga

As contract development and manufacturing organisations (CDMOs) expand into complex biopharmaceuticals, gene therapy and advanced product sectors, their competitive position is shifting. Historically, sponsors selected partners largely on the basis of installed capacity, geographic footprint and cost per litre.

Today, differentiation increasingly depends on how effectively a manufacturer can convert operational data into reproducible, high-yield processes across development, technology transfer and commercial production.

From manufacturing capacity to process intelligence

Knowledge-graph digital twins are emerging as a key technology in this transition. Unlike conventional analytics platforms, they link sensors, process models and batch outcomes within an explicitly structured data framework which quality teams and regulators can interrogate.  

In practice, this approach enables CDMOs to evolve from providers of outsourced capacity into partners delivering measurable process insight across distributed manufacturing networks. 

The strategic context is clear. Demand for outsourced biopharmaceuticals manufacturing continues to expand rapidly. Market analysis suggests that global biopharmaceuticals CDMO revenues could exceed $55.1bn by 2035, implying a compound annual growth rate (CAGR) of nearly 9%. Separate estimates indicate that the biopharmaceuticals CDMO segment alone may reach $38.2bn by 2031, up from $27.1bn in 2026, representing a CAGR of slightly over 7%.

At the same time, high-value product sectors such as viral vectors and plasmid DNA are expanding even faster. The viral-vector manufacturing market, for example, is projected to grow from $6.68bn in 2025 to roughly $19.52bn by 2033, equivalent to a CAGR of 14.5%.
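The growth rates quoted above can be checked from their endpoints with the standard CAGR formula. The sketch below is only an arithmetic illustration: the function name `cagr` is hypothetical, and the year spans are assumed to run from the stated start year to the stated end year.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values `years` apart."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted in the article, in $bn; year spans are an assumption.
cdmo_segment = cagr(27.1, 38.2, 2031 - 2026)
viral_vectors = cagr(6.68, 19.52, 2033 - 2025)

print(f"CDMO segment, 2026 to 2031: {cdmo_segment:.1%}")
print(f"Viral vectors, 2025 to 2033: {viral_vectors:.1%}")
```

On these assumptions the endpoints give roughly 7.1% for the CDMO segment and roughly 14.3% for viral vectors. The latter is slightly below the quoted 14.5%, which may reflect a different base year or rounding in the underlying projection.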

As manufacturing networks scale to meet this demand, data complexity is growing accordingly. In this environment, opaque “black-box” AI models increasingly struggle to meet Good Manufacturing Practice (GMP) expectations, while knowledge-graph digital twins offer a path towards interpretable and auditable analytics.

From early digital twins to the limits of black-box AI

Digital twins first appeared in biopharmaceutical manufacturing as relatively static models mirroring equipment or individual unit operations. Over the past decade, however, they have evolved into dynamic, data-driven representations of living bioprocesses. 

In upstream bioprocessing, modern twin architectures synchronise real-time sensor data with mechanistic and statistical models to simulate cell growth, scale-up behaviour and potential process disturbances before changes are implemented on the manufacturing floor. This two-way synchronisation between the physical system and its virtual counterpart enables manufacturers to monitor performance continuously, explore “what-if” scenarios and optimise operating conditions across development, technology transfer and commercial campaigns.
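A "what-if" scenario of this kind can be illustrated with a very small mechanistic model. The sketch below assumes Monod growth kinetics in a batch culture, integrated with an explicit Euler step. Neither the kinetic model nor the parameter values come from the article, and a production twin would calibrate such a model against live sensor data.

```python
def simulate_batch(x0, s0, mu_max=0.04, ks=0.5, yield_xs=0.5,
                   hours=120.0, dt=0.1):
    """Explicit Euler integration of Monod growth in a batch culture.

    x is biomass (g/L) and s is the limiting substrate (g/L).
    All parameter values are illustrative assumptions.
    Returns the final (biomass, substrate) pair.
    """
    x, s = x0, s0
    for _ in range(int(round(hours / dt))):
        mu = mu_max * s / (ks + s)                  # specific growth rate (1/h)
        consumed = min(mu * x * dt / yield_xs, s)   # cannot exceed what remains
        x += yield_xs * consumed
        s -= consumed
    return x, s


# Baseline run versus a hypothetical scenario with more initial substrate.
baseline = simulate_batch(x0=0.2, s0=10.0)
what_if = simulate_batch(x0=0.2, s0=15.0)

print(f"baseline final biomass: {baseline[0]:.2f} g/L")
print(f"what-if final biomass:  {what_if[0]:.2f} g/L")
```

Running both scenarios virtually before touching the bioreactor is the essence of the two-way loop described above. The real system supplies data to calibrate the model, and the model supplies candidate operating conditions in return.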

Yet the first generation of AI applied to these twins has often relied on deep-learning models whose internal logic remains largely opaque. While such models can deliver strong predictive performance, they frequently provide limited visibility into which variables drive a given prediction, how robust the model remains under data drift, or how it behaves outside the training domain.

Regulatory expectations are evolving accordingly. In a 2025 communication addressing AI in drug and device trials, the US Food and Drug Administration (FDA) emphasised that models influencing decision-making must be transparent, well characterised and appropriately stress-tested, particularly when digital twins are used to generate simulated control arms or inform clinical trial design. Although these discussions initially focused on clinical applications, the same principles increasingly apply to manufacturing environments.

Models which cannot be explained, audited or challenged are more likely to encounter regulatory scrutiny. Industry forums, including technical seminars at INTERPHEX 2026 (the annual pharmaceutical and biotechnology manufacturing conference held from 21 to 23 April at New York’s Javits Center), are converging on hybrid architectures which combine mechanistic modelling with machine learning while making explicit the assumptions, inputs and outputs for each component of the model.

Knowledge graphs as architectural backbone

Knowledge-graph architectures address many of these challenges by structuring process knowledge as networks of explicitly defined entities and relationships. Within a knowledge-graph digital twin, batches, unit operations, process parameters, equipment assets, soft-sensor estimates, failure signatures and quality outcomes are represented as linked data. 

Technically, these relationships are expressed through Resource Description Framework (RDF) triples in the form subject–predicate–object. Once encoded in this structure, they can be queried using SPARQL (SPARQL Protocol and RDF Query Language), enabling engineers to pose diagnostic questions such as:

Which batches showing oxygen-uptake deviations also exhibited reduced final titre despite nominal control profiles?
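To make the triple structure concrete, the following sketch stores a handful of subject–predicate–object facts in plain Python and answers the question above by pattern matching. It is a stand-in for an RDF store queried with SPARQL, not an implementation of either. Every identifier, predicate name, titre value and threshold is a hypothetical illustration.

```python
from typing import Iterator, List, Optional, Tuple, Union

Value = Union[str, float]
Triple = Tuple[str, str, Value]

# Hypothetical facts in subject-predicate-object form.
TRIPLES: List[Triple] = [
    ("batch:B001", "hasDeviation", "deviation:OxygenUptake"),
    ("batch:B001", "hasControlProfile", "profile:Nominal"),
    ("batch:B001", "finalTitre", 2.9),
    ("batch:B002", "hasControlProfile", "profile:Nominal"),
    ("batch:B002", "finalTitre", 4.2),
    ("batch:B003", "hasDeviation", "deviation:OxygenUptake"),
    ("batch:B003", "hasControlProfile", "profile:Adjusted"),
    ("batch:B003", "finalTitre", 3.0),
    ("batch:B004", "hasDeviation", "deviation:OxygenUptake"),
    ("batch:B004", "hasControlProfile", "profile:Nominal"),
    ("batch:B004", "finalTitre", 4.1),
]


def match(triples: List[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[Value] = None) -> Iterator[Triple]:
    """Yield triples matching the pattern; None acts as a wildcard."""
    for ts, tp, to in triples:
        if (s is None or ts == s) and (p is None or tp == p) and (o is None or to == o):
            yield (ts, tp, to)


def low_titre_despite_nominal_control(triples: List[Triple],
                                      titre_threshold: float) -> List[str]:
    """Batches with an oxygen-uptake deviation, nominal control and low titre."""
    deviating = {s for s, _, _ in match(triples, p="hasDeviation",
                                        o="deviation:OxygenUptake")}
    nominal = {s for s, _, _ in match(triples, p="hasControlProfile",
                                      o="profile:Nominal")}
    low = {s for s, _, titre in match(triples, p="finalTitre")
           if titre < titre_threshold}
    return sorted(deviating & nominal & low)


print(low_titre_despite_nominal_control(TRIPLES, titre_threshold=3.5))
# → ['batch:B001']
```

Because each condition is an explicit pattern over named relationships, a reviewer can read exactly which facts produced the answer. This is the property that distinguishes the approach from an opaque predictive model.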

Research associated with the UK National Digital Twin Programme has demonstrated how knowledge graphs can enable interoperability between complex infrastructure systems by aligning data from multiple stakeholders around shared ontologies. Applied to biopharmaceutical manufacturing networks, the same principle allows data from different facilities, formats and analytical tools to be unified within a single, semantically consistent digital twin. 

Early industrial applications illustrate how these architectures translate into practice. Published case studies in bioprocess development describe digital-twin environments which shorten development timelines while improving the robustness of scale-up models. 

One example involves the integration of ambr micro-bioreactors, Raman spectroscopy and Sartorius’ BioPAT Spectro platform. By combining automated sampling, spectral analysis and data consolidation, this configuration can generate robust Raman models in less than half the time required by conventional bench-scale approaches. Within a knowledge-graph architecture, such models become nodes in a broader network where links between process parameters, failure signatures and batch performance remain explicitly traceable.

From Raman analytics to CDMO economics

Raman spectroscopy plays a pivotal role in many interpretable digital-twin architectures. Deployed through platforms such as BioPAT Spectro, it provides non-invasive, multi-analyte monitoring of cell cultures aligned with the principles of Process Analytical Technology (PAT). 

When combined with ambr 15 or ambr 250 mini-bioreactors, Raman instrumentation enables comprehensive design-of-experiments studies within a single campaign. Sampling, spectral analysis and data aggregation are automated, substantially reducing the time required to develop high-quality models for critical process variables. 

On top of this spectroscopic layer, CDMOs increasingly deploy soft sensors based on statistical and machine-learning approaches such as partial least squares regression (PLS), support vector regression (SVR) and gradient-boosting algorithms. These models estimate variables that are difficult to measure directly, such as product titre or specific metabolites, by combining compressed Raman spectra with standard online measurements.
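As an illustration of the simplest of these approaches, the sketch below fits a PLS regression with a single latent component (the first step of the NIPALS algorithm for one response variable) in plain Python. The data are synthetic and the feature columns (two compressed Raman scores and one online measurement) are assumptions for the example. A real soft sensor would use more components, cross-validation and a validated chemometrics library.

```python
import math
from typing import List, Sequence, Tuple

Model = Tuple[List[float], float, List[float], float]


def fit_pls1(X: Sequence[Sequence[float]], y: Sequence[float]) -> Model:
    """Fit PLS regression with one latent component and one response."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(m)] for row in X]
    yc = [v - y_mean for v in y]
    # Weight vector: unit direction with maximal covariance with y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(m)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    # Scores: projection of each centred sample onto w.
    t = [sum(Xc[i][j] * w[j] for j in range(m)) for i in range(n)]
    # Regression coefficient of the centred response on the scores.
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)
    return x_mean, y_mean, w, q


def predict(model: Model, x: Sequence[float]) -> float:
    """Estimate the response for one new sample."""
    x_mean, y_mean, w, q = model
    score = sum((x[j] - x_mean[j]) * w[j] for j in range(len(x)))
    return y_mean + q * score


# Synthetic training data: [raman_score_1, raman_score_2, online_measurement].
X_train = [[10.0, 20.0, 5.0], [11.0, 22.0, 5.5],
           [12.0, 24.0, 6.0], [13.0, 26.0, 6.5]]
titre_train = [1.0, 3.0, 5.0, 7.0]  # hypothetical offline titre assays (g/L)

model = fit_pls1(X_train, titre_train)
print(f"estimated titre: {predict(model, [14.0, 28.0, 7.0]):.2f} g/L")
```

The fitted weight vector `w` shows how much each input contributes to the estimate, which is one reason latent-variable methods such as PLS remain popular where interpretability matters.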

Studies of model-based optimisation have reported measurable operational gains. In some pilot implementations, digital twins capable of running thousands of virtual experiments have delivered yield improvements of several percentage points and productivity gains of approximately 3–4% compared with conventional experimental programmes.

While these improvements may appear modest individually, their economic implications are significant at commercial scale. By shifting process management from reactive troubleshooting to predictive optimisation, digital twins can reduce technology-transfer risk, lower batch-failure rates and strengthen GMP compliance. For sponsors, the value proposition therefore moves beyond manufacturing capacity alone.

Regulation, interpretable AI and multi-site resilience

Regulatory signals increasingly emphasise the importance of interpretability, data governance and human oversight in AI-enabled drug development and manufacturing. 

The FDA’s evolving framework for AI in clinical research highlights the need to document model assumptions, limitations and performance throughout their lifecycle, including when digital twins generate or interpret simulated datasets.

Further clarity emerged in January 2026, when the FDA and the European Medicines Agency announced a joint initiative outlining principles for “good practice” in AI-enabled drug development. These principles emphasise data quality, lifecycle monitoring, traceability and model transparency.

Knowledge-graph digital twins align closely with these expectations. Because the architecture preserves explicit links between raw data, transformations, models and decisions, auditors can trace a recommendation back to its originating signal through a limited number of queries. 
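The audit trail described here amounts to walking provenance links from a decision back to the raw signal. The sketch below shows that walk over a hypothetical chain. The node names and the single "derived from" relationship are assumptions for illustration, and a real graph would allow several parents per node.

```python
from typing import Dict, List

# Hypothetical provenance links: each node maps to the node it was derived from.
DERIVED_FROM: Dict[str, str] = {
    "decision:ReduceFeedRate": "model:TitreSoftSensor_v3",
    "model:TitreSoftSensor_v3": "transform:BaselineCorrectedSpectrum",
    "transform:BaselineCorrectedSpectrum": "raw:RamanProbe_BR07",
}


def trace_to_source(node: str, links: Dict[str, str]) -> List[str]:
    """Follow derived-from links from `node` until a node with no parent."""
    path = [node]
    seen = {node}
    while path[-1] in links:
        parent = links[path[-1]]
        if parent in seen:
            raise ValueError(f"cycle in provenance links at {parent}")
        seen.add(parent)
        path.append(parent)
    return path


for step in trace_to_source("decision:ReduceFeedRate", DERIVED_FROM):
    print(step)
```

Each hop corresponds to one lookup, so the number of queries an auditor needs is bounded by the length of the chain between the recommendation and its originating measurement.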

For CDMOs operating multi-site networks across North America, Europe and Asia, this traceability also supports operational resilience. Sponsors increasingly expect partners to demonstrate how product and process knowledge is accumulated and transferred between facilities rather than rediscovered during each technology transfer. A knowledge-graph digital twin standardises the representation of batches, critical parameters, deviations and corrective actions, enabling rapid identification of recurring failure patterns, such as hyper-metabolic culture profiles, across sites.

| Criterion | Black-box AI digital twins | Knowledge-graph digital twins |
| --- | --- | --- |
| Yield prediction accuracy | 70–80% accuracy for opaque models | >90% accuracy for auditable models in PAT-enabled pilots |
| Technology-transfer duration | 6–12 months to adapt and re-validate a process | 3–6 months when process knowledge is structured within an operational digital twin |
| Value proposition for sponsors | Offer centred on installed capacity and price per litre | Offer centred on process intelligence, learning sites and transparent quality assurance |

Source: https://www.towardshealthcare.com/insights/viral-vector-based-cell-and-gene-therapy-cdmo-market-sizing

Investment signals and the next decade

Investment trends in life-science data infrastructure reflect the growing strategic importance of digital twins. Industry analysis reports improved asset use, measurable productivity gains and significant cost reductions when digital-twin platforms are deployed across multi-site production networks. These systems also enable predictive maintenance strategies, earlier detection of equipment drift and reductions in material waste, contributing to both operational efficiency and sustainability objectives. 

Market projections reinforce the scale of the opportunity. By the mid-2030s, the biopharmaceuticals CDMO sector could more than double in size while viral-vector and plasmid-DNA manufacturing continue to expand rapidly.

Within this landscape, knowledge-graph digital twins are likely to evolve from experimental initiatives into foundational infrastructure: a shared layer representing processes, data and decisions across facilities, across partnerships and across the product lifecycle. 

For CDMOs investing early in these architectures, combining spectroscopic PAT, soft sensors and explainable knowledge-graph models, the strategic objective is no longer simply to expand installed capacity. It is to build a robust competitive advantage based on demonstrable process intelligence and the trust of both regulators and sponsors.