Certara and Chemaxon hosted a two-day scientific conference and user group meeting in Frankfurt on 4–5 November 2025, focusing on the challenges and opportunities created as drug discovery expands from small molecules to include biologics (the so-called new modalities).
Experts from the biopharma industry and academia addressed multiple aspects of the informatics of biologics discovery, and Certara and Chemaxon outlined their product strategies to address new modalities and small molecules.
Across two days of presentations, the speakers converged on several key conclusions.
The meeting opened with the broader transformation in discovery. Rob Aspbury (Certara) described how today’s therapeutic entities span small molecules, peptides, oligos, and complex conjugates, and how multiple tools now integrate AI and visualization into the Design–Make–Test–Analyse cycle. Certara and Chemaxon’s strategy reflects this reality, and the two companies presented plans to enhance D360 and Design Hub.
Adrian Stevens (Chemaxon) reinforced the trend: biologics are overtaking small molecules in R&D spend and sales. But with growth comes complexity and challenges to basic processes like entity novelty check and registration. Chemaxon plans to address the new drug modality continuum via its core science with applications built on top (drawing, calculations, standardisation, and search) to deliver compound analysis & design, compliance, and registration.
Key new modality requirements are Draw – Depict – Register – Search, and initial steps will move beyond basic peptides and amino acid monomers to draw/depict peptides, create a central monomer dictionary, and work on visualization. Compound Registration works on the scale of 100M compounds, handles peptides with up to 200 amino acids, and RNA/DNA with up to 70 bases, and this will be extended to sequences, HELM support, custom monomers, and eventually complex modalities. Similarly, biologics design and property calculation is a focus area for D360 development and integration in the DMTA cycle.
The second day focused on Certara’s strategy and platform evolution to support modality-agnostic discovery.
Model-informed discovery and biosimulation. Rob Aspbury (Certara) highlighted Certara’s modelling and biosimulation tools, which enable model-informed drug discovery (MIDD) and reduce costs and timelines significantly. Simcyp offers physiologically based pharmacokinetic (PBPK) modelling (what the body does to the drug) while Certara IQ provides Quantitative Systems Pharmacology (QSP) modelling. Simcyp is widely used to support label claims and predict drug-drug interactions (DDIs), and Certara IQ is successfully predicting better first-in-human dose levels. Rob described how data integration and predictive analytics in Certara’s MIDD platform can deliver certainty across drug development and cited examples in early feasibility assessment. He concluded by discussing a new data platform that will underpin software applications, data engines, predictive analytics tools, and proprietary data sets in order to further power MIDD, reduce attrition, and unify workflow silos.
Aligned early discovery strategy. David Lowis (Certara) and Adrian Stevens (Chemaxon) outlined plans for improved cross-modality support, consistent substance representations, tighter access to Certara predictive technologies, and convergence into a single web-based design/analysis application with a unified chemistry model. In the near term the new Marvin drawing package will be used in Design Hub (DH) and D360. DH will improve data synchronization with D360. D360 will add antibody and ADC (Antibody Drug Conjugate) handling, peptide design, dynamic data series, and integration of secondary intelligence.
Compound Registration as a data quality gateway. Roland Knispel (Chemaxon) provided an update on the Compound Registration application, with significant performance improvements (bulk registration 20–50×; downstream availability 5×; search 2–4×) at ~100M compound scale, plus new features such as secondary IDs, bulk updates, permanent deletion (for CRO use cases), group registration, and query recall. New Marvin will become the structure editor from Q1 2026. Moving on to biologics, the initial focus will be on natural and chemically modified peptides up to 200 amino acids, monomer library handling, HELM editing support on the canvas, peptide handling, and 2D clean. BILN and nucleic acid handling will follow, with longer term targets around complex conjugate registration.
Early feasibility assessment (EFA) as standard practice. Piet van der Graaf (Certara) gave the final talk of the conference, on the value of EFA, which provided an excellent bookend to Rob Aspbury’s opening talk. QSP offers mechanistic biosimulation and reusable platform models, and can create virtual patients. QSP has increasing regulatory adoption and supports New Approach Methods to reduce animal testing. Preclinical biosimulation can triple the rate of clinical success. Piet illustrated the successful application of EFA in case studies on fusion proteins, PROTACs, ADCs, and bispecific mAbs. He concluded by encouraging the use of EFA in every discovery program, with a reminder that Certara has the largest available library of validated EFA/QSP platform models.
Scientific talks explored how design rules change when moving beyond classical small molecules.
Macrocyclic peptides (MCPs). John G. Cumming (Roche) detailed how MCPs as bridge modalities offer a new way to reach previously ‘undruggable’ targets such as intracellular protein-protein interactions (PPI). MCP lead optimisation is challenging, and traditional small molecule design and SAR tools cannot handle their relative complexity, so novel approaches are needed to visualize and analyse features such as aromatic vs. aliphatic sidechains, total number of N-alkylations, lipophilicity, polar surface area (PSA).
PROTAC physicochemical optimization. Filip Miljković (AstraZeneca) adapted ChromLogD (RP-HPLC) and EPSA (SFC) assays and built random forest models using fingerprints and 1D/2D descriptors. Studies with PROTACs (a) showed that the EPSA model is predictive and transferable from a small-molecule warhead to its fully grown PROTAC and can be used to prioritize compounds with better oral bioavailability; and (b) indicated the optimum Shake Flask LogD, ChromLogD, and EPSA values plus Ro5 descriptors for bioavailability for use in making PROTAC synthesis decisions.
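As a point of reference for the Ro5 descriptors mentioned above, the sketch below counts violations of Lipinski's rule of five (molecular weight ≤ 500, logP ≤ 5, at most 5 hydrogen-bond donors, at most 10 hydrogen-bond acceptors) from precomputed descriptor values. It is not the AstraZeneca workflow: the `Descriptors` class, the `ro5_violations` function, and both sets of example values are hypothetical and chosen only to show why large molecules such as PROTACs typically fall outside classical Ro5 space.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed descriptor values for one compound (hypothetical container)."""
    mol_weight: float        # molecular weight in Da
    logp: float              # calculated octanol/water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def ro5_violations(d: Descriptors) -> int:
    """Count how many of Lipinski's four rule-of-five limits are exceeded."""
    checks = [
        d.mol_weight > 500,
        d.logp > 5,
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
    ]
    return sum(checks)


# Illustrative values only; these are not measurements from the talk.
small_molecule = Descriptors(mol_weight=350.0, logp=2.1, h_bond_donors=2, h_bond_acceptors=5)
protac_like = Descriptors(mol_weight=950.0, logp=5.8, h_bond_donors=4, h_bond_acceptors=14)

print(ro5_violations(small_molecule))
print(ro5_violations(protac_like))
```

A count like this is only a coarse filter, which is consistent with the talk's emphasis on measured properties such as ChromLogD and EPSA for beyond-Ro5 compounds.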
Targeted covalent inhibitors (TCIs). Matthias Gehringer (University of Tübingen) surveyed evolving warheads. For cysteine, options beyond acrylamides include heteroaryl acceptors, SNAr electrophiles, dihaloacetamides, and sulfamate acetamides. Alternative target amino acids beyond cysteine include lysine and tyrosine, which can be targeted using SuFEx [Sulfur(VI)-Fluoride Exchange] chemistry with fluorosulfates and sulfonyl fluorides. Aspartic acid and glutamic acid have also been targeted with reliable results. Matthias stressed the need to expand the scope of ligandable cysteines beyond acrylamides, and to gain a broader understanding of the metabolic transformations of the many new warheads.
As structures become more intricate, shared standards enable unambiguous depiction, storage, and interoperability.
HELM for complex polymers and conjugates. Claire Bellamy (Pfizer) reviewed how HELM 1 encodes polymers via monomers, chains, and connection rules, while HELM 2 adds ambiguity handling, which is essential for ADCs and other conjugates. Outstanding gaps remain for lipids and glycans, with an IUPAC effort to address them. The HELM standard is the result of a joint effort by 25 biopharma companies, software vendors, regulatory agencies, and content providers who have contributed to its development.
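To make the monomer, chain, and connection structure concrete, here is a minimal sketch that splits a simple HELM string into its polymers and connections using only the Python standard library. It is not the official HELM toolkit. The `parse_helm` function is hypothetical, and it assumes monomers contain no `$`, `|`, or `.` characters, so it would not handle inline SMILES or the HELM 2 ambiguity syntax. The example string is an illustrative hexapeptide cyclised through a disulfide between the cysteines at positions 2 and 6.

```python
import re


def parse_helm(helm: str) -> dict:
    """Split a simple HELM string into polymers and connections (naive sketch)."""
    sections = helm.split("$")

    # Section 1: simple polymers such as PEPTIDE1{A.C.D}, separated by "|".
    polymers = {}
    for chunk in filter(None, sections[0].split("|")):
        match = re.fullmatch(r"(\w+)\{(.*)\}", chunk)
        if match is None:
            raise ValueError(f"unrecognised polymer: {chunk!r}")
        polymer_id, body = match.groups()
        polymers[polymer_id] = body.split(".")

    # Section 2: connections such as PEPTIDE1,PEPTIDE1,2:R3-6:R3,
    # i.e. source polymer, target polymer, then position:attachment-point pairs.
    connections = []
    if len(sections) > 1:
        for chunk in filter(None, sections[1].split("|")):
            source, target, bond = chunk.split(",")
            start, end = bond.split("-")
            connections.append((source, target, start, end))

    return {"polymers": polymers, "connections": connections}


example = "PEPTIDE1{A.C.D.E.F.C}$PEPTIDE1,PEPTIDE1,2:R3-6:R3$$$"
parsed = parse_helm(example)
print(parsed["polymers"])
print(parsed["connections"])
```

Production systems would rely on a maintained HELM parser and a shared monomer dictionary, since the notation is only unambiguous when every monomer identifier resolves to a defined structure.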
BILN for peptides at scale. Thomas Fox (Boehringer Ingelheim) presented BILN (BI Line Notation), a human-readable, extensible representation designed for peptides. It is paired with a Python/RDKit library that acts as a hub: it facilitates communication with lab scientists, interfaces with ELNs, registration, and peptide synthesis robots, and supports data management and peptide design.
Interoperability and explainability. Farah Egby (Pistoia Alliance) argued that instrument data, metadata, process/workflow data, context, reasons, and decisions are untapped data because they sit in heterogeneous, non-machine-readable formats. Standards should remain accessible and maintainable, and infrastructure must be decoupled from domain ontologies. Crucially, rather than seeming to emerge from opaque “black boxes,” informatics outputs need to be explainable so scientists can retain agency and trust decisions.
Next-generation platforms should be modality-agnostic, user-centric, and AI-enabled.
Design principles for modern discovery platforms. Drawing on experience at BenevolentAI and HealX, Nathan Brown (Independent Consultant) laid out key platform principles: flexible data models; user-centric interfaces; AI integration within the scientist’s workflow; and real-time collaboration across chemistry, biology, and informatics. These platforms should democratize access at the point of need to allow rapid hypothesis refinement to “stop great work on bad ideas sooner.”
Unlocking DEL data with automated pharmacophore families. Miklós Fehér (XChem) described an automated approach to generate and verify novel pharmacophore structures derived from DEL (DNA-encoded library) screening for use with SAR, virtual screening, and developing QSAR or ML models. Positive SAR is identified from the screened libraries and Thompson sampling is used to capture negative SAR, condensing the data into families. Miklós concluded with a series of case studies that validated this novel approach by confirming that DEL-derived pharmacophores show high selectivity and enrichment and identify compounds that conventional similarity search would miss.
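The report does not say how Thompson sampling is applied to the DEL data, so the following is only a generic Beta-Bernoulli illustration of the technique, not the XChem method. Each "arm" stands for a hypothetical candidate family with an unknown hit rate. On every round the sampler draws a plausible rate for each arm from its posterior, tests the arm with the highest draw, and updates that arm's success and failure counts. The `thompson_sampling` function, the hit rates, and the round count are all assumptions made for illustration.

```python
import random


def thompson_sampling(true_rates, rounds, seed=0):
    """Run Beta-Bernoulli Thompson sampling and return per-arm outcome counts."""
    rng = random.Random(seed)
    n_arms = len(true_rates)
    successes = [0] * n_arms
    failures = [0] * n_arms

    for _ in range(rounds):
        # Draw one sample per arm from its Beta(successes + 1, failures + 1) posterior.
        samples = [
            rng.betavariate(successes[i] + 1, failures[i] + 1)
            for i in range(n_arms)
        ]
        # Test the arm whose sampled rate is highest.
        arm = max(range(n_arms), key=samples.__getitem__)
        # Simulate the outcome from the (normally unknown) true rate.
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1

    return successes, failures


# Hypothetical hit rates for three candidate families.
successes, failures = thompson_sampling([0.1, 0.3, 0.8], rounds=500)
pulls = [s + f for s, f in zip(successes, failures)]
print(pulls)
```

Because weak arms are still sampled occasionally, the procedure also accumulates failure counts for them, which is one way such a scheme can record negative evidence alongside positive evidence.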
AI throughout the DMTA loop. Gian Marco Ghiandoni (AstraZeneca) showcased the Predictive Insight Platform (PIP). Starting from 50 models in June 2021, PIP now has >400 models and processes ca. 500M structures per month for small-molecule and new modality property prediction. PIP interacts seamlessly with D360, so predictions and histories are available directly within analysis environments, thus enabling de novo ideation through lead optimisation with traceable, versioned insights.
User case studies highlighted practical wins (and hurdles) when operating across modalities.
AstraZeneca discovery analysis. William McCoull (AstraZeneca) described dynamic visualisation in D360, plug-ins, and data exchange with screening platforms. Progress and challenges were discussed in relation to new modalities, where even descriptors such as substance type, structure class, and modality need careful definition. There is ongoing work for addressing peptide design in D360 where HELM 2D sequences can now be analysed. ADCs are more problematic, with registration and uniqueness checking complicated by ADCs’ multiple components.
Modernizing registration and collaboration. Xin Zhang (Cellarity) detailed a modernization effort: cleaning data, standardizing vocabularies, fixing duplicate checks, and implementing robust compound registration. Besides the registration-related improvements, teams now visualize SAR, collaborate with CROs, track design ideas, and employ predictive capabilities. Xin also emphasized how problems of unreliable duplicate checking, vendor SDfile issues, non-standard vocabularies, and CRO communications can be addressed via data clean-up and implementing Chemaxon Compound Registration. Next steps include getting assay data into Design Hub, incorporating Generative Chemistry AI, and adding further plug-in support.
NBE/ADC portals and sequence-aware queries. Sameh Eid (Merck) described how Merck has addressed the challenges of orchestrating multiple data streams for New Biological Entities (NBEs), ADCs, and bioassay data, and of presenting entity definitions and biological results in NBE Data and ADC Portals involving D360. A separate D360 ADC portal addresses unique needs caused by the complexity of the molecules, their ancestry requirements, and the need to handle small molecule assays for conjugated molecules and NBE assays for the whole molecule and antibody.
rAAV informatics for gene therapy. Shijun Yu (Roche) outlined Roche Informatics’ solution to provide a data foundation for intelligence, analytics, and design, with D360 playing a key role. This has streamlined the review of rAAV molecule assay results, facilitated cross-project learning, enhanced manufacturability and developability knowledge, supplied overviews of batch quality and produced rAAV batches across departments, and provided insights into critical quality attributes and correlations between specific cellular readouts of the tested AAV variants. Next steps include complex dashboards and ML/AI models in capsid and payload design.
Compliance at scale. Ákos Papp (Chemaxon) and Michael Hofmann (Merck) demonstrated how Chemaxon’s Compliance Checker (CC) and cHemTS are integrated into Merck’s compound management systems. CC screens chemical structures against legislation and identifies controlled substances during design and synthesis, storage, and shipping. It is complemented by cHemTS, a flexible, structure-based tool that assigns Harmonized Tariff System codes for international shipping and classifies substances for global trade. Merck in Darmstadt sends >1100 compound shipments per year to >30 countries, so legal and customs compliance is crucial.
The conference underscored that the next era of discovery will progress in several key areas, including standardizing representation for sequences and conjugates; making AI predictions explainable and accessible inside design workflows; treating registration as a quality gate tied to downstream analytics; and adopting biosimulation early to de-risk before costly synthesis and in vivo work.
The goal is not replacing scientists but amplifying them with interoperable standards, transparent AI, and a unified platform where small molecules and new modalities can be designed, registered, and analysed with equal confidence.