Improving Your Drug Discovery Workflow: Collaboration, Data Management, and Security

Posted by Daniela Cintulova on 10 01 2024

A long-term Chemaxon client, a mid-size biopharmaceutical company, recently told us: “We desperately need a better solution for project tracking and information sharing because we still rely heavily on regular meetings, lengthy email exchange and our data is lying around on random slides, in unorganized spreadsheets and emails.”

In the world of small molecule drug discovery, an efficient DMTA (Design-Make-Test-Analyze) cycle is key.


Researchers have long relied on static file systems to support their workflows. These include tools like MS PowerPoint, which has been used extensively to document, share and present research findings, and Excel spreadsheets, which are used to store chemical structures and their biological assay results.


While these are powerful tools, they simply were not designed to manage the complex and dynamic datasets of pharmaceutical research. They have limitations when it comes to handling live data, providing chemical awareness or supporting effective cross-team collaboration.

Lack of context and interactive live session support

One of the main questions to answer during the drug design process is: “Which compound should we make next?” You have to be time- and resource-efficient in coming up with a prioritized set of compounds, because the chemical space is huge and still growing. According to Bellmann et al., the overlap of prominent chemical spaces such as Enamine's REAL Space, WuXi's GalaXi Space and Otava's CHEMriya Space comprises only about 76,000 compounds, which suggests that these spaces cover largely different chemistry. Navigating this space is becoming increasingly difficult. To decide in which direction to take the design process, we have to be able to explain WHY we chose a particular set of compounds. The reasons might seem obvious when a compound is designed, but they become harder and harder to track over time in the flood of new data. A robust hypothesis-driven design tool that aids the prioritization and tracking of ideas is therefore essential. Although data-driven approaches, fueled by rapid technological developments, are coming to the forefront of drug discovery, the hypothesis-driven approach is here to stay, and striking the right balance between the two seems to be the way to maximize the chances of successfully addressing complex research questions.


The goal is to pursue short cycles of design, synthesis and biological feedback so that the hypothesis is validated or modified with each new set of data in an iterative manner.


Static file systems often cannot provide comprehensive context for data, making it difficult to understand its implications and to draw meaningful conclusions. Regular team meetings have long been supported by PowerPoint presentations, and although these can convey the immediate message, they do not offer interactive support during live sessions. Imagine you are presenting new IC50 results to your team members. During the meeting a question arises: “How do these results compare to the other data set from a few months ago where the aromatic ring was substituted with a methyl group instead of a bromide?” Or “What was the reason again for prioritizing the synthesized batch of compounds from the methyl analogs?” Unless you happen to have the exact data on your backup slides, it might be difficult to answer off the top of your head. When you regularly deal with live data, a static file system falls short of the requirements for such projects.

Data fragmentation

This issue arises from the complex and collaborative nature of drug development, involving multiple stakeholders, research teams, and applications. A big pharma representative at the recent Lab of the Future conference mentioned during her talk that a medicinal chemist touches on average 20 applications a day, generating new data most of the time.


Lack of integration and standardization leads to unharmonized data isolated and scattered across multiple sources, slides, files and email, often in incompatible formats. Additionally, research institutions and contract research organizations (CROs) often maintain their own data repositories, leading to isolated data silos.


These silos inhibit efficient data sharing and collaboration, slowing down the research process. Static legacy data management systems cannot be tightly integrated with other tools, which further aggravates data fragmentation, degrades data quality and hinders the decision-making process. Migrating data from these systems is costly and time-consuming.

In the life science industry, the FAIR (Findable, Accessible, Interoperable, Reusable) principles of data management are currently a major topic of discussion. To fully harness the potential of artificial intelligence (AI) and machine learning to solve complex research questions, the underlying training data has to be of high quality and consistency. Efforts to FAIRify data are therefore moving up the priority lists of companies. Static systems cannot support these efforts, as they struggle to satisfy the main principles of FAIR data management.

Hindered interactivity and collaboration

Humans are most effective at processing data when it is displayed visually. However, PPT presentations are static, offering limited interactivity for data exploration and manipulation. One of the most significant weaknesses of static file systems relates to collaboration. Drug discovery requires collaboration between experts from diverse fields such as chemistry, biology, pharmacology and informatics. Each discipline may have its own specialized tools and databases, making it challenging to share data effectively. Imagine you want to answer the following question: “Which compounds from the halogen-derivatives that we have decided to synthesize have already been assigned to our synthetic collaborators?” To answer this question the old-fashioned way, you would probably have to go to a spreadsheet and look for a column indicating which CRO the synthesis was assigned to. Additionally, you might have to check the email exchange to make sure that this was properly communicated to and confirmed by the CRO in question. Successful synthesis of the selected compounds then has to be communicated via email yet again, accompanied by a PDF report generated from the CRO's own ELN. Ideally, data from this report is then transferred into an in-house database. Extracting chemical information from PDF documents is known to be a tricky endeavor; the main bottleneck is the optical recognition of chemical structures, which often yields unreliable results that require manual intervention.

Sharing spreadsheets and slide decks among team members or different groups often generates redundant files and leads to the notorious "versioning" problem, where users struggle to identify the most up-to-date file version. Although it is perfectly possible to create shared spreadsheets, maintaining them can be a complex task and may still give rise to versioning problems. Consequently, administrators have to invest extra time in managing file permissions or repeatedly reviewing spreadsheets to verify their accuracy.
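To make the contrast concrete, the CRO assignment question above becomes a simple filter once compound records live in one structured store instead of spreadsheets and emails. The following is only a minimal Python sketch, not Design Hub's API: the `Compound` record, its fields, the compound IDs and the CRO names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Compound:
    """Hypothetical compound record; field names are illustrative only."""
    compound_id: str
    series: str                          # e.g. "halogen" or "methyl"
    status: str                          # e.g. "designed", "assigned", "synthesized"
    assigned_cro: Optional[str] = None   # None until a CRO has been assigned


# Made-up example data standing in for a shared project database.
COMPOUNDS = [
    Compound("CPD-001", "halogen", "assigned", "CRO-A"),
    Compound("CPD-002", "halogen", "designed"),
    Compound("CPD-003", "methyl", "assigned", "CRO-B"),
    Compound("CPD-004", "halogen", "synthesized", "CRO-A"),
]


def assigned_in_series(compounds, series):
    """Return the compounds of one series that already have a CRO assigned."""
    return [
        c for c in compounds
        if c.series == series and c.assigned_cro is not None
    ]


for c in assigned_in_series(COMPOUNDS, "halogen"):
    print(c.compound_id, c.assigned_cro, c.status)
```

Because assignment and status sit on the same record, there is no separate email thread or spreadsheet column to cross-check.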

Non-effective workflows due to lack of chemical awareness

There is little doubt that chemists need their software to be chemically aware in order to create relationships between chemical structures, or between chemical structures and additional metadata within a project, to perform exact or similarity searches across projects, or to adjust a chemical structure in multiple locations simultaneously. These are not just nice-to-have features: they improve efficiency and prevent inconsistencies, transcription errors, duplicates and endless cycles of copy-pasting. Unfortunately, static file systems often cannot handle chemical data by default and need additional extensions to bring chemical awareness to Excel, PowerPoint or even Outlook. Such extensions, however, are effective only in fairly simple and straightforward research workflows where not much data is generated. Because of their limited scalability, static file systems cannot process large quantities of data, nor can they provide integration endpoints for other applications or plugins, which is a major limitation in modern workflows. Imagine you are designing a new compound and want to see in real time how replacing a pyridine ring with a tetrahydropyran will affect the CNS MPO score, ideally without having to jump between multiple applications. Once you have your design, you want to double-check whether anybody in your organization has already thought about swapping pyridine for a tetrahydropyran. You also want to make sure that the compound is not a controlled substance in either your country or the country where the CRO is located, because you want to stay compliant with local legislation. Ideally, a rich API and plugin interface would let you answer all of these questions within one interface, something static file systems can accommodate only with great difficulty, if at all.
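As an illustration of what a live CNS MPO readout involves, here is a minimal Python sketch of the score as a sum of six desirability functions. The thresholds follow the scheme published by Wager et al. (2010) as commonly cited and should be checked against the original publication before any real use. The property values for the two analogs are invented for illustration; in practice they would come from property prediction tools.

```python
def falling(value, full, zero):
    """Desirability: 1.0 up to `full`, 0.0 from `zero` on, linear in between."""
    if value <= full:
        return 1.0
    if value >= zero:
        return 0.0
    return (zero - value) / (zero - full)


def hump(value, zero_low, full_low, full_high, zero_high):
    """Desirability: 1.0 inside [full_low, full_high], linear ramps on both sides."""
    if value <= zero_low or value >= zero_high:
        return 0.0
    if full_low <= value <= full_high:
        return 1.0
    if value < full_low:
        return (value - zero_low) / (full_low - zero_low)
    return (zero_high - value) / (zero_high - full_high)


def cns_mpo(clogp, clogd, mw, tpsa, hbd, pka):
    """Sum of six desirabilities, ranging from 0 to 6 (thresholds assumed, see text)."""
    return (
        falling(clogp, 3.0, 5.0)             # calculated logP
        + falling(clogd, 2.0, 4.0)           # calculated logD at pH 7.4
        + falling(mw, 360.0, 500.0)          # molecular weight
        + hump(tpsa, 20.0, 40.0, 90.0, 120.0)  # topological polar surface area
        + falling(hbd, 0.5, 3.5)             # hydrogen bond donors
        + falling(pka, 8.0, 10.0)            # most basic pKa
    )


# Hypothetical predicted properties for two analogs (not real data).
pyridine_analog = dict(clogp=3.4, clogd=2.6, mw=352.0, tpsa=68.0, hbd=1, pka=8.6)
thp_analog = dict(clogp=3.1, clogd=2.9, mw=359.0, tpsa=64.0, hbd=1, pka=7.9)

print("pyridine analog:", round(cns_mpo(**pyridine_analog), 2))
print("THP analog:     ", round(cns_mpo(**thp_analog), 2))
```

A chemically aware design tool would recompute such a score as the structure is edited, so the comparison no longer requires a round trip through separate applications.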

Limited security and compliance

Pharmaceuticals are the third costliest industry when it comes to data breaches. According to IBM's 2023 report, the average total cost of a data breach in the pharmaceutical industry was $4.82 million. The 2023 Pharmaceutical Industry Cybersecurity Report compiled by RiskXchange identified two critical security issues: a) poor application security and configuration management and b) weak or missing encryption across platforms and applications, the former being responsible for almost 50% of security issues and the latter for approximately 30%.

Static file systems often lack robust security features to guard effectively against cyber-attacks and data breaches, as well as options for granting restricted access and for setting up multi-factor and multi-layered authentication. Password protection alone is often inadequate, and there is a risk of unauthorized access. Creating an audit trail to monitor who accessed or modified a file is often manual and prone to errors. A reliable application should be available round the clock with minimal downtime, always up to date with all upgrades and security patches, scalable with high performance and, above all, secure with robust and appropriate access control levels.


Drug discovery is no longer confined to a single site, as companies often outsource parts of their activity to collaborators located all over the world.


Working with external collaborators, however, brings its own challenges in synchronizing in-house and external systems, ranging from project management to documentation and technology. Sharing sensitive data through spreadsheets and emails can pose security risks if not properly managed. The key is to share only the necessary information and to keep all other project-related data hidden by granting limited access. For example, you might want your CRO to see only the compounds you assigned to them, without revealing the scientific background behind them and without giving access to project-wide details.
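The CRO example above amounts to filtering both rows and fields by role. The following minimal Python sketch shows the idea under stated assumptions: the `DesignRecord` and `User` types, the role names and the sample data are all hypothetical, and this is not how Design Hub implements access control.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class DesignRecord:
    """Hypothetical design record; `hypothesis` is internal rationale."""
    compound_id: str
    smiles: str
    assigned_cro: str
    hypothesis: str


@dataclass(frozen=True)
class User:
    name: str
    role: str          # assumed roles: "internal" or "cro"
    organization: str


# Made-up example data.
RECORDS = [
    DesignRecord("CPD-001", "Brc1ccccc1", "CRO-A", "Halogen improves potency"),
    DesignRecord("CPD-002", "C1CCOCC1", "CRO-B", "THP lowers lipophilicity"),
    DesignRecord("CPD-003", "c1ccncc1", "CRO-A", "Pyridine as reference"),
]


def visible_records(user, records):
    """Return the view of `records` that this user may see, as plain dicts."""
    if user.role == "internal":
        # Internal team members see every field of every record.
        return [asdict(r) for r in records]
    if user.role == "cro":
        # A CRO sees only its own assignments, stripped of the rationale.
        return [
            {"compound_id": r.compound_id, "smiles": r.smiles}
            for r in records
            if r.assigned_cro == user.organization
        ]
    return []  # unknown roles see nothing (deny by default)


cro_user = User("external chemist", "cro", "CRO-A")
for row in visible_records(cro_user, RECORDS):
    print(row)
```

The deny-by-default branch matters: a role that is not explicitly recognized gets no data rather than everything.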

| Dynamic file system (Design Hub) | Static file system |
| --- | --- |
| Easily searchable, complex combined queries possible | Limited or no searchability |
| Real-time collaboration and interaction, reduced email exchange | No possibility for real-time collaboration |
| Consistent hierarchical organization of data | Random/No organization of data |
| Hypotheses that are automatically linked to the compound design process | Context of design process gets often lost |
| Facilitated overview of the project by use of Kanban boards | Difficult for managers to keep an overview of the entire project |
| Audit friendly, capable of tracking desired information within seconds | Difficult to audit, often impossible to trace mistakes |
| External tool integration directly linked to the design process | Almost non-existent possibility for external tool integration |
| Single source of truth embedded into bespoke digital landscape | Data scattered over multiple spreadsheets and ppts - data fragmentation |
| Increased security - different access rights | Sharing data with co-workers via Excel or Google Docs can be very insecure |
| Generally shorter design cycle due to simplified decision-making | Generally longer design cycle |
| Always up to date (automatic) | Difficult to keep the files up to date (manual input necessary) |
| Automated version update | Versioning problems |
| Designed specifically to manage the data of pharmaceutical research projects | The software simply isn’t designed to manage the data of pharmaceutical research projects |
| Chemically aware | Lack of chemical structure awareness |
| Minimized risk of loss of data | Increased risk of loss of data |
| Easy consolidation of data | Manual and time consuming consolidation of data |
| Readily accessible data to search/show during presentations | Need to click through multiple slides and entire ppt files |
| Rich/real-time visualization and integrated tools | Limited visualizations, need to constantly switch tools to visualize the content |
| Interactive support for live sessions | Static support for live sessions |
| Configurable and customizable | Limited configurability and no customizability |
| Can support some stages of data FAIRification efforts | Not compatible with FAIR data principles |
| Scalable | Limited scalability |

Design Hub promotes collaborative working and information delivery

Design Hub is a team collaboration tool by Chemaxon for small molecule drug design within the DMTA cycle which connects scientific rationale with compound tracking and computational resources needed for rational design. It facilitates efficient compound design, synthesis and progression tracking, ensuring the success of drug development projects with external collaborators.

It enables teams to switch from PowerPoint files to visually rich and chemically searchable hypotheses that are an integral part of the compound design process and can serve as interactive support for live sessions. The associated compounds are linked to their respective hypotheses at all times, helping researchers keep track of what guided past decisions. Project-wide views, Kanban boards and notifications about changes in compound status help managers and team members to guide compounds through synthesis challenges and to maintain an overview of the project’s progress. Compounds can be prioritized for synthesis based on live property predictions, safety alerts or immediate feedback about patentability or building block availability in public databases. A rich plugin and API interface enables integrations with both external and internal data sources: registration systems, assay warehouses, compliance checkers, ELNs, and modeling and docking tools.


Design Hub facilitates collaboration with CRO partners by offering a shared collaboration space with granular role- and project-based access, making it easy to communicate and share data while ensuring data security and intellectual property protection.


The platform is offered as a single-tenant setup hosted on AWS infrastructure. The company and all its vendors are ISO certified, guaranteeing secure handling of your data.

Design Hub is thus ideal to serve as a single reliable source of truth embedded into the bespoke digital landscape of your company.