We need to talk about data sharing. The data sharing section of last year’s Gartner Hype Cycle for Healthcare Data, Analytics and AI opens its description of the obstacles in this area with the following sentences: “Data sharing is difficult. Even with the best of standards and new capabilities, it requires a complex architecture and set of governance policies, guidelines and guardrails that are challenging for healthcare organizations to manage.” It is not only the technical aspects that have to be resolved; those are probably the “easy part.” Processes, expectations, defined KPIs, and even the culture and motivation around sharing results and data need to be adjusted before we can finally see better adoption of emerging standards and technologies. Otherwise we will continue building our data silos.
The pharmaceutical industry faces a growing challenge: the R&D cost per drug keeps increasing while the expected revenue of these new drugs stays flat. Improving efficiency in “traditional small molecule” research, and establishing best practices for new entity types, including various flavors of biologics, both require more fluent communication among departments, research groups, and companies. Introducing these new entity types further increases the complexity of processes and organizational setup. All in all, the industry faces an efficiency problem combined with another layer of complexity brought by new biologics modalities.
To a certain extent it is natural, even prudent, to have well-separated and well-defined functional departments. On the other hand, when the complexity of new projects continuously requires active collaboration between groups, a steady data flow must be guaranteed.
Attempts to mitigate this efficiency problem typically lead to increased outsourcing, academic collaborations, CRO involvement, and a growing number of acquisitions. All these solutions create even more problems with respect to data sharing, data flow, and data availability, all of which have severe security and access control consequences. The end result: data sharing and the ability to connect data silos become critical bottlenecks in this field.
For a brief overview, let’s check the current situation of the market in a few numbers.
At Chemaxon, we deeply care about the status and the future of cheminformatics and chemical data management. To this end, the generic problems captured above can be reduced to the following main topics.
It is a triviality, but data is key
Because acquisitions are frequent, well-executed migration of data from different sources is key to having access to all the IP within a company. Migrations need to support the reuse of legacy data and keep that data accessible after acquisitions. Data has to be standardized so that different data sets become comparable, the source of the data and the original identifiers are retained for future reuse, and everything complies with the same business rules. All these considerations are part of the larger set of FAIR principles.
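As a rough illustration of the migration requirements above, the sketch below maps legacy rows from two sources into one harmonized record type while keeping the source system and the original identifier. Everything in it is an assumption made for the example: the `CompoundRecord` fields, the `CORP-` identifier scheme, and the name-based business rule are hypothetical, and a real migration would standardize chemical structures rather than names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompoundRecord:
    """Hypothetical harmonized record; field names are illustrative only."""
    corporate_id: str   # new identifier assigned during migration
    source_system: str  # provenance: which system the row came from
    original_id: str    # legacy identifier, retained for traceability
    name: str           # value after applying the shared business rule


def normalize_name(raw: str) -> str:
    """Example business rule (an assumption): collapse whitespace, lowercase."""
    return " ".join(raw.split()).lower()


def migrate(rows, source_system, start=1):
    """Map legacy rows (dicts with 'id' and 'name') to harmonized records."""
    records = []
    for offset, row in enumerate(rows):
        records.append(
            CompoundRecord(
                corporate_id=f"CORP-{start + offset:06d}",
                source_system=source_system,
                original_id=str(row["id"]),
                name=normalize_name(row["name"]),
            )
        )
    return records


# Two made-up legacy sources describing the same compound differently.
legacy_a = [{"id": "A-17", "name": "  Acetylsalicylic   Acid "}]
legacy_b = [{"id": 4021, "name": "ACETYLSALICYLIC ACID"}]

merged = migrate(legacy_a, "acquired_lims") + migrate(
    legacy_b, "inhouse_registry", start=2
)
for record in merged:
    print(record)
```

After migration the two records carry the same normalized name, so they become comparable, while `source_system` and `original_id` still point back to where each one came from.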
Connecting systems
As you connect different systems between departments and research groups, speeding up access to data sources becomes essential. This is typically accomplished by integrating the systems through their well-defined APIs. Standardized APIs would certainly decrease the effort required to integrate otherwise disconnected systems, but API standards are rare, and adoption of such standards is slow.
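In the absence of a shared API standard, one common way to integrate systems is to wrap each one in a thin adapter that exposes the same interface. The sketch below shows the idea with two in-memory stand-ins; the client classes, field names, and units are all hypothetical and do not represent any vendor's actual API.

```python
from typing import Protocol


class ResultSource(Protocol):
    """Common interface each adapter is expected to expose (hypothetical)."""

    def get_results(self, compound_id: str) -> list[dict]: ...


class ElnClient:
    """Stand-in for system A; reports potency in nM under its own field names."""

    def fetch(self, cmpd: str) -> list[dict]:
        return [{"cmpd": cmpd, "ic50_nM": 12.0}]


class AssayDbClient:
    """Stand-in for system B; reports potency in uM under different names."""

    def query(self, compound: str) -> list[dict]:
        return [{"compound": compound, "ic50_uM": 0.5}]


class ElnAdapter:
    def __init__(self, client: ElnClient) -> None:
        self._client = client

    def get_results(self, compound_id: str) -> list[dict]:
        return [
            {"compound_id": r["cmpd"], "ic50_nM": r["ic50_nM"], "source": "eln"}
            for r in self._client.fetch(compound_id)
        ]


class AssayDbAdapter:
    def __init__(self, client: AssayDbClient) -> None:
        self._client = client

    def get_results(self, compound_id: str) -> list[dict]:
        # Convert uM to nM so both sources report in the same unit.
        return [
            {
                "compound_id": r["compound"],
                "ic50_nM": r["ic50_uM"] * 1000,
                "source": "assay_db",
            }
            for r in self._client.query(compound_id)
        ]


def collect(sources: list[ResultSource], compound_id: str) -> list[dict]:
    """Ask every adapter for results and return them in one list."""
    merged = []
    for source in sources:
        merged.extend(source.get_results(compound_id))
    return merged


results = collect([ElnAdapter(ElnClient()), AssayDbAdapter(AssayDbClient())], "C1")
print(results)
```

Each new system then costs one adapter, and the code that consumes the results does not change. A real API standard would make even the adapters unnecessary.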
Format conversion, standardization
Still connected to the interoperability of these systems, efficient collaboration heavily depends on reliable data format conversion or, in the best case, data format standardization. All areas of R&D have their own proprietary formats, depending on the vendor of the source systems, instruments, and software tools. A number of pre-competitive initiatives deal with open standards, but as in the case of API standards, adoption struggles here as well. Typically the proprietary format with the highest market share becomes the de facto standard, but even that process can take many years, sometimes decades. Format conversions are essential, and with the growing size of exchanged data sets we cannot afford "human translators".
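One way to keep format conversion manageable is to convert through a single intermediate representation, so that each format needs only one reader and one writer rather than a dedicated converter for every pair. The sketch below is a minimal illustration using CSV and JSON from the Python standard library as stand-ins for vendor formats; the registry names and the sample data are assumptions for the example.

```python
import csv
import io
import json


def read_csv(text):
    """Parse CSV text into the intermediate form: a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def read_json(text):
    """Parse a JSON array of objects into the intermediate form."""
    return json.loads(text)


def write_csv(rows):
    """Serialize the intermediate form as CSV, using the first row's keys."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def write_json(rows):
    """Serialize the intermediate form as a JSON array."""
    return json.dumps(rows, indent=2)


# Hypothetical registries: supporting a new format means adding one entry to each.
READERS = {"csv": read_csv, "json": read_json}
WRITERS = {"csv": write_csv, "json": write_json}


def convert(text, source, target):
    """Convert between any two registered formats via the intermediate form."""
    rows = READERS[source](text)
    return WRITERS[target](rows)


csv_text = "id,smiles\nC1,CCO\n"
as_json = convert(csv_text, "csv", "json")
round_trip = convert(as_json, "json", "csv")
print(as_json)
print(round_trip == csv_text)
```

With N formats this design needs N readers and N writers instead of roughly N × N pairwise converters. The round-trip check at the end is a simple way to catch lossy conversions early.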
Find and (re)use everything needed for efficient decision making
Federated search, that is, finding data across multiple domains and systems, is a recurring keyword in discussions around efficient decision making. Making sure that one can easily find all relevant information helps remove the manual tasks involved in compiling various reports for status update meetings. It also ensures that everyone sees the same data, which is the basis of the decisions about next steps.
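The core of federated search can be sketched as a fan-out: send the same query to every source, then merge the hits while recording where each one came from. The example below uses in-memory dictionaries as stand-ins for real systems; the source names, records, and substring matching are assumptions made for illustration, and a production setup would also have to handle access control, ranking, and failures.

```python
from concurrent.futures import ThreadPoolExecutor

# Made-up data standing in for three separate systems.
SOURCES = {
    "registry": [{"id": "C1", "text": "kinase inhibitor, batch 3"}],
    "assay_db": [
        {"id": "C1", "text": "IC50 vs kinase X"},
        {"id": "C2", "text": "solubility"},
    ],
    "eln": [{"id": "C7", "text": "synthesis notes"}],
}


def search_one(name, query):
    """Case-insensitive substring search in one source; tags hits with the source."""
    needle = query.lower()
    return [
        {"source": name, **row}
        for row in SOURCES[name]
        if needle in row["text"].lower()
    ]


def federated_search(query):
    """Query all sources concurrently and flatten the hits into one list."""
    with ThreadPoolExecutor() as pool:
        # map() yields results in the same order as the source names.
        per_source = list(pool.map(lambda name: search_one(name, query), SOURCES))
    return [hit for hits in per_source for hit in hits]


hits = federated_search("kinase")
for hit in hits:
    print(hit)
```

Because every hit carries its `source`, a report built from these results shows not only what was found but also which system holds it.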
How can we, the R&D IT community, contribute to the progress? We need to consider migrations, the reusability of legacy data sets (when useful), the standardization of data, and the adoption of FAIR data principles. Not just talk the talk, but also walk the walk, applying the FAIR principles that can make an impact on our overall efficiency. We need to learn how to balance between the architecture of an ideal world and an efficient implementation of the change right now. As a first step, I would like to invite you all to the ChemTalks event, organized by Chemaxon, where industry experts discuss how they use technology to bridge silos in early stage drug discovery.
References
- Gartner Hype Cycle for Healthcare Data, Analytics and AI, 2023 https://www.gartner.com/en/doc/788390-hype-cycle-for-healthcare-data-analytics-and-ai-2023
- KPMG Biopharma deal trends outlook for 2023 https://kpmg.com/kpmg-us/content/dam/kpmg/pdf/2023/biopharma-deal-trends-outlook.pdf
- CRO Industry Report, Contract Pharma, 06.19.24 https://www.contractpharma.com/issues/2024-06-01/view_features/cro-industry-report-645387/
- Citeline Pharma R&D Infographic https://www.citeline.com/-/media/citeline/resources/pdf/infographic_pharma-rd-2024.pdf
- Are reaction data FAIR… and what can we do with that? Gerd Blanke, UGM 2023 https://www.youtube.com/watch?v=8_iAiqmLaYg
- Improving Your Drug Discovery Workflow: Collaboration, Data Management, and Security, Daniela Cintulova, 2024-01-10 https://chemaxon.com/blog/improve-your-drug-discovery-workflow-collaboration-data-management-and-security
- FAIR Principles https://www.go-fair.org/fair-principles/
MSc in chemistry and computer sciences, PhD in chemistry in the field of theoretical mass spectrometry.