Data silos — an opinion

Posted by
Csaba Peltz
on 2024-09-02


We need to talk about data sharing. The data sharing section of last year’s Gartner Hype Cycle for Healthcare Data, Analytics and AI describes the obstacles in this area, opening with the following sentences: “Data sharing is difficult. Even with the best of standards and new capabilities, it requires a complex architecture and set of governance policies, guidelines and guardrails that are challenging for healthcare organizations to manage.” It is not only the technical aspects that have to be resolved; those are probably the “easy part.” Processes, expectations, defined KPIs, and even the culture and motivation around sharing results and data need to be adjusted before we can finally see better adoption of emerging standards and technologies – otherwise we will continue building our data silos.

 

The pharmaceutical industry faces growing challenges in managing the ever-increasing R&D cost per drug against the flat expected revenue trend of these new drugs. Increasing efficiency in “traditional small molecule” research, as well as establishing best practices for new entity types, including various flavors of biologics, requires more fluent communication among departments, research groups, and companies. Introducing these new entity types further increases the complexity of processes and organizational setup. All in all, the industry is facing an efficiency problem combined with another layer of complexity due to the introduction of new biologics modality types.

 

To a certain extent it is natural, even prudent, to have well-separated and well-defined functional departments. On the other hand, when the complexity of new projects continuously demands active collaboration between groups, a steady data flow must be guaranteed.

 

Attempts to mitigate this efficiency problem typically result in increased outsourcing, academic collaborations, CRO involvement, and a growing number of acquisitions. All these solutions create even more problems with respect to data sharing, data flow, and data availability, all of which have severe security and access control consequences. The end result: data sharing and the ability to connect data silos become one of the critical bottlenecks in this field.

 

For a brief overview, let’s check the current situation of the market in a few numbers.

 

 

At Chemaxon, we deeply care about the status and the future of cheminformatics and chemical data management. To this end, the generic problems captured above can be reduced to the following main topics.

 

It is a triviality, but data is key

 

As a result of frequent acquisitions, well-executed migration of data from different sources is key to having access to all the IP within a company. Migrations need to support the reusability of legacy data and the ability to access data after acquisitions. Data has to be standardized so that different data sets become comparable, the source of the data and the original identifiers are retained for future reusability, and everything complies with the same business rules – all these considerations are part of the larger set of FAIR principles.
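As a minimal sketch of the migration idea described above, the Python snippet below maps rows from two legacy sources into one shared schema while retaining the source system and the original identifier. Everything in it is hypothetical and chosen only for illustration: the `UnifiedRecord` schema, the `CMP-` identifier scheme, the field names, and the name normalization used as a stand-in for a shared business rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnifiedRecord:
    """One compound record in a shared (hypothetical) target schema."""
    unified_id: str      # new identifier assigned during migration
    name: str            # compound name after the shared business rule
    source_system: str   # provenance: where the record came from
    original_id: str     # provenance: identifier used in the source system


def normalize_name(raw: str) -> str:
    """Example business rule: collapse whitespace and lowercase the name."""
    return " ".join(raw.split()).lower()


def migrate(source_system: str, rows: list[dict], id_field: str,
            name_field: str, start: int = 1) -> list[UnifiedRecord]:
    """Map rows from one legacy source into the shared schema."""
    records = []
    for offset, row in enumerate(rows):
        records.append(UnifiedRecord(
            unified_id=f"CMP-{start + offset:06d}",
            name=normalize_name(row[name_field]),
            source_system=source_system,
            original_id=str(row[id_field]),
        ))
    return records


# Two legacy sources with different field names and conventions.
legacy_a = [{"reg_no": "A-17", "compound": "  Aspirin "}]
legacy_b = [{"id": 42, "label": "ASPIRIN"}, {"id": 43, "label": "Caffeine"}]

unified = migrate("company_a_registry", legacy_a, "reg_no", "compound")
unified += migrate("company_b_eln", legacy_b, "id", "label",
                   start=len(unified) + 1)

# After standardization, records from different sources are comparable.
same_name = [r for r in unified if r.name == "aspirin"]
```

Because each record keeps `source_system` and `original_id`, a migrated entry can still be traced back to the system it came from, which is the reusability point made above.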

 

Connecting systems

 

As you connect different systems between departments and research groups, speeding up access to data sources becomes essential. This is typically accomplished by integrating the systems through their well-defined APIs. Standardizing the APIs would certainly decrease the effort required to integrate otherwise disconnected systems, but API standards are rare, and adoption of such standards is slow.

 

Format conversion, standardization

 

Still connected to the interoperability of these systems, efficient collaboration heavily depends on reliable data format conversion, or in the best-case scenario, data format standardization. All areas of R&D have their own proprietary formats, depending on the vendor of the source systems, instruments, and software tools. A number of pre-competitive initiatives deal with open standards, but as in the case of API standards, adoption struggles here as well. Typically the proprietary format with the highest market share becomes the de facto standard, but even that process can take many years, sometimes decades. Format conversions are essential, and we cannot afford to rely on "human translators" as the exchanged data sets keep growing.
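To make the point about automated conversion concrete, here is a small sketch of a converter registry in Python. It uses CSV and JSON purely as stand-ins for real vendor formats; the `converter` decorator, the `convert` function, and the sample data are assumptions made for this illustration and do not correspond to any particular product or standard.

```python
import csv
import io
import json
from typing import Callable

# Registry of converters keyed by (source format, target format).
CONVERTERS: dict[tuple[str, str], Callable[[str], str]] = {}


def converter(src: str, dst: str):
    """Decorator that registers a conversion function for a format pair."""
    def register(func: Callable[[str], str]) -> Callable[[str], str]:
        CONVERTERS[(src, dst)] = func
        return func
    return register


@converter("csv", "json")
def csv_to_json(text: str) -> str:
    rows = list(csv.DictReader(io.StringIO(text)))
    return json.dumps(rows, sort_keys=True)


@converter("json", "csv")
def json_to_csv(text: str) -> str:
    rows = json.loads(text)
    if not rows:
        return ""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=sorted(rows[0]),
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


def convert(text: str, src: str, dst: str) -> str:
    """Look up the registered converter and apply it, or fail loudly."""
    func = CONVERTERS.get((src, dst))
    if func is None:
        raise ValueError(f"no converter registered for {src} -> {dst}")
    return func(text)


csv_text = "id,mass\nA-17,180.16\n"
as_json = convert(csv_text, "csv", "json")
round_trip = convert(as_json, "json", "csv")
```

A registry like this fails explicitly when no conversion path exists, so a missing converter is noticed instead of being patched over by hand.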

 

Find and (re)use everything needed for efficient decision making

 

Federated search – finding data across multiple domains and systems – is a recurring keyword in discussions around efficient decision making. Making sure that one can easily find all relevant information helps remove the manual tasks involved in compiling various reports for status update meetings. It also ensures that everyone sees the same data as the basis for decisions about next steps.
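The idea of federated search can be sketched in a few lines of Python: one query is sent to several independent sources, and the hits are merged into a single list that records where each hit came from. The in-memory sources, the `Hit` type, and the substring matching are hypothetical simplifications; real systems would sit behind their own APIs and access controls.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Hit:
    source: str      # which system returned the hit
    record_id: str   # identifier within that system
    text: str        # matched content


SearchFn = Callable[[str], list[Hit]]


def make_source(name: str, records: dict[str, str]) -> SearchFn:
    """Wrap an in-memory data set as a searchable source (illustration only)."""
    def search(query: str) -> list[Hit]:
        q = query.lower()
        return [Hit(name, rid, text)
                for rid, text in records.items() if q in text.lower()]
    return search


def federated_search(query: str, sources: list[SearchFn]) -> list[Hit]:
    """Send the same query to every source and merge the results."""
    hits: list[Hit] = []
    for search in sources:
        hits.extend(search(query))
    # Sort so that everyone sees the merged results in the same order.
    return sorted(hits, key=lambda h: (h.source, h.record_id))


eln = make_source("eln", {
    "EXP-1": "Synthesis of aspirin analog",
    "EXP-2": "Caffeine solubility",
})
assays = make_source("assay_db", {"AS-9": "Aspirin COX-1 inhibition"})

results = federated_search("aspirin", [eln, assays])
```

Here `results` contains one hit from each source, each labeled with its origin, so a report can be compiled from a single query instead of one manual lookup per system.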

 

How can we, the R&D IT community, contribute to the progress? We need to consider migrations, the reusability of legacy data sets (when useful), the standardization of data, and the adoption of FAIR data principles. We should not just talk the talk, but also walk the walk, applying the FAIR principles that can make an impact on our overall efficiency. We need to learn how to balance the architecture of an ideal world against an efficient implementation of the change right now. As a first step, I would like to invite you all to the ChemTalks event, organized by Chemaxon, where industry experts discuss how they use technology to bridge silos in early stage drug discovery.

 


 

 




 



Director of Chemistry
Spent 11 years in pharma R&D in the field of mass spectrometry and NMR spectroscopy. Joined one of the product development teams at Chemaxon in 2012. Worked there in different roles within the product organization, as a product owner/manager, then as product director responsible for portfolio level strategy, and recently with more focus on science and market trends as director of chemistry.
MSc in chemistry and computer sciences, PhD in chemistry in the field of theoretical mass spectrometry.
