We need to talk about data sharing. The data sharing section of last year's Gartner Hype Cycle for Healthcare Data, Analytics and AI opens its discussion of the obstacles in this area with the following sentences: "Data sharing is difficult. Even with the best of standards and new capabilities, it requires a complex architecture and set of governance policies, guidelines and guardrails that are challenging for healthcare organizations to manage." It is not only the technical aspects that have to be resolved; those are probably the "easy part." Processes, expectations, defined KPIs, and even the culture and motivation around sharing results and data all need to change before we can see better adoption of emerging standards and technologies. Otherwise we will simply continue building data silos.
The pharmaceutical industry faces a growing challenge: the R&D cost per drug keeps rising while the expected revenue of each new drug stays flat. Increasing the efficiency of "traditional" small molecule research, and establishing best practices for new entity types, including the various flavors of biologics, both require more fluid communication among departments, research groups, and companies. The introduction of these new entity types further increases the complexity of processes and organizational setup. All in all, the industry faces an efficiency problem, compounded by the extra layer of complexity that the new biologic modalities bring.
To a certain extent it is natural, even prudent, to have well-separated and well-defined functional departments. On the other hand, when the complexity of new projects demands continuous, active collaboration between groups, a steady flow of data must be guaranteed.
Mitigation of this efficiency problem typically takes the form of increased outsourcing, academic collaborations, CRO involvement, and a growing number of acquisitions. All of these solutions, however, create new problems around data sharing, data flow, and data availability, with serious security and access control consequences. The end result: data sharing, and the ability to connect data silos, becomes one of the critical bottlenecks in the field.
At Chemaxon, we deeply care about the status and the future of cheminformatics and chemical data management. In our view, the generic problems described above can be reduced to the following main topics.
As a result of frequent acquisitions, well-executed migration of data from different sources is key to having access to all of the IP within a company. Migrations need to support the reusability of legacy data and continued access to data after acquisitions. Data has to be standardized so that different data sets become comparable, the source of the data and the original identifiers are retained for future reusability, and everything complies with the same business rules. All of these considerations are part of the larger set of FAIR principles.
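To make this concrete, here is a minimal sketch of what one standardization step in such a migration can look like, assuming a Python pipeline built on the open-source RDKit toolkit. The field names (source_system, source_id) and the salt-stripping business rule are illustrative assumptions, not a prescribed method.

```python
# Sketch: standardize one migrated record while retaining its provenance.
# Each record keeps its source system and original identifier so legacy
# data stays reusable; the structure is normalized to a canonical form so
# data sets from different sources become comparable.
from dataclasses import dataclass
from rdkit import Chem

@dataclass
class MigratedCompound:
    source_system: str   # e.g. the acquired company's registration system
    source_id: str       # original identifier, retained for reusability
    canonical_smiles: str

def standardize(source_system: str, source_id: str, raw_smiles: str):
    mol = Chem.MolFromSmiles(raw_smiles)
    if mol is None:
        return None  # flag for manual curation instead of silently dropping
    # One example business rule: strip salts by keeping the largest fragment.
    frags = Chem.GetMolFrags(mol, asMols=True)
    parent = max(frags, key=lambda m: m.GetNumHeavyAtoms())
    return MigratedCompound(source_system, source_id, Chem.MolToSmiles(parent))

# Usage: the hydrochloride salt is reduced to its parent, the legacy ID kept.
print(standardize("legacy-reg", "ABC-000123", "CCO.Cl"))
```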
As you connect different systems across departments and research groups, speeding up access to data sources becomes essential. This is typically accomplished by integrating the different systems through their well-defined APIs. Standardized APIs would certainly decrease the effort required to integrate otherwise disconnected systems, but API standards are rare, and their adoption is slow.
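In the absence of a shared standard, the integration effort usually lands on thin adapters. The sketch below shows that pattern: each vendor system is wrapped in an adapter that maps its own endpoints and field names onto one in-house interface. Both endpoint paths and JSON field names here are entirely hypothetical; only the pattern is the point.

```python
# Sketch: one shared interface, one adapter per vendor API.
from typing import Protocol
import requests

class CompoundSource(Protocol):
    def find_by_smiles(self, smiles: str) -> list[dict]: ...

class VendorAAdapter:
    def __init__(self, base_url: str):
        self.base_url = base_url
    def find_by_smiles(self, smiles: str) -> list[dict]:
        # Hypothetical GET endpoint and response shape for "vendor A".
        r = requests.get(f"{self.base_url}/api/v2/structures",
                         params={"smiles": smiles})
        r.raise_for_status()
        return [{"id": hit["regId"], "name": hit["displayName"]}
                for hit in r.json()["results"]]

class VendorBAdapter:
    def __init__(self, base_url: str):
        self.base_url = base_url
    def find_by_smiles(self, smiles: str) -> list[dict]:
        # Hypothetical POST endpoint and response shape for "vendor B".
        r = requests.post(f"{self.base_url}/search", json={"query": smiles})
        r.raise_for_status()
        return [{"id": hit["compound_id"], "name": hit["label"]}
                for hit in r.json()]

def search_everywhere(sources: list[CompoundSource], smiles: str) -> list[dict]:
    # Callers see one interface regardless of which vendor answered.
    return [hit for source in sources for hit in source.find_by_smiles(smiles)]
```

Every new system added this way costs one more adapter; a real API standard would make most of this glue code unnecessary, which is exactly why slow standards adoption hurts.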
Still connected to the interoperability of these systems, efficient collaboration heavily depends on reliable data format conversion or, in the best case, data format standardization. Every area of R&D has its own proprietary formats, depending on the vendors of the source systems, instruments, and software tools. A number of pre-competitive initiatives work on open standards, but as with API standards, adoption struggles here as well. Typically the proprietary format with the highest market share becomes the de facto standard, but even that process can take years, sometimes decades. Format conversions are therefore essential, and with the growing size of the exchanged data sets we can no longer afford "human translators."
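As a small example of such an automated conversion, the sketch below reads an SD file, one of the few broadly adopted exchange formats in chemistry, and flattens it to CSV with canonical SMILES, again assuming the open-source RDKit toolkit. The file names are placeholders.

```python
# Sketch: convert a vendor-exported SD file into a flat CSV so downstream
# systems that cannot read SDF can still consume the structures and data.
import csv
from rdkit import Chem

with open("converted.csv", "w", newline="") as out:
    writer = csv.writer(out)
    writer.writerow(["smiles", "properties"])
    for mol in Chem.SDMolSupplier("exported_from_vendor_tool.sdf"):
        if mol is None:
            continue  # unparsable record: log and curate rather than guess
        # Canonical SMILES plus the SD properties carried with the record.
        writer.writerow([Chem.MolToSmiles(mol), mol.GetPropsAsDict()])
```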
Federated search, finding data across multiple domains and systems, is a recurring keyword in discussions around efficient decision making. Making sure that everyone can easily find all relevant information removes the manual work of compiling reports for status update meetings, and it ensures that everyone sees the same data: the basis of decisions about next steps.
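The sketch below illustrates the fan-out pattern that typically sits behind a federated search: the same query is dispatched to several systems concurrently and the hits are merged into one result list. The two query functions are stand-ins for real system calls (a registration database, an ELN, an assay store) and their names and payloads are hypothetical.

```python
# Sketch: concurrent fan-out to multiple data sources, merged results.
import asyncio

async def query_registry(term: str) -> list[dict]:
    await asyncio.sleep(0.1)  # simulate network latency of a real call
    return [{"system": "registry", "id": "REG-1", "match": term}]

async def query_eln(term: str) -> list[dict]:
    await asyncio.sleep(0.2)
    return [{"system": "eln", "id": "ELN-42", "match": term}]

async def federated_search(term: str) -> list[dict]:
    # Queries run concurrently; total latency is that of the slowest source.
    results = await asyncio.gather(query_registry(term), query_eln(term))
    return [hit for system_hits in results for hit in system_hits]

print(asyncio.run(federated_search("kinase inhibitor")))
```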
How can we, the R&D IT community, contribute to this progress? We need to consider migrations, the reusability of legacy data sets (where useful), the standardization of data, and the adoption of the FAIR data principles. Not just talk the talk, but walk the walk, applying the FAIR principles where they can make an impact on our overall efficiency. We need to learn to balance the architecture of an ideal world against an efficient implementation of change right now. As a first step, I would like to invite you all to ChemTalks, an event organized by Chemaxon where industry experts discuss how they use technology to bridge silos in early-stage drug discovery.