Linguamatics and ChemAxon Announce Project to Enhance Text Mining in Chemistry

Linguamatics and ChemAxon are pleased to announce that they are partnering in a new, path-breaking project funded by EUREKA’s Eurostars Programme. The project is code-named “ChiKEL”, which stands for Chemically Informed Knowledge Extraction from Literature. ChiKEL will provide the first interactive text mining system designed for chemistry, integrating advanced chemical search with the extraction of relationships between structures and other biological or chemical entities. By combining chemical search and text mining, users will be able to perform chemical structure and biological searches to extract structured information for further analysis from patents, scientific articles, and internal documents.
This fully automated approach enables chemical structures to be found in documents where mark-up by hand has not been done, has been done for only some structures, or is uneconomic, e.g., for a company’s internal reports. Importantly, the new approach is highly scalable and can find chemical structures at particular points within a document, so questions can be posed such as “which chemicals are mentioned as inhibitors of a particular target” or “what role does the chemical have within this document”.
The existing integration between Linguamatics’ and ChemAxon’s software products enables substructure and similarity searching for known compounds. For example, it is possible to interrogate the literature to find properties of compounds that have a particular substructure, such as the targets that a set of compounds inhibit. The ChiKEL project extends the existing integration to enable recognition of novel chemical compounds expressed in a variety of ways, including IUPAC names and images.
In addition to substructure or similarity searching based on a structure drawn by a user, ChiKEL will also enhance the presentation of search results so that users can view chemical structures and browse through clusters of structures found within the documents. Key aims of ChiKEL are to 1) develop gold standards for evaluation, 2) integrate name-to-structure conversion to find novel chemicals, 3) provide structure visualization for search results, and 4) explore image-to-structure conversion.
Applications include scientific research, intellectual property, and commercial intelligence. Specific areas include drug discovery, drug licensing and repurposing, drug safety, and pharmacovigilance. Target customers include pharmaceutical and biotechnology companies and adjacent markets such as food, agrochemicals, and healthcare. Companies interested in becoming beta testers for the ChiKEL software should contact Linguamatics at [email protected].

About Linguamatics
Linguamatics is the world leader in deploying innovative “natural language processing” (NLP) based text mining technology for complex, high-value problem solving. The Linguamatics approach enables organizations to maximize value from their information resources, distilling massive amounts of documents into meaningful results to support decision making. Linguamatics software products are used by nine of the world’s Top-10 pharmaceutical companies, and by many other prestigious commercial, academic, and government organizations. This customer base has been built on producing breakthrough insights from massive amounts of unstructured textual data, which contains rich but often difficult-to-access, business-critical information. As a result, the company has been self-financing and profitable since its inception in 2001, growing at an average rate of more than 50% per annum, with a software license renewal rate that regularly exceeds 95% per year.
Three of the original four founders still sit on the management team. Linguamatics partners and collaborates with companies and with academic and governmental organizations to bring customers the right solution for their needs, and to develop next-generation capabilities. Current partners include Oracle, Accelrys, ChemAxon, and IO Informatics. The company has also received project research funding from the European Union and won a number of awards. The company operates globally, and has offices in Cambridge, UK, and Boston, MA, USA. For further information, visit

About I2E
Linguamatics’ flagship product, I2E, is an agile, scalable, high-performance text mining system that aids organizations in discovering and synthesizing knowledge from unstructured text in large document collections, such as scientific papers, newsfeeds, patents, internal reports, and social media such as Twitter. I2E’s agile nature allows tuning of query strategies to deliver the precision and recall you need for your specific task, at enterprise scale (millions of documents). I2E text mining queries can also be integrated into workflows to streamline the process even further. Furthermore, I2E’s capabilities may be delivered by deploying I2E Enterprise in-house, or via I2E OnDemand, the cloud version of I2E.
The I2E platform is easily transferable and has been applied across a range of sectors, including pharmaceutical and biotechnology, chemicals, healthcare, government, and academia. Linguamatics has a large and growing user community, with its I2E text mining platform used by many different organizations. Customers benefit from dramatically improved commercial analysis and decision making, with substantial and measurable financial results. Primary application areas include product research and development; market research, such as sentiment analysis; business development, including lead generation and opportunity spotting; patent search and analysis; and competitor intelligence.

About ChemAxon
ChemAxon is a leader in providing cheminformatics software development platforms and desktop applications for the biotechnology, pharmaceutical, and agrochemical industries. With core capabilities for structure visualization, search and management, property prediction, virtual synthesis, screening, and drug design, ChemAxon focuses upon active interaction with users and software portability to create powerful, cost-effective, cross-platform solutions and programming interfaces to power modern cheminformatics and chemical communication. The company is privately owned, with European headquarters in Budapest and sales and support offices in Europe, Japan, and North America.

About Text Mining
Natural Language Processing (NLP) based text mining is now a mainstream technology, with proven value and measurable results. Text mining systems can analyze unstructured or semi-structured text and derive concepts, structure, and relationships from it. Text mining is an alternative to traditional search engines such as Google, which find information by matching keywords in documents; text mining instead interprets the meaning of the text, a step change beyond the capabilities of keyword search.
A large proportion of knowledge is only available within text. For example, scientific knowledge may be found in scientific papers, internal reports, patents, news feeds, or text fields within semi-structured data such as medical records or electronic lab notebooks. Given the unprecedented and continuing increase in available textual information, new ways were required to extract relevant, decision-critical information to inform commercial and academic research and development. Advanced text mining technology has recently emerged as the automated solution for extracting and connecting information from various information sources at large scale.
Users “mine” large collections of documents, extracting, analyzing and synthesizing relevant facts, relationships and quantitative data from content. The technology has been making a large impact in life sciences, for example in pre-clinical safety, systems biology, and target selection.