Encoding and Decoding Graphical Chemical Structures as Two-Dimensional (PDF417) Barcodes

publication · 7 years ago
by Andreas Bender, Muthukumarasamy Karthikeyan (National Chemical Laboratory)
A wide range of molecular representations exists today, from human-readable structural diagrams through line notations such as Wiswesser Line Notation (WLN) and SMILES to several dozen computer-readable file formats. Still, these formats are not the method of choice for entering structures into computer systems, because they cannot be read easily and without error by optical recognition. The present study explores a two-dimensional (PDF417) barcode representation of molecular structures in SMILES format that enables the user to read and input molecular structures into computer systems in a fully automated fashion. A Lempel-Ziv-Welch (LZW) based compressed version of SMILES is suggested for cases where the size of the structure exceeds the storage capacity of PDF417 barcodes. Alternatively, the compact ACS format may be employed as a structural representation. Input via barcodes is fast, fully automatic, and practically error free because the 2D barcodes used employ error correction. A Web application interface has been developed that interprets these barcodes and exports them as optimized 3D chemical structures. Applications of this representation range from maintaining automated storage systems to Web-based tracking of molecular samples. The National Chemical Laboratory, Pune, employs 2D barcode-encoded structures for in-house repository management, where barcodes can also be used to query the database for similar structures or substructures of the query structure.
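
The LZW step mentioned above can be illustrated with a minimal sketch. This is textbook LZW applied to a SMILES string, not necessarily the variant used by the authors: the function names `lzw_compress` and `lzw_decompress`, the 256-entry initial dictionary, and the unbounded code width are all assumptions made for illustration. Packing the resulting codes into bytes and rendering them as a PDF417 symbol are left out.

```python
def lzw_compress(text: str) -> list[int]:
    """Textbook LZW: return a list of integer codes for an ASCII string.

    Assumption: the input (e.g. a SMILES string) contains only characters
    with code points below 256, which holds for standard SMILES.
    """
    table = {chr(i): i for i in range(256)}  # initial single-character entries
    next_code = 256
    current = ""
    codes = []
    for ch in text:
        candidate = current + ch
        if candidate in table:
            current = candidate  # keep extending the current match
        else:
            codes.append(table[current])   # emit code for the longest match
            table[candidate] = next_code   # learn the new sequence
            next_code += 1
            current = ch
    if current:
        codes.append(table[current])
    return codes


def lzw_decompress(codes: list[int]) -> str:
    """Invert lzw_compress by rebuilding the same dictionary on the fly."""
    if not codes:
        return ""
    table = {i: chr(i) for i in range(256)}
    next_code = 256
    previous = table[codes[0]]
    pieces = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == next_code:
            # Special case: the code refers to the entry being defined now.
            entry = previous + previous[0]
        else:
            raise ValueError(f"invalid LZW code: {code}")
        pieces.append(entry)
        table[next_code] = previous + entry[0]
        next_code += 1
        previous = entry
    return "".join(pieces)


if __name__ == "__main__":
    smiles = "CC(=O)Oc1ccccc1C(=O)O"  # aspirin, used only as sample input
    codes = lzw_compress(smiles)
    assert lzw_decompress(codes) == smiles
    print(len(smiles), "characters ->", len(codes), "codes")
```

Because codes above 255 need more than 8 bits each, the actual space saving depends on how the codes are packed. Gains are most likely for large structures with many repeated fragments, which is the case the abstract targets.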
