Implementing ISO standard 11238 compliance with ChemAxon tools
Without standards the world would be a place of utter chaos. With the ever-increasing complexity of modern life, standards ensure interoperability between the multitude of systems that society has grown to depend upon. In the world of cheminformatics, a recent trend has been the promotion of authority-prescribed "de jure" standards (such as InChI and HELM) over the more traditional "de facto" standards (such as V2000 Mol files and SMILES strings). Voluntary "de facto" standards are selected by the community: practical solutions that work well are reused and thereby become dominant, whilst less practical approaches are ignored, in a process akin to Darwinian selection. Obligatory "de jure" standards, however, are imposed on an industry rather than selected by it, often by bureaucrats and lawyers rather than experts. A recent example of this is ISO international standard 11238, entitled "Health informatics -- Identification of Medicinal Products -- Data elements and structures for the unique identification and exchange of regulated information on substances". This standard covers file formats for exchanging chemical structures between government agencies, including the US Food & Drug Administration (FDA). Among the implementation challenges posed by this standard is the ability to handle MDL V2000 connection tables with normalized whitespace (i.e. stripped of carriage returns, line feeds and the multiple spaces used to align columns). To the author's knowledge, no cheminformatics suite in the world could meet this requirement at the time the standard was ratified by several countries' standards bodies in 2011. With luck, poor "de jure" standards will be ignored by legislators, rather than imposing unreasonable burdens on the communities they were designed to help.
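To make the whitespace-normalization challenge concrete, the following minimal Python sketch shows what happens to a small V2000 connection table when every run of whitespace is collapsed to a single space. The methanol record, its coordinates and the helper names (normalize_whitespace, parse_counts_fixed_width) are illustrative assumptions, not text from ISO 11238 and not part of any ChemAxon API.

```python
from typing import Tuple

# Illustrative V2000 molfile for methanol; the coordinates are made up.
MOLFILE = (
    "methanol\n"
    "  sketch\n"
    "\n"
    "  2  1  0  0  0  0  0  0  0  0999 V2000\n"
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\n"
    "    1.4000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0\n"
    "  1  2  1  0\n"
    "M  END\n"
)


def normalize_whitespace(text: str) -> str:
    """Collapse every run of spaces, tabs, CRs and LFs into a single space."""
    return " ".join(text.split())


def parse_counts_fixed_width(counts_line: str) -> Tuple[int, int]:
    """Read atom and bond counts from columns 1-3 and 4-6 of the counts line."""
    return int(counts_line[0:3]), int(counts_line[3:6])


# Conventional reading: the counts line is the fourth line, parsed by column.
atom_count, bond_count = parse_counts_fixed_width(MOLFILE.splitlines()[3])

# After normalization there are no lines and no column positions left.
normalized = normalize_whitespace(MOLFILE)
tokens = normalized.split()
```

In the normalized string the blank third header line has vanished, so a reader can no longer locate the counts line by counting lines, and adjacent fixed-width fields that had no space between them (here the final "0" and "999" of the counts line) survive only as the single token "0999". A parser for the normalized form therefore has to recover field boundaries by other means, which conventional fixed-width V2000 readers are not written to do.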