Convert chemical file formats in Python using JChem Web Services

Posted by
Gábor Guta
on 2013-09-03

Chemical file formats are a frequent source of trouble, because most tools do not support every format or handle some formats incorrectly. ChemAxon's platform provides high-quality chemical file format handling, but the library is written in Java. This post describes how it can be accessed from Python through JChem Web Services.

We assume that you are reasonably familiar with Python and want to develop your solution in it. The examples target Python 3.x, which is not backward compatible with the 2.x line. Installing JChem Web Services is straightforward: download it from here and run the installer. On Linux or Mac, extract the archive into a folder and start the services with startup.sh in the bin folder.

Several Python libraries make it easy to call a SOAP interface. We use the Suds library because it is very "pythonic". It can be downloaded from http://pypi.python.org/pypi/suds-jurko/0.4.1.jurko.4. Be sure to get suds-jurko 0.4.1 or a later version. Python modules distributed this way are generally installed by running python setup.py install in the extracted folder.

The complete script, MolConverterScript.py, can be found at <JChemWebServices directory>\examples\python\MolConverterScript.py. Let's see how to convert a molecule file with the Molecular Conversion Web Service. Along the way you will see how the Suds library is used to connect, send requests, and read responses. The other web services in the JChem Web Services package are used in much the same way.

First import the library, then create the wrapper object that gives access to the web service.

[python]
import suds.client as cl

molconverter = cl.Client('http://localhost:8180/axis2/services/MolConvertWS?wsdl')
[/python]
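Since the other services are reached in a similar way, the WSDL address can be assembled from its parts instead of being hard-coded. The following is only a sketch: wsdl_url is a hypothetical helper name, and it assumes that other services follow the same URL pattern as MolConvertWS, which should be checked against your installation.

```python
def wsdl_url(service_name, host="localhost", port=8180):
    """Build the WSDL address of a JChem web service.

    Assumption: every service is published under /axis2/services/<name>,
    as MolConvertWS is in the example above.
    """
    return "http://%s:%d/axis2/services/%s?wsdl" % (host, port, service_name)


# Reproduces the address used in the example above.
print(wsdl_url("MolConvertWS"))
```

The resulting string can be passed to cl.Client in place of the literal address.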

The options of the wrapper can then be modified if necessary, for example to select the port through which the service is invoked.

[python]
molconverter.set_options(port='MolConvertWSHttpSoap12Endpoint')
[/python]

The convert method of the service can be called as follows. The value returned by the service call is stored in the result variable.

[python]
result = molconverter.service.convert('OC=O', 'mol')
[/python]
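The call above can be wrapped in a small helper so that the rest of a script does not depend on how the client was created. This is a minimal sketch and not part of the shipped example script: convert_structure and make_fake_client are hypothetical names introduced here, and the stand-in client exists only so that the snippet runs without a JChem Web Services instance.

```python
from types import SimpleNamespace


def convert_structure(client, structure, target_format):
    """Convert a structure string through a MolConvert-style client.

    `client` only needs to expose service.convert(structure, format),
    which is the call shape used with the Suds client above.
    """
    if not structure:
        raise ValueError("structure must be a non-empty string")
    result = client.service.convert(structure, target_format)
    # Normalise whatever string-like object the client returns to a plain str.
    return str(result)


def make_fake_client():
    """Hypothetical offline stand-in that mimics the client's call shape."""
    def fake_convert(structure, target_format):
        return "<%s output for %s>" % (target_format, structure)
    return SimpleNamespace(service=SimpleNamespace(convert=fake_convert))


if __name__ == "__main__":
    # With a running service, pass the real molconverter object instead.
    print(convert_structure(make_fake_client(), "OC=O", "mol"))
```

Passing the client in as a parameter also makes the conversion logic easy to exercise in tests, because no network connection is needed.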

In this example the string returned by convert can be used directly. Binary data, however, is returned in Base64-encoded form. To decode it, we can use the standard binascii module. The following example converts a SMILES string into a JPEG picture.

[python]
import suds.client as cl
import binascii

molconverter = cl.Client('http://localhost:8180/axis2/services/MolConvertWS?wsdl')

# Request a JPEG image; the service returns it as a Base64-encoded string
encImg = molconverter.service.convert('OC=O', 'jpeg:w160,h100,Q95,#C0CDC0')

# Decode the Base64 text and write the raw bytes to a file
imgFile = open('test.jpg', 'wb')
imgFile.write(binascii.a2b_base64(bytes(encImg, 'UTF-8')))
imgFile.close()
[/python]

This example differs from the first one in three ways. We additionally import the binascii module. We call convert with a format parameter that produces binary output. And we open a file in binary mode and write the decoded content of the returned string into it; the binascii.a2b_base64 function turns the Base64 string into the decoded bytes.
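The decoding step can be tried without a running service. The sketch below is an illustration rather than part of the original script: save_base64_image is a hypothetical helper name, the payload is a few locally encoded bytes standing in for the string the service would return, and the standard base64 module is used in place of binascii (both decode the same data).

```python
import base64
import os
import tempfile


def save_base64_image(encoded, path):
    """Decode a Base64 string and write the raw bytes to `path`.

    Returns the number of bytes written.
    """
    data = base64.b64decode(encoded)
    # The with block closes the file even if the write fails.
    with open(path, "wb") as img_file:
        img_file.write(data)
    return len(data)


# Stand-in payload: four bytes that look like the start of a JPEG file,
# encoded locally. A real script would use the string returned by
# molconverter.service.convert instead.
sample_bytes = b"\xff\xd8\xff\xe0"
sample_payload = base64.b64encode(sample_bytes).decode("ascii")

target = os.path.join(tempfile.mkdtemp(), "test.jpg")
written = save_base64_image(sample_payload, target)
print(written, "bytes written to", target)
```

Compared with the explicit open and close calls, the with block is a small safety improvement; the decoded output is the same.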

 
