Handling large volumes of chemical data in a scalable way and on a tight budget with the PostgreSQL database and the JChem PostgreSQL Cartridge

presentation · 5 years ago
by Ellert van Koperen (MedChemData)

We ran a pilot to test the idea of pushing a very large number of chemicals through Reactor. That is a challenge in itself: not only performance-wise, because of the volume of data, but also in keeping the results structured, clean and correct. Coming from a data-processing background, the approach was obvious: use a database! More specifically, we chose the PostgreSQL database in combination with the new JChem PostgreSQL Cartridge.

Rather than simply pumping millions of compounds into Reactor, routing them through a tool that can make smart selections makes it possible to scale the whole process up. Thus, after compiling a large library of compounds, we had to pre-filter the data before feeding it to Reactor. This means substructure and feature searches, and optimizing those can be very tricky.
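The pre-filter step described above can be sketched in a few lines. This is only an illustration of the pattern (select inside the database, then stream the survivors onward in batches), not the setup used in the pilot. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL so that it runs anywhere. The table layout, the `heavy_atoms` column and the `has_carboxylic_acid` flag are hypothetical, and the flag merely stands in for a real substructure search, which in practice would be delegated to the cartridge.

```python
import sqlite3
from typing import Iterator, List, Tuple


def build_demo_library() -> sqlite3.Connection:
    """Create a tiny in-memory table standing in for the compound library."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE compounds ("
        " id INTEGER PRIMARY KEY,"
        " smiles TEXT NOT NULL,"
        " heavy_atoms INTEGER NOT NULL,"          # hypothetical precomputed feature
        " has_carboxylic_acid INTEGER NOT NULL)"  # hypothetical stand-in for a substructure hit
    )
    rows = [
        ("CC(=O)O", 4, 1),         # acetic acid
        ("CCO", 3, 0),             # ethanol
        ("OC(=O)c1ccccc1", 9, 1),  # benzoic acid
        ("c1ccccc1", 6, 0),        # benzene
        ("CCCC(=O)O", 6, 1),       # butanoic acid
    ]
    conn.executemany(
        "INSERT INTO compounds (smiles, heavy_atoms, has_carboxylic_acid)"
        " VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()
    return conn


def select_candidates(
    conn: sqlite3.Connection, min_heavy_atoms: int, batch_size: int
) -> Iterator[List[Tuple[int, str]]]:
    """Let the database do the filtering and yield matches in batches."""
    cursor = conn.execute(
        "SELECT id, smiles FROM compounds"
        " WHERE has_carboxylic_acid = 1 AND heavy_atoms >= ?"
        " ORDER BY id",
        (min_heavy_atoms,),
    )
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        yield batch


if __name__ == "__main__":
    library = build_demo_library()
    for batch in select_candidates(library, min_heavy_atoms=5, batch_size=2):
        # In a real pipeline, each batch would be handed to the reaction step here.
        print(batch)
```

The point of the design is that only the rows that pass the filter ever leave the database, and they leave in bounded batches, so the expensive downstream step never sees the full library.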

In the presentation I will elaborate on the various hurdles we had to overcome, and on how they can be avoided.

We now know that Postgres and the JChem cartridge can be excellent performers if handled with care. The huge operation that we plan to undertake is probably possible with these very cost-effective tools. Though not yet fully mature, the combination of an open-source database with a chemistry-aware backend has the potential to be a game-changer.
