Automating database creation

news · 6 years ago
by Tim Dudgeon

Have you ever needed to create a database that includes JChem tables?
Maybe you've done this manually, but if you're doing this as part of a development process then it needs to be automated.

Its usually simple to create databases - all you need is to define the create table SQL (OK, maybe not always quite so simple, but you get the point). So you're probably tempted to reverse engineer a JChem table and just create the table with something that looks like this:

CREATE TABLE MOLECULES (
 cd_id int(11) NOT NULL AUTO_INCREMENT,
 cd_structure mediumblob NOT NULL,
 cd_smiles text,
 cd_formula varchar(100) DEFAULT NULL,
 cd_sortable_formula varchar(255) DEFAULT NULL,
 cd_molweight double DEFAULT NULL,
 cd_hash int(11) NOT NULL,
 cd_flags varchar(20) DEFAULT NULL,
 cd_timestamp datetime NOT NULL,
 cd_pre_calculated tinyint(1) NOT NULL DEFAULT '0',
 cd_fp1 int(11) NOT NULL,
 cd_fp2 int(11) NOT NULL,
 cd_fp3 int(11) NOT NULL,
 cd_fp4 int(11) NOT NULL,
...
...
)

It looks right, but its a very bad idea!
Why? Well lots of reasons.
Firstly, you've got to get the SQL completely correct. If you miss something or get a column definition wrong then there is no way that JChem is going to play ball with you.
Next, its not easy to get this working for different database types. The column definitions will be different. (e.g. the above snippet is for MySQL, but it will take a lot of work to get it working for Oracle).
Next, you've probably forgotten about the JChem property table (usually named JCHEMPROPERTIES). This is essential for JChem and it has to contain very precise information about the structure table you are creating. If you don't create this table as well and put exactly the right info into it then JChem either won't recognize your structure table at all, or even worse will recognize it but will show strange behaviours that are pretty well impossible to debug.
Next, you'll really struggle to use more sophisticated features like Chemical Terms columns or Standardizers.
And finally, even if you walk this tortuous path and manage to get it working, then there is no guarantee it will still work when the version of JChem changes as any of these details can change. The table definitions should be treated as a private implementation detail understood only by JChem. Not something you should mess with.

OK, so by now I assume you are convinced. But what is the right way to do this?
The correct approach is to use the JChem API. This means that you tell the JChem Java code to create the table, specifying all the options for that table and the JChem API handles all the implementation details.
How do you use this JChem API? Well, most people do this from end user applications like JChem Manager or Instant JChem where the application handles the calls to the JChem API. But in this case this won't work because we need a more direct, automatable approach.

So in our case the answer is to use the JChem Java API. So you need to write Java code that calls the JChem API?
Well, yes that will work but its a pretty painful way to go about it, especially if you need to also create standard tables using SQL. Surely there's an easier way? Well, yes, Groovy to the rescue. Groovy, being 100% Java, takes nothing away, but makes things so much easier. We can combine using the JChem API with executing normal SQL in a pretty simple manner using the Groovy Sql class. The full example is attached, but here are the key parts of a script that create a structure table in the correct way and it also creates a standard table using SQL, and its all without rigmarole and drama. The key parts are:

[java]
Sql db = Sql.newInstance('jdbc:derby:memory:myDB;create=true')
ConnectionHandler conh = new ConnectionHandler(db.connection, 'JCHEMPROPERTIES')
DatabaseProperties.createPropertyTable(conh)
StructureTableOptions opts = new StructureTableOptions()
opts.name = 'MOLECULES'
opts.tableType = TABLE_TYPE_ANY_STRUCTURES
opts.duplicateFiltering = true
UpdateHandler.createStructureTable(conh, opts)
[/java]

Full details and comments are in the full code, but basically:

Line 1: create the Groovy Sql instance
Line 2: Create the JChem ConnectionHandler
Line 3: Create the property table
Lines 4-7: specify the options for the JChem table
Line 8: Create the JChem table

The full script also shows how you can create plain SQL tables in the same script without rigmarole. Take a look for more details.

Now what about the automation bit? All we have now is a Groovy script that you can execute within your IDE or on the command line, but its not really automated.
Well, if you’re using Ant as your build process then all you have to do is use the Ant Groovy task and you can put this script into your build process.
http://groovy.codehaus.org/The+groovy+Ant+Task

Just remember. Don’t even think of creating those JChem tables with SQL!

code.zip