Friday, October 31, 2008

Notes on SqliteJDBC and Java Web Start

Java has a long-standing limitation on loading native code from a jar, stemming from the unfortunate belief that native code is somehow impure. In the practical world, there is an occasional need to do this, so people came up with the workaround of copying the native library out of the jar and into the local filesystem, where it can then be loaded with System.load().
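The copy-out workaround can be sketched roughly like this. It is a minimal sketch, not SqliteJDBC's actual code: the class NativeLibExtractor and its method names are made up for illustration, and it assumes Java 7 or later for java.nio.file and try-with-resources.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

// Hypothetical helper; the names are illustrative, not from any library.
final class NativeLibExtractor {

    // Copies the bytes of a native library (typically a jar resource stream)
    // into a fresh temp file and returns that file.
    static File extractToTemp(InputStream in, String prefix, String suffix) throws IOException {
        File tmp = File.createTempFile(prefix, suffix);
        tmp.deleteOnExit();
        Files.copy(in, tmp.toPath(), StandardCopyOption.REPLACE_EXISTING);
        return tmp;
    }

    // Extracts a classpath resource and hands its absolute path to System.load().
    static void loadFromClasspath(String resourcePath, String suffix) throws IOException {
        try (InputStream in = NativeLibExtractor.class.getResourceAsStream(resourcePath)) {
            if (in == null) {
                throw new IOException("resource not found: " + resourcePath);
            }
            File lib = extractToTemp(in, "nativelib-", suffix);
            System.load(lib.getAbsolutePath());
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 2) {
            loadFromClasspath(args[0], args[1]);
        } else {
            System.out.println("usage: NativeLibExtractor <resourcePath> <suffix>");
        }
    }
}
```

A call would look something like NativeLibExtractor.loadFromClasspath("/native/libfoo.jnilib", ".jnilib"), where the resource path is a placeholder for wherever the library sits inside your jar.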

Another way to do this, within Web Start, is to add platform-specific resource elements like these to your JNLP file:

  
<resources os="Windows" arch="x86">
  <j2se href="http://java.sun.com/products/autodl/j2se" version="1.5+"/>
  <nativelib href="sqlitejdbc-v052-native-win.jar"/>
</resources>
<resources os="Mac OS">
  <j2se href="http://java.sun.com/products/autodl/j2se" version="1.5+"/>
  <nativelib href="sqlitejdbc-v052-native-mac.jar"/>
</resources>
<resources os="Linux">
  <j2se href="http://java.sun.com/products/autodl/j2se" version="1.5+"/>
  <nativelib href="sqlitejdbc-v052-native-lin.jar"/>
</resources>

This way, we get the native library for the right platform, if all goes according to plan. One wacky detail is that System.loadLibrary("foo") munges the library name in some platform-specific and poorly documented way. So, you need a libfoo.jnilib for OS X, a libfoo.so for Linux, and a foo.dll for Windoze. It's kind of a hassle to set up all these separate jars. System.load(...), on the other hand, just tries to do what it's told without munging the name, so people still frequently use the trick described above.
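If you want to see what file name the JVM expects on a given machine, System.mapLibraryName() will tell you. The snippet below is just a small probe; the class name LibraryNameDemo is made up, and the exact result depends on the platform and JDK it runs on.

```java
// Prints the platform-specific file name the JVM derives from a bare library name.
final class LibraryNameDemo {

    // Thin wrapper around the standard System.mapLibraryName call.
    static String fileNameFor(String libName) {
        return System.mapLibraryName(libName);
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "foo";
        System.out.println(name + " -> " + fileNameFor(name));
    }
}
```

Running it on each target platform is a cheap way to check what to name the file inside each nativelib jar.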

SqliteJDBC does just that to load the native Sqlite library. SqliteJDBC also includes "pure Java" drivers, which are considerably slower for some operations, but act as a nice fallback when the proper native libraries aren't handy.

This all works well enough. But on OS X 10.5 with Java 6, it ends up falling back on the non-native drivers. After digging a bit, I found that the call to System.load(...) was failing with this exception:

/private/tmp/libsqlitejdbc-19214.lib: 
java.lang.UnsatisfiedLinkError: /private/tmp/libsqlitejdbc-19214.lib: 
 at java.lang.ClassLoader$NativeLibrary.load(Native Method)
 at java.lang.ClassLoader.loadLibrary0(ClassLoader.java:1822)
 at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1702)
 at java.lang.Runtime.load0(Runtime.java:770)
 at java.lang.System.load(System.java:1005)
        ...

The same thing works fine when using Java 5 on the same machine and with Java 6 on Windoze, so I'm guessing the native library needs to be recompiled to work correctly with OS X's 64-bit Java 6.
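One quick way to check a guess like that is to print what the JVM reports about itself. This is only a diagnostic sketch: the class name JvmArchInfo is made up, and sun.arch.data.model is a non-standard property that not every JVM sets, so it falls back to "unknown".

```java
// Reports the OS, architecture and Java version the current JVM is running with.
final class JvmArchInfo {

    static String describe() {
        return "os.name=" + System.getProperty("os.name")
            + ", os.arch=" + System.getProperty("os.arch")
            + ", java.version=" + System.getProperty("java.version")
            // Non-standard property; absent on some JVMs.
            + ", data.model=" + System.getProperty("sun.arch.data.model", "unknown");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Comparing the output under Java 5 and Java 6 on the same machine would show whether the two JVMs really differ in architecture.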

A little spelunking into SqliteJDBC's code reveals that the author has kindly provided hooks where you can insert the path and name of your own driver using these system properties:

  • org.sqlite.lib.path
  • org.sqlite.lib.name

The driver will then attempt this:

System.load(new File(libpath, libname).getAbsolutePath());
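Setting those two properties from code might look like the sketch below. The property names are the ones listed above; everything else (the class SqliteLibOverride, its methods, and any file name you pass in) is hypothetical. I'm also assuming the driver reads the properties when it first loads, so they would need to be set before touching any SqliteJDBC class.

```java
import java.io.File;

// Hypothetical helper for pointing SqliteJDBC at a native library of your own.
final class SqliteLibOverride {

    static final String PATH_PROP = "org.sqlite.lib.path";
    static final String NAME_PROP = "org.sqlite.lib.name";

    // Splits the library file into directory and file name and sets both properties.
    // Assumption: call this before the driver class is loaded.
    static void pointDriverAt(File nativeLib) {
        File abs = nativeLib.getAbsoluteFile();
        System.setProperty(PATH_PROP, abs.getParent());
        System.setProperty(NAME_PROP, abs.getName());
    }

    // Mirrors the expression quoted above; returns null if either property is unset.
    static String resolvedPath() {
        String libpath = System.getProperty(PATH_PROP);
        String libname = System.getProperty(NAME_PROP);
        if (libpath == null || libname == null) {
            return null;
        }
        return new File(libpath, libname).getAbsolutePath();
    }

    public static void main(String[] args) {
        System.out.println("driver would load: " + resolvedPath());
    }
}
```

The same properties can also be passed on the command line with -Dorg.sqlite.lib.path=... and -Dorg.sqlite.lib.name=..., which avoids any ordering worries.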


Monday, October 27, 2008

WTF NCBI?

A previous post, Hacking NCBI Entrez, dealt with how to retrieve sequence information from NCBI's databases. That method seems to work for prokaryotes and for yeast, but fails for most other eukaryotes.

For mammals, efetch for genome XML gives back crap like this (for rat):

<gbseq_contig>join(NW_001084776.1:1..691014,gap(182895),NW_001084777.1:1..1914699,gap(182895),NW_001084778.1:1..26673,gap(182895),NW_001084779.1:1..2730,gap(182895),NW_001084780.1:1..61755,gap(182895),NW_001084781.1:1..20466,gap(182895),NW_001084782.1:1..657670,gap(182895),NW_001084783.1:1..55883,gap(182895),NW_001084784.1:1..9292,gap(182895),NW_001084785.1:1..10599,gap(182895),NW_001084786.1:1..14198,gap(182895),NW_001084787.1:1..3561,gap(182895),NW_001084788.1:1..106511,gap(182895),NW_001084789.1:1..21205827,gap(182895),NW_001084790.1:1..11152534,gap(182895),NW_001084791.1:1..6015389,gap(182895),NW_001084792.1:1..686425,gap(182895),NW_001084793.1:1..9344793)</gbseq_contig>

The XML for the human Y chromosome is the same story. Totally useless. I take it I'm supposed to request each of the referenced sequences and parse out the regions for each? What a colossal pain in the ass!
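For what it's worth, pulling the pieces out of that join(...) string is the easy part. Here is a rough sketch that handles only the two forms seen in the rat example, accession:start..end and gap(n); the class ContigJoinParser is made up, it does not fetch anything from NCBI, and it will reject other location operators such as complement(...).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for the simple join(...) form shown above.
final class ContigJoinParser {

    // One piece of a join: either a region of an accession or a gap.
    static final class Part {
        final String accession; // null for a gap
        final long start;       // 1-based, inclusive; 0 for a gap
        final long end;         // inclusive; 0 for a gap
        final long length;

        Part(String accession, long start, long end, long length) {
            this.accession = accession;
            this.start = start;
            this.end = end;
            this.length = length;
        }

        boolean isGap() {
            return accession == null;
        }
    }

    private static final Pattern PART = Pattern.compile(
        "gap\\((\\d+)\\)|([A-Za-z0-9_.]+):(\\d+)\\.\\.(\\d+)");

    static List<Part> parse(String contig) {
        String s = contig.trim();
        if (!s.startsWith("join(") || !s.endsWith(")")) {
            throw new IllegalArgumentException("not a join(...) expression");
        }
        String body = s.substring("join(".length(), s.length() - 1);
        List<Part> parts = new ArrayList<Part>();
        for (String token : body.split(",")) {
            Matcher m = PART.matcher(token.trim());
            if (!m.matches()) {
                throw new IllegalArgumentException("unrecognized part: " + token);
            }
            if (m.group(1) != null) {
                parts.add(new Part(null, 0, 0, Long.parseLong(m.group(1))));
            } else {
                long start = Long.parseLong(m.group(3));
                long end = Long.parseLong(m.group(4));
                parts.add(new Part(m.group(2), start, end, end - start + 1));
            }
        }
        return parts;
    }

    // Sum of region and gap lengths, i.e. the length of the assembled sequence.
    static long totalLength(List<Part> parts) {
        long total = 0;
        for (Part p : parts) {
            total += p.length;
        }
        return total;
    }

    public static void main(String[] args) {
        if (args.length == 1) {
            List<Part> parts = parse(args[0]);
            System.out.println(parts.size() + " parts, total length " + totalLength(parts));
        }
    }
}
```

Each non-gap Part would still need its own request for the referenced accession, which is the painful bit.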

By the way, BioJava has a parser called GenbankXmlFormat, but its docs say, "Deprecated. Use org.biojavax.bio.seq.io.INSDseqFormat". What INSDseqFormat is, or how it is supposed to replace GenbankXML, is totally unclear.

Friday, October 10, 2008

eScience

Ed Lazowska's elements of eScience:
  • Sensors 
  • Networking 
  • Visualization 
  • Databases 
  • Data mining 
  • Machine learning
eScience is what happens when scalable mass produced computing infrastructure is applied to scientific problems. It's not about the computation, it's about the data.
...which reminds me of a Peter Karp paper.