Thursday, April 21, 2011

Genome Browsers Anonymous

As you may know, I'm starting a support group for those afflicted with the tragedy of having written a genome browser. Mine is called the Gaggle Genome Browser. About the time I was writing it, everyone and their uncle's dog decided to write a genome browser. New instruments with new data types were coming into the lab. Computers had more memory and CPU cores than ever. It seemed like a good idea at the time.

The Broad Institute's Integrative Genomics Viewer (shown below) got a write-up in the January issue of Nature Biotechnology. IGV seems particularly well developed for next-gen sequencing data, nicely displaying coverage plots and short-read alignments, with attention to the nuances of paired-end reads.

IGV is a Java desktop app that pulls data down from a server component, the IGV Data Server. For GGB, I cooked up a two-level hierarchy that caches chunks of data in memory, backed by SQLite; adding a server as a third level is probably smart. IGV's multi-resolution data mode precomputes aggregations for zoomed-out views, where the data are denser than the pixels available to display them. IGV splits data into "tiles" stored in a custom indexed binary file format. As the IGV authors put it, "Hence a single tile at the lowest resolution, which spans the entire genome, has the same memory footprint as a tile at the very high zoom levels, which might span only a few kilobases." GGB, by contrast, aggregates on the fly, which hurts performance in zoomed-out views.
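To make the tile idea concrete, here is a minimal Python sketch of a multi-resolution pyramid. It is not IGV's file format or code. It assumes a hypothetical aggregation that averages adjacent pairs of bins at each coarser level and stops once a whole level fits in one tile; the function names are my own. Because every tile holds at most `tile_size` bins, a tile spanning the whole genome costs the same memory as one spanning a few kilobases.

```python
from typing import List


def build_pyramid(values: List[float], tile_size: int) -> List[List[float]]:
    """Return aggregation levels, from the raw bins (finest) to the coarsest.

    Each coarser level averages adjacent pairs of bins from the level below.
    Building stops once a whole level fits in a single tile.
    """
    if tile_size < 1:
        raise ValueError("tile_size must be at least 1")
    levels = [list(values)]
    while len(levels[-1]) > tile_size:
        finer = levels[-1]
        # Average each adjacent pair; an odd trailing bin is kept on its own.
        coarser = []
        for i in range(0, len(finer), 2):
            pair = finer[i:i + 2]
            coarser.append(sum(pair) / len(pair))
        levels.append(coarser)
    return levels


def split_into_tiles(level: List[float], tile_size: int) -> List[List[float]]:
    """Cut one level into fixed-size tiles; only the last tile may be shorter."""
    return [level[i:i + tile_size] for i in range(0, len(level), tile_size)]


# Hypothetical data: 16 raw bins, tiles of 4 bins each.
levels = build_pyramid([float(v) for v in range(16)], tile_size=4)
for zoom, level in enumerate(reversed(levels)):
    print(zoom, len(split_into_tiles(level, 4)), "tile(s)")
```

With 16 raw bins and a tile size of 4, the sketch builds three levels (16, 8 and 4 bins), and the coarsest level fits in a single tile. A real implementation would probably keep more than the mean (say, min and max too) so that zoomed-out plots don't hide peaks.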

The IGV Data Server seems to derive a lot of its data from the UCSC Genome Browser, which maintains nicely curated data mapped to genomic coordinates for many eukaryotes as well as microbes. One thing I enjoyed hacking on in GGB was integration with R. I wonder whether that would be worthwhile for IGV.

Which functionality belongs on the client and which on the server is debatable. We considered building a browser-based implementation, experimenting a bit with the super-cool Protovis visualization library, but went with desktop. X:map is a nice counterpoint, an interactive web-based genome browser. In their approach, the Google Maps API serves up pre-rendered image tiles, keeping the big data and heavyweight computing tasks on the server. They also have an R and Java program that lets you plot custom data. JBrowse, from CSHL, does the rendering in the browser. Putting a data-intensive, graphically interactive app in the browser is still near the edge of the envelope, but browsers are improving like crazy, as are the programming models for this type of development.
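With pre-rendered tiles, the client's main job is working out which tiles cover the current view. Here is a hypothetical Python sketch, assuming a Google Maps-style scheme where zoom level `z` divides a genome of `genome_length` bases into `2**z` equal tiles. X:map's actual tiling scheme may differ, and `tiles_for_view` is a name of my own.

```python
def tiles_for_view(start: int, end: int, genome_length: int, zoom: int) -> range:
    """Indices of the tiles at `zoom` that overlap the half-open interval [start, end).

    Assumes the genome is split into 2**zoom equal tiles, so tile i covers
    positions p with i * genome_length / 2**zoom <= p < (i + 1) * genome_length / 2**zoom.
    """
    if not (0 <= start < end <= genome_length):
        raise ValueError("need 0 <= start < end <= genome_length")
    n_tiles = 2 ** zoom
    # Integer arithmetic: floor(p * n_tiles / genome_length) is the tile holding position p.
    first = start * n_tiles // genome_length
    last = (end - 1) * n_tiles // genome_length
    return range(first, last + 1)


# A 1,000-base genome at zoom 2 has four 250-base tiles.
print(list(tiles_for_view(100, 600, genome_length=1000, zoom=2)))
```

The view [100, 600) touches tiles 0, 1 and 2. The server can render each tile once and hand the same image to every client that asks, which is what keeps the heavy lifting off the client.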

For what it's worth, I like the format of the IGV paper. In just two pages, it concisely covers the motivation, what the software does, a few unique features and a couple of figures showing example applications, all as a high-level overview. A supplement holds the technical detail of interest to software developers, along with more example applications. I like that better than awkwardly shoehorning biology and software engineering into one paper.

Anyway... Nice work, IGV team! Let me know if you'd like to join the support group. We're here to help.

Sunday, April 17, 2011

You can't optimize what you can't predict

In a post about the relationship between predictive analytics and operations research, Harlan Harris says, "You can't optimize what you can't predict." Predictive analytics applies statistical and machine-learning tools to large data sets to find complex relationships and predict future trends. Operations research is concerned with optimizing systems such as supply chains and industrial processes.

A synthetic oscillatory network of transcriptional regulators, Elowitz and Leibler, Nature, 1999

It's interesting because the same relationship exists between systems biology and synthetic biology. (At least we hope it does.) That is, understanding, modeling and predicting a system will eventually let you bend it to your own ends. Same techniques, different domains. Systems biology is essentially predictive analytics on biological data. It hopes to build models and discover principles that will guide synthetic biology, which re-engineers biological systems toward novel and useful functions, from cleaning up toxic waste to producing energy. And building entirely new biological systems inevitably feeds back into a better understanding of natural ones.

It would be a great validation of systems biology methods to do a blind analysis of a synthetic biological circuit. Even better would be to predict the behavior of a synthetic system, then build it and see how well we did. If we do that enough times, we can't help but improve our ability to predict and optimize biological systems.

Sunday, April 10, 2011

Thinking about CRUD has damaged your karma

I'm spending some time trying out MongoDB. Mongo is a NoSQL database that stores documents in a binary variant of JSON called BSON.
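To get a feel for what "binary variant of JSON" means, here is a hand-rolled encoder for just two BSON element types (UTF-8 strings and 32-bit integers), written from the BSON specification. It is an illustrative sketch only; in practice you'd use the `bson` package that ships with a MongoDB driver. The function names are my own.

```python
import struct


def _cstring(s: str) -> bytes:
    """BSON keys are NUL-terminated UTF-8 strings."""
    return s.encode("utf-8") + b"\x00"


def encode_bson(doc: dict) -> bytes:
    """Encode a flat dict whose values are str or 32-bit int as a BSON document."""
    body = b""
    for key, value in doc.items():
        if isinstance(value, bool):
            # bool is a subclass of int in Python; BSON gives it its own type, not handled here.
            raise TypeError("booleans are not handled in this sketch")
        if isinstance(value, str):
            data = value.encode("utf-8") + b"\x00"
            # 0x02 = string: int32 byte length (including the trailing NUL), then the bytes.
            body += b"\x02" + _cstring(key) + struct.pack("<i", len(data)) + data
        elif isinstance(value, int):
            # 0x10 = 32-bit signed integer, little-endian (struct.error if out of range).
            body += b"\x10" + _cstring(key) + struct.pack("<i", value)
        else:
            raise TypeError(f"unsupported type: {type(value).__name__}")
    # Document = int32 total length (counting itself) + elements + terminating NUL.
    return struct.pack("<i", len(body) + 5) + body + b"\x00"


print(encode_bson({"hello": "world"}))
```

Encoding `{"hello": "world"}` yields 22 bytes: a little-endian length prefix, a type byte, the NUL-terminated key, a length-prefixed string, and a closing NUL. Unlike JSON text, the length prefixes let a reader skip over fields without parsing them.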

Mongo is often compared to CouchDB. But, aside from the fact that they both store JSON documents, their approaches are quite different.

Mongo stores documents in collections, which are vaguely like tables in a SQL database. Mongo queries are partial JSON documents matched against existing documents in the database. Couch builds views with map-reduce and is, in general, a little more conceptually heavy, while Mongo is more straightforward, with direct analogs to most SQL features.
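The query-by-example idea is easy to model. Below is a toy Python matcher, not MongoDB's implementation: it handles only exact equality and dotted paths into embedded documents, and it ignores query operators such as `$gt` as well as Mongo's special handling of arrays. The names `matches` and `find` and the `genes` data are hypothetical.

```python
from typing import Any, Dict, List


def matches(query: Dict[str, Any], doc: Dict[str, Any]) -> bool:
    """True if every field in `query` exists in `doc` with an equal value.

    Dotted keys such as "loc.chrom" descend into embedded documents.
    """
    for path, expected in query.items():
        current: Any = doc
        for part in path.split("."):
            if not isinstance(current, dict) or part not in current:
                return False
            current = current[part]
        if current != expected:
            return False
    return True


def find(collection: List[Dict[str, Any]], query: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the documents matching `query`; an empty query matches everything."""
    return [doc for doc in collection if matches(query, doc)]


# Hypothetical sample collection.
genes = [
    {"name": "lacZ", "strand": "+", "loc": {"chrom": "chr1", "start": 100}},
    {"name": "trpA", "strand": "-", "loc": {"chrom": "chr2", "start": 5}},
]
print([g["name"] for g in find(genes, {"loc.chrom": "chr2"})])
```

Here `find(genes, {"loc.chrom": "chr2"})` picks out documents whose embedded `loc` has `chrom` equal to `"chr2"`, roughly what `db.genes.find({"loc.chrom": "chr2"})` does in the mongo console.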

One feature I like is the mongo console. It's a full JavaScript interpreter, which means you can easily script bulk updates and maintenance tasks.

The MongoDB site has a quickstart guide for popular operating systems and a tutorial that will get you started with the console, as well as specific guides for loads of client languages.

You have not yet reached enlightenment...

From within Ruby, we can communicate with MongoDB with the mongo, bson and bson_ext gems. One fun way to learn about using MongoDB from Ruby is through the nicely kooky MongoDB_Koans, a series of unit tests with small omissions or bugs. You fix the bugs and make the tests pass, while the test harness gently urges "Please meditate on the following code...".

Thursday, April 07, 2011

Installing the Ruby mysql gem

I've had issues installing the ruby mysql gem a couple of times, so I thought I'd document what finally worked here. I had Rails 2.3.8 running on the pre-installed Ruby 1.8.7 that comes with OS X 10.6. I needed to install Rails 3.x for another project. Having multiple versions of Rails is supposed to work OK, so I just did:

sudo gem install rails

Problems began here, but I flailed rather than keeping careful notes. At some point, I decided a fresh install of MySQL might help, so I installed version 5.5.9 from the DMG on dev.mysql.com. Maybe I should've used MacPorts, 'cause that made things worse. After that, neither version of Rails could connect to MySQL, and my cubicle neighbors gained new respect for my colorful vocabulary. Rails would fail, croaking up this cryptic message:

uninitialized constant MysqlCompat::MysqlRes

This thread helped. You need to make sure that MySQL, Ruby and MySQL/Ruby are all compiled for a common architecture. When building the gem, you can do this by setting ARCHFLAGS in the environment:

sudo env ARCHFLAGS="-arch x86_64" gem install --no-rdoc --no-ri mysql -- --with-mysql-config=/usr/local/mysql/bin/mysql_config

Compiling the gem properly is only one step. At run time, the gem depends on the MySQL client library, which it needs to be able to find. Tell it where to look, like so:

export DYLD_LIBRARY_PATH="/usr/local/mysql/lib:$DYLD_LIBRARY_PATH"

With the combination of these two clues, both versions of Rails seem to work happily.

Diagnosing problems with Ruby and Gems

A couple of key commands for debugging RubyGems are gem list and gem env. There's also gem uninstall, for removing the wreckage of failed attempts. Some recommend RVM, which I may try out some time. Also, I follow the deprecated practice of installing gems with sudo. I guess I should learn to install them in my user directory.

The Ruby that comes with OS X 10.6, at least on my machine, looks like this:

$ /usr/bin/ruby --version
ruby 1.8.7 (2009-06-12 patchlevel 174) [universal-darwin10.0]

$ file /usr/bin/ruby
/usr/bin/ruby: Mach-O universal binary with 3 architectures
/usr/bin/ruby (for architecture x86_64): Mach-O 64-bit executable x86_64
/usr/bin/ruby (for architecture i386): Mach-O executable i386
/usr/bin/ruby (for architecture ppc7400): Mach-O executable ppc

Installing a fresh Ruby from MacPorts probably wasn't necessary, but that didn't stop me:

$ which ruby
/opt/local/bin/ruby

$ /opt/local/bin/ruby --version
ruby 1.8.7 (2011-02-18 patchlevel 334) [i686-darwin10]

$ file /opt/local/bin/ruby 
/opt/local/bin/ruby: Mach-O 64-bit executable x86_64
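The requirement that MySQL, Ruby and MySQL/Ruby share an architecture can be checked by comparing what `file` reports for each binary. Here is a small Python sketch that parses `file` output in the format shown above and intersects the architecture sets. It assumes that output format; the function names are my own.

```python
import re
from typing import Set


def architectures(file_output: str) -> Set[str]:
    """Architectures listed in the output of `file` for one Mach-O binary."""
    # Universal binaries list one "(for architecture X)" line per slice.
    universal = set(re.findall(r"\(for architecture (\S+)\)", file_output))
    if universal:
        return universal
    # A thin (single-architecture) binary prints one line ending in its architecture.
    lines = file_output.strip().splitlines()
    return {lines[0].split()[-1]} if lines else set()


def common_architectures(first: str, *rest: str) -> Set[str]:
    """Architectures supported by every binary whose `file` output is given."""
    return set.intersection(architectures(first), *(architectures(o) for o in rest))


system_ruby = """/usr/bin/ruby: Mach-O universal binary with 3 architectures
/usr/bin/ruby (for architecture x86_64): Mach-O 64-bit executable x86_64
/usr/bin/ruby (for architecture i386): Mach-O executable i386
/usr/bin/ruby (for architecture ppc7400): Mach-O executable ppc"""
macports_ruby = "/opt/local/bin/ruby: Mach-O 64-bit executable x86_64"
print(common_architectures(system_ruby, macports_ruby))
```

Fed the two outputs above, it reports x86_64 as the only architecture the system Ruby and the MacPorts Ruby share. Running the same check over Ruby, the gem's compiled extension and the MySQL client library would show whether they have any architecture in common.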

I hope this helps someone. This is a pain, but the same thing in Python is worse.

Monday, April 04, 2011

Art house video games

Where are the art house video games? I remember reading that the novel was once considered a time waster for idlers well beneath the level of serious art. What is art, anyway? TV spent decades in the shallows before growing artistic pretensions. These days, you can take a university class about The Wire. Maybe soon we'll be able to take a class in Halo studies, or the semiotics of Grand Theft Auto?

Silly, maybe, but games show a lot more potential than, say, Twitter. Unless someone starts tweeting profound insights in haiku. Have you ever seen a Facebook page you'd describe as raw, edgy, or deep?

Games are, at least, amenable to a Lord of the Rings-style quest where the main point is to explore a rich fantasy world. What games are lacking, so far as I know, is the ability to be transformative. How does the writer develop characters when the protagonist, or protagonists, are real people with a will of their own? To induce a change (growth, learning) in a character outside of the writer's control... that would be the real trick.

But the potential of games is there as well. The visuals and audio are already well developed. Interactivity with the game world and the shared experience of multiplayer games is where the untapped potential lies. The medium may be the message, but to succeed on an artistic level games need a message. They need more to say than, "Let's blow shit up!"

Know any games that rise to the level of real art? Put your nominations in the comments...