Digithead's Lab Notebook: 11/01/2011

Wednesday, November 30, 2011

Support Vector Machines

Week 7 of Andrew Ng's Machine Learning class covers support vector machines, pragmatically from the perspective of calling a library rather than implementation. I've been wanting to learn more about SVMs for quite a while, so I was excited for this one.

A support vector machine is a supervised classification algorithm. Given labeled training data, typically high-dimensional vectors, SVM finds the maximum-margin hyperplane separating the positive and negative examples. The algorithm selects a decision boundary that does the best job of separating the classes with the extra stipulation that the boundary be as far as possible from the nearest samples on either side. This is where the large margin part comes from.

(from Andrew Ng's ml-class.)
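To make the "as far as possible from the nearest samples" idea concrete, here is a minimal sketch that measures the margin of a candidate boundary. The toy points, the function name `geometric_margin`, and the two candidate lines are made up for illustration; a real SVM library searches for the best boundary rather than comparing hand-picked ones.

```python
import math

def geometric_margin(w, b, points, labels):
    """Smallest signed distance from any labeled point to the hyperplane w.x + b = 0.

    A negative result means at least one point is on the wrong side.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    distances = [
        y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        for x, y in zip(points, labels)
    ]
    return min(distances)

# Toy 2-D data (hypothetical): labels are +1 / -1.
points = [(2.0, 2.0), (3.0, 3.0), (0.0, 0.0), (-1.0, 0.0)]
labels = [1, 1, -1, -1]

# Two candidate boundaries that both separate the classes.
margin_a = geometric_margin((1.0, 1.0), -2.0, points, labels)  # line x1 + x2 = 2
margin_b = geometric_margin((1.0, 1.0), -3.0, points, labels)  # line x1 + x2 = 3
```

Both lines classify every point correctly, but the first one sits midway between the closest positive and negative examples, so its margin is larger and it is the one a large-margin classifier would prefer.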

The cost function used with SVMs is a slightly modified version of that used with logistic regression:

With SVMs, we replace the two logarithmic terms of the logistic regression cost with piecewise-linear approximations called cost₁ and cost₀.
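For reference, here is how the two objectives are usually written in the notation of the class, with the piecewise-linear pieces named cost₁ (used when y = 1) and cost₀ (used when y = 0). This is a reconstruction from the standard formulation, so treat the exact layout as an assumption.

```latex
% Regularized logistic regression
\min_\theta \; \frac{1}{m} \sum_{i=1}^{m} \Big[ y^{(i)} \big( -\log h_\theta(x^{(i)}) \big)
  + (1 - y^{(i)}) \big( -\log (1 - h_\theta(x^{(i)})) \big) \Big]
  + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2

% Support vector machine
\min_\theta \; C \sum_{i=1}^{m} \Big[ y^{(i)} \, \mathrm{cost}_1(\theta^T x^{(i)})
  + (1 - y^{(i)}) \, \mathrm{cost}_0(\theta^T x^{(i)}) \Big]
  + \frac{1}{2} \sum_{j=1}^{n} \theta_j^2
```

The structure is the same in both: a data-fit term plus a penalty on the size of θ. The SVM version drops the 1/m factor and moves the weighting from the penalty (λ) to the data-fit term (C).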

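Some error is accepted, allowing for misclassification of some training examples in the interest of getting the majority correct. The parameter C acts as a form of regularization, playing roughly the role of 1/λ: a large C tolerates little training error, while a small C accepts more misclassified examples in exchange for a wider margin.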
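Here is a minimal sketch of that trade-off, assuming the piecewise-linear costs take the standard hinge form (zero beyond ±1, slope 1 otherwise). The function names and the toy data are illustrative and are not taken from the course code.

```python
def cost1(z):
    """Cost for a positive example (y = 1): zero once z >= 1."""
    return max(0.0, 1.0 - z)

def cost0(z):
    """Cost for a negative example (y = 0): zero once z <= -1."""
    return max(0.0, 1.0 + z)

def svm_objective(theta, X, y, C):
    """C * (sum of per-example costs) + 0.5 * sum(theta_j^2), skipping the bias theta[0]."""
    data_term = 0.0
    for x_i, y_i in zip(X, y):
        z = sum(t * x for t, x in zip(theta, x_i))
        data_term += y_i * cost1(z) + (1 - y_i) * cost0(z)
    penalty = 0.5 * sum(t * t for t in theta[1:])
    return C * data_term + penalty

# Toy data (hypothetical); each row starts with a constant 1 for the bias term.
X = [(1.0, 2.0), (1.0, -2.0), (1.0, 0.5)]
y = [1, 0, 0]  # the last example lies on the wrong side of the boundary
theta = (0.0, 1.0)

loose = svm_objective(theta, X, y, C=0.1)     # one mistake costs little
strict = svm_objective(theta, X, y, C=100.0)  # the same mistake dominates
```

With a small C the single misclassified example barely moves the objective, so the optimizer is free to keep θ small. With a large C that one example dominates, and the optimizer would bend the boundary to fix it.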

But, what if the boundary between classes is non-linear, like the one shown here?

(from Andrew Ng's ml-class.)

The SVM algorithm generalizes to non-linear cases with the aid of kernel functions. A straight-line boundary, which is a hyperplane in n dimensions, corresponds to the linear kernel. The other widely used kernel function is the Gaussian kernel. It's my understanding that the kernel function maps a non-linear boundary in the problem space to a linear boundary in a higher dimensional space.

(from the wikipedia entry for support vector machine.)
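As a small illustration, the Gaussian kernel can be read as a similarity score between two points that decays with their squared distance. This sketch assumes the form exp(-||x - l||² / (2σ²)); the names `gaussian_kernel`, `landmark` and `sigma` are chosen here for readability.

```python
import math

def gaussian_kernel(x, landmark, sigma):
    """Similarity in (0, 1]: 1 when x equals the landmark, near 0 when far away."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, landmark))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

same = gaussian_kernel((0.0, 0.0), (0.0, 0.0), sigma=1.0)  # identical points
far = gaussian_kernel((0.0, 0.0), (3.0, 4.0), sigma=1.0)   # distance 5 apart
wide = gaussian_kernel((0.0, 0.0), (3.0, 4.0), sigma=5.0)  # same points, wider kernel
```

A larger sigma makes the similarity fall off more slowly, which gives a smoother decision boundary; a smaller sigma makes each training point's influence more local.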

The SVM algorithm is sped up by a performance hack called the kernel trick, which I understand just in general outline: The kernel trick is a way of mapping observations into a higher dimensional space V, without ever having to compute the mapping explicitly. The trick is to use learning algorithms that only require dot products between the vectors in V, and choose the mapping such that these high-dimensional dot products can be computed within the original space, by means of a kernel function.
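A tiny worked example may help here. For 2-D inputs, the quadratic kernel k(x, y) = (x·y)² gives the same number as a dot product in a 3-D feature space, without ever building the 3-D vectors. The feature map `phi` below is the textbook one for this kernel; it is used here only as an illustration and is not the Gaussian kernel discussed above.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def phi(x):
    """Explicit map from 2-D input to the 3-D space implied by the quadratic kernel."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)

def quadratic_kernel(x, y):
    """Same value as dot(phi(x), phi(y)), computed entirely in the original 2-D space."""
    return dot(x, y) ** 2

x = (1.0, 2.0)
y = (3.0, -1.0)

explicit = dot(phi(x), phi(y))     # map first, then take the dot product
implicit = quadratic_kernel(x, y)  # never leaves the original space
```

The two values agree up to floating-point rounding. For the Gaussian kernel the implied space is infinite-dimensional, so the explicit route is not even available, which is why the trick matters.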

There is some equivalence between SVMs and neural networks that I don't quite grasp. The process of computing the kernel function on the input vectors is something like the hidden layer of the neural network, which transforms and weighs the input features. I'm not sure if the analogy between SVMs and ANNs goes deeper. Also by virtue of the kernels, SVMs are a member of a more general class of statistical algorithms called kernel methods.

The exercise was to build a spam filter based on a small subset of the SpamAssassin public corpus of 6047 messages, of which roughly a third are spam. I trained an SVM and tried it on email from my spam-magnet Yahoo email address, and it worked!
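Before an SVM can see an email, the text has to become a numeric vector. A minimal sketch of one common approach, a binary bag-of-words over a fixed vocabulary, is below. The five-word vocabulary and the crude tokenizer are assumptions made for illustration; the actual exercise uses a much larger word list and more careful preprocessing.

```python
import re

def email_to_features(text, vocabulary):
    """Return a 0/1 vector: 1 if the vocabulary word occurs in the email, else 0."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return [1 if word in words else 0 for word in vocabulary]

# Hypothetical vocabulary, far smaller than a real one.
vocabulary = ["click", "free", "meeting", "money", "report"]

features = email_to_features("Click here for FREE money!!!", vocabulary)
```

Each email becomes one row of the training matrix, and the spam/not-spam labels become the y vector handed to the SVM.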

So, I guess the upshot is that I'm still a little hazy on SVMs, if a bit less so than before. If I really want to know more, there's the source code that came with the homework. Or, I could read "A training algorithm for optimal margin classifiers".

2012 conference dates

Here are a few conferences for 2012 in computing or bioinformatics:

Strata
February 28-March 1, 2012
Santa Clara, CA
Visualizing Biological Data (VIZBI 2012)
6-8 March 2012
Heidelberg, Germany
ISB Symposium
15-16 April 2012
Seattle, WA
Allen Brain Atlas Hackathon
18-22 June 2012
Seattle, WA
Google I/O
June 27-29, 2012
Moscone Center West, San Francisco
ISMB Intelligent Systems for Molecular Biology
13-17 July 2012
Long Beach, CA
OSCON 2012
July 16-20, 2012
Portland, Oregon
2012 Galaxy Community Conference (GCC2012)
July 25-27, 2012
Chicago, Illinois
International Conference on Systems Biology (ICSB-2012)
19-23 August 2012
Toronto, Canada
European Conference on Computational Biology (ECCB 2012)
9-12 September 2012
Basel, Switzerland
Strange Loop
September 23-25, 2012
St. Louis, MO
ACM International Conference on Bioinformatics, Computational Biology and Biomedicine
8-10 October 2012
Orlando, FL
VisWeek
October 14-19, 2012
Seattle, WA
ACM SPLASH formerly known as OOPSLA
October 19-26, 2012
Tucson, AZ

Also, here's the full list of O'Reilly conferences. Last year's Strata on big data/data science was really fun.

Biovis, not to be confused with VIZBI (see above), is part of VisWeek and will be in Seattle next October.

We'll soon be adding our own Systems Bioinformatics Workshop to the schedule, probably sometime in the summer. Hope to see you there.

Thursday, November 10, 2011

Matrix arithmetic

Here are a couple of bits of basic linear algebra that'll come in handy in the Machine Learning class.

How to multiply matrices
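A minimal sketch of the rule in plain Python: entry (i, j) of the product is the dot product of row i of A with column j of B, which only works when A has as many columns as B has rows. The function name `matmul` and the sample matrices are chosen here for illustration.

```python
def matmul(A, B):
    """Multiply an (m x n) matrix by an (n x p) matrix, both given as lists of rows."""
    n = len(B)
    if any(len(row) != n for row in A):
        raise ValueError("columns of A must match rows of B")
    p = len(B[0])
    return [
        [sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
        for i in range(len(A))
    ]

A = [[1, 2, 3],
     [4, 5, 6]]        # 2 x 3
B = [[7, 8],
     [9, 10],
     [11, 12]]         # 3 x 2

product = matmul(A, B)  # 2 x 2
```

Swapping the operands gives a 3 x 3 result instead, a quick reminder that matrix multiplication is not commutative.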

Matrix Identities

If X and Y are two vectors of length m:

If X is an (m x n) matrix and Y is an (n x 1) vector:
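Assuming the identities meant for these two cases are the usual inner-product ones, they can be written as follows. Lowercase x_i and y_i stand for the entries of the vectors, and X_{ij} for the entries of the matrix; that notation is an assumption here.

```latex
% X and Y vectors of length m: the inner product is a sum of elementwise products
X^T Y = \sum_{i=1}^{m} x_i \, y_i = Y^T X

% X an (m x n) matrix, Y an (n x 1) vector: the product is an (m x 1) vector
(XY)_i = \sum_{j=1}^{n} X_{ij} \, Y_j , \qquad i = 1, \dots, m
```

These are the forms that let a sum over features or training examples be replaced with a single vectorized expression.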

Paul's Online Math Notes on Linear Algebra
