Thursday, July 11, 2013

Generate UUIDs in R

Here's a snippet of R to generate a Version 4 UUID. I don't know why there wouldn't be an official function for that in the standard libraries, but if there is, I couldn't find it.


## Version 4 UUIDs have the form:
##    xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx
##    where x is any hexadecimal digit and
##    y is one of 8, 9, A, or B
##    for example: f47ac10b-58cc-4372-a567-0e02b2c3d479
uuid <- function(uppercase=FALSE) {

  hex_digits <- c(as.character(0:9), letters[1:6])
  hex_digits <- if (uppercase) toupper(hex_digits) else hex_digits

  y_digits <- hex_digits[9:12]

  paste(
    paste0(
      sample(hex_digits, 8, replace=TRUE),
      collapse=''),
    paste0(
      sample(hex_digits, 4, replace=TRUE),
      collapse=''),
    paste0(
      '4',
      paste0(sample(hex_digits, 3, replace=TRUE),
             collapse=''),
      collapse=''),
    paste0(
      sample(y_digits,1),
      paste0(sample(hex_digits, 3, replace=TRUE),
             collapse=''),
      collapse=''),
    paste0(
      sample(hex_digits, 12, replace=TRUE),
      collapse=''),
    sep='-')
}

View as a gist: https://gist.github.com/cbare/5979354

Note: Thanks to Carl Witthoft for pointing out that my first version was totally broken. Turns out calling sample with replace=TRUE greatly expands the possible UUIDs you might generate!
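To see why replace=TRUE matters, here is a rough sketch in base R. It assumes, as the note suggests, that the broken version called sample without replacement; the variable names are mine and not from the gist. Without replacement, a group of 8 hex digits can never repeat a digit, so only a fraction of the possible groups can ever come up.

```r
hex_digits <- c(as.character(0:9), letters[1:6])

## Without replace=TRUE, sample() draws each digit at most once,
## so a group never contains a repeated digit.
set.seed(1)  # arbitrary seed, only to make the sketch repeatable
no_repl <- replicate(1000, any(duplicated(sample(hex_digits, 8))))
any(no_repl)     # FALSE: no group ever has a repeated digit

## With replace=TRUE, repeats are allowed (and common).
with_repl <- replicate(1000,
  any(duplicated(sample(hex_digits, 8, replace=TRUE))))
mean(with_repl)  # fraction of groups with at least one repeat

## Number of reachable 8-digit groups in each case
n_without <- prod(16:9)  # 16 * 15 * ... * 9 = 518918400
n_with    <- 16^8        # 4294967296
n_without / n_with       # roughly 0.12
```

So for the first group alone, sampling without replacement reaches only about 12% of the possible values.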

Carl also says, "In general, as I understand it, the value of UUID codes is directly dependent on the quality of the pseudo-random number generator behind them, so I’d recommend reading some R-related literature to make sure “sample” will be good enough for your purposes."

This sounds wise, but I'm not sure if I'm smart enough to follow up on it. It could be that the randomness of these UUIDs is less than ideal.
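One way to make Carl's point concrete is the sketch below. It uses a condensed, hypothetical re-statement of the function (uuid_sketch is my name, not part of the gist) so that it runs on its own. It shows that the seed of R's random number generator fully determines the result, and it checks the output against the Version 4 pattern.

```r
## Condensed, hypothetical re-statement of the uuid() idea above
uuid_sketch <- function() {
  hex <- c(as.character(0:9), letters[1:6])
  pick <- function(n, from = hex) {
    paste0(sample(from, n, replace=TRUE), collapse='')
  }
  paste(pick(8), pick(4), paste0('4', pick(3)),
        paste0(pick(1, hex[9:12]), pick(3)), pick(12), sep='-')
}

## sample() draws from R's global RNG, so the seed determines the UUID
set.seed(42); a <- uuid_sketch()
set.seed(42); b <- uuid_sketch()
identical(a, b)  # TRUE: same seed, same "unique" ID

## Format check against the Version 4 pattern
v4_pattern <- paste0("^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-",
                     "[89ab][0-9a-f]{3}-[0-9a-f]{12}$")
ids <- replicate(100, uuid_sketch())
all(grepl(v4_pattern, ids))  # TRUE
```

Any code that calls set.seed with a fixed value before generating IDs would reproduce the same sequence, which is one practical reason to be cautious about using these as globally unique identifiers.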

Saturday, July 06, 2013

Automate This!

The invention of the printing press by German blacksmith Johannes Gutenberg in 1439, the foundational event of the information age, is a common touchstone for technology stories, appearing in the opening chapter of both Nate Silver's The Signal and the Noise and Viktor Mayer-Schonberger and Kenneth Cukier's Big Data.

Automate This!, by Christopher Steiner, comes at current technology trends from a more mathy angle, tracing roots back to Leibniz and Gauss. Here, it's algorithms rather than data that take center stage. Data and algorithms are two sides of the same coin, really. But it's nice to see some of the heroes of CS nerds everywhere get their due: Al Khwarizmi, Fibonacci, Pascal, the Bernoullis, Euler, George Boole, Ada Lovelace and Claude Shannon.

Automate This! is more anecdotal than Big Data, avoiding sweeping conclusions except the one announced in bold letters as the title of the last chapter: "The future belongs to the algorithms and their creators." The stories, harvested from Steiner's years as a tech journalist at Forbes, cover finance and the start-up scene, but also medicine, music and analysis of personality.

Many of the same players from Nate Silver's book or from Big Data make an appearance here as well: Moneyball baseball manager Billy Beane, game-theorist and political scientist Bruce Bueno de Mesquita, and Facebook data scientist and Cloudera founder Jeff Hammerbacher.

Finance

In the chapter on algorithmic trading we meet Hungarian-born electronic-trading pioneer Thomas Peterffy, who was building financial models in software and hacking a NASDAQ terminal in the '80s, before it was cool.

In the same chapter, I gained new respect for financial commentator Jim Cramer. In contrast to his buffoonish on-screen persona, his real-time analysis of the May 2010 flash crash was both "uncharacteristically calm" and uncannily accurate. As blue-chip stocks like JNJ dived to near-zero, he made the savvy assessment, "That's not a real price. Just go and buy it!" and, as prices recovered only minutes later, "You'll never know what happened here." There's little doubt that algorithmic trading was the culprit, but the unanswered question is whether it was a bot run amok or an intentional strategy that worked a little too well. Too bad: if you did buy, they probably canceled the trade.

Music

Less scarily, algorithms can rate a pop song's chances of becoming a hit single. Serious composer and professor of music David Cope uses hand-coded (in Lisp) programs to compose music, pushing boundaries in automating the creative process.

Medicine

Having mastered Jeopardy, IBM's Watson is gearing up to take a crack at medical diagnostics, which is a field Hammerbacher thinks is ripe for hacking.

Psych and soc

Computers are beginning to understand people, which gives them a leg up on me, I'd have to say. Taibi Kahler developed a classification system for personality types based on patterns in language usage. Used by NASA psychiatrist Terry McGuire to compose well-balanced flight crews, the system divides personality into six bins: emotions-driven, thoughts-based, actions-driven, reflection-driven, opinions-driven, and reactions-based. If you know people more than superficially, they probably don't fit neatly into one of those categories, but some do (by which I mean they have me pretty well pegged).

At Cornell, Jon Kleinberg's research provides clues to the natural pecking order that emerges among working or social relationships - Malcolm Gladwell's influencers detected programmatically. One wonders: if corporate hierarchies were better aligned with such psychological factors, would the result be a harmonious workplace where everyone knows and occupies their right place? Or a brave new world of speciation into some technological caste system?

What next?

Perhaps surprisingly, Steiner cites the Kauffman Foundation's Financialization and Its Entrepreneurial Consequences on "the damage wrought by Wall Street" - the brain drain toward finance and away from actual productive activity. The book ends with the hopeful message that the decline of finance will set quantitative minds free to work on creative entrepreneurial projects. For the next generation, there's a plea for urgently needed improvements in quantitative education, especially at the high-school level.

Automate This! is a quick and fun read. Steiner's glasses are a bit rose-tinted at times and his book will make you feel like a chump if you haven't made a fortune algorithmically gaming the markets or disrupting some backwards corner of the economy. As my work-mates put it, we're living proof that tech skills are a necessary but not sufficient condition.

Links

Kahler's personality types

  • Emotions-driven: form relationships, get to know people, tense situations -> dramatic, overreactive
  • Thoughts-based: do away with pleasantries, straight to the facts. Rigid pragmatism, humorless, pedantic, controlling
  • Actions-driven: crave action, progress, always pushing, charming. Pressure -> impulsive, irrational, vengeful
  • Reflection-driven: calm and imaginative, think about what could be rather than work with what is, can dig into a new subject for hours, applying knowledge to the real world is a weakness
  • Opinions-driven: see one side, stick to their opinions in the face of proof. Persistent workers, but can be judgmental, suspicious and sensitive
  • Reactions-based: rebels, spontaneous, creative and playful. React strongly, either, "I love it!" or "That sucks!" Under pressure, can be stubborn, negative, and blameful

RESTful APIs

At Clojure-West, this past March, former ThoughtWorker Siva Jagadeesan spoke on how to build good web APIs using Resource Oriented Architecture (ROA) and Clojure.

The talk isn't specific to Clojure; it's more a primer on REST APIs.

Rants about true REST versus REST-ish, REST influenced, or partially RESTful are not especially interesting. That's not what this is. It's a nicely pragmatic guide to the architectural patterns and why you might use them.

  1. The swamp of POX (plain old XML, or more likely JSON, these days): At first glance, many, myself included, take REST to mean RPC over HTTP with XML/JSON.
  2. URI: Start thinking in terms of resources.
  3. HTTP: The HTTP verbs together with URIs define a uniform interface. CRUD operations are always handled the same way. So are error conditions.
  4. Hypermedia (HATEOAS): Possible state transitions are given in the message body, allowing the logic of the application to reside (mostly? completely?) on the server side. The key here is "discoverability of actions on a resource."
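
As a rough illustration of steps 2 through 4, here is a minimal sketch in base R. It is not taken from the talk: the "orders" resource, its URIs, its statuses and both function names are hypothetical, and a real service would sit behind an HTTP server rather than plain functions. The first function shows the uniform interface (verb plus URI decides the operation, and errors are handled the same way everywhere). The second shows a representation that carries its own links, so the client discovers the available actions instead of hard-coding them.

```r
## Uniform interface: HTTP verb + URI kind -> CRUD operation.
## Only /orders and /orders/<id> are modelled in this sketch.
crud_for <- function(verb, uri) {
  is_item <- grepl("^/orders/[0-9]+$", uri)
  is_collection <- identical(uri, "/orders")
  if (!is_item && !is_collection) return("404 Not Found")
  key <- paste(toupper(verb), if (is_item) "item" else "collection")
  switch(key,
    "GET collection"  = "list orders",
    "POST collection" = "create order",
    "GET item"        = "read order",
    "PUT item"        = "update order",
    "DELETE item"     = "delete order",
    "405 Method Not Allowed")  # default: verb not supported here
}

## Hypermedia: the server decides which state transitions are
## possible and advertises them as links in the message body.
order_representation <- function(id, status) {
  links <- list(self = paste0("/orders/", id))
  if (status == "pending") {
    links$payment <- paste0("/orders/", id, "/payment")
    links$cancel  <- paste0("/orders/", id, "/cancel")
  }
  list(id = id, status = status, links = links)
}

crud_for("GET", "/orders")     # "list orders"
crud_for("PATCH", "/orders")   # "405 Method Not Allowed"
names(order_representation(7, "pending")$links)  # "self" "payment" "cancel"
names(order_representation(7, "shipped")$links)  # "self"
```

In this sketch a shipped order simply stops advertising the payment and cancel links, which is the "discoverability of actions on a resource" idea in miniature.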
