| Email: | pjscott@iastate.edu |
| GitHub: | https://github.com/PeterScott |
| Address: | 1313 Frederiksen Court, Ames IA 50013 |
| Phone: | (515) 572-7714 |
When you get right down to it, most of my relevant skills involve moving data around and digging through it for interesting patterns. For example, how would you build a spell-checker? You might start with a dictionary and do approximate matching based on a weighted sum of Levenshtein distance and word frequency, looking for common words that are close to what the user typed. You could extend that: instead of using plain edit distance, assume that typos are more likely between keys that sit right next to each other, such as an S in place of a D. Or you could use more data and match against sequences of two or three words that occur frequently in a corpus of text, so that spelling suggestions take into account not only the word itself but also its context.
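To make the first two ideas concrete, here is a minimal sketch in Python of a keyboard-aware edit distance combined with word frequency. It is an illustration under stated assumptions, not code from any of my projects: the QWERTY adjacency table ignores the row stagger, and the names `substitution_cost`, `weighted_distance`, and `suggest`, the 0.5 adjacent-key cost, and the `alpha` weight on log-frequency are all hypothetical choices.

```python
import math

# Rough QWERTY layout (hypothetical simplification): row stagger is ignored,
# so "adjacent" means within one row and one column of each other.
QWERTY_ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
KEY_POS = {ch: (r, c) for r, row in enumerate(QWERTY_ROWS) for c, ch in enumerate(row)}


def substitution_cost(a: str, b: str, adjacent_cost: float = 0.5) -> float:
    """Cost of typing `a` where `b` was meant; neighboring keys are cheaper."""
    if a == b:
        return 0.0
    pa, pb = KEY_POS.get(a), KEY_POS.get(b)
    if pa and pb and abs(pa[0] - pb[0]) <= 1 and abs(pa[1] - pb[1]) <= 1:
        return adjacent_cost
    return 1.0


def weighted_distance(typed: str, word: str) -> float:
    """Levenshtein distance where substitutions use substitution_cost."""
    m, n = len(typed), len(word)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = float(i)  # delete every typed character
    for j in range(1, n + 1):
        d[0][j] = float(j)  # insert every dictionary character
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(
                d[i - 1][j] + 1.0,  # deletion
                d[i][j - 1] + 1.0,  # insertion
                d[i - 1][j - 1] + substitution_cost(typed[i - 1], word[j - 1]),
            )
    return d[m][n]


def suggest(typed: str, freqs: dict, alpha: float = 0.1,
            max_distance: float = 2.0, k: int = 3) -> list:
    """Rank nearby words by distance minus a bonus for common words (lower is better)."""
    typed = typed.lower()
    scored = []
    for word, count in freqs.items():
        dist = weighted_distance(typed, word)
        if dist <= max_distance:
            scored.append((dist - alpha * math.log(count), word))
    scored.sort()
    return [word for _, word in scored[:k]]
```

With a toy frequency table such as `{"the": 10000, "thaw": 20, "two": 500}`, `suggest("thw", ...)` ranks "the" first: W and E are neighboring keys, so that substitution costs only 0.5, and "the" is also the most frequent candidate. Using log-frequency keeps a very common word from swamping the distance term.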
I mention this not because of any particular obsession with spell-checking, but because it's how I tend to think when I've got a lot of data and want to make sense of it. What statistical patterns can we find? How can we break the data down and get something from it that actual people can use? Even if it's just writing a hairy regular expression to recognize most of the ways people write dates and times, there's usually some way to get useful information out of anything.
Obviously it's not all simple math: just storing that much data can be unwieldy, let alone querying it. I have the technical skills to handle that part as well, including scraping data from messy, unstructured sources, storing and indexing it, moving it around a network, and making it queryable. I've programmed in a lot of languages, but the ones I'm most comfortable with are Python, JavaScript, Lisp, C, and Haskell. The easiest way to get a picture of my skills is to look at some of my projects and open-source contributions.
Caprice is a multi-user text editor, similar to Etherpad or Google Wave. (I did a lot of real-time communication work in grad school.) The browser talks to the server over WebSockets or long polling, via socket.io and Node.js, and the database behind it is Redis. I have a demo running on Amazon AWS right now if you'd like to poke at it.
MiRNA-prediction-nodejs is a web app that predicts whether an RNA sequence is a precursor microRNA, combining a random forest and a support vector machine to do the classification. It's something of a hack (though, remarkably, less so than is typical in the surprisingly hairy world of bioinformatics), but it's very slick to use. Here's a very short screencast showing it in action.
pwstore is a secure password-storage library for Haskell that uses PBKDF1-SHA256 to hash salted passwords slowly enough to make brute-force cracking impractical. I noticed there wasn't anything like it available for Haskell, and the solutions people were writing tended to have serious security holes because doing the Right Thing was too much of a hassle, so I wrote a library that makes it trivial. There's more about this on my blog, if you're curious.
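The underlying technique (a random per-password salt plus a deliberately slow, iterated hash) fits in a few lines. Below is a minimal sketch in Python, not pwstore's Haskell API or its storage format. It follows the PBKDF1 construction (T1 = H(P || S), each later Ti = H(Ti-1)) with SHA-256 as the hash. The function names, the 16-byte salt, the default iteration count, and the `iterations$salt$hash` string format are all hypothetical choices made for illustration.

```python
import hashlib
import hmac
import os


def pbkdf1_sha256(password: bytes, salt: bytes, iterations: int) -> bytes:
    """PBKDF1 construction with SHA-256: T1 = H(P || S), Ti = H(T(i-1)); returns Tc."""
    if iterations < 1:
        raise ValueError("iterations must be >= 1")
    digest = hashlib.sha256(password + salt).digest()
    for _ in range(iterations - 1):
        digest = hashlib.sha256(digest).digest()
    return digest


def make_password(password: str, iterations: int = 100_000) -> str:
    """Hash with a fresh random salt and keep the parameters next to the hash."""
    salt = os.urandom(16)  # a unique salt per password defeats precomputed tables
    dk = pbkdf1_sha256(password.encode("utf-8"), salt, iterations)
    # Hypothetical "iterations$salt$hash" layout; pwstore uses its own format.
    return f"{iterations}${salt.hex()}${dk.hex()}"


def verify_password(password: str, stored: str) -> bool:
    """Recompute the hash with the stored salt and iteration count, then compare."""
    iters, salt_hex, hash_hex = stored.split("$")
    dk = pbkdf1_sha256(password.encode("utf-8"), bytes.fromhex(salt_hex), int(iters))
    # Constant-time comparison so timing doesn't reveal how many bytes matched.
    return hmac.compare_digest(dk, bytes.fromhex(hash_hex))
```

The iteration count is what makes each guess expensive for an attacker. For new Python code, note that the standard library already provides `hashlib.pbkdf2_hmac`, and RFC 8018 recommends PBKDF2 over PBKDF1 for new applications.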
hbeanstalk is a Haskell client library for the beanstalkd work queue server. I didn't write it, but I did rewrite its socket-handling and protocol code, making it faster and fixing some serious bugs. (On that note, I'm also a contributor to beanstalkc, the corresponding client library for Python.)
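Most of the subtlety in a beanstalkd client comes from the wire format. A job body is length-prefixed (for example, `RESERVED <id> <bytes>\r\n<data>\r\n`), the body itself may contain `\r\n`, and a socket read can return any fragment of a response. The sketch below shows, in Python, one way to frame requests and buffer replies under those constraints. It is an illustration of the protocol only, not code from hbeanstalk or beanstalkc. `encode_put`, `ResponseReader`, and the default priority, delay, and TTR values are hypothetical.

```python
def encode_put(body: bytes, pri: int = 1024, delay: int = 0, ttr: int = 60) -> bytes:
    """Frame a beanstalkd put command: header line with byte count, then the body."""
    return b"put %d %d %d %d\r\n" % (pri, delay, ttr, len(body)) + body + b"\r\n"


class ResponseReader:
    """Accumulates socket bytes and yields complete responses once they've arrived."""

    def __init__(self) -> None:
        self.buf = b""

    def feed(self, chunk: bytes) -> None:
        """Append whatever a single recv() call returned, however partial."""
        self.buf += chunk

    def next_response(self):
        """Return one parsed response, or None if more bytes are still needed."""
        end = self.buf.find(b"\r\n")
        if end < 0:
            return None  # header line not complete yet
        header = self.buf[:end].split()
        if header and header[0] == b"RESERVED":
            job_id, nbytes = int(header[1]), int(header[2])
            body_start = end + 2
            body_end = body_start + nbytes
            # Use the byte count, never a search for "\r\n": the body may contain it.
            if len(self.buf) < body_end + 2:
                return None  # body or trailing CRLF still in flight
            body = self.buf[body_start:body_end]
            self.buf = self.buf[body_end + 2:]
            return (b"RESERVED", job_id, body)
        # Bodiless responses such as INSERTED <id> are a single line.
        self.buf = self.buf[end + 2:]
        return tuple(header)
```

For example, after feeding `b"RESERVED 7 5\r\nhe"` the reader returns `None`. Once the rest of the five-byte body and its trailing CRLF arrive, it returns the complete job, including the embedded `\r\n`.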
I have a blog. I don't update it as often as I'd like to, but people seem to like some of the technical articles. If nothing else, it proves that I can write comprehensibly.
B.S. in Electrical Engineering (2009) and M.S. in Computer Engineering (expected May 2011), both from Iowa State University.