Monthly Archives: November 2008

Project Euler

I’ve been doing a lot of Project Euler problems in my spare time, as a sort of relaxation and challenge at the same time. As of this evening, I’ve completed 25 of the 217 problems available so far. I’ve written a few hundred lines of code to solve the problems using Haskell, C, C++, Java, Javascript, and Python. Some of those are much more concise than others.

I highly recommend it to anyone who enjoys writing code and math (especially number theory). And I especially like my “language flavor of the moment” approach. It keeps you nimble. (Though, some problems just more difficult if your language of choice doesn’t have a built-in bignum implementation — fortunately Haskell and Python do.)

More Erlang, searching binaries

My enthusiasm is for erlang is somewhat dampened by the quality of its standard library, or at least it’s mismatch with some of the things I’ve come to expect through my use of Python, Perl, and other languages.

It is somewhat bizarre to me that there is not an obvious way to make a io_device() as returned by file:open/2 from a socket() returned by gen_tcp:accept/1. Why would this be useful? So as to make it easy to do things like io:get_line/2 on it for protocols that are line-based. Much like, in C, how you can fdopen on a socket to get a FILE * from any integer file descriptor including sockets.

I suppose the right thing to do is to create a server which wraps the socket() in a io_device() interface. I’m not sure how that was going to perform, and I was just going to try doing the simplest thing that would work — which is to write my own line-buffering.

Now, in Erlang, strings are (linked) lists of chars (which are actually integers), so you don’t really want to use them for high performance IO. You instead want to use “binaries” which are arrays of integers, essentially. The standard functions for manipulation of binaries are pretty few in number, as far as I can tell. You can efficiently split a binary with split_binary/2, if you know the position you want to split it on, but if I wanted to split on a newline character (or character sequence) I’d need to be able to find the offset of that first, and I don’t see a function for searching a binary.

The obvious thing is to use pattern matching. Unfortunately, pattern matching on binaries only allows variable length sections at the end. So, I cannot do it in a single pattern match stage. Instead, it seems that I need to write a function that recurses over the binary character-by-character:

[Until I figure out how to post code in wordpress, see http://paste.lisp.org/display/69677]

I’ve yet to benchmark this and see how slow it is. I suppose it might not be that bad if the erlang compiler and VM are good. May not be much better than turning it into a string (list) and using the functions for strings:

http://www.erlang.org/pipermail/erlang-questions/2004-December/013650.html
http://www.erlang.org/pipermail/erlang-questions/2004-December/013666.html

[Edit: indeed. I need to get to sleep, but a little benchmarking suggests this is vastly faster: http://paste.lisp.org/display/69680]

At least I’m not the only one thinking along these lines: EEP 9 – Library for working with binaries

As a return to the original reason I started down this line of thought, it appears that both mochiweb (used both internally by Mochi Media and by CouchDB) and this http server use an undocumented feature:

{packet, http} puts the socket into http mode. This makes the socket wait for a HTTP Request line, and if this is received to immediately switch to receiving HTTP header lines. The socket stays in header mode until the end of header marker is received (CR,NL,CR,NL), at which time it goes back to wait for a following HTTP Request line.

And, according to that page:

The undocumented features presented in this HOWTO are undocumented because they are not supported by Ericsson. On the other hand they are used in commercially shipping systems.

That, if true, rubs me the wrong way.

Haskell and Erlang, part 1

I’ve been rather quiet here lately. For the most part, I’ve been just too busy, and in what free time I’ve had, I’ve preferred to spend it learning programming languages and related stuff.

I spent a few days playing with Haskell. I should write some more about what I found interesting about a strongly-typed pure functional language, one where you have to essentially do side-effects (such as I/O) by passing the state of the world to the function and returning it from the function, albeit with some neat shortcuts (Monads, etc.) to make that less awkward than you’d think. It was possibly even more fascinating than when I first learned Standard ML back in college.

I found that it was really difficult to express non-trivial things in Haskell however. I think this may become easier with more exposure to it in the future, but for now I’m putting it on the back burner. I think I’m going to aim to make one day a month “Haskell day” wherein I advance my exposure to Haskell, because I find it very perspective-expanding.

Largely because this renewed my belief in the usefulness of functional programming languages, I then decided to take a closer look at Erlang than I have in the past, including ordering the Joe Armstrong book. At the moment, the toy project I’m working on is a twitter client that acts as a irc daemon, so I can just connect to this in my irc client (which easily handles multiple irc networks), and then I can read twitters and send twitters from within it, not wasting interface-space with another application, or time by going to the twitter homepage.

Erlang is rather ideal for applications that do a bunch of network stuff, it provides for the easy creation of lightweight processes and the easy sending of messages between them. Sending messages is non-blocking, and the system insures that the message will be in the receiving process’ mailbox, ready to be received with a pattern-matching construct. It does not provide object-oriented programming constructs in the usual way people think of OOP, however, it can be argued that passing messages to processes is directly analogous to invoking methods on objects. (See a good blog post by Ralph Johnson and a comment thread on LtU.)

So far, I feel like I understand 95% of the language, and about 50% of OTP (Open Telecom Platform, an included set of libraries and conventions for building applications.) It feels pretty good, and practically quite usable. It does feel a little less modern as functional programming languages go. It doesn’t have currying, and some of the syntax is a little less clean than you find in Haskell. But, on the whole, it does have a lot of the features I do want, including a decent pattern matching syntax.