More Erlang, searching binaries

My enthusiasm for Erlang is somewhat dampened by the quality of its standard library, or at least by its mismatch with some of the things I've come to expect from Python, Perl, and other languages.

It is somewhat bizarre to me that there is no obvious way to make an io_device(), as returned by file:open/2, from a socket() returned by gen_tcp:accept/1. Why would this be useful? It would make it easy to call things like io:get_line/2 on the socket for line-based protocols, much as in C you can call fdopen() on any integer file descriptor, including a socket, to get a FILE *.

I suppose the right thing to do is to create a server that wraps the socket() in an io_device() interface. I wasn't sure how that would perform, so I decided to try the simplest thing that could work: writing my own line buffering.
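A minimal sketch of that kind of line buffering, assuming a socket opened with [binary, {active, false}]. The module and function names (line_reader, recv_line/2, take_line/1) are hypothetical, and this is not the code from the paste linked below. Unconsumed bytes are kept in a buffer, and gen_tcp:recv/2 is only called when no complete line is buffered yet:

```erlang
-module(line_reader).
-export([recv_line/2, take_line/1]).

%% recv_line(Socket, Buffer) -> {ok, Line, NewBuffer} | {error, Reason}
%% Assumes Socket is a gen_tcp socket opened with [binary, {active, false}].
%% Buffer holds bytes already received but not yet returned as a line.
recv_line(Socket, Buffer) ->
    case take_line(Buffer) of
        {Line, Rest} ->
            {ok, Line, Rest};
        more ->
            %% No complete line buffered: read whatever is available
            %% (length 0 means "any amount") and try again.
            case gen_tcp:recv(Socket, 0) of
                {ok, Data} ->
                    recv_line(Socket, <<Buffer/binary, Data/binary>>);
                {error, _Reason} = Error ->
                    Error
            end
    end.

%% take_line(Buffer) -> {Line, Rest} | more
%% Line excludes the terminating $\n; Rest is everything after it.
take_line(Buffer) ->
    take_line(Buffer, 0).

take_line(Buffer, Pos) when Pos < byte_size(Buffer) ->
    case Buffer of
        <<Line:Pos/binary, $\n, Rest/binary>> ->
            {Line, Rest};
        _ ->
            take_line(Buffer, Pos + 1)
    end;
take_line(_Buffer, _Pos) ->
    more.
```

The caller threads the leftover buffer through successive calls, starting with <<>>. Two simplifications to keep in mind: the scan restarts from offset 0 every time more data arrives, and for CRLF protocols the returned line still ends in \r.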

Now, in Erlang, strings are (linked) lists of characters (which are actually integers), so you don't really want to use them for high-performance IO. Instead you want "binaries", which are essentially arrays of bytes. As far as I can tell, the standard functions for manipulating binaries are few. You can efficiently split a binary with split_binary/2 if you know the position you want to split at, but to split on a newline character (or character sequence) I'd first need to find its offset, and I don't see a function for searching a binary.
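To make the distinction concrete, a small illustrative module (the name split_demo is made up) showing that a string literal is just a list of integer character codes, and that split_binary/2 is easy to use once the offset is already known:

```erlang
-module(split_demo).
-export([demo/0]).

demo() ->
    %% A string literal is just a list of integer character codes.
    Str = "GET",
    %% A binary literal is a compact sequence of bytes.
    Bin = <<"GET / HTTP/1.0\r\n">>,
    %% split_binary/2 is cheap, but the caller must already know the offset.
    {Method, Rest} = split_binary(Bin, 3),
    {Str, length(Str), byte_size(Bin), Method, Rest}.
```

Calling split_demo:demo() returns {"GET", 3, 16, <<"GET">>, <<" / HTTP/1.0\r\n">>}; the shell prints "GET" as a string, but it is the list [71,69,84].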

The obvious thing is to use pattern matching. Unfortunately, a binary pattern can only have a variable-length (unsized) segment at the end, so I cannot find the delimiter in a single match. Instead, it seems I need to write a function that recurses over the binary character by character:

[Until I figure out how to post code in WordPress, see http://paste.lisp.org/display/69677]
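Since the code lives only behind that link, here is a hedged sketch of the technique rather than the original code: a recursive search that advances an offset one byte at a time and uses a sized binary pattern to test for the delimiter at that offset. The module name bin_search and its functions are hypothetical. Because the prefix segment _:Pos/binary has an explicit size, the only unsized segment is still the last one, which keeps the pattern legal:

```erlang
-module(bin_search).
-export([find/2, split_at/2]).

%% find(Bin, Delim) -> Offset | nomatch
%% Returns the offset of the first occurrence of the non-empty binary Delim.
find(Bin, Delim) when is_binary(Delim), byte_size(Delim) > 0 ->
    find(Bin, Delim, byte_size(Delim), 0).

%% Try to match Delim at offset Pos; on failure, recurse with Pos + 1.
find(Bin, Delim, DLen, Pos) when Pos + DLen =< byte_size(Bin) ->
    case Bin of
        <<_:Pos/binary, Candidate:DLen/binary, _/binary>> when Candidate =:= Delim ->
            Pos;
        _ ->
            find(Bin, Delim, DLen, Pos + 1)
    end;
find(_Bin, _Delim, _DLen, _Pos) ->
    nomatch.

%% split_at(Bin, Delim) -> {Before, After} | nomatch
%% Uses the offset from find/2 with split_binary/2; the delimiter is dropped.
split_at(Bin, Delim) ->
    case find(Bin, Delim) of
        nomatch ->
            nomatch;
        Pos ->
            {Before, Rest} = split_binary(Bin, Pos),
            {_Delim, After} = split_binary(Rest, byte_size(Delim)),
            {Before, After}
    end.
```

For example, bin_search:split_at(<<"ab\r\ncd">>, <<"\r\n">>) returns {<<"ab">>, <<"cd">>}. Matching at an offset should produce sub-binaries that refer to the original data rather than copies, so each step is cheap; whether the loop as a whole is fast enough is a benchmarking question.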

I've yet to benchmark this to see how slow it is. It might not be that bad if the Erlang compiler and VM are good, but it may not be much better than turning the binary into a string (list) and using the string functions:

http://www.erlang.org/pipermail/erlang-questions/2004-December/013650.html
http://www.erlang.org/pipermail/erlang-questions/2004-December/013666.html
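For comparison, a minimal sketch of the list-based alternative (not necessarily what the linked threads suggest; the module name list_split is hypothetical). It converts the whole binary to a list, splits at the first $\n with lists:splitwith/2, and converts both halves back:

```erlang
-module(list_split).
-export([split_line/1]).

%% split_line(Bin) -> {Line, Rest} | more
%% Converts the binary to a list (a "string") and splits on the first $\n.
split_line(Bin) when is_binary(Bin) ->
    case lists:splitwith(fun(C) -> C =/= $\n end, binary_to_list(Bin)) of
        {_Line, []} ->
            %% No newline found anywhere in the input.
            more;
        {Line, [$\n | Rest]} ->
            {list_to_binary(Line), list_to_binary(Rest)}
    end.
```

Every byte costs a list cell (two machine words) on the way in and a copy on the way out, which is the overhead the binary-based search is trying to avoid.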

[Edit: indeed. I need to get to sleep, but a little benchmarking suggests this is vastly faster: http://paste.lisp.org/display/69680]

At least I’m not the only one thinking along these lines: EEP 9 – Library for working with binaries
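For readers on a newer OTP release: EEP 9 eventually led to the binary module (available from R14A onward), which provides the missing search. A minimal sketch assuming such a release; only the module name eep9_demo and next_line/1 are made up here:

```erlang
-module(eep9_demo).
-export([next_line/1]).

%% next_line(Buffer) -> {Line, Rest} | more
%% Requires the binary module (OTP R14A or later). Without the global
%% option, binary:split/2 splits only at the first occurrence of the pattern.
next_line(Buffer) ->
    case binary:split(Buffer, <<"\r\n">>) of
        [Line, Rest] -> {Line, Rest};
        [_Incomplete] -> more
    end.
```

If the offset itself is needed, binary:match/2 returns {Offset, Length} or nomatch; for example, binary:match(<<"ab\r\ncd">>, <<"\r\n">>) gives {2, 2}.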

Returning to the original reason I started down this line of thought, it appears that both mochiweb (used by Mochi Media internally and by CouchDB) and this HTTP server use an undocumented feature:

{packet, http} puts the socket into http mode. This makes the socket wait for a HTTP Request line, and if this is received to immediately switch to receiving HTTP header lines. The socket stays in header mode until the end of header marker is received (CR,NL,CR,NL), at which time it goes back to wait for a following HTTP Request line.
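A sketch of how a server might consume a socket in this mode, assuming the listen socket was opened with [binary, {active, false}, {packet, http}]; the module name http_head and its functions are hypothetical. Each gen_tcp:recv/2 call then returns one parsed element: an http_request tuple, then http_header tuples, then http_eoh. In current OTP releases the same parser is exposed as erlang:decode_packet/3, which lets the logic be exercised on a plain binary without a socket:

```erlang
-module(http_head).
-export([read_head/1, parse_head/1]).

%% read_head(Socket) -> {ok, Request, Headers} | {error, Reason}
%% Assumes Socket was accepted from a listen socket opened with
%% [binary, {active, false}, {packet, http}].
read_head(Socket) ->
    case gen_tcp:recv(Socket, 0) of
        {ok, {http_request, _Method, _Uri, _Version} = Request} ->
            %% The socket has now switched itself into header mode.
            read_headers(Socket, Request, []);
        {ok, Other} -> {error, {unexpected, Other}};
        {error, _} = Error -> Error
    end.

read_headers(Socket, Request, Acc) ->
    case gen_tcp:recv(Socket, 0) of
        {ok, {http_header, _, Field, _, Value}} ->
            read_headers(Socket, Request, [{Field, Value} | Acc]);
        {ok, http_eoh} ->
            {ok, Request, lists:reverse(Acc)};
        {ok, Other} -> {error, {unexpected, Other}};
        {error, _} = Error -> Error
    end.

%% parse_head(Bin) -> {ok, Request, Headers, Body} | more | {error, Reason}
%% Same steps on an in-memory binary, using the http/httph packet types.
parse_head(Bin) ->
    case erlang:decode_packet(http, Bin, []) of
        {ok, {http_request, _, _, _} = Request, Rest} ->
            parse_headers(Rest, Request, []);
        {ok, Other, _} -> {error, {unexpected, Other}};
        {more, _} -> more;
        {error, _} = Error -> Error
    end.

parse_headers(Bin, Request, Acc) ->
    case erlang:decode_packet(httph, Bin, []) of
        {ok, {http_header, _, Field, _, Value}, Rest} ->
            parse_headers(Rest, Request, [{Field, Value} | Acc]);
        {ok, http_eoh, Body} ->
            {ok, Request, lists:reverse(Acc), Body};
        {ok, Other, _} -> {error, {unexpected, Other}};
        {more, _} -> more;
        {error, _} = Error -> Error
    end.
```

Before reading a request body, the socket would need to be switched with inet:setopts(Socket, [{packet, raw}]); otherwise, as the quoted description says, it goes back to waiting for the next request line.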

And, according to that page:

The undocumented features presented in this HOWTO are undocumented because they are not supported by Ericsson. On the other hand they are used in commercially shipping systems.

That, if true, rubs me the wrong way.
