Sunday, June 20, 2004

Regular Expressions with CL

Regular expression (regex) search has become quite common. Regexes are a major factor in the popularity of Perl on *nix systems. Perl has a highly optimized regex engine, written in C, which usually outperforms most others. Some CL implementations also ship their own regex engines, such as Clisp with its "REGEXP" package. However, code that relies on one of those is not portable across implementations.

So there is CL-PPCRE, or Portable Perl-Compatible Regular Expressions for CL. For me the real test of any piece of open source software is how easy it is to get, install and use. CL-PPCRE was a breeze to install. The easiest way is to untar the files to a location and then run

(load "load.lisp")

which takes care of all compilation and loading. I tested it with Clisp, OpenMCL and SBCL on Mac OS X and it worked fine. Since it is written in ANSI CL, it should work with any conforming CL implementation, which makes it extremely portable. It also comes with MK:DEFSYSTEM and ASDF system definitions if you prefer to install it that way.
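For the ASDF route, a minimal sketch might look like the following. It assumes ASDF is available in your Lisp (not every build bundles it), and the directory name is a hypothetical placeholder for wherever you unpacked the sources.

```lisp
;; Sketch only: load CL-PPCRE through ASDF instead of load.lisp.
;; Assumption: ASDF can be pulled in with REQUIRE on this implementation.
(require :asdf)

;; Hypothetical path; point it at the directory containing cl-ppcre.asd.
;; The trailing slash matters, since ASDF expects a directory pathname.
(push #p"/path/to/cl-ppcre/" asdf:*central-registry*)

;; Compile (if needed) and load the system.
;; On very old ASDF versions the equivalent call is
;; (asdf:operate 'asdf:load-op :cl-ppcre).
(asdf:load-system :cl-ppcre)
```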

Since it is so useful, I saved a new image with Clisp and copied it over my current Lisp image:

(ext:saveinitmem)

which saves the image to lispinit.mem. I replaced my existing lispinit.mem startup file, which resides in /sw/lib/clisp/full/, with this new one, so my default image is now my original Lisp image plus cl-ppcre. You can also start clisp by pointing it at an image file explicitly.

# clisp -M lispinit.mem
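If you would rather not overwrite the default image, a variation is to save under an explicit name and pass that name to -M. This is a sketch for Clisp only; the file name cl-ppcre.mem is a made-up example.

```lisp
;; Sketch only (Clisp-specific): run this after CL-PPCRE has been loaded.
;; EXT:SAVEINITMEM takes an optional file name; "cl-ppcre.mem" is a
;; hypothetical name chosen for this example.
(ext:saveinitmem "cl-ppcre.mem")

;; Afterwards, start Clisp with that image from the shell:
;;   clisp -M cl-ppcre.mem
```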

Most Common Lisp implementations have an image-save function, so this is easy to replicate elsewhere. CL-PPCRE comes with two packages, cl-ppcre and cl-ppcre-test. cl-ppcre has all the common regex functions, such as scan, regex-replace and split.
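To give a feel for the package, here is a short sketch of a few of its exported functions. It assumes CL-PPCRE is already loaded into the running image; the sample strings are invented for illustration.

```lisp
;; Sketch only: assumes CL-PPCRE is already loaded.

;; SCAN returns four values: match start, match end, and two vectors
;; with the start and end positions of each register group.
(cl-ppcre:scan "(\\d+)-(\\d+)" "call 555-1234 now")
;; => 5, 13, #(5 9), #(8 13)

;; SCAN-TO-STRINGS returns the matched text and the register groups.
(cl-ppcre:scan-to-strings "(\\d+)-(\\d+)" "call 555-1234 now")
;; => "555-1234", #("555" "1234")

;; REGEX-REPLACE-ALL replaces every match; REGEX-REPLACE only the first.
(cl-ppcre:regex-replace-all "\\s+" "too   many    spaces" " ")
;; => "too many spaces"

;; SPLIT breaks a string at each match of the regex.
(cl-ppcre:split ",\\s*" "scan, replace,split")
;; => ("scan" "replace" "split")
```

Note the doubled backslashes: the Lisp reader consumes one level of escaping before the regex engine sees the string.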

When compiled with CMUCL, it outperforms Perl in most cases. This is one great package.

1 Comment:

Anonymous said...

I'm not sure that Perl's regexp engine is known as much for being fast as for being featureful. Traditionally, it's pushed the boundaries with embedded code, interpolated subpatterns, various assertions, etc. The subpatterns actually make it (modulo bugs) a full context-free parser, and back-references probably make it context-sensitive. CL-PPCRE is indeed sweet, and Perl's no dog, but it's not the fastest engine out there for everyday matches.

7:20 PM  
