Tuesday, August 17, 2004

Using CL-PPCRE for weblog search

A little while ago, I wrote a search facility for my weblog. It was for personal use and the code is fairly primitive, but here are the main functions I used.

;;; This program depends on CL-PPCRE, which must already be
;;; loaded into the Lisp image.

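(defun file-to-string (file)
  "Return the contents of FILE as one string, keeping line breaks."
  (with-open-file (in file :direction :input)
    ;; Writing every line to one string stream avoids APPLY's
    ;; CALL-ARGUMENTS-LIMIT on long files, and WRITE-LINE keeps the
    ;; newline so words on adjacent lines are not glued together.
    (with-output-to-string (out)
      (loop for line = (read-line in nil nil)
            while line
            do (write-line line out)))))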

(defun scan-in-file (srch file)
  "Return true if the regex SRCH matches anywhere in FILE."
  (cl-ppcre:scan srch (file-to-string file)))

;;; Takes a list of pathnames as DIR and returns the namestrings
;;; of the files whose contents match SRCH, e.g.
;;; (scan-in-directory "blah" (directory "*.txt"))
(defun scan-in-directory (srch dir)
  (let ((files (mapcar #'namestring dir)))
    (remove-if-not (lambda (file) (scan-in-file srch file))
                   files)))

;;; Searches the current directory and its immediate
;;; subdirectories (one level deep only), e.g.
;;; (search-in-tree "question" "*.txt")
(defun search-in-tree (srch wildcard)
  (let ((dir (append (directory wildcard)
                     (directory (concatenate 'string "*/" wildcard)))))
    (scan-in-directory srch dir)))
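SEARCH-IN-TREE only looks one level down, because of the "*/" prefix. If you want the whole tree, a minimal sketch is below, assuming your implementation supports :WILD-INFERIORS in DIRECTORY (SBCL, CCL and CLISP do). The names SEARCH-TREE-FOR-STRING and READ-FILE-CONTENTS are hypothetical, not part of my code, and to keep it free of libraries it swaps CL-PPCRE for CL's built-in SEARCH, so it matches plain substrings rather than regexes.

```lisp
;;; Hypothetical helpers, not from the original search code.
;;; Uses CL:SEARCH (literal substring match) instead of CL-PPCRE.

(defun read-file-contents (file)
  "Return the contents of FILE as one string, keeping line breaks."
  (with-open-file (in file :direction :input)
    (with-output-to-string (out)
      (loop for line = (read-line in nil nil)
            while line
            do (write-line line out)))))

(defun search-tree-for-string (needle wildcard)
  "Return namestrings of files matching WILDCARD (e.g. \"*.txt\") at any
depth below *DEFAULT-PATHNAME-DEFAULTS* whose contents contain NEEDLE."
  (let ((pattern (merge-pathnames
                  wildcard
                  ;; :WILD-INFERIORS matches zero or more subdirectories,
                  ;; so this covers the current directory and everything below.
                  (merge-pathnames
                   (make-pathname :directory '(:relative :wild-inferiors))
                   *default-pathname-defaults*))))
    (loop for path in (directory pattern)
          when (search needle (read-file-contents path))
            collect (namestring path))))
```

Usage mirrors the original: (search-tree-for-string "question" "*.txt").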

;;; Creates a URL that a browser can visit. BASE-DIR is the name of
;;; the directory that is served as /BASE-DIR/ on the web server; the
;;; host name comes from the SERVER_NAME CGI variable (EXT:GETENV is
;;; implementation-specific; CLISP provides it).
(defun create-url (file-name base-dir)
  (let* ((base-url (ext:getenv "SERVER_NAME"))
         (fpname (pathname file-name))
         (abs-path (pathname-directory fpname))
         (name (pathname-name fpname))
         (type (pathname-type fpname))
         (p (position base-dir abs-path :test #'equalp)))
    (unless p
      (error "~A is not below a directory named ~A" file-name base-dir))
    (let* ((rem-dirs (nthcdr (1+ p) abs-path))
           (rem-path (namestring
                      (make-pathname :directory (cons :absolute rem-dirs)))))
      (concatenate 'string "http://" base-url "/" base-dir rem-path name "." type))))
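Because CREATE-URL reads SERVER_NAME from the environment, it is awkward to try outside a CGI request. Here is a sketch of a variant, with the hypothetical name FILE-URL, that takes the host as an argument and builds the path with FORMAT instead of NAMESTRING. It assumes Unix-style paths and does no URL escaping of file names.

```lisp
;;; Hypothetical variant of CREATE-URL: HOST is passed in explicitly
;;; rather than read from the SERVER_NAME environment variable.
(defun file-url (file-name base-dir host)
  "Map FILE-NAME, which must lie below a directory named BASE-DIR,
to an http URL on HOST."
  (let* ((path (pathname file-name))
         (dirs (pathname-directory path))
         (p (position base-dir dirs :test #'equalp)))
    (unless p
      (error "~A is not below a directory named ~A" file-name base-dir))
    ;; ~{~A/~} emits each directory after BASE-DIR followed by a slash;
    ;; ~@[.~A~] adds ".type" only when the file actually has a type.
    (format nil "http://~A/~A/~{~A/~}~A~@[.~A~]"
            host base-dir (nthcdr (1+ p) dirs)
            (pathname-name path) (pathname-type path))))
```

For example, (file-url "/home/user/www/weblog/2004/08/search.txt" "weblog" "example.com") gives "http://example.com/weblog/2004/08/search.txt".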

If anyone is curious enough to try it, it's here:

Mini Weblog Search Feature

I've used my homegrown CGI library, although this server does have mod_lisp installed, which I would like to play with when I have time.


Blogger Xach said...

For what it's worth, I usually write file-to-string something like:

(defun file-to-string.xach (file)
  (with-open-file (stream file)
    (let* ((length (file-length stream))
           (contents (make-array length :element-type 'character)))
      (read-sequence contents stream)
      contents)))

[blogger has screwed the formatting up]

It's slightly different from your function, but much faster.

Also, it's a little uncommon to abbreviate stuff like "search" to "srch". Why not use the real word?

12:20 PM  
Blogger Sanjay Pande said...

Thanks for the code, Xach. You could probably use "pre" tags for formatting. As for abbreviating "search" to "srch", I have no idea why I did that. It's understandable enough not to be confused with anything else, I guess.

10:15 AM  

