Clojure’s new regex syntax

Posted in Uncategorized on November 19th, 2008

Last week, Rich Hickey announced a few notable changes to Clojure, including ahead-of-time compilation and a cleaner syntax for regular expressions. Both are improvements, but the syntax is especially interesting for a reason unrelated to its function. First, a quick overview.

1. What has changed

In a sentence, fewer backslashes. The notation is now more in line with that of scripting languages, where regular expressions are first-class literals, than that of general-purpose languages like C++ or Java, where regexes are just specialized strings.

Say we are given a stream including this text:

...
<img  src="images/11/apple1.gif"/>
<img   src="images/2/bulb2.jpeg"/>
<img src="images/354/citrus32_a.png"/>
...

We want to select IMG tags and capture the basename (without extension) of each source file. This can be done in many ways; here’s a blueprint which is just barely good enough:

<img [whitespace]+
     src=" [word-char]+ / [digit]+ / ([word-char]+) ...

Converting this to Clojure’s old syntax gives us a somewhat unwieldy #"<img\\s+src=\"\\w+/\\d+/(\\w+)". A quick test:

(let [lines "...
             <img  src=\"images/11/apple1.gif\"/>
             <img   src=\"images/2/bulb2.jpeg\"/>
             <img src=\"images/354/citrus32_a.png\"/>
             ..."]
  ;; Return only the captures, not the full matches.
  (map second
       (re-seq #"<img\\s+src=\"\\w+/\\d+/(\\w+)" lines)))
 
=> ("apple1" "bulb2" "citrus32_a")

The update to the reader lets us drop the double escaping of regex special characters inside the literal:

(map second
     (re-seq #"<img\s+src=\"\w+/\d+/(\w+)" lines))
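The doubled backslashes in the old syntax come from two layers of escaping: a string reader first, then the regex engine. The minimal Java sketch below (Java because Clojure runs its regexes on java.util.regex) shows that the double-escaped string from the old syntax holds only one backslash per regex escape once it has been read. The class name EscapeLayers is made up for illustration.

```java
import java.util.regex.Pattern;

public class EscapeLayers {
    // The blueprint regex written as a Java string literal, i.e. in the
    // same double-escaped form as Clojure's old #"..." syntax.
    static final String OLD_STYLE = "<img\\s+src=\"\\w+/\\d+/(\\w+)";

    public static void main(String[] args) {
        // Layer 1: the string reader collapses each \\ to a single backslash
        // and \" to a bare quote, so the regex text has one backslash per escape.
        System.out.println(OLD_STYLE);  // <img\s+src="\w+/\d+/(\w+)

        // Layer 2: the regex engine interprets \s, \w and \d as character classes.
        Pattern p = Pattern.compile(OLD_STYLE);
        System.out.println(p.matcher("<img  src=\"images/11/apple1").find()); // true

        // Writing "\d" with a single backslash would not even compile in Java
        // (illegal escape character), which is why every regex escape was doubled.
    }
}
```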

2. Clojure vs foo

Since we’re on the topic, here’s how Clojure’s syntax compares to popular languages.

Ruby and Perl 5

# Regular usage
/<img\s+src="\w+\/\d+\/(\w+)/
 
# Choosing a different delimiter:
 m|<img\s+src="\w+/\d+/(\w+)|     # Perl
%r|<img\s+src="\w+/\d+/(\w+)|     # Ruby

The clearest of all extant languages (at least in this regard), Ruby and Perl can avoid some extra escaping by changing the delimiter character from / to |.

Emacs Lisp

"<img\\s-+src=\"\\w+/[[:digit:]]+/\\(\\w+\\)"

Well, the expression is long and ugly. One upside is that, because the delimiters are quotes, forward slashes need not be escaped.

Java

"<img\\s+src=\"\\w+/\\d+/(\\w+)"

This is the same as Clojure’s original syntax. For reference, Clojure and Java share a regex engine and are equivalent in power.
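Since the two share java.util.regex, the quick test from section 1 can be reproduced in plain Java as a sanity check. This is a minimal sketch: the names ImgBasenames and basenames are hypothetical, and the find() loop is only a rough stand-in for what (map second (re-seq ...)) does on the Clojure side.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ImgBasenames {
    // The same pattern as the Java string above.
    private static final Pattern IMG_SRC =
        Pattern.compile("<img\\s+src=\"\\w+/\\d+/(\\w+)");

    // Collect capture group 1 of every successive match, roughly like
    // (map second (re-seq IMG_SRC text)) in Clojure.
    static List<String> basenames(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = IMG_SRC.matcher(text);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String lines = String.join("\n",
            "...",
            "<img  src=\"images/11/apple1.gif\"/>",
            "<img   src=\"images/2/bulb2.jpeg\"/>",
            "<img src=\"images/354/citrus32_a.png\"/>",
            "...");
        System.out.println(basenames(lines)); // [apple1, bulb2, citrus32_a]
    }
}
```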

Common Lisp

Edi Weitz’s professional CL-PPCRE package is essentially the standard for dealing with regular expressions in CL.

"<img\\s+src=\"\\w+/\\d+/(\\w+)"

Also, Edi’s CL-INTERPOL provides a reader macro which simplifies regex literals to the level of Perl’s:

#?r|<img\s+src="\w+/\d+/(\w+)|

Finally, the reader macro mastery of Doug Hoyte’s Let Over Lambda gives a method of making clear, functional literals:

;; This is a callable lambda:
;; #~m|<img\s+src="\w+/\d+/(\w+)|
 
'#~m|<img\s+src="\w+/\d+/(\w+)|
 
=> (LAMBDA (#:STR236)
     (CL-PPCRE:SCAN "<img\\s+src=\"\\w+/\\d+/(\\w+)"
                    #:STR236))

3. The real reason this is neat

The modification was proposed by Chris Houser (with a simple patch) on October 7, politely debated until October 10, and committed to Clojure in r1070 on October 15.* The new syntax was clearly better, so the discussion skipped the question of whether it should be applied and went directly to how.

Turnaround time for a breaking change: about one week. You have to respect that velocity.

There is a feeling in the development community that Clojure has a good chance of becoming an important language. Now is the brief time when any interested programmer could contribute something significant, in an environment which recognizes intelligent contribution and lacks — for the moment — politics and tradition.

Want to help? The Clojure mailing list is high signal-to-noise, and subscribing is a good way to get acclimated. Also, communicating in real time with Rich Hickey and other Clojure experts is no more difficult than joining an IRC channel: #clojure on freenode.


* There was also a similar discussion in March, but it didn’t include a patch.

Configuring Vim right

Posted in Uncategorized on November 6th, 2008

I have spent a lot of time looking at a Vim window and, correspondingly, a lot of time testing different configurations. These are the best non-standard options I’ve found or stolen from others over the years. They are listed below in order of descending usefulness (though I think everything in this article is worth skimming), and each should have value to anyone, no matter how they like to run Vim. That is, there is minimal editorializing.

Note: no plugins are covered here, just vanilla Vim.

Essential .vimrc configuration items

For whatever reason, the following options aren’t set by default, but they should be.

  1. Turn on hidden

    Don’t worry about the name. What this does is allow Vim to manage multiple buffers effectively.

    • The current buffer can be put to the background without writing to disk;
    • When a background buffer becomes current again, marks and undo-history are remembered.

    Turn this on.

    set hidden

  2. Remap ` to '

    These are very similar keys. Typing 'a will jump to the line in the current file marked with ma. However, `a will jump to the line and column marked with ma.

    ` is more useful in every case I can imagine, but it’s located way off in the corner of the keyboard. The best way to handle this is just to swap them:

    nnoremap ' `
    nnoremap ` '

  3. Map leader to ,

    The leader character is your own personal modifier key, as g is Vim’s modifier key (when compared to vi). The default leader is \, but this key isn’t in the same place on every keyboard and requires a pinky stretch in any case.

    let mapleader = ","

    <SPACE> is also a good choice. Note: you can of course have several “personal modifier keys” simply by mapping a sequence, but the leader key is handled more formally.

  4. Keep a longer history

    By default, Vim only remembers the last 20 commands and search patterns entered. It’s nice to boost this up:

    set history=1000

  5. Enable extended % matching

    The % key will switch between opening and closing brackets. By sourcing matchit.vim, it can also switch among e.g. if/elsif/else/end, between opening and closing XML tags, and more.

    runtime macros/matchit.vim

    Note: runtime is like source, except that the path is looked up relative to each directory in 'runtimepath' (which includes Vim’s own runtime directory).

  6. Make file/command completion useful

    By default, pressing <TAB> in command mode will choose the first possible completion with no indication of how many others there might be. The following configuration lets you see what your other options are:

    set wildmenu

    To have the completion behave similarly to a shell, i.e. complete only up to the point of ambiguity (while still showing you what your options are), also add the following:

    set wildmode=list:longest

Recommended .vimrc configuration items

Most people like these.

  1. Use case-smart searching

    These two options, when set together, will make /-style searches case-sensitive only if there is a capital letter in the search expression. *-style searches ignore smartcase, so with ignorecase on they are always case-insensitive.

    set ignorecase 
    set smartcase

    This is usually the most useful combination.

  2. Set the terminal title

    A running gvim will always have a window title, but when vim is run within an xterm, by default it inherits the terminal’s current title.

    set title

    This gives e.g. | page.html (~) - VIM |.

  3. Maintain more context around the cursor

    When the cursor is moved outside the viewport of the current window, the buffer is scrolled by a single line. Setting the option below will start the scrolling three lines before the border, keeping more context around where you’re working.

    set scrolloff=3

    Typing zz is also handy; it centers the window on the cursor without moving the cursor. (But watch out for ZZ!)

  4. Store temporary files in a central spot

    Swap files and backups are annoying but can save you a lot of trouble. Rather than spread them all around your filesystem, isolate them to a single directory:

    $ mkdir ~/.vim-tmp  # or whatever

    And in .vimrc:

    set backupdir=~/.vim-tmp,~/.tmp,~/tmp,/var/tmp,/tmp
    set directory=~/.vim-tmp,~/.tmp,~/tmp,/var/tmp,/tmp

    This is especially valuable after an unexpected reboot — you don’t have to track down all the leftover temp files. However: if you are editing files on a shared file system, it’ll be easier to clobber concurrent modifications, as other users’ Vim processes won’t see your swaps.

  5. Scroll the viewport faster

    <C-e> and <C-y> scroll the viewport a single line. I like to speed this up:

    nnoremap <C-e> 3<C-e>
    nnoremap <C-y> 3<C-y>

  6. Enable limited line numbering

    It’s often useful to know where you are in a buffer, but full line numbering is distracting. Setting the option below is a good compromise:

    set ruler

    Now in the bottom right corner of the status line there will be something like: 529,35 68%, representing line 529, column 35, about 68% of the way to the end.

  7. A bunch of stuff your OS should already do

    If you are running Windows or OS X or a sloppy Linux distribution, you may not be using these:

    " Intuitive backspacing in insert mode
    set backspace=indent,eol,start
     
    " File-type highlighting and configuration.
    " Run :filetype (without args) to see what you may have
    " to turn on yourself, or just set them all to be sure.
    syntax on
    filetype on
    filetype plugin on
    filetype indent on
     
    " Highlight search terms...
    set hlsearch
    set incsearch " ...dynamically as they are typed.

    The filetype lines enable type-specific configuration, such as knowledge of syntax and indentation. E.g. foo.c will be opened with Vim’s pre-configured C settings, and bar.py will be opened with Python settings.

    If the search term highlighting gets annoying, set a key to switch it off temporarily:

    nmap <silent> <leader>n :silent :nohlsearch<CR>

  8. Catch trailing whitespace

    The following will make tabs and trailing spaces visible when requested:

    set listchars=tab:>-,trail:·,eol:$
    nmap <silent> <leader>s :set list!<CR>

    By default whitespace will be hidden, but now it can be toggled with ,s.

  9. Stifle many interruptive prompts

    The “Press ENTER or type command to continue” prompt is jarring and usually unnecessary. You can shorten command-line text and other info tokens with, e.g.:

    set shortmess=atI

    See :help shortmess for the breakdown of what this changes. You can also pare things down further if you like.

  10. Stop distracting your co-workers

    Vim is a little surly, beeping at you at every chance. You can either find a way to turn off the bell completely, or more usefully, make the bell visual:

    set visualbell

    Instead of emitting an obnoxious noise, the window will flash very briefly. This is similar to screen’s interpretation of the bell in its default configuration.

Here is my own .vimrc, which includes all these settings (and some more which are less generally useful). A fairly good source for other configuration tips is the Vim Tips Wiki.

Thanks to Adam Katz and Chris Gaal for their comments and suggestions.