Critiquing Clojure

Posted in Uncategorized on June 22nd, 2009

Last week Brian Carper published Five Things that Mildly Annoy Me in Clojure. Here’s another short list. Brian’s criticisms are perhaps more substantive than mine; I can’t claim to be more than a dilettante of the language.

Also: I hope this won’t be received as an attempt to dogpile what is perhaps the most compelling of a new wave of programming languages. The points below — in order of increasing subjectivity — are but a by-product of genuine interest.

1. No dynamic binding of dynamic variables

In Clojure, variables defined using def and its variants have dynamic scope. Dynamic variables provide a way to circumvent the functional, side-effect-free nature of Clojure:

(def *foo* 0)
(defn print-foo []
  (println *foo*))
 
(print-foo)
=> 0
 
;; The binding form rebinds dynamic variables:
(binding [*foo* 1]
  (print-foo)
  (println *foo*)
  (binding [*foo* 2]
    (print-foo)))
=> 1
   1
   2
 
;; Compare with let, which binds lexically:
(let [*foo* 1]
  (print-foo)
  (println *foo*)
  (let [*foo* 2]
    (print-foo)))
=> 0
   1
   0

Common Lisp has analogues for the forms above, and also has the PROGV special form which allows dynamic re-binding of dynamic variables at runtime. That is, the names of the variables which are to be rebound don’t need to be known at compile time as they do in Clojure. See an example below:

(defvar *foo* 0)
(defvar *bar* 1)
 
(defun print-foo-bar ()
  (format t "~A ~A~%" *foo* *bar*))
 
(defun call-print-foo-bar (&optional vars vals)
  (progv vars vals
    (print-foo-bar)))
 
(call-print-foo-bar)
=> 0 1
(call-print-foo-bar '(*foo*) '(5))
=> 5 1
(call-print-foo-bar '(*foo* *bar*) '(5 7))
=> 5 7

The arguments to CALL-PRINT-FOO-BAR could come from a config file, an input stream, or anywhere else.
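What makes PROGV special is that the list of names is ordinary run-time data. Here is a rough illustration outside Lisp: a toy Ruby model of dynamically scoped variables with a progv-like method. DynamicEnv, lookup and progv are hypothetical names. The sketch models only the idea, not Common Lisp's special-variable machinery.

```ruby
# Toy model of dynamic scope with run-time variable names.
# DynamicEnv, #lookup and #progv are hypothetical names for this sketch.
class DynamicEnv
  def initialize(globals)
    # A stack of frames; each frame maps variable names to values.
    @frames = [globals.dup]
  end

  # Search from the innermost binding outward, as dynamic scope does.
  def lookup(name)
    frame = @frames.reverse_each.find { |f| f.key?(name) }
    raise KeyError, "unbound variable #{name}" unless frame
    frame[name]
  end

  # Bind NAMES to VALUES for the dynamic extent of the block, then undo.
  # NAMES is ordinary data, so it could come from a config file or a stream.
  # Unlike PROGV, surplus names are bound to nil rather than left unbound.
  def progv(names, values)
    @frames.push(names.zip(values).to_h)
    begin
      yield
    ensure
      @frames.pop
    end
  end
end

env = DynamicEnv.new(foo: 0, bar: 1)
# Plays the role of PRINT-FOO-BAR: it only reads the current bindings.
foo_bar = -> { "#{env.lookup(:foo)} #{env.lookup(:bar)}" }

puts foo_bar.call                                      # 0 1
puts env.progv([:foo], [5]) { foo_bar.call }           # 5 1
puts env.progv([:foo, :bar], [5, 7]) { foo_bar.call }  # 5 7
```

The ensure clause pops the frame even if the block raises. The bindings therefore last exactly as long as the block runs, which mirrors the dynamic extent of PROGV's bindings.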

In practice, PROGV is used rarely — though effectively — and perhaps that’s an argument for excluding it from Clojure. A related argument is that use of dynamic variables should be discouraged in Clojure. But since PROGV represents functionality which can’t be built back into the language through macrology, its omission is a pity.

2. Functions must be declared before first call form

There may be a good technical reason for this, but it feels un-Lisplike. In Common Lisp, you can do this:

(defun fn1 ()
  (fn2 :foo))
 
(defun fn2 (arg)
  arg)

Even in Ruby:

def fn1
  fn2 :foo
end
 
def fn2(arg)
  arg
end

But in Clojure, you must do this:

(declare fn2)
 
(defn fn1 []
  (fn2 :foo))
 
(defn fn2 [arg]
  arg)

It’s a piece of bookkeeping which really should be the purview of the compiler.

3. No keyword arguments in lambda-lists

Functions in Common Lisp can accept named arguments:

(defun func (a b c &key d e)
  (list a b c d e))
 
(func 1 2 3 :e 5)
=> (1 2 3 NIL 5)

This becomes helpful as a function accumulates many arguments — undesirable but inevitable — and it’s also useful in a short function when the purpose of its arguments can’t be made clear by its name. Which of the following calls is more descriptive:

(set-permissions resource t nil t)
(set-permissions resource :read t :write nil :execute t)

The Clojure defn and fn forms can’t take keyword arguments. A somewhat idiomatic alternative is to include a map parameter which is destructured for the function’s arguments:

(defn func [a b c {d :d e :e}]
  (list a b c d e))
 
(func 1 2 3 {:e 5})
=> (1 2 3 nil 5)
 
(set-permissions resource {:read true :write nil :execute true})

This works, but it feels like a mild hack. Also, Common Lisp’s definition of func is much easier to parse at a glance than Clojure’s is.

A couple of additional notes:

  • The Clojure functions atom and ref (at least) do take a couple of keyword arguments, but this is done through a manual parse of the argument list rather than by way of built-in language support.
  • clojure-contrib includes a macro defnk which wraps defn to allow keywords. It’s nice to have, but only direct uses of defnk can benefit — that is, general fn or defn forms or the macros which wrap them are necessarily left out.
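For comparison, Ruby (from version 2.0) builds keyword parameters into method definitions, much like Common Lisp's &key. Below is a minimal sketch. set_permissions is the hypothetical function from the examples above, and its body is hypothetical too.

```ruby
# Ruby 2.0+ keyword parameters, analogous to (defun func (a b c &key d e) ...).
# Omitted keywords fall back to their defaults (nil here), like &key.
def func(a, b, c, d: nil, e: nil)
  [a, b, c, d, e]
end

p func(1, 2, 3, e: 5)  # [1, 2, 3, nil, 5]

# Hypothetical body for the set-permissions example.
def set_permissions(resource, read: false, write: false, execute: false)
  { resource: resource, read: read, write: write, execute: execute }
end

p set_permissions("notes.txt", read: true, execute: true)

# A misspelled keyword is an error, whereas a destructured map
# silently ignores a key it wasn't asked for.
begin
  set_permissions("notes.txt", reed: true)
rescue ArgumentError => e
  puts e.message  # mentions the unknown keyword
end
```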

4. Commas are whitespace

I think the intent here is to appeal to programmers of popular languages, where invariably commas are used to separate function arguments. So you can do any of the things below, and maybe you prefer one way to the other:

(defn func1 [foo, bar, baz]
  ...)
 
(defn func2 [foo bar baz]
  ...)
 
(func1 1 2 3)
(func2 1, 2, 3)

I will admit it may be easier to quickly scan maps having commas, especially when both keys and values are keywords:

{:color :blue :input :keyboard :os :linux}
{:color :red, :input :mouse, :os :windows}

But speaking generally, commas-as-whitespace only adds line width and noise. And worse, it causes a break in convention from Lisp’s backquote facility:

;; Scheme / Common Lisp
`(foo bar ,baz)
;; Clojure
`(foo bar ~baz)

Commas are easier to scan for in backquoted lists than tildes are, as they hang lower than most other glyphs. Also, they’re a more sensible counterpart to backquotes. I do approve of moving away from tired Lisp conventions, but sometimes different isn’t better.

5. Syntax for type specification is ugly

To add type hints to variables, Clojure uses a shortcut version of the metadata reader macro:

(let [#^Integer a x
      #^Integer b y]
  (list a b))

It looks gross, though I admit I have no better alternative to offer. Common Lisp does an equally bad job here, probably worse, with its DECLARE placement restrictions and THE verbosity. Static languages like Java and C will probably always be able to specify types more clearly.

Type coercion is cleaner:

(let [a (int 2)
      b (int 3)]
  (list a b))

But it means something different, even though it’s similar in this context. Also, only certain types can be coerced.

To programmers reading this to discover reasons to chicken out of learning Clojure: it would be only on rare occasions that you’d want to insert type hints in Clojure code anyway.

6. All the good names are taken

Lisp-1 vs. Lisp-2 is an old debate, but here are a few observations all the same. Like Scheme, Clojure is a Lisp-1, which means that functions and variables share the same namespace. Names of functions in Clojure are often kept short to allow for compact code. So, these words and many more are “reserved”, as they name core functions:

  • map
  • vec
  • fn
  • seq
  • set
  • str
  • count
  • key
  • val

The drawback is that these would also make great variable names when vagueness is a virtue, but since they’re already used, programmers must use names like a-fn, a-map, the-vec, etc. instead. (You could still name variables e.g. map or str, but if you do, you shadow those functions and confuse people who read your code.) By itself this would just be a little awkward, but as the API does often use parameter names like coll — since there is no function named coll — it’s also inconsistent.

If Clojure were a Lisp-2, you could and would go ahead and use those names as variables without shadowing the functions. I can understand why many people dislike Lisp-2-ness. I felt the same way before I learned Common Lisp. Now I prefer it, and I’m certain it’s something anyone can get used to.

Lisp-2 is a little harder to bend your head around. Which of these two calls is right?

(funcall fn arg1 arg2)
(funcall #'fn arg1 arg2)

Answer: both, but they mean different things. In the first call, fn is a variable which holds a function as its value. In the second call, fn had previously been declared as a function, and #' is basically a lookup for the name ‘fn’ in the function namespace. (That’s not quite what is going on, but it’s an easy way to think about it.) So, the first form calls the function stored in fn and the second form calls the function which is actually named ‘fn’.
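Ruby isn't a Lisp, but it separates method names from local-variable names in much the way a Lisp-2 separates its function and variable namespaces. The sketch below is only an analogy. The count method and the fn variable are illustrative, and method(:count) plays roughly the role of #'count.

```ruby
# Methods and local variables live in separate namespaces in Ruby,
# so a variable named count does not hide the method named count.
def count(xs)
  xs.length
end

count = 3
p count          # 3: the local variable
p count([1, 2])  # 2: the method

# Like (funcall fn arg1 arg2): fn is a variable holding a function object.
fn = ->(a, b) { a + b }
p fn.call(1, 2)  # 3

# Roughly like (funcall #'count ...): fetch the function by its name.
p method(:count).call([1, 2, 3])  # 3
```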

7. Conclusion

Clojure is becoming observably more popular. It’s a bridge allowing both Lisp and Java to move forward, and it’s well-positioned for the coming age of widespread parallelization. If these snags are some of the worst things to find in Clojure, that’s more a compliment than a condemnation.

Extensibility in Vim and Emacs

Posted in Uncategorized on April 7th, 2009

Emacs and Vim both provide facilities for extension. But because they represent divergent philosophies (Vim following the “small is beautiful” and “do one thing well” precepts of Unix, Emacs coming from a belief that the editor is an operational hub), they have different objectives here.

This is a comparison article with a focus on plugin development. The following articles offer more general comparisons of the editors, and are worth reading:

I’ve written previously on writing Vim plugins and using Emacs Lisp.

1. Development resources

For me, programming has never been a sit-down-and-go activity. It’s more like traditional writing: I think about what it is I want to say, do a little fact checking, and then find a way to say it. Programming is perhaps easier than writing, though, because the mechanism and the medium share the same space, so your tools can meet you halfway.

i. Integrated help

The :help command in Vim is not just for new users; it’s also a great boon for experienced users. Each section and subsection of Vim’s extensive documentation includes one or more descriptive keywords. These appear in the :help command’s completion list to make it easier to find what you are looking for, even when you don’t know exactly what it is.

Imagine we wish to know the syntax for ignoring character case in a regex. Typing “:help ignore<TAB>” shows:

:help ignore
/ignorecase          filetype-ignore      +wildignore
'ignorecase'         g:netrw_ignorenetrc  'eventignore'
ignore-errors        'foldignore'         'noignorecase'
efm-ignore           'wildignore'

Trial and error identifies /ignorecase as the help section we want. “:help case<TAB>” shows a similar list.

The equivalent in Emacs is probably C-h d, apropos-documentation. It’s actually a little handier than :help: because the convention in Emacs Lisp is to supply docstrings for API symbols, apropos-documentation searches a larger space having a higher chance of relevancy.

Relatedly, there are C-h f, describe-function, and C-h v, describe-variable, which present full API documentation and can even jump to the definition of a given symbol. Really convenient.

ii. Reference material

When programming, no matter your skill or experience, it’s imperative to have some kind of reference material. Documentation is okay, but existing code is the real blessing. Here is a place where Emacs shines. The basic core of the editor is written in C (opaquely), but the rest of its functionality is written in Emacs Lisp and is immediately available for viewing.

$ dpkg -L emacs-snapshot-el | grep '\.el\.gz$' | wc -l
1125
$ gunzip --stdout `dpkg -L emacs-snapshot-el | grep '\.el\.gz$'` | wc -l
1243201

Over one million lines-of-code in 1125 files — a treasure trove.

This is harder to measure in Vim. Its base package includes about 1000 .vim files, but the vast majority are filetype or indentation specifiers, or colour themes, none of which are generally of use to a programmer. There is an unofficial vim-scripts package in Debian-based distributions, though it too skews toward themes. Here are the numbers anyway:

$ dpkg -L vim-scripts | grep '\.vim$' | wc -l
172
$ cat `dpkg -L vim-scripts | grep '\.vim$'` | wc -l
47972

2. Provisions for extension

As an environment for editor extensions, Emacs can be fairly called full-featured; Vim can’t. (I’m in a good position to make this statement having written sizable plugins for both editors.)

Rather than write a feature-by-feature comparison, I’ll just list some things Emacs natively supports which Vim does not, or does only poorly:

i. Key capture/filename entry

  • As a convenience for input, Emacs provides simple minibuffer functions such as read-file-name and read-buffer as well as more general read-from-minibuffer and read-string, which offer a lot of customization. Capturing key presses is made easy with read-key-sequence; to respond to general user action, watch functions can be added to hook variables such as post-command-hook.
  • Vim offers input() for using the command-line in a script, but it is limited. Key capture is very difficult; getchar() exists, but is incomplete. Inexplicably, there is no CharacterPress autocmd event (or anything similar). The only certain way to capture keys is to remap all of them to call a user function, then later restore all previous mappings. A bad hack.

ii. Specialized buffers

  • Creating a new temporary buffer or a special-purpose buffer in Emacs can be done using with-temp-buffer and generate-new-buffer, or several other calls.
  • Vim provides no standard means to create a special buffer; the documentation recommends to create a new buffer and to set these options:

    :setlocal buftype=nofile
    :setlocal bufhidden=hide
    :setlocal noswapfile

    The problem is, this new buffer will also inherit many user settings, such as wrap, number, foldcolumn, cursorline, spell, and sidescroll. There is no way — that I know of — to create a fresh, blank buffer. For correctness, all of these settings should be enumerated and explicitly turned off upon buffer creation. To make this even less satisfactory, not all buffer settings can be set per-buffer.

iii. Programmatic window cycling/traversal

Emacs has many useful functions:

  • walk-windows — equivalent to mapping (window-list) through a given function.
  • window-tree — returns a tree representing the window layout in the given frame.
  • save-window-excursion — screw around with the current layout temporarily, without repercussion.
  • Many more. And as a bonus, the minibuffer can be manipulated with standard window functions.

Similar functions in Vim:

  • Like walk-windows: there is :windo, but it only works on commands, not functions, and gives up completely on any minor error.
  • The closest analogue to window-tree may be winnr(), a multi-purpose function. It returns the following, depending on context:
    1. The number of the current window (to be used as an argument in other functions)
    2. The count of open windows
    3. The number of the last accessed window
  • Like save-window-excursion: winrestcmd() generates a sequence of storable window commands that can be called to restore the current window configuration; but it doesn’t always work. There are also winsaveview() and winrestview(), but they also don’t always work.

iv. Buffer ordering

When enumerating buffers for some purpose, the most suitable ordering is often most-recently used (MRU). This is how Emacs acts by default with e.g. buffer-list. Vim seems to order buffers by most recently opened, or maybe it’s more arbitrary. MRU is possible in Vim, but it must be hacked in manually by use of autocmds, watching these events:

  • BufEnter
  • BufDelete
  • BufWipeout
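The bookkeeping itself is small. Here is a sketch in Ruby, which Vim can run through its :ruby interface when built with +ruby, of the list those three events would maintain. MruBufferList and its methods are hypothetical, and the autocmd wiring is left out.

```ruby
# Hypothetical MRU bookkeeping that BufEnter, BufDelete and BufWipeout
# autocmds could drive; connecting it to Vim is omitted.
class MruBufferList
  def initialize
    @buffers = []  # buffer numbers, most recently used first
  end

  # BufEnter: move the buffer to the front, adding it if it is new.
  def on_buf_enter(bufnr)
    @buffers.delete(bufnr)
    @buffers.unshift(bufnr)
  end

  # BufDelete and BufWipeout: forget the buffer entirely.
  def on_buf_delete(bufnr)
    @buffers.delete(bufnr)
  end
  alias on_buf_wipeout on_buf_delete

  def to_a
    @buffers.dup
  end
end

mru = MruBufferList.new
[1, 2, 3, 1].each { |n| mru.on_buf_enter(n) }
p mru.to_a  # [1, 3, 2]
mru.on_buf_wipeout(3)
p mru.to_a  # [1, 2]
```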

v. Environment variable interpretation

  • Emacs: getenv/setenv, process-environment, various helper functions.
  • Vim:

    :let $VAR = value  " set
    :let var = $VAR    " get

    Sometimes works when setting options:

    :set term=$TERM    " works
    :set history=$NUM  " does not work

vi. Text deletion

A minor problem (or convenience) of Vim is that every scripted delete action will clobber the unnamed register and numbered registers that we use for quick cut+pastes. (Emacs folks: this could be like losing your kill ring every time you run find-file or dired.) I struggled with this for a long time before learning of Vim’s black hole register ("_), which discards whatever is deleted into it. In general, the behaviour of Vim’s delete registers is intricate and unlikeable.

vii. Completion

  • Emacs has heavy-duty completion support, from simple, high-level functions such as read-buffer and read-variable to accept values at the minibuffer, to try-completion and all-completions which can be used programmatically without action from a user.
  • Vim command definitions can be specified with a special argument to include completion support at the ex command line. It’s a little awkward, though. Copy/paste the following block and then type “:Finger <TAB>” in Vim for a demonstration:

    command -complete=custom,ListUsers -nargs=1 Finger !finger <args>
    fun ListUsers(A,L,P)
        return system("cut -d: -f1 /etc/passwd")
    endfun

    See :help :command-completion for an explanation. I don’t think Vim has built-in support for programmatic completion.
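To make the Emacs side concrete: all-completions returns every candidate that begins with a given string. try-completion returns the longest common prefix of those candidates, t for an exact and unique match, and nil for none. The following is a simplified Ruby sketch of those two behaviours, the kind of helper a Vim plugin would have to write for itself. The function names mirror Emacs's but are hypothetical here. The sketch ignores alists, predicates and completion-ignore-case.

```ruby
# Simplified, case-sensitive analogues of Emacs's all-completions and
# try-completion over a plain list of candidate strings.
def all_completions(prefix, candidates)
  candidates.select { |c| c.start_with?(prefix) }
end

# nil for no match, true for an exact and unique match,
# otherwise the longest prefix shared by every match.
def try_completion(prefix, candidates)
  matches = all_completions(prefix, candidates)
  return nil if matches.empty?
  return true if matches == [prefix]

  matches.reduce do |common, m|
    len = 0
    len += 1 while len < common.length && len < m.length && common[len] == m[len]
    common[0, len]
  end
end

commands = %w[read-buffer read-file-name read-string]
p all_completions("read-f", commands)      # ["read-file-name"]
p try_completion("re", commands)           # "read-"
p try_completion("read-s", commands)       # "read-string"
p try_completion("read-string", commands)  # true
p try_completion("write", commands)        # nil
```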

Speaking broadly: Emacs feels engineered, while Vim gives the impression of having grown piecemeal. As a platform it is awkward and missing some useful bits. This can be liberating, as one need not worry about duplicating something already part of the API, but it’s also limiting.

Vim offers bindings into other languages, and you may choose to eschew Vim Script and mitigate some of these issues. But a side effect of doing so is that you’ll need to account for differences in string syntax when directly interfacing with the editor. Check out these Ruby-to-Vim special character escaping functions:

def vim_single_quote_escape(s)
  # Everything in a Vim single-quoted string is literal, except single
  # quotes.  Single quotes are escaped by doubling them.
  s.gsub("'", "''")
end
def vim_filename_escape(s)
  # Escape backslashes, open square brackets, spaces, and single and
  # double quotes using backslashes.
  s.gsub(/[\['" \\]/, '\\\\\0')
end
def vim_regex_escape(s)
  # Escape characters that are special in a (magic) Vim regex, plus
  # double quotes: ] [ . ~ " ^ $ \ *
  s.gsub(/[\]\[.~"^$\\*]/, '\\\\\0')
end

It took me a few releases to get these right. Admittedly, this glue is the cost of doing business with an external language.

3. Portability requirements

(To be clear: “portability” here refers to accounting for differences between versions of the same editor, whether running on the same operating system or not.)

When developing for Vim, it’s unlikely that portability will be an issue; Vim Script has changed little in the last few years. Also, since upgrading Vim is painless, users will do so, and therefore it’s common for a plugin writer to target only the most recent major release. There has even been an integrated feature to automatically update plugins since Vim 7.1, which suggests confidence that there won’t be breaking changes in the future.

As discussed in a previous article, portability must be a greater concern to developers of Emacs packages. It’s probably a good idea to choose a portability target during development instead of trying to hack it in after the fact.

4. Conclusion

I’d feel uncomfortable marking this as a win for Emacs or a loss for Vim, even given all the evidence. The basic philosophies of the two editors are distinct enough to make the comparison unfair. To have a video editor integrated into Vim would be neither funny nor useful, while for Emacs it’s perhaps both.

But I do think it’s fair to say that Vim Script is an ugly language, a DSL which has been stretched to the breaking point. It’s an intellectual dead end. Emacs Lisp feels more like a true programming language and has much of the elegance of any other Lisp. And learning it pays dividends, as it overlaps substantially with Common Lisp.

Surveying Emacs Lisp

Posted in Uncategorized on March 31st, 2009

Emacs is well known for being highly configurable. In large part, this is due to its close tie to some form of Lisp over the majority of its long and varied history. Within all extant implementations of the editor, the Lisp is Emacs Lisp, a language which is hard to love but easy to like.

1. Overview

Emacs Lisp (inconsistently abbreviated to Elisp) is descended from MacLisp and looks a lot like Common Lisp.

(defun fibonacci (n)
  (if (< n 2)
      1
    (+ (fibonacci (- n 1))
       (fibonacci (- n 2)))))

Being a Lisp, it is inherently extensible, and so is well-suited as an embedded language. Actually, a case can be made that Emacs Lisp is more powerful than many general-purpose programming languages. Some of its notable features:

  • lambdas and first-class functions
  • macros
  • byte-code compilation
  • an interactive shell/REPL (through the *scratch* buffer)
  • an integrated debugger and profiler

Considering that it is confined to a single host application, Emacs Lisp is very successful. Tools that have been written in it include several IRC clients, a popular mail/news reader, two different terminal emulators, a web browser, and even a video editor. Here are several excellent programming resources I’ve found helpful:

2. Portability considerations

Emacs users tend to customize their editor to a great extent, so upgrading to a new version can be a dog, and therefore extension writers are quite conservative about portability.

And alas, the situation is doubly complicated. Developers must consider not only compatibility between versions (e.g. 21 to 22), but also compatibility between variants, as GNU Emacs and XEmacs are both popular.[1] My informal survey of ITA Software, an Emacs-heavy development house, seems to show that engineers stick with what they know; those who picked up Emacs during the periods when GNU Emacs lagged behind XEmacs continue to use XEmacs, while most of the others use GNU Emacs. It’s split pretty evenly.

It’d be useful to identify the actual usage breakdown among the different versions of the editor, but these numbers don’t exist; Debian Popularity Contest can at least provide an estimate:

Tracked usage of Emacs packages in Debian
Package         Installs  Notes
emacsen-common     15890  (required package for all Emacs installs)
emacs21             7522  GNU
emacs22             4738  GNU
xemacs21            2211
emacs-snapshot       427  GNU (at the time of writing, v23.0)

Likewise, Ubuntu Popularity Contest:

Tracked usage of Emacs packages in Ubuntu
Package         Installs  Notes
emacsen-common     86805  (required package for all Emacs installs)
emacs21            32358  GNU
emacs22            16640  GNU
xemacs21            7394
emacs-snapshot      4764  GNU (at the time of writing, v23.0)

Again, at best these tables should be considered datapoints in a larger census, but they do seem to show GNU Emacs as significantly more popular than XEmacs, and 21 as the most widely used GNU Emacs version. (Can anyone provide additional stats?)
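To put rough numbers on that reading, the sketch below computes two figures for each distribution. The first is XEmacs's share of the tracked Emacs packages, leaving out emacsen-common, which every install pulls in. The second is emacs21's share of the GNU packages. INSTALLS, GNU_PACKAGES and percent are hypothetical names, and the figures inherit every caveat of the underlying data.

```ruby
# Install counts from the Debian and Ubuntu popularity-contest tables.
# One machine may report several Emacs packages, so these are shares of
# package installs rather than of users.
INSTALLS = {
  debian: { "emacs21" => 7522, "emacs22" => 4738,
            "emacs-snapshot" => 427, "xemacs21" => 2211 },
  ubuntu: { "emacs21" => 32358, "emacs22" => 16640,
            "emacs-snapshot" => 4764, "xemacs21" => 7394 }
}.freeze

GNU_PACKAGES = %w[emacs21 emacs22 emacs-snapshot].freeze

# Percentage of part relative to whole, rounded to one decimal place.
def percent(part, whole)
  (100.0 * part / whole).round(1)
end

INSTALLS.each do |distro, counts|
  gnu = counts.values_at(*GNU_PACKAGES).sum
  xemacs = counts["xemacs21"]
  puts "#{distro}: XEmacs #{percent(xemacs, gnu + xemacs)}% of Emacs installs, " \
       "emacs21 #{percent(counts['emacs21'], gnu)}% of GNU Emacs installs"
end
```

It reports XEmacs at 14.8% (Debian) and 12.1% (Ubuntu) of these installs, and emacs21 at 59.3% and 60.2% of the GNU ones.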

[1] Other variants such as Aquamacs may deserve a mention here as well — I have no information about their install bases.

3. A short case study

Two years ago I wrote a filesystem explorer / buffer switcher plugin for Vim and then one year ago brought it to Emacs. It had been written in Ruby, which has rough language parity with Emacs Lisp. (Ruby lacks macros; Emacs Lisp lacks lexical closures.)

i. Initial development

Since Emacs Lisp is not really object-oriented, I dropped the original formal inheritance model. Also, I removed several classes outright, as existing functionality in Emacs — stuff which didn’t exist in Vim — made them redundant.

A minimal port was finished in a few hours and 200 lines-of-code. The current version sits at 494 lines. For comparison, the Vim plugin (on which development continues in parallel) is 1466 lines.

From the Vim plugin, here is an inefficient Ruby function to convert an array of strings into column_count columns (reading downward, i.e. the way ls works):

def columnize(strings, column_count)
  rows = (strings.length / Float(column_count)).ceil
 
  # Break the array into sub arrays representing columns
  cols = strings.inject([[]]) { |array, e|
           if array.last.size < rows
             array.last << e
           else
             array << [e]
           end
           array
         }
 
  return cols
end

And below, the close equivalent in Emacs Lisp. It uses push+nreverse instead of append (<<) above, which obfuscates the algorithm a little:

(defun columnize (strings column-count)
  "Break the list STRINGS into sublists representing columns."
  (let ((nrows (ceiling (/ (length strings)
                           (float column-count)))))
    (nreverse
     (mapcar 'nreverse
             (reduce (lambda (lst e)
                       (if (< (length (car lst))
                              nrows)
                           (push e (car lst))
                         (push (list e) lst))
                       lst)
                     strings
                     :initial-value (list (list)))))))

There is room for improvement in both cases — 10 pride points to the person who codes up the best rewrite of either function. :-)
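One candidate rewrite of the Ruby version: Enumerable#each_slice already does the chunking. This sketch keeps the original's column semantics. Like the original, it can yield fewer than column_count columns when the strings don't divide evenly. It returns [] rather than [[]] for an empty list. format_rows is a hypothetical helper that shows how the columns read back across, roughly as ls lays out its output.

```ruby
# One possible rewrite: each_slice cuts the list into columns of `rows`.
def columnize(strings, column_count)
  rows = (strings.length / column_count.to_f).ceil
  # each_slice needs a positive size; an empty list yields no columns.
  strings.each_slice([rows, 1].max).to_a
end

# Hypothetical helper: read the columns back across, one line per row,
# padding each entry to `width` characters.
def format_rows(columns, width)
  return [] if columns.empty?

  first, *rest = columns
  first.zip(*rest).map do |row|
    row.compact.map { |s| s.ljust(width) }.join.rstrip
  end
end

cols = columnize(%w[a b c d e], 2)
p cols  # [["a", "b", "c"], ["d", "e"]]
puts format_rows(cols, 4)
```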

So far I’ve had to invest much less development time on the Emacs extension, though in fairness it has benefited from having an original to crib from. Also, it’s currently less featureful than the Vim plugin.

ii. Backporting

I had targeted GNU Emacs 23, since that’s the version I use. The port to GNU Emacs 22 was trivial. A port to GNU Emacs 21 is difficult for the lack of several convenience functions, so it’s on the back burner.

I spent about an hour on a tricky port to XEmacs before deciding that for two reasons, I couldn’t justify the effort:

  1. Ugly portability wrappers would complicate the otherwise compact code. A concise assert becomes an awkward block:[2]
    ;; Before
    (assert (minibufferp))
     
    ;; After
    (assert (if (boundp 'xemacsp)
                (eq (window-buffer (minibuffer-window))
                    (current-buffer))
              (minibufferp)))
  2. This package was written for my use, and I haven’t yet had reason to use XEmacs.

Perhaps if I had targeted GNU Emacs 21 as my baseline, portability wouldn’t have been as much of an issue; the library couldn’t have been as easy to write, however, and I believe side-projects especially should optimize for development time.

[2] A colleague (and a more capable Emacs user) suggests isolating portability wrappers as a partial solution. This is good practice in general. So, perhaps in a separate file:

(unless (fboundp 'minibufferp)
  (defun minibufferp ()
    (eq (window-buffer (minibuffer-window))
        (current-buffer))))
(provide 'lusty-xemacs)

Then, within the main code:

(when (boundp 'xemacsp)
  (require 'lusty-xemacs))
 
;; ...
 
;; Clean again!
(assert (minibufferp))

This is likely what I’ll do when I re-examine the XEmacs port.

4. The Common Lisp compatibility package

The comprehensive cl package provides many great extensions to Emacs Lisp which make the language more approachable, especially to Common Lisp programmers. Some of its nifty additions:

  • defun*, defmacro* (adding implicit blocks and keyword arguments)
  • flet, labels, macrolet
  • loop, do
  • lexical-let
  • multiple-value-bind, destructuring-bind
  • case, ecase, typecase
  • setf, incf/decf, define-modify-macro
  • push, pushnew, pop

Pragmatically, XEmacs loads this package by default. GNU Emacs, however, lightly discourages its use. From the reference manual:

… we have a policy that packages installed in Emacs must not load cl at run time. … If you are writing packages that you plan to distribute and invite widespread use for, you might want to observe the same rule.

And also:

Please don’t require the cl package of Common Lisp extensions at runtime. Use of this package is optional, and is not part of the standard Emacs namespace.

An imperfect workaround using eval-when-compile is advocated instead. I suggest disregarding the undertone of this note; Emacs Lisp benefits considerably from the inclusion of cl, and it’s time to move forward. In his article about Ejacs, Emacs expert Steve Yegge likewise recommends unrestrained use of cl.

5. Conclusion

Under the all-important light of getting things done, Emacs Lisp is not a bad language. Emacs users are fortunate, and they should be thankful — users of other editors are not so lucky.

Thanks to Ron Gut and Chris Burke for their comments and suggestions.

Clojure’s new regex syntax

Posted in Uncategorized on November 19th, 2008

Last week, Rich Hickey announced a few notable changes to Clojure, including ahead-of-time compilation and a cleaner syntax for regular expressions. Both are improvements, but the syntax is especially interesting for a reason unrelated to its function. First, a quick overview.

1. What has changed

In a sentence, fewer backslashes. The notation is now more in line with that of scripting languages, where regular expressions are first-class literals, than that of general-purpose languages like C++ or Java, where regexes are just specialized strings.

Say we are given a stream including this text:

...
<img  src="images/11/apple1.gif"/>
<img   src="images/2/bulb2.jpeg"/>
<img src="images/354/citrus32_a.png"/>
...

We want to select IMG tags and capture the basename (without extension) of each source file. This can be done in many ways; here’s a blueprint which is just barely good enough:

<img [whitespace]+
     src=" [word-char]+ / [digit]+ / ([word-char]+) ...

Converting this to Clojure’s old syntax gives us a somewhat unwieldy #"<img\\s+src=\"\\w+/\\d+/(\\w+)". A quick test:

(let [lines "...
             <img  src=\"images/11/apple1.gif\"/>
             <img   src=\"images/2/bulb2.jpeg\"/>
             <img src=\"images/354/citrus32_a.png\"/>
             ..."]
  ;; Return only the captures, not the full matches.
  (map second
       (re-seq #"<img\\s+src=\"\\w+/\\d+/(\\w+)" lines)))
 
=> ("apple1" "bulb2" "citrus32_a")

The new update to the reader allows us to remove the double escaping of the regex specials in the literal:

(map second
     (re-seq #"<img\s+src=\"\w+/\d+/(\w+)" lines))

2. Clojure vs foo

Since we’re on the topic, here’s how Clojure’s syntax compares to popular languages.

Ruby and Perl 5

# Regular usage
/<img\s+src="\w+\/\d+\/(\w+)/
 
# Choosing a different delimiter:
 m|<img\s+src="\w+/\d+/(\w+)|     # Perl
%r|<img\s+src="\w+/\d+/(\w+)|     # Ruby

The clearest of all extant languages (at least in this regard), Ruby and Perl can avoid some extra escaping by changing the delimiter character from / to |.
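For comparison with the Clojure test above, here is a Ruby sketch of the same extraction using String#scan and the %r|...| delimiter. The variable names are illustrative.

```ruby
html = <<~HTML
  <img  src="images/11/apple1.gif"/>
  <img   src="images/2/bulb2.jpeg"/>
  <img src="images/354/citrus32_a.png"/>
HTML

# %r|...| lets the forward slashes stand unescaped.
img_basename = %r|<img\s+src="\w+/\d+/(\w+)|

# With one capture group, scan returns [["apple1"], ...]; keep the capture.
p html.scan(img_basename).map(&:first)  # ["apple1", "bulb2", "citrus32_a"]
```

Like map second in the Clojure version, map(&:first) discards everything but the captured basename.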

Emacs Lisp

"<img\\s-+src=\"\\w+/[[:digit:]]+/\\(\\w+\\)"

Well, the expression is long and ugly. An upside is that because of the quote delimiters, forward-slashes need not be escaped.

Java

"<img\\s+src=\"\\w+/\\d+/(\\w+)"

This is the same as Clojure’s original syntax. For reference, Clojure and Java share a regex engine and are equivalent in power.

Common Lisp

Edi Weitz’s professional CL-PPCRE package is essentially the standard for dealing with regular expressions in CL.

"<img\\s+src=\"\\w+/\\d+/(\\w+)"

Also, Edi’s CL-INTERPOL provides a reader macro which simplifies regex literals to the level of Perl’s:

#?r|<img\s+src="\w+/\d+/(\w+)|

Finally, the reader macro mastery of Doug Hoyte’s Let Over Lambda gives a method of making clear, functional literals:

;; This is a callable lambda:
;; #~m|<img\s+src="\w+/\d+/(\w+)|
 
'#~m|<img\s+src="\w+/\d+/(\w+)|
 
=> (LAMBDA (#:STR236)
     (CL-PPCRE:SCAN "<img\\s+src=\"\\w+/\\d+/(\\w+)"
                    #:STR236))

3. The real reason this is neat

The modification was proposed by Chris Houser (with a simple patch) on October 7, politely debated until October 10, and committed to Clojure in r1070 on October 15.* The new syntax was clearly better, so the discussion skipped the question of whether to apply it and went straight to how.

Turnaround time for a breaking change: one week. You have to respect that velocity.

There is a feeling in the development community that Clojure has a good chance of becoming an important language. Now is the brief time when any interested programmer could contribute something significant, in an environment which recognizes intelligent contribution and lacks — for the moment — politics and tradition.

Want to help? The Clojure mailing list is high signal-to-noise, and subscribing is a good way to get acclimated. Also, communicating realtime with Rich Hickey and other Clojure experts is no more difficult than joining an IRC channel: #clojure on freenode.


* There was also a similar discussion in March, but it didn’t include a patch.

Configuring Vim right

Posted in Uncategorized on November 6th, 2008

I have spent a lot of time looking at a Vim window, and correspondingly, a lot of time testing different configurations. These are the best non-standard options I’ve found or stolen from others over the years; listed below in order of descending usefulness — though I think everything in this article is worth skimming — are tips which should have value to anyone, no matter how they like to run Vim. That is, there is minimal editorializing.

Note: no plugins are covered here, just vanilla Vim.

Essential .vimrc configuration items

For whatever reason, the following options aren’t set by default, but they should be.

  1. Turn on hidden

    Don’t worry about the name. What this does is allow Vim to manage multiple buffers effectively.

    • The current buffer can be put to the background without writing to disk;
    • When a background buffer becomes current again, marks and undo-history are remembered.

    Turn this on.

    set hidden

  2. Remap ` to '

    These are very similar keys. Typing 'a will jump to the line in the current file marked with ma. However, `a will jump to the line and column marked with ma.

    The backtick version is more useful in any case I can imagine, but its key is located way off in the corner of the keyboard. The best way to handle this is just to swap them:

    nnoremap ' `
    nnoremap ` '

  3. Map leader to ,

    The leader character is your own personal modifier key, as g is Vim’s modifier key (when compared to vi). The default leader is \, but it isn’t in the same place on all keyboards and requires a pinky stretch in any case.

    let mapleader = ","

    <SPACE> is also a good choice. Note: you can of course have several “personal modifier keys” simply by mapping a sequence, but the leader key is handled more formally.

  4. Keep a longer history

    By default, Vim only remembers the last 20 commands and search patterns entered. It’s nice to boost this up:

    set history=1000

  5. Enable extended % matching

    The % key will switch between opening and closing brackets. By sourcing matchit.vim, it can also switch among e.g. if/elsif/else/end, between opening and closing XML tags, and more.

    runtime macros/matchit.vim

    Note: runtime is the same as source except that the path is looked up in the directories listed in 'runtimepath', which include Vim’s own runtime directory.

  6. Make file/command completion useful

    By default, pressing <TAB> in command mode will choose the first possible completion with no indication of how many others there might be. The following configuration lets you see what your other options are:

    set wildmenu

    To have the completion behave similarly to a shell, i.e. complete only up to the point of ambiguity (while still showing you what your options are), also add the following:

    set wildmode=list:longest

Recommended .vimrc configuration items

Most people like these.

  1. Use case-smart searching

    These two options, when set together, will make /-style searches case-sensitive only if there is a capital letter in the search expression. *-style searches ignore smartcase and follow ignorecase alone, so they become consistently case-insensitive.

    set ignorecase 
    set smartcase

    This is usually the most useful combination.

  2. Set the terminal title

    A running gvim will always have a window title, but when vim is run within an xterm, by default it inherits the terminal’s current title.

    set title

    This gives e.g. | page.html (~) - VIM |.

  3. Maintain more context around the cursor

    When the cursor is moved outside the viewport of the current window, the buffer is scrolled by a single line. Setting the option below will start the scrolling three lines before the border, keeping more context around where you’re working.

    set scrolloff=3

    Typing zz is also handy; it centers the window on the cursor without moving the cursor. (But watch out for ZZ!)

  4. Store temporary files in a central spot

    Swap files and backups are annoying but can save you a lot of trouble. Rather than spread them all around your filesystem, isolate them to a single directory:

    $ mkdir ~/.vim-tmp  # or whatever

    And in .vimrc:

    set backupdir=~/.vim-tmp,~/.tmp,~/tmp,/var/tmp,/tmp
    set directory=~/.vim-tmp,~/.tmp,~/tmp,/var/tmp,/tmp

    This is especially valuable after an unexpected reboot — you don’t have to track down all the leftover temp files. However: if you are editing files on a shared file system, it’ll be easier to clobber concurrent modifications, as other users’ Vim processes won’t see your swaps.

  5. Scroll the viewport faster

    <C-e> and <C-y> scroll the viewport a single line. I like to speed this up:

    nnoremap <C-e> 3<C-e>
    nnoremap <C-y> 3<C-y>

  6. Enable limited line numbering

    It’s often useful to know where you are in a buffer, but full line numbering is distracting. Setting the option below is a good compromise:

    set ruler

    Now in the bottom right corner of the status line there will be something like: 529, 35 68%, representing line 529, column 35, about 68% of the way to the end.

  7. A bunch of stuff your OS should already do

    If you are running Windows or OS X or a sloppy Linux distribution, you may not be using these:

    " Intuitive backspacing in insert mode
    set backspace=indent,eol,start
     
    " File-type highlighting and configuration.
    " Run :filetype (without args) to see what you may have
    " to turn on yourself, or just set them all to be sure.
    syntax on
    filetype on
    filetype plugin on
    filetype indent on
     
    " Highlight search terms...
    set hlsearch
    set incsearch " ...dynamically as they are typed.

    The filetype lines enable type-specific configuration, such as knowledge of syntax and indentation. E.g. foo.c will be opened with Vim’s pre-configured C settings, and bar.py will be opened with Python settings.

    If the search term highlighting gets annoying, set a key to switch it off temporarily:

    nmap <silent> <leader>n :silent :nohlsearch<CR>

  8. Catch trailing whitespace

    The following will make tabs and trailing spaces visible when requested:

    set listchars=tab:>-,trail:·,eol:$
    nmap <silent> <leader>s :set nolist!<CR>

    By default whitespace will be hidden, but now it can be toggled with ,s.

  9. Stifle many interruptive prompts

    The “Press ENTER or type command to continue” prompt is jarring and usually unnecessary. You can shorten command-line text and other info tokens with, e.g.:

    set shortmess=atI

    See :help shortmess for the breakdown of what this changes. You can also pare things down further if you like.

  10. Stop distracting your co-workers

    Vim is a little surly, beeping at you at every chance. You can either find a way to turn off the bell completely, or more usefully, make the bell visual:

    set visualbell

    Instead of emitting an obnoxious noise, the window will flash very briefly. This is similar to screen’s interpretation of the bell in its default configuration.

Here is my own .vimrc, which includes all these settings (and some more which are less generally useful). A fairly good source for other configuration tips is the Vim Tips Wiki.

Thanks to Adam Katz and Chris Gaal for their comments and suggestions.

Extending the ITERATE macro

Posted in Uncategorized on October 26th, 2008

In a previous article I gave an overview of ITERATE but punted on showing its best feature, the ability to extend it to support new iteration constructs. These can be separated into two groups:

  1. Gatherers, like collecting or summing, for accumulating or reducing an expression;
  2. Drivers, introduced with for or generate, for changing a value in a pattern or looping over a data structure.

The generate driver type didn’t appear in the article comparing LOOP and ITERATE (it’s wholly a feature of ITERATE), so here first is a short introduction.

1. Generators

A generate clause is like a lazy for; its named variable will only update its value when told with NEXT. An example:

;; First, the regular `for`:
(iter (for el in '(a b c))
      (for i from 10)
      (when (primep i)
        (collect (list i el) into primes))
      (collect el into letters)
      (finally (return (values letters
                               primes))))
=> (A B C)
   ((11 B))
 
;; Now, using `generate` for el instead:
(iter (generate el in '(a b c))
      (for i from 10)
      (when (primep i)
        (collect (list i (next el)) into primes))
      (collect el into letters)
      (finally (return (values letters
                               primes))))
=> (NIL A A B B B B C C)
   ((11 A) (13 B) (17 C))
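
The lazy-advance behaviour can be sketched with a Python iterator, where calling next() plays the part of ITERATE's NEXT. This is only an analogy, not how ITERATE works internally. is_prime is a hypothetical stand-in for the PRIMEP helper the example assumes, and None stands in for the initial NIL.

```python
def is_prime(n):
    """Trial division; adequate for the small numbers used here."""
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))


def generate_demo(items, start=10):
    """Mimic (generate el in items) running alongside (for i from start)."""
    source = iter(items)  # el advances only when next() is called
    el = None             # el starts out unset, like the initial NIL
    letters, primes = [], []
    i = start
    while True:
        if is_prime(i):
            try:
                el = next(source)  # the (next el) step
            except StopIteration:
                break              # exhausting the source ends the loop
            primes.append((i, el))
        letters.append(el)
        i += 1
    return letters, primes
```

Calling generate_demo(['a', 'b', 'c']) should return the same two sequences as the ITERATE version. The letters list begins with None, and the pairs are (11, 'a'), (13, 'b'), (17, 'c').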

With these semantics, generate can appear anywhere for can appear:

(generate i upfrom 0)
(generate c in-string "black")
(generate form in-file "sexp.lisp")
(generate sym in-package :ITERATE)

2. Writing a new gathering clause

In the previous article I wrote a quick dividing clause:

(defmacro dividing (num &key (initial-value 0))
  `(reducing ,num by #'/ initial-value ,initial-value))
 
(iter (for i in '(10 5 2))
      (dividing i :initial-value 100))
=> 1

This was easy, but it’s not fully idiomatic; :initial-value must be specified as a keyword, and I can’t collect the value into a variable without modifying the macro. Updating the macro with this functionality makes it noticeably more complex:

(defmacro dividing (num &key (initial-value 0) into)
  `(reducing ,num by #'/
             initial-value ,initial-value
             ,@(when into
                 `(into ,into))))

We can solve both problems cleanly by using ITERATE’s DEFMACRO-CLAUSE instead:

(defmacro-clause (DIVIDING expr
                  &optional
                  INTO var
                  FROM start)
  "Divide into a variable"
  `(reducing ,expr by #'/ into ,var initial-value ,start))
 
;; Now this works:
(iter (for i in '(10 5 2))
      (dividing i from 100 into quotient)
      (multiplying i into product)
      (finally (return (values product quotient))))
=> 100
   1

DEFMACRO-CLAUSE takes care of setting up support for both keyword and regular symbol syntax, provides more informative error messages when the clause is misused, and adds the clause to DISPLAY-ITERATE-CLAUSES:

(display-iterate-clauses 'dividing)
DIVIDING &OPTIONAL INTO FROM  Divide into a variable

Lastly, to continue the LOOP idiom of naming clause actions in both their infinitival and present-participle form, we can use DEFSYNONYM:

(defsynonym divide dividing)

3. Writing a new driver

The easiest way to write new for clauses is with DEFMACRO-DRIVER. ITERATE offers in, on, in-vector, in-string, the more general in-sequence, in-hashtable, and more. Say we want to write a driver to iterate over the leaves of a tree. This is a little tricky since tree traversal is usually written recursively. To start us off, here’s a function which will return all leaves:

(defun collect-leaves (tree)
  "Return all leaf nodes in depth-first order."
  (iter (with stack = (list tree))
        (while stack)
        (for node = (pop stack))
        (if (consp node)
            (destructuring-bind (l . r) node
              (unless (endp r)
                (push r stack))
              (push l stack))
            (collect node))))
 
(collect-leaves '(((a b) (c) d) e (f (g (h)) i) j))
=> (A B C D E F G H I J)

We need only splice this into a macro, with some modifications:

(defmacro-driver (FOR leaf IN-TREE tree)
  "Iterate over the leaves in a tree"
  (let ((gtree (gensym))
        (stack (gensym))
        (kwd (if generate 'generate 'for)))
    `(progn
       (with ,gtree = ,tree)
       (with ,stack = (list ,gtree))
       (,kwd ,leaf next
             (let ((next-leaf
                    (iter (while ,stack)
                          (for node = (pop ,stack))
                          (if (consp node)
                              (destructuring-bind (l . r)
                                  node
                                (unless (endp r)
                                  (push r ,stack))
                                (push l ,stack))
                              (return node)))))
               (or next-leaf (terminate)))))))
  • The (if generate 'generate 'for) bit makes in-tree compatible with both generate and for.
  • The form introduced by the symbol next is the code that will run on each iteration of for or on occurrence of the NEXT operator with generate.
  • Here we’ve slipped an ITERATE call inside the driver, effectively nesting ITERATE loops. This wasn’t necessary; DO or LOOP could also have been used.
  • TERMINATE is a special form in an ITERATE driver to indicate completion.
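
Outside Lisp, the closest everyday analogue of a custom driver is a generator function. The Python sketch below assumes trees are nested lists. It keeps the same explicit stack as COLLECT-LEAVES but hands out one leaf per next() call, which is roughly what the for/generate machinery above provides. One difference: an empty list yields nothing here. In Lisp an empty list is NIL, which COLLECT-LEAVES would collect as a leaf and the driver above would mistake for the end of iteration.

```python
def leaves(tree):
    """Yield the leaves of a nested-list tree in depth-first order, lazily."""
    stack = [tree]
    while stack:
        node = stack.pop()
        if isinstance(node, list):
            # Push children in reverse so the leftmost child is popped first.
            stack.extend(reversed(node))
        else:
            yield node


tree = [[["a", "b"], ["c"], "d"], "e", ["f", ["g", ["h"]], "i"], "j"]
gen = leaves(tree)
first, second = next(gen), next(gen)  # only two leaves have been produced so far
rest = list(gen)                      # drain the remaining leaves
print(first, second, rest)
```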

Finally, bringing it all together:

(iter (for leaf in-tree '(((2 3) (5) 1) 8 (4 (1 (2)) 2) 3))
      (collecting leaf into leaves)
      (multiplying leaf into product)
      (dividing leaf into quotient from 2000)
      (finally (return (values leaves
                               product
                               quotient))))
=> (2 3 5 1 8 4 1 2 2 3)
   11520
   25/144

Appendix: Extending the LOOP macro

SBCL developer Richard Kreuter showed me an implementation-specific way to incorporate new syntax into LOOP. The linked code adds a for clause for binding multiple value returns (functionality that exists in ITERATE, but not the standard LOOP macro). This is meant less as a point of comparison than as proof that LOOP can indeed be extended.

loop-values-path.lisp

Usage:

(loop for dividend in '(1 2 3 4 5)
      for (quotient remainder) being the values of
          (floor dividend 2)
      do (format t "~D ~D~%" quotient remainder))
0 1
1 0
1 1
2 0
2 1

See also this file from CLSQL which provides LOOP extensions for iterating over records and supports seven Common Lisp compilers:

loop-extension.lisp

Thanks to Sam Freilich for his comments and suggestions.

Comparing LOOP and ITERATE

Posted in Uncategorized on October 19th, 2008

Looping is the most common non-trivial construct in imperative programming, so having a domain language for generating iteration code is unquestionably a good thing. I’ll explore two options within Common Lisp:

  1. LOOP is a standard macro with an expressive syntax and built-in support for several iterative patterns. It’s lauded and criticized in equal portion.
  2. ITERATE is a second looping macro created outside of the standard as an answer to the problems and limitations of LOOP.

Superficially, the differences between them are few:

(loop for i from 0
      for el in '(a b c d e f)
      collect (cons i el))
=> ((0 . A) (1 . B) (2 . C) (3 . D) (4 . E) (5 . F)) 
 
(iter (for i from 0)
      (for el in '(a b c d e f))
      (collect (cons i el)))
=> ((0 . A) (1 . B) (2 . C) (3 . D) (4 . E) (5 . F))

The Common Lisp standard also includes the looping macros DO and DO*. They’re widely used but can be terse and obscure; in any case, I’m leaving them out of the comparison.

1. Views on LOOP

The following are the three most frequent criticisms of LOOP:

  1. It doesn’t look like Lisp. The degenerate case is hash table iteration. I need to look this up every time:

    (loop for key being the hash-keys in *some-table*
              using (hash-value val)
          collect (list key val))

    As a general compromise, some developers use keyword syntax for LOOP forms, though it’s not a definite improvement:

    (loop :for counter :downfrom 20 :downto 0 :by 2
          :when (zerop (mod counter 3))
          :collect counter)
    => (18 12 6 0)
  2. Editors auto-indent LOOP poorly. I work for one of the preeminent employers of Lisp developers, yet no one I know has an Emacs configuration which indents all corners of LOOP correctly. (That is, for my definition of correct; there’s no standard.)
  3. LOOP can behave unpredictably when for clauses interact in a certain way. However, in my experience this behaviour only presents itself in examples contrived to show it.

For these reasons and others, Paul Graham doesn’t recommend LOOP, and Peter Seibel remains neutral about it. But it has its place.

2. The purpose of ITERATE

The ITERATE website makes two main claims:

  1. It’s extensible
  2. It helps editors auto-indent by having a more Lisp-like syntax

Both are true. ITERATE is as much a mini-language as LOOP, but its clauses look like, and frequently are, regular Lisp forms.

;; A LOOP example
(loop for i upto 20
      if (oddp i)
        collect i into odds
      else
        collect i into evens
      finally (return (values evens odds))) 
=> (0 2 4 6 8 10 12 14 16 18 20)
   (1 3 5 7 9 11 13 15 17 19)
 
;; The equivalent in ITERATE
(iter (for i from 0 to 20)
      (if (oddp i)
          (collect i into odds)
          (collect i into evens))
      (finally (return (values evens odds))))
=> (0 2 4 6 8 10 12 14 16 18 20)
   (1 3 5 7 9 11 13 15 17 19)

The IF above is not an ITERATE construct; it’s the normal Common Lisp operator. The LOOP if clause also ends up as a normal conditional, but only after a couple of layers of macroexpansion.

Another example, harder to reproduce in LOOP:

(macrolet ((divisorp (n m)
             `(zerop (mod ,n ,m))))
  (iter (for i from 0 to 30)
        (cond ((divisorp i 2)
               (collect i into twos))
              ((divisorp i 3)
               (collect i into threes))
              ((divisorp i 5)
               (collect i into fives)))
        (finally (return (values twos threes fives)))))
=> (0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30)
   (3 9 15 21 27)
   (5 25)
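
The same shape is easy to write in any language whose loop body is ordinary code. Below is a Python sketch of the example above, with divisorp as a plain function instead of a MACROLET. Collecting happens inside a normal conditional, and only the first matching branch fires, as with COND.

```python
def divisorp(n, m):
    """Plain-function stand-in for the MACROLET'd DIVISORP."""
    return n % m == 0


twos, threes, fives = [], [], []
for i in range(0, 31):
    # if/elif mirrors COND: only the first matching branch collects i.
    if divisorp(i, 2):
        twos.append(i)
    elif divisorp(i, 3):
        threes.append(i)
    elif divisorp(i, 5):
        fives.append(i)
```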

It’s true that distinguishing an ITERATE form from a regular form is less straightforward than it is with LOOP, but in practice the distinction is rarely important. A benefit of the intermixing is that ITERATE clauses such as collect may appear arbitrarily deep in the body, in contrast to LOOP where they must fall within the mini-language portion.

Also: for comparison with LOOP’s egregious hash table iteration syntax in (1), here is the cleaner equivalent in ITERATE:

(iter (for (key val) in-hashtable *some-table*)
      (collecting (list key val)))
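
For reference, Python's dict iteration reads much the same way. In this sketch some_table is a hypothetical stand-in for *some-table*. items() hands back key and value together, as the in-hashtable driver does. Python dicts keep insertion order, whereas Common Lisp hash tables make no ordering promise.

```python
# Hypothetical stand-in for *some-table*; any dict works.
some_table = {"apple": 3, "pear": 5}

# items() yields key/value pairs together, much like (for (key val) in-hashtable ...).
pairs = [[key, val] for key, val in some_table.items()]
print(pairs)  # [['apple', 3], ['pear', 5]]
```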

ITERATE’s extensibility is discussed in depth in a second article. LOOP can apparently be extended through implementation-specific means, but it’s not done often or easily. SBCL, at least, provides LOOP extension hooks, but using them ties you to that compiler.

3. Comparing looping clauses

In most cases, ITERATE is a superset of functionality.

*Accumulation*

LOOP offers collecting, nconcing, and appending. ITERATE has these and also adjoining, unioning, nunioning, and accumulating.

(iter (for el in '(a b c a d b))
      (adjoining el))
=> (A B C D)
 
(iter (for lst in '((a b c) (d b a) (g d h)))
      (unioning lst))
=> (A B C D G H)

accumulating is an accumulator builder. Here’s how to implement unioning:

(iter (for lst in '((a b c) (d b a) (g d h)))
      (accumulating lst by #'union))
=> (A B C G D H)
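
Below is a Python sketch of these three clauses, using hypothetical helper names. adjoining keeps each element once. accumulating is a fold with a caller-supplied combining function, and unioning falls out of accumulating with an order-preserving union. Three caveats apply:

- Python's `in` compares with ==, which may not match the equality test ITERATE uses.
- reduce passes the accumulator first, which may not be ITERATE's argument order.
- The union's ordering differs from the (A B C G D H) above, which depends on how the Lisp implementation's UNION orders its output.

```python
from functools import reduce


def adjoining(items):
    """Collect each item once, keeping first-occurrence order."""
    result = []
    for item in items:
        if item not in result:  # uses ==; ITERATE's equality test may differ
            result.append(item)
    return result


def accumulating(values, by, initial=None):
    """Generic accumulator builder: fold every value into the result with `by`."""
    return reduce(by, values, [] if initial is None else initial)


def ordered_union(acc, lst):
    """Union that keeps acc's order and appends only elements not yet seen."""
    return adjoining(acc + lst)


lists = [["a", "b", "c"], ["d", "b", "a"], ["g", "d", "h"]]
print(adjoining(["a", "b", "c", "a", "d", "b"]))  # ['a', 'b', 'c', 'd']
print(accumulating(lists, ordered_union))          # ['a', 'b', 'c', 'd', 'g', 'h']
```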

*Reduction*

LOOP has summing, counting, maximizing, and minimizing. ITERATE also includes multiplying and reducing. reducing is the reduction builder:

(iter (with dividend = 100)
      (for divisor in '(10 5 2))
      (reducing divisor by #'/ initial-value dividend))
=> 1

A simple macro to lessen code noise:

(defmacro dividing (num &key (initial-value 0))
  `(reducing ,num by #'/ initial-value ,initial-value))
 
(iter (for i in '(10 5 2))
      (dividing i :initial-value 100))
=> 1

(Obviously the above is better stated (/ 100 10 5 2), but imagine leveraging this clause in a more complicated loop.)
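
reducing is a left fold, so the dividing helper maps directly onto functools.reduce. In this Python sketch the accumulator starts as a Fraction, so results stay exact rationals as they would with Common Lisp's / on integers. The default of 0 copies the macro above.

```python
import operator
from fractions import Fraction
from functools import reduce


def dividing(values, initial_value=0):
    """Left fold of division, like (reducing v by #'/ initial-value ...)."""
    # Starting from a Fraction keeps results exact, as / on CL integers does.
    return reduce(operator.truediv, values, Fraction(initial_value))


print(dividing([10, 5, 2], initial_value=100))  # 1
```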

The ITERATE package provides a macro, DEFMACRO-CLAUSE, to create new clauses more idiomatically; read more about it in the follow-up article.

For comparison, this article describes one way of writing new reduction constructs for LOOP.

*Boolean aggregation*

These are the same in both: always, never, thereis, corresponding to the functions EVERY, NOTANY, and SOME. ITERATE can specify both always and never in the same loop, but not to much purpose.
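
Here are the same three aggregations as a rough Python sketch. all() and any() cover ALWAYS and NEVER. THEREIS (like SOME) returns the first non-NIL value rather than just T, so next() over a filtered generator is the closer match for it. The sample list is made up.

```python
nums = [2, 4, 6, 7]

every_even = all(n % 2 == 0 for n in nums)    # ALWAYS / EVERY
none_negative = not any(n < 0 for n in nums)  # NEVER / NOTANY
# THEREIS / SOME return the first non-NIL value rather than just T;
# next() with a default plays that role, like (thereis (and (oddp n) n)).
first_odd = next((n for n in nums if n % 2 == 1), None)
```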

*Finding*

There is no analogue for finding in LOOP. The ITERATE website provides a good use case:

(iter (for lst in '((a) (b c d) (e f)))
      (finding lst maximizing (length lst)))
=> (B C D) 
 
;; The rough equivalent in LOOP:
(loop with max-lst = nil
      with max-key = 0
      for lst in '((a) (b c d) (e f))
      for key = (length lst)
      do
      (when (> key max-key)
        (setf max-lst lst
              max-key key))
      finally (return max-lst))
=> (B C D)

finding is a pattern for using the result of one expression based on the result of another.
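
In Python the same pattern is max with a key function, sketched below. Passing default=None makes an empty input return None instead of raising ValueError. How ties are broken may differ from ITERATE: max keeps the first maximal element.

```python
lists = [["a"], ["b", "c", "d"], ["e", "f"]]

# Select the element whose key (its length) is greatest.
longest = max(lists, key=len, default=None)
print(longest)  # ['b', 'c', 'd']
```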

*Control flow*

ITERATE has next-iteration, which is like continue in C or next in Perl. It’s a major inconvenience of LOOP that it doesn’t have this construct. ITERATE also offers the (if-first-time then else) form and the first-iteration-p var for conditioning on the initial iteration in cases which aren’t covered by patterns.

*Destructuring*

In a for clause, ITERATE and LOOP have the same syntax and same destructuring capabilities. However, ITERATE can also “destructure” multiple-value returns:

(for (values (a . b) c d) = (three-valued-function ...))

The consing equivalent in LOOP:

for ((a . b) c d) = (multiple-value-list
                     (three-valued-function ...))

LOOP can do destructuring in a with clause, but ITERATE cannot. The manual cites implementation difficulty.
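
Python has no multiple values. A function returns one tuple, and the caller destructures it. That is the same trade-off as LOOP's multiple-value-list form: the whole result is always materialised. The sketch below uses a hypothetical three_valued_function and approximates the dotted pair (a . b) with a two-element tuple.

```python
def three_valued_function():
    # Hypothetical stand-in: Python returns one tuple where CL would return three values.
    return ("head", "tail"), "second", "third"


# Nested destructuring, similar in shape to (for (values (a . b) c d) = ...).
(a, b), c, d = three_valued_function()
```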

*Parallel binding*

This is what DO does and DO* doesn’t, and it’s strictly unsupported in ITERATE. LOOP includes this optionally with an and clause:

(loop for el in '(a b c d e)
      and prev-el = nil then el
      collect (list el prev-el))
=> ((A NIL) (B A) (C B) (D C) (E D))

The ITERATE documentation states:

“My view is that if you are depending on the serial/parallel distinction, you are doing something obscure”

I have to agree. Also, the useful case of parallel binding in the LOOP example above can be accomplished with another ITERATE concept, variable backtracking:

(iter (for el in '(a b c d e))
      (for prev-el previous el)
      (collect (list el prev-el)))
=> ((A NIL) (B A) (C B) (D C) (E D))
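
Variable backtracking is easy to sketch as a Python generator that remembers the last element it saw. The name with_previous is hypothetical, and None plays the role of the initial NIL.

```python
def with_previous(items, initial=None):
    """Pair each element with the one before it, like (for prev-el previous el)."""
    prev = initial
    for el in items:
        yield el, prev
        prev = el


print(list(with_previous("abcde")))
```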

4. Documentation

There are many resources out there for learning to use LOOP. The LOOP for Black Belts chapter of Practical Common Lisp is my favourite. The Common Lisp Quick Reference is also excellent in its treatment of LOOP.

For ITERATE, there really is only the manual, but it’s thorough. It includes another comparison of LOOP and ITERATE. There’s also a DISPLAY-ITERATE-CLAUSES function which comes with the package and can provide dynamic assistance:

(display-iterate-clauses)
INITIALLY    Lisp forms to execute before loop starts
AFTER-EACH   Lisp forms to execute after each iteration
ELSE         Lisp forms to execute if loop is not entered
FINALLY      Lisp forms to execute after loop ends
;; ...
 
(display-iterate-clauses 'multiply)
MULTIPLY &OPTIONAL INTO   Multiply into a variable

It’d be great to see this integrated into SLIME.

5. Obtaining ITERATE

It’s easiest to do this with ASDF-Install. To get ITERATE working in a repl or scratch buffer:

;; (Example below in SBCL.)
CL-USER> (asdf:oos 'asdf:load-op :ASDF-INSTALL)
;; stuff
 
CL-USER> (asdf-install:install "ITERATE")
;; lots of stuff
;; (only needs to be done once.)
 
CL-USER> (defpackage "MY-PACKAGE" (:use "CL" "ITERATE"))
 
#<PACKAGE "MY-PACKAGE">
CL-USER> (in-package "MY-PACKAGE")
 
#<PACKAGE "MY-PACKAGE">
MY-PACKAGE> (iter ...)  ;; etc.

6. Conclusion

In almost every case, ITERATE is more convenient to use and more powerful than LOOP. If you’re in control of your own project and aren’t religiously partial to DO (or functional programming), ITERATE is worth a try.

Thanks to Richard Kreuter, Alec Berryman, and John O’Laughlin for their comments and suggestions.

Writing a Vim plugin

Posted in Uncategorized on October 11th, 2008

Vim is fairly extensible. Unlike Emacs or Eclipse, it’s just an editor, not a platform. It is, however, very featureful, and includes its own slightly eccentric domain language as well as bindings into a few others. Note: this isn’t a HOWTO, it’s just a few things to consider before jumping in.

1. Vim Script

Here is a neat lineage: the Unix line editor ed evolved into the more advanced ex which was used as the basis for the command mode in vi and extended into the roughly Turing-complete mini-language of Vim. And development continues; in his latest release, Bram Moolenaar has added native support for floating point numbers.

Vim Script supports many regular programming concepts: loops, lists, dictionaries, exceptions, etc. But the language is odd. Here’s some code showing the hoops one must jump through to map a bit of functionality to a key:

function! s:doSomething()
  " stuff
endfunction
 
command DoSomething :call <SID>doSomething()
nmap <leader>k :DoSomething<CR>

  • A trailing ! on function enables redefinition.
  • The s: in the function definition and the <SID> in the command declaration are a thin but credible form of namespace management. They’ll expand to a unique name at read time so that similarly named functions in other files are not clobbered.
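  • function could be replaced equivalently with fu, fun, func, etc. The same holds for other Vim commands: each has a documented minimum abbreviation (shown in :help as e.g. :fu[nction]), and any prefix at least that long is valid. The minimum is not always the shortest unambiguous prefix; :s, for instance, means :substitute.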

As in many other languages, statements can be wrapped using a \ character. Unlike in those languages, in Vim Script it must appear at the beginning of the succeeding line:

if some_exceedingly_long_expression ||
   \ a_second_expression
  echo 'Success'
endif

2. Alternative plugin languages

It isn’t widely known that Vim has interfaces into several popular scripting languages: Python, Ruby, Perl, Scheme, and Tcl. These are more powerful than Vim Script but have certain drawbacks in use.

  1. Debugging is difficult. Foreign code is interpreted by what is essentially one giant eval. If you misplace a close parenthesis or an end keyword, you will have to track it down yourself.
  2. Integration with Vim is slight. The calling interface is in a table below. Most interaction with the editor is tunneled through Vim::evaluate or Vim::command (or equivalent) as a string argument.
  3. Many Vim installations don’t include external language support by default. It’s an easy fix for a user running a deb or rpm-based Linux distribution, but will require a recompile on Windows, something a Windows user is not wont to do.

Point (3) is particularly unfortunate if you are making an extension you intend to distribute. I wrote perhaps the largest Ruby-based plugin for Vim available; the majority of people who contact me about it are only looking for installation help!

This is what it looks like to interact with Vim from Ruby:

# Setting options inside the editor is pretty
# straightforward.
VIM::set_option "noinsertmode"
VIM::set_option "hlsearch"
 
# ...unless you want to set an option local to a buffer.
# There is no API call for this, so we must go up one
# layer of abstraction:
VIM::command "setlocal nowrap"
VIM::command "setlocal spell"
VIM::command "setlocal foldcolumn=0"

If we’re going to do an odd call in multiple places, it makes sense to add some glue:

def VIM::has_syntax?
  # All return values from `evaluate` are strings, and
  # "0" evaluates to true in ruby.
  VIM::evaluate('has("syntax")') != "0"
end
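
Python's interface has the same trap. Per Vim's documentation, vim.eval returns Number results as strings, and bool("0") is True. Below is a minimal sketch of equivalent glue with hypothetical helper names. The evaluator is passed in as a parameter so the logic can be exercised outside Vim; inside Vim one would pass vim.eval.

```python
def vim_truthy(result: str) -> bool:
    """Interpret a Vim Number that was handed back as a string."""
    # bool("0") is True in Python, just as "0" is truthy in Ruby, so compare explicitly.
    return result.strip() != "0"


def has_feature(evaluate, name: str) -> bool:
    """Ask Vim's has() about a feature; `evaluate` would be vim.eval inside Vim."""
    return vim_truthy(evaluate('has("%s")' % name))
```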

Below are partial interfaces for a few languages. Obviously there’s room for improvement in the API:

Editor concept   Ruby              Python               MzScheme
eval             VIM::evaluate     vim.eval             (eval)
command          VIM::command      vim.command          (command)
option           VIM::set_option   -                    (get-option), (set-option)
output           VIM::message      sys.stdout           -
buffer           VIM::Buffer       vim.buffers          (get-next-buff), (get-prev-buff)
window           VIM::Window       vim.windows          (get-win-list)
current buffer   $curbuf           vim.current.buffer   (curr-buff)
current window   $curwin           vim.current.window   (curr-win)
range            -                 vim.current.range    (range-start), (range-end)
documentation    Manual            Manual               Manual

(Tcl and Perl not shown.)

For some strange reason, the Scheme interface also offers (beep), and the Tcl interface, ::vim::beep.

As an interesting sidenote, these extension languages have access to window handles within Vim, allowing deterministic window management. Vim Script doesn’t seem to support this, so using an alternative language may offer a superset of functionality.

3. Choosing a language

This can be summarized like so:

Vim Script

Pros:

  • Great integrated :help system.
  • Lots of other plugins you can crib from.

Cons:

  • The language is awkward.


Other (Perl, Python, Ruby, Tcl, Scheme)

Pros:

  • Strong languages.
  • Experience carries over into other pursuits.

Cons:

  • Debugging is hard.
  • Interface to Vim is slight.
  • May require the user to install language libraries.
  • Syntax highlighting in a .vim file is easily confused.
  • Neglected by both plugin writers and Vim developers.

Using Vim Script means giving up the good stuff: closures, object-orientation, higher-order functions, reflection, and metaprogramming. So despite the extra work, selecting an alternative language is recommended for non-trivial extensions. Perhaps support within Vim will improve as more plugin writers follow this route.

Thanks to Jesse Funaro and Chris Gaal for their comments and suggestions.

Some notes about Clojure

Posted in Uncategorized on October 4th, 2008

Last week, Rich Hickey came to speak at the monthly Boston Lisp Meeting about his new-ish programming language, Clojure. His presentation went double length, but few left early. Rich showed an impressive breadth of knowledge, fielding some tough questions and making a strong case for his language.

I fooled around with Clojure a little and wanted to share some of the things which haven’t gotten much play so far.

1. Implicit destructuring in binding forms

See examples below:

(def flat "flat")
(def tree '(("one" ("two")) "three" ((("four")))))
 
;; Simple binding (like Common Lisp's LET*).
(let [var1 flat
      var2 tree]
  (list var1 var2))
 
-> ("flat" (("one" ("two")) "three" ((("four")))))
 
;; Full destructuring.
(let [var1 flat
      [[a [b]] c [[[d]]]] tree]
  (list var1 a b c d))
 
-> ("flat" "one" "two" "three" "four")
 
;; Partial destructuring.
(let [[[a [b]] & leftover :as all] tree]
  (list a b leftover all))
 
-> ("one" "two" ("three" ((("four"))))
    (("one" ("two")) "three" ((("four")))))
 
;; Works on strings, too.
(let [[a b c & leftover] "123go"]
  (list a b c leftover))
 
-> (\1 \2 \3 (\g \o))
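
Python supports a similar, more limited form of sequence destructuring in assignment, sketched below with the same data. A starred target plays the role of & leftover. There is no direct :as equivalent; the original value is simply still bound to tree. Unlike Clojure, which binds nil when the data runs short, Python raises ValueError on any shape mismatch.

```python
tree = [["one", ["two"]], "three", [[["four"]]]]

# Full destructuring: the target's shape mirrors the data's shape.
[[a, [b]], c, [[[d]]]] = tree

# Partial destructuring with a rest binding (Clojure's & leftover).
[[a2, [b2]], *leftover] = tree

# Works on strings too; each character becomes a one-character string.
x, y, z, *rest = "123go"
```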

2. Unnamed arguments for short lambdas

Pretty straightforward:

(map #(+ % 4) '(1 2 3)) 
 
-> (5 6 7)
 
;; Multiple arguments.
(map #(* %1 %2) '(1 2 3) '(4 5 6))
 
-> (4 10 18)

This is roughly equivalent to Arc’s [+ _ 4] form, though it allows for more than one argument. The standard lambda form is also similar to Arc’s:

(map (fn [x] (+ x 4)) '(1 2 3))
 
-> (5 6 7)

3. Data structures as functions

When a map or vector is called as a function, its argument is used as a value lookup.

(let [myvec [100 200 300]
      mymap {:a 1 :b 2 :c 3}]
  (list (myvec 1)
        (mymap :c)))
 
-> (200 3)

For maps, symbols and keywords can also work in a functional context:

(:a {:a 1 :b 2 :c 3})
 
-> 1
 
('b {'a 1 'b 2 'c 3})
 
-> 2
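
Python collections are not callable, but their bound lookup methods can be passed around the same way. Below is a small sketch using the same data. dict.get returns None for a missing key, much as Clojure's map lookup returns nil, while list indexing raises IndexError when out of range.

```python
myvec = [100, 200, 300]
mymap = {"a": 1, "b": 2, "c": 3}

# Bound lookup methods are callables, standing in for "collection as function".
vec_lookup = myvec.__getitem__  # index -> element (IndexError if out of range)
map_lookup = mymap.get          # key -> value (None if absent)

print(vec_lookup(1), map_lookup("c"))         # 200 3
print(list(map(mymap.get, ["a", "c", "z"])))  # [1, 3, None]
```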

4. Implicit gensyms

Perhaps a minor point, but handy for cleaner macros.

(defmacro inc-safe [var]
  `(let [var# ~var]
     (if (integer? var#)
       (inc var#)
       0)))

In a backquoted form, ~var means to unquote (evaluate) var (like ,var in other Lisps) and var# creates a gensym whose name persists within the form. Compare to the equivalent in Common Lisp:

(defmacro 1+-safe (var)
  (let ((gvar (gensym)))
    `(let ((,gvar ,var))
       (if (integerp ,gvar)
           (1+ ,gvar)
           0))))

5. “[...]” syntax when code is not in an executing context

Look at the argument lists in any of the excerpts above to see what I mean. I’m kind of on the fence on this one: on one hand it is semantically clearer, on the other it overloads the vector syntax. Also, I think round parentheses are just nicer looking.

6. SLIME integration

SLIME is the Superior Lisp Interaction Mode for Emacs, and it’s something you can’t live without once you’ve tried it. Clone this git repository and follow the instructions in the README.

$ git clone git://github.com/jochu/swank-clojure.git

There are, however, a couple design choices which are less appealing, perhaps especially to Common Lisp programmers.

1. Only single-value function returns

I believe this was briefly explained as being limited by the rules of stack allocation in the JVM. The restriction can be mitigated somewhat by returning a list of values which are then destructured:

(defn foo []
  (list 'aaa 'bbb))
 
(let [[a b] (foo)]
  (list b a))
 
-> (bbb aaa)

Common Lisp emulation in Emacs Lisp follows this path. It is a leaky abstraction, though, because foo in the example will always return a list. Even if we only care about the first value, we must acknowledge in every call that it returns others. Compare with Common Lisp’s GETHASH: it has two return values, yet in most contexts it’s treated as having only one.

2. Iteration construct that looks more like recursion

Clojure is a functional language with immutable types. One might expect it would then tend toward recursive programming. Unfortunately, there’s no tail-call optimization in Clojure currently, as it isn’t supported by the JVM; since it’s certainly a sought feature, likely this will change soon. In the meantime, where efficiency matters one can use loop ... recur, which looks a lot like intra-function recursion.

(defn loop-test []
  (loop [collection (list)
         i 0]
    (if (< i 5)
      (recur (conj collection i) (inc i))
      collection)))
 
(loop-test)
 
-> (4 3 2 1 0)

For simple iterative patterns, there are also for and doseq.
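
CPython has no tail-call optimization either. Its closest analogue of loop ... recur is a while loop that rebinds all loop variables in one tuple assignment. The sketch below mirrors loop-test above, with prepending to a tuple standing in for conj on a list.

```python
def loop_test():
    # Both loop variables are rebound together on each pass, like (recur ...).
    collection, i = (), 0
    while i < 5:
        # Prepending mirrors conj on a list, which adds at the front.
        collection, i = (i,) + collection, i + 1
    return list(collection)


print(loop_test())  # [4, 3, 2, 1, 0]
```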

All in all, Clojure is very interesting. I should note I’ve only mentioned superficial stuff; the larger part of the second half of the presentation was Rich expounding on the ease of concurrency and the value of software transactional memory in the language, but these are larger topics (and over my head).

Thanks to Alec Berryman and George Polak for their comments and suggestions.