The less-familiar parts of Lisp for beginners — gensym

Our attention now lands on gensym.  The novice Lisp programmer coming from C++ has undoubtedly seen examples using gensym, but I know that its use tends to be cargo-culted at first, it’s this magic sauce whose behaviour is not really well understood at first.  I hope to be able to explain a little more about gensym, so that the novice Lisp programmer uses it for the appropriate reasons, and understands what’s happening.  If you haven’t seen gensym in action, you probably want to review my series of posts in which I developed some useful (to me) macros.

So, the newcomer to Lisp has read about gensym, that it produces a “new uninterned symbol” guaranteed to be unique.  This explanation is perfectly correct, but a newcomer might be fooled into thinking that “uninterned symbol” is why gensym has uniquely useful behaviour when writing macros.  A naive reading says, “oh, this returns a symbol that is not interned, so it’s not currently in use, so there are no possible collisions with other symbols in my code”.  This is not the case.  After all, most variables a programmer builds are not interned, they are local symbols created by let and its relatives, or passed as parameters.  How does gensym know it’s not going to conflict with one of those other uninterned symbols?  And what if a later form loading in the same file interns a symbol that wasn’t interned before, leading to a collision?  So, what is really going on with gensym?  For that, we’re going to discuss readtables a bit.

The readtable in Lisp is a mechanism for examining, and possibly altering, the text stream coming from a file during load/compile.  Forms that come from a file or from the REPL (the interactive reader of the Lisp instance) pass through the Lisp reader, and certain text constructs might be adjusted before the Lisp core itself sees them, using something called “read-macros”.  The readtable controls the invocation and behaviour of these read-macros.  An important feature of read-macros is that they are not applied to the result of macro expansion, so they do not get a shot at modifying text generated by macros.

OK, so what does this have to do with gensym?  Well, let’s see what gensym does:
 

CL-USER> (gensym)
#:G789

This is the output from SBCL, but you will see the same pattern with CLISP and ecl.  The symbol produced starts with the two-character sequence #:.  This isn’t merely an aesthetic choice, this is a symbol name that is difficult to get past the reader.  There is a read-macro that looks for symbols starting with those two characters, and when it sees them, it behaves differently.  It outputs a fresh symbol that is guaranteed not to be eq to any other symbol, whether interned or not.

Let’s see how that plays out.  Here’s a “normal” symbol, without the magic prefix sequence:
 

CL-USER> (let ((a 10)) a)
10

That’s pretty familiar.  We use a let form to make a new symbol ‘a’.  It’s not interned, but it is visible in the scope of the let, so when the last statement in the form is ‘a’, the Lisp instance returns the value of that local variable, 10.

Now, we’ll do exactly the same thing with a variable name that has the special prefix:
 

CL-USER> (let ((#:a 10)) #:a)
; in: LET ((#:A 10))
;     (LET ((#:A 10))
;       #:A)
; 
; caught STYLE-WARNING:
;   The variable #:A is defined but never used.

; in: LET ((#:A 10))
;     (LET ((#:A 10))
;       #:A)
; 
; caught WARNING:
;   undefined variable: #:A
; 
; compilation unit finished
;   Undefined variable:
;     #:A
;   caught 1 WARNING condition
;   caught 1 STYLE-WARNING condition
; Evaluation aborted on #<UNBOUND-VARIABLE A {10039012F3}>.

And… what just happened?  This construct is almost exactly the same, but the results are different.  We are told that the variable #:A is defined but not used, and that the variable #:A is undefined.  The reader has produced distinct symbols for the two occurrences of #:A, so as far as the Lisp instance is concerned, the first #:A and the second #:A are two different variables.  That explains why the first one is defined but not used, and the second one is undefined, and this is what’s behind the magic of gensym.  The gensym function produces a symbol that can’t be matched in any text that arrives at the Lisp reader while passing through the read-macros, so can never collide with your variable names in macro bodies.

Now, let’s revisit a bit what I mentioned about macro expansion.  Of course, a variable name that never matches itself isn’t very useful from a programming perspective, so how are these variables used in macros?  Because read-macros are not applied to the output of macro expansion, the special prefix characters lose their specialness in the reader, and so the symbol representing the variable is eq to itself, and behaves just like any other variable name, with or without the prefix sequence.

Finally, let’s look at some simple Lisp code:
 

CL-USER> (let ((my-sym '#:a))
           `(let ((,my-sym 10))
              ,my-sym))
(LET ((#:A 10))
  #:A)

This looks a bit like what you would see in a macro definition.  Note that I have deliberately used the #: prefix sequence.  The Lisp backquote is a bit simpler than it appears at first.  It’s a lot like the single-quote used to generate quoted lists, but has the additional property of allowing the comma to inject values from the surrounding scope inside the literal list.  So, in the code fragment above, I’ve defined a symbol in a let, then said, “return this literal list from the function, but where you see a comma, substitute the value of the variable from the surrounding scope.  The returned value is not “code”, it’s a list.  During macro expansion, this list is inserted into the code as if it were typed in, but without read-macros being in effect.  So, even though we saw above that this let form doesn’t work when typed at the line, it does work if you can get it into the Lisp instance without passing it through read-macros:
 

CL-USER> (eval
          (let ((my-sym '#:a))
            `(let ((,my-sym 10)) 
               ,my-sym)))
10

So, anyway, that’s the point I’m trying to make about gensym.  Its special properties in the writing of macros derives not from the fact that it always returns a different symbol (though that’s critically important when you need more than one new symbol in a macro expansion), but from the fact that these symbols won’t match any symbol that the user enters in code, even if they type a name that looks identical to that returned by gensym.

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>

*

反垃圾邮件 / Anti-spam question * Time limit is exhausted. Please reload CAPTCHA.