In an effort to avoid using confusing language in earlier postings, I produced a description of intern out of sequence. You can read it here.
Tag Archives: lisp
The less-familiar parts of Lisp for beginners — initialize-instance
Next in our list of functions that the newcomer to Lisp might not have encountered is initialize-instance. Now, you might immediately think of this as a function to be used by the implementation, of little interest to the programmer, but this is actually a very useful method when you’re programming with objects.
So, initialize-instance is used by the implementation as it constructs instances of objects, and is called by make-instance. What makes it interesting to the Lisp programmer? It’s that you can define a specialized :after method for initialize-instance. That is, after the primary method has done those things that you asked of it, such as filling in :initarg values and evaluating :initform forms, your :after method will be called to do additional work. Where the primary initialize-instance method looks like a C++ initialization list, the :after method looks like the body of the constructor. It is here that the programmer can perform initialization operations based on the values of slots in the class.
Here is an example, part of some code that I wrote for working with probability distributions:
(defclass uniform-distribution (distribution)
  ((low-bound :reader get-low :initarg :low :initform 0.0)
   (high-bound :reader get-high :initarg :high :initform 1.0)
   (width :reader get-width)
   (height :reader get-height)))

(defmethod initialize-instance :after ((ud uniform-distribution) &key)
  (unless (< (get-low ud) (get-high ud))
    (error "Bad values of low/high for uniform distribution."))
  (setf (slot-value ud 'width) (float (- (get-high ud) (get-low ud))))
  (setf (slot-value ud 'height) (/ 1.0 (- (get-high ud) (get-low ud))))
  (set-limits ud (get-low ud) (get-high ud) (get-low ud) (get-high ud)))
This code allows the user to build a uniform-distribution object with make-instance, optionally overriding the lower and upper bounds of the distribution (its domain). Once the low-bound and high-bound slots are filled, the initialize-instance :after method is called. It sanity-checks the supplied values and then uses them to fill in the other two slots. Finally, it calls a method defined for the base class (not shown) to fill in some data that the base class needs.
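To try the same pattern without the base class, here is a minimal sketch. The interval class and its reader names are hypothetical, invented for this illustration, and are not part of my distribution code.

```lisp
;; Hypothetical stand-alone class, used only to illustrate the pattern.
(defclass interval ()
  ((low   :reader low  :initarg :low  :initform 0.0)
   (high  :reader high :initarg :high :initform 1.0)
   (width :reader width)))             ; derived slot, no :initarg

;; Runs after the primary method has filled LOW and HIGH.
(defmethod initialize-instance :after ((iv interval) &key)
  (unless (< (low iv) (high iv))
    (error "Bad values of low/high for interval."))
  (setf (slot-value iv 'width) (- (high iv) (low iv))))

(width (make-instance 'interval :low 2 :high 5))   ; => 3

;; Bad bounds are rejected before the caller ever sees the object.
(handler-case (make-instance 'interval :low 5 :high 2)
  (error () :rejected))                            ; => :REJECTED
```

The derived slot has no :initarg, so the only way it gets a value is through the :after method, which keeps it consistent with the bounds.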
The less-familiar parts of Lisp for beginners — handler-bind and handler-case
Our next stop in reviewing the less commonly used parts of Lisp for beginners is handler-bind and handler-case. I have talked about these macros before, with code examples, in this series of posts.
The less-familiar parts of Lisp for beginners — get-dispatch-macro-character
Another feature of Lisp that newcomers might not have encountered in a brief introduction to the language is the manipulation of dispatch macro characters with get-dispatch-macro-character and set-dispatch-macro-character. We’ve just spent a bit of time talking about readtables, particularly the default readtable in Common Lisp, in quite general terms in the article on gensym. Now, let’s talk about them a bit more carefully.
As mentioned in the earlier article, readtables control read-macros which allow the Lisp reader to alter the forms it receives before they are executed or compiled. Important, also, is that those read-macros are not applied to the result of macro expansion. So, what is in the readtable?
From a programming perspective, a readtable can be considered to be a table of functions to invoke when the reader meets a specific character at the start of a token. These macro characters fall into two categories: dispatching macro characters and non-dispatching macro characters. A dispatching macro character opens onto a second table of characters, defining two-character prefix codes. A non-dispatching macro character has no further sub-character control. You can see an example of the latter case in the article discussing eval-when.
The Common Lisp standard defines a default readtable in which the # character is a dispatching macro character. If the reader encounters that character at the beginning of a token, it parses out any unsigned decimal number following it, and takes the first non-digit character after that as the sub-character. The number in between, if any, becomes an argument to the two-character macro. The readtable is then used to look up the function for that two-character combination, and the reader calls it with the input stream, the sub-character, and the numeric argument. If no numeric argument is supplied, the value passed to the function for that parameter is nil, not zero.
You’ve encountered the # dispatching macro character many times, but if it isn’t pointed out you might not realize that these uses are all just manifestations of the readtable, and act to modify input before it enters the Lisp instance proper. We’ve mentioned #'SYMBOL, which gets translated into (function SYMBOL). Other combinations you’ve probably used include #\ for literal characters, #X for hexadecimal constants, #O for octal constants, and #+ and #- for doing conditional reads.
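You can watch these translations happen by handing text to the reader yourself. A small sketch, assuming the standard readtable is in effect; the values wrapper just discards the second return value of read-from-string.

```lisp
;; Each string goes through the reader; the result is the object
;; that the Lisp instance actually receives.
(values (read-from-string "#'car"))    ; => (FUNCTION CAR)
(values (read-from-string "#xFF"))     ; => 255
(values (read-from-string "#o17"))     ; => 15
(values (read-from-string "#\\a"))     ; => #\a

;; #R takes a numeric argument between the # and the sub-character:
;; here the 2 selects the radix.
(values (read-from-string "#2r101"))   ; => 5
```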
So, with that introduction, what about get-dispatch-macro-character? This function returns the read macro function that corresponds to the two-character combination, if any. If there is no such sub-character option to the supplied dispatching macro character, it returns nil. If the first argument is not a dispatching macro character, it generates an error.
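As a quick sketch of both behaviours, the hypothetical helper below (my name, not a standard function) looks a combination up in the standard readtable and turns the error into a keyword so it is easy to see at the REPL.

```lisp
;; Hypothetical helper: look up DISP-CHAR + SUB-CHAR in the standard
;; readtable (a readtable argument of NIL designates it).
(defun dispatch-entry (disp-char sub-char)
  (handler-case (get-dispatch-macro-character disp-char sub-char nil)
    (error () :not-a-dispatching-character)))

;; #' is defined, so we get something non-NIL back.
(not (null (dispatch-entry #\# #\')))   ; => T

;; The letter a is an ordinary constituent, not a dispatching
;; macro character, so the lookup signals an error.
(dispatch-entry #\a #\b)                ; => :NOT-A-DISPATCHING-CHARACTER
```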
Then we have set-dispatch-macro-character. Not surprisingly, this sets the function to be invoked when that particular two-character combination is encountered by the reader.
One final note. There is no standard way to unset a dispatch macro character. Typically, one uses copy-readtable to store a copy of the current readtable, makes the appropriate modifications for the purpose, and then restores the readtable from the copy afterwards.
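Here is a minimal sketch of that workflow. Rather than saving and restoring by hand, it dynamically binds *readtable* to a copy, so the modification disappears when the let exits. The #? combination and the function name are my own choices for illustration.

```lisp
;; Hypothetical example: install a #? macro in a private copy of the
;; standard readtable, read one form with it, and leave the global
;; readtable alone.
(defun read-with-question-macro (text)
  (let ((*readtable* (copy-readtable nil)))   ; NIL = standard readtable
    (set-dispatch-macro-character
     #\# #\?
     ;; The reader passes the stream, the sub-character, and the
     ;; numeric argument (NIL when none was typed).
     (lambda (stream sub-char arg)
       (declare (ignore sub-char))
       (list :arg arg :form (read stream t nil t))))
    (values (read-from-string text))))

(read-with-question-macro "#3?foo")   ; => (:ARG 3 :FORM FOO)
(read-with-question-macro "#?foo")    ; => (:ARG NIL :FORM FOO)
```

Because the change lives only in the copy, nothing needs to be unset afterwards.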
The less-familiar parts of Lisp for beginners — gensym
Our attention now lands on gensym. The novice Lisp programmer coming from C++ has undoubtedly seen examples using gensym, but I know that its use tends to be cargo-culted at first: it’s this magic sauce whose behaviour is not really well understood. I hope to be able to explain a little more about gensym, so that the novice Lisp programmer uses it for the appropriate reasons, and understands what’s happening. If you haven’t seen gensym in action, you probably want to review my series of posts in which I developed some useful (to me) macros.
So, the newcomer to Lisp has read about gensym, that it produces a “new uninterned symbol” guaranteed to be unique. This explanation is perfectly correct, but it only helps if you know what “uninterned” means and why it matters when writing macros. A naive reading says, “oh, this returns a symbol with an unusual name, so collisions with other symbols in my code are unlikely”. This is not the case: the protection does not depend on the name at all. Almost every symbol a programmer writes is interned. When the reader sees a name in your source, whether it’s a variable bound by let, a parameter, or a function name, it looks that name up in the current package and hands back the one symbol registered there, creating it first if necessary. That is why two occurrences of the same name in your code refer to the same variable. An uninterned symbol is registered in no package, so no name lookup can ever find it. So, what is really going on with gensym, and how do such symbols get printed and read? For that, we’re going to discuss readtables a bit.
The readtable in Lisp is part of the mechanism that turns the text stream coming from a file during load/compile into Lisp objects. Forms that come from a file or from the REPL (the interactive reader of the Lisp instance) pass through the Lisp reader, and certain text constructs might be adjusted before the Lisp core itself sees them, using something called “read-macros”. The readtable controls the invocation and behaviour of these read-macros. An important feature of read-macros is that they are not applied to the result of macro expansion. Macros work on the objects the reader has already produced, not on text, so read-macros do not get a shot at modifying what macros generate.
OK, so what does this have to do with gensym? Well, let’s see what gensym does:
CL-USER> (gensym)
#:G789
This is the output from SBCL, but you will see the same pattern with CLISP and ECL. The symbol is printed with the two-character prefix #:. This isn’t merely an aesthetic choice. The symbol’s name is just G789; the prefix is how the printer marks a symbol that has no home package, and it makes the printed form difficult to get past the reader as the same symbol. #: is a dispatching read-macro in the standard readtable, and when the reader sees those two characters, it behaves differently from an ordinary name lookup. It returns a fresh uninterned symbol that is not eq to any other symbol, whether interned or not.
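A short sketch makes this concrete. The helper name is hypothetical; it prints a symbol, reads the printed text back in, and checks whether the reader hands back the same object.

```lisp
;; Hypothetical helper: does printing SYMBOL and reading the text
;; back give the very same symbol object?
(defun reads-back-as-same-symbol-p (symbol)
  (eq symbol (values (read-from-string (prin1-to-string symbol)))))

(reads-back-as-same-symbol-p 'car)        ; => T
(reads-back-as-same-symbol-p (gensym))    ; => NIL

;; A gensym has no home package.
(symbol-package (gensym))                 ; => NIL

;; Reading the same #: text twice gives two different symbols,
;; while reading an ordinary name twice gives the same one.
(eq (read-from-string "#:a") (read-from-string "#:a"))   ; => NIL
(eq (read-from-string "a") (read-from-string "a"))       ; => T
```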
Let’s see how that plays out. Here’s a “normal” symbol, without the magic prefix sequence:
CL-USER> (let ((a 10)) a)
10
That’s pretty familiar. We use a let form to bind a local variable named by the symbol ‘a’. The reader interned that symbol in the current package, so both occurrences of ‘a’ in the form are the same symbol, and when the last form in the let is ‘a’, the Lisp instance returns the value of that local variable, 10.
Now, we’ll do exactly the same thing with a variable name that has the special prefix:
CL-USER> (let ((#:a 10)) #:a)
; in: LET ((#:A 10))
;     (LET ((#:A 10))
;       #:A)
;
; caught STYLE-WARNING:
;   The variable #:A is defined but never used.
; in: LET ((#:A 10))
;     (LET ((#:A 10))
;       #:A)
;
; caught WARNING:
;   undefined variable: #:A
;
; compilation unit finished
;   Undefined variable:
;     #:A
;   caught 1 WARNING condition
;   caught 1 STYLE-WARNING condition
; Evaluation aborted on #<UNBOUND-VARIABLE A {10039012F3}>.
And… what just happened? This construct is almost exactly the same, but the results are different. We are told that the variable #:A is defined but not used, and that the variable #:A is undefined. The reader has produced distinct symbols for the two occurrences of #:A, so as far as the Lisp instance is concerned, the first #:A and the second #:A are two different variables. That explains why the first one is defined but not used, and the second one is undefined, and this is what’s behind the magic of gensym. The gensym function produces the same kind of symbol: one that no text arriving at the Lisp reader can ever be resolved to, so it can never collide with the variable names that users write in the code passed to your macros.
Now, let’s revisit a bit what I mentioned about macro expansion. Of course, a variable name that never matches itself isn’t very useful from a programming perspective, so how are these variables used in macros? A macro does not produce text. It calls gensym once, holds on to the resulting symbol object, and places that same object everywhere the variable is needed in the expansion. Because read-macros are not applied to the output of macro expansion, the expansion never goes back through the reader, no fresh symbols get made, and so the symbol representing the variable is eq to itself in every position. It behaves just like any other variable name.
Finally, let’s look at some simple Lisp code:
CL-USER> (let ((my-sym '#:a)) `(let ((,my-sym 10)) ,my-sym))
(LET ((#:A 10)) #:A)
This looks a bit like what you would see in a macro definition. Note that I have deliberately used the #: prefix sequence. The Lisp backquote is a bit simpler than it appears at first. It’s a lot like the single-quote used to generate quoted lists, but has the additional property of allowing the comma to inject values from the surrounding scope inside the literal list. So, in the code fragment above, I’ve bound my-sym to a single uninterned symbol in a let (the reader sees '#:a only once), then said, “return this literal list, but where you see a comma, substitute the value of the variable from the surrounding scope”. The returned value is not “code”, it’s a list, and the same symbol object appears in it twice. During macro expansion, this list is inserted into the code as if it were typed in, but without read-macros being in effect. So, even though we saw above that this let form doesn’t work when typed at the REPL, it does work if you can get it into the Lisp instance without passing it through read-macros:
CL-USER> (eval (let ((my-sym '#:a)) `(let ((,my-sym 10)) ,my-sym)))
10
So, anyway, that’s the point I’m trying to make about gensym. Its special properties in the writing of macros derive not from the fact that it always returns a different symbol (though that’s critically important when you need more than one new symbol in a macro expansion), but from the fact that these symbols won’t match any symbol that the user enters in code, even if they type a name that looks identical to that returned by gensym.
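To close, here is a minimal sketch of gensym doing its job in a real macro. The repeat macro is a hypothetical example written for this post, not something from the earlier macro series.

```lisp
;; Hypothetical macro: run BODY N times. The loop counter is named by
;; a gensym, so it cannot clash with any variable used in BODY.
(defmacro repeat (n &body body)
  (let ((counter (gensym "COUNTER")))
    `(dotimes (,counter ,n)
       ,@body)))

;; The user's own variable happens to be called COUNTER. The body
;; still increments the user's variable, not the loop variable.
(let ((counter 100))
  (repeat 3 (incf counter))
  counter)                      ; => 103
```

Had the macro used the plain symbol counter for its loop variable, the body’s incf would have hit the loop variable instead of the user’s.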