Tag Archives: obscure commands

The less-familiar parts of Lisp for beginners — packages and symbols

I’m now going to depart a bit from my alphabetical walk through Lisp features, and from my normal publishing schedule, to talk about packages.  This is specifically to try to better describe some of the ways that Lisp works following criticism of my terminology in articles about fboundp and fdefinition.

In trying to describe some Lisp concepts in terms familiar to C++ programmers, I tried to draw a parallel between interned symbols in Lisp and the symbol tables used, for example in ELF binary format objects, by the linker stage in C++ compilation.  While there is a logical parallel, it isn’t a very good correspondence, and if you attach too much significance to the logical connections it can lead you to a misleading understanding of Lisp.  I apologize for that, so here we will go into some more fine detail.

Now, you might be tempted to argue away the distinction, saying that the logical entities described are similar, but there are some very important differences, not just in the implementation, but in the language behaviour.  Programmers are wisely enjoined not to program to an implementation, but rather to program to the virtual model defined by the standard or API.  A language standard typically defines “as if” rules that describe how a conforming implementation must behave as seen from within the program.  This allows the standard to define a common virtual platform, and it is the responsibility of the particular hardware and software implementation to present that virtual platform to the programmer.  When the programmer writes code against the standard-defined virtual platform, rather than against a particular implementation, he or she is much more likely to produce portable, readable code that remains correct as compilers are upgraded or as underlying hardware is replaced.  The differences between Lisp interned symbols and C++ linker symbol tables are not implementation differences; they are logical differences that must be understood by the programmer.

So, I’m inserting this article, out of sequence, so that I can stop referring to “function symbol tables” in Lisp.  That’s not what they are.  And to understand what they really are, we need to describe packages, which really means understanding the intern function.

I’ve been talking about packages throughout this series, describing them a bit like C++ namespaces but with the added feature of inheritance.  Now, though, it’s time to stop talking about parallels and similarities, and explain what a Lisp package really is.

So, a Lisp package is a namespace.  It’s a container for interned symbols (I’ll get to what we mean by that shortly) that allows the programmer to avoid symbol collisions.  Package inheritance is used to allow the reader to locate symbols from other packages when the symbol is not present in the package in which the reader is running (the “current package”).  EDIT: Please see additional text below, added 2014-02-25.

The Lisp language features themselves are present in a package, the COMMON-LISP package, also available under the nickname CL.  If you define a package which does not directly inherit from COMMON-LISP, you will find that the Lisp language itself appears to be unavailable within your new package!  Invoking defun will produce an error declaring that defun is an unknown function.  I say “appears to be unavailable” because the features are still there and reachable, but if you haven’t inherited from COMMON-LISP you will have to use an explicit CL: prefix just to invoke what you think of as the normal features of the Lisp language.  That CL: prefix is an example of a package prefix.  I’ll be using that term a bit in the text that follows.
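To make that concrete, here is a minimal sketch of a package that inherits nothing.  The package name BARE-PKG is hypothetical, invented just for this illustration.  It uses find-symbol to ask how a name is accessible from a package, and then reads and evaluates a form with BARE-PKG as the current package.

```lisp
;; BARE-PKG is a hypothetical package used only for this sketch.
(defpackage :bare-pkg
  (:use))   ; explicitly empty :use list, so nothing is inherited

;; The bare name DEFUN is not accessible from BARE-PKG...
(format t "in BARE-PKG:    ~S~%"
        (nth-value 1 (find-symbol "DEFUN" :bare-pkg)))     ; NIL

;; ...although the symbol is still there, external in COMMON-LISP.
(format t "in COMMON-LISP: ~S~%"
        (nth-value 1 (find-symbol "DEFUN" :common-lisp)))  ; :EXTERNAL

;; With BARE-PKG current, a CL: prefix still reaches the language.
(let ((*package* (find-package :bare-pkg)))
  (format t "(cl:+ 1 2) => ~S~%"
          (eval (read-from-string "(cl:+ 1 2)"))))          ; 3
```

The explicit (:use) matters here, because the default :use list when the option is omitted is left up to the implementation.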

So, how does a package work?  Well, you create it, and you typically use the :use argument to inherit from COMMON-LISP, and zero or more other packages.  You can then intern symbols in this new package.  When the reader is asked to interpret a symbol without a package prefix, it looks first in the internal and external symbols of the current package.  If there is no matching interned symbol in the current package, it then looks through the external symbols of the packages from which it directly inherits (it does not recurse through their :use lists).  The order in which these packages are searched is not defined, so conflicts must always be explicitly resolved.  If the package would inherit two distinct symbols with the same name from two other packages, the result is a correctable error.  The programmer is responsible for resolving the conflict, either by interning a symbol in the current package and so shadowing both conflicting symbols, or by adjusting the inheritance in order to specify which symbol is to be used.

You know, this is starting to get dry and abstract, and I’m trying to avoid that trap in these articles.  Let’s see if I can bring things back towards the concrete a bit.  An example of a symbol might be 'my-worker-function.  A symbol with a package prefix could be 'my-package:my-worker-function.  When the reader encounters 'my-worker-function, it looks for that symbol in the current package.  If that symbol is present, either as an internal or external symbol, it is used; there is no ambiguity.  If that symbol is not present in the current package, then the query moves on to the packages from which the current package inherits, if any.  If no match is found, the reader interns a new symbol with that name in the current package, and if you then try to call it, an error is signaled because the function is undefined.  What if more than one package exports that symbol?  Well, that conflict is generally not detected at this point.  Symbol conflicts are checked whenever a change occurs that makes them possible.  That includes when evaluating a defpackage form, or when the running code invokes the unintern function on a symbol that was shadowing a conflict between two or more other packages.  When the reader is trying to look up a symbol, it knows that conflict resolution has already been performed, so the possible outcomes are limited to a failure to find any such symbol, or a successful and unambiguous resolution of the symbol.
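The lookup order described above can be observed with find-symbol, whose second return value says how a name is accessible from a package.  This is only a sketch: the packages DEMO-LIB and DEMO-APP, the symbol names, and the helper lookup-status are all made up for the illustration.

```lisp
;; Hypothetical packages: DEMO-LIB exports HELPER, DEMO-APP inherits it.
(defpackage :demo-lib (:use :cl) (:export #:helper))
(defpackage :demo-app (:use :cl :demo-lib))

;; Give DEMO-APP a symbol of its own.
(intern "LOCAL-THING" :demo-app)

(defun lookup-status (name package)
  "Return the second value of FIND-SYMBOL: how NAME is accessible in PACKAGE."
  (nth-value 1 (find-symbol name package)))

(format t "~S~%" (lookup-status "LOCAL-THING" :demo-app))   ; :INTERNAL
(format t "~S~%" (lookup-status "HELPER" :demo-app))        ; :INHERITED
(format t "~S~%" (lookup-status "HELPER" :demo-lib))        ; :EXTERNAL
(format t "~S~%" (lookup-status "CAR" :demo-app))           ; :INHERITED
(format t "~S~%" (lookup-status "NO-SUCH-NAME" :demo-app))  ; NIL
```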

Let’s see what this looks like with a transcript from a Lisp session:
 

CL-USER> (defpackage :pkg-a 
           (:use :CL) 
           (:export :FCN-A))
#<PACKAGE "PKG-A">
CL-USER> (in-package :pkg-a)
#<PACKAGE "PKG-A">
PKG-A> (defun fcn-a ()
         (format t "FCN-A in PKG-A~%"))
FCN-A
PKG-A> (fcn-a)
FCN-A in PKG-A
NIL
PKG-A> (in-package :CL-USER)
#<PACKAGE "COMMON-LISP-USER">
CL-USER> (defpackage :pkg-b
           (:use :CL) 
           (:export :FCN-A))
#<PACKAGE "PKG-B">
CL-USER> (in-package :pkg-b)
#<PACKAGE "PKG-B">
PKG-B> (defun fcn-a ()
         (format t "FCN-A in PKG-B~%"))
FCN-A
PKG-B> (fcn-a)
FCN-A in PKG-B
NIL
PKG-B> (in-package :CL-USER)
#<PACKAGE "COMMON-LISP-USER">
CL-USER> (defpackage :pkg-c 
           (:use :pkg-a :pkg-b)
           (:export :fcn-b))
; Evaluation aborted on #<NAME-CONFLICT {10036CD023}>.
CL-USER> 

I created two packages, :PKG-A and :PKG-B, both inheriting from the COMMON-LISP package.  Both of these packages exported the same function, fcn-a.  When I tried to create a new package, :PKG-C, inheriting from both :PKG-A and :PKG-B, I got an error due to the name conflict, as each of them was exporting its own version of fcn-a.

So, now we’ll resolve the issue:
 

CL-USER> (defpackage :pkg-c
           (:use :pkg-a :pkg-b)
           (:shadowing-import-from :pkg-b :fcn-a)
           (:export :fcn-b))
#<PACKAGE "PKG-C">
CL-USER> (in-package :pkg-c)
#<COMMON-LISP:PACKAGE "PKG-C">
PKG-C> (fcn-a)
FCN-A in PKG-B
COMMON-LISP:NIL
PKG-C> (in-package :CL-USER)
; in: IN-PACKAGE :CL-USER
;     (PKG-C::IN-PACKAGE :CL-USER)
; 
; caught COMMON-LISP:STYLE-WARNING:
;   undefined function: IN-PACKAGE
; 
; compilation unit finished
;   Undefined function:
;     IN-PACKAGE
;   caught 1 STYLE-WARNING condition
; Evaluation aborted on #<UNDEFINED-FUNCTION IN-PACKAGE {1003DBF613}>.

Here, I told the system that when defining :PKG-C, the name conflict between the two versions of fcn-a should be resolved by allowing that from :PKG-B to shadow the one from :PKG-A.  But why did I get an error when I tried to switch back to the CL-USER package?  By omitting COMMON-LISP from the :use list of :PKG-C, I found myself in an environment where the bare in-package symbol, without a package prefix, was not recognized.  This is what I meant earlier about symbols not being resolved recursively through the inheritance tree.  Even though both :PKG-A and :PKG-B inherit from COMMON-LISP, :PKG-C does not have automatic access to the symbols in COMMON-LISP when invoked without a package prefix.  If, instead, I had invoked (COMMON-LISP:in-package :CL-USER), it would have worked as expected.

OK, so that’s a rough look at packages, so what about intern?  This is where the separation from C++ symbol tables becomes obvious.  The intern function causes a symbol to be “known to” a package.  Nothing is said about what this symbol represents; it’s simply added to the list of recognized symbols for the package.  By itself, an interned symbol isn’t very interesting.  You typically want the symbol to allow you to look up something, like a variable, or a function.  So, an interned symbol has five associated fields.  Note that these are language features, not implementation details.

  1. The name of the symbol, as a string
  2. The package in which the symbol is interned
  3. The property list associated with the symbol
  4. The value the symbol should return, when queried
  5. The function the symbol should return, when queried

The name of the symbol is pretty obvious.  The package field indicates where the symbol is interned, so it allows the programmer to determine whether the symbol was defined in the current package or inherited from another.  The property list field allows key/value pairs to be strung onto a symbol.  The value and the function fields decide what the symbol represents in contexts where a value or a function is being requested.  The point to notice, however, is that the assignment of the last three fields is independent of the intern operation itself.  So, when you define a function with defun, the reader first interns the symbol as it reads the form (if it is not yet interned), and defun then attaches a function definition to the function field of the symbol.  Similarly, when you set a global variable with setq, the symbol is interned when the form is read, and a value is then attached to the value field of the symbol.  This is why I mentioned earlier that the function and value namespaces are distinct in Lisp, unlike in C++.  A symbol can routinely be used in both function and value contexts, and the correct field will be consulted in each.
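Here is a small sketch that fills in and reads back each of the five fields on one symbol.  The symbol GADGET and the property name COLOUR are hypothetical, chosen only for this example, and the package reported will be whichever package is current when the code is read.

```lisp
;; GADGET is a hypothetical symbol used only for this sketch.
(defun gadget () :called-as-function)   ; fills the function field
(setf (symbol-value 'gadget) 42)        ; fills the value field
(setf (get 'gadget 'colour) :red)       ; adds an entry to the property list

(format t "name:     ~S~%" (symbol-name 'gadget))                    ; "GADGET"
(format t "package:  ~S~%" (package-name (symbol-package 'gadget)))
(format t "property: ~S~%" (get 'gadget 'colour))                    ; :RED
(format t "value:    ~S~%" (symbol-value 'gadget))                   ; 42
(format t "function: ~S~%" (funcall (symbol-function 'gadget)))      ; :CALLED-AS-FUNCTION
```

The same symbol answers both the value query and the function query without either one disturbing the other.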

So, you might ask what the big deal is about talking about a function symbol table.  If the functions are isolated from the values, then surely you can talk about a logical entity with the qualities of a function symbol table.  The problem, however, and the thing that could mislead the novice, is in the way these fields are all tied together under the same symbol.  If you unintern a symbol, you delete the reference not just to the function associated with it, but also to the value and to the property list.  It’s an over-simplification to talk about functions as strictly separate from values.  In order to understand what is really happening and to avoid surprises when coding, this internal logical structure of packages to symbol lists to associated fields must be understood.
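A sketch of that effect follows, using a hypothetical package SCRATCH and symbol name TOOL.  It works with intern and unintern at run time, rather than through the reader, so that the order of events is unambiguous.  Note that the old symbol object is not destroyed; what disappears is the package’s mapping from the name to it, so the same name now leads to a fresh symbol with empty fields.

```lisp
(defpackage :scratch (:use :cl))   ; hypothetical package for this sketch

(defun unintern-demo ()
  "Fill in a symbol's fields, unintern it, then intern the same name again."
  (let ((old (intern "TOOL" :scratch)))
    (setf (symbol-value old) 1
          (symbol-function old) (lambda () :tool)
          (get old 'note) "hello")
    (unintern old :scratch)
    (let ((new (intern "TOOL" :scratch)))
      (list :same-object (eq old new)            ; NIL: a brand new symbol
            :new-boundp  (boundp new)            ; NIL: no value
            :new-fboundp (fboundp new)           ; NIL: no function
            :new-note    (get new 'note)         ; NIL: no such property
            :old-value   (symbol-value old)))))  ; 1: the old object kept its fields

(format t "~S~%" (unintern-demo))
```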

And so, with this lengthy overview posted, I’m going to stop talking about packages, symbols, and namespaces in round-about and imprecise ways.  We have symbols, and packages, and the language defines how they behave, and that’s how I’ll be referring to them from now on.

EDIT #1: 2014-02-25

Following the discussion with Janis in the comments below, I’m adding some more text to help clarify packages and symbols and, I hope, avoid engendering confusion in newcomers.

So, let’s start by clarifying symbols.  A symbol object has, as described above, 5 associated data fields.  A symbol object can be uninterned, which means that there is no package that maps to that symbol.  The make-symbol command, which we have not covered at the time of this edit, can be used to construct an uninterned symbol object.  You might also look over the description at gensym for examples of uninterned symbols and what it means for symbols to be “equal”.
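As a quick sketch of what “uninterned” means in practice, the following builds two symbols with make-symbol.  The helper name uninterned-pair and the symbol name TEMP are made up for the example.

```lisp
(defun uninterned-pair ()
  "Return two distinct uninterned symbols that share the name TEMP."
  (values (make-symbol "TEMP") (make-symbol "TEMP")))

(multiple-value-bind (a b) (uninterned-pair)
  (format t "home package: ~S~%" (symbol-package a))                        ; NIL
  (format t "same name?    ~S~%" (string= (symbol-name a) (symbol-name b))) ; T
  (format t "same object?  ~S~%" (eq a b)))                                 ; NIL
```

Two interned symbols with the same name in the same package are always the same object.  Without a package to enforce that, each call to make-symbol produces a separate object, even when the names match.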

A package implements a mapping from the names of interned and inherited symbols to the symbol objects themselves.  Inherited symbols must themselves have been interned in the package from which they are inherited, one cannot inherit uninterned symbols.  Modification of that mapping is through defpackage, export, import, intern, shadow, shadowing-import, unexport, unintern, unuse-package, and use-package.  However packages are implemented, they behave as if a package contains mappings for all symbols that it itself interns, plus also all symbols that it imports or inherits from other packages.  The Lisp instance maintains a state in a dynamic variable, *package*, which denotes the “current package”.  When the Lisp instance needs to look up a symbol by name, then if that name does not contain a package prefix, the package specified by *package* is used.  Naturally, if the symbol name contains a package prefix, that package is used for the lookup.

So, symbols are objects, and some symbols may be interned.  A package is a mapping from the package-unique name of an interned symbol to the symbol object.

The less-familiar parts of Lisp for beginners — fill pointers

Continuing the series, I’ll just briefly touch on fill pointers.  These are basically just convenience features that can be attached to one-dimensional arrays.

If a one-dimensional array (vector) has a fill pointer, it informs the system that the vector’s length is equal in value to the fill pointer.  One might use this feature when incrementally filling a vector, maybe a cache.  To avoid the cost of reallocation on resizing, you could create a large initial vector, but set the fill pointer to zero.  Then, as elements are added to the vector, you increase the fill pointer.
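A minimal sketch of that incremental filling, using the standard vector-push function, which stores an element at the fill pointer and then advances it.  The variable name *cache* and the capacity of 8 are arbitrary choices for the illustration.

```lisp
;; Room for 8 elements, but none of them active yet.
(defparameter *cache* (make-array 8 :fill-pointer 0))

(vector-push :a *cache*)   ; stored at index 0, fill pointer becomes 1
(vector-push :b *cache*)   ; stored at index 1, fill pointer becomes 2
(vector-push :c *cache*)   ; stored at index 2, fill pointer becomes 3

(format t "length:   ~S~%" (length *cache*))             ; 3
(format t "capacity: ~S~%" (array-dimension *cache* 0))  ; 8
(format t "contents: ~S~%" *cache*)                      ; #(:A :B :C)
```

When the vector is full, vector-push returns NIL and stores nothing; vector-push-extend is the variant that grows an adjustable vector instead.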

The fill pointer does not forbid references with aref to offsets of the vector that are above the limit of the fill pointer.  What it changes is the active length: length returns the fill pointer rather than the allocated size of the vector, and sequence functions in general treat the fill pointer as the end of the vector.  When printing a vector, no elements past the fill pointer are displayed.  Here’s a series of operations:
 

CL-USER> (defparameter *fvec* (make-array 10 
                                          :fill-pointer 4 
                                          :initial-element 1))
*FVEC*
CL-USER> *fvec*
#(1 1 1 1)
CL-USER> (setf (aref *fvec* 6) 15)
15
CL-USER> (aref *fvec* 6)
15
CL-USER> (array-dimension *fvec* 0)
10
CL-USER> (length *fvec*)
4
CL-USER> (setf (fill-pointer *fvec*) 10)
10
CL-USER> *fvec*
#(1 1 1 1 1 1 15 1 1 1)

You’ll notice that I can modify and retrieve elements past the end of the region the fill pointer declares as active.  You’ll also see that array-dimension and length return different values.

If you use the fill-pointer accessor on a vector that doesn’t have a fill pointer, an error is raised.  You can check whether or not a vector has a fill pointer with the function array-has-fill-pointer-p.
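One way to avoid that error is to test first.  The wrapper below is a hypothetical helper, not a standard function, sketched to show the two calls working together.

```lisp
(defun safe-fill-pointer (vec)
  "Return VEC's fill pointer, or NIL when VEC has none."
  (when (array-has-fill-pointer-p vec)
    (fill-pointer vec)))

(format t "~S~%" (safe-fill-pointer (make-array 5)))                  ; NIL
(format t "~S~%" (safe-fill-pointer (make-array 5 :fill-pointer 2)))  ; 2
```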

The less-familiar parts of Lisp for beginners — fdefinition

We move on now to fdefinition.  This is a fairly short topic, but before reviewing it, you might want to read the article on fboundp, if you have not yet read it.

The fdefinition accessor provides a means for setting or reading the function object associated with a particular symbol in the function symbol table.  As such, it is, like fboundp, also unaffected by lexical functions or macros created with flet, labels, or macrolet.
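Here is a small sketch of that last point.  The function names greet and compare-greets are invented for the example.  Inside the flet, a direct call sees the lexical definition, while fdefinition still returns the global one.

```lisp
(defun greet () :global)

(defun compare-greets ()
  "Call GREET lexically and through FDEFINITION from inside an FLET."
  (flet ((greet () :local))
    (list (greet)                            ; the lexical FLET definition
          (funcall (fdefinition 'greet)))))  ; the global DEFUN definition

(format t "~S~%" (compare-greets))           ; (:LOCAL :GLOBAL)
```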

The uses of this accessor are fairly esoteric.  It can certainly be used as part of the implementation of the defun macro, or in the implementations of functions that are passed function symbols, like apply, funcall, and mapcar.  It can also be used to sniff the type of function that is bound, after fboundp returns non-nil.  There might be some rare contexts in which the programmer needs the code to determine whether a particular passed symbol points to a function or macro, or to a generic function, and fdefinition is part of the means for answering that question.  In everyday programming, though, you are unlikely to need to use this feature.
 

CL-USER> (defun my-adder (x)
           (+ 1 x))
MY-ADDER
CL-USER> (my-adder 2)
3
CL-USER> (setf (fdefinition 'my-adder) #'(lambda (x) (+ x 10)))
#<FUNCTION (LAMBDA (X)) {100673943B}>
CL-USER> (my-adder 2)
12

An example of examining the type of a symbol:
 
(defun what-symbol (sym)
  (cond
    ((fboundp sym)
     (typecase (fdefinition sym)
       (generic-function
        (format t "~A is a generic function~%" sym))
       (function
        (format t "~A is a (non-generic) function~%" sym))
       (t
        (format t "~A is not a function~%" sym))))
    (t
     (format t "~A is not bound in the function symbol table.~%" sym))))

Edit #1:  2014-02-02

Please do not attach too much to the vague and possibly misleading “function symbol table” language I’ve used above.  Instead, look over this later article that I hope provides some more precise details on what is really happening.  It’s not merely a semantic distinction, the material there is important to understand when programming in Lisp.

The less-familiar parts of Lisp for beginners — fboundp

Next, we have fboundp.  This is another function for querying the state of the Lisp environment, asking whether a particular name is bound to a function or macro.  This function does not determine whether the name is bound as a variable.  Recall that, unlike C++, the Lisp language syntax unambiguously distinguishes between variable names and function names.  Consequently, these two types of symbols are not at risk of namespace collisions.

Now, you might think that this makes fboundp very simple.  The temptation is to say that if fboundp on a symbol returns nil, then calling that symbol in a function context will always fail, but things are a bit more complicated than that.  The fact is that it is possible to create named functions and macros which are not inserted into the global/persistent Lisp function namespace, but instead have a limited scope, following which time the functions effectively disappear.  The features labels, flet, and macrolet do this.  If, in your reading, you’ve seen a reference to “lexical functions”, the labels and flet features are what create them, and fboundp does not see lexical functions or macros.

And from this starting point, we can gain some helpful insight into the way Lisp works…

When one of labels, flet, or macrolet is used, it does not insert a new entry into the function symbol table, but it does shadow any existing functions with that name in the same scope.  If labels did insert an entry into the function symbol table, that would interfere with the view seen by other threads of execution.
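A short sketch shows fboundp failing to see a lexical function.  The names lexical-only-demo and only-local-helper are hypothetical, and the example assumes only-local-helper has never been defined globally with defun.

```lisp
(defun lexical-only-demo ()
  "Call a lexical function, then ask FBOUNDP about its name."
  (flet ((only-local-helper (x) (* 2 x)))   ; hypothetical name, never DEFUNed
    (list (only-local-helper 21)            ; callable inside the FLET body
          (fboundp 'only-local-helper))))   ; but invisible to FBOUNDP

(format t "~S~%" (lexical-only-demo))       ; (42 NIL)
```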

This distinction between names in the function symbol table and names constructed by labels, flet, or macrolet manifests itself in one of the less obvious syntax requirements.  There are many contexts where the programmer can pass a function by referring to its symbol.  If the programmer has decided to use labels to build a function that happens to collide with an existing name in the function symbol table, how can he tell the program which one to use?  When passing a name that is not in the function symbol table, then rather than using a single-quote to quote the name, one must use a number sign followed by the single quote.  This is demonstrated here:
 

CL-USER> (defun my-adder (x)
           (+ x 2))
MY-ADDER
CL-USER> (labels
             ((my-adder (x) (+ x 3)))
           (mapcar 'my-adder '(1 2 3)))
(3 4 5)
CL-USER> (labels
             ((my-adder (x) (+ x 3)))
           (mapcar #'my-adder '(1 2 3)))
(4 5 6)

I begin by creating a function called my-adder in the function symbol table, one that adds 2 to the numbers passed to it.  Then, I use labels to shadow my-adder with a new definition, one that adds 3 to the numbers passed.  Note, though, that when I use the symbol 'my-adder in mapcar, the one that is used is the one in the defun, not the one that I supposedly used to shadow it.  In order to use my new definition of my-adder, I have to use the #'my-adder syntax.  It is important to understand the difference between these two cases, and why the different forms are necessary.

It helps to understand what these notations are doing.  What, exactly, is the difference between mapcar acting on 'my-adder and acting on #'my-adder?  In the first case, the mapcar function is called with a symbol as a parameter.  Inside mapcar, the symbol is resolved in the function symbol table to obtain the function that is to be used for the operation.  That is, mapcar is told “use the function whose symbol is designated by 'my-adder”.  The second case, #'my-adder, is entirely different.  The #' prefix is a reader macro that converts the text at read time.  The sequence #'my-adder is replaced by the form (function my-adder) before the evaluator even sees it.  The function special operator resolves the name in the current lexical environment (before the mapcar function call), and returns a function object, not a symbol.  The mapcar function, rather than receiving a symbol and being told to look it up, receives a bare function itself, and uses that, without further resolution.
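Both halves of that can be checked directly.  This sketch hands the text to the reader without evaluating the result, and then compares what kind of object each notation produces, using the standard function car as a stand-in.

```lisp
;; What the reader produces for the #' notation: a two-element list.
(let ((form (read-from-string "#'my-adder")))
  (format t "operator: ~S~%" (first form))    ; FUNCTION
  (format t "argument: ~S~%" (second form)))  ; MY-ADDER

;; A quoted name is a symbol; the #' form evaluates to a function object.
(format t "~S~%" (symbolp 'car))      ; T
(format t "~S~%" (functionp 'car))    ; NIL
(format t "~S~%" (functionp #'car))   ; T
```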

[Struck-out text begins]  Perhaps now, there is a realization dawning.  You know that familiar construct for passing anonymous functions, lambda?  We start with a quoted list, the car of which is the keyword lambda.  Then, by prefixing the #, we convert this list into a function in the current lexical environment, and that function is what is passed downwards.  In effect, when we pass a function as a parameter, we can either choose to pass it by name, by sending a symbol down, or by value, by resolving the symbol into an actual function object and passing that instead.  And that is the difference between these two syntaxes, and the # prefix is a read-time shorthand for the second case.  [Struck-out text ends]

Edit #3: 2014-02-27.  This phrasing was awkward, and now that we’ve covered read-macros under gensym, and have a separate article for lambda, this struck-out text should be ignored.  Instead, please refer to the article about lambda for more details.  The final point is this: many Lisp features which accept functions as arguments can be passed either a symbol from which the function is to be retrieved, or simply the bare function itself.  For functions that are not bound to a symbol, like those created with labels, flet, or lambda, it is necessary that the function object itself be passed.  The function object is retrieved within the scope where it is visible with the function special operator, for which #' is a read-macro shortcut.  Finally, Common Lisp defines a lambda macro that expands to include the invocation of function, so the #' is technically optional on lambda forms.

Edit #1:  2014-02-01

Some criticism has been raised on other sites about the terms and descriptions above.  It has been noted that referring to something like the “function symbol table” may confuse more than illuminate, as that is not a term commonly used.  I apologize for this, and it’s a good point to bring up.  My tendency in this series has been to try to describe logical parallels directed at the C++ programmer.  Not an excuse, merely an explanation.  But I certainly appreciate that the language I use above is strange or off-putting to a veteran Lisp programmer, a category in which I emphatically do not include myself.

I sincerely hope that when I reach the intern function, the description I provide then will be more familiar and provide a better view of what is really going on.

For now, if you’ve read the original text above, and seen me talking about a “function symbol table”, don’t attach too much to that.  The underlying concept, I hope, is helpful.  There exists a mechanism in the Lisp image that allows a function to be referenced by its symbol.  The labels form, while superficially similar to defun, is, in fact, notably different in that it does not influence this aforementioned mechanism, and so does not interfere with the resolution of that symbol in other forms.

I also invite others to post copies of their comments and criticisms here, if they so desire.  I don’t want confusing or misleading postings to sit uncorrected on this site, while helpful criticism sits on other web sites that the casual visitor might not have come across.  I’ll point out that my motivation for writing this series of postings is to improve my understanding of Lisp by deliberately researching every feature that I haven’t had the opportunity to use.  These posts are primarily an educational tool for myself, but I hope they help others.  If I say something wrong or confusing, please let me know.

Edit #2:  2014-02-02

I’ve posted an out of sequence article about interned symbols and packages here.  I hope that this provides more clarity, and invite the reader to go over that material carefully in order to avoid being misled by some of the less precise terms I’ve been using in this series of posts.  Once again, I invite comments if the more experienced readers feel I’m failing to give helpful explanations.

The less-familiar parts of Lisp for beginners — export

Next in our list of Lisp features not necessarily encountered in a brief introduction is export.  This is related to the package system of Lisp.  You may find it useful to review the earlier article on delete-package, to understand what a package is in Lisp, and how it differs from C++ classes and namespaces.

Once again, the use of this function ties back to a fundamental difference in the creation of Lisp programs, as distinct from C++ programs.  In C++, the author edits one or more disc files, compiles them into a single executable unit, and then runs that executable.  If changes are to be made, the programmer edits the disc files once more, recompiles, and then restarts the C++ binary.

Lisp programs, on the other hand, are assembled by inserting code into a Lisp image.  Rather than creating a new stand-alone binary, you should think of this in terms of adding things to a blank Lisp image.  Functions, classes, structures, packages, and more, are added, serially, to a Lisp image in order to achieve a desired program state.  Some of the things that C++ would do with keywords and compiler or linker directives are, in Lisp, done with functions that act at the time of invocation, not at the time of compilation.

So, we mentioned packages earlier, and how they are a bit like namespaces, but have the private/public symbols the C++ programmer might associate with private/public methods in classes.  A symbol in a Lisp package is not exported by default.  To the beginner Lisp programmer, this looks like a small difference: calling those symbols requires using a double-colon after the package name, rather than a single colon.  In the context of package inheritance, however, an exported symbol is visible in derived packages that inherit from it, and exported symbols raise the possibility of namespace collisions in that context.

So, what does export do?  Well, unsurprisingly, it causes a particular symbol to be exported from the package.  Normally, the programmer lists the exported symbols in the package definition, as, for instance, in this earlier article, with the :DL-LIST package.  The implementation of the defpackage macro, however, generally calls export or something of equivalent functionality.  The programmer may have reason to choose to export certain symbols at runtime, for instance if optional packages that would supply those symbols are not loaded.  In most circumstances, this function will be only rarely used.
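As a sketch of export being called at runtime, the following defines a package with no :export clause and then exports a symbol from it afterwards.  The package TOOLBOX, the symbol name HAMMER, and the helper hammer-status are all hypothetical names for this illustration.

```lisp
(defpackage :toolbox (:use :cl))   ; hypothetical package, no :export clause

(defun hammer-status ()
  "How is the name HAMMER accessible in TOOLBOX right now?"
  (nth-value 1 (find-symbol "HAMMER" :toolbox)))

(intern "HAMMER" :toolbox)
(format t "before export: ~S~%" (hammer-status))   ; :INTERNAL

(export (find-symbol "HAMMER" :toolbox) :toolbox)
(format t "after export:  ~S~%" (hammer-status))   ; :EXTERNAL
```

Before the call, outside code would have to write toolbox::hammer, with the double colon.  After it, toolbox:hammer is accepted, and any package that uses TOOLBOX inherits the symbol.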