The less-familiar parts of Lisp for beginners — packages and symbols

I’m now going to depart a bit from my alphabetical walk through Lisp features, and from my normal publishing schedule, to talk about packages.  This is specifically to try to better describe some of the ways that Lisp works following criticism of my terminology in articles about fboundp and fdefinition.

In trying to describe some Lisp concepts in terms familiar to C++ programmers, I tried to draw a parallel between interned symbols in Lisp and the symbol tables used, for example in ELF binary format objects, by the linker stage in C++ compilation.  While there is a logical parallel, it isn’t a very good correspondence, and if you attach too much significance to the logical connections it can lead you to a misleading understanding of Lisp.  I apologize for that, so here we will go into some more fine detail.

Now, you might be tempted to argue away the distinction, saying that the logical entities described are similar, but there are some very important differences, not just in the implementation, but in the language behaviour.  Programmers are wisely enjoined not to program to an implementation, but rather to program to the virtual model defined by the standard or API.  A language standard typically defines “as if” rules that describe how a conforming implementation must behave as seen from within the program.  This allows the standard to define a common virtual platform, and it is the responsibility of the particular hardware and software implementation to present that virtual platform to the programmer.  When the programmer writes code against the standard-defined virtual platform, rather than against a particular implementation, he or she is much more likely to produce portable, readable code that remains correct as compilers are upgraded or as underlying hardware is replaced.  The differences between Lisp interned symbols and C++ linker symbol tables are not implementation differences; they are logical differences that must be understood by the programmer.

So, I’m inserting this article, out of sequence, so that I can stop referring to “function symbol tables” in Lisp.  That’s not what they are.  And to understand what they really are, we need to describe packages, which really means understanding the intern function.

I’ve been talking about packages throughout this series, describing them a bit like C++ namespaces but with the added feature of inheritance.  Now, though, it’s time to stop talking about parallels and similarities, and explain what a Lisp package really is.

So, a Lisp package is a namespace.  It’s a container for interned symbols (I’ll get to what we mean by that shortly) that allows the programmer to avoid symbol collisions.  Package inheritance is used to allow the reader to locate symbols from other packages when the symbol is not present in the package in which the reader is running (the “current package”).  EDIT: Please see additional text below, added 2014-02-25.

The Lisp language features themselves are present in a package, the COMMON-LISP package, also available under the nickname CL.  If you define a package which does not directly inherit from COMMON-LISP, you will find that the Lisp language itself appears to be unavailable within your new package!  Invoking defun will produce an error declaring that defun is an unknown function.  I say “appears to be unavailable” because the features are still there and reachable, but if you haven’t inherited from COMMON-LISP you will have to use an explicit CL: prefix just to invoke what you think of as the normal features of the Lisp language.  That CL: prefix is an example of a package prefix.  I’ll be using that term a bit in the text that follows.
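As a minimal sketch of this point, the forms below create one package that inherits from nothing and one that inherits from COMMON-LISP, then ask each whether the name DEFUN is accessible.  The package names DEMO-BARE and DEMO-WITH-CL and the variable names are my own inventions for illustration.

```lisp
;; DEMO-BARE and DEMO-WITH-CL are hypothetical package names.
(defpackage :demo-bare (:use))         ; explicitly inherit from nothing
(defpackage :demo-with-cl (:use :cl))  ; inherit COMMON-LISP's external symbols

;; FIND-SYMBOL returns two values: the symbol (or NIL) and a status keyword
;; (or NIL when the name is not accessible in the package).
(defparameter *bare-lookup*
  (multiple-value-list (find-symbol "DEFUN" :demo-bare)))
(defparameter *cl-lookup*
  (multiple-value-list (find-symbol "DEFUN" :demo-with-cl)))

;; The symbol is still reachable when its package is named explicitly,
;; which is what the CL: prefix does at read time.
(defparameter *prefixed-lookup* (find-symbol "DEFUN" :cl))

(format t "DEMO-BARE:    ~S~%" *bare-lookup*)
(format t "DEMO-WITH-CL: ~S~%" *cl-lookup*)
```

The explicit (:use) matters here: when the :use option is omitted entirely, the default list of used packages is implementation-dependent.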

So, how does a package work?  Well, you create it, and you typically use the :use argument to inherit from COMMON-LISP, and zero or more other packages.  You can then intern symbols in this new package.  When the reader is asked to interpret a symbol without a package prefix, it looks first in the internal and external symbols of the current package.  If there is no matching interned symbol in the current package, it then looks through the external symbols of the packages from which it directly inherits (it does not recurse through their :use lists).  The order in which these packages are searched is not defined, so conflicts must always be explicitly resolved.  If the package inherits two distinct symbols with the same name from two other packages, the result is a correctable error.  The programmer is responsible for resolving the conflict, either by interning a symbol in the current package and so shadowing both conflicting symbols, or by adjusting the inheritance in order to specify which symbol is to be used.

You know, this is starting to get dry and abstract, and I’m trying to avoid that trap in these articles.  Let’s see if I can bring things back towards the concrete a bit.  An example of a symbol might be 'my-worker-function.  A symbol with a package prefix could be 'my-package:my-worker-function.  When the reader encounters 'my-worker-function, it looks for that symbol in the current package.  If that symbol is present, either as an internal or external symbol, it is used; there is no ambiguity.  If that symbol is not present in the current package, then the query moves on to the packages from which the current package inherits, if any.  If no match is found, the reader interns a new symbol of that name in the current package; an error is signaled only later, if you try to invoke it as a function while it has no function definition.  What if more than one package exports a symbol of that name?  Well, that conflict is generally not detected at this point.  Symbol conflicts are checked whenever a change occurs that makes them possible.  That includes when evaluating a defpackage form, or when running code invokes the unintern function on a symbol that was shadowing a conflict between two or more other packages.  When the reader is trying to look up a symbol, it knows that conflict resolution has already been performed, so the possible outcomes are limited to a failure to find any such symbol, or a successful and unambiguous resolution of the symbol.
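The possible outcomes of a lookup can be sketched with find-symbol, whose second return value reports how a name is accessible in a package.  The package DEMO-LOOKUP, its symbol names, and the helper lookup-status are all hypothetical, introduced only for this illustration.  Note that reading an unknown, unprefixed name does not itself signal an error: the reader interns a fresh symbol, and the undefined-function error appears only if that symbol is then called.

```lisp
;; DEMO-LOOKUP and the symbol names below are hypothetical.
(defpackage :demo-lookup
  (:use :cl)
  (:export #:public-name))
(intern "PRIVATE-NAME" :demo-lookup)

(defun lookup-status (name)
  "Return the status keyword FIND-SYMBOL reports for NAME in DEMO-LOOKUP."
  (nth-value 1 (find-symbol name :demo-lookup)))

;; External, internal, inherited from CL, and not accessible at all.
(defparameter *status-before*
  (mapcar #'lookup-status
          '("PUBLIC-NAME" "PRIVATE-NAME" "CAR" "NO-SUCH-NAME")))

;; Read an unknown, unprefixed name with DEMO-LOOKUP as the current package.
(defparameter *fresh-symbol*
  (let ((*package* (find-package :demo-lookup)))
    (values (read-from-string "no-such-name"))))

;; The read step interned the name; it still has no function definition.
(defparameter *status-after* (lookup-status "NO-SUCH-NAME"))
(defparameter *fresh-fbound-p* (fboundp *fresh-symbol*))
```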

Let’s see what this looks like with a transcript from a Lisp session:
 

CL-USER> (defpackage :pkg-a 
           (:use :CL) 
           (:export :FCN-A))
#<PACKAGE "PKG-A">
CL-USER> (in-package :pkg-a)
#<PACKAGE "PKG-A">
PKG-A> (defun fcn-a ()
         (format t "FCN-A in PKG-A~%"))
FCN-A
PKG-A> (fcn-a)
FCN-A in PKG-A
NIL
PKG-A> (in-package :CL-USER)
#<PACKAGE "COMMON-LISP-USER">
CL-USER> (defpackage :pkg-b
           (:use :CL) 
           (:export :FCN-A))
#<PACKAGE "PKG-B">
CL-USER> (in-package :pkg-b)
#<PACKAGE "PKG-B">
PKG-B> (defun fcn-a ()
         (format t "FCN-A in PKG-B~%"))
FCN-A
PKG-B> (fcn-a)
FCN-A in PKG-B
NIL
PKG-B> (in-package :CL-USER)
#<PACKAGE "COMMON-LISP-USER">
CL-USER> (defpackage :pkg-c 
           (:use :pkg-a :pkg-b)
           (:export :fcn-b))
; Evaluation aborted on #<NAME-CONFLICT {10036CD023}>.
CL-USER> 

I created two packages, :PKG-A and :PKG-B, both inheriting from the COMMON-LISP package.  Each of these packages exported its own symbol named fcn-a, naming its own function.  When I tried to create a new package, :PKG-C, inheriting from both :PKG-A and :PKG-B, I got an error due to the name conflict, as each of them was exporting its own version of fcn-a.
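The same conflict can be provoked and observed programmatically.  This is only a sketch: the package names DEMO-A, DEMO-B and DEMO-C are hypothetical, and because the exact condition type is implementation-specific (the transcript above shows the NAME-CONFLICT condition reported by my Lisp), the handler below catches the general ERROR type as a portability assumption.

```lisp
;; DEMO-A, DEMO-B and DEMO-C are hypothetical package names.
(defpackage :demo-a (:use :cl) (:export #:fcn-a))
(defpackage :demo-b (:use :cl) (:export #:fcn-a))
(defpackage :demo-c (:use :demo-a))

;; The two exported symbols share a name but are different objects,
;; which is precisely what makes inheriting both a conflict.
(defparameter *distinct-symbols-p*
  (not (eq (find-symbol "FCN-A" :demo-a)
           (find-symbol "FCN-A" :demo-b))))

;; Adding DEMO-B to DEMO-C's use list makes the conflict possible,
;; so it is detected at this point.  Catching ERROR is an assumption
;; made to avoid naming an implementation-specific condition type.
(defparameter *conflict-signaled-p*
  (handler-case (progn (use-package :demo-b :demo-c) nil)
    (error () t)))
```

At an interactive prompt you would normally be offered restarts for choosing between the two symbols, rather than having the error swallowed as it is here.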

So, now we’ll resolve the issue:
 

CL-USER> (defpackage :pkg-c
           (:use :pkg-a :pkg-b)
           (:shadowing-import-from :pkg-b :fcn-a)
           (:export :fcn-b))
#<PACKAGE "PKG-C">
CL-USER> (in-package :pkg-c)
#<COMMON-LISP:PACKAGE "PKG-C">
PKG-C> (fcn-a)
FCN-A in PKG-B
COMMON-LISP:NIL
PKG-C> (in-package :CL-USER)
; in: IN-PACKAGE :CL-USER
;     (PKG-C::IN-PACKAGE :CL-USER)
; 
; caught COMMON-LISP:STYLE-WARNING:
;   undefined function: IN-PACKAGE
; 
; compilation unit finished
;   Undefined function:
;     IN-PACKAGE
;   caught 1 STYLE-WARNING condition
; Evaluation aborted on #<UNDEFINED-FUNCTION IN-PACKAGE {1003DBF613}>.

Here, I told the system that when defining :PKG-C, the name conflict between the two versions of fcn-a should be resolved by allowing that from :PKG-B to shadow the one from :PKG-A.  But why did I get an error when I tried to switch back to the CL-USER package?  By omitting COMMON-LISP from the :use list of :PKG-C, I found myself in an environment where the bare in-package symbol, without a package prefix, was not recognized.  This is what I meant earlier about symbols not being resolved recursively through the inheritance tree.  Even though both :PKG-A and :PKG-B inherit from COMMON-LISP, :PKG-C does not have automatic access to the symbols in COMMON-LISP when invoked without a package prefix.  If, instead, I had invoked (COMMON-LISP:in-package :CL-USER), it would have worked as expected.
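For comparison, here is a sketch of the same resolution with COMMON-LISP included in the :use list, so that in-package and the rest of the language remain accessible without a prefix.  The package names DEMO-P, DEMO-Q and DEMO-R are hypothetical stand-ins for the packages in the transcript.

```lisp
;; DEMO-P, DEMO-Q and DEMO-R are hypothetical package names.
(defpackage :demo-p (:use :cl) (:export #:fcn-a))
(defpackage :demo-q (:use :cl) (:export #:fcn-a))
(defpackage :demo-r
  (:use :cl :demo-p :demo-q)                  ; note :cl is listed explicitly
  (:shadowing-import-from :demo-q #:fcn-a))   ; resolve the conflict in favour of DEMO-Q

;; The name FCN-A in DEMO-R now leads to DEMO-Q's symbol.
(defparameter *resolved-to-q-p*
  (eq (find-symbol "FCN-A" :demo-r)
      (find-symbol "FCN-A" :demo-q)))

;; IN-PACKAGE is accessible because DEMO-R uses COMMON-LISP directly.
(defparameter *in-package-status*
  (nth-value 1 (find-symbol "IN-PACKAGE" :demo-r)))

;; The package records which of its symbols are shadowing ones.
(defparameter *shadowing-names*
  (mapcar #'symbol-name (package-shadowing-symbols :demo-r)))
```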

OK, so that’s a rough look at packages.  What about intern?  This is where the separation from C++ symbol tables becomes obvious.  The intern function causes a symbol to be “known to” a package.  Nothing is said about what this symbol represents; it’s simply added to the list of recognized symbols for the package.  By itself, an interned symbol isn’t very interesting.  You typically want the symbol to allow you to look up something, like a variable, or a function.  So, an interned symbol has five associated fields.  Note that these are language features, not implementation details.

  1. The name of the symbol, as a string
  2. The package in which the symbol is interned
  3. The property list associated with the symbol
  4. The value the symbol should return, when queried
  5. The function the symbol should return, when queried

The name of the symbol is pretty obvious.  The package field indicates where the symbol is interned, so it allows the programmer to determine whether the symbol was defined in the current package or inherited from another.  The property list field allows key/value pairs to be strung onto a symbol.  The value and the function fields decide what the symbol represents in contexts where a value or a function is being requested.  The point to notice, however, is that the assignment of the last three fields is independent of the intern operation itself.  When you define a function with defun, the reader has already interned the symbol (if it was not yet interned) while reading the form, and defun then attaches a function definition to the function field of the symbol.  Similarly, when you set a global variable with setq, the symbol is interned as the form is read, and setq then attaches a value to the value field of the symbol.  This is why I mentioned earlier that the function and value namespaces are distinct in Lisp, unlike in C++.  A symbol can routinely be used in both function and value contexts, and the correct value will be retrieved.
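Each of the five fields has a standard accessor, so a rough sketch can show them being filled in independently of the intern step.  The symbol name DEMO-THING, the property key, and the variable names are hypothetical.

```lisp
;; DEMO-THING is a hypothetical symbol name, interned here by hand.
(defparameter *thing* (intern "DEMO-THING" :cl-user))

;; Freshly interned: it has a name and a package, but no value or function yet.
(defparameter *fresh-state*
  (list (boundp *thing*) (fboundp *thing*)))

;; Fill in the value, function, and property list fields separately.
(setf (symbol-value *thing*) 42)
(setf (symbol-function *thing*) (lambda () :called))
(setf (get *thing* 'colour) 'red)

;; Read all five fields back through their accessors.
(defparameter *fields*
  (list (symbol-name *thing*)
        (package-name (symbol-package *thing*))
        (getf (symbol-plist *thing*) 'colour)
        (symbol-value *thing*)
        (funcall (symbol-function *thing*))))
```

The same symbol answers with 42 in a value context and with a callable function in a function context, which is the separation of namespaces described above.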

So, you might ask what the big deal is about talking about a function symbol table: if the functions are isolated from the values, then you can talk about a logical entity with the qualities of a function symbol table.  The problem, which could mislead the novice, is in the way these fields are all tied together under the same symbol.  If you unintern a symbol, you delete the reference not just to the function associated with it, but also to the value and to the property list.  It’s an over-simplification to talk about functions as strictly separate from values.  In order to understand what is really happening and to avoid surprises when coding, this internal logical structure of packages to symbol lists to associated fields must be understood.
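A small sketch of what unintern does to all of the fields at once, using a hypothetical package DEMO-UNINTERN and symbol name WIDGET.  After the unintern, the same name leads to a brand-new symbol with no value and no function, while the old object, if you still hold a reference to it, keeps its fields but has lost its package.

```lisp
;; DEMO-UNINTERN and WIDGET are hypothetical names.
(defpackage :demo-unintern (:use))

(defparameter *widget-before* (intern "WIDGET" :demo-unintern))
(setf (symbol-value *widget-before*) 1)
(setf (symbol-function *widget-before*) (lambda () :old))

;; Remove the package's reference to the symbol, then intern the name again.
(unintern *widget-before* :demo-unintern)
(defparameter *widget-after* (intern "WIDGET" :demo-unintern))

(defparameter *unintern-report*
  (list (eq *widget-before* *widget-after*)   ; a different object now
        (boundp *widget-after*)               ; the new symbol has no value
        (fboundp *widget-after*)              ; and no function
        (symbol-value *widget-before*)        ; the old object keeps its value
        (symbol-package *widget-before*)))    ; but no longer has a package
```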

And so, with this lengthy overview posted, I’m going to stop talking about packages, symbols, and namespaces in round-about and imprecise ways.  We have symbols, and packages, and the language defines how they behave, and that’s how I’ll be referring to them from now on.

EDIT #1: 2014-02-25

Following the discussion with Janis in the comments below, I’m adding some more text to help clarify packages and symbols and, I hope, avoid engendering confusion in newcomers.

So, let’s start by clarifying symbols.  A symbol object has, as described above, 5 associated data fields.  A symbol object can be uninterned, which means that there is no package that maps to that symbol.  The make-symbol function, which we have not covered at the time of this edit, can be used to construct an uninterned symbol object.  You might also look over the description at gensym for examples of uninterned symbols and what it means for symbols to be “equal”.
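A brief sketch of the difference, with hypothetical names LOOSE and KEPT: two uninterned symbols may share a name and still be different objects, whereas interning the same name twice in the same package always yields the same object.

```lisp
;; LOOSE and KEPT are hypothetical symbol names.
(defparameter *loose-1* (make-symbol "LOOSE"))
(defparameter *loose-2* (make-symbol "LOOSE"))
(defparameter *kept* (intern "KEPT" :cl-user))

(defparameter *uninterned-report*
  (list (symbol-package *loose-1*)              ; no package maps to it
        (string= (symbol-name *loose-1*)
                 (symbol-name *loose-2*))       ; same name...
        (eq *loose-1* *loose-2*)                ; ...but distinct objects
        (eq *kept* (intern "KEPT" :cl-user))))  ; interning is repeatable
```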

A package implements a mapping from the names of interned and inherited symbols to the symbol objects themselves.  Inherited symbols must themselves have been interned in the package from which they are inherited, one cannot inherit uninterned symbols.  Modification of that mapping is through defpackage, export, import, intern, shadow, shadowing-import, unexport, unintern, unuse-package, and use-package.  However packages are implemented, they behave as if a package contains mappings for all symbols that it itself interns, plus also all symbols that it imports or inherits from other packages.  The Lisp instance maintains a state in a dynamic variable, *package*, which denotes the “current package”.  When the Lisp instance needs to look up a symbol by name, then if that name does not contain a package prefix, the package specified by *package* is used.  Naturally, if the symbol name contains a package prefix, that package is used for the lookup.
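Because *package* is an ordinary dynamic variable, the effect of the “current package” on lookup can be sketched by binding it around a call to the reader.  The packages DEMO-ONE and DEMO-TWO and the helper read-in are hypothetical, written only for this illustration.

```lisp
;; DEMO-ONE, DEMO-TWO and READ-IN are hypothetical names.
(defpackage :demo-one (:use))
(defpackage :demo-two (:use))

(defun read-in (pkg text)
  "Read one object from TEXT with PKG as the current package."
  (let ((*package* (find-package pkg)))
    (values (read-from-string text))))

;; The same unprefixed text maps to different symbols in different packages.
(defparameter *item-one* (read-in :demo-one "item"))
(defparameter *item-two* (read-in :demo-two "item"))

;; A package prefix overrides the current package for that lookup.
(defparameter *item-prefixed* (read-in :demo-two "demo-one::item"))
```

Here *item-one* and *item-two* are different symbol objects that happen to share the name "ITEM", while *item-prefixed* is the very same object as *item-one*.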

So, symbols are objects, and some symbols may be interned.  A package is a mapping from the package-unique name of an interned symbol to the symbol object.

5 thoughts on “The less-familiar parts of Lisp for beginners — packages and symbols”

  1. Hi Christopher,

    I’d just like to clear up a little misconception here:

    > When the reader is asked to interpret a symbol without a package
    > prefix, it looks first in the internal and external symbols of the
    > current package. If there is no matching interned symbol in the
    > current package, it then looks through the external symbols of the
    > packages from which it directly inherits

    The thing is — a symbol either is or is not present in a package. intern will just return the symbol if it is present in the package, or “intern” the symbol, if it is not.

    This is because package “inheritance” (and hence symbol conflicts) are resolved at package definition time. No crazy lookups or symbol chasing after that.

    If anybody’s interested, there’s an article (which I personally have not read from end to end), but which supposedly explains stuff about packages: http://www.flownet.com/gat/packages.pdf

    In case anybody cares.

    Cheers!

    1. Janis,
      I’m not sure what you’re trying to make clear here. The quoted paragraph isn’t about interning a symbol, it’s about how the Lisp reader acts when it encounters a bare symbol without a package prefix. For instance, you’ve loaded the package and used in-package to make that package current. Now, at the REPL prompt you type (myfunc). If there is a ‘myfunc symbol in the current package, it is invoked. If there is not, but there is one in one of the direct ancestor packages, that is invoked, otherwise the reader signals a condition of type ‘UNDEFINED-FUNCTION.
      The point about symbol conflicts was also made in the article. Note, however, that it is not just at package definition time that conflicts are checked and resolved. The potential presence of conflicts is checked every time a form executes that could possibly lead to a symbol conflict, that includes defpackage, but also intern and unintern. The reason unintern can lead to a symbol conflict is that you might unintern a symbol that was shadowing same-name symbols in multiple ancestor packages.
      Anyway, Janis, I’m not certain what your objection was to the original text, so if you could please re-express it, I’d like to make sure, with your help, that the article is both accurate and clear.

    2. Janis,
      I’ve been thinking some more about your comment, and I believe I understand the point you were trying to make. It’s a valid point, which I will try to re-express here. Please let me know if this is not an accurate portrayal of your position.
      As mentioned in the post, packages are namespaces that hold symbols. You’ll note the second field associated with symbols, “the package in which the symbol is interned”. While I’ve been talking about first checking the current package and then, if not found, checking the direct ancestor packages, the implementation of packages is permitted (though I’m not certain it’s required) to maintain a copy of the symbols it has inherited. That is to say, when a symbol without a package prefix is looked up in the current package, it does not actually have to query the ancestor packages directly, as the current package already has its own copies of inherited symbols, each with an associated data field that tells it the package that interned the inherited symbol.
      This copying of inherited symbols into the package directly, rather than querying ancestor packages, though, seems to me to be an implementation detail. The Common Lisp standard permits manipulation of the list of interned symbols only through intern/unintern/export, there is no longer the oblist/obarray mechanism that programmers could use to examine the symbol list directly. The standard does require that symbol conflicts be resolved as soon as a change has occurred that might cause them, which supports, but does not require, the copying of inherited symbols into the package.
      Janis, is this a reasonable explanation?

      1. I’ll try to be short.

        Packages do not contain symbols — they contain mappings from symbol names (strings) to symbols (the objects).

        There are different ways to get those mappings in place. One way is to intern a symbol in a package. Another way to get the mapping is to use the import function. Then there is the shadow function (and shadowing-import).

        Reader does not do any "querying." Depending on the syntax used, reader will use intern (for internal symbols) or find-symbol (for external symbols):


        CL-USER> (defpackage "FOO" (:use))
        #
        CL-USER> 'foo::a
        FOO::A
        CL-USER> (ignore-errors (read-from-string "foo:b"))
        NIL
        #
        CL-USER> (describe (second /))
        #
        Class: #
        Wrapper: #
        Instance slots
        CCL::FORMAT-CONTROL: "Reader error: No external symbol named ~S in package ~S ."
        CCL::FORMAT-ARGUMENTS: ("B" #)
        ; No value
        CL-USER> (find-symbol "A" "FOO")
        FOO::A
        :INTERNAL
        CL-USER> (find-symbol "B" "FOO")
        NIL
        NIL
        CL-USER>

        And yes, you are right, the package conflicts are checked not only at package definition time, but also whenever packages are modified (well, in a way that may lead to conflicts). But that's beside the point I was trying to make: there is no package "querying" or package hierarchy traversal involved. And it may sound like nitpicking, but in my opinion it just gives the wrong impression of how it actually works.

        I'll just drop another link for anybody interested in packages and symbols: http://www.lispworks.com/documentation/HyperSpec/Body/11_aa.htm. Section 11.1.1.2, and especially 11.1.1.2.1, is the relevant one to this discussion. Maybe.

        1. Janis,
          Thank you for taking the time to detail your thoughts for my understanding.
          Pointing out that the symbol objects themselves exist independently of packages is a helpful clarification, and I ought to have expressed that in the post. The discussion of gensym and read-macros in another post hints at this, but certainly doesn’t make it obvious. It is a bit imprecise of me to say that a package “contains” symbols, when, as you say, its particular function is to allow the lookup of symbol objects by their names. I will update the text of the post later today to make this more accurate.
          I do disagree with your objection that the reader doesn’t “query”, as I take that to be the function of find-symbol, but that’s a minor semantic point.
          Thank you again for your helpful comment, and for taking the time to express it for my understanding.
