11. Collections

Like most programming languages, Common Lisp provides standard data types that collect multiple values into a single object. Every language slices up the collection problem a little bit differently, but the basic collection types usually boil down to an integer-indexed array type and a table type that can be used to map more or less arbitrary keys to values. The former are variously called arrays, lists, or tuples; the latter go by the names hash tables, associative arrays, maps, and dictionaries.

Lisp is, of course, famous for its list data structure, and most Lisp books, following the ontogeny-recapitulates-phylogeny principle of language instruction, start their discussion of Lisp's collections with lists. However, that approach often leads readers to the mistaken conclusion that lists are Lisp's only collection type. To make matters worse, because Lisp's lists are such a flexible data structure, it is possible to use them for many of the things arrays and hash tables are used for in other languages. But it's a mistake to focus too much on lists; while they're a crucial data structure for representing Lisp code as Lisp data, in many situations other data structures are more appropriate.

To keep lists from stealing the show, in this chapter I'll focus on Common Lisp's other collection types: vectors and hash tables.[119] However, vectors and lists share enough characteristics that Common Lisp treats them both as subtypes of a more general abstraction, the sequence. Thus, you can use many of the functions I'll discuss in this chapter with both vectors and lists.

Vectors

Vectors are Common Lisp's basic integer-indexed collection, and they come in two flavors. Fixed-size vectors are a lot like arrays in a language such as Java: a thin veneer over a chunk of contiguous memory that holds the vector's elements.[120] Resizable vectors, on the other hand, are more like arrays in Perl or Ruby, lists in Python, or the ArrayList class in Java: they abstract the actual storage, allowing the vector to grow and shrink as elements are added and removed.

You can make fixed-size vectors containing specific values with the function

VECTOR
, which takes any number of arguments and returns a freshly allocated fixed-size vector containing those arguments.

(vector)     ==> #()

(vector 1)   ==> #(1)

(vector 1 2) ==> #(1 2)

The

#(...)
syntax is the literal notation for vectors used by the Lisp printer and reader. This syntax allows you to save and restore vectors by
PRINT
ing them out and
READ
ing them back in. You can use the
#(...)
syntax to include literal vectors in your code, but as the effects of modifying literal objects aren't defined, you should always use
VECTOR
or the more general function
MAKE-ARRAY
to create vectors you plan to modify.

MAKE-ARRAY
is more general than
VECTOR
since you can use it to create arrays of any dimensionality as well as both fixed-size and resizable vectors. The one required argument to
MAKE-ARRAY
is a list containing the dimensions of the array. Since a vector is a one-dimensional array, this list will contain one number, the size of the vector. As a convenience,
MAKE-ARRAY
will also accept a plain number in the place of a one-item list. With no other arguments,
MAKE-ARRAY
will create a vector with uninitialized elements that must be set before they can be accessed.[121] To create a vector with the elements all set to a particular value, you can pass an
:initial-element
argument. Thus, to make a five-element vector with its elements initialized to
NIL
, you can write the following:

(make-array 5 :initial-element nil) ==> #(NIL NIL NIL NIL NIL)

MAKE-ARRAY
is also the function to use to make a resizable vector. A resizable vector is a slightly more complicated object than a fixed-size vector; in addition to keeping track of the memory used to hold the elements and the number of slots available, a resizable vector also keeps track of the number of elements actually stored in the vector. This number is stored in the vector's fill pointer, so called because it's the index of the next position to be filled when you add an element to the vector.

To make a vector with a fill pointer, you pass

MAKE-ARRAY
a
:fill-pointer
argument. For instance, the following call to
MAKE-ARRAY
makes a vector with room for five elements; but it looks empty because the fill pointer is zero:

(make-array 5 :fill-pointer 0) ==> #()

To add an element to the end of a resizable vector, you can use the function

VECTOR-PUSH
. It adds the element at the current value of the fill pointer and then increments the fill pointer by one, returning the index where the new element was added. The function
VECTOR-POP
returns the most recently pushed item, decrementing the fill pointer in the process.

(defparameter *x* (make-array 5 :fill-pointer 0))


(vector-push 'a *x*) ==> 0

*x*                  ==> #(A)

(vector-push 'b *x*) ==> 1

*x*                  ==> #(A B)

(vector-push 'c *x*) ==> 2

*x*                  ==> #(A B C)

(vector-pop *x*)     ==> C

*x*                  ==> #(A B)

(vector-pop *x*)     ==> B

*x*                  ==> #(A)

(vector-pop *x*)     ==> A

*x*                  ==> #()

However, even a vector with a fill pointer isn't completely resizable. The vector

*x*
can hold at most five elements. To make an arbitrarily resizable vector, you need to pass
MAKE-ARRAY
another keyword argument:
:adjustable
.

(make-array 5 :fill-pointer 0 :adjustable t) ==> #()

This call makes an adjustable vector whose underlying memory can be resized as needed. To add elements to an adjustable vector, you use

VECTOR-PUSH-EXTEND
, which works just like
VECTOR-PUSH
except it will automatically expand the array if you try to push an element onto a full vector—one whose fill pointer is equal to the size of the underlying storage.[122]

Subtypes of Vector

All the vectors you've dealt with so far have been general vectors that can hold any type of object. It's also possible to create specialized vectors that are restricted to holding certain types of elements. One reason to use specialized vectors is they may be stored more compactly and can provide slightly faster access to their elements than general vectors. However, for the moment let's focus on a couple kinds of specialized vectors that are important data types in their own right.

One of these you've seen already—strings are vectors specialized to hold characters. Strings are important enough to get their own read/print syntax (double quotes) and the set of string-specific functions I discussed in the previous chapter. But because they're also vectors, all the functions I'll discuss in the next few sections that take vector arguments can also be used with strings. These functions will fill out the string library with functions for things such as searching a string for a substring, finding occurrences of a character within a string, and more.

Literal strings, such as

"foo"
, are like literal vectors written with the
#()
syntax—their size is fixed, and they must not be modified. However, you can use
MAKE-ARRAY
to make resizable strings by adding another keyword argument,
:element-type
. This argument takes a type descriptor. I won't discuss all the possible type descriptors you can use here; for now it's enough to know you can create a string by passing the symbol
CHARACTER
as the
:element-type
argument. Note that you need to quote the symbol to prevent it from being treated as a variable name. For example, to make an initially empty but resizable string, you can write this:

(make-array 5 :fill-pointer 0 :adjustable t :element-type 'character)  ""

Bit vectors—vectors whose elements are all zeros or ones—also get some special treatment. They have a special read/print syntax that looks like

#*00001111
and a fairly large library of functions, which I won't discuss, for performing bit-twiddling operations such as "anding" together two bit arrays. The type descriptor to pass as the
:element-type
to create a bit vector is the symbol
BIT
.

Vectors As Sequences

As mentioned earlier, vectors and lists are the two concrete subtypes of the abstract type sequence. All the functions I'll discuss in the next few sections are sequence functions; in addition to being applicable to vectors—both general and specialized—they can also be used with lists.

The two most basic sequence functions are

LENGTH
, which returns the length of a sequence, and
ELT
, which allows you to access individual elements via an integer index.
LENGTH
takes a sequence as its only argument and returns the number of elements it contains. For vectors with a fill pointer, this will be the value of the fill pointer.
ELT
, short for element, takes a sequence and an integer index between zero (inclusive) and the length of the sequence (exclusive) and returns the corresponding element.
ELT
will signal an error if the index is out of bounds. Like
LENGTH
,
ELT
treats a vector with a fill pointer as having the length specified by the fill pointer.

(defparameter *x* (vector 1 2 3))


(length *x*) ==> 3

(elt *x* 0)  ==> 1

(elt *x* 1)  ==> 2

(elt *x* 2)  ==> 3

(elt *x* 3)  ==> error

ELT
is also a
SETF
able place, so you can set the value of a particular element like this:

(setf (elt *x* 0) 10)


*x* ==> #(10 2 3)

Sequence Iterating Functions

While in theory all operations on sequences boil down to some combination of

LENGTH
,
ELT
, and
SETF
of
ELT
operations, Common Lisp provides a large library of sequence functions.

One group of sequence functions allows you to express certain operations on sequences such as finding or filtering specific elements without writing explicit loops. Table 11-1 summarizes them.

Table 11-1.Basic Sequence Functions

Name Required Arguments Returns
COUNT
Item and sequence Number of times item appears in sequence
FIND
Item and sequence Item or
NIL
POSITION
Item and sequence Index into sequence or
NIL
REMOVE
Item and sequence Sequence with instances of item removed
SUBSTITUTE
New item, item, and sequence Sequence with instances of item replaced with new item

Here are some simple examples of how to use these functions:

(count 1 #(1 2 1 2 3 1 2 3 4))         ==> 3

(remove 1 #(1 2 1 2 3 1 2 3 4))        ==> #(2 2 3 2 3 4)

(remove 1 '(1 2 1 2 3 1 2 3 4))        ==> (2 2 3 2 3 4)

(remove #\a "foobarbaz")               ==> "foobrbz"

(substitute 10 1 #(1 2 1 2 3 1 2 3 4)) ==> #(10 2 10 2 3 10 2 3 4)

(substitute 10 1 '(1 2 1 2 3 1 2 3 4)) ==> (10 2 10 2 3 10 2 3 4)

(substitute #\x #\b "foobarbaz")       ==> "fooxarxaz"

(find 1 #(1 2 1 2 3 1 2 3 4))          ==> 1

(find 10 #(1 2 1 2 3 1 2 3 4))         ==> NIL

(position 1 #(1 2 1 2 3 1 2 3 4))      ==> 0

Note how

REMOVE
and
SUBSTITUTE
always return a sequence of the same type as their sequence argument.

You can modify the behavior of these five functions in a variety of ways using keyword arguments. For instance, these functions, by default, look for elements in the sequence that are the same object as the item argument. You can change this in two ways: First, you can use the

:test
keyword to pass a function that accepts two arguments and returns a boolean. If provided, it will be used to compare item to each element instead of the default object equality test,
EQL
.[123] Second, with the
:key
keyword you can pass a one-argument function to be called on each element of the sequence to extract a key value, which will then be compared to the item in the place of the element itself. Note, however, that functions such as
FIND
that return elements of the sequence continue to return the actual element, not just the extracted key.

(count "foo" #("foo" "bar" "baz") :test #'string=)    ==> 1

(find 'c #((a 10) (b 20) (c 30) (d 40)) :key #'first) ==> (C 30)

To limit the effects of these functions to a particular subsequence of the sequence argument, you can provide bounding indices with

:start
and
:end
arguments. Passing
NIL
for
:end
or omitting it is the same as specifying the length of the sequence.[124]

If a non-

NIL :from-end
argument is provided, then the elements of the sequence will be examined in reverse order. By itself
:from-end
can affect the results of only
FIND
and
POSITION
. For instance:

(find 'a #((a 10) (b 20) (a 30) (b 40)) :key #'first)             ==> (A 10)

(find 'a #((a 10) (b 20) (a 30) (b 40)) :key #'first :from-end t) ==> (A 30)

However, the

:from-end
argument can affect
REMOVE
and
SUBSTITUTE
in conjunction with another keyword parameter,
:count
, that's used to specify how many elements to remove or substitute. If you specify a
:count
lower than the number of matching elements, then it obviously matters which end you start from:

(remove #\a "foobarbaz" :count 1)             ==> "foobrbaz"

(remove #\a "foobarbaz" :count 1 :from-end t) ==> "foobarbz"

And while

:from-end
can't change the results of the
COUNT
function, it does affect the order the elements are passed to any
:test
and
:key
functions, which could possibly have side effects. For example:

CL-USER> (defparameter *v* #((a 10) (b 20) (a 30) (b 40)))

*V*

CL-USER> (defun verbose-first (x) (format t "Looking at ~s~%" x) (first x))

VERBOSE-FIRST

CL-USER> (count 'a *v* :key #'verbose-first)

Looking at (A 10)

Looking at (B 20)

Looking at (A 30)

Looking at (B 40)

2

CL-USER> (count 'a *v* :key #'verbose-first :from-end t)

Looking at (B 40)

Looking at (A 30)

Looking at (B 20)

Looking at (A 10)

2

Table 11-2 summarizes these arguments.

Table 11-2. Standard Sequence Function Keyword Arguments

Argument Meaning Default
:test
Two-argument function used to compare item (or value extracted by
:key
function) to element.
EQL
:key
One-argument function to extract key value from actual sequence element.
NIL
means use element as is.
NIL
:start
Starting index (inclusive) of subsequence. 0
:end
Ending index (exclusive) of subsequence.
NIL
indicates end of sequence.
NIL
:from-end
If true, the sequence will be traversed in reverse order, from end to start.
NIL
:count
Number indicating the number of elements to remove or substitute or
NIL
to indicate all (
REMOVE
and
SUBSTITUTE
only).
NIL

Higher-Order Function Variants

For each of the functions just discussed, Common Lisp provides two higher-order function variants that, in the place of the item argument, take a function to be called on each element of the sequence. One set of variants are named the same as the basic function with an

-IF
appended. These functions count, find, remove, and substitute elements of the sequence for which the function argument returns true. The other set of variants are named with an
-IF-NOT
suffix and count, find, remove, and substitute elements for which the function argument does not return true.

(count-if #'evenp #(1 2 3 4 5))         ==> 2


(count-if-not #'evenp #(1 2 3 4 5))     ==> 3


(position-if #'digit-char-p "abcd0001") ==> 4


(remove-if-not #'(lambda (x) (char= (elt x 0) #\f))

  #("foo" "bar" "baz" "foom")) ==> #("foo" "foom")

According to the language standard, the

-IF-NOT
variants are deprecated. However, that deprecation is generally considered to have itself been ill-advised. If the standard is ever revised, it's more likely the deprecation will be removed than the
-IF-NOT
functions. For one thing, the
REMOVE-IF-NOT
variant is probably used more often than
REMOVE-IF
. Despite its negative-sounding name,
REMOVE-IF-NOT
is actually the positive variant—it returns the elements that do satisfy the predicate.[125]

The

-IF
and
-IF-NOT
variants accept all the same keyword arguments as their vanilla counterparts except for
:test
, which isn't needed since the main argument is already a function.[126] With a
:key
argument, the value extracted by the
:key
function is passed to the function instead of the actual element.

(count-if #'evenp #((1 a) (2 b) (3 c) (4 d) (5 e)) :key #'first)     ==> 2


(count-if-not #'evenp #((1 a) (2 b) (3 c) (4 d) (5 e)) :key #'first) ==> 3


(remove-if-not #'alpha-char-p

  #("foo" "bar" "1baz") :key #'(lambda (x) (elt x 0))) ==> #("foo" "bar")

The

REMOVE
family of functions also support a fourth variant,
REMOVE-DUPLICATES
, that has only one required argument, a sequence, from which it removes all but one instance of each duplicated element. It takes the same keyword arguments as
REMOVE
, except for
:count
, since it always removes all duplicates.

(remove-duplicates #(1 2 1 2 3 1 2 3 4)) ==> #(1 2 3 4)

Whole Sequence Manipulations

A handful of functions perform operations on a whole sequence (or sequences) at a time. These tend to be simpler than the other functions I've described so far. For instance,

COPY-SEQ
and
REVERSE
each take a single argument, a sequence, and each returns a new sequence of the same type. The sequence returned by
COPY-SEQ
contains the same elements as its argument while the sequence returned by
REVERSE
contains the same elements but in reverse order. Note that neither function copies the elements themselves—only the returned sequence is a new object.

The

CONCATENATE
function creates a new sequence containing the concatenation of any number of sequences. However, unlike
REVERSE
and
COPY-SEQ
, which simply return a sequence of the same type as their single argument,
CONCATENATE
must be told explicitly what kind of sequence to produce in case the arguments are of different types. Its first argument is a type descriptor, like the
:element-type
argument to
MAKE-ARRAY
. In this case, the type descriptors you'll most likely use are the symbols
VECTOR
,
LIST
, or
STRING
.[127] For example:

(concatenate 'vector #(1 2 3) '(4 5 6))    ==> #(1 2 3 4 5 6)

(concatenate 'list #(1 2 3) '(4 5 6))      ==> (1 2 3 4 5 6)

(concatenate 'string "abc" '(#\d #\e #\f)) ==> "abcdef"

Sorting and Merging

The functions

SORT
and
STABLE-SORT
provide two ways of sorting a sequence. They both take a sequence and a two-argument predicate and return a sorted version of the sequence.

(sort (vector "foo" "bar" "baz") #'string<) ==> #("bar" "baz" "foo")

The difference is that

STABLE-SORT
is guaranteed to not reorder any elements considered equivalent by the predicate while
SORT
guarantees only that the result is sorted and may reorder equivalent elements.

Both these functions are examples of what are called destructive functions. Destructive functions are allowed—typically for reasons of efficiency—to modify their arguments in more or less arbitrary ways. This has two implications: one, you should always do something with the return value of these functions (such as assign it to a variable or pass it to another function), and, two, unless you're done with the object you're passing to the destructive function, you should pass a copy instead. I'll say more about destructive functions in the next chapter.

Typically you won't care about the unsorted version of a sequence after you've sorted it, so it makes sense to allow

SORT
and
STABLE-SORT
to destroy the sequence in the course of sorting it. But it does mean you need to remember to write the following:[128]

(setf my-sequence (sort my-sequence #'string<))

rather than just this:

(sort my-sequence #'string<)

Both these functions also take a keyword argument,

:key
, which, like the
:key
argument in other sequence functions, should be a function and will be used to extract the values to be passed to the sorting predicate in the place of the actual elements. The extracted keys are used only to determine the ordering of elements; the sequence returned will contain the actual elements of the argument sequence.

The

MERGE
function takes two sequences and a predicate and returns a sequence produced by merging the two sequences, according to the predicate. It's related to the two sorting functions in that if each sequence is already sorted by the same predicate, then the sequence returned by
MERGE
will also be sorted. Like the sorting functions,
MERGE
takes a
:key
argument. Like
CONCATENATE
, and for the same reason, the first argument to
MERGE
must be a type descriptor specifying the type of sequence to produce.

(merge 'vector #(1 3 5) #(2 4 6) #'<) ==> #(1 2 3 4 5 6)

(merge 'list #(1 3 5) #(2 4 6) #'<)   ==> (1 2 3 4 5 6)

Subsequence Manipulations

Another set of functions allows you to manipulate subsequences of existing sequences. The most basic of these is

SUBSEQ
, which extracts a subsequence starting at a particular index and continuing to a particular ending index or the end of the sequence. For instance:

(subseq "foobarbaz" 3)   ==> "barbaz"

(subseq "foobarbaz" 3 6) ==> "bar"

SUBSEQ
is also
SETF
able, but it won't extend or shrink a sequence; if the new value and the subsequence to be replaced are different lengths, the shorter of the two determines how many characters are actually changed.

(defparameter *x* (copy-seq "foobarbaz"))


(setf (subseq *x* 3 6) "xxx")  ; subsequence and new value are same length

*x* ==> "fooxxxbaz"


(setf (subseq *x* 3 6) "abcd") ; new value too long, extra character ignored.

*x* ==> "fooabcbaz"


(setf (subseq *x* 3 6) "xx")   ; new value too short, only two characters changed

*x* ==> "fooxxcbaz"

You can use the

FILL
function to set multiple elements of a sequence to a single value. The required arguments are a sequence and the value with which to fill it. By default every element of the sequence is set to the value;
:start
and
:end
keyword arguments can limit the effects to a given subsequence.

If you need to find a subsequence within a sequence, the

SEARCH
function works like
POSITION
except the first argument is a sequence rather than a single item.

(position #\b "foobarbaz") ==> 3

(search "bar" "foobarbaz") ==> 3

On the other hand, to find where two sequences with a common prefix first diverge, you can use the

MISMATCH
function. It takes two sequences and returns the index of the first pair of mismatched elements.

(mismatch "foobarbaz" "foom") ==> 3

It returns

NIL
if the strings match.
MISMATCH
also takes many of the standard keyword arguments: a
:key
argument for specifying a function to use to extract the values to be compared; a
:test
argument to specify the comparison function; and
:start1
,
:end1
,
:start2
, and
:end2
arguments to specify subsequences within the two sequences. And a
:from-end
argument of
T
specifies the sequences should be searched in reverse order, causing
MISMATCH
to return the index, in the first sequence, where whatever common suffix the two sequences share begins.

(mismatch "foobar" "bar" :from-end t) ==> 3

Sequence Predicates

Four other handy functions are

EVERY
,
SOME
,
NOTANY
, and
NOTEVERY
, which iterate over sequences testing a boolean predicate. The first argument to all these functions is the predicate, and the remaining arguments are sequences. The predicate should take as many arguments as the number of sequences passed. The elements of the sequences are passed to the predicate—one element from each sequence—until one of the sequences runs out of elements or the overall termination test is met:
EVERY
terminates, returning false, as soon as the predicate fails. If the predicate is always satisfied, it returns true.
SOME
returns the first non-
NIL
value returned by the predicate or returns false if the predicate is never satisfied.
NOTANY
returns false as soon as the predicate is satisfied or true if it never is. And
NOTEVERY
returns true as soon as the predicate fails or false if the predicate is always satisfied. Here are some examples of testing just one sequence:

(every #'evenp #(1 2 3 4 5))    ==> NIL

(some #'evenp #(1 2 3 4 5))     ==> T

(notany #'evenp #(1 2 3 4 5))   ==> NIL

(notevery #'evenp #(1 2 3 4 5)) ==> T

These calls compare elements of two sequences pairwise:

(every #'> #(1 2 3 4) #(5 4 3 2))    ==> NIL

(some #'> #(1 2 3 4) #(5 4 3 2))     ==> T

(notany #'> #(1 2 3 4) #(5 4 3 2))   ==> NIL

(notevery #'> #(1 2 3 4) #(5 4 3 2)) ==> T

Sequence Mapping Functions

Finally, the last of the sequence functions are the generic mapping functions.

MAP
, like the sequence predicate functions, takes a n-argument function and n sequences. But instead of a boolean value,
MAP
returns a new sequence containing the result of applying the function to subsequent elements of the sequences. Like
CONCATENATE
and
MERGE
,
MAP
needs to be told what kind of sequence to create.

(map 'vector #'* #(1 2 3 4 5) #(10 9 8 7 6)) ==> #(10 18 24 28 30)

MAP-INTO
is like
MAP
except instead of producing a new sequence of a given type, it places the results into a sequence passed as the first argument. This sequence can be the same as one of the sequences providing values for the function. For instance, to sum several vectors—
a
,
b
, and
c
—into one, you could write this:

(map-into a #'+ a b c)

If the sequences are different lengths,

MAP-INTO
affects only as many elements as are present in the shortest sequence, including the sequence being mapped into. However, if the sequence being mapped into is a vector with a fill pointer, the number of elements affected isn't limited by the fill pointer but rather by the actual size of the vector. After a call to
MAP-INTO
, the fill pointer will be set to the number of elements mapped.
MAP-INTO
won't, however, extend an adjustable vector.

The last sequence function is

REDUCE
, which does another kind of mapping: it maps over a single sequence, applying a two-argument function first to the first two elements of the sequence and then to the value returned by the function and subsequent elements of the sequence. Thus, the following expression sums the numbers from one to ten:

(reduce #'+ #(1 2 3 4 5 6 7 8 9 10)) ==> 55

REDUCE
is a surprisingly useful function—whenever you need to distill a sequence down to a single value, chances are you can write it with
REDUCE
, and it will often be quite a concise way to express what you want. For instance, to find the maximum value in a sequence of numbers, you can write
(reduce #'max numbers)
.
REDUCE
also takes a full complement of keyword arguments (
:key
,
:from-end
,
:start
, and
:end
) and one unique to
REDUCE
(
:initial-value
). The latter specifies a value that's logically placed before the first element of the sequence (or after the last if you also specify a true
:from-end
argument).

Hash Tables

The other general-purpose collection provided by Common Lisp is the hash table. Where vectors provide an integer-indexed data structure, hash tables allow you to use arbitrary objects as the indexes, or keys. When you add a value to a hash table, you store it under a particular key. Later you can use the same key to retrieve the value. Or you can associate a new value with the same key—each key maps to a single value.

With no arguments

MAKE-HASH-TABLE
makes a hash table that considers two keys equivalent if they're the same object according to
EQL
. This is a good default unless you want to use strings as keys, since two strings with the same contents aren't necessarily
EQL
. In that case you'll want a so-called
EQUAL
hash table, which you can get by passing the symbol
EQUAL
as the
:test
keyword argument to
MAKE-HASH-TABLE
. Two other possible values for the
:test
argument are the symbols
EQ
and
EQUALP
. These are, of course, the names of the standard object comparison functions, which I discussed in Chapter 4. However, unlike the
:test
argument passed to sequence functions,
MAKE-HASH-TABLE
's
:test
can't be used to specify an arbitrary function—only the values
EQ
,
EQL
,
EQUAL
, and
EQUALP
. This is because hash tables actually need two functions, an equivalence function and a hash function that computes a numerical hash code from the key in a way compatible with how the equivalence function will ultimately compare two keys. However, although the language standard provides only for hash tables that use the standard equivalence functions, most implementations provide some mechanism for defining custom hash tables.

The

GETHASH
function provides access to the elements of a hash table. It takes two arguments—a key and the hash table—and returns the value, if any, stored in the hash table under that key or
NIL
.[129] For example:

(defparameter *h* (make-hash-table))


(gethash 'foo *h*) ==> NIL


(setf (gethash 'foo *h*) 'quux)


(gethash 'foo *h*) ==> QUUX

Since

GETHASH
returns
NIL
if the key isn't present in the table, there's no way to tell from the return value the difference between a key not being in a hash table at all and being in the table with the value
NIL
.
GETHASH
solves this problem with a feature I haven't discussed yet—multiple return values.
GETHASH
actually returns two values; the primary value is the value stored under the given key or
NIL
. The secondary value is a boolean indicating whether the key is present in the hash table. Because of the way multiple values work, the extra return value is silently discarded unless the caller explicitly handles it with a form that can "see" multiple values.

I'll discuss multiple return values in greater detail in Chapter 20, but for now I'll give you a sneak preview of how to use the

MULTIPLE-VALUE-BIND
macro to take advantage of
GETHASH
's extra return value.
MULTIPLE-VALUE-BIND
creates variable bindings like
LET
does, filling them with the multiple values returned by a form.

The following function shows how you might use

MULTIPLE-VALUE-BIND
; the variables it binds are
value
and
present
:

(defun show-value (key hash-table)

  (multiple-value-bind (value present) (gethash key hash-table)

    (if present

      (format nil "Value ~a actually present." value)

      (format nil "Value ~a because key not found." value))))


(setf (gethash 'bar *h*) nil) ; provide an explicit value of NIL


(show-value 'foo *h*) ==> "Value QUUX actually present."

(show-value 'bar *h*) ==> "Value NIL actually present."

(show-value 'baz *h*) ==> "Value NIL because key not found."

Since setting the value under a key to

NIL
leaves the key in the table, you'll need another function to completely remove a key/value pair.
REMHASH
takes the same arguments as
GETHASH
and removes the specified entry. You can also completely clear a hash table of all its key/value pairs with
CLRHASH
.

Hash Table Iteration

Common Lisp provides a couple ways to iterate over the entries in a hash table. The simplest of these is via the function

MAPHASH
. Analogous to the
MAP
function,
MAPHASH
takes a two-argument function and a hash table and invokes the function once for each key/value pair in the hash table. For instance, to print all the key/value pairs in a hash table, you could use
MAPHASH
like this:

(maphash #'(lambda (k v) (format t "~a => ~a~%" k v)) *h*)

The consequences of adding or removing elements from a hash table while iterating over it aren't specified (and are likely to be bad) with two exceptions: you can use

SETF
with
GETHASH
to change the value of the current entry, and you can use
REMHASH
to remove the current entry. For instance, to remove all the entries whose value is less than ten, you could write this:

(maphash #'(lambda (k v) (when (< v 10) (remhash k *h*))) *h*)

The other way to iterate over a hash table is with the extended

LOOP
macro, which I'll discuss in Chapter 22.[130] The
LOOP
equivalent of the first
MAPHASH
expression would look like this:

(loop for k being the hash-keys in *h* using (hash-value v)

  do (format t "~a => ~a~%" k v))

I could say a lot more about the nonlist collections supported by Common Lisp. For instance, I haven't discussed multidimensional arrays at all or the library of functions for manipulating bit arrays. However, what I've covered in this chapter should suffice for most of your general-purpose programming needs. Now it's finally time to look at Lisp's eponymous data structure: lists.

Загрузка...