If you have been using Emacs for a while and have been taking advantage of some of its more advanced features, chances are that you have thought of something useful that Emacs doesn't do. Although Emacs has hundreds of built-in commands, dozens of packages and modes, and so on, everyone eventually runs into some functionality that Emacs doesn't have. Whatever feature you find missing, you can program using Emacs Lisp.
Before you dive in, however, note that this chapter is not for everyone. It is intended for people who have already become comfortable using Emacs and who have a fair bit of programming experience, though not necessarily with Lisp per se. If you have no such experience, you may want to skip this chapter; if there is something specific you would like Emacs to do, you might try to find a friendly Emacs Lisp hacker to help you write the necessary code. Or, if you're a little adventurous, you could skim enough to find the file-template example and learn how to install it—it gives you some useful features.
Readers who are building their Lisp skills but don't necessarily want to read the whole chapter might also want to look for the "Treasure Trove of Examples" section in the middle for a useful tool that can help jumpstart their exploration of the Emacs libraries.
Note that we do not cover Lisp in its entirety in this chapter. That would require another large, dense book. Instead, we cover the basics of the language and other features that are often useful in writing Emacs code. If you wish to go beyond this chapter, refer to the GNU Emacs Lisp Reference Manual, distributed with Emacs (choose Help → More Manuals → Introduction to Lisp and Emacs Lisp Reference) for details about the specific Lisp features in Emacs. You may also turn to any of the various Lisp textbooks[73] available for a solid grounding in the language itself.
Emacs Lisp is a full-blown Lisp implementation;[74] thus it is more than the usual macro or script language found in many text editors. (One of the authors has written a small expert system entirely in Emacs Lisp.) In fact, you could even think of Emacs itself as a Lisp system with lots of built-in functions, many of which happen to pertain to text manipulation, window management, file I/O, and other features useful to text editing. The source code for Emacs, written in C, implements the Lisp interpreter, Lisp primitives, and only the most basic commands for text editing; a large layer of built-in Lisp code and libraries on top of that implements the rest of Emacs's functionality. A current version of Emacs comes with close to 250,000 lines of Lisp.
This chapter starts with an introduction to the aspects of Lisp that resemble common programming languages like Java and Perl. These features are enough to enable you to write many Emacs commands. Then we deal with how to interface Lisp code with Emacs so that the functions you write can become Emacs commands. We will see various built-in Lisp functions that are useful for writing your own Emacs commands, including those that use regular expressions; we give an explanation of regular expressions that extends the introduction in Chapter 3 and is oriented toward Lisp programming. We then return to the basics of Lisp for a little while, covering the unique features of the language that have to do with lists, and show how this chapter's concepts fit together by presenting a file template system you can install and use in your own programming or writing projects.
Finally we show you how to program a simple major mode, illustrating that this "summit" of Emacs Lisp programming isn't so hard to scale. After that, you will see how easy it is to customize Emacs's built-in major modes without having to change (or even look at) the code that implements them. We finish the chapter by describing how to build your own library of Lisp packages.
You may have heard of Lisp as a language for artificial intelligence (AI). If you aren't into AI, don't worry. Lisp may have an unusual syntax, but many of its basic features are just like those of more conventional languages you may have seen, such as Java or Perl. We emphasize such features in this chapter. After introducing the basic Lisp concepts, we proceed by building up various example functions that you can actually use in Emacs. In order to try out the examples, you should be familiar with Emacs Lisp mode and Lisp interaction mode, which were discussed in Chapter 9.
The basic elements in Lisp you need to be familiar with are functions, variables, and atoms. Functions are the only program units in Lisp; they cover the notions of procedures, subroutines, programs, and even operators in other languages.
Functions are defined as lists of the above entities, usually as lists of calls to other, existing functions. All functions have return values (as with Perl functions and non-void Java methods); a function's return value is simply the value of the last item in the list, usually the value returned by the last function called. A function call within another function is equivalent to a statement in other languages, and we use statement interchangeably with function call in this chapter. Here is the syntax for a function call:
(function-name argument1 argument2 ...)
which is equivalent to this:
method_name (argument1, argument2, ...);
in Java. This syntax is used for all functions, including those equivalent to arithmetic or comparison operators in other languages. For example, in order to add 2 and 4 in Java or Perl, you would use the expression 2 + 4, whereas in Lisp you would use the following:
(+ 2 4)
Similarly, where you would use 4 >= 2 (greater than or equal to), the Lisp equivalent is:
(>= 4 2)
Variables in Lisp are similar to those in any other language, except that they do not have types. A Lisp variable can assume any type of value (values themselves do have types, but variables don't impose restrictions on what they can hold).
Atoms are values of any type, including integers, floating point (real) numbers, characters, strings, Boolean truth values, symbols, and special Emacs types such as buffers, windows, and processes. The syntax for various kinds of atoms is:
• Integers are what you would expect: signed whole numbers in the range -2^27 to 2^27 - 1.
• Floating point numbers are real numbers that you can represent with decimal points and scientific notation (with lowercase "e" for the power of 10). For example, the number 5489 can be written 5489, 5.489e3, 548.9e1, and so on.
• Characters are preceded by a question mark, for example, ?a. Esc, Newline, and Tab are abbreviated \e, \n, and \t respectively; other control characters are denoted with the prefix \C-, so that (for example) C-a is denoted as ?\C-a.[75]
• Strings are surrounded by double quotes; quote marks and backslashes within strings need to be preceded by a backslash. For example, "Jane said, \"See Dick run.\"" is a legal string. Strings can be split across multiple lines without any special syntax. Everything until the closing quote, including all the line breaks, is part of the string value.
• Booleans use t for true and nil for false, though most of the time, if a Boolean value is expected, any non-nil value is assumed to mean true. nil is also used as a null or nonvalue in various situations, as we will see.
• Symbols are names of things in Lisp, for example, names of variables or functions. Sometimes it is important to refer to the name of something instead of its value, and this is done by preceding the name with a single quote ('). For example, the define-key function, described in Chapter 10, uses the name of the command (as a symbol) rather than the command itself.
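To make the atom syntax concrete, here is a minimal sketch of forms you could evaluate one at a time. The predicates integerp, floatp, stringp, and symbolp are standard Emacs Lisp functions that this section does not otherwise discuss, and the character codes shown assume the usual ASCII values.

```elisp
;; Each form is an atom or a check on one; evaluate them one at a time.
(integerp 5489)            ; t: an integer
(floatp 5.489e3)           ; t: scientific notation yields a float
(= 5489 5.489e3)           ; t: both spell the same number
?a                         ; 97: a character is represented by its integer code
(= ?\C-a 1)                ; t: C-a is control character number 1
(stringp "Jane said, \"See Dick run.\"")  ; t: escaped quotes inside a string
(symbolp 'auto-save-interval)             ; t: the quote yields the name itself
```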
A simple example that ties many of these basic Lisp concepts together is the function setq.[76] As you may have figured out from previous chapters, setq is a way of assigning values to variables, as in
(setq auto-save-interval 800)
Notice that setq is a function, unlike in other languages in which special syntax such as = or := is used for assignment. setq takes two arguments: a variable name and a value. In this example, the variable auto-save-interval (the number of keystrokes between auto-saves) is set to the value 800.
setq can actually be used to assign values to multiple variables, as in
(setq thisvar thisvalue
      thatvar thatvalue
      theothervar theothervalue)
The return value of setq is simply the last value assigned, in this case theothervalue. You can set the values of variables in other ways, as we'll see, but setq is the most widely applicable.
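As a small sketch of the multiple-assignment form, the following sets three variables in one call. The names my-first, my-second, and my-third are hypothetical and chosen only for illustration.

```elisp
;; my-first, my-second, and my-third are hypothetical variable names.
(setq my-first 1
      my-second "two"
      my-third (+ 1 2))   ; setq returns 3, the last value assigned
```

Evaluating my-second afterward yields "two", which shows that each variable received its own value.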
Now it's time for an example of a simple function definition. Start Emacs without any arguments; this puts you into the *scratch* buffer, an empty buffer in Lisp interaction mode (see Chapter 9), so that you can actually try this and subsequent examples.
Before we get to the example, however, some more comments on Lisp syntax are necessary. First, you will notice that the dash (-) is used as a "break" character to separate words in names of variables, functions, and so on. This practice is simply a widely used Lisp programming convention; thus the dash takes the place of the underscore (_) in languages like C and Ada. A more important issue has to do with all of the parentheses in Lisp code. Lisp is an old language that was designed before anyone gave much thought to language syntax (it was still considered amazing that you could use any language other than the native processor's binary instruction set), so its syntax is not exactly programmer-friendly. Yet Lisp's heavy use of lists, and thus its heavy use of parentheses, has its advantages, as we'll see toward the end of this chapter.
The main problem a programmer faces is how to keep all the parentheses balanced properly. Compounding this problem is the usual programming convention of putting multiple right parentheses at the end of a line, rather than the more readable technique of placing each right parenthesis directly below its matching left parenthesis. Your best defense against this is the support the Emacs Lisp modes give you, particularly the Tab key for proper indentation and the flash-matching-parenthesis feature.
Now we're ready for our example function. Suppose you are a student or journalist who needs to keep track of the number of words in a paper or story you are writing. Emacs has no built-in way of counting the number of words in a buffer, so we'll write a Lisp function that does the job:
1 (defun count-words-buffer ()
2   (let ((count 0))
3     (save-excursion
4       (goto-char (point-min))
5       (while (< (point) (point-max))
6         (forward-word 1)
7         (setq count (1+ count)))
8       (message "buffer contains %d words." count))))
Let's go through this function line by line and see what it does. (Of course, if you are trying this in Emacs, don't type the line numbers in.)
The defun on line 1 defines the function by its name and arguments. Notice that defun is itself a function—one that, when called, defines a new function. (defun returns the name of the function defined, as a symbol.) The function's arguments appear as a list of names inside parentheses; in this case, the function has no arguments. Arguments can be made optional by preceding them with the keyword &optional. If an argument is optional and not supplied when the function is called, its value is assumed to be nil.
Line 2 contains a let construct, whose general form is:
(let ((var1 value1) (var2 value2) ... )
  statement-block)
The first thing let does is define the variables var1, var2, etc., and set them to the initial values value1, value2, etc. Then let executes the statement block, which is a sequence of function calls or values, just like the body of a function.
It is useful to think of let as doing three things:
• Defining (or declaring) a list of variables
• Setting the variables to initial values, as if with setq
• Creating a block in which the variables are known; the let block is known as the scope of the variables
If a let is used to define a variable, its value can be reset later within the let block with setq. Furthermore, a variable defined with let can have the same name as a global variable; all setqs on that variable within the let block act on the local variable, leaving the global variable undisturbed. However, a setq on a variable that is not defined with a let affects the global environment. It is advisable to avoid using global variables as much as possible because their names might conflict with those of existing global variables and therefore your changes might have unexpected and inexplicable side effects later on.
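The shadowing behavior described above can be sketched as follows. The global variable my-demo-counter and the function my-demo-local-count are hypothetical names invented for this illustration.

```elisp
;; my-demo-counter is a hypothetical global variable used only for illustration.
(defvar my-demo-counter 10)

(defun my-demo-local-count ()
  "Return the value of a local variable that shadows `my-demo-counter'."
  (let ((my-demo-counter 0))                    ; local variable, same name as the global
    (setq my-demo-counter (1+ my-demo-counter)) ; changes only the local one
    my-demo-counter))                           ; the let block returns 1

(my-demo-local-count)   ; 1
my-demo-counter         ; still 10: the global was left undisturbed
```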
So, in our example function, we use let to define the local variable count and initialize it to 0. As we will see, this variable is used as a loop counter.
Lines 3 through 8 are the statements within the let block. The first of these calls the built-in Emacs function save-excursion, which is a way of being polite. The function is going to move the cursor around the buffer, so we don't want to disorient users by jumping them to a strange place in their file just because they asked for a word count. Calling save-excursion tells Emacs to remember the location of the cursor at the beginning of the function, and to go back there after executing any statements in its body. Notice how save-excursion provides a capability similar to let; you can think of it as a way of making the cursor location itself a local variable.
Line 4 calls goto-char. The argument to goto-char is a (nested) function call to the built-in function point-min. As we have mentioned before, point is Emacs's internal name for the position of the cursor, and we'll refer to the cursor as point throughout the remainder of this chapter. point-min returns the value of the first character position in the current buffer, which is almost always 1; then, goto-char is called with the value 1, which has the effect of moving point to the beginning of the buffer.
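The save-excursion and goto-char calls described above can be tried in isolation. This is a minimal sketch: my-first-line-length is a hypothetical helper, and line-end-position is a standard Emacs function not covered in this chapter that returns the position of the end of the current line.

```elisp
(defun my-first-line-length ()
  "Return the length of the buffer's first line without moving point.
A hypothetical helper, used only to illustrate `save-excursion'."
  (save-excursion
    (goto-char (point-min))              ; move point to the start of the buffer
    (- (line-end-position) (point))))    ; characters between here and end of line
```

If you evaluate (my-first-line-length) from anywhere in a buffer, point should still be where you left it afterward, because save-excursion restores it once the body has run.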
The next line sets up a while loop; Java and Perl have a similar construct. The while construct has the general form
(while condition statement-block)
Like let and save-excursion, while sets up another statement block. condition is a value (an atom, a variable, or a function returning a value). This value is tested; if it is nil, the condition is considered to be false, and the while loop terminates. If the value is other than nil, the condition is considered to be true, the statement block gets executed, the condition is tested again, and the process repeats.
Of course, it is possible to write an infinite loop. If you write a Lisp function with a while loop and try running it, and your Emacs session hangs, chances are that you have made this all-too-common mistake; just type C-g to abort it.
In our sample function, the condition is the function <, which is a less-than function with two arguments, analogous to the < operator in Java or Perl. The first argument is another function that returns the current character position of point; the second argument returns the maximum character position in the buffer, that is, the length of the buffer. The function < (like the other relational functions) returns a Boolean value, t or nil.
The loop's statement block consists of two statements. Line 6 moves point forward one word (i.e., as if you had typed M-f). Line 7 increments the loop counter by 1; the function 1+ is shorthand for (+ 1 variable-name). Notice that the third right parenthesis on line 7 matches the left parenthesis preceding while. So, the while loop causes Emacs to go through the current buffer a word at a time while counting the words.
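The same let, while, and setq pattern works for any counting loop, not just one that walks through a buffer. Here is a minimal sketch; my-sum-to is a hypothetical function, and <= is the standard less-than-or-equal comparison.

```elisp
(defun my-sum-to (n)
  "Return the sum of the integers from 1 to N (hypothetical example)."
  (let ((i 1)
        (sum 0))
    (while (<= i n)          ; a nil condition ends the loop
      (setq sum (+ sum i))   ; first statement of the statement block
      (setq i (1+ i)))       ; without this step the loop would never end
    sum))                    ; last value in the let block is returned

(my-sum-to 5)   ; 15
```

Note that (1+ i) only computes a value; it is the surrounding setq that stores the result back into the variable.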
The final statement in the function uses the built-in function message to print a message in the minibuffer saying how many words the buffer contains. The form of the message function will be familiar to C programmers. The first argument to message is a format string, which contains text and special formatting instructions of the form %x, where x is one of a few possible letters. For each of these instructions, in the order in which they appear in the format string, message reads the next argument and tries to interpret it according to the letter after the percent sign. Table 11-1 lists meanings for the letters in the format string.
Table 11-1. Message format strings
Format string | Meaning |
---|---|
%s | String or symbol |
%c | Character |
%d | Integer |
%e | Floating point in scientific notation |
%f | Floating point in decimal-point notation |
%g | Floating point in whichever format yields the shortest string |
For example:
(message "\"%s\" is a string, %d is a number, and %c is a character"
         "hi there" 142 ?q)
causes the message:
"hi there" is a string, 142 is a number, and q is a character
to appear in the minibuffer. This is analogous to the C code:
printf ("\"%s\" is a string, %d is a number, and %c is a character\n",
        "hi there", 142, 'q');
The floating-point-format characters are a bit more complicated. They assume a certain number of significant digits unless you tell them otherwise. For example, the following:
(message "This book was printed in %f, also known as %e." 2004 2004)
yields this:
This book was printed in 2004.000000, also known as 2.004000e+03.
But you can control the number of digits after the decimal point by inserting a period and the number of digits desired between the % and the e, f, or g. For example, this:
(message "This book was printed in %.3e, also known as %.0f." 2004 2004)
prints in the minibuffer:
This book was printed in 2.004e+03, also known as 2004.
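If you want to experiment with format strings without watching the minibuffer, the standard function format (not otherwise covered in this section) accepts the same formatting instructions as message but returns the resulting string instead of displaying it. A brief sketch:

```elisp
(format "%d words" 42)                    ; "42 words"
(format "%s and %s" "a string" 'a-symbol) ; "a string and a-symbol"
(format "%c" ?q)                          ; "q"
(format "%.2f" 3.14159)                   ; "3.14"
(format "%.3e" 2004.0)                    ; "2.004e+03"
```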
The count-words-buffer function that we've just finished works, but it still isn't as convenient to use as the Emacs commands you work with daily. If you have typed it in, try it yourself. First you need to get Emacs to evaluate the lines you typed in, thereby actually defining the function. To do this, move your cursor to just after the last closing parenthesis in the function and type C-j (or Linefeed)—the "evaluate" key in Lisp interaction mode—to tell Emacs to perform the function definition. You should see the name of the function appear again in the buffer; the return value of the defun function is the symbol that has been defined. (If instead you get an error message, double check that your function looks exactly like the example and that you haven't typed in the line numbers, and try again.)
Once the function is defined, you can execute it by typing (count-words-buffer) on its own line in your Lisp interaction window, and once again typing C-j after the closing parenthesis.
Now that you can execute the function correctly from a Lisp interaction window, try executing the function with M-x, as with any other Emacs command. Try typing M-x count-words-buffer Enter: you will get the error message [No match]. (You can type C-g to cancel this failed attempt.) You get this error message because you need to "register" a function with Emacs to make it available for interactive use. The function to do this is interactive, which has the form:
(interactive "prompt-string")
This statement should be the first in a function, that is, right after the line containing the defun and the documentation string (which we will cover shortly). Using interactive causes Emacs to register the function as a command and to prompt the user for the arguments declared in the defun statement. The prompt string is optional.
The prompt string has a special format: for each argument you want to prompt the user for, you provide a section of prompt string. The sections are separated by newlines (\n). The first letter of each section is a code for the type of argument you want. There are many choices; the most commonly used are listed in Table 11-2.
Table 11-2. Argument codes for interactive functions
Code | User is prompted for: |
---|---|
b | Name of an existing buffer |
e | Event (mouse action or function key press) |
f | Name of an existing file |
n | Number (integer) |
s | String |
Most of these have uppercase variations | |
B | Name of a buffer that may not exist |
F | Name of a file that may not exist |
N | Number, unless command is invoked with a prefix argument, in which case use the prefix argument and skip this prompt |
S | Symbol |
With the b and f options, Emacs signals an error if the buffer or file given does not already exist. Another useful option to interactive is r, which we will see later. There are many other option letters; consult the documentation for function interactive for the details. The rest of each section is the actual prompt that appears in the minibuffer.
The way interactive is used to fill in function arguments is somewhat complicated and best explained through an example. A simple example is in the function goto-percent, which we will see shortly. It contains the statement
(interactive "nPercent: ")
The n in the prompt string tells Emacs to prompt for an integer; the string Percent: appears in the minibuffer.
As a slightly more complicated example, let's say we want to write our own version of the replace-string command. Here's how we would do the prompting:
(defun replace-string (from to)
  (interactive "sReplace string: \nsReplace string %s with: ")
  ...)
The prompt string consists of two sections, sReplace string: and sReplace string %s with:, separated by a Newline. The initial s in each means that a string is expected; the %s is a formatting operator (as in the previous message function) that Emacs replaces with the user's response to the first prompt. When applying formatting operators in a prompt, it is as if message has been called with a list of all responses read so far, so the first formatting operator is applied to the first response, and so on.
When this command is invoked, first the prompt Replace string: appears in the minibuffer. Assume the user types fred in response. After the user presses Enter, the prompt Replace string fred with: appears. The user types the replacement string and presses Enter again.
The two strings the user types are used as values of the function arguments from and to (in that order), and the command runs to completion. Thus, interactive supplies values to the function's arguments in the order of the sections of the prompt string.
The use of interactive does not preclude calling the function from other Lisp code; in this case, the calling function needs to supply values for all arguments. For example, if we were interested in calling our version of replace-string from another Lisp function that needs to replace all occurrences of "Bill" with "Deb" in a file, we would use
(replace-string "Bill" "Deb")
The function is not being called interactively in this case, so the interactive statement has no effect; the argument from is set to "Bill," and to is set to "Deb."
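To see both calling styles with a complete command, here is a minimal sketch that mixes the s and n codes. The command my-insert-repeated is hypothetical and is not part of Emacs; insert is the standard function that inserts text at point.

```elisp
(defun my-insert-repeated (text times)
  "Insert TEXT into the current buffer TIMES times (hypothetical example)."
  ;; Two prompt sections separated by \n: a string, then a number.
  (interactive "sText to insert: \nnHow many times: ")
  (let ((i 0))
    (while (< i times)
      (insert text)
      (setq i (1+ i)))))
```

Typing M-x my-insert-repeated prompts for both arguments in order, whereas Lisp code supplies them directly, as in (my-insert-repeated "ab" 3), and no prompting takes place.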
Getting back to our count-words-buffer command: it has no arguments, so its interactive command does not need a prompt string. The final modification we want to make to our command is to add a documentation string (or doc string for short), which is shown by online help facilities such as describe-function (C-h f). Doc strings are normal Lisp strings; they are optional and can be arbitrarily many lines long, although, by convention, the first line is a terse, complete sentence summarizing the command's functionality. Remember that any double quotes inside a string need to be preceded by backslashes.
With all of the fixes taken into account, the complete function looks like this:
(defun count-words-buffer ()
  "Count the number of words in the current buffer;
print a message in the minibuffer with the result."
  (interactive)
  (save-excursion
    (let ((count 0))
      (goto-char (point-min))
      (while (< (point) (point-max))
        (forward-word 1)
        (setq count (1+ count)))
      (message "buffer contains %d words." count))))
Now that you've seen how to write a working command, we'll discuss Lisp's primitive functions. These are the building blocks from which you'll build your functions. As mentioned above, Lisp uses functions where other languages would use operators, that is, for arithmetic, comparison, and logic. Table 11-3 shows some Lisp primitive functions that are equivalent to these operators.
Table 11-3. Lisp primitive functions
Category | Functions |
---|---|
Arithmetic | +, -, *, / |
 | % (remainder) |
 | 1+ (increment) |
 | 1- (decrement) |
 | max, min |
Comparison | >, <, >=, <= |
 | /= (not equal) |
 | = (for numbers and characters) |
 | equal (for strings and other complex objects) |
Logic | and, or, not |
All the arithmetic functions except 1+, 1-, and % can take arbitrarily many arguments, as can and and or. An arithmetic function returns floating point values only if at least one argument is a floating point number, so for example, (/ 7.0 4) returns 1.75, and (/ 7 4) returns 1. Notice that integer division truncates the remainder.
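A few of these primitives in action, as a sketch you can evaluate one form at a time:

```elisp
(+ 1 2 3 4)           ; 10: arithmetic functions accept many arguments
(* 2 3.0)             ; 6.0: one float argument makes the result a float
(/ 7 4)               ; 1: integer division truncates
(/ 7.0 4)             ; 1.75
(% 7 4)               ; 3: remainder
(1+ 41)               ; 42
(max 3 9 2)           ; 9
(/= 3 4)              ; t: not equal
(equal "abc" "abc")   ; t: compares string contents
(and t nil)           ; nil
(or nil 5)            ; 5: or returns the first non-nil value
(not nil)             ; t
```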
It may seem inefficient or syntactically ugly to use functions for everything. However, one of the main merits of Lisp is that the core of the language is small and easy to interpret efficiently. In addition, the syntax is not as much of a problem if you have support tools such as Emacs's Lisp modes to help you.
We have seen that a statement block can be defined using the let function. We also saw that while and save-excursion include statement blocks. Other important constructs also define statement blocks: progn and other forms of let.
progn, the most basic, has the form:
(progn
  statement-block)
progn is a simple way of making a block of statements look like a single one, somewhat like the curly braces of Java or the begin and end of Pascal. The value returned by progn is the value returned by the last statement in the block. progn is especially useful with control structures like if (see the following discussion) that, unlike while, do not include statement blocks.
The let function has other forms as well. The simplest is:
(let (var1 var2 ...)
  statement-block)
In this case, instead of a list of (var value) pairs, there is simply a list of variable names. As with the other form of let, these become local variables accessible in the statement block. However, instead of initializing them to given values, they are all just initialized to nil. You can actually mix both forms within the same let statement, for example:
(let (var1 (var2 value2) var3 ...)
  statement-block)
In the form of let we saw first, the initial values for the local variables can be function calls (remember that all functions return values). All such functions are evaluated before any values are assigned to variables. However, there may be cases in which you want the values of some local variables to be available for computing the values of others. This is where let*, the final version of let, comes in. let* steps through its assignments in order, assigning each local variable a value before moving on to the next.
For example, let's say we want to write a function goto-percent that allows you to go to a place in the current buffer expressed as a percentage of the text in the buffer. Here is one way to write this function:
(defun goto-percent (pct)
  (interactive "nGoto percent: ")
  (let* ((size (point-max))
         (charpos (/ (* size pct) 100)))
    (goto-char charpos)))
As we saw earlier, the interactive function is used to prompt users for values of arguments. In this case, it prompts for the integer value of the argument pct. Then the let* function initializes size to the size of the buffer in characters, then uses that value to compute the character position charpos that is pct (percent) of the buffer's size. Finally, the call of goto-char causes point to be moved to that character position in the current window.
The important thing to notice is that if we had used let instead of let*, the value of size would not be available when computing the value of charpos. let* can also be used in the (var1 var2 ...) format, just like let, but there wouldn't be any point in doing so.
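The difference between let and let* is easiest to see when a local variable shares its name with a global one. In this sketch, my-demo-width and the two functions are hypothetical names invented for the comparison.

```elisp
;; my-demo-width is a hypothetical global variable used only for illustration.
(defvar my-demo-width 10)

(defun my-area-with-let ()
  "Compute an area using `let': initial values are evaluated before any binding."
  (let ((my-demo-width 4)
        (area (* my-demo-width my-demo-width))) ; sees the global value, 10
    area))

(defun my-area-with-let* ()
  "Compute an area using `let*': each variable is bound before the next is computed."
  (let* ((my-demo-width 4)
         (area (* my-demo-width my-demo-width))) ; sees the local value, 4
    area))

(my-area-with-let)    ; 100
(my-area-with-let*)   ; 16
```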
We should also note that a more efficient way to write goto-percent is this:
(defun goto-percent (pct)
  (interactive "nPercent: ")
  (goto-char (/ (* pct (point-max)) 100)))
We already saw that the while function acts as a control structure like similar statements in other languages. There are two other important control structures in Lisp: if and cond.
The if function has the form:
(if condition true-case false-block)
Here, the condition is evaluated; if it is non-nil, true-case is evaluated; if nil, false-block is evaluated. Note that true-case is a single statement whereas false-block is a statement block; false-block is optional.
As an example, let's suppose we're writing a function that performs some complicated series of edits to a buffer and then reports how many changes it made. We're perfectionists, so we want the status report to be properly pluralized, that is to say "made 53 changes" or "made 1 change." This is a common enough programming need that we decide to write a general-purpose function to do it so that we can use it in other projects too.
The function takes two arguments: the word to be pluralized (if necessary) and the count to be displayed (which determines whether it's necessary).
(defun pluralize (word count)
  (if (= count 1)
      word
    (concat word "s")))
The condition in the if clause tests to see if count is equal to 1. If so, the first statement gets executed. Remember that the "true" part of the if function is only one statement, so progn would be necessary to make a statement block if we wanted to do more than one thing. In this case, we have the opposite extreme; our "true" part is a single variable, word. Although this looks strange, it is actually a very common Lisp idiom and worth getting used to. When the condition block is true, the value of word is evaluated, and this value becomes the value of the entire if statement. Because that's the last statement in our function, it is the value returned by pluralize. Note that this is exactly the result we want when count is 1: the value of word is returned unchanged.
The remaining portion of the if statement is evaluated when the condition is false, which is to say, when count has a value other than 1. This results in a call to the built-in concat function, which concatenates all its arguments into a single string. In this case it adds an "s" at the end of the word we've passed in. Again, the result of this concatenation becomes the result of the if statement and the result of our pluralize function.
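When the "true" part does need more than one statement, progn does the grouping. The following is a minimal sketch of that case; my-describe-count is a hypothetical function written only to show the shape of such an if.

```elisp
(defun my-describe-count (count)
  "Report on COUNT in the echo area and return a short description.
A hypothetical function, used only to illustrate `if' with `progn'."
  (if (= count 0)
      (progn                               ; groups two statements into the "true" part
        (message "Nothing to report")
        "nothing")                         ; value of the progn, and so of the if
    (message "Reporting %d changes" count) ; the "else" part can hold several statements
    "something"))

(my-describe-count 0)   ; "nothing"
(my-describe-count 3)   ; "something"
```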
If you type it in and try it out, you'll see results like this:
(pluralize "goat" 5)
"goats"
(pluralize "change" 1)
"change"
Of course, this function can be tripped up easily enough. You may have tried something like this already:
(pluralize "mouse" 5)
"mouses"
To fix this, we'd need to be able to tell the function to use an alternate plural form for tricky words. But it would be nice if the simple cases could remain as simple as they are now. This is a good opportunity to use an optional parameter. If necessary, we supply the plural form to use; if we don't supply one, the function acts as it did in its first incarnation. Here's how we'd achieve that:
(defun pluralize (word count &optional plural)
  (if (= count 1)
      word
    (if (null plural)
        (concat word "s")
      plural)))
The "else" part of our code has become another if statement. It uses the null function to check whether we were given the plural parameter or not. If plural was omitted, it has the value nil, and the null function returns t if its argument is nil. So this logic reads "if plural was missing, just add an s to word; otherwise return the special plural value we were given."
This gives us results like this:
(pluralize "mouse" 5)
"mouses"
(pluralize "mouse" 5 "mice")
"mice"
(pluralize "mouse" 1 "mice")
"mouse"
A more general conditional control structure is the cond function, which has the following form:
(cond
  (condition1 statement-block1)
  (condition2 statement-block2)
  ...)
Java and Perl programmers can think of this as a sequence of if then else if then else if . . . , or as a kind of generalized switch statement. The conditions are evaluated in order, and when one of them evaluates to non-nil, the corresponding statement block is executed; the cond function terminates and returns the last value in that statement block.[77]
We can use cond to give a more folksy feel to our hypothetical status reporter now that it's pluralizing nicely. Instead of reporting an actual numeric value for the number of changes, we could have it say no, one, two, or many as appropriate. Again we'll write a general function to do this:
(defun how-many (count)
  (cond ((zerop count) "no")
        ((= count 1) "one")
        ((= count 2) "two")
        (t "many")))
The first conditional expression introduces a new primitive Lisp function, zerop. It checks whether its argument is zero, and returns t (true) when it is. So when count is zero, the cond statement takes this first branch, and our function returns the value no. This strange function name bears a little explanation. It is pronounced "zero-pee" and is short for "zero predicate." In the realm of mathematical logic from which Lisp evolved, a predicate is a function that returns true or false based on some attribute of its argument. Lisp has a wide variety of similar predicate functions, with structurally related names. When you run into the next one, you'll understand it. (Of course, you might now expect the null function we introduced in the previous example to be called "nilp" instead. Nobody's perfectly consistent.)
The next two conditional expressions in the cond statement check if count is 1 or 2 and cause it to return "one" or "two" as appropriate. We could have written the first one using the same structure, but then we'd have missed out on an opportunity for a digression into Lisp trivia!
The last conditional expression is simply the atom t (true), which means its body is executed whenever all the preceding expressions failed. It returns the value many. Executing this function gives us results like these:
(how-many 1)
"one"
(how-many 0)
"no"
(how-many 3)
"many"
Combining these two helper functions into a mechanism to report the change count for our fancy command is easy.
(defun report-change-count (count)
  (message "Made %s %s." (how-many count) (pluralize "change" count)))
We get results like these:
(report-change-count 0)
"Made no changes."
(report-change-count 1)
"Made one change."
(report-change-count 1329)
"Made many changes."
Many of the Emacs functions that exist and that you may write involve searching and manipulating the text in a buffer. Such functions are particularly useful in specialized modes, like the programming language modes described in Chapter 9. Many built-in Emacs functions relate to text in strings and buffers; the most interesting ones take advantage of Emacs's regular expression facility, which we introduced in Chapter 3.
We first describe the basic functions relating to buffers and strings that don't use regular expressions. Afterwards, we discuss regular expressions in more depth than was the case in Chapter 3, concentrating on the features that are most useful to Lisp programmers, and we describe the functions that Emacs makes available for dealing with regular expressions.
Table 11-4 shows some basic Emacs functions relating to buffers, text, and strings that are only useful to Lisp programmers and thus aren't bound to keystrokes. We already saw a couple of them in the count-words-buffer example. Notice that some of these are predicates, and their names reflect this.
Table 11-4. Buffer and text functions
Function | Value or action |
---|---|
point | Character position of point. |
mark | Character position of mark. |
point-min | Minimum character position (usually 1). |
point-max | Maximum character position (usually size of buffer). |
bolp | Whether point is at the beginning of the line (t or nil). |
eolp | Whether point is at the end of the line. |
bobp | Whether point is at the beginning of the buffer. |
eobp | Whether point is at the end of the buffer. |
insert | Insert any number of arguments (strings or characters) into the buffer after point. |
number-to-string | Convert a numerical argument to a string. |
string-to-number | Convert a string argument to a number (integer or floating point). |
char-to-string | Convert a character argument to a string. |
substring | Given a string and two integer indices start and end, return the substring starting at index start and ending just before index end. Indices start at 0. For example, (substring "appropriate" 2 5) returns "pro". |
aref | Array indexing function that can be used to return individual characters from strings; takes a string and an integer index and returns the character as an integer, using the ASCII code (on most machines). For example, (aref "appropriate" 3) returns 114, the ASCII code for r. |
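As a quick illustration of several functions from Table 11-4, here is a minimal sketch that exercises them in a temporary buffer. The name my-buffer-tour is ours, invented for this example; with-temp-buffer and goto-char are standard Emacs functions that the table does not list.

```elisp
;; Hypothetical helper for illustration; not part of Emacs.
(defun my-buffer-tour ()
  "Return values of several buffer functions, evaluated in a temporary buffer."
  (with-temp-buffer
    (insert "first line\n" "second line")  ; 22 characters; point ends up after them
    (let ((size (point-max))               ; 23, because positions start at 1
          (at-end (eobp)))                 ; t, point is at the end of the buffer
      (goto-char (point-min))              ; move point to position 1
      (list size at-end (point) (bolp) (bobp) (eolp)))))

(my-buffer-tour)               ; (23 t 1 t t nil)

;; The string functions need no buffer at all.
(number-to-string 42)          ; "42"
(string-to-number "3.5")       ; 3.5
(char-to-string ?r)            ; "r"
(substring "appropriate" 2 5)  ; "pro"
(aref "appropriate" 3)         ; 114, the character code of r
```

Evaluating each expression with C-x C-e in a Lisp buffer shows its value in the echo area.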
Many functions not included in the previous table deal with buffers and text, including some that you should be familiar with as user commands. Several commonly used Emacs functions use regions, which are areas of text within a buffer. When you are using Emacs, you delineate regions by setting the mark and moving the cursor. However, region-oriented functions (such as kill-region, indent-region, and shell-command-on-region—really, any function with region in its name) are actually more flexible when used within Emacs Lisp code. They typically take two integer arguments that are used as the character positions of the boundaries for the region on which they operate. These arguments default to the values of point and mark when the functions are called interactively.
Obviously, allowing point and mark as interactive defaults is a more general (and thus more desirable) approach than one in which only point and mark can be used to delineate regions. The r option to the interactive function makes it possible. For example, if we wanted to write the function translate-region-into-German, here is how we would start:
(defun translate-region-into-German (start end)
  (interactive "r")
  ...
The r option to interactive fills in the two arguments start and end when the function is called interactively, but if it is called from other Lisp code, both arguments must be supplied. The usual way to do this is:
(translate-region-into-German (point) (mark))
But you need not call it in this way. If you wanted to use this function to write another function called translate-buffer-into-German, you would only need to write the following as a "wrapper":
(defun translate-buffer-into-German ( )
  (translate-region-into-German (point-min) (point-max)))
In fact, it is best to avoid using point and mark within Lisp code unless doing so is really necessary; use local variables instead. Try not to write Lisp functions as lists of commands a user would invoke; that sort of behavior is better suited to macros (see Chapter 6).
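The region pattern described above can be sketched with a function that actually does something small. The following is a minimal example under our own assumptions: the names my-count-e-in-region and my-count-e-in-buffer are hypothetical, and counting the letter e simply stands in for real work such as translation.

```elisp
;; Hypothetical example functions; the names are ours, not Emacs's.
(defun my-count-e-in-region (start end)
  "Return the number of lowercase e characters between START and END."
  (interactive "r")                        ; fills START and END from the region
  (let ((count 0)
        (case-fold-search nil))            ; do not count uppercase E
    (save-excursion                        ; leave the user's point where it was
      (goto-char (min start end))          ; accept the boundaries in either order
      (while (search-forward "e" (max start end) t)
        (setq count (1+ count))))
    (when (called-interactively-p 'interactive)
      (message "Found %d occurrence(s) of e" count))
    count))

(defun my-count-e-in-buffer ()
  "Return the number of lowercase e characters in the whole buffer."
  (interactive)
  (my-count-e-in-region (point-min) (point-max)))
```

Because the boundaries are ordinary arguments, the buffer-wide wrapper never has to touch point or mark.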
Regular expressions (regexps) provide much more powerful ways of dealing with text. Although most beginning Emacs users tend to avoid commands that use regexps, like replace-regexp and re-search-forward, regular expressions are widely used within Lisp code. Such modes as Dired and the programming language modes would be unthinkable without them. Regular expressions require time and patience to become comfortable with, but doing so is well worth the effort for Lisp programmers, because they are one of the most powerful features of Emacs, and many things are not practical to implement in any other way.
One trick that can be useful when you are experimenting with regular expressions and trying to get the hang of them is to type some text into a scratch buffer that corresponds to what you're trying to match, and then use isearch-forward-regexp (C-M-s) to build up the regular expression. The interactive, immediate feedback of an incremental search can show you the pieces of the regular expression in action in a way that is completely unique to Emacs.
We introduce the various features of regular expressions by way of a few examples of search-and-replace situations; such examples are easy to explain without introducing lots of extraneous details. Afterward, we describe Lisp functions that go beyond simple search-and-replace capabilities with regular expressions. The following are examples of searching and replacing tasks that the normal search/replace commands can't handle or handle poorly:
• You are developing code in C, and you want to combine the functionality of the functions read and readfile into a new function called get. You want to replace all references to these functions with references to the new one.
• You are writing a troff document using outline mode, as described in Chapter 7. In outline mode, headers of document sections have lines that start with one or more asterisks. You want to write a function called remove-outline-marks to get rid of these asterisks so that you can run troff on your file.
• You want to change all occurrences of program in a document, including programs and program's, to module/modules/module's, without changing programming to moduleming or programmer to modulemer.
• You are working on documentation for some C software that is being rewritten in Java. You want to change all the filenames in the documentation from filename.c to filename.java.
• You just installed a new C++ compiler that prints error messages in German. You want to modify the Emacs compile package so that it can parse the error messages correctly (see the end of Chapter 9).
We will soon show how to use regular expressions to deal with these examples, which we refer to by number. Note that this discussion of regular expressions, although more comprehensive than that in Chapter 3, does not cover every feature; those that it doesn't cover are redundant with other features or relate to concepts that are beyond the scope of this book. It is also important to note that the regular expression syntax described here is for use with Lisp strings only; there is an important difference between the regexp syntax for Lisp strings and the regexp syntax for user commands (like replace-regexp), as we will see.
Regular expressions began as an idea in theoretical computer science, but they have found their way into many nooks and crannies of everyday, practical computing. The syntax used to represent them may vary, but the concepts are much the same everywhere. You probably already know a subset of regular expression notation: the wildcard characters used by the Unix shell or Windows command prompt to match filenames. The Emacs notation is a bit different; it is similar to those used by the language Perl, editors like ed and vi, and Unix software tools like lex and grep. So let's start with the Emacs regular expression operators that resemble Unix shell wildcard characters, which are listed in Table 11-5.
Table 11-5. Basic regular expression operators
Emacs operator | Shell equivalent | Function |
---|---|---|
. | ? | Matches any character. |
.* | * | Matches any string. |
[abc] | [abc] | Matches a, b, or c. |
[a-z] | [a-z] | Matches any lowercase letter. |
For example, to match all filenames beginning with program in the Unix shell, you would specify program*. In Emacs, you would say program.*. To match all filenames beginning with a through e in the shell, you would use [a-e]* or [abcde]*; in Emacs, it's [a-e].* or [abcde].*. In other words, the dash within the brackets specifies a range of characters.[78] We will provide more on ranges and bracketed character sets shortly.
To specify a character that is used as a regular expression operator, you need to precede it with a double-backslash, as in \\* to match an asterisk. Why a double backslash? The reason has to do with the way Emacs Lisp reads and decodes strings. When Emacs reads a string in a Lisp program, it decodes the backslash-escaped characters and thus turns double backslashes into single backslashes. If the string is being used as a regular expression—that is, if it is being passed to a function that expects a regular expression argument—that function uses the single backslash as part of the regular expression syntax. For example, given the following line of Lisp:
(replace-regexp "fred\\*" "bob*")
the Lisp interpreter decodes the string fred\\* as fred\* and passes it to the replace-regexp command. The replace-regexp command understands fred\* to mean fred followed by a (literal) asterisk. Notice, however, that the second argument to replace-regexp is not a regular expression, so there is no need to backslash-escape the asterisk in bob* at all. Also notice that if you were to invoke this as a user command, you would not need to double the backslash; that is, you would type M-x replace-regexp Enter followed by fred\* and bob*. Emacs decodes strings read from the minibuffer differently.
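The two decoding steps can be observed directly. This is a small sketch using only standard functions (length, string-match, and regexp-quote); string-match is described later in this chapter, and the sample strings are ours.

```elisp
;; The Lisp reader turns the two backslashes into one, so the string
;; that reaches the regexp engine has six characters: f r e d \ *
(length "fred\\*")                 ; 6

;; Backslash-asterisk matches a literal asterisk.
(string-match "fred\\*" "fred*")   ; 0, a match starting at index 0
(string-match "fred\\*" "fredd")   ; nil, there is no asterisk to match

;; Without the backslashes, * is the repetition operator.
(string-match "fred*" "fredd")     ; 0

;; regexp-quote adds the escapes for you; its result prints as "bob\\*".
(regexp-quote "bob*")
```

When a search string comes from user input or another variable, regexp-quote is usually safer than escaping by hand.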
The * regular expression operator in Emacs (by itself) actually means something different from the * in the Unix shell: it means "zero or more occurrences of whatever is before the *." Thus, because . matches any character, .* means "zero or more occurrences of any character," that is, any string at all, including the empty string. Anything can precede a *: for example, read* matches "rea" followed by zero or more d's; file[0-9]* matches "file" followed by zero or more digits.
Two operators are closely related to *. The first is +, which matches one or more occurrences of whatever precedes it. Thus, read+ matches "read" and "readdddd" but not "rea," and file[0-9]+ requires that there be at least one digit after "file." The second is ?, which matches zero or one occurrence of whatever precedes it (i.e., makes it optional). html? matches "htm" or "html," and file[0-9]? matches "file" followed by one optional digit.
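To see exactly how much text each of these operators consumes, a tiny helper can return the matched portion of a string. The helper name my-matched-text is hypothetical; string-match and match-string are standard functions covered later in this chapter.

```elisp
;; Hypothetical helper for experimenting; not part of Emacs.
(defun my-matched-text (regexp string)
  "Return the part of STRING matched by REGEXP, or nil if there is no match."
  (when (string-match regexp string)
    (match-string 0 string)))

(my-matched-text "read*" "rea")              ; "rea", zero d's are allowed
(my-matched-text "read*" "readdd!")          ; "readdd"
(my-matched-text "read+" "rea")              ; nil, at least one d is required
(my-matched-text "read+" "readdddd")         ; "readdddd"
(my-matched-text "html?" "index.htm")        ; "htm"
(my-matched-text "html?" "index.html")       ; "html"
(my-matched-text "file[0-9]?" "file123")     ; "file1", at most one digit
(my-matched-text "file[0-9]*" "file123.txt") ; "file123"
```

The repetition operators are greedy: each takes as many characters as it can while still letting the whole expression match.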
Before we move on to other operators, a few more comments about character sets and ranges are in order. First, you can specify more than one range within a single character set. The set [A-Za-z] can thus be used to specify all alphabetic characters; this is better than the nonportable [A-z]. Combining ranges with lists of characters in sets is also possible; for example, [A-Za-z_] means all alphabetic characters plus underscore, that is, all characters allowed in the names of identifiers in C. If you give ^ as the first character in a set, it acts as a "not" operator; the set matches all characters that aren't the characters after the ^. For example, [^A-Za-z] matches all nonalphabetic characters.
A ^ anywhere other than first in a character set has no special meaning; it's just the caret character. Conversely, - has no special meaning if it is given first in the set; the same is true for ]. Because a backslash has no special meaning inside a character set, you cannot backslash-escape these characters there; placing them in these positions is the way to include them literally. Outside character sets, a double backslash preceding a nonspecial character usually means just that character—but watch it! A few letters and punctuation characters are used as regular expression operators, some of which are covered in the following section. We list "booby trap" characters that become operators when double-backslash-escaped later. The ^ character has a different meaning when used outside of ranges, as we'll see soon.
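The positional rules for character sets are easy to check with string-match. In this sketch the function name my-character-set-demo and the sample strings are ours; case-fold-search is bound to nil because, when it is non-nil, letter ranges also match the other case.

```elisp
;; Hypothetical demonstration function; not part of Emacs.
(defun my-character-set-demo ()
  "Return the match positions for a few character-set regexps."
  (let ((case-fold-search nil))                       ; keep letter ranges case-sensitive
    (list
     (string-match "[A-Za-z_]+" "  max_count = 10")   ; 2, where max_count starts
     (string-match "[^A-Za-z]" "abc-def")             ; 3, the first nonletter
     (string-match "[a^]" "x^y")                      ; 1, ^ is literal when not first
     (string-match "[]a]" "x]")                       ; 1, ] is literal when first
     (string-match "[-a]" "x-y"))))                   ; 1, - is literal when first

(my-character-set-demo)   ; (2 3 1 1 1)
```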
If you want to get *, +, or ? to operate on more than one character, you can use the \\( and \\) operators for grouping. Notice that, in this case (and others to follow), the backslashes are part of the operator. (All of the nonbasic regular expression operators include backslashes so as to avoid making too many characters "special." This is the most profound way in which Emacs regular expressions differ from those used in other environments, like Perl, so it's something to which you'll need to pay careful attention.) As we saw before, these characters need to be double-backslash-escaped so that Emacs decodes them properly. If one of the basic operators immediately follows \\), it works on the entire group inside the \\( and \\). For example, \\(read\\)* matches the empty string, "read," "readread," and so on, and read\\(file\\)? matches "read" or "readfile." Now we can handle Example 1, the first of the examples given at the beginning of this section, with the following Lisp code:
(replace-regexp "read\\(file\\)?" "get")
The alternation operator \\| is a "one or the other" operator; it matches either whatever precedes it or whatever comes after it. \\| treats parenthesized groups differently from the basic operators. Instead of requiring parenthesized groups to work with subexpressions of more than one character, its "power" goes out to the left and right as far as possible, until it reaches the beginning or end of the regexp, a \\(, a \\), or another \\|. Some examples should make this clearer:
• read\\|get matches "read" or "get"
• readfile\\|read\\|get matches "readfile," "read," or "get"
• \\(read\\|get\\)file matches "readfile" or "getfile"
In the first example, the effect of the \\| extends to both ends of the regular expression. In the second, the effect of the first \\| extends to the beginning of the regexp on the left and to the second \\| on the right. In the third, it extends to the backslash-parentheses.
Another important category of regular expression operators has to do with specifying the context of a string, that is, the text around it. In Chapter 3 we saw the word-search commands, which are invoked as options within incremental search. These are special cases of context specification; in this case, the context is word-separation characters, for example, spaces or punctuation, on both sides of the string.
The simplest context operators for regular expressions are ^ and $, two more basic operators that are used at the beginning and end of regular expressions respectively. The ^ operator causes the rest of the regular expression to match only if it is at the beginning of a line; $ causes the regular expression preceding it to match only if it is at the end of a line. In Example 2, we need a function that matches occurrences of one or more asterisks at the beginning of a line; this will do it:
(defun remove-outline-marks ( )
  "Remove section header marks created in outline-mode."
  (interactive)
  (replace-regexp "^\\*+" ""))
This function finds lines that begin with one or more asterisks (the \\* is a literal asterisk and the + means "one or more"), and it replaces the asterisk(s) with the empty string "", thus deleting them.
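The built-in documentation for replace-regexp describes it as intended for interactive use and suggests a loop over re-search-forward and replace-match in Lisp programs. Here is a minimal sketch of the same command written that way; the name my-remove-outline-marks is ours, chosen so that it does not replace the version above.

```elisp
;; Hypothetical variant of remove-outline-marks; the name is ours.
(defun my-remove-outline-marks ()
  "Delete the leading asterisks from every line of the current buffer."
  (interactive)
  (save-excursion                           ; put point back when finished
    (goto-char (point-min))                 ; start from the top of the buffer
    ;; The third argument t makes the search return nil instead of
    ;; signaling an error when no further match exists.
    (while (re-search-forward "^\\*+" nil t)
      (replace-match ""))))
```

Unlike the replace-regexp version, this one always processes the whole buffer regardless of where point is, and it leaves point where the user had it.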
Note that ^ and $ can't be used in the middle of regular expressions that are intended to match strings that span more than one line. Instead, you can put \n (for Newline) in your regular expressions to match such strings. Another such character you may want to use is \t for Tab. When ^ and $ are used with regular expression searches on strings instead of buffers, they match beginning- and end-of-string, respectively; the function string-match, described later in this chapter, can be used to do regular expression search on strings.
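A few string searches illustrate these context operators. The sample strings are ours; note that when a string contains newlines, ^ and $ also match just after and just before each newline, as they do at line boundaries in a buffer.

```elisp
(string-match "^two" "one\ntwo")         ; 4, two begins the second line
(string-match "^two" "one two")          ; nil, two is not at the start of a line
(string-match "two$" "one two\nthree")   ; 4, two ends the first line
(string-match "one\ntwo" "one\ntwo")     ; 0, \n matches the newline itself
(string-match "a\tb" "a\tb")             ; 0, \t matches a tab
```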
Here is a real-life example of a complex regular expression that covers the operators we have seen so far: sentence-end, a variable Emacs uses to recognize the ends of sentences for sentence motion commands like forward-sentence (M-e). Its value is:
"[.?!][]\"')}]*\\($\\|\t\\|  \\)[ \t\n]*"
Let's look at this piece by piece. The first character set, [.?!], matches a period, question mark, or exclamation mark (the first two of these are regular expression operators, but they have no special meaning within character sets). The next part, []\"')}]*, consists of a character set containing right bracket, double quote, single quote, right parenthesis, and right curly brace. A * follows the set, meaning that zero or more occurrences of any of the characters in the set matches. So far, then, this regexp matches a sentence-ending punctuation mark followed by zero or more ending quotes, parentheses, or curly braces. Next, there is the group \\($\\|\t\\|  \\), which matches any of the three alternatives $ (end of line), Tab, or two spaces. Finally, [ \t\n]* matches zero or more spaces, tabs, or newlines. Thus the sentence-ending characters can be followed by end-of-line or a combination of spaces (at least two), tabs, and newlines.
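The behavior of this regular expression can be tested against a few sample sentences. In this sketch the regexp is copied into a local variable rather than read from sentence-end, since the variable's value may differ in your Emacs; the function name my-sentence-end-span and the sample strings are ours.

```elisp
;; Hypothetical helper; not part of Emacs.
(defun my-sentence-end-span (string)
  "Return (START . END) of the first sentence end in STRING, or nil."
  (let ((regexp "[.?!][]\"')}]*\\($\\|\t\\|  \\)[ \t\n]*"))
    (when (string-match regexp string)
      (cons (match-beginning 0) (match-end 0)))))

(my-sentence-end-span "Done.  Next")        ; (4 . 7), period plus two spaces
(my-sentence-end-span "Done. Next")         ; nil, one space is not enough
(my-sentence-end-span "He said \"Stop!\"")  ; (13 . 15), mark, quote, end of string
(my-sentence-end-span "Really?\nYes")       ; (6 . 8), mark, end of line, newline
```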
There are other context operators besides ^ and $; two of them can be used to make regular expression search act like word search. The operators \\< and \\> match the beginning and end of a word, respectively. With these we can go part of the way toward solving Example 3. The regular expression \\<program\\> matches "program" but not "programmer" or "programming" (it also won't match "microprogram"). So far so good; however, it won't match "program's" or "programs." For this, we need a more complex regular expression:
\\<program\\('s\\|s\\)?\\>
This expression means, "a word beginning with program followed optionally by apostrophe s or just s." This does the trick as far as matching the right words goes.
There is still one piece missing: the ability to replace "program" with "module" while leaving any s or 's untouched. This leads to the final regular expression feature we will cover here: the ability to retrieve portions of the matched string for later use. The preceding regular expression is indeed the correct one to give as the search string for replace-regexp. As for the replace string, the answer is module\\1; in other words, the required Lisp code is:
(replace-regexp "\\<program\\('s\\|s\\)?\\>" "module\\1")
The \\1 means, in effect, "substitute the portion of the matched string that matched the subexpression inside the \\( and \\)." It is the only regular-expression-related operator that can be used in replacements. In this case, it means to use 's in the replace string if the match was "program's," s if the match was "programs," or nothing if the match was just "program." The result is the correct substitution of "module" for "program," "modules" for "programs," and "module's" for "program's."
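The same substitution can be tried on a string with the standard function replace-regexp-in-string, which accepts \\1 in its replacement argument just as replace-regexp does. This is a sketch under a few assumptions: the function name my-program-to-module is ours, the final argument t turns off automatic case conversion of the replacement, and the word-boundary operators rely on the syntax table of the current buffer, so results could differ in unusual major modes.

```elisp
;; Hypothetical helper; not part of Emacs.
(defun my-program-to-module (text)
  "Return TEXT with program, programs, and program's changed to module forms."
  (let ((case-fold-search nil))            ; match lowercase program only
    (replace-regexp-in-string
     "\\<program\\('s\\|s\\)?\\>"          ; group 1 is 's, s, or unmatched
     "module\\1"                           ; an unmatched group inserts nothing
     text
     t)))                                  ; FIXEDCASE: keep the replacement as written

(my-program-to-module "program programs program's programmer microprogram")
;; "module modules module's programmer microprogram"
```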
Another example of this feature solves Example 4. To match filenames of the form filename.c and replace them with filename.java, use the Lisp code:
(replace-regexp "\\([a-zA-Z0-9_]+\\)\\.c" "\\1.java")
Remember that \\. means a literal dot (.). Note also that the filename pattern (which matches a series of one or more alphanumerics or underscores) was surrounded by \\( and \\) in the search string for the sole purpose of retrieving it later with \\1.
Actually, the \\1 operator is only a special case of a more powerful facility (as you may have guessed). In general, if you surround a portion of a regular expression with \\( and \\), the string matching the parenthesized subexpression is saved. When you specify the replace string, you can retrieve the saved substrings with \\n, where n is the number of the parenthesized subexpression from left to right, starting with 1. Parenthesized expressions can be nested; their corresponding \\n numbers are assigned in order of their \\( delimiter from left to right.
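The numbering of nested groups is easiest to see with a small experiment. In the following sketch, the function name my-nested-groups and the sample string are ours; match-string retrieves a saved subexpression by number, and replace-regexp-in-string accepts the same numbers in its replacement.

```elisp
;; Hypothetical demonstration function; not part of Emacs.
(defun my-nested-groups (string)
  "Return the three saved groups from matching STRING, or nil."
  ;; Group 1 opens first and spans the whole match;
  ;; groups 2 and 3 open inside it, in that order.
  (when (string-match "\\(\\([a-z]+\\)-\\([0-9]+\\)\\)" string)
    (list (match-string 1 string)
          (match-string 2 string)
          (match-string 3 string))))

(my-nested-groups "item-42")   ; ("item-42" "item" "42")

;; The same numbers work in a replacement; here groups 2 and 3 swap places.
(replace-regexp-in-string "\\(\\([a-z]+\\)-\\([0-9]+\\)\\)" "\\3-\\2" "item-42" t)
;; "42-item"
```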
Lisp code that takes full advantage of this feature tends to contain complicated regular expressions. The best example of this in Emacs's own Lisp code is compilation-error-regexp-alist, the list of regular expressions the compile package (discussed in Chapter 9) uses to parse error messages from compilers. Here is an excerpt, adapted from the Emacs source code (it's become much too long to reproduce in its entirety; see below for some hints on how to find the actual file to study in its full glory):
(defvar compilation-error-regexp-alist
'(
;; NOTE! See also grep-regexp-alist, below.
;; 4.3BSD grep, cc, lint pass 1:
;; /usr/src/foo/foo.c(8): warning: w may be used before set
;; or GNU utilities:
;; foo.c:8: error message
;; or HP-UX 7.0 fc:
;; foo.f :16 some horrible error message
;; or GNU utilities with column (GNAT 1.82):
;; foo.adb:2:1: Unit name does not match file name
;; or with column and program name:
;; jade:dbcommon.dsl:133:17:E: missing argument for function call
;;
;; We'll insist that the number be followed by a colon or closing
;; paren, because otherwise this matches just about anything
;; containing a number with spaces around it.
;; We insist on a non-digit in the file name
;; so that we don't mistake the file name for a command name
;; and take the line number as the file name.
("\\([a-zA-Z][-a-zA-Z._0-9]+: ?\\)?\
\\([a-zA-Z]?:?[^:( \t\n]*[^:( \t\n0-9][^:( \t\n]*\\)[:(][ \t]*\\([0-9]+\\)\
\\([) \t]\\|:\\(\\([0-9]+:\\)\\|[0-9]*[^:0-9]\\)\\)" 2 3 6)
;; Microsoft C/C++:
;; keyboard.c(537) : warning C4005: 'min' : macro redefinition
;; d:\tmp\test.c(23) : error C2143: syntax error : missing ';' before 'if'
;; This used to be less selective and allow characters other than
;; parens around the line number, but that caused confusion for
;; GNU-style error messages.
;; This used to reject spaces and dashes in file names,
;; but they are valid now; so I made it more strict about the error
;; message that follows.
("\\(\\([a-zA-Z]:\\)?[^:(\t\n]+\\)(\\([0-9]+\\)) \
: \\(error\\|warning\\) C[0-9]+:" 1 3)
;; Caml compiler:
;; File "foobar.ml", lines 5-8, characters 20-155: blah blah
("^File \"\\([^,\" \n\t]+\\)\", lines? \\([0-9]+\\)[-0-9]*, characters? \
\\([0-9]+\\)" 1 2 3)
;; Cray C compiler error messages
("\\(cc\\| cft\\)-[0-9]+ c\\(c\\|f77\\): ERROR \\([^,\n]+, \\)* File = \
\\([^,\n]+\\), Line = \\([0-9]+\\)" 4 5)
;; Perl -w:
;; syntax error at automake line 922, near "':'"
;; Perl debugging traces
;; store::odrecall('File_A', 'x2') called at store.pm line 90
(".* at \\([^ \n]+\\) line \\([0-9]+\\)[,.\n]" 1 2)
;; See http://ant.apache.org/faq.html
;; Ant Java: works for jikes
("^\\s-*\\[[^]]*\\]\\s-*\\(.+\\):\\([0-9]+\\):\\([0-9]+\\):[0-9]+:[0-9]\
+:" 1 2 3)
;; Ant Java: works for javac
("^\\s-*\\[[^]]*\\]\\s-*\\(.+\\):\\([0-9]+\\):" 1 2)
)
This is a list of elements that have at least three parts each: a regular expression and two numbers. The regular expression matches error messages in the format used by a particular compiler or tool. The first number tells Emacs which of the matched subexpressions contains the filename in the error message; the second number designates which of the subexpressions contains the line number. (There can also be additional parts at the end: a third number giving the position of the column number of the error, if any, and any number of format strings used to generate the true filename from the piece found in the error message, if needed. For more details about these, look at the actual file, as described below.)
For example, the element in the list dealing with Perl contains the regular expression:
".* at \\([^ \n]+\\) line \\([0-9]+\\)[,.\n]"
followed by 1 and 2, meaning that the first parenthesized subexpression contains the filename and the second contains the line number. So if you have Perl's warnings turned on—you always do, of course—you might get an error message such as this:
syntax error at monthly_orders.pl line 1822, near "$"
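As a rough check, the Perl entry's regular expression can be applied to this message with string-match, which is described later in this chapter. The function name my-parse-perl-error is hypothetical; it returns the two pieces that the numbers 1 and 2 in the list element point to.

```elisp
;; Hypothetical helper; not part of the compile package.
(defun my-parse-perl-error (message-text)
  "Return (FILE . LINE) parsed from a Perl-style MESSAGE-TEXT, or nil."
  (when (string-match ".* at \\([^ \n]+\\) line \\([0-9]+\\)[,.\n]" message-text)
    (cons (match-string 1 message-text)                       ; group 1, the filename
          (string-to-number (match-string 2 message-text))))) ; group 2, the line number

(my-parse-perl-error "syntax error at monthly_orders.pl line 1822, near \"$\"")
;; ("monthly_orders.pl" . 1822)
```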
The regular expression ignores everything up to at. Then it finds monthly_orders.pl, the filename, as the match to the first subexpression "[^ \n]+" (one or more nonblank, nonnewline characters), and it finds 1822, the line number, as the match to the second subexpression "[0-9]+" (one or more digits).
For the most part, these regular expressions are documented pretty well in their definitions. Understanding them in depth can still be a challenge, and writing them even more so! Suppose we want to tackle Example 5 by adding an element to this list for our new C++ compiler that prints error messages in German. In particular, it prints error messages like this:
Fehler auf Zeile linenum in filename: text of error message
Here is the element we would add to compilation-error-regexp-alist:
("Fehler auf Zeile \\([0-9]+\\) in \\([^: \t]+\\):" 2 1)
In this case, the second parenthesized subexpression matches the filename, and the first matches the line number.
To add this to compilation-error-regexp-alist, we need to put these lines in .emacs:
(setq compilation-error-regexp-alist
      (cons '("Fehler auf Zeile \\([0-9]+\\) in \\([^: \t]+\\):" 2 1)
            compilation-error-regexp-alist))
Notice how this example resembles our example (from Chapter 9) of adding support for a new language mode to auto-mode-alist.
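Before installing an element like this, it is worth checking it against a sample message. The sketch below makes several assumptions: the variable my-german-error-entry, the function my-parse-german-error, and the sample message are all ours, and the installation step uses add-to-list inside with-eval-after-load (available in newer Emacs versions) so that the variable is only modified once the compile package has been loaded.

```elisp
;; Hypothetical names throughout; only compilation-error-regexp-alist is real.
(defvar my-german-error-entry
  '("Fehler auf Zeile \\([0-9]+\\) in \\([^: \t]+\\):" 2 1)
  "Element for the German compiler: regexp, file group, line group.")

;; Install the entry once compile.el has defined the list.
(with-eval-after-load 'compile
  (add-to-list 'compilation-error-regexp-alist my-german-error-entry))

(defun my-parse-german-error (text)
  "Return (FILE . LINE) parsed from TEXT using `my-german-error-entry', or nil."
  (let ((regexp (nth 0 my-german-error-entry))
        (file-group (nth 1 my-german-error-entry))
        (line-group (nth 2 my-german-error-entry)))
    (when (string-match regexp text)
      (cons (match-string file-group text)
            (string-to-number (match-string line-group text))))))

;; An invented message in the format described above.
(my-parse-german-error "Fehler auf Zeile 42 in widget.cc: Semikolon fehlt")
;; ("widget.cc" . 42)
```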
Table 11-6 concludes our discussion of regular expression operators with a reference list of all the operators covered.
Table 11-6. Regular expression operators
Operator | Function |
---|---|
. | Match any character. |
* | Match 0 or more occurrences of preceding char or group. |
+ | Match 1 or more occurrences of preceding char or group. |
? | Match 0 or 1 occurrences of preceding char or group. |
[...] | Set of characters; see below. |
\\( | Begin a group. |
\\) | End a group. |
\\| | Match the subexpression before or after \\|. |
^ | At beginning of regexp, match beginning of line or string. |
$ | At end of regexp, match end of line or string. |
\n | Match Newline within a regexp. |
\t | Match Tab within a regexp. |
\\< | Match beginning of word. |
\\> | Match end of word. |
The following operators are meaningful within character sets: | |
^ | At beginning of set, treat set as chars not to match. |
- (dash) | Specify range of characters. |
The following is also meaningful in regexp replace strings: | |
\\n | Substitute portion of match within the nth \\( and \\), counting from left to right, starting with 1. |
Finally, the following characters are operators (not discussed here) when double-backslash-escaped: b, B, c, C, w, W, s, S, =, _, ', and `. Thus, these are "booby traps" when double-backslash-escaped. Some of these behave similarly to the character class aliases you may have encountered in Perl and Java regular expressions.
As mentioned above, the full compilation-error-regexp-alist has a lot more entries and documentation than fit in this book. The compile.el module in which it is defined also contains functions that use it. One of the best ways to learn how to use Emacs Lisp (as well as to discover things you might not have even realized you can do) is to browse through the implementations of standard modules that are similar to what you're trying to achieve, or that are simply interesting. But how do you find them?
The manual way is to look at the value of the variable load-path. This is the variable Emacs consults when it needs to load a library file itself, so any library you're looking for must be in one of these directories. (This variable is discussed further in the final section of this chapter.) The problem, as you will see if you look at the current value of the variable, is that it contains a large number of directories for you to wade through, which would be pretty tedious each time you're curious about a library. (An easy way to see the variable's value is through Help's "Describe variable" feature, C-h v.)
One of the authors wrote the command listed in Example 11-1 to address this problem and uses it regularly to easily snoop on the source files that make much of Emacs run. If you don't want to type this entire function into your .emacs by hand, you can download it from this book's web site, http://www.oreilly.com/catalog/gnu3.
Example 11-1. find-library-file
(defun find-library-file (library)
  "Takes a single argument LIBRARY, being a library file to search for.
Searches for LIBRARY directly (in case relative to current directory,
or absolute) and then searches directories in load-path in order. It
will test LIBRARY with no added extension, then with .el, and finally
with .elc. If a file is found in the search, it is visited. If none
is found, an error is signaled. Note that order of extension searching
is reversed from that of the load function."
  (interactive "sFind library file: ")
  (let ((path (cons "" load-path)) exact match elc test found)
    (while (and (not match) path)
      (setq test (concat (car path) "/" library)
            match (if (condition-case nil
                          (file-readable-p test)
                        (error nil))
                      test)
            path (cdr path)))
    (setq path (cons "" load-path))
    (or match
        (while (and (not elc) path)
          (setq test (concat (car path) "/" library ".elc")
                elc (if (condition-case nil
                            (file-readable-p test)
                          (error nil))
                        test)
                path (cdr path))))
    (setq path (cons "" load-path))
    (while (and (not match) path)
      (setq test (concat (car path) "/" library ".el")
            match (if (condition-case nil
                          (file-readable-p test)
                        (error nil))
                      test)
            path (cdr path)))
    (setq found (or match elc))
    (if found
        (progn
          (find-file found)
          (and match elc
               (message "(library file %s exists)" elc)
               (sit-for 1))
          (message "Found library file %s" found))
      (error "Library file \"%s\" not found." library))))
Once this command is defined, you can visit any library's implementation by typing M-x find-library-file Enter libraryname Enter. If you use it as often as this author does, you too may find it worth binding to a key sequence. We won't present a detailed discussion of how this function works because it goes a bit deeper than this chapter, but if you're curious about what some of the functions do, you can put your cursor on the function name in a Lisp buffer and use the Help system's "Describe function" (C-h f) feature to get more information about it.
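If you only want to know where a library lives, without visiting it, the built-in function locate-library performs the same kind of load-path search. The following is a minimal sketch; the helper name my-describe-library-location is made up for this illustration and is not part of Emacs.

```elisp
;; `locate-library' searches `load-path' and returns the full file name
;; of the library, or nil if it cannot be found.
(defun my-describe-library-location (library)
  "Return a short string saying where LIBRARY lives on `load-path'.
The name of this helper is hypothetical."
  (let ((file (locate-library library)))
    (if file
        (format "%s is loaded from %s" library file)
      (format "%s was not found on load-path" library))))

;; Try it on a library that ships with Emacs.
(my-describe-library-location "simple")
```

The exact path reported depends on how Emacs was installed on your system, so no sample output is shown here.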
If you find that most of the time when you ask for a library, you end up with a file containing a lot of cryptic numeric codes and no comments, check if the filename ends in .elc. If that is usually what you end up with, it means that only the byte-compiled versions of the libraries (see the discussion at the end of this chapter) have been installed on your system. Ask your system administrator if you can get the source installed; that's an important part of being able to learn and tweak the Emacs Lisp environment.
The functions re-search-forward, re-search-backward, replace-regexp, query-replace-regexp, highlight-regexp, isearch-forward-regexp, and isearch-backward-regexp are all user commands that use regular expressions, and they can all be used within Lisp code (though it is hard to imagine incremental search being used within Lisp code). The section on customizing major modes later in this chapter contains an example function that uses re-search-forward. To find other commands that use regexps you can use the "apropos" help feature (C-h a regexp Enter).
Other such functions aren't available as user commands. Perhaps the most widely used one is looking-at. This function takes a regular expression argument and does the following: it returns t if the text after point matches the regular expression (nil otherwise); if there was a match, it saves the pieces surrounded by \\( and \\) for future use, as seen earlier. The function string-match is similar: it takes two arguments, a regexp and a string. It returns the starting index of the portion of the string that matches the regexp, or nil if there is no match.
The functions match-beginning and match-end can be used to retrieve the saved portions of the matched string. Each takes as an argument the number of the matched expression (as in \\n in replace-regexp replace strings) and returns the character position in the buffer that marks the beginning (for match-beginning) or end (for match-end) of the matched string. With the argument 0, the character position that marks the beginning/end of the entire string matched by the regular expression is returned.
Two more functions are needed to make the above useful: we need to know how to convert the text in a buffer to a string. No problem: buffer-string returns the entire buffer as a string; buffer-substring takes two integer arguments, marking the beginning and end positions of the substring desired, and returns the substring.
With these functions, we can write a bit of Lisp code that returns a string containing the portion of the buffer that matches the nth parenthesized subexpression:
(buffer-substring (match-beginning n) (match-end n))
In fact, this construct is used so often that Emacs has a built-in function, match-string, that acts as a shorthand;
(match-string n)
returns the same result as in the previous example.
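To see these functions working together, here is a small sketch that can be evaluated in the *scratch* buffer. The helper name my-split-address and the sample strings are invented for the illustration. Note that when the match was made against a string rather than a buffer, match-string needs that string as its second argument.

```elisp
(defun my-split-address (text)
  "Return (START END USER HOST) for the first user@host in TEXT, or nil.
A hypothetical helper used only for this demonstration."
  (when (string-match "\\([a-z]+\\)@\\([a-z.]+\\)" text)
    (list (match-beginning 0)        ; index where the whole match starts
          (match-end 0)              ; index just past the whole match
          (match-string 1 text)      ; text of the first parenthesized group
          (match-string 2 text))))   ; text of the second group

(my-split-address "mail jim@example.org now")
;; → (5 20 "jim" "example.org")

;; The buffer-based equivalent uses looking-at at point.
(with-temp-buffer
  (insert "width=80")
  (goto-char (point-min))
  (when (looking-at "\\([a-z]+\\)=\\([0-9]+\\)")
    (list (match-string 1) (match-string 2))))
;; → ("width" "80")
```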
An example should show how this capability works. Assume you are writing the Lisp code that parses compiler error messages, as in our previous example. Your code goes through each element in compilation-error-regexp-alist, checking if the text in a buffer matches the regular expression. If it matches, your code needs to extract the filename and the line number, visit the file, and go to the line number.
Although the code for going down each element in the list is beyond what we have learned so far, the routine basically looks like this:
for each element in compilation-error-regexp-alist
  (let ((regexp the regexp in the element)
        (file-subexp the number of the filename subexpression)
        (line-subexp the number of the line number subexpression))
    (if (looking-at regexp)
        (let ((filename (match-string file-subexp))
              (linenum (string-to-number (match-string line-subexp))))
          (find-file-other-window filename)
          (goto-line linenum))
      (otherwise, try the next element in the list)))
The second let extracts the filename from the buffer, from the beginning to the end of the match to the file-subexp-th subexpression, and it extracts the line number similarly from the line-subexp-th subexpression (and converts it from a string to a number). Then the code visits the file (in another window, not the same one as the error message buffer) and goes to the line number where the error occurred.
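The extraction step can be tried on its own with a single, simplified pattern. The sketch below is not the real compilation code: the variable my-error-regexp, the function my-parse-error-at-point, and the sample message are all assumptions made for the example, and the pattern only recognizes messages of the form file:line:.

```elisp
(defvar my-error-regexp "\\([^:\n]+\\):\\([0-9]+\\):"
  "Hypothetical pattern for messages such as \"foo.c:42: oops\".
Subexpression 1 is the file name, subexpression 2 the line number.")

(defun my-parse-error-at-point ()
  "Return (FILENAME . LINE) if point is at an error message, else nil."
  (when (looking-at my-error-regexp)
    (cons (match-string 1)
          ;; match-string returns text, so convert it to a number.
          (string-to-number (match-string 2)))))

(with-temp-buffer
  (insert "foo.c:42: undeclared identifier\n")
  (goto-char (point-min))
  (my-parse-error-at-point))
;; → ("foo.c" . 42)
```

A real implementation would loop over several patterns and then visit the file, as the outline above describes.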
The code for the calculator mode later in this chapter contains a few other examples of looking-at, match-beginning, and match-end.
Emacs contains hundreds of built-in functions that may be of use to you in writing Lisp code. Yet finding which one to use for a given purpose is not so hard.
The first thing to realize is that you will often need to use functions that are already accessible as keyboard commands. You can use these by finding out what their function names are via the C-h k (for describe-key) command (see Chapter 14). This gives the command's full documentation, as opposed to C-h c (for describe-key-briefly), which gives only the command's name. Be careful: in a few cases, some common keyboard commands require an argument when used as Lisp functions. An example is forward-word; to get the equivalent of typing M-f, you have to use (forward-word 1).
Another powerful tool for getting the right function for the job is the command-apropos (C-h a) help function. Given a regular expression, this help function searches for all commands that match it and displays their key bindings (if any) and documentation in a *Help* window. This can be a great help if you are trying to find a command that does a certain "basic" thing. For example, if you want to know about commands that operate on words, type C-h a followed by word, and you will see documentation on about a dozen and a half commands having to do with words.
The limitation with command-apropos is that it gives information only on functions that can be used as keyboard commands. Even more powerful is apropos, which is not accessible via any of the help keys (you must type M-x apropos Enter). Given a regular expression, apropos displays all functions, variables, and other symbols that match it. Be warned, though: apropos can take a long time to run and can generate very long lists if you use it with a general enough concept (such as buffer).
You should be able to use the apropos commands on a small number of well-chosen keywords and find the function(s) you need. If a function seems general and basic enough, the chances are excellent that Emacs has it built in.
After you find the function you are interested in, you may find that the documentation that apropos prints does not give you enough information about what the function does, its arguments, how to use it, or whatever. The best thing to do at this point is to search Emacs's Lisp source code for examples of the function's use. "A Treasure Trove of Examples" earlier in this chapter provides ways of finding out the names of directories Emacs loads libraries from and an easy way of looking at a library once you know its name. To search the contents of the library files you'll need to use grep or some other search facility to find examples, then edit the files found to look at the surrounding context. If you're ambitious you could put together the examples and concepts we've discussed so far to write an extension of the find-library-file command that searches the contents of the library files in each directory on the load path! Although most of Emacs's built-in Lisp code is not profusely documented, the examples of function use that it provides should be helpful—and may even give you ideas for your own functions.
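As a starting point for that kind of search, here is a rough sketch that reports which Lisp source files in a single directory mention a given regular expression. The function name my-grep-lisp-directory is invented for this example. It only looks at uncompressed files whose names end in .el, so on installations where the sources are compressed or missing it may find nothing.

```elisp
(defun my-grep-lisp-directory (regexp directory)
  "Return a list of .el files in DIRECTORY whose contents match REGEXP.
A hypothetical helper; it reads each file into a temporary buffer."
  (let (hits)
    (dolist (file (directory-files directory t "\\.el\\'"))
      (when (file-regular-p file)
        (with-temp-buffer
          (insert-file-contents file)
          ;; A nil BOUND and non-nil NOERROR make the search return nil
          ;; instead of signaling an error when nothing matches.
          (when (re-search-forward regexp nil t)
            (push file hits)))))
    (nreverse hits)))
```

Calling it once for each directory in load-path (skipping entries that do not exist) would cover all the libraries Emacs can see, at the cost of reading a great many files.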
By now, you should have a framework of Emacs Lisp that should be sufficient for writing many useful Emacs commands. We have covered examples of various kinds of functions, both Lisp primitives and built-in Emacs functions. You should be able to extrapolate many others from the ones given in this chapter along with help techniques such as those just provided. In other words, you are well on your way to becoming a fluent Emacs Lisp programmer. To test yourself, start with the code for count-words-buffer and try writing the following functions:
count-lines-buffer
Print the number of lines in the buffer.
count-words-region
Print the number of words in a region.
what-line
Print the number of the line point is currently on.
You're probably starting to see how all these tools can be put together in really powerful ways. Most of the rest of the chapter consists of examples of building relatively real and useful new features for Emacs. You can use them as learning tools for how to build your own, and you may be able to use them as-is, or with a little tweaking, in your own daily work.
The example we're about to look at is something that one of the authors developed over a decade ago to help with the tedium of creating new files in development projects where a certain amount of structure and standard documentation were always needed. Many coding and writing projects have this characteristic; each file needs some boilerplate, but it needs to be adjusted to the details of the file. Emacs turned out to be very much up to the task of automating a lot of the drudge work, and this template system has been heavily used ever since.
Most of the code in this example should already make sense to you. A couple of aspects will be explained more thoroughly in the next section about programming a major mode. In particular, don't worry too much yet about exactly what a "hook" function is, or funcall. For now it's sufficient to know that find-file-not-found-hooks allows us to run code when the user uses find-file to open a file that doesn't exist yet (exactly the time at which we'd like to offer our template services).
Before launching into the code, it's worth looking at an example of it in action. You'd set up your template by creating a file named file-template-java at the top level of a Java project directory hierarchy, containing something like the code shown in Example 11-2.
Example 11-2. file-template-java
/* %filename%
* Created on %date%
*
* (c) 2004 MyCorp, etc. etc.
*/
%package%
import org.apache.log4j.Logger;
/**
* [Documentation Here!]
*
* @author %author%
* @version $Id: ch11.xml,v 1.4 2004/12/17 16:10:05 kend Exp $
*
**/
public class %class% {
/**
* Provides access to the CVS version of this class.
**/
public static final String VERSION =
"$Id: ch11.xml,v 1.4 2004/12/17 16:10:05 kend Exp $";
/**
* Provides hierarchical control and configuration of debugging via
* class package structure.
**/
private static Logger log =
Logger.getLogger(%class%.class);
}
The template system (listed in Example 11-4) causes an attempt to find a nonexistent Java source file within this project hierarchy (for example, via C-x C-f src/com/mycorp/util/FooManager.java) to result in the prompt Start with template file? (y or n) in the minibuffer. If you answer y, you'll see your FooManager.java buffer start out with the contents shown in Example 11-3.
Example 11-3. FooManager.java
/* FooManager.java
* Created on Sun Nov 9 20:56:12 2003
*
* (c) 2004 MyCorp, etc. etc.
*/
package com.mycorp.util;
import org.apache.log4j.Logger;
/**
* [Documentation Here!]
*
* @author Jim Elliott
* @version $Id: ch11.xml,v 1.4 2004/12/17 16:10:05 kend Exp $
*
**/
public class FooManager {
/**
* Provides access to the CVS version of this class.
**/
public static final String VERSION =
"$Id: ch11.xml,v 1.4 2004/12/17 16:10:05 kend Exp $";
/**
* Provides hierarchical control and configuration of debugging via
* class package structure.
**/
private static Logger log =
Logger.getLogger(FooManager.class);
}
The template has been used to populate the buffer with the standard project header comments and a basic Java class skeleton, with proper contextual values filled in (such as the current time, the person creating the file, the file and class name, and so on). Even the Java package statement has been inferred by examining the directory path in which the source file is being created. The Logger declaration will look familiar to anyone who uses the excellent log4j system to add logging and debugging to their Java projects. (The strange version numbers in "$Id" strings are managed by the CVS version control system and will be updated to the proper file and version information when it's checked in. This topic is discussed in Chapter 12.)
To make this work, the template system needs to be able to do a few things:
• Intercept the user's attempt to find a nonexistent file.
• Check whether there is an appropriate template file somewhere in a parent directory.
• If so, offer to use it, and populate the buffer with the contents of the template file.
• Scan the template file for special placeholders (such as %filename%) and replace them with information about the file being created.
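The last of these steps, placeholder substitution, can be sketched independently of files and buffers. The version below works on a string and is only an illustration of the idea: the names my-demo-replacements and my-expand-placeholders, and the %greeting% placeholder, are assumptions made for this example rather than part of the template system itself.

```elisp
(defvar my-demo-replacements
  (list (cons "%greeting%" (lambda () "Hello"))
        (cons "%date%" #'current-time-string))
  "Hypothetical placeholder table.
Each entry pairs a placeholder string with a function of no arguments
that returns the replacement text.")

(defun my-expand-placeholders (text table)
  "Return TEXT with every placeholder in TABLE replaced by its function's value."
  (dolist (entry table text)
    (setq text (replace-regexp-in-string
                (regexp-quote (car entry)) ; match the placeholder literally
                (funcall (cdr entry))      ; call the function to get the value
                text
                t                          ; FIXEDCASE: leave letter case alone
                t))))                      ; LITERAL: no special \ handling

(my-expand-placeholders "%greeting%, world" my-demo-replacements)
;; → "Hello, world"
```

Keeping the replacements in a table of functions means a new placeholder can be supported by adding one entry, without touching the substitution loop.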
Let's look at the source code that makes this all happen! (As always, if you don't want to type the code listed in Example 11-4 yourself, you can download it from this book's web site.[79])
Example 11-4. template.el
;;;;;;;;;;;;;;;;;;;;;;;;;;; -*- Mode: Emacs-Lisp -*- ;;;;;;;;;;;;;;;;;;;;;;;;
;; template.el --- Routines for generating smart skeletal templates for files.
(defvar template-file-name "file-template"
  "*The name of the file to look for when a find-file request fails. If a
file with the name specified by this variable exists, offer to use it as
a template for creating the new file. You can also have mode-specific
templates by appending \"-extension\" to this filename, e.g. a Java specific
template would be file-template-java.")
(defvar template-replacements-alist
  '(("%filename%" . (lambda ()
                      (file-name-nondirectory (buffer-file-name))))
    ("%creator%" . user-full-name)
    ("%author%" . user-full-name)
    ("%date%" . current-time-string)
    ("%once%" . (lambda () (template-insert-include-once)))
    ("%package%" . (lambda () (template-insert-java-package)))
    ("%class%" . (lambda () (template-insert-class-name)))
    )
  "A list which specifies what substitutions to perform upon loading a
template file. Each list element consists of a string, which is the target
to be replaced if it is found in the template, paired with a function,
which is called to generate the replacement value for the string.")
(defun find-template-file ()
  "Searches the current directory and its parents for a file matching
the name configured for template files. The name of the first such
readable file found is returned, allowing for hierarchical template
configuration. A template file with the same extension as the file
being loaded (using a \"-\" instead of a \".\" as the template file's
delimiter, to avoid confusing other software) will take precedence
over an extension-free, generic template."
  (let ((path (file-name-directory (buffer-file-name)))
        (ext (file-name-extension (buffer-file-name)))
        attempt result)
    (while (and (not result) (> (length path) 0))
      (setq attempt (concat path template-file-name "-" ext))
      (if (file-readable-p attempt)
          (setq result attempt)
        (setq attempt (concat path template-file-name))
        (if (file-readable-p attempt)
            (setq result attempt)
          (setq path (if (string-equal path "/")
                         ""
                       (file-name-directory (substring path 0 -1)))))))
    result))
(defun template-file-not-found-hook ()
  "Called when a find-file command has not been able to find the specified
file in the current directory. Sees if it makes sense to offer to start it
based on a template."
  (condition-case nil
      (if (and (find-template-file)
               (y-or-n-p "Start with template file? "))
          (progn (buffer-disable-undo)
                 (insert-file (find-template-file))
                 (goto-char (point-min))
                 ;; Magically do the variable substitutions
                 (let ((the-list template-replacements-alist))
                   (while the-list
                     (goto-char (point-min))
                     (replace-string (car (car the-list))
                                     (funcall (cdr (car the-list)))
                                     nil)
                     (setq the-list (cdr the-list))))
                 (goto-char (point-min))
                 (buffer-enable-undo)
                 (set-buffer-modified-p nil)))
    ;; This is part of the condition-case; it catches the situation where
    ;; the user has hit C-g to abort the find-file (since they realized
    ;; that they didn't mean it) and deletes the buffer that has already
    ;; been created to go with that file, since it will otherwise become
    ;; mysterious clutter they may not even know about.
    (quit (kill-buffer (current-buffer))
          (signal 'quit nil))))
; Install the above routine
(or (memq 'template-file-not-found-hook find-file-not-found-hooks)
    (setq find-file-not-found-hooks
          (append find-file-not-found-hooks '(template-file-not-found-hook)))
    )
(defun template-insert-include-once ()
  "Returns preprocessor directives such that the file will be included
only once during a compilation process which includes it an
arbitrary number of times."
  (let ((name (file-name-nondirectory (buffer-file-name)))
        basename)
    ;; The dot is escaped so that only names really ending in ".h" match.
    (if (string-match "\\.h$" name)
        (progn
          (setq basename (upcase (substring name 0 -2)))
          (concat "#ifndef _H_" basename "\n#define _H_" basename
                  "\n\n\n#endif /* not defined _H_" basename " */\n"))
      "" ; the "else" clause, returns an empty string.
      )))
(defun template-insert-java-package ()
  "Inserts an appropriate Java package directive based on the path to
the current file name (assuming that it is in the com, org or net
subtree). If no recognizable package path is found, inserts nothing."
  (let ((name (file-name-directory (buffer-file-name)))
        result)
    (if (string-match "/\\(com\\|org\\|net\\)/.*/$" name)
        (progn
          (setq result (substring name (+ (match-beginning 0) 1)
                                  (- (match-end 0) 1)))
          (while (string-match "/" result)
            (setq result (concat (substring result 0 (match-beginning 0))
                                 "."
                                 (substring result (match-end 0)))))
          (concat "package " result ";"))
      "")))
(defun template-insert-class-name ()
  "Inserts the name of the java class being defined in the current file,
based on the file name. If not a Java source file, inserts nothing."
  (let ((name (file-name-nondirectory (buffer-file-name))))
    (if (string-match "\\(.*\\)\\.java" name)
        (substring name (match-beginning 1) (match-end 1))
      "")))
(provide 'template)
You'll notice that this code makes heavy use of the regular expression facilities, which is no surprise. The first section sets up some variables that configure the operation of the template system. template-file-name determines the file name (or prefix) that is used to search for templates; the default value of file-template is probably fine. template-replacements-alist sets up the standard placeholders, and the mechanism by which they get replaced by appropriate values. Adding entries to this list is one way to extend the system. Each entry consists of the placeholder to be replaced, followed by the Lisp function to be executed to produce its replacement. The way this function can be stored in a list and executed when appropriate later is one of the great things about Lisp and is discussed in more depth in the calculator mode example in the next section. The placeholders supported are:
%filename%
Gets replaced by the name of the file being created.
%creator%, %author%
These are synonyms; both get replaced by the name of the user creating the file.
%date%
Turns into the current date and time when the file is created.
%once%
Expands into boilerplate code for the C preprocessor to cause a header file to include itself only once, even if it's been included multiple times by other header files. (This sort of thing has been taken care of in more modern environments like Objective C and Java but can still be handy when working with traditional C compilers.)
%package%
Is replaced by the Java package which contains the file being created (assuming the file is a Java class). This package is determined by examining the directory structure in which the file is being placed.
%class%
Becomes the name of the Java class being defined in the file, assuming it's a Java source file.
The first function, find-template-file, is responsible for searching the directory hierarchy above the file being created, looking for a file with the right name to be considered a file template (if template-file-name has been left at its default value, this looks for either a file named file-template or file-template-ext where ext is the extension at the end of the name of the file being created). It just keeps lopping the last directory off the path in which it's looking, starting with the location of the new file, and seeing if it can read a file with one of those names in the current directory, until it runs out of directories.
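More recent versions of Emacs provide a built-in function, locate-dominating-file, that performs this kind of upward directory walk. The sketch below uses it in place of the hand-written loop; it is an alternative technique, not the code from template.el, and it deliberately ignores the extension-specific template names. The function name my-find-template is invented for the example.

```elisp
(defun my-find-template (start-dir name)
  "Return the full name of the nearest file called NAME at or above START-DIR.
Return nil if no directory on the way up contains such a file.
This is a hypothetical helper built on `locate-dominating-file'."
  (let ((dir (locate-dominating-file start-dir name)))
    ;; locate-dominating-file returns the directory, so add the name back.
    (and dir (expand-file-name name dir))))

;; For instance, starting from the directory of the current buffer:
;; (my-find-template default-directory "file-template")
```

To get the behavior of find-template-file you would call it first with the extension-specific name and then with the generic one, although that gives a slightly different precedence when the two kinds of template live at different levels of the hierarchy.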
The function template-file-not-found-hook is the "main program" of the template system. It gets "hooked in" to the normal Emacs find-file process, and called whenever find-file doesn't find the file the user asked for (in other words, a new file is being created). It uses condition-case (a mechanism similar to exception handling in C++ and Java) to make sure it gets a chance to clean up after itself if the user cancels the process of filling in the template file. It checks whether the template file can be found, asks users if they want to use it, and (if they do) loads it into the new buffer and performs the placeholder substitutions. For an explanation of the list manipulation and funcall code that makes the substitutions work, read the discussion of Calculator mode in the next section. Finally, it jumps to the beginning of the new buffer and marks it as unchanged (because, as far as users are concerned, it's a brand new buffer on which they've not yet had to expend any effort).
Immediately after the function definition is the chunk of code that hooks it into the find-file mechanism. find-file-not-found-hooks is a variable that Emacs uses to keep track of things to do when a requested file is not found. Emacs 22 and later call this variable find-file-not-found-functions and treat the older name as obsolete. (Giving you opportunities to change or enhance normal behavior through "hooks" is a wonderful trait of Emacs that is discussed in more depth following the Calculator mode example later in this chapter.) Our code checks to make sure it's not already hooked up (so you don't end up having it run twice or more if you reload the library file during an Emacs session), and then installs our hook at the end of the list if it's not there.
The rest of the file is helper functions to handle the more complex placeholders. template-insert-java-package figures out the value that should replace %package%, while template-insert-class-name figures out the Java class name that replaces %class%.
The last function call in the file, (provide 'template), records the fact that a "feature" named "template" has been loaded successfully. The provide function works with require to allow libraries to be loaded just once. When the function (require 'template) is executed, Emacs checks whether the feature "template" has ever been provided. If it has, it does nothing; otherwise, it calls load-library to load it. It's a good practice to have your libraries support this mechanism, so that they can be gracefully and efficiently used by other libraries through the require mechanism. You'll find this pattern throughout the Emacs library sources.
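You can watch this bookkeeping happen in the *scratch* buffer. In the sketch below, my-demo-feature is a made-up feature name with no file behind it, which shows that once a feature has been provided, require is satisfied without loading anything.

```elisp
;; Before this point, (featurep 'my-demo-feature) would return nil.
(provide 'my-demo-feature)      ; record the feature as present

(featurep 'my-demo-feature)     ; → t

;; Because the feature is already recorded, require simply returns
;; its name and does not go looking for a file to load.
(require 'my-demo-feature)      ; → my-demo-feature
```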
After you get comfortable with Emacs Lisp programming, you may find that that "little extra something" you want Emacs to do takes the form of a major mode. In previous chapters, we covered major modes for text entry, word processor input, and programming languages. Many of these modes are quite complicated to program, so we'll provide a simple example of a major mode, from which you can learn the concepts needed to program your own. Then, in the following section, you will learn how you can customize existing major modes without changing any of the Lisp code that implements them.
We'll develop Calculator mode, a major mode for a calculator whose functionality will be familiar to you if you have used the Unix dc (desk calculator) command. It is a Reverse Polish (stack-based) calculator of the type made popular by Hewlett-Packard. After explaining some of the principal components of major modes and some interesting features of the calculator mode, we will give the mode's complete Lisp code.
A major mode has various components that integrate it into Emacs. Some are:
• The symbol that is the name of the function that implements the mode
• The name of the mode that appears in the mode line in parentheses
• The local keymap that defines key bindings for commands in the mode
• Variables and constants known only within the Lisp code for the mode
• The special buffer the mode may use
Let's deal with these in order. The mode symbol is set by assigning the name of the function that implements the mode to the global variable major-mode, as in:
(setq major-mode 'calc-mode)
Similarly, the mode name is set by assigning an appropriate string to the global variable mode-name, as in:
(setq mode-name "Calculator")
The local keymap is defined using functions discussed in Chapter 10. In the case of the calculator mode, there is only one key sequence to bind (C-j), so we use a special form of the make-keymap command called make-sparse-keymap that is more efficient with a small number of key bindings. To use a keymap as the local map of a mode, we call the function use-local-map, as in:
(use-local-map calc-mode-map)
As we just saw, variables can be defined by using setq to assign a value to them, or by using let to define local variables within a function. The more "official" way to define variables is the defvar function, which allows documentation for the variable to be integrated into online help facilities such as C-h v (for describe-variable). The format is the following:
(defvar varname initial-value "description of the variable")
A variation on this is defconst, with which you can define constant values (that never change). For example:
(defconst calc-operator-regexp "[-+*/%]"
"Regular expression for recognizing operators.")
defines the regular expression to be used in searching for arithmetic operators. As you will see, we use calc- as a prefix for the names of all functions, variables, and constants that we define for the calculator mode. Other modes use this convention; for example, all names in C++ mode begin with c++-. Using this convention is a good idea because it helps avoid potential name clashes with the thousands of other functions, variables, and so on in Emacs.
Making variables local to the mode is also desirable so that they are known only within a buffer that is running the mode.[80] To do this, use the make-local-variable function, as in:
(make-local-variable 'calc-stack)
Notice that the name of the variable, not its value, is needed; therefore a single quote precedes the variable name, turning it into a symbol.
Finally, various major modes use special buffers that are not attached to files. For example, the C-x C-b (for list-buffers) command creates a buffer called *Buffer List*. To create a buffer in a new window, use the pop-to-buffer function, as in:
(pop-to-buffer "*Calc*")
There are a couple of useful variations on pop-to-buffer. We won't use them in our mode example, but they are handy in other circumstances.
switch-to-buffer
Same as the C-x b command covered in Chapter 4; can also be used with a buffer name argument in Lisp.
set-buffer
Used only within Lisp code to designate the buffer used for editing; the best function to use for creating a temporary "work" buffer within a Lisp function.
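Before turning to the calculator itself, it may help to see the components above assembled into a mode that does nothing useful. Everything named demo- below is hypothetical and exists only for this sketch. The call to kill-all-local-variables is not discussed above; it is the conventional first step for clearing out settings left behind by a buffer's previous major mode.

```elisp
(defvar demo-mode-stack nil
  "Hypothetical per-buffer data for Demo mode.")

(defun demo-mode-eval-line ()
  "Placeholder command; it only reports that it was called."
  (interactive)
  (message "demo-mode-eval-line called"))

(defvar demo-mode-map
  (let ((map (make-sparse-keymap)))     ; sparse: only one binding needed
    (define-key map "\C-j" 'demo-mode-eval-line)
    map)
  "Local keymap for the hypothetical Demo mode.")

(defun demo-mode ()
  "A do-nothing major mode showing the components listed above."
  (interactive)
  (pop-to-buffer "*Demo*")              ; the mode's special buffer
  (kill-all-local-variables)            ; start from a clean slate
  (setq major-mode 'demo-mode)          ; the mode symbol
  (setq mode-name "Demo")               ; what the mode line shows
  (use-local-map demo-mode-map)         ; the local keymap
  (make-local-variable 'demo-mode-stack)
  (setq demo-mode-stack nil))
```

After evaluating these definitions, M-x demo-mode Enter should put you in a *Demo* buffer whose mode line shows Demo and in which C-j runs the placeholder command.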
A Reverse Polish Notation calculator uses a data structure called a stack. Think of a stack as being similar to a spring-loaded dish stack in a cafeteria. When you enter a number into a RPN calculator, you push it onto the stack. When you apply an operator such as plus or minus, you pop the top two numbers off the stack, add or subtract them, and push the result back on the stack.
The list, a fundamental concept of Lisp, is a natural for implementing stacks. The list is the main concept that sets Lisp apart from other programming languages. It is a data structure that has two parts: the head and tail. These are known in Lisp jargon, for purely historical reasons, as car and cdr respectively. Think of these terms as "the first thing in the list" and "the rest of the list." The functions car and cdr, when given a list argument, return the head and tail of it, respectively.[81] Two functions are often used for making lists. cons (construct) takes two arguments, which become the head and tail of the list respectively. list takes any number of arguments and makes them into a list. For example, this:
(list 2 3 4 5)
makes a list of the numbers from 2 to 5, and this:
(cons 1 (list 2 3 4 5))
makes a list of the numbers from 1 to 5. car applied to that list would return 1, while cdr would return the list (2 3 4 5).
These concepts are important because stacks, such as that used in the calculator mode, are easily implemented as lists. To push the value of x onto the stack calc-stack, we can just say this:
(setq calc-stack (cons x calc-stack))
If we want to get at the value at the top of the stack, the following returns that value:
(car calc-stack)
To pop the top value off the stack, we say this:
(setq calc-stack (cdr calc-stack))
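These three idioms can be wrapped into a pair of helper functions and tried out directly. The names my-stack, my-push, and my-pop are invented for this sketch; they are not the calculator's own functions.

```elisp
(defvar my-stack nil
  "Hypothetical stack for this demonstration; the top is the car.")

(defun my-push (x)
  "Push X onto `my-stack' and return the new stack."
  (setq my-stack (cons x my-stack)))

(defun my-pop ()
  "Remove and return the top element of `my-stack'."
  (let ((top (car my-stack)))
    (setq my-stack (cdr my-stack))
    top))

(setq my-stack nil)
(my-push 4)        ; my-stack is now (4)
(my-push 17)       ; my-stack is now (17 4)
(my-pop)           ; → 17
my-stack           ; → (4)
```

Note that popping an empty stack with these definitions quietly returns nil, because car and cdr of an empty list are both nil.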
Bear in mind that the elements of a list can be anything, including other lists. (This is why a list is called a recursive data structure.) In fact (ready to be confused?) just about everything in Lisp that is not an atom is a list. This includes functions, which are basically lists of function name, arguments, and expressions to be evaluated. The idea of functions as lists will come in handy very soon.
The complete Lisp code for the calculator mode appears at the end of this section; you should refer to it while reading the following explanation. If you download or type the code in, you can use the calculator by typing M-x calc-mode Enter. You will be put in the buffer *Calc*. You can type a line of numbers and operators and then type C-j to evaluate the line. Table 11-7 lists the three commands in calculator mode.
Table 11-7. Calculator mode commands
Command | Action |
---|---|
= | Print the value at the top of the stack. |
 | Print the entire stack contents. |
 | Clear the stack. |
Blank spaces are not necessary, except to separate numbers. For example, typing this:
4 17*6-=
followed by C-j, evaluates (4 * 17) - 6 and causes the result, 62, to be printed.
The heart of the code for the calculator mode is the functions calc-eval and calc-next-token. (See the code at the end of this section for these.) calc-eval is bound to C-j in Calculator mode. Starting at the beginning of the line preceding C-j, it calls calc-next-token to grab each token (number, operator, or command letter) in the line and evaluate it.
calc-next-token uses a cond construct to see if there is a number, operator, or command letter at point by using the regular expressions calc-number-regexp, calc-operator-regexp, and calc-command-regexp. According to which regular expression was matched, it sets the variable calc-proc-fun to the name (symbol) of the function that should be run (either calc-push-number, calc-operate, or calc-command), and it sets tok to the result of the regular expression match.
In calc-eval, we see where the idea of a function as a list comes in. The funcall function reflects the fact that there is little difference between code and data in Lisp. We can put together a list consisting of a symbol and a bunch of expressions and evaluate it as a function, using the symbol as the function name and the expressions as arguments; this is what funcall does. In this case, the following:
(funcall calc-proc-fun tok)
treats the symbol value of calc-proc-fun as the name of the function to be called and calls it with the argument tok. Then the function does one of three things:
• If the token is a number, calc-push-number pushes the number onto the stack.
• If the token is an operator, calc-operate performs the operation on the top two numbers on the stack (see below).
• If the token is a command, calc-command performs the appropriate command.
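The dispatch step can be tried in isolation. This sketch stores a function's symbol in a variable and then calls it with funcall, much as calc-eval does with calc-proc-fun; the names my-handler, my-shout, and my-dispatch are hypothetical and are not part of the calculator code.

```elisp
(defvar my-handler nil
  "Hypothetical variable holding the symbol of the function to call.")

(defun my-shout (tok)
  "Return TOK in upper case (a stand-in for a real token handler)."
  (upcase tok))

(defun my-dispatch (tok)
  "Choose a handler for the string TOK, then call it through `funcall'."
  (setq my-handler (if (string-match-p "\\`[0-9]+\\'" tok)
                       'string-to-number   ; all digits: convert to a number
                     'my-shout))           ; anything else: upcase it
  (funcall my-handler tok))

(my-dispatch "42")   ; calls (string-to-number "42")
(my-dispatch "abc")  ; calls (my-shout "abc")
```

The caller never names the handler directly; it only needs to know which variable holds the symbol.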
The function calc-operate takes the idea of functions as lists of data a step further by converting the token from the user directly into a function (an arithmetic operator). This step is accomplished by the function read, which parses a character string as a Lisp expression; for an operator string such as "+", the result is the symbol +. Thus, calc-operate uses funcall and read in combination as follows:
(defun calc-operate (tok)
  (let ((op1 (calc-pop))
        (op2 (calc-pop)))
    (calc-push (funcall (read tok) op2 op1))))
This function takes the name of an arithmetic operator (as a string) as its argument. As we saw earlier, the string tok is a token extracted from the *Calc* buffer, in this case, an arithmetic operator such as + or *. The calc-operate function pops the top two arguments off the stack by using the calc-pop function, which is similar to the use of cdr earlier. read converts the token to a symbol, and thus to the name of an arithmetic function. So, if the operator is +, then funcall is called as here:
(funcall '+ op2 op1)
Thus, the function + is called with the two arguments, which is exactly equivalent to simply (+ op2 op1). Finally, the result of the function is pushed back onto the stack.
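The conversion from string to function can be checked directly, for example in the *scratch* buffer. This is a small illustrative sketch; my-apply-operator is a hypothetical helper, and intern is shown only as an alternative that maps a name to its symbol without invoking the Lisp reader.

```elisp
(defun my-apply-operator (tok a b)
  "Call the arithmetic function named by the string TOK on A and B."
  (funcall (read tok) a b))

(read "+")                       ; the symbol +
(my-apply-operator "+" 4 17)     ; same as (+ 4 17)
(my-apply-operator "%" 17 4)     ; same as (% 17 4)
(funcall (intern "*") 4 17)      ; intern also turns a name into a symbol
```

Because read accepts any Lisp expression, this approach is only safe when the token has already been restricted to known operators, as calc-operator-regexp does in the calculator.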
All this voodoo is necessary so that, for example, the user can type a plus sign and Lisp automatically converts it into a plus function. We could have done the same thing less elegantly—and less efficiently—by writing calc-operate with a cond construct (as in calc-next-token), which would look like this:
(defun calc-operate (tok)
  (let ((op1 (calc-pop))
        (op2 (calc-pop)))
    (calc-push
     (cond ((equal tok "+")
            (+ op2 op1))
           ((equal tok "-")
            (- op2 op1))
           ((equal tok "*")
            (* op2 op1))
           ((equal tok "/")
            (/ op2 op1))
           (t
            (% op2 op1))))))
The final thing to notice in the calculator mode code is the function calc-mode, which starts the mode. It creates (and pops to) the *Calc* buffer. Then it kills all existing local variables in the buffer, initializes the stack to nil (empty), and creates the local variable calc-proc-fun (see the earlier discussion). Finally it sets Calculator mode as the major mode, sets the mode name, and activates the local keymap.
Now you should be able to understand all of the code for the calculator mode. You will notice that there really isn't that much code at all! This is testimony to the power of Lisp and the versatility of built-in Emacs functions. Once you understand how this mode works, you should be ready to start rolling your own. Without any further ado, here is the code:
;; Calculator mode.
;;
;; Supports the operators +, -, *, /, and % (remainder).
;; Commands:
;; c  clear the stack
;; =  print the value at the top of the stack
;; p  print the entire stack contents
;;
(defvar calc-mode-map nil
  "Local keymap for calculator mode buffers.")
;; set up the calculator mode keymap with
;; C-j (linefeed) as "eval" key
(if calc-mode-map
    nil
  (setq calc-mode-map (make-sparse-keymap))
  (define-key calc-mode-map "\C-j" 'calc-eval))
(defconst calc-number-regexp
  "-?\\([0-9]+\\.?\\|\\.\\)[0-9]*\\(e[0-9]+\\)?"
  "Regular expression for recognizing numbers.")
(defconst calc-operator-regexp "[-+*/%]"
  "Regular expression for recognizing operators.")
(defconst calc-command-regexp "[c=ps]"
  "Regular expression for recognizing commands.")
(defconst calc-whitespace "[ \t]"
  "Regular expression for recognizing whitespace.")
;; stack functions
(defun calc-push (num)
  (if (numberp num)
      (setq calc-stack (cons num calc-stack))))
(defun calc-top ()
  (if (not calc-stack)
      (error "stack empty.")
    (car calc-stack)))
(defun calc-pop ()
  (let ((val (calc-top)))
    (if val
        (setq calc-stack (cdr calc-stack)))
    val))
;; functions for user commands:
(defun calc-print-stack ()
  "Print entire contents of stack, from top to bottom."
  (if calc-stack
      (progn
        (insert "\n")
        (let ((stk calc-stack))
          (while calc-stack
            (insert (number-to-string (calc-pop)) " "))
          (setq calc-stack stk)))
    (error "stack empty.")))
(defun calc-clear-stack ()
  "Clear the stack."
  (setq calc-stack nil)
  (message "stack cleared."))
(defun calc-command (tok)
  "Given a command token, perform the appropriate action."
  (cond ((equal tok "c")
         (calc-clear-stack))
        ((equal tok "=")
         (insert "\n" (number-to-string (calc-top))))
        ((equal tok "p")
         (calc-print-stack))
        (t
         (message (concat "invalid command: " tok)))))
(defun calc-operate (tok)
  "Given an arithmetic operator (as string), pop two numbers
off the stack, perform operation tok (given as string), push
the result onto the stack."
  (let ((op1 (calc-pop))
        (op2 (calc-pop)))
    (calc-push (funcall (read tok) op2 op1))))
(defun calc-push-number (tok)
  "Given a number (as string), push it (as number)
onto the stack."
  (calc-push (string-to-number tok)))
(defun calc-invalid-tok (tok)
  "Signal an error for the unrecognized token TOK."
  (error "Invalid token: %s" tok))
(defun calc-next-token ()
  "Pick up the next token, based on regexp search.
As side effects, advance point one past the token,
and set name of function to use to process the token."
  (let (tok)
    (cond ((looking-at calc-number-regexp)
           (goto-char (match-end 0))
           (setq calc-proc-fun 'calc-push-number))
          ((looking-at calc-operator-regexp)
           (forward-char 1)
           (setq calc-proc-fun 'calc-operate))
          ((looking-at calc-command-regexp)
           (forward-char 1)
           (setq calc-proc-fun 'calc-command))
          ((looking-at ".")
           (forward-char 1)
           (setq calc-proc-fun 'calc-invalid-tok)))
    ;; pick up token and advance past it (and past whitespace)
    (setq tok (buffer-substring (match-beginning 0) (point)))
    (if (looking-at calc-whitespace)
        (goto-char (match-end 0)))
    tok))
(defun calc-eval ()
  "Main evaluation function for calculator mode.
Process all tokens on an input line."
  (interactive)
  (beginning-of-line)
  (while (not (eolp))
    (let ((tok (calc-next-token)))
      (funcall calc-proc-fun tok)))
  (insert "\n"))
(defun calc-mode ()
  "Calculator mode, using H-P style postfix notation.
Understands the arithmetic operators +, -, *, / and %,
plus the following commands:
    c   clear stack
    =   print top of stack
    p   print entire stack contents (top to bottom)
Linefeed (C-j) is bound to an evaluation function that
will evaluate everything on the current line. No
whitespace is necessary, except to separate numbers."
  (interactive)
  (pop-to-buffer "*Calc*" nil)
  (kill-all-local-variables)
  (make-local-variable 'calc-stack)
  (setq calc-stack nil)
  (make-local-variable 'calc-proc-fun)
  (setq major-mode 'calc-mode)
  (setq mode-name "Calculator")
  (use-local-map calc-mode-map))
The following are some possible extensions to the calculator mode, offered as exercises. If you try them, you will increase your understanding of the mode's code and Emacs Lisp programming in general.
• Add an operator ^ for "power" (4 5 ^ evaluates to 1024). Emacs Lisp has no function named ^, but you can use the built-in function expt.
• Add support for octal (base 8) and/or hexadecimal (base 16) numbers. An octal number has a leading "0," and a hexadecimal has a leading "0x"; thus, 017 equals decimal 15, and 0x17 equals decimal 23.
• Add operators \+ and \* to add/multiply all of the numbers on the stack, not just the top two (e.g., 4 5 6 \+ evaluates to 15, and 4 5 6 \* evaluates to 120).[82]
• As an additional test of your knowledge of list handling in Lisp, complete the example (Example 5) from earlier in this chapter that searches compilation-error-regexp-alist for a match to a compiler error message. (Hint: make a copy of the list, then pick off the top element repeatedly until either a match is found or the list is exhausted.)
Now that you understand some of what goes into programming a major mode, you may decide you want to customize an existing one. Luckily, in most cases, you don't have to worry about changing any mode's existing Lisp code to do this; you may not even have to look at the code. All Emacs major modes have "hooks" for letting you add your own code to them. Appropriately, these are called mode-hooks. Every built-in major mode in Emacs has a mode hook called mode-name-hook, where mode-name is the name of the mode or the function that invokes it. For example, C mode has c-mode-hook, shell mode has shell-mode-hook, etc.
What exactly is a hook? It is a variable whose value is some Lisp code to run when the mode is invoked. When you invoke a mode, you run a Lisp function that typically does many things (e.g., sets up key bindings for special commands, creates buffers and local variables, etc.); the last thing a mode-invoking function usually does is run the mode's hook if it exists. Thus, hooks are "positioned" to give you a chance to override anything the mode's code may have set up. For example, any key bindings you define override the mode's default bindings.
We saw earlier that Lisp code can be used as the value of a Lisp variable; this use comes in handy when you create hooks. Before we show you exactly how to create a hook, we need to introduce yet another Lisp primitive function: lambda. lambda is very much like defun in that it is used to define functions; the difference is that lambda defines functions that don't have names (or, in Lisp parlance, "anonymous functions"). The format of lambda is:
(lambda (args)
code)
where args are arguments to the function and code is the body of the function. To assign a lambda function as the value of a variable, you need to "quote" it to prevent it from being evaluated (run). That is, you use the form:
(setq var-name
'(lambda ( )
code))
Therefore, to create code for a mode hook, you could use the form:
(setq mode-name-hook
'(lambda ( )
code for mode hook))
However, it's quite possible that the mode you want to customize already has hooks defined. If you use the setq form, you override whatever hooks already exist. To avoid this, you can use the function add-hook instead:
(add-hook 'mode-name-hook
'(lambda ( )
code for mode hook))
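As a minimal sketch of how the pieces fit together, the following toy mode function performs its own setup and then runs its hook last, and a customization is attached with add-hook. All of the names here (my-demo-mode, my-demo-mode-hook, my-demo-setting, my-demo-customize) are hypothetical. The hook function is given a name with defun instead of being written as a lambda; add-hook accepts a function symbol as well, and a named function is easier to inspect or remove later.

```elisp
(defvar my-demo-mode-hook nil
  "Hypothetical hook run at the end of `my-demo-mode'.")

(defvar my-demo-setting 'default
  "Hypothetical setting that the toy mode initializes.")

(defun my-demo-mode ()
  "Toy mode-invoking function: do the setup, then run the hook last."
  (interactive)
  (setq my-demo-setting 'default)       ; the mode's own setup
  (run-hooks 'my-demo-mode-hook))       ; user code runs last and can override

(defun my-demo-customize ()
  "Hypothetical user customization for `my-demo-mode'."
  (setq my-demo-setting 'customized))

;; Adds to whatever is already on the hook instead of replacing it.
(add-hook 'my-demo-mode-hook 'my-demo-customize)

(my-demo-mode)
my-demo-setting   ; now the symbol customized
```

Because the hook runs after the mode's own setup, the value assigned in my-demo-customize is the one that remains in effect.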
The most common thing done with mode hooks is to change one or more of the key bindings for a mode's special commands. Here is an example: in Chapter 7 we saw that picture mode is a useful tool for creating simple line drawings. Several commands in picture mode set the default drawing direction. The command to set the direction to "down," picture-movement-down, is bound to C-c . (C-c followed by a period). This is not as mnemonic a binding as C-c < for picture-movement-left or C-c ^ for picture-movement-up, so let's say you want to make C-c v the binding for picture-movement-down instead. The keymap for picture mode is, not surprisingly, called picture-mode-map, so the code you need to set this key binding is this:
(define-key picture-mode-map "\C-cv" 'picture-movement-down)
The hook for picture mode is called edit-picture-hook (because edit-picture is the command that invokes picture mode). So, to put this code into the hook for picture mode, the following should go into your .emacs file:
(add-hook 'edit-picture-hook
          '(lambda ()
             (define-key picture-mode-map "\C-cv" 'picture-movement-down)))
This instruction creates a lambda function with the one key binding command as its body. Then, whenever you enter picture mode (starting with the next time you invoke Emacs), this binding will be in effect.
As a slightly more complex example, let's say you create a lot of HTML pages. You use HTML mode (see Chapter 8), but you find that there are no Emacs commands that enter standard <head> and <title> tags, despite the fact that the help text reminds you of their importance. You want to write your own functions to insert these strings, and you want to bind them to keystrokes in HTML mode.
To do this, you first need to write the functions that insert the tag strings. The simplest approach would just be to insert the text:
(defun html-head ()
  (interactive)
  (insert "<head></head>"))
(defun html-title ()
  (interactive)
  (insert "<title></title>"))
Remember that the calls to (interactive) are necessary so that Emacs can use these functions as user commands.
The next step is to write code that binds these functions to keystrokes in HTML mode's keymap, which is called html-mode-map, using the techniques described in Chapter 10. Assume you want to bind these functions to C-c C-h (head) and C-c C-t (title). C-c is used as a prefix key in many Emacs modes, such as the language modes we saw in the last chapter. Again, this is no problem:
(define-key html-mode-map "\C-c\C-h" 'html-head)
(define-key html-mode-map "\C-c\C-t" 'html-title)
Finally, you need to convert these lines of Lisp into a value for html-mode-hook. Here is the code to do this:
(add-hook 'html-mode-hook
          '(lambda ()
             (define-key html-mode-map "\C-c\C-h" 'html-head)
             (define-key html-mode-map "\C-c\C-t" 'html-title)))
If you put this code in your .emacs file, together with the earlier function definitions, you get the desired functionality whenever you use HTML mode.
If you try using these functions, though, you'll find they have some noticeable drawbacks compared to the other tag insertion commands in HTML mode. For one thing, while the other helper commands leave your cursor in between the opening and closing tags, our insertions leave the cursor after the closing tag, which is not only inconsistent, but it's much less helpful. Also, while the other tags you insert can be customized in terms of your preferred capitalization, or wrapped around existing content in the document, our simple-minded insert calls give us no such capabilities.
Luckily, it's not hard to add the smarts we want. It turns out that HTML mode is defined in the file sgml-mode.el (we learned this by applying help's handy describe-function command, C-h f, to the mode-defining function html-mode). Armed with this knowledge, it was an easy matter to pull up and study the Lisp code that makes it work using the find-library-file utility shown in "A Treasure Trove of Examples" earlier in this chapter. A little quick hunting to find a parallel example revealed that the tag support is implemented using a skeletal function generator. Without going into too much detail, it turns out that the code we want to use is this:
(define-skeleton html-head
  "HTML document header section."
  nil
  "<head>" _ "</head>")
(define-skeleton html-title
  "HTML document title."
  nil
  "<title>" _ "</title>")
The define-skeleton function sets up the skeletal HTML code to be inserted, and it does this by writing a Lisp function based on the template you pass it. Its first argument is the name of the Lisp function to define, and the next is a documentation string for that function explaining what it inserts. After that comes an optional prompt that can be used to customize the content to be inserted. We don't need any customization, so we leave it as nil to skip the prompt. Finally comes the list of strings to be inserted, and we mark where we want the cursor to end up with "_". (To learn more about the way this skeleton system works, invoke describe-function on skeleton-insert.)
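The same pattern extends to other tags. The following is a sketch under the assumption that you want an emphasis command; my-html-em is a hypothetical name, not something HTML mode defines. The with-temp-buffer form is only there to try the command out without touching a real file.

```elisp
(require 'skeleton)

(define-skeleton my-html-em
  "Insert an HTML emphasis element, leaving point between the tags."
  nil                                   ; no prompt
  "<em>" _ "</em>")                     ; _ marks where point should end up

;; Try it in a scratch buffer and report the text on each side of point.
(with-temp-buffer
  (my-html-em)
  (list (buffer-substring (point-min) (point))    ; text before point
        (buffer-substring (point) (point-max))))  ; text after point
```

The list returned by the last form lets you confirm that point was left between the opening and closing tags rather than after them.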
With these changes, our new commands work just like the other insertion tools in HTML mode. Even more than the specific Lisp code that came out of this example, the technique we used to create it is worth learning. If you can develop the skills and habits involved in tracking down an example from the built-in libraries that is close to what you want, and digging into how it works just enough to come up with a variant that solves your problem, you'll be well on your way to becoming the friendly Emacs Lisp guru your friends rely on when they need a cool new trick.
Here is a third example. Let's say you program in C, and you want a Lisp function that counts the number of C function definitions in a file. The following function does the trick; it is somewhat similar to the count-lines-buffer example earlier in the chapter. The function goes through the current buffer looking for (and counting) C function definitions by searching for { at the beginning of a line (admittedly, this simplistic approach assumes a particular and rigid C coding style):
(defun count-functions-buffer ()
  "Count the number of C function definitions in the buffer."
  (interactive)
  (save-excursion
    (goto-char (point-min))
    (let ((count 0))
      (while (re-search-forward "^{" nil t)
        (setq count (1+ count)))
      (message "%d functions defined." count))))
The re-search-forward call in this function has two extra arguments; the third (last) of these means "if not found, just return nil, don't signal an error." The second argument must be set to nil, its default, so that the third argument can be supplied.[83]
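The effect of the third argument is easy to observe in a temporary buffer. This is an illustrative sketch only; my-search-demo is a hypothetical function, and the sample C text is made up for the purpose.

```elisp
(defun my-search-demo ()
  "Return a list showing how `re-search-forward' reports success and failure."
  (with-temp-buffer
    (insert "int main(void)\n{\n  return 0;\n}\n")
    (goto-char (point-min))
    (list
     ;; First search succeeds: the value is the position after the match.
     (re-search-forward "^{" nil t)
     ;; No more matches; because the third argument is t, the value is nil.
     (re-search-forward "^{" nil t)
     ;; Without the third argument, a failed search signals an error instead.
     (condition-case nil
         (re-search-forward "^{")
       (search-failed 'error-signaled)))))

(my-search-demo)
```

The nil result in the second position is what lets the while loop in count-functions-buffer end quietly when no definitions remain.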
Now assume we want to bind this function to C-c f in C mode. Here is how we would set the value of c-mode-hook:
(add-hook 'c-mode-hook
          '(lambda ()
             (define-key c-mode-map "\C-cf" 'count-functions-buffer)))
Put this code and the function definition given earlier in your .emacs file, and this functionality will be available to you in C mode.
As a final example of mode hooks, we'll make good on a promise from the previous chapter. When discussing C++ mode, we noted that the commands c-forward-into-nomenclature and c-backward-into-nomenclature are included as alternatives to forward-word and backward-word that treat WordsLikeThis as three words instead of one, and that this feature is useful for C++ programmers. The question is how to make the keystrokes that normally invoke forward-word and backward-word invoke the new commands instead.
At first, you might think the answer is simply to create a hook for C++ mode that rebinds M-f and M-b, the default bindings for forward-word and backward-word, to the new commands, like this:
(add-hook
 'c++-mode-hook
 '(lambda ()
    (define-key c++-mode-map "\ef"
      'c-forward-into-nomenclature)
    (define-key c++-mode-map "\eb"
      'c-backward-into-nomenclature)))
(Notice that we are using c++-mode-map, the local keymap for C++ mode, for our key bindings.) But what if those keys have already been rebound, or what if forward-word and backward-word are also bound to other keystroke sequences (which they usually are anyway)? We need a way to find out what keystrokes are bound to these functions, so that we can reset all of them to the new functions.
Luckily, an obscure function, where-is-internal, gives us this information. This function implements the "guts" of the where-is help command, which we will see in Chapter 14. where-is-internal returns a list of keystroke atoms that are bound to the function given as an argument. We can use this list in a while loop to do all of the rebinding necessary. Here is the code:
(add-hook 'c++-mode-hook
          '(lambda ()
             (let ((fbinds (where-is-internal 'forward-word))
                   (bbinds (where-is-internal 'backward-word)))
               (while fbinds
                 (define-key c++-mode-map (car fbinds)
                   'c-forward-into-nomenclature)
                 (setq fbinds (cdr fbinds)))
               (while bbinds
                 (define-key c++-mode-map (car bbinds)
                   'c-backward-into-nomenclature)
                 (setq bbinds (cdr bbinds))))))
The two lines in the top of the let statement get all of the key bindings of the commands forward-word and backward-word into the local variables fbinds and bbinds, respectively.
After that, there are two while loops that work like the calc-print-stack function of the calculator mode shown earlier in this chapter. This use of while is a very common Lisp programming construct: it iterates through the elements of a list by taking the first element (the car), using it in some way, and deleting it from the list ((setq list (cdr list))). The loop finishes when the list becomes empty (nil), causing the while test to fail.
In this case, the first while loop takes each of the bindings that where-is-internal found for forward-word and creates a binding in C++ mode's local keymap, c++-mode-map, for the new command c-forward-into-nomenclature. The second while loop does the same for backward-word and c-backward-into-nomenclature.
The surrounding code installs these loops as a hook to C++ mode, so that the rebinding takes place only when C++ mode is invoked and is active only in buffers that are in that mode.
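The while/car/cdr idiom is worth seeing on its own, away from key bindings. The sketch below sums a list of numbers with that idiom and then again with dolist, a standard macro that hides the bookkeeping; both function names are hypothetical.

```elisp
(defun my-sum-list (numbers)
  "Add up NUMBERS with the while/car/cdr idiom."
  (let ((rest numbers)
        (total 0))
    (while rest
      (setq total (+ total (car rest)))  ; use the first element
      (setq rest (cdr rest)))            ; then step past it
    total))

(defun my-sum-list-dolist (numbers)
  "Same result using `dolist', which does the car/cdr stepping for you."
  (let ((total 0))
    (dolist (n numbers)
      (setq total (+ total n)))
    total))

(my-sum-list '(4 5 6))         ; => 15
(my-sum-list-dolist '(4 5 6))  ; => 15
```

Note that (setq rest (cdr rest)) only changes what the local variable points at; the list passed in by the caller is left intact.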
One final word about hooks: you may have noticed that some of the mode customizations we have shown in previous chapters include hooks and others do not. For example, the code in the previous chapter to set your preferred C or C++ indentation style included a hook:
(add-hook 'c-mode-hook
          '(lambda ()
             (c-set-style "stylename")
             (c-toggle-auto-state)))
whereas the code that sets an alternative C preprocessor command name for the c-macro-expand command did not:
(setq c-macro-preprocessor "/usr/local/lib/cpp -C")
Why is this? Actually, the correct way to customize any mode is through its hook—for example, the preceding example should really be:
(add-hook 'c-mode-hook
          '(lambda ()
             (setq c-macro-preprocessor "/usr/local/lib/cpp -C")))
If you merely want to set values of variables, you can get away without a hook, but a hook is strictly required if you want to run functions like c-set-style or those used to bind keystrokes. The precise reason for this dichotomy takes us into the murky depths of Lisp language design, but it's essentially as follows.
Variables that are local to modes, like c-macro-preprocessor, do not exist if you don't invoke the mode in which they are defined. So, if you aren't editing C or C++ code, then c-macro-preprocessor doesn't exist in your running Emacs, because you haven't loaded C mode (see below). Yet if your .emacs file contains a setq to set this variable's value, then you call the variable into existence whether or not you ever use C mode. Emacs can deal with this: when it loads C mode, it notices that you have already set the variable's value and does not override it.
However, the situation is different for functions. If you put a call to a mode-local function like c-set-style in your .emacs file, then (in most cases) Emacs complains, with the message Error in init file, because it does not know about this function and thus cannot assume anything about what it does. Therefore you must attach this function to a hook for C mode: by the time Emacs runs your hook, it has already loaded the mode and therefore knows what the function does.
These examples of hooks are only the briefest indication of how far you can go in customizing Emacs's major modes. The best part is that, with hooks, you can do an incredible amount of customization without touching the code that implements the modes. In exchange, you should remember, when you do write your own modes, to think about useful places to put hooks so others can take advantage of them.
After you have become proficient at Emacs Lisp programming, you will want a library of Lisp functions and packages that you can call up from Emacs at will. Of course, you can define a few small functions in your .emacs file, but if you are writing bigger pieces of code for more specialized purposes, you will not want to clutter up your .emacs file—nor will you want Emacs to spend all that time evaluating the code each time you start it up. The answer is to build your own Lisp library, analogous to the Lisp directories that come with Emacs and contain all of its built-in Lisp code. After you have created a library, you can load whatever Lisp packages you need at a given time and not bother with the others.
Creating a library requires two simple steps. First, create a directory in which your Lisp code will reside. Most people create an elisp subdirectory of their home directory. Lisp files are expected to have names ending in .el (your .emacs file is an exception). The second step is to make your directory known to Emacs so that when you try to load a Lisp package, Emacs knows where to find it. Emacs keeps track of such directories in the global variable load-path, which is a list of strings that are directory names.
The initial value for load-path is populated with the names of the Lisp directories that come with Emacs, e.g., /usr/local/emacs/lisp. You will need to add the name of your own Lisp directory to load-path. One way to make this addition is to use the Lisp function append, which concatenates any number of list arguments together. For example, if your Lisp directory is ~yourname/lisp, you could add it like this:
(setq load-path (append load-path (list "~yourname/lisp")))
The function list is necessary because all of the arguments to append must be lists. This line of code must precede any commands in your .emacs file that load packages from your Lisp directory.
When you load a library, Emacs searches directories in the order in which they appear in load-path; therefore, in this case, Emacs searches its default Lisp directory first. If you want your directory to be searched first, you should use the cons function described earlier instead of append, as follows:
(setq load-path (cons "~yourname/lisp" load-path))
This form is useful if you want to replace one of the standard Emacs packages with one of your own. For example, you'd use this form if you've written your own version of C mode and want to use it instead of the standard package. Notice that the directory name here is not surrounded by a call to list because cons's first argument can be an atom (a string in this case). This situation is similar to the use of cons for pushing values onto stacks, as in the calculator mode described earlier.
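The difference between the two forms shows up in the order of the resulting list. This sketch binds load-path temporarily to a one-element stand-in value so that your real setting is not touched; my-load-path-demo is a hypothetical name.

```elisp
(defun my-load-path-demo ()
  "Return two variants of a sample load path without changing the real one."
  (let ((load-path (list "/usr/local/emacs/lisp")))  ; stand-in value
    (list (append load-path (list "~yourname/lisp")) ; own directory searched last
          (cons "~yourname/lisp" load-path))))       ; own directory searched first

(my-load-path-demo)
;; => (("/usr/local/emacs/lisp" "~yourname/lisp")
;;     ("~yourname/lisp" "/usr/local/emacs/lisp"))
```

In current Emacs versions, (add-to-list 'load-path "~yourname/lisp") is a common alternative: it puts the directory at the front of the list, but only if it is not already there.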
If you want Emacs to search the directory you happen to be in at any given time, simply add nil to load-path, either by prepending it via cons or by appending it via append. Taking this step is analogous to putting . in your Unix PATH environment variable.
After you have created a private Lisp library and told Emacs where to find it, you're ready to load and use the Lisp packages that you've created. There are several ways of loading Lisp packages into Emacs. The first of these should be familiar from Chapter 10:
• Type M-x load-library Enter as a user command; see Chapter 10.
• Put the line
(load "package-name")
within Lisp code. Putting a line like this into your .emacs file makes Emacs load the package whenever you start it.
• Invoke Emacs with the command-line option "-l package-name". This action loads the package package-name.
• Put the line
(autoload 'function "filename")
within Lisp code (typically in your .emacs file), as described in Chapter 10. This action causes Emacs to load the package when you execute the given function.[84]
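To make the autoload approach concrete, here is a hedged sketch. It assumes a file named my-count-funcs.el somewhere on load-path that defines a command called my-count-functions; both names are hypothetical. The optional third argument is a documentation string, and a fourth argument of t tells Emacs that the function is an interactive command.

```elisp
;; Hypothetical: my-count-funcs.el, somewhere on load-path, is assumed
;; to define the command my-count-functions.
(autoload 'my-count-functions "my-count-funcs"
  "Count C function definitions in the buffer (loaded on first use)." t)

;; Until the command is first called, only a placeholder is installed,
;; so this form returns non-nil and the file has not been read yet:
(autoloadp (symbol-function 'my-count-functions))
```

Nothing is read from disk at startup; the file is loaded the first time the command is invoked, which keeps .emacs fast even with many such declarations.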
After you have created your Lisp directory, you can make loading and running your Lisp files more efficient by byte-compiling them, or translating their code into byte code, a more compact, machine-readable form. Byte-compiling the Lisp file filename.el creates the byte code file filename.elc. Byte code files are typically 40 to 75 percent of the size of their non-byte-compiled counterparts.
Although byte-compiled files are more efficient, they are not strictly necessary. The load-library command, when given the argument filename, first looks for a file called filename.elc. If that doesn't exist, it tries filename.el, and failing that, just filename.
You can byte-compile a single function in a buffer of Lisp code by placing your cursor anywhere in the function and typing M-x compile-defun. You can byte-compile an entire file of Lisp by invoking M-x byte-compile-file Enter and supplying the filename. If you omit the .el suffix, Emacs appends it and asks for confirmation. If you have changed the file but have not saved it, Emacs offers to save it first.
Then you will see an entertaining little display in the minibuffer as the byte-compiler does its work: the names of functions being compiled flash by. The byte-compiler creates a file with the same name as the original Lisp file but with c appended; thus, filename.el becomes filename.elc.
Finally, if you develop a directory with several Lisp files, and you make changes to some of them, you can use the byte-recompile-directory command to recompile only those Lisp files that have been changed since being byte-compiled (analogously to the Unix make utility). Just type M-x byte-recompile-directory Enter and supply the name of the Lisp directory or just press Enter for the default, which is the current directory.
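Byte compilation can also be driven from Lisp code. The sketch below writes a throwaway source file into a temporary directory, compiles it with byte-compile-file, and checks that the .elc file appeared; the function name my-byte-compile-demo and the file name my-hello.el are hypothetical.

```elisp
(defun my-byte-compile-demo ()
  "Byte-compile a throwaway file and return non-nil if the .elc file appears."
  (let* ((dir (make-temp-file "my-elisp-" t))        ; temporary directory
         (source (expand-file-name "my-hello.el" dir)))
    ;; Write a tiny library containing one function.
    (with-temp-file source
      (insert ";;; my-hello.el --- demo -*- lexical-binding: t -*-\n"
              "(defun my-hello ()\n  \"Return a greeting.\"\n  \"hello\")\n"))
    (byte-compile-file source)                        ; writes my-hello.elc
    (file-exists-p (concat source "c"))))

(my-byte-compile-demo)
```

Calling byte-compile-file from Lisp is the same operation as M-x byte-compile-file, so the compiled file lands next to the source with c appended to its name.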