Optimizations in the Symbolics CLOS Implementation D. Scott Cyphers David A. Moon Symbolics, inc. (c) Copyright 1990, Symbolics inc. INTRODUCTION Symbolics CLOS is a fast implementation of CLOS that runs on three different hardware architectures: the Symbolics 3600 (used in older Genera platforms), the Symbolics Ivory (used in newer Genera platforms), and the Intel 80386 (used by Symbolics Cloe running under MS/DOS and UNIX). Most of the source code for Symbolics CLOS is shared between all three implementations, and most of the optimizations to be discussed here are machine-independent in concept. Obviously the exact instruction sequences used to implement an optimization vary between machines. OBJECT REPRESENTATION Each Lisp object reference is implemented as a machine word (32 bits in CLOE, 34 on the 3600, and 38 on Ivory). A few bits in the object reference form an internal data type field, and the rest of the bits are either immediate data or a storage address. Two object references are EQ when all corresponding bits are equal. To move the storage for an object requires atomically updating the address field in all references to the object; the garbage collector knows how to do this. The internal data type field provides coarse type discrimination, which is sufficient for many frequently-used operations. Determining the exact type of most objects requires at least one memory read to get additional information from the storage associated with the object. The INSTANCE internal data type is used for objects of metaclass STANDARD-CLASS or STRUCTURE-CLASS. The object reference addresses a header that addresses two objects, the instance information and the slot vector. The instance information is a snapshot of information about the object's class. All the non-obsolete instances of a class share the same instance information object. The slot vector provides the storage for the local slots; each object has its own slot vector. The header and slot vector are usually adjacent in memory, but if an object's class is changed the slot vector can move away from the header until the next garbage collection. The CLOSURE internal data type is used for a kind of function consisting of executable code and some storage, called its environment. When a closure is called, the address of the environment is placed in a special location (which we will refer to as the extra argument) and control passes to the executable code. A special kind of closure is a funcallable instance, which has instance information and a slot vector in its environment. Funcallable instances are used to implement generic functions. The executable code in a generic function closure is a "trampoline" function that dispatches to the appropriate method(s). The remaining Lisp objects, of metaclass BUILT-IN-CLASS, use internal data types other than INSTANCE. These objects do not have slot vectors, but they do have instance information. Getting the instance information is an important part of dispatching and must be fast. It is done by a table lookup using the internal data type, plus some ad-hoc code. SLOT-VALUE Reading and writing the value of a slot is an important operation in CLOE. SLOT-VALUE and related functions take an object and the name of a slot, translate the name into a memory location using tables associated with the object's class, and access the memory location. SLOT-VALUE also has to check whether the slot is unbound and whether the class has been redefined since the object was last accessed. This is a complex, potentially slow operation. If the exact class of the object and the storage layout of objects of that class were known at compile time, the position of the slot relative to the beginning of the object's slot vector would be known and SLOT-VALUE could be compiled as a simple base plus displacement memory reference. However, in Common Lisp, we never know the exact class of an object: If the object is declared to be of class C, then its exact class can be C or a subclass of C. Because of multiple inheritance, the position of a slot in an object's slot vector can be different in different subclasses. For example, if class A has slots X, Y, and Z, class B has slots J, K, and L, and class C inherits from A and B, slots X and J of class C cannot both be in the first position in the slot vector. Even relative slot positions can be different in different subclasses. For example, if class D has slots X, F, and G, and class E inherits from A and D, the slot following slot X cannot be both Y (as in A) and F (as in D). Furthermore, because CLOS allows classes to be redefined, the storage layout of objects of a given class cannot be known at compile time, since the class might be redefined later in a way that changes the storage layout. It would be possible to define a subset of CLOS with restrictions to permit SLOT-VALUE to be compiled as a simple base+displacement memory reference, but a full CLOS implementation cannot do this (note that DEFSTRUCT does it). We implement two optimizations for SLOT-VALUE. In a method, when the object is a specialized argument and the slot name is constant (this is the usual case), SLOT-VALUE is compiled inline. At compile time, (argument-number, slot-name) pairs are assigned indices. The indices index a mapping table, which is a vector passed to the method function through the extra argument. The mapping table contains entries of the following form: Fixnum: The position of the slot in the object's slot vector. Pointer: A pointer to the cell for a shared slot. In Genera, a locative is used, in CLOE, a CONS. Function: The function will be called with information about the type of slot operation being performed, the index, the slot name, and the instance. This is the general escape mechanism. :UPDATE: The mapping table is obsolete. The arguments will be updated, a new mapping table will be obtained, and the operations will be retried. The fixnum case is the most common and is handled in one instruction (sometimes three instructions in a multimethod) on the 3600 and Ivory, doing two to four memory references. The mapping table and the slot vector are always referenced. The object header and the mapping table's length are referenced the first time a method accesses any slot, but are then cached. The fixnum case takes about a dozen instructions (mostly fast ones) on the 80386 and does three memory references, to the mapping table, the object header, and the slot vector. When a non-fixnum mapping table entry is encountered, a helper function is called, so those cases are slower. The notinline form of SLOT-VALUE is also optimized. It first obtains the object's instance information and updates the object if the class has been redefined. Next it looks up the slot name in a table in the instance information, using a fast search. The table indicates a local slot, or the need to call SLOT-VALUE-USING-CLASS. CLOS encourages programmers to access slots by calling accessor generic functions, rather than calling SLOT-VALUE directly. This gives the benefits of increased abstraction, but can degrade performance since a generic function dispatch is required for each slot access. The vast majority of accessor generic functions have methods that do nothing but call SLOT-VALUE. A useful optimization (which we do not yet do) is to recognize a call to an accessor generic function and compile it as if it were a call to SLOT-VALUE, using the mapping table. If the unusual case where the accessor generic function has a nontrivial method arises, perhaps due to a method added at run time, the mapping table entry contains a function instead of a fixnum, exploiting type-checking to cause a trap that invokes the function. This makes the unusual case slower but makes the usual case faster, eliminating any speed penalty at all for using the more abstract construct. CLASS AND METHOD REPRESENTATION Classes and methods are objects of metaclass STANDARD-CLASS. One of the slots of a method is the method function, which takes an extra argument in addition to the arguments which appear in the DEFMETHOD. Each method contains information about the form of extra argument expected by its method function. One of the slots of a class is its current instance information. CALL-NEXT-METHOD A call to CALL-NEXT-METHOD or NEXT-METHOD-P is converted to inline code which uses next method information in the extra argument. CALL-NEXT-METHOD calls the function in the first element of the next method information, passing it the rest of the next method information as its extra argument. If there is no next method or CALL-NEXT-METHOD is invalid for this method, the function in the next method information is a helper that calls NO-NEXT-METHOD or signals an error. NEXT-METHOD-P simply checks for a helper. CALLING GENERIC FUNCTIONS When a generic function is called, the set of applicable methods must be determined, sorted, and combined. This is a time consuming operation, so we cache the result and arrange to decache it when changes are made. The caching and decaching are carefully designed to permit redefining metaclasses. When a generic function is called, if the appropriate information is cached it dispatches to a "handler" function, passing the arguments of the generic function and an appropriate extra argument. Otherwise, it updates the cache as described in the next section. A generic function's trampoline function does the dispatch. This consists of a series of hash table lookups, each producing either a specification of the next dispatch step or the final handler and extra argument. If the dispatch cache is invalid, the handler is the function to update the cache. Each dispatch step is either a class dispatch step or an EQL dispatch step. In a class dispatch step, the hash table comes from the instance information of one of the arguments and the key indicates the generic function and the argument number. In an EQL dispatch step, the hash table is associated with the generic function and the key is one of the arguments. Ivory has the most efficient dispatch implementation. Each hash table entry is three words: the key, the address of the first instruction in the handler, and the extra argument. This organization was chosen because Ivory has two instructions, originally intended for Flavors, that employ hashing hardware to do a class or EQL dispatch step, placing the extra argument into the appropriate stack slot and jumping to the handler. If a dispatch step is not the last one, the handler performs the next dispatch using one of these instructions. The extra argument for a class dispatch step is the key. For an EQL dispatch step, it is a dummy object whose instance information contains the hash table. The 3600 and CLOE use a similar dispatch table, but instead of an instruction address there is either a handler function or a command to a dispatch table interpreter. The 3600 has microcode support for a class dispatch step on the first argument and has customized trampoline functions for other common dispatching patterns. Complicated dispatches are handled by a dispatch table interpreter trampoline function. In CLOE, the dispatch table interpreter is always used. The Ivory technique, in which the tables contain instruction addresses and the dispatches are jumps, could be used on the 80386 if we had a convenient assembler to generate the handlers. This technique is ruled out on the 3600 because of the 3600's complicated calling sequence. Unlike most other CLOS implementations, Symbolics CLOS dispatching associates the tables with classes, rather than generic functions. This is primarily to take advantage of the existing support for dispatching in Ivory and 3600. Other dispatching techniques could be used for particular generic functions simply by substituting a different trampoline function. The performance advantage of multiple dispatching techniques is not clear at this time. CREATING A HANDLER When generic function dispatch does not find a cached handler, it determines the set of applicable methods, sorts them, and makes an effective method form in about the way described in the CLOS specification. This form is then converted to an executable form by converting each occurrence of CALL-METHOD to a call to the method function with an appropriate extra argument, under the control of the method metaobject, which knows what kind of extra argument the method function expects. The extra argument might contain a mapping table, next method information, or both. This process also accumulates the extra argument that the handler will expect. Typically this is a vector of method functions and their extra arguments. When necessary, code for handling the :ARGUMENTS option to DEFINE-METHOD-COMBINATION and for checking keyword arguments is wrapped around the form. If the resulting form just calls one method, the handler is that method's method function. However if the method is an accessor, the handler is a helper function and the extra argument is a slot locator (an offset in the slot vector for a local slot, or a pointer to a shared slot). In the remaining cases, the handler is a function generated by attaching an appropriate lambda-list to the form. If an equivalent function has been generated before, it is reused, otherwise the function is compiled and saved for future reuse. Once the handler and extra argument have been determined, they are cached so that similar arguments invoke the same handler with the same extra argument. When adding an entry for a given generic function and arguments to the cache, the process starts with the set of all methods for the generic function. For each argument position, the set of methods is examined. If any method specializes that argument position, then a class dispatch and/or an EQL dispatch will be needed, with a dispatch table entry matching the argument given in that position. If these dispatches already exist, they are followed in the dispatch tables, otherwise they are created. The set of methods is then filtered so that only the methods applicable to arguments that match the dispatch remain. The process repeats until no more dispatches are required and the set of methods has been filtered down to the applicable methods for the given arguments. The remaining methods are combined into a handler and extra argument, which are stored in the hash table entry for the final dispatch step. CHANGE-CLASS, MAKE-INSTANCES-OBSOLETE, AND CLASS REDEFINITION CHANGE-CLASS modifies the header to refer to a new slot vector and the instance information of the new class. The header's address does not change, so there is no problem with EQ. CHANGE-CLASS also decaches the dispatches for all generic functions that have methods with EQL parameter specializers mentioning this object, since the route through a class dispatch to the EQL dispatch will be different. When a class is redefined or MAKE-INSTANCES-OBSOLETE is called, the class's current instance information, dispatch table, and mapping tables are marked obsolete and new ones are allocated. The old dispatch table is modified so that all dispatches trap to a function that updates obsolete instances. The old mapping tables are filled with the symbol :UPDATE, which will cause a trap to a function that updates obsolete instances and substitutes the new mapping table. An old mapping table can be used when a closure of a function internal to a method captures the mapping table, the class is redefined, and then the closure is called and accesses a slot. Marking the old instance information, dispatch table, and mapping tables obsolete ensures that the next operation on an obsolete instance of the class traps and updates the instance, changing the header to refer to a new slot vector and to the up to date instance information. As in CHANGE-CLASS, the header's address does not change, so there is no problem with EQ. Note that it is not always possible to use CHANGE-CLASS to change to a different metaclass. For example, a funcallable instance, with metaclass FUNCALLABLE-STANDARD-CLASS, cannot be changed to a standard object, with metaclass STANDARD-CLASS, because the internal data type in each reference to the object would have to change from CLOSURE to INSTANCE. The CLOS specification permits this restriction. MAKE-INSTANCE The example implementation of MAKE-INSTANCE given in the CLOS specification is highly interpretive, and therefore slow. The following operations can be eliminated or optimized: mapping a class name to a class object, receiving keyword arguments, checking for invalid keyword arguments, defaulting unsupplied initialization arguments, evaluating slot initforms, and initializing slots from initialization arguments and slot initforms. The necessary operations of allocating storage and calling user-supplied initialization methods can be streamlined. Mapping a constant class name to a constant class object is ordinary constant-folding. A call to MAKE-INSTANCE with a constant class and constant keywords is optimized into a call to an automatically generated constructor function that specializes in creating objects of that class with those initialization arguments. This eliminates class lookup and all keyword argument processing (positional arguments are used), and simplifies defaulting of initialization arguments since the supplied/unsupplied status of each initialization argument is fixed. Other calls to MAKE-INSTANCE still call the MAKE-INSTANCE generic function, but dispatch to an automatically generated constructor method that specializes in creating objects of that class. Keyword argument processing and defaulting are still necessary in this case. The constructor function or method uses inlining of initforms, inlining of system-defined method bodies, direct calls to user-defined method functions, loop unrolling, elimination of redundant slot-boundp tests, and minimized instruction sequences for storing into slots. Since the system-defined methods called by MAKE-INSTANCE are highly interpretive (they get the list of slot-descriptions, iterate over them, check if the slot has any initargs, etc.) the inlining is not done by analyzing the body of the real method, but instead by a specialized generator function for each method. The generator functions are similar to macros, but have access to the class definition and to the sets of initialization arguments that are known to be supplied, defaulted, or might be either supplied or defaulted. These optimizations eliminate all run-time references to the class and slot-definition metaobjects. The information contained in those objects has been converted into compiled code instructions. With these optimizations, MAKE-INSTANCE on Ivory is slightly faster than MAKE-ARRAY and almost as fast as MAKE-LIST, measured for two slots or elements. Some care is required to regenerate the automatically generated constructor functions and methods if the class is redefined, and to arrange to generate them at load-time rather than at run-time so that the first instantiation of each class is not slowed down by calling the constructor generator and the compiler.