Embedding a properly tail-recursive, stack-based interpreter in C
Kurt on Wed, 2006-05-17 14:21.
Many Scheme interpreters implement proper tail-recursion by forcing everything to “happen” within a tightly looping byte-code VM. The VM loops on op-codes and manipulates a stack (or two).
Tail-calls typically reuse the current activation record with new arguments. Oaklisp, a pure OO Scheme interpreter, uses two stacks, one for values (call arguments and locals) and one for activation records, and a tightly tuned bytecode VM loop. Use of two stacks, allows more efficent control flow in tail-calls and continuations.
To embed a stack-based interpreter in C, calling out to C functions from within interpreted code and calling into interpreter functions from C need to be allowed without too much gnashing of teeth; we cannot force the developer or extender of the interpreter to put all activity in a byte-code loop.
Thus extending a properly tail-recursive bytecode VM with C call-ins and call-outs can be cumbersome.
However, we want to support a byte-code execution environment in our interpreter, otherwise it’s not much of an interpreter.
We want C code to execute in its enviroment and byte-code to execute in its environement while allowing the two execution enviroments to commnunicate efficiently and seamlessly as possible, because the interpreter will spend most of its time executing interpreted code and we want the core of the interpreter to be written and extendable in C
Since C (almost always) uses the same stack for activation records and values, we will have some problems getting proper tail-recursion out of C code (although there is some support in GCC for tail-recursion as long as there are no pointers to stack locals).
However we can do something about tail-calls occuring at the boundaries of C and interpreted code.
The example puts the tail-call loop at all call sites in the C code of the interpreter, thus allowing efficient tail-calls to and from C code, while still allowing the interpreter to be developed as a somewhat normal C program.
It directly supports the intention to iterate using tail-calls in the interpreter as an explicit iteration on activation records at the call sites in the interpreter’s implementation.
It allows different bytecode VMs to be embedded whereever and however the extender of the interpreter wants without subverting proper tail-calling.
The example is a simple interpreter that has only integer values and no interpreted function name lookups. The interpreted function names are hard-coded in C to keep this example (somewhat) short.
A more complete implementation of this technique is in an embeddable pure OO Scheme implementation, based on Oaklisp, named ll .