TM: a real-time conservative allocator.

0.4

Introduction

TM is a real-time, non-relocating, conservative, allocator using type- and color-segregated lists, inspired by Henry Baker’s "Treadmill" paper.

The source code for TM is located at http://kurtstephens.com/pub/tredmill.

TM interleaves marking, scanning and sweeping during each call to tm_alloc().

TM attempts to limit the amount of scanning, marking and sweeping during each call to tm_alloc() to avoid "stopping the world" for long periods of time.

TM supports type segregation to increase locality of objects of the same type and to reduce internal fragmentation.

The space overhead of each allocated object requires an object header (tm_node) of two address pointers.

More Information

See:

http://kurtstephens.com/research/tredmill
The Treadmill: Real-Time Garbage Collection Without Motion Sickness, Henry G. Baker, http://home.pipeline.com/~hbaker1/NoMotionGC.html

To Do

Support for 64-bit address spaces.
Implement allocations larger than tm_block_SIZE.
Implement aligned allocations using a page-indexed bit vector.
- Flag a bit for the beginning page of the aligned block.
- Use a hash table to map the address of the aligned block to a tm_block containing its own tm_type.

An allocation unit is a tm_node. Each tm_node is allocated from a block of system memory (tm_block) sized and aligned to a multiple of the operating system’s page size. Each tm_block is dedicated to a particular type (tm_type). Each tm_type represents a tm_node allocation size, thus each tm_block is uniformly parceled into tm_nodes of the same size. In general, each power-of-two size has a tm_type. However, tm_types can be explicitly created for any size.

Node Coloring

Each tm_node has a “color” tag describing its current allocation state:

tm_WHITE: free, unused.
tm_ECRU: allocated, potentially unused.
tm_GREY: allocated, marked used, not scanned for pointers.
tm_BLACK: allocated, marked used, scanned for pointers.

Each tm_type has it's own tm_WHITE, tm_ECRU, tm_GREY, and tm_BLACK doubly-linked lists. Each tm_node is a member of one, and only one, of its type’s color lists. To keep the tm_node header small, the node's color is encoded in the lower two bits of the tm_node._prev pointer.

When a tm_node's color changes, it is moved to a different colored list in its tm_type for processing in a different allocation phase.

Node types can be explicitly created for common sizes that are not a power of two, to reduce internal memory fragmentation.

Allocation Phases

The allocator interleaves the following phases with allocation:

tm_ALLOC: Allocate from OS or parcel nodes from allocated blocks. (tm_WHITE -> tm_ECRU)
tm_UNMARK: Unmarking scanned nodes for subsequent rooting. (tm_BLACK -> tm_ECRU)
tm_ROOT: Root scanning and marking nodes for scanning. (tm_ECRU -> tm_GREY)
tm_SCAN: Scanning marked nodes for interior pointers. (tm_GREY -> tm_BLACK)
tm_SWEEP: Freeing unmarked nodes. (tm_ECRU -> tm_WHITE)

During each call to tm_alloc(), each phase will do some collection work before returning a newly allocated tm_node:

tm_ALLOC: Parcel nodes from allocated blocks. Memory pressure may cause switching to the tm_UNMARK phase.
tm_UNMARK: Allocate tm_nodes from the tm_type’s tm_WHITE list until it is empty. A fixed number of tm_BLACK nodes are returned to the tm_ECRU list. Memory pressure may cause switching to the tm_ROOT phase.
tm_ROOT: A fixed number of roots are scanned for pointers into allocated space. tm_ECRU nodes with valid references are moved to tm_GREY. After all roots are scanned, the next phase is tm_SCAN.
tm_SCAN: A fixed number of node bytes of tm_GREY nodes are scanned for interior pointers and moved to tm_BLACK list. Once all tm_GREY nodes are scanned, the next phase is tm_SWEEP.
tm_SWEEP: If there are no more nodes to be scanned, sweep a fixed number of tm_ECRU nodes to the tm_WHITE list. Once all tm_ECRU nodes are sweeped, the collector returns to the tm_UNMARK phase.

Node States

As nodes are parceled, allocated, marked, scanned, sweeped and unmarked they are moved between the colored lists as follows:

Node States and Transitions

Each tm_type maintains a separate list for each tm_node color. Statistics of the colors of all tm_nodes are kept at the tm_type, tm_block and global level in arrays indexed by color. A total count of all nodes at each level is kept at the offset of tm_N.

In general, newly allocated tm_nodes are taken from the tm_type’s tm_WHITE list and placed on its tm_ECRU list. If the type’s tm_WHITE list is empty, a tm_block is allocated from the operating system and is scheduled for parceling new tm_WHITE nodes for the type. If the allocation tm_block becomes empty, the process is repeated: another tm_block is allocated and parceled.

A limited number of new free nodes are parceled from the type’s current allocation block as needed and are initialized as new tm_WHITE nodes.

New tm_blocks may be requested from the operating system during all phases, if the type’s tm_WHITE list or allocation block is empty. The reasoning is the operating system should be able to allocate a new allocation block faster than a collection that would need to completely “stop the world”.

All phases, except the tm_SWEEP phase, allocate nodes by moving them from the tm_WHITE list to the tm_ECRU list. The tm_SWEEP phase allocates nodes from tm_WHITE to tm_GREY, because tm_ECRU nodes are considered to be unmarked garbage during the tm_SWEEP phase and tm_BLACK nodes are considered “in-use’ -- tm_GREY nodes are not sweeped during the tm_SWEEP phase. Using tm_GREY during tm_SWEEP is also related to tm_BLACK node mutation, which gives rise to requiring a write barrier on tm_BLACK nodes.

The tm_SWEEP phase will always attempt to scan any tm_GREY nodes before continuing to sweep any tm_ECRU nodes.

The tm_BLACK color might seem a better choice for tm_SWEEP allocations, but this would force young nodes, which are more likely to be garbage, to be kept until the next tm_SWEEP phase. Coloring tm_SWEEP allocations tm_BLACK would also prevent any new interior pointers stored in nodes that may reference tm_ECRU nodes from being scanned before tm_SWEEP is complete.

If once tm_SWEEP is complete, tm_blocks with no active tm_nodes (i.e. b->n[tm_WHITE] == b->n[tm_TOTAL]) have all their tm_nodes unparcelled and the tm_block is returned to the operating system.

The tm_SCAN phase may cause a full GC if its determined that the last remaining node to scan is being mutated. The tm_SCAN phase may also cause all tm_GREY nodes to be scanned during tm_malloc(), if memory pressure is high. The tm_SCAN phase always scans all roots if there are no tm_GREY nodes left, before continuing to the tm_SWEEP phase.

To prevent thrashing the operating system with tm_block allocation and free requests, a limited number of unused blocks are kept on a global tm_block free list.

Aligned Blocks

Aligned blocks allow determining if generic word is a pointer to a heap-allocated tm_node to be computed with simple pointer arithmetic and a bit vector check.

To determine a pointer’s allocation:

Checking a bit vector indexed by the pointer’s page number. The bit is set when tm_nodes are parceled and cleared when the entire tm_block is unused and returned to the free block list or operating system.
Mask off the insignificant page bits to construct an aligned tm_block address.
Determine the size of tm_nodes in the tm_block from the block’s tm_type.
Determine if the pointer resides in the data portion of the tm_node by considering the tm_node and tm_block linked-list headers to be “holes” in the address space.

Type Segregation

Nodes are segregated by type. Each type has a size. By default, type sizes are rounded up to the next power of two. A specific allocation type can be requested with tm_adesc_for_size(). The allocation descriptor can then be used by tm_alloc_desc(). The opaque element can be used to store additional mutator data.

Each node type has its own colored lists, allocated block lists and accounting. Segregating node types allows allocation requests to be done without scanning tm_WHITE nodes for best fit. However, since types and the blocks are segregated, and nodes of a larger size are not scavenged for smaller sise, this could least to poor actual memory utilization in mutators with small numbers of allocations for many sizes, since a single node allocation for a given size will cause at least one block to requested from the operating system.

The color lists are logically combined from all types for iteration using nested type and node iterators.

Data Structures

There is a single, statically allocated tm_data object.
A tm_list is used as a doubly-linked list header. It also encodes color in the lowest bits of its prev pointer.
The tm_data object has a tm_list of tm_type objects.
Each tm_type has a tm_list of tm_blocks.
Each tm_block is associated with a tm_type.
Each tm_type has 4 color tm_list objects containing tm_node objects of the same color.
Each tm_block has been parceled into tm_node objects sized by its tm_type.
Each tm_data, tm_type and tm_block object has counters for each tm_node color.

Example tm_data structure

The above example has a single tm_type of size 16 with two tm_block objects, and 8 tm_node objects parceled the tm_block objects. White tm_node objects are rendered here with a dotted style.

Write Barriers

During the tm_SCAN and tm_SWEEP phase, any tm_BLACK node that is mutated must be rescanned due to the possible introduction of new references from the tm_BLACK node to tm_ECRU nodes. This is achieved by calling tm_write_barrier(&R) in the mutator after modifying R’s contents.

There are three versions of the write barrier:

tm_write_barrier_pure(R) can be called when R is guaranteed to be a pointer to the head of a node allocated by tm_alloc(). tm_write_barrier_pure(R) cannot be used if R might be an interior pointer or a pointer to a stack or root-allocated object. tm_write_barrier(R) should be used if the address of R is not known to be a pointer to the heap, the stack or global roots. tm_write_barrier_root(R) should be used if the address of R is on the stack or global roots. If R is a pointing into global roots, tm_write_barrier(R) will cause global root rescanning, if the collector is in the tm_SCAN phase.

Stack writes are not barriered, because stack scanning occurs atomically at the end of tm_ROOT.

Unfriendly Mutators

When entering code where the write barrier protocol is not followed, tm_disable_write_barrier() can be called to put the collector into a “stop-world” collection mode until tm_enable_write_barrier() is called.

A virtual memory write-barrier based on mprotect() might be easier to manage than requiring the mutator to call the write barrier. Recoloring and marking root set pages can be done in hardware assuming the overhead of mprotect() and the SIGSEGV signal handler is low when changing phases and colors.

Issues

Due to the altering of tm_node headers during all allocation phases, forked processes will mutate pages quickly.
TM is not currently thread-safe.
TM is not currently operational on 64-bit archtectures due to statically allocation block bit maps.
TM does not currently support allocations larger than a tm_block. This will be fixed by using another page-indexed bit vector. A “block-header-in-page” bit vector marks the page of each tm_block header. This bit vector will be scanned backwards to locate the first page that contains the allocation’s block header.
TM does not currently support requests for page-aligned allocations. This could be achieved by using a hash table to map page-aligned allocations to its tm_block.

References

The Treadmill: Real-Time Garbage Collection Without Motion Sickness, Henry G. Baker, http://home.pipeline.com/~hbaker1/NoMotionGC.html