Kurt StephensNerd Up! | ||
Ruby Internals: Why RUBY_FIXNUM_FLAG should be 0x00Kurt on Tue, 2008-01-01 04:06.
Type tags in Ruby VALUEInternally, values in Ruby are 32-bit (at least for 32-bit processors). Some of the least-significant bits are used to store type information. See the Ruby uses a single-bit tag of Since Ruby uses 31 bits to store the 3 2 1 10987654321098765432109876543210 bit index -------------------------------- sxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx1 Fixnum 00000000000000000000000000000000 false 00000000000000000000000000000010 true 00000000000000000000000000000100 nil 00000000000000000000000000000110 undef xxxxxxxxxxxxxxxxxxxxxxxx00001110 Symbol xxxxxxxxxxxxxxxxxxxxxxxxxxxxx0x0 All other types A non-zero Imagine the following bit of Ruby code: x = 1 y = 3 puts x + y Internally, #define INT2FIX(X) (((X) << 1) | RUBY_FIXNUM_FLAG) #define FIX2INT(X) ((X) >> 1) Thus: INT2FIX(1) => 3 INT2FIX(3) => 7
To compute x + y => INT2FIX(FIX2INT(x) + FIX2INT(y)) => ((3 >> 1) + (7 >> 1)) << 1 | 1 => 9 FIX2INT(9) => (9 >> 1) => 4 If a type-tag of #define INT2FIX(X) ((X) << 1) #define FIX2INT(X) ((X) >> 1) The Ruby expression Multiplication with zero #define FIXMUL(X, Y)((X) >> 1) * (Y)) Fixnum division: the result of the division is shifted up: #define FIXDIV(X, Y) (((X) / (Y)) << 1) Two-bit type tags on 32-bit architecturesOaklisp and LL (http://kurtstephens.com/pub/ll/) use 2-bit tags for all values. LL uses the following tag scheme: 3 2 1 10987654321098765432109876543210 bit index -------------------------------- sxxxxxxxxxxxxxxxxxxxxxxxxxxxxx00 <fixnum> pppppppppppppppppppppppppppppp01 <locative> seeeeeeeemmmmmmmmmmmmmmmmmmmmm10 <flonum> pppppppppppppppppppppppppppppp11 All other types Floating-point ( The rationale for choosing a fixed-size lower 2-bit type tag, opposed to a dynamic-length type tag, as in Ruby, or high-bit tags, like some older Lisp implementations, is as follows: C compilers and dynamic memory allocators will align allocations to If references to allocated objects are encoded using a #define RBASIC(X) ((struct RBasic*)((X) - 3)) Assuming that most of the time the interpreter is referencing structure members of the object, and does not need the actual address of the object:
struct RBasic {
VALUE flags; /* struct offset: + 0 */
VALUE klass; /* struct offset: + 4 */
};
RBASIC(X)->klass =>
((struct RBasic*)((X) - 3))->klass
C compilers convert the RBASIC(X)->flags => *(VALUE*)((X) - 3 + 0) RBASIC(X)->klass => *(VALUE*)((X) - 3 + 4) Using subtraction as the tag removal operation, instead of RBASIC(X)->flags => *(VALUE*)((X) - 3) RBASIC(X)->klass => *(VALUE*)((X) + 1) Therefore, there is no additional tag removal cost to reference structure members with non-zero offsets. One could reorder the members depending on which is “hotter”. Research shows that tag manipulation is a heavy cost, esp. for numerics; any tagging scheme should be as simple and consistent as possible. For example, determining the class of a VALUE could be inlined: #define CLASS_OF(v) ((v) & 3) ? RBASIC(v)->klass : rb_cFixnum) Two-bit tags naturally align with word boundaries on 32-bit processors. Thus, zero tags for integers on 32-bit processors allows pointer arithmetic on Thanks to Gary Wright for inspiring me to write about this. |
||
Recent comments
5 days 21 hours ago
3 weeks 1 day ago
3 weeks 1 day ago
6 weeks 3 days ago
6 weeks 3 days ago
8 weeks 2 days ago
8 weeks 5 days ago
8 weeks 5 days ago
8 weeks 5 days ago
27 weeks 11 hours ago