Kurt StephensNerd Up! | ||
Ruby Internals: Why RUBY_FIXNUM_FLAG should be 0x00Kurt on Tue, 2008-01-01 04:06.
Type tags in MRI Ruby VALUEInternally, values in MRI Ruby are 32-bit (at least for 32-bit processors). Some of the least-significant bits are used to store type information. See the Ruby uses a single-bit tag of Since Ruby uses 31 bits to store the 3 2 1 10987654321098765432109876543210 bit index -------------------------------- sxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx1 Fixnum 00000000000000000000000000000000 false 00000000000000000000000000000010 true 00000000000000000000000000000100 nil 00000000000000000000000000000110 undef xxxxxxxxxxxxxxxxxxxxxxxx00001110 Symbol xxxxxxxxxxxxxxxxxxxxxxxxxxxxx0x0 All other types A non-zero Imagine the following bit of Ruby code: x = 1 y = 3 puts x + y Internally, #define INT2FIX(X) (((X) << 1) | RUBY_FIXNUM_FLAG) #define FIX2INT(X) ((X) >> 1) Thus: INT2FIX(1) => 3 INT2FIX(3) => 7
To compute x + y => INT2FIX(FIX2INT(x) + FIX2INT(y)) => ((3 >> 1) + (7 >> 1)) << 1 | 1 => 9 FIX2INT(9) => (9 >> 1) => 4 If a type-tag of #define INT2FIX(X) ((X) << 1) #define FIX2INT(X) ((X) >> 1) The Ruby expression Multiplication with zero #define FIXMUL(X, Y)((X) >> 1) * (Y)) Fixnum division: the result of the division is shifted up: #define FIXDIV(X, Y) (((X) / (Y)) << 1) Two-bit type tags on 32-bit architecturesOaklisp and LL (http://kurtstephens.com/pub/ll/) use 2-bit tags for all values. LL uses the following tag scheme: 3 2 1 10987654321098765432109876543210 bit index -------------------------------- sxxxxxxxxxxxxxxxxxxxxxxxxxxxxx00 <fixnum> pppppppppppppppppppppppppppppp01 <locative> seeeeeeeemmmmmmmmmmmmmmmmmmmmm10 <flonum> pppppppppppppppppppppppppppppp11 All other types Floating-point ( The rationale for choosing a fixed-size lower 2-bit type tag, opposed to a dynamic-length type tag, as in Ruby, or high-bit tags, like some older Lisp implementations, is as follows: C compilers and dynamic memory allocators will align allocations to word boundaries for performance reasons, so there cannot not be a pointer to an object that would require some of the lower bits of a pointer, except for sub-word access, e.g.: If references to allocated objects are encoded using a #define RBASIC(X) ((struct RBasic*)((X) - 3)) Assuming that most of the time the interpreter is referencing structure members of the object, and does not need the actual address of the object:
struct RBasic {
VALUE flags; /* struct offset: + 0 */
VALUE klass; /* struct offset: + 4 */
};
RBASIC(X)->klass =>
((struct RBasic*)((X) - 3))->klass
C compilers convert the RBASIC(X)->flags => *(VALUE*)((X) - 3 + 0) RBASIC(X)->klass => *(VALUE*)((X) - 3 + 4) Using subtraction as the tag removal operation, instead of RBASIC(X)->flags => *(VALUE*)((X) - 3) RBASIC(X)->klass => *(VALUE*)((X) + 1) Therefore, there is no additional tag removal cost to reference structure members with non-zero offsets. One could reorder the members depending on which is “hotter”. Research shows that tag manipulation is a heavy cost, esp. for numerics; any tagging scheme should be as simple and consistent as possible. For example, determining the class of a VALUE could be inlined: #define CLASS_OF(v) ((v) & 3) ? RBASIC(v)->klass : rb_cFixnum) Two-bit tags naturally align with word boundaries on 32-bit processors. Thus, zero tags for integers on 32-bit processors allows pointer arithmetic on Thanks to Gary Wright for inspiring me to write about this. submitSubmitted by r (not verified) on Fri, 2008-05-23 12:34.
maybe post it to core and see if people agree with you? maybe even a patch? :) » reply
Very good post, thanks aSubmitted by dido (not verified) on Fri, 2010-01-08 07:58.
Very good post, thanks a lot. » reply
Some tradeoffs involvedSubmitted by Beoran (not verified) on Fri, 2009-11-20 08:24.
OCaml uses a similar type tagging system as Ruby, in that it also sets the lower bit to 1 on Fixnum-like values. http://rwmj.wordpress.com/2009/08/04/ocaml-internals/ Your approach does speed up integer caluclations a bit, but is also slows down pointer access, because now pointers cannot be used “as is” anymore. For an OOP language that usually uses heaps of references between objects, I think that may be a more serious problem. Also, how would you go at encoding Symbols? » reply
TradeoffsSubmitted by Kurt on Sun, 2009-11-29 21:31.
Pointers to data structures are almost never actually used “as is”, they are almost always used with a offset into a structure starting at an address.
struct cons {
value type;
value car;
value cdr;
};
For 32-bit word-aligned pointer As others have pointed out, all tag schemes have trade-offs. A trade-off for using non-zero bits for pointers might mean extra tag removal when communicating pointers to low-level code, e.g. C libraries. Fixed-size tags would likely simplify code and allow additional values besides small integers to be immediate values. I would encode Symbols like every other allocated object. — Kurt » reply
|
||
Recent comments
1 week 1 day ago
3 weeks 4 days ago
3 weeks 6 days ago
3 weeks 6 days ago
3 weeks 6 days ago
3 weeks 6 days ago
6 weeks 2 days ago
6 weeks 2 days ago
8 weeks 5 days ago
8 weeks 6 days ago