Reference: http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-core/26869
I have a proof-of-concept patch to MRI that caches #to_s values for immutable values. It is implemented using a few fixed-size hash tables.
It reduces the number of #to_s result objects by 1890 during the MRI test suite for NilClass#to_s, TrueClass#to_s, FalseClass#to_s, Symbol#to_s, and Float#to_s.
It requires a minor semantic change to Ruby core. This minor change could cascade into a huge performance improvement for all Ruby implementations — as will be illustrated later:
#to_s may return frozen Strings.
This appears to not be a problem since any callers of #to_s are likely to anticipate that the receiver may already be a String and are not going to mutate it — #to_s is a coercion. The current MRI test suite passes if some #to_s results are frozen.
For code that may expect #to_s to return a mutable, an Object#dup_if_frozen method might be helpful. This method will return self.dup if the receiver is #frozen? and is not an immediate or an immutable. (Aside: a fast #dup_unless_frozen method might be helpful for general memoization of computations!)
This caching technique could be extended into other immutables (e.g.: the Numerics) and objects whose #to_s representations never change (e.g.: Class, Module?) and for #inspect under similar constraints.
In the patch, Fixnum#to_s is not cached because Fixnums are often incremented during long loops; any cache for it is quickly churned. However, this could be enabled if it proves useful in practice.
If this new semantic for #to_s is reasonable, I recommend explicitly storing frozen strings for true.to_s, false.to_s, nil.to_s and storing Symbol#to_s with each Symbol, likewise for #inspect.
In practice, most Ruby String literals become garbage immediately. If Symbol#to_s was guaranteed to be always be cached, this would enable the use of:
puts :"some string"
instead of
puts "some string"
as an in-line memoized frozen String that creates no garbage when calling puts which will call #to_s on its argument, but never mutate the result. A parser or compiler could recognize Symbol#to_s as an operation with no side-effect and elide it, providing a true String constant. This idiom would eliminate the pointless String garbage created by the evaluation of every String literal.
This is far more expressive and concise than:
SOME_STRING = "some string".freeze
...
puts SOME_STRING
The alternative to :"some string" might be to memoize all String literals as frozen. This is a superior syntax and semantic — old code would need to change on a massive scale, but any issues would be easy to diagnose:
str = '' # Make a mutable empty string.
str << "foo" # "foo" is garbage
str << "bar" # "bar" is garbage
would become:
str = ''.dup # Make a mutable empty string.
str << "foo" # "foo" is not garbage
str << "bar" # "bar" is not garbage
The latter is backwards-compatible with the current String literal semantics.
Recent comments
9 weeks 6 days ago
20 weeks 10 hours ago
38 weeks 5 days ago
1 year 2 weeks ago
1 year 36 weeks ago
1 year 48 weeks ago
1 year 51 weeks ago
1 year 51 weeks ago
1 year 51 weeks ago
1 year 51 weeks ago