Ruby Code Performance Tweaks

Why bother?

  • Layers.
  • Common Ruby idioms are (sometimes):
    • Concise
    • Elegant
    • Performant
  • Ruby platforms and language versions perform differently with the same code.

Layers

  • Decomposition
  • Reuse, Reliability
  • Layer Pyramids become Dependency Funnels
  • Lower layers are bottlenecks, until they are tuned.

Pyramid

Funnel

Ruby Platforms

  • MRI-1.8.6-p399
    ruby 1.8.6 (2010-02-05 patchlevel 399) [x86_64-linux]
  • MRI-1.8.7
    ruby 1.8.7 (2010-04-20 patchlevel 254) [x86_64-linux]
  • MRI-1.9
    ruby 1.9.2dev (2010-05-05 trunk 27618) [x86_64-linux]
  • JRuby-1.4
    jruby 1.4.1 (ruby 1.8.7 patchlevel 174) (2010-04-26 ea6db6a) (OpenJDK 64-Bit Server VM 1.6.0_0) [amd64-java]
  • Rubinius
    rubinius 1.0.0-rc4 (1.8.7 9098d4d7 2010-03-31 JI) [x86_64-unknown-linux-gnu]
  • Problems

    • Small problems become big problems.
    • Common idioms are born from common problems.

    Idioms

    make friends with:

    • programmers – efficiency, style
    • platforms – implementation techniques
    • algorithms – O(1), O(N), O(log N), O(N ^ 2)
    • space – structure, allocation and garbage collection costs
    • time – is money

    The best performing idiom might:

    • not be the most elegant idiom.
    • might not be the most intuitive idiom.
    • might not work best for your platform.

    Thoughts

    These Benchmarks:

    • very low-level, narrow and specific.
    • will not extrapolate to represent your application’s overall performance.
    • not comprehensive or real-world comparisons of different Ruby platforms.
    • may be due to misconfiguration.

    Stuff gets better:

    • Ruby platforms get better – benchmark results could change tomorrow.
    • Contribute.

    Do Your Research

    • Kill your own myths – see what works for you.
    • Fix code before blaming Ruby – clean, profiled code works best everywhere.
    • Measure, then measure again.

    Common Solution Domain Problems

    • startup_overhead: Start a ruby process N times.
    • yield_n_times: Yield to a block N times.
    • tail_position_return: Return a value from a method.
    • first_element: Get first element of Array.
    • last_element: Get last element of Array.
    • string_build: Build a String from parts.
    • string_formatting: Output a String and Integer in a simple format
    • string_to_symbol: Construct a Symbol from a String.
    • inject: Enumerate elements while using a temporary or block variable.
  • Common Solution Domain Problems

    • inject: Enumerate elements while using a temporary or block variable.
    • string_concatenation: Accumulate String parts of size N into one larger String.
    • string_join: Accumulate String parts of size N into one larger String.
    • array_include_short: Is a value in a short, constant set?
    • array_include: Is an element in a Array?
    • value_in_set: Is a value in a constant set?
    • dynamic_expression: Evaluate a dynamic expression
  • Problem: startup_overhead

    Start a ruby process N times.

      n.times do
        # SOLUTION?
      end
    

    Solutions

    • system("ruby ...")
  • startup_overhead

    system("ruby ...")

      n.times do
        system("#{$platform_cmd_line} -e 'exit 0'") or 
          raise "failed"    
      end
    

    Problem: yield_n_times

    Yield to a block N times.

      40000.times do
        # SOLUTION?
      end
    

    Solutions

    • for i in 1..n
    • n.times
    • 1.upto(n)
    • (1..n).each
  • yield_n_times

    for i in 1..n

      40000.times do
        for i in 1 .. n do
          n
        end    
      end
    

    yield_n_times

    n.times

      40000.times do
        n.times do
          n
        end    
      end
    

    yield_n_times

    1.upto(n)

      40000.times do
        1.upto(n) do
          n
        end    
      end
    

    yield_n_times

    (1..n).each

      40000.times do
        (1 .. n).each do
          n
        end    
      end
    

    Summary: yield_n_times

    • Use n.times, for portability.
    • Do not bother with the rest.
    • Ranges create garbage.
  • Problem: tail_position_return

    Return a value from a method.

      n.times do
        x = true
        # SOLUTION? x
        x = false
        # SOLUTION? x
      end
    

    Solutions

    • explicit return
    • fall through
  • tail_position_return

    explicit return

    def sol_1  x
      if x
        return 1
      else
        return 2
      end  
    end
    
      n.times do
        x = true
        sol_1 x
        x = false
        sol_1 x
      end
    

    tail_position_return

    fall through

    def sol_2  x
      if x
        1
      else
        2
      end  
    end
    
      n.times do
        x = true
        sol_2 x
        x = false
        sol_2 x
      end
    

    Summary: tail_position_return

    • Newer ruby implementations recognize tail position returns.
    • No return keyword == less code.
    • Easier to debug and move expressions around later.
  • Problem: first_element

    Get first element of Array.

      array = [ :thing ]
      n.times do
        # SOLUTION?
      end
    

    Solutions

    • array[0]
    • array.first
  • first_element

    array[0]

      array = [ :thing ]
      n.times do
        array[0]    
      end
    

    first_element

    array.first

      array = [ :thing ]
      n.times do
        array.first    
      end
    

    Summary: first_element

    • array[0] is optimized on some platforms.
    • array.first should be optimized on all platforms.
  • Problem: last_element

    Get last element of Array.

      array = [ :thing ]
      n.times do
        # SOLUTION?
      end
    

    Solutions

    • array[-1]
    • array.last
  • last_element

    array[-1]

      array = [ :thing ]
      n.times do
        array[-1]    
      end
    

    last_element

    array.last

      array = [ :thing ]
      n.times do
        array.last    
      end
    

    Summary: last_element

    • array[-1] is optimized on some platforms.
    • array.last should be optimized on all platforms.
  • Problem: string_build

    Build a String from parts.

      foo = "foo"
      bar = 42
    
      n.times do
        # SOLUTION?
      end
    

    Solutions

    • String#+
    • String Interpolation
  • string_build

    String#+

      foo = "foo"
      bar = 42
    
      n.times do
        "abc" + foo + "123" + bar.to_s    
      end
    

    string_build

    String Interpolation

      foo = "foo"
      bar = 42
    
      n.times do
        "abc#{foo}123#{bar}"    
      end
    

    Summary: string_build

    • Use String interpolation.
  • Problem: string_formatting

    Output a String and Integer in a simple format

      foobar = "foobar"
    
      n.times do
        # SOLUTION?
      end
    

    Solutions

    • String#%
    • String interpolation
    • SprintfCompiler cached
  • string_formatting

    String#%

      foobar = "foobar"
    
      n.times do
        "%s, %d" % [ foobar, n ]    
      end
    

    string_formatting

    String interpolation

      foobar = "foobar"
    
      n.times do
        "#{foobar}, #{n}"    
      end
    

    string_formatting

    SprintfCompiler cached

      foobar = "foobar"
    
      n.times do
        SprintfCompiler.fmt("%s, %d", [ foobar, n ])    
      end
    
  • Summary: string_formatting

    • String interpolation is faster for formats without options.
    • Rubinius::Sprintf is slow.
    • SprintfCompiler:
      • speeds up Rubinius for cached formats.
      • is ~2.5x slower than MRI sprintf(),
      • but is ~3x faster in Rubinius than MRI 1.8.6 sprintf().
      • is slower than native JRuby String#%.
  • Problem: string_to_symbol

    Construct a Symbol from a String.

      foobar = "foobar"
    
      n.times do
        # SOLUTION?
      end
    

    Solutions

    • String#to_sym
    • Dynamic Symbol
  • string_to_symbol

    String#to_sym

      foobar = "foobar"
    
      n.times do
        "#{foobar}123".to_sym    
      end
    

    string_to_symbol

    Dynamic Symbol

      foobar = "foobar"
    
      n.times do
        :"#{foobar}123"    
      end
    

    Summary: string_to_symbol

    • :“symbol” is faster on all except MRI 1.9.
  • Problem: inject

    Enumerate elements while using a temporary or block variable.

      array = (0 ... n).to_a.sort_by{|x| rand}
    
      100000.times do
        # SOLUTION?
      end
    

    Solutions

    • Array#inject
    • Local variable
  • inject

    Array#inject

      array = (0 ... n).to_a.sort_by{|x| rand}
    
      100000.times do
        array.inject({ }) { | hash, x | hash[x] = true; hash }    
      end
    

    inject

    Local variable

      array = (0 ... n).to_a.sort_by{|x| rand}
    
      100000.times do
        hash = { }
        array.each { | x | hash[x] = true }
        hash    
      end
    

    Results

  • Summary: inject

    • Array#inject is slower than using a local variable.
    • Use a local variable
      • If the result is going to be stored in a local variable anyway.
      • A local variable is less confusing and error-prone.
    • Rubinius: local variables are a bit more costly than expected.
  • Problem: string_concatenation

    Accumulate String parts of size N into one larger String.

      parts = (0 ... 100).to_a.map{"a" * n}
    
      str = ''
      100.times do
        # SOLUTION?
      end
    

    Solutions

    • str += x
    • str << x
  • string_concatenation

    str += x

      parts = (0 ... 100).to_a.map{"a" * n}
    
      str = ''
      100.times do
        parts.each do | x |
          str += x
        end    
      end
    
      str += x
    

    is the same as:

      str = (str + x)
    

    string_concatenation

    str << x

      parts = (0 ... 100).to_a.map{"a" * n}
    
      str = ''
      100.times do
        parts.each do | x |
          str << x
        end    
      end
    

    Summary: string_concatenation

    • Use str << x, but carefully avoid side-effects.
    • str += x creates pointless garbage; DONT USE IT!
    • Some platforms handle garbage and assignments differently.
    • Use array.concat x, instead of array += x.
  • Problem: string_join

    Accumulate String parts of size N into one larger String.

      parts = (0 ... 100).to_a.map{"a" * n}
    
      10000.times do
        # SOLUTION?
      end
    

    Solutions

    • str << x
    • parts.join
  • string_join

    str << x

      parts = (0 ... 100).to_a.map{"a" * n}
    
      10000.times do
        str = ''
        parts.each do | x |
          str << x
        end    
      end
    

    string_join

    parts.join

      parts = (0 ... 100).to_a.map{"a" * n}
    
      10000.times do
        str = parts.join("")    
      end
    

    Results

  • Summary: string_join

    • String to be joined are not protected from side-effects.
    • Use String#<<, if avg. string size < 100.
    • Rubinius handles String#<< better when string size < 500.
  • Problem: array_include_short

    Is a value in a short, constant set?

      # n = 2
    
      x == 0 || x == 1
    
      [ 0, 1 ].include?(x)
    
      case x
      when 0, 1
        true
      end
    
      # ETC... TIMTOWTI!
    

    Solutions

    • x == y1 || ...
    • [ ... ].include?(x)
    • array.include?(x)
    • case x; when y1, y2 ...
    • case x; when *array
    • hash.key?(x)
  • array_include_short

    x == y1 || ...

      @array = (0 ... n).to_a.sort_by{|x| rand}
      try    = (0 ... 1000).to_a.map{|x| rand(n + n)}.sort_by{|x| rand}
    
      x == 0                 # n == 1
      x == 0 || x == 1       # n == 2
      ...
    

    array_include_short

    [ ... ].include?(x)

      @array = (0 ... n).to_a.sort_by{|x| rand}
      try    = (0 ... 1000).to_a.map{|x| rand(n + n)}.sort_by{|x| rand}
    
      [ 0, 1 ].include?(x)     # n == 2
    

    array_include_short

    array.include?(x)

      @array = (0 ... n).to_a.sort_by{|x| rand}
      try    = (0 ... 1000).to_a.map{|x| rand(n + n)}.sort_by{|x| rand}
    
      ARRAY = [ 0, 1 ].freeze   # n == 2
      ...
      ARRAY.include?(x)
    

    array_include_short

    case x; when y1, y2 ...

      @array = (0 ... n).to_a.sort_by{|x| rand}
      try    = (0 ... 1000).to_a.map{|x| rand(n + n)}.sort_by{|x| rand}
    
      case x
      when 0, 1           # n == 2
        true
      end
    

    array_include_short

    case x; when *array

      @array = (0 ... n).to_a.sort_by{|x| rand}
      try    = (0 ... 1000).to_a.map{|x| rand(n + n)}.sort_by{|x| rand}
    
      ARRAY = [ 0, 1 ].freeze
      ...
      case x
      when *ARRAY
        true
      end
    

    array_include_short

    hash.key?(x)

      @array = (0 ... n).to_a.sort_by{|x| rand}
      try    = (0 ... 1000).to_a.map{|x| rand(n + n)}.sort_by{|x| rand}
    
      HASH = { 0 => true, 1 => true }.freeze
      ...
      HASH.key?(x)
    

    Summary: array_include_short

    • Beware: case uses ===, not ==.
    • Use x == y when n == 1.
    • Use HASHkey?(x) when n > 1.
    • x == y1 && … is faster than [ … ].include?(x) when n < ~10.
  • Problem: array_include

    Is an element in a Array?

      array = (0 ... n).to_a.sort_by{|x| rand}
      try   = (0 ... 2000).to_a.sort_by{|x| rand}
    
      100.times do
        try.each do | x |
          # SOLUTION?
        end
      end
    

    Solutions

    • Array#include?
    • case x; when *array
    • hash.key?(x)
  • array_include

    Array#include?

      array = (0 ... n).to_a.sort_by{|x| rand}
      try   = (0 ... 2000).to_a.sort_by{|x| rand}
    
      100.times do
        try.each do | x |
          array.include?(x)      
        end
      end
    

    array_include

    case x; when *array

      array = (0 ... n).to_a.sort_by{|x| rand}
      try   = (0 ... 2000).to_a.sort_by{|x| rand}
    
      100.times do
        try.each do | x |
          case x
          when *array
            true
          end      
        end
      end
    

    array_include

    hash.key?(x)

      array = (0 ... n).to_a.sort_by{|x| rand}
      try   = (0 ... 2000).to_a.sort_by{|x| rand}
    
      100.times do
        try.each do | x |
          hash.key?(x)      
        end
      end
    

    Summary: array_include

    • Use a Hash.
    • Beware: case uses #=== operator.
    • MRI 1.8.7+ Array#include? is slower than MRI 1.8.6.
  • Problem: value_in_set

    Is a value in a constant set?

      # n = 2 
      array = [ :foo, :bar ] 
      hash  = { :foo => true, :bar => true }
      set   = Set.new([ :foo, :bar ])
      ...
      array.include?(x)
      hash.key?(x)
      set.include?(x)
      ! (array & [ x ]).empty?     # <== WTF?
    

    Solutions

    • array.include?(x)
    • hash.key?(x)
    • set.include?(x)
    • !(array&[x]).empty?
  • value_in_set

    array.include?(x)

      array = (0 ... n).to_a.sort_by{|x| rand}
      try   = (0 ... 1000).to_a.map{|x| rand(n + n)}.sort_by{|x| rand}
    
      1000.times do
        try.each do | x |
          array.include? x      
        end
      end
    

    value_in_set

    hash.key?(x)

      array = (0 ... n).to_a.sort_by{|x| rand}
      try   = (0 ... 1000).to_a.map{|x| rand(n + n)}.sort_by{|x| rand}
    
      1000.times do
        try.each do | x |
          hash.key?(x)      
        end
      end
    

    value_in_set

    set.include?(x)

      array = (0 ... n).to_a.sort_by{|x| rand}
      try   = (0 ... 1000).to_a.map{|x| rand(n + n)}.sort_by{|x| rand}
    
      1000.times do
        try.each do | x |
          set.include?(x)      
        end
      end
    

    value_in_set

    !(array&[x]).empty?

      array = (0 ... n).to_a.sort_by{|x| rand}
      try   = (0 ... 1000).to_a.map{|x| rand(n + n)}.sort_by{|x| rand}
    
      1000.times do
        try.each do | x |
          ! (array & [ x ]).empty?      
        end
      end
    

    Summary: value_in_set

    • Ruby Set is slower than Hash.
    • !(Array&[x]).empty? performs “too well”. (!!!)
    • Set performs poorly on Rubinius.
    • Array is slower than Hash.
    • In general, use Hash#key?
  • Problem: dynamic_expression

    Evaluate a dynamic expression

      x = 10
      expr = "x * 2"
      eval(expr, binding)
    

    Solutions

    • eval
    • cached lambda
  • dynamic_expression

    eval

      exprs = (0 ... 100).to_a.sort_by{|y| rand}.map{|y| "x * #{y}"}
      x = 10
    
      n.times do 
        exprs.each do | expr |
          eval(expr, binding)      
        end
      end
    

    dynamic_expression

    cached lambda

      exprs = (0 ... 100).to_a.sort_by{|y| rand}.map{|y| "x * #{y}"}
      x = 10
    
      n.times do 
        exprs.each do | expr |
          p = lambdas[expr] ||= eval(%Q{lambda{|x| #{expr}}})
          p.call(x)      
        end
      end
    

    Summary: dynamic_expression

    • Create lambdas and cache them.
    • Avoid relying on binding.
    • Rubinius eval() is expensive.
  • More Info

    http://github.com/kstephens/ruby_code_tweaks