Ruby 1.8 vs 1.9 group_by performance

August 17, 2008

I am working on a Ruby project with CPU-intensive tasks, so I wanted to see what impact the Ruby 1.9 development version would have. It is supposed to provide a major performance increase.

The task used in this project crunches a large amount of data using mathematical functions, array sorting/grouping and database writes.

Ruby versions used for this test:

ruby 1.8.6 (2007-03-13 patchlevel 0) [i686-darwin8.10.1]
ruby 1.9.0 (2008-07-25 revision 18217) [i686-darwin9]

A typical run took 240 s on Ruby 1.8 and 205 s on Ruby 1.9 without any adaptations: a ~15% speed increase!

I dug a little deeper and found that the array.group_by method performed a lot faster on Ruby 1.9, so I wrote this benchmark to measure the difference.

require 'rubygems'
require 'activesupport'
require 'benchmark'
include Benchmark

class T
  attr_accessor :date, :count
  def initialize(_date, _count)
    @count = _count
    @date = _date
  end
end


def fill_array(n)
  a = []
  random_dates = n / 5
  n.times { |x| a << T.new(rand(random_dates).days.ago, x) }
  return a
end

(2..5).each do |p|
  n = 10**p
  puts "=== 10^#{p} ==="
  array = []
  benchmark do |x|
    x.report("filling array : ") { array = fill_array(n) }
    x.report("group_by (Time) : ") { array.group_by { |r| r.date } }
    x.report("group_by (Date) : ") { array.group_by { |r| r.date.to_date } }
    x.report("group_by (String) : ") { array.group_by { |r| r.date.to_date.to_s } }
  end
end

 

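On Ruby 1.8 this only runs up to 10^4; at 10^5 the Time and Date groupings took far too long to finish. The columns are user CPU time, system CPU time, their total and, in parentheses, elapsed real time, all in seconds.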

=== 10^2 ===
filling array     :   0.030000   0.000000   0.030000 (  0.023987)
group_by (Time)   :   0.030000   0.000000   0.030000 (  0.032630)
group_by (Date)   :   0.010000   0.000000   0.010000 (  0.012132)
group_by (String) :   0.020000   0.000000   0.020000 (  0.019727)
=== 10^3 ===
filling array     :   0.300000   0.000000   0.300000 (  0.305314)
group_by (Time)   :   4.270000   0.020000   4.290000 (  4.306768)
group_by (Date)   :   0.610000   0.000000   0.610000 (  0.622509)
group_by (String) :   0.260000   0.000000   0.260000 (  0.258966)
=== 10^4 ===
filling array     :   3.050000   0.030000   3.080000 (  3.094591)
group_by (Time)   : 425.980000   2.250000 428.230000 (433.534167)
group_by (Date)   :  53.930000   0.300000  54.230000 ( 54.870563)
group_by (String) :   5.440000   0.010000   5.450000 (  5.480042)
=== 10^5 ===
filling array     :  31.680000   0.240000  31.920000 ( 32.714948)
group_by (Time)   :  --- too long ---
group_by (Date)   :  --- too long ---
group_by (String) : 796.560000   5.540000 802.100000 (815.322759)

Another thing to note is that on Ruby 1.8, grouping by Time is a lot more expensive than grouping by Date, and from 10^3 elements upward grouping by String is faster still. Ruby 1.9 behaves in exactly the opposite way:

=== 10^2 ===
filling array     :   0.030000   0.000000   0.030000 (  0.022190)
group_by (Time)   :   0.000000   0.000000   0.000000 (  0.000217)
group_by (Date)   :   0.000000   0.000000   0.000000 (  0.001032)
group_by (String) :   0.000000   0.000000   0.000000 (  0.004109)
=== 10^3 ===
filling array     :   0.110000   0.000000   0.110000 (  0.107807)
group_by (Time)   :   0.000000   0.000000   0.000000 (  0.001625)
group_by (Date)   :   0.010000   0.000000   0.010000 (  0.011684)
group_by (String) :   0.050000   0.000000   0.050000 (  0.061122)
=== 10^4 ===
filling array     :   0.940000   0.010000   0.950000 (  0.975104)
group_by (Time)   :   0.020000   0.000000   0.020000 (  0.018976)
group_by (Date)   :   0.140000   0.000000   0.140000 (  0.134740)
group_by (String) :   0.390000   0.010000   0.400000 (  0.400235)
=== 10^5 ===
filling array     :   9.290000   0.130000   9.420000 (  9.711327)
group_by (Time)   :   0.300000   0.000000   0.300000 (  0.312848)
group_by (Date)   :   1.360000   0.020000   1.380000 (  1.391566)
group_by (String) :   4.100000   0.040000   4.140000 (  4.194526)

With Ruby 1.9, grouping by Time is faster than grouping by Date, and grouping by String is the slowest. More important is that the running time grows almost linearly with the size of the array.
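The three group_by variants differ only in the key the block returns, which changes both how many groups come out and what kind of object has to be hashed and compared. Here is a minimal sketch of that difference, using plain Ruby and three made-up timestamps instead of the benchmark's T objects and ActiveSupport's days.ago. In the benchmark itself each Time is computed relative to the moment of the call, so the Time keys are likely almost all distinct, while the Date and String keys collapse into at most n/5 groups.

```ruby
require 'date'

# Three made-up timestamps: two on the same day, one a day later.
times = [
  Time.utc(2008, 8, 16, 9, 0, 0),
  Time.utc(2008, 8, 16, 17, 30, 0),
  Time.utc(2008, 8, 17, 9, 0, 0)
]

# Time#to_date comes from the standard 'date' library in current Ruby;
# the benchmark in this post gets it from ActiveSupport instead.
by_time   = times.group_by { |t| t }
by_date   = times.group_by { |t| t.to_date }
by_string = times.group_by { |t| t.to_date.to_s }

puts "Time keys:   #{by_time.size}"
puts "Date keys:   #{by_date.size}"
puts "String keys: #{by_string.keys.inspect}"
```

Every timestamp is its own group when keyed by Time, while the Date and String keys put the two timestamps from the same day into one group.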

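If we look at the difference between 1.8 and 1.9, we see that filling an array with objects is about 3 times faster from 10^3 elements upward. But the real surprise is how much the speedup grows when grouping large datasets! On 1.8 the Time grouping takes roughly 100 times longer for every tenfold increase in array size, while on 1.9 it takes only about 10 times longer. At 10^4 elements that is 433 s against 0.02 s.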
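One way to quantify how the gap widens is to compare consecutive measurements. The sketch below copies the real (wall-clock) times for group_by (Time) from the two result listings and prints, for each Ruby version, how much longer the grouping takes each time the array gets ten times bigger, followed by the 1.8 to 1.9 ratio per size. The helper name growth_factors is made up for this illustration.

```ruby
# Wall-clock seconds for group_by (Time), copied from the results above.
# Keys are array sizes; 10^5 never finished on Ruby 1.8.
ruby18 = { 100 => 0.032630, 1_000 => 4.306768, 10_000 => 433.534167 }
ruby19 = { 100 => 0.000217, 1_000 => 0.001625, 10_000 => 0.018976,
           100_000 => 0.312848 }

# Hypothetical helper: ratio between each timing and the previous one,
# i.e. the slowdown caused by a tenfold increase in array size.
def growth_factors(timings)
  timings.sort.each_cons(2).map { |a, b| (b[1] / a[1]).round(1) }
end

puts "1.8 growth per 10x elements: #{growth_factors(ruby18).inspect}"
puts "1.9 growth per 10x elements: #{growth_factors(ruby19).inspect}"

# Speedup of 1.9 over 1.8 for every size measured on both versions.
speedups = ruby18.keys.sort.map { |n| [n, (ruby18[n] / ruby19[n]).round] }
speedups.each { |n, s| puts "n = #{n}: 1.9 is about #{s}x faster" }
```

A factor near 10 per step means the time grows in proportion to the array size; a factor near 100 means it grows with the square of the size.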


I work at Venture Spirit.