Lazy Evaluation
Usual MDArray operations are done eagerly, i.e., if @a, @b, @c are three MDArrays then the following:
@d = @a + @b + @c
will be evaluated as follows: first @a + @b is performed and stored in a temporary variable, then this temporary variable is added to @c. For large expressions, these temporary variables can have a significant performance impact.
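To make the cost of temporaries concrete, here is a minimal sketch in plain Ruby that uses ordinary Arrays rather than MDArray. The helper eager_add and all variable names are hypothetical and only illustrate the idea; this is not how MDArray is implemented internally.

```ruby
# Hypothetical helper: an eager element-wise sum that returns a new array.
def eager_add(x, y)
  x.zip(y).map { |u, v| u + v }
end

a = [1, 2, 3]
b = [10, 20, 30]
c = [100, 200, 300]

# Eager style: a + b is fully materialized before c is added.
tmp     = eager_add(a, b)      # temporary array holding a + b
eager_d = eager_add(tmp, c)    # second pass over the data

# Fused style: the whole expression is known up front, so each element is
# computed in a single pass and no intermediate array is allocated.
fused_d = Array.new(a.size) { |i| a[i] + b[i] + c[i] }

p eager_d   # => [111, 222, 333]
p fused_d   # => [111, 222, 333]
```

Both styles give the same result; the fused version simply avoids allocating and traversing the intermediate array.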
This version of MDArray introduces lazy evaluation of expressions. Thus, when in lazy mode:
@lazy_d = @a + @b + @c
will not evaluate immediately. Rather, the expression is preprocessed and only executed when required. Since the whole expression is known at execution time, there is no need for temporary variables, as the whole expression is executed at once. To put MDArray in lazy mode, we only need to set its mode to lazy with the following command:
MDArray.lazy = true
All expressions after that are lazy by default. In lazy mode, MDArray resembles Numexpr; however, there is no need to write the expression as a string, and no compilation is involved.
MDArray does not implement broadcasting rules as NumPy does. As a result, trying to operate on arrays of different shapes raises an exception. In lazy mode, this exception is raised only at evaluation time, so it is possible to have an invalid lazy array. To evaluate a lazy array, use the “[]” method as follows:
@d = @lazy_d[]
@d is now a normal MDArray.
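The deferred shape check can be illustrated with a small toy class in plain Ruby. LazyAdd is a hypothetical name invented for this sketch and is not part of MDArray; it only assumes that a lazy node stores its operands and validates them when “[]” is called.

```ruby
# Hypothetical toy class, not part of MDArray: a deferred element-wise sum.
class LazyAdd
  def initialize(left, right)
    @left = left
    @right = right
  end

  # Calling [] with no arguments evaluates the expression. Shapes are only
  # compared here, not when the node is built.
  def []
    unless @left.size == @right.size
      raise "shape mismatch: #{@left.size} vs #{@right.size}"
    end
    @left.zip(@right).map { |x, y| x + y }
  end
end

bad = LazyAdd.new([1, 2, 3], [1, 2])   # no error yet: nothing is evaluated

error = begin
  bad[]                                # the mismatch surfaces only now
  nil
rescue RuntimeError => e
  e.message
end

ok = LazyAdd.new([1, 2], [3, 4])[]

p error   # => "shape mismatch: 3 vs 2"
p ok      # => [4, 6]
```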
Lazy MDArrays are really lazy. Let’s assume that
@a = [1, 2, 3, 4] and @b = [5, 6, 7, 8].
Let’s also have:
@l_c = @a + @b
Now doing:
@c = @l_c[]
will evaluate @c to [6, 8, 10, 12].
Now, let’s do:
@a[0] = 20 and then @d = @l_c[].
Now @d evaluates to [25, 8, 10, 12], as the new value of @a is used.
Lazy arrays can be evaluated inside expressions:
@l_c = (@a + @b)[] + @c
In this example, @l_c is a lazy array, but (@a + @b) is evaluated as soon as the “[]” method is called, and its result is then added lazily to @c. Unlike in the previous example, changing the value of @a or @b afterwards does not change what @l_c evaluates to.
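The difference between a fully lazy expression and one with an inner “[]” can be sketched with a toy class in plain Ruby. LazySum and the variable names are hypothetical and are not MDArray code; the sketch assumes only that a lazy node keeps references to its operands and reads them each time it is evaluated.

```ruby
# Hypothetical toy class, not part of MDArray: a deferred element-wise sum
# that keeps references to its operands instead of copying them.
class LazySum
  def initialize(left, right)
    @left = left
    @right = right
  end

  # Evaluate now, reading the operands' current contents.
  def []
    l = @left.is_a?(LazySum) ? @left[] : @left
    r = @right.is_a?(LazySum) ? @right[] : @right
    l.zip(r).map { |x, y| x + y }
  end
end

a = [1, 2, 3, 4]
b = [5, 6, 7, 8]
c = [100, 100, 100, 100]

fully_lazy = LazySum.new(LazySum.new(a, b), c)     # like @a + @b + @c
snapshot   = LazySum.new(LazySum.new(a, b)[], c)   # like (@a + @b)[] + @c

before = fully_lazy[]
a[0] = 20                       # mutate an operand after building both

after_lazy     = fully_lazy[]   # sees the new value of a
after_snapshot = snapshot[]     # inner sum was already computed

p before           # => [106, 108, 110, 112]
p after_lazy       # => [125, 108, 110, 112]
p after_snapshot   # => [106, 108, 110, 112]
```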
Finally, laziness is contagious. Let’s assume that we have @l_c as above, a lazy array, and that we then set MDArray.lazy = false. From this point on in the code, operations are done eagerly. However, doing @e = @d + @l_c still makes @e a lazy array, as its construction involves a lazy array. One should be careful when mixing lazy and eager arrays in eager mode:
@c = @l_a + (@b + @c)
Because of the parentheses, (@b + @c) is first evaluated eagerly and then added lazily to @l_a, giving a lazy array.
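The contagion rule can be modelled with a small toy in plain Ruby. The module Toy and its Eager and Lazy classes are hypothetical names invented for this sketch; they mimic the behaviour described above under the assumption that an operation is lazy whenever the global mode is lazy or any operand is already lazy, and they say nothing about how MDArray implements it.

```ruby
# Hypothetical toy model, not MDArray's code.
module Toy
  class << self
    attr_accessor :lazy          # plays the role of MDArray.lazy
  end
  self.lazy = false

  class Eager
    attr_reader :data

    def initialize(data)
      @data = data
    end

    def +(other)
      # Lazy if the global mode says so, or if the other operand is lazy.
      return Lazy.new(self, other) if Toy.lazy || other.is_a?(Lazy)
      Eager.new(data.zip(other.data).map { |x, y| x + y })
    end

    # Evaluating an eager value just returns it.
    def []
      self
    end
  end

  class Lazy
    def initialize(left, right)
      @left = left
      @right = right
    end

    # Anything built on a lazy node stays lazy, whatever the global mode.
    def +(other)
      Lazy.new(self, other)
    end

    def []
      l = @left[].data
      r = @right[].data
      Eager.new(l.zip(r).map { |x, y| x + y })
    end
  end
end

a = Toy::Eager.new([1, 2])
b = Toy::Eager.new([10, 20])
c = Toy::Eager.new([100, 200])

Toy.lazy = true
l_a = a + b              # built in lazy mode: a Lazy node
Toy.lazy = false

inner  = b + c           # eager mode, eager operands: computed immediately
result = l_a + inner     # one lazy operand makes the whole result lazy

p inner.class        # => Toy::Eager
p result.class       # => Toy::Lazy
p result[].data      # => [121, 242]
```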
In this version, lazy evaluation ranges from around 40% less efficient than eager evaluation on one machine I tested to approximately the same performance on another, when only native Java methods (the Parallel Colt methods described below) are used in the expression. If the expression involves any Ruby method, lazy evaluation becomes much slower than eager evaluation. To improve performance, I believe that compiling the expressions will be necessary.
setup do
  @a = MDArray.int([2, 3], [1, 2, 3, 4, 5, 6])
  @b = MDArray.int([2, 3], [10, 20, 30, 40, 50, 60])
  @c = MDArray.double([2, 3], [100, 200, 300, 400, 500, 600])
  @d = MDArray.init_with("int", [2, 3], 4)
  @e = MDArray.init_with("int", [2, 3], 5)
  @f = MDArray.init_with("int", [2, 3], 6)
  @g = MDArray.init_with("int", [2, 3], 7)
  @h = MDArray.init_with("int", [3, 3], 7)
  @i = MDArray.init_with("int", [3, 3], 7)
  @float = MDArray.init_with("float", [2, 3], 10.5)
  @long = MDArray.init_with("long", [2, 3], 10)
  @byte = MDArray.init_with("byte", [2, 3], 10)
  MDArray.lazy = true
end
#-------------------------------------------------------------------------------------
#
#-------------------------------------------------------------------------------------
should "execute lazy operations" do
  p "2 a"
  a_2 = (@a + @a)[]
  a_2.print
  c = 2 * @a
  lazy_c = @a + @b
  # calculate the value of the lazy array with []
  c = lazy_c[]
  c.print
  # mix arrays of different types (double, int, byte)
  c = (@c + @a)[]
  c = (@a + @c)[]
  c = (@c + @byte)[]
  c = (@byte + @c)[]
  lazy_c = (@a * @d - @e) - (@b + @c)
  lazy_c[].print
  # evaluate the lazy expression with [] on the same line
  d = ((@a * @d - @e) - (@b + @c))[]
  d.print
  # evaluate a lazy expression with [] anywhere in the expression. (@a * @d - @e) is
  # built lazily, evaluated, and then combined with a lazy (@b + @c). The final
  # result is lazy
  d = ((@a * @d - @e)[] - (@b + @c))
  d.print
  MDArray.lazy = false
end
#-------------------------------------------------------------------------------------
#
#-------------------------------------------------------------------------------------
should "check that lazy is contagious" do
  # this is done lazily as we have "MDArray.lazy = true" defined in the setup
  l_c = @a + @b
  # now operations are done eagerly
  MDArray.lazy = false
  e_e = @c + @b
  # note that we do not require [] before printing e_e
  e_e.print
  # now operating a lazy array with an eager array... should be lazy even when
  # MDArray.lazy is false
  l_f = l_c + e_e
  l_f.print
  # request the calculation and print
  l_f[].print
  MDArray.lazy = false
end
#-------------------------------------------------------------------------------------
#
#-------------------------------------------------------------------------------------
should "execute operations eagerly" do
  MDArray.lazy = false
  c = @a + @b
  c.print
  c = (@a * @d - @e) - (@b + @c)
  c.print
  MDArray.lazy = false
end
#-------------------------------------------------------------------------------------
#
#-------------------------------------------------------------------------------------
should "show that lazy is really lazy" do
  # MDArray lazy operations are really lazy, i.e., there is no checking of any sort
  # when parsing the expression. Validation is only done when values are required.
  a = MDArray.int([2, 3], [1, 2, 3, 4, 5, 6])
  b = MDArray.double([3, 1], [1, 2, 3])
  # arrays a and b are of different shapes and cannot work together. Yet, there will
  # be no error while parsing
  l_c = a + b
  # now we get an error
  assert_raise(RuntimeError) { l_c[].print }
  # now let's add correctly
  p "calculating lazy c"
  l_c = @a + @b
  l_c[].print
  # now let's change @b
  @b[0, 0] = 1000
  # and calculate lazy c again
  p "calculating lazy c again"
  l_c[].print
  p "calculating expression"
  @b[1, 0] = 2000
  p1 = (@a * @d - @e)[]
  p2 = (@b + @c)[]
  (p1 - p2)[].print
  p "@b is"
  @b.print
  # evaluate a lazy expression with [] anywhere in the expression. (@a * @d - @e) is
  # built lazily, evaluated, and then combined with a lazy (@b + @c). The final
  # result is lazy
  p "calculating lazy d"
  d = ((@a * @d - @e)[] - (@b + @c))
  d[].print
  # let's now change the value of @a
  @a[0, 0] = 1000
  # no change in d... @a has been eagerly calculated
  p "lazy d again after changing @a"
  d[].print
  # let's now change @b
  @b[0, 0] = 1
  @b[0, 1] = 150
  @b[1, 1] = 1000
  p "b is now:"
  @b.print
  # @b is still lazy in the calculation of d, so changing @b will change the value
  # of d[].
  p "lazy d again after changing @b"
  d[].print
  p "calculating new expression"
  p3 = (@b + @c)
  (p1 - p3)[].print
  MDArray.lazy = false
end
#-------------------------------------------------------------------------------------
#
#-------------------------------------------------------------------------------------
should "work with Numeric" do
  l_c = @a + 2
  l_c[].print
  l_c = 2 + @a
  l_c[].print
  l_c = 2 - @a
  l_c[].print
  MDArray.lazy = false
end
#-------------------------------------------------------------------------------------
#
#-------------------------------------------------------------------------------------
should "work with unary operators" do
  # MDArray.lazy = false
  arr = MDArray.linspace("double", 0, 1, 100)
  l_a = arr.sin
  l_a.print
  b = l_a[]
  b.print
  ((@a * @d - @e).sin - (@b + @c))[].print
  sinh = (arr.sinh)[]
  MDArray.lazy = false
end