Rodrigo Botafogo edited this page Jul 17, 2013 · 3 revisions

Usual MDArray operations are done eagerly, i.e., if @a, @b, @c are three MDArrays then the following:

@d = @a + @b + @c

will be evaluated as follows: first @a + @b is performed and stored in a temporary variable, then this temporary variable is added to @c. For large expressions, these temporary variables can have a significant performance impact.

This version of MDArray introduces lazy evaluation of expressions. Thus, when in lazy mode:

@lazy_d = @a + @b + @c

will not evaluate immediately. Rather, the expression is preprocessed and only executed when required. Since the whole expression is known at execution time, there is no need for temporary variables: the whole expression is executed at once. To put MDArray in lazy mode, we only need to set its mode to lazy with the following command:

MDArray.lazy = true  

All expressions after that are lazy by default. In lazy mode, MDArray resembles Numexpr; however, there is no need to write the expression as a string, and there is no compilation involved.
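As a rough illustration of the difference between the two strategies, the plain-Ruby sketch below works on ordinary arrays. `LazySum` is a hypothetical class invented for this example and supports only addition; it is not how MDArray is implemented internally.

```ruby
# Hypothetical illustration in plain Ruby; LazySum is not part of MDArray.
class LazySum
  def initialize(*operands)
    @operands = operands
  end

  # Chaining + only records another operand; nothing is computed yet.
  def +(other)
    LazySum.new(*@operands, other)
  end

  # Evaluate in a single pass: each result element is computed once and
  # no intermediate array is built.
  def []
    size = @operands.first.size
    Array.new(size) { |i| @operands.sum { |op| op[i] } }
  end
end

a = [1, 2, 3]
b = [10, 20, 30]
c = [100, 200, 300]

# Eager style: a temporary array holds a + b before c is added.
tmp = a.zip(b).map { |x, y| x + y }
eager_d = tmp.zip(c).map { |x, y| x + y }

# Lazy style: the operands are recorded first and evaluated on request.
lazy_d = LazySum.new(a, b) + c
result = lazy_d[]
```

Both `eager_d` and `result` hold the same values; only the lazy version avoids the intermediate `tmp` array.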

MDArray does not implement broadcasting rules as NumPy does. As a result, trying to operate on arrays of different shapes raises an exception. In lazy mode, this exception is raised only at evaluation time, so it is possible to hold an invalid lazy array. To evaluate a lazy array, use the “[]” method as follows:

@d = @lazy_d[]

@d is now a normal MDArray.
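The deferred shape check can be pictured with a small plain-Ruby sketch. `LazyAdd` is a hypothetical name used only here, and array length stands in for the array shape; the real MDArray validation logic is not shown in this page and may differ.

```ruby
# Hypothetical sketch in plain Ruby; LazyAdd is not part of MDArray.
class LazyAdd
  def initialize(left, right)
    # Only references are stored; no validation happens at construction.
    @left = left
    @right = right
  end

  # Sizes are compared only here, when the values are requested.
  def []
    unless @left.size == @right.size
      raise RuntimeError, "shape mismatch: #{@left.size} vs #{@right.size}"
    end
    @left.zip(@right).map { |x, y| x + y }
  end
end

# Building the invalid expression raises nothing.
invalid = LazyAdd.new([1, 2, 3, 4, 5, 6], [1, 2, 3])

# The error only appears when the expression is evaluated.
message = begin
  invalid[]
  "no error"
rescue RuntimeError => e
  e.message
end
```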

Lazy MDArrays are really lazy: operand values are read only at evaluation time. Let’s assume that

@a = [1, 2, 3, 4] and @b = [5, 6, 7, 8].

Let's also have:

@l_c = @a + @b

Now doing:

@c = @l_c[]

evaluates @c to [6, 8, 10, 12].

Now, let's do:

@a[0] = 20 and then @d = @l_c[].

Now @d evaluates to [25, 8, 10, 12], as the new value of @a is used.

Lazy arrays can be evaluated inside expressions:

@l_c = (@a + @b)[] + @c

In this example, @l_c is a lazy array, but (@a + @b) is evaluated when the “[]” method is called, and the result is then added lazily to @c. If the value of @a or @b is changed afterwards, the result of evaluating @l_c does not change, unlike in the previous example.
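The contrast between a fully lazy expression and one that is partly forced with [] can be mimicked in plain Ruby. This is only a sketch: `LazyAdd` and its `force` helper are hypothetical names, and the operands are ordinary arrays rather than MDArrays.

```ruby
# Hypothetical sketch in plain Ruby; LazyAdd is not part of MDArray.
class LazyAdd
  def initialize(left, right)
    @left = left    # references are kept, values are not copied
    @right = right
  end

  def +(other)
    LazyAdd.new(self, other)
  end

  # Evaluate nested lazy operands first, then add element-wise.
  def []
    force(@left).zip(force(@right)).map { |x, y| x + y }
  end

  private

  def force(operand)
    if operand.is_a?(LazyAdd)
      operand[]
    else
      operand
    end
  end
end

a = [1, 2, 3, 4]
b = [5, 6, 7, 8]
c = [100, 100, 100, 100]

sum_ab = LazyAdd.new(a, b)

fully_lazy = sum_ab + c                   # still refers to a, b and c
partly_forced = LazyAdd.new(sum_ab[], c)  # a + b is computed right now

a[0] = 20

live = fully_lazy[]       # picks up the new a[0]
frozen = partly_forced[]  # keeps the a + b computed before the change
```

After the assignment, `live` starts with 125 while `frozen` still starts with 106, because the forced part was computed before `a` changed.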

Finally, laziness is contagious. Let's assume that we have @l_c as above, a lazy array, and that we set MDArray.lazy = false. From this point on in the code, operations are done eagerly. However, after @e = @d + @l_c, @e is a lazy array, as its construction involves a lazy array. One should be careful when mixing lazy and eager arrays in eager mode:

@c = @l_a + (@b + @c)

Here, because of the parentheses, (@b + @c) is first evaluated eagerly and then added lazily to @l_a, giving a lazy array.
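One way to picture this contagion is the plain-Ruby sketch below. `Vec`, `LazyVec` and the `Vec.lazy` switch are hypothetical names that loosely mirror `MDArray.lazy`; the actual MDArray classes and dispatch are assumptions not shown in this page.

```ruby
# Hypothetical sketch in plain Ruby; Vec and LazyVec are not MDArray classes.
class Vec
  class << self
    attr_accessor :lazy   # global mode switch, loosely mirroring MDArray.lazy
  end
  self.lazy = false

  attr_reader :values

  def initialize(values)
    @values = values
  end

  def +(other)
    # Lazy mode, or a lazy operand, produces a lazy result.
    if Vec.lazy || other.is_a?(LazyVec)
      LazyVec.new(self, other)
    else
      Vec.new(values.zip(other.values).map { |x, y| x + y })
    end
  end

  # An eager vector is already evaluated.
  def []
    self
  end
end

class LazyVec
  def initialize(left, right)
    @left = left
    @right = right
  end

  # Any operation on a lazy operand stays lazy, whatever the global mode is.
  def +(other)
    LazyVec.new(self, other)
  end

  # Force both sides, then add element-wise into an eager Vec.
  def []
    l = @left[].values
    r = @right[].values
    Vec.new(l.zip(r).map { |x, y| x + y })
  end
end

a = Vec.new([1, 2])
b = Vec.new([10, 20])
c = Vec.new([100, 200])

Vec.lazy = true
l_a = a + b             # built in lazy mode, so it is a LazyVec
Vec.lazy = false

eager = b + c           # eager mode: computed at once
mixed = l_a + (b + c)   # (b + c) is eager, the outer sum is lazy
reversed = eager + l_a  # eager operand on the left, result is still lazy
```

Calling `mixed[]` or `reversed[]` then yields an eager `Vec` with the summed values.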

In this version, when only native Java methods (Parallel Colt methods described below) are used in the expression, lazy evaluation ranged from around 40% slower than eager evaluation on one machine I tested to approximately the same performance on another. If the expression involves any Ruby method, evaluation of lazy expressions becomes much slower than eager evaluation. To improve performance, I believe that compilation of expressions will be necessary.

setup do

  @a = MDArray.int([2, 3], [1, 2, 3, 4, 5, 6])
  @b = MDArray.int([2, 3], [10, 20, 30, 40, 50, 60])
  @c = MDArray.double([2, 3], [100, 200, 300, 400, 500, 600])
  @d = MDArray.init_with("int", [2, 3], 4)
  @e = MDArray.init_with("int", [2, 3], 5)
  @f = MDArray.init_with("int", [2, 3], 6)
  @g = MDArray.init_with("int", [2, 3], 7)

  @h = MDArray.init_with("int", [3, 3], 7)
  @i = MDArray.init_with("int", [3, 3], 7)

  @float = MDArray.init_with("float", [2, 3], 10.5)
  @long = MDArray.init_with("long", [2, 3], 10)
  @byte = MDArray.init_with("byte", [2, 3], 10)

  MDArray.lazy = true

end


#-------------------------------------------------------------------------------------
#
#-------------------------------------------------------------------------------------

should "execute lazy operations" do

  p "2 a"
  a_2 = (@a + @a)[]
  a_2.print

  c = 2 * @a

  lazy_c = @a + @b
  # calculate the value of the lazy array with []
  c = lazy_c[]
  c.print

  c = (@c + @a)[]
  c = (@a + @c)[]

  c = (@c + @byte)[]
  c = (@byte + @c)[]
 
  lazy_c = (@a * @d - @e) - (@b + @c)
  lazy_c[].print

  # evaluate the lazy expression with [] on the same line
  d = ((@a * @d - @e) - (@b + @c))[]
  d.print

  # evaluate a lazy expression with [] anywhere in the expression. (@a * @d - @e) is
  # built lazily, then evaluated, then combined with a lazy (@b + @c).  The final
  # result is lazy
  d = ((@a * @d - @e)[] - (@b + @c))
  d.print

  MDArray.lazy = false

end

#-------------------------------------------------------------------------------------
#
#-------------------------------------------------------------------------------------

should "check that lazy is contagious" do

  # this is done lazily as we have "MDArray.lazy = true" defined in the setup
  l_c = @a + @b

  # now operations are done eagerly
  MDArray.lazy = false
  e_e = @c + @b
  # note that we do not require [] before printing e_e
  e_e.print

  # now operating a lazy array with an eager array... should be lazy even when 
  # MDArray.lazy is false
  l_f = l_c + e_e
  l_f.print
  # request the calculation and print
  l_f[].print

  MDArray.lazy = false

end

#-------------------------------------------------------------------------------------
#
#-------------------------------------------------------------------------------------

should "execute operations eagerly" do

  MDArray.lazy = false

  c = @a + @b
  c.print

  c = (@a * @d - @e) - (@b + @c)
  c.print

  MDArray.lazy = false

end

#-------------------------------------------------------------------------------------
#
#-------------------------------------------------------------------------------------

should "show that lazy is really lazy" do

  # MDArray lazy operations are really lazy, i.e., there is no checking of any sort
  # when parsing the expression.  Validation is only done when values are required.

  a = MDArray.int([2, 3], [1, 2, 3, 4, 5, 6])
  b = MDArray.double([3, 1], [1, 2, 3])


  # arrays a and b are of different shape and cannot work together. Yet, there will
  # be no error while parsing
  l_c = a + b

  # now we get an error
  assert_raise(RuntimeError) { l_c[].print }

  # now let's add correctly
  p "calculating lazy c"
  l_c = @a + @b
  l_c[].print

  # now let's change @b
  @b[0, 0] = 1000
  # and calculate again lazy c
  p "calculating lazy c again"
  l_c[].print

  p "calculating expression"
  @b[1, 0] = 2000
  p1 = (@a * @d - @e)[]
  p2 = (@b + @c)[]
  (p1 - p2)[].print
  p "@b is"
  @b.print

  # evaluate a lazy expression with [] anywhere in the expression. (@a * @d - @e) is
  # built lazily, then evaluated, then combined with a lazy (@b + @c).  The final
  # result is lazy
  p "calculating lazy d"
  d = ((@a * @d - @e)[] - (@b + @c))
  d[].print
  # let's now change the value of @a
  @a[0, 0] = 1000
  # no change in d... the part involving @a has already been eagerly calculated
  p "lazy d again after changing @a"
  d[].print
  # let's now change @b
  @b[0, 0] = 1
  @b[0, 1] = 150
  @b[1, 1] = 1000
  p "b is now:"
  @b.print
  # @b is still lazy on d calculation, so changing @b will change the value 
  # of d[].
  p "lazy d again after changing @b"
  d[].print

  p "calculating new expression"
  p3 = (@b + @c)
  (p1 - p3)[].print

  MDArray.lazy = false

end

#-------------------------------------------------------------------------------------
#
#-------------------------------------------------------------------------------------

should "work with Numeric" do

  l_c = @a + 2
  l_c[].print

  l_c = 2 + @a
  l_c[].print

  l_c = 2 - @a
  l_c[].print

  MDArray.lazy = false

end

#-------------------------------------------------------------------------------------
#
#-------------------------------------------------------------------------------------
should "work with unary operators" do

  # MDArray.lazy = false

  arr = MDArray.linspace("double", 0, 1, 100)
  l_a = arr.sin
  l_a.print

  b = l_a[]
  b.print

  ((@a * @d - @e).sin - (@b + @c))[].print

  sinh = (arr.sinh)[]
  
  MDArray.lazy = false

end