Skip to content
This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[MXNET-1000] get Ndarray real value and form it from a NDArray #12690

Merged
merged 11 commits into from
Jan 25, 2019

Conversation

lanking520
Copy link
Member

@lanking520 lanking520 commented Sep 27, 2018

Description

This is an implementation of visualizing the internal structure of NDArray as well as forming an NDArray from a multi-dimensional array.

visualize: Shown the structure of the current NDArray, it is also able to visualize some large NDArray

toNDArray: Convert a Array[Array[....Array[MXNET_PRIMITIVE]...]] matrix into NDArray. Flatten and fill in a long array

@nswamy @andrewfayres @yzhliu @gigasquid @ustcfd

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain the what the example does, the source of the dataset, expected performance on test set and reference to the original paper if applicable
  • Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the my best knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

@vandanavk
Copy link
Contributor

@mxnet-label-bot [pr-awaiting-review]

@marcoabreu marcoabreu added the pr-awaiting-review PR is waiting for code review label Sep 27, 2018
@nswamy
Copy link
Member

nswamy commented Oct 17, 2018

I am not comfortable adding this API, an analogy to NDArray is BufferedImage, you don't print all the pixel values of a BufferedImage in toString. I understand it helps with debugging, however i would expect the developer/user to use a debugger or override toString then.

@lanking520
Copy link
Member Author

I am not comfortable adding this API, an analogy to NDArray is BufferedImage, you don't print all the pixel values of a BufferedImage in toString. I understand it helps with debugging, however i would expect the developer/user to use a debugger or override toString then.

@nswamy two things I would like to mention:

  • the visualize method does not override toString.
  • the visualize method can smartly handle the large NDArray case. Please see my unittest examples. It would ignore most of the content and only print a part of it. It implementation is very similar to ndarray print function in Python

@nswamy
Copy link
Member

nswamy commented Oct 17, 2018

IMO, It still doesn't warrant to add this , whether you call it visualize or override toString. Another thing its confusing user experience that you are putting arbitrary THRESHOLD values?. what makes these values to be correct, why not more or less.

    val ARRAYTHRESHOLD = 1000   // longest array to show in full 

@lanking520
Copy link
Member Author

IMO, It still doesn't warrant to add this , whether you call it visualize or override toString. Another thing its confusing user experience that you are putting arbitrary THRESHOLD values?. what makes these values to be correct, why not more or less.

    val ARRAYTHRESHOLD = 1000   // longest array to show in full 

@nswamy since this is just a helper tool for users to debug and see what content inside. I think it's fair to have this tool. This Threshold value is just a limitation on numbers of elements to be shown in the array. The reason I set it as 1000 because 10000 sometimes would crash the JVM because it used too much memory and it's making no sense for a user to visualize that much. If the user seriously wants something to show up, he/she should at least try to slice and get it visualized. We can get this threshold configurable and ask the user to pass in. I am also curious how numpy set up this number in Python. If I can find a number in there I will let you know.

@lanking520
Copy link
Member Author

Link to the discussion we discussed before: #12536

nswamy
nswamy previously requested changes Oct 17, 2018
Copy link
Member

@nswamy nswamy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see this API(visualize/toString in object) fit to be in the NDArray class, it might be suitable in a Util class.

@yzhliu
Copy link
Member

yzhliu commented Oct 22, 2018

Any reason why we should not name the function toString? @nswamy @gigasquid Does clojure pkg so far has similar thing that may conflict with this?

@nswamy
Copy link
Member

nswamy commented Oct 23, 2018

quoting from Oracle's toString https://docs.oracle.com/javase/7/docs/api/java/lang/Object.html#toString()

Returns a string representation of the object. In general, the toString method returns a string that
"textually represents" this object. The result should be a **concise but informative representation**
that is easy for a person to read. It is recommended that all subclasses override this method.

It has to be concise: It might be helpful to print the metadata of the NDArray(Shape, DType, ...).

@yzhliu
Copy link
Member

yzhliu commented Oct 23, 2018

I agree. Can we modify the current impl so that it can reach the toString bar?

@lanking520
Copy link
Member Author

lanking520 commented Oct 23, 2018

This is a demo for toString method:

    val arrBuf = ArrayBuffer[Array[Float]]()
    for (i <- 0 until 20) arrBuf += Array(1.0f, 2.0f, 3.0f, 4.0f)
    val arr = Array(
      Array(
        arrBuf.toArray
      ),
      Array(
        arrBuf.toArray
      )
    )
    val nd = NDArray.toNDArray(arr)
    println(Visualize.toString(nd))
    println(nd)

Here is the output

[
 [
  [
   [1.0,2.0,3.0,4.0]
   [1.0,2.0,3.0,4.0]
   [1.0,2.0,3.0,4.0]
   [1.0,2.0,3.0,4.0]
   [1.0,2.0,3.0,4.0]
   [1.0,2.0,3.0,4.0]
   [1.0,2.0,3.0,4.0]
   [1.0,2.0,3.0,4.0]
   [1.0,2.0,3.0,4.0]
   [1.0,2.0,3.0,4.0]
   [1.0,2.0,3.0,4.0]
   [1.0,2.0,3.0,4.0]
   [1.0,2.0,3.0,4.0]
   [1.0,2.0,3.0,4.0]
   [1.0,2.0,3.0,4.0]
   [1.0,2.0,3.0,4.0]
   [1.0,2.0,3.0,4.0]
   [1.0,2.0,3.0,4.0]
   [1.0,2.0,3.0,4.0]
   [1.0,2.0,3.0,4.0]
  ]
 ]
 [
  [
   [1.0,2.0,3.0,4.0]
   [1.0,2.0,3.0,4.0]
   [1.0,2.0,3.0,4.0]
   [1.0,2.0,3.0,4.0]
   [1.0,2.0,3.0,4.0]
   [1.0,2.0,3.0,4.0]
   [1.0,2.0,3.0,4.0]
   [1.0,2.0,3.0,4.0]
   [1.0,2.0,3.0,4.0]
   [1.0,2.0,3.0,4.0]
   [1.0,2.0,3.0,4.0]
   [1.0,2.0,3.0,4.0]
   [1.0,2.0,3.0,4.0]
   [1.0,2.0,3.0,4.0]
   [1.0,2.0,3.0,4.0]
   [1.0,2.0,3.0,4.0]
   [1.0,2.0,3.0,4.0]
   [1.0,2.0,3.0,4.0]
   [1.0,2.0,3.0,4.0]
   [1.0,2.0,3.0,4.0]
  ]
 ]
]
<NDArray (2,1,20,4) cpu(0)>

Here is what Gluon looks like:

Out[23]: [[0. 1. 2. 3.]
                [4. 5. 9. 7.]
                [ 8. 9. 10. 11.]]
               <NDArray 3x4 @cpu(0)>

@lanking520
Copy link
Member Author

@nswamy @yzhliu @gigasquid WDYT?

@yzhliu
Copy link
Member

yzhliu commented Oct 24, 2018

@nswamy Any thoughts? If it is for pretty-printing ndarray, I think it is suitable to put into toString. As for now we do not have a good toString for NDArray.
though I think @lanking520 's example output is a bit too long.

@lanking520
Copy link
Member Author

As offline discussion with @nswamy, the visualize method will go to a Util class. The reason behind it does not quite explain the idea behind NDArray. We can take it back to toString if anytime in the future when this method is more stable.

@yzhliu
Copy link
Member

yzhliu commented Oct 24, 2018

would you mind describe at what extent, it describes the idea behind NDArray? I think such a small feature can be easily made correctly at the very first time.

@lanking520
Copy link
Member Author

@yzhliu here is the points:

We should not place it at the very core place if we are not totally confident about its functionality. We should place it elsewhere until we are confident enough user likes it and it is not buggy.
This method is not fully tested so we are using users for the experiment. Apart from that, we are more responsible for all public methods we introduce because the future maintenance for it is big.

@yzhliu
Copy link
Member

yzhliu commented Oct 24, 2018

My point is, if we think the method is buggy, then we should not place it anywhere - why create a new Class which for sure will be deprecated later?
I don't see the point why we cannot make it correct now.

So, do we agree, NDArray needs an implement of toString? If so, can we spend some efforts make the functionality correct for toString?

@yzhliu
Copy link
Member

yzhliu commented Oct 24, 2018

So far what need to be improved:

  • @nswamy 's suggestion: print also Shape and DType.
  • Mine: too many lines printed, suggest to follow numpy's limitation.

Anything else?

@nswamy
Copy link
Member

nswamy commented Oct 24, 2018

@yzhliu my concerns are:

  1. that we are exposing helper utility functions as public API. I am not convinced how printing binary float values would make sense except when you probably are dealing with a 2 or 3 dimensional tensor when axes have small values, i don't think I can find a difference if there is a pixel value different in [3x224x224] Image represented in NDArray.

  2. Typically Java object's toString() do not print the binary content of an object - if you print binary content IMO could be very surprising to users. I gave an analogy above how BufferedImage does not print its pixel values in toString.
    Hence I suggested to only print the metadata of NDArray in toString() and move this helper to a utility class.

Lastly look at what other frameworks are doing in their Tensor's toString()
https://www.tensorflow.org/api_docs/java/reference/org/tensorflow/Tensor.html#toString()

@lanking520
Copy link
Member Author

I think Tensor is an ideal translation to MXNet Symbol since Tensorflow itself is a symbolic language. It did not contain real-time data when you compute at its stage so showing the detailed information seemed impossible.

I agree with you on showing the entire image buffered data is not necessary unless the user really want to do so.

@lanking520
Copy link
Member Author

@zachgk Please review again

@lanking520 lanking520 merged commit 0f334ae into apache:master Jan 25, 2019
jessr92 pushed a commit to jessr92/incubator-mxnet that referenced this pull request Jan 27, 2019
…e#12690)

* add visualize

* adding Any type input to form NDArray

* fix bug and add tests

* add a toString method

* add Visualize Util and migrate visualize structure to there

* update with tests

* refactor code

* fix the minor issue

* add multiple types support

* add changes on names and tests

* make code elegant and improve readability
jessr92 pushed a commit to jessr92/incubator-mxnet that referenced this pull request Jan 27, 2019
…e#12690)

* add visualize

* adding Any type input to form NDArray

* fix bug and add tests

* add a toString method

* add Visualize Util and migrate visualize structure to there

* update with tests

* refactor code

* fix the minor issue

* add multiple types support

* add changes on names and tests

* make code elegant and improve readability
stephenrawls pushed a commit to stephenrawls/incubator-mxnet that referenced this pull request Feb 16, 2019
…e#12690)

* add visualize

* adding Any type input to form NDArray

* fix bug and add tests

* add a toString method

* add Visualize Util and migrate visualize structure to there

* update with tests

* refactor code

* fix the minor issue

* add multiple types support

* add changes on names and tests

* make code elegant and improve readability
@lanking520 lanking520 deleted the ndarray-im branch March 11, 2019 22:28
haohuanw pushed a commit to haohuanw/incubator-mxnet that referenced this pull request Jun 23, 2019
…e#12690)

* add visualize

* adding Any type input to form NDArray

* fix bug and add tests

* add a toString method

* add Visualize Util and migrate visualize structure to there

* update with tests

* refactor code

* fix the minor issue

* add multiple types support

* add changes on names and tests

* make code elegant and improve readability
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
pr-work-in-progress PR is still work in progress Scala
Projects
None yet
Development

Successfully merging this pull request may close these issues.