This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Mismatch between shape Java API #14756

Closed
androuino opened this issue Apr 22, 2019 · 15 comments
Labels: Bug, Java, Scala

Comments

@androuino

androuino commented Apr 22, 2019

Hi, I've finished training the yolo3_darknet53 params and wanted to test them with the Java API, but I'm getting this error and have no idea where it's coming from:

 Exception in thread "main" java.lang.IllegalArgumentException: requirement failed:
 Mismatch between shape (100,1) and (1,100,4)
    	at scala.Predef$.require(Predef.scala:224)
    	at org.apache.mxnet.NDArray$$anonfun$9.apply(NDArray.scala:448)
    	at org.apache.mxnet.NDArray$$anonfun$9.apply(NDArray.scala:445)
    	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    	at scala.collection.Iterator$class.foreach(Iterator.scala:893)
    	at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
    	at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
    	at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
    	at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
    	at scala.collection.AbstractTraversable.map(Traversable.scala:104)
    	at org.apache.mxnet.NDArray$.concatenate(NDArray.scala:445)
    	at org.apache.mxnet.module.BaseModule$$anonfun$2.apply(BaseModule.scala:267)
    	at org.apache.mxnet.module.BaseModule$$anonfun$2.apply(BaseModule.scala:267)
    	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
    	at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
    	at scala.collection.AbstractTraversable.map(Traversable.scala:104)
    	at org.apache.mxnet.module.BaseModule.predict(BaseModule.scala:267)
    	at org.apache.mxnet.infer.Predictor$$anonfun$11.apply(Predictor.scala:210)
    	at org.apache.mxnet.infer.Predictor$$anonfun$11.apply(Predictor.scala:210)
    	at org.apache.mxnet.infer.MXNetThreadPoolHandler$$anon$4.call(MXNetHandler.scala:73)
    	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    	at java.lang.Thread.run(Thread.java:748)
FAILURE: Build failed with an exception.

* What went wrong:
Execution failed for task ':ObjectDetectionTutorial.main()'.
> Process 'command '/usr/lib/jvm/java-8-openjdk-amd64/bin/java'' finished with non-zero exit value 1

* Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output. Run with --scan to get full insights.

* Get more help at https://help.gradle.org

BUILD FAILED in 1s
2 actionable tasks: 2 executed
Process 'command '/usr/lib/jvm/java-8-openjdk-amd64/bin/java'' finished with non-zero exit value 1
10:16:24: Task execution finished 'ObjectDetectionTutorial.main()'.

I only have 1 class, my input image size is 512, and pretrained_base=false. Any help would be much appreciated. Thank you in advance.

@mxnet-label-bot
Contributor

Hey, this is the MXNet Label Bot.
Thank you for submitting the issue! I will try and suggest some labels so that the appropriate MXNet community members can help resolve it.
Here are my recommended labels: Doc

@frankfliu
Contributor

@mxnet-label-bot add [scala, bug]

@lanking520
Member

@androuino It sounds like a general shape mismatch problem.

What is the input shape of your model? Are you trying to do object detection on it?

It would be better if you could share your model with me so I can do a quick evaluation of it.

@androuino
Author

androuino commented Apr 23, 2019

Thanks @lanking520 for the response. Yes, I'm trying to do object detection, and the input shape of my model is [1, 3, 512, 512] (if this is what you're looking for). This is my symbol.json file, as you requested:
https://gist.github.com/androuino/0b5e36c2fb39be2bda6cd8fc8bdadd05
and these are my params:
https://filebin.net/n89gzmb285oxre2g
Thank you so much!

lanking520 added the Java label Apr 23, 2019
lanking520 self-assigned this Apr 23, 2019
@lanking520
Member

@androuino Which version of MXNet Java are you using?

@androuino
Author

androuino commented Apr 23, 2019

Hi @lanking520, I'm using the latest MXNet version (1.4.0), both for the Java library and for training the model on a Linux GPU machine. I also trained a resnet50 model, hoping it would behave differently from yolo, but both throw the same error. Do you think changing MXNet to a previous version would fix the problem? Thanks for your response.

@androuino
Author

I tried the versions from 1.2.0 up to 1.3.1 without success; none of these versions has an ObjectDetector class, or maybe the implementation is different.

@lanking520
Member

@androuino Thank you, I can reproduce this problem in the most recent 1.5.0-SNAPSHOT. It seems to be a bug in the concat operator in Scala. I will start working on it today to see how I can fix it.

lanking520 added Bug and removed Question labels Apr 24, 2019
@androuino
Author

androuino commented Apr 25, 2019

Hi @lanking520, thank you for addressing this issue. I'm really looking forward to the fix. I hope it doesn't give you a hard time. If only the Java library were open source, my team could help fix the issue as well. Anyway, thank you so much; I hope to hear some good news about this issue soon.

@androuino
Author

I would also like to ask why I'm getting this error as well:
https://gist.github.com/androuino/213131592b1d22763bd2a0f98fff7ed8
This happens when I use the Java library: compile group: 'org.apache.mxnet', name: 'mxnet-full_2.11-linux-x86_64-gpu', version: '1.4.0'.
My machine runs Ubuntu 18.04 with CUDA 9.0 and nvidia-390 installed for a GeForce GTX1080ti.
I tried the same library on a fresh install of XUbuntu 16.04 with CUDA 9.2 and nvidia-396 installed for a GeForce1070ti GPU, but without success; I get the same error. I'm using IntelliJ 2019.1 for testing. On Ubuntu 18.04, I installed the mxnet library through pip install mxnet-cu90; on XUbuntu 16.04, I built the mxnet library from source (cmake). Thank you.

@lanking520
Member

@androuino Currently, we publish our CUDA package built against CUDA 9.2; earlier versions are unfortunately not supported. Please install CUDA 9.2 on your machine to get it to work.

If 9.2 is not working either, a likely reason is that the CUDA path is not set properly. As an alternative, you can use the script we use to install CUDA and cuDNN. For your use case it would be:

setup_gpu_build_tools.sh cu92 /path-you-want-to-place-cuda

If you are interested, you can join our Slack channel https://mxnet.apache.org/versions/master/community/contribute.html#slack where you can expect a faster response.

@lanking520
Member

Minimum reproducible Java code:

import org.apache.mxnet.infer.javaapi.Predictor;
import org.apache.mxnet.javaapi.*;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class YoloInference {
    public static void main(String[] args) {
        // Prepare the predictor
        List<DataDesc> inputDesc = new ArrayList<>();
        List<Context> contexts = new ArrayList<>();
        DataDesc dataDesc = new DataDesc("data", new Shape(new int[]{1, 3, 512, 512}), DType.Float32(), "NCHW");
        Context context = Context.cpu();
        inputDesc.add(dataDesc);
        contexts.add(context);
        Predictor predictor = new Predictor("yolo/yolo3_darknet53", inputDesc, contexts, 0);
        // Prepare the data
        NDArray nd = NDArray.ones(Context.cpu(), new int[]{1, 3, 512, 512});
        // Do inference
        System.out.println(nd);
        predictor.predictWithNDArray(Arrays.asList(nd));
    }
}

Whereas Python does not have this problem:

import mxnet as mx
from mxnet import nd

sym, arg_params, aux_params = mx.model.load_checkpoint(prefix='yolo3_darknet53', epoch=0)

mod = mx.mod.Module(symbol=sym,
                    data_names=['data'],
                    context=mx.cpu(),
                    label_names=None)
mod.bind(for_training=False,
         data_shapes=[('data', (1, 3, 512, 512))],
         label_shapes=mod._label_shapes)

mod.set_params(arg_params, aux_params, allow_missing=True)

sample_input = nd.ones((1, 3, 512, 512))

data_iter = mx.io.NDArrayIter(sample_input, None, 1)
result = mod.predict(data_iter)
for item in result:
    print(item.shape)

@lanking520
Member

I added a PR to fix this problem. The problem was the wrong way of concatenating the NDArrays. It used to be:

  outputBatches = [
         [a1, a2, a3], // batch a
         [b1, b2, b3]  // batch b
   ]
  result = [
         NDArray, // [a1, a2, a3]
         NDArray, // [b1, b2, b3]
  ]

Now it is:

  result = [
         NDArray, // [a1, b1]
         NDArray, // [a2, b2]
         NDArray, // [a3, b3]
  ]
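
To make the difference concrete, here is a minimal sketch in plain Python that tracks only shapes, not real NDArrays. The helper names (concat_shape, merge_per_batch, merge_per_output) are hypothetical and are not part of MXNet. The shape check is an assumption modeled on the error message in this issue, not a copy of the Scala code, and the three output shapes are the ones reported for this model later in the thread.

```python
from typing import List, Sequence, Tuple

Shape = Tuple[int, ...]


def concat_shape(shapes: Sequence[Shape]) -> Shape:
    """Shape obtained by concatenating arrays of these shapes along axis 0.

    Assumed check: every array must agree with the first one on all
    dimensions after the first, otherwise concatenation is rejected.
    """
    first = shapes[0]
    for shape in shapes[1:]:
        if shape[1:] != first[1:]:
            raise ValueError(f"Mismatch between shape {first[1:]} and {shape}")
    return (sum(s[0] for s in shapes),) + first[1:]


# Two batches, each holding the three outputs of the detection model.
batch_a: List[Shape] = [(1, 100, 1), (1, 100, 1), (1, 100, 4)]
batch_b: List[Shape] = [(1, 100, 1), (1, 100, 1), (1, 100, 4)]
output_batches = [batch_a, batch_b]


def merge_per_batch(batches: Sequence[Sequence[Shape]]) -> List[Shape]:
    """Old grouping: concatenate the outputs that belong to one batch."""
    return [concat_shape(batch) for batch in batches]


def merge_per_output(batches: Sequence[Sequence[Shape]]) -> List[Shape]:
    """New grouping: concatenate the i-th output of every batch."""
    return [concat_shape(outputs) for outputs in zip(*batches)]


old_error = None
try:
    merge_per_batch(output_batches)
except ValueError as err:
    # The third output has a different trailing shape, so this fails.
    old_error = str(err)

merged = merge_per_output(output_batches)
print(old_error)
print(merged)
```

With the old grouping the third output of a batch cannot be stacked onto the first two, which matches the kind of mismatch reported at the top of this issue. With the new grouping each output is only ever combined with the same output from other batches, so the trailing dimensions always agree.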

@androuino
Author

Hi @lanking520, wow, the fix is here already! I am now testing the fix for the Java API. I will also seriously consider the script for installing CUDA on a freshly installed Ubuntu OS.

I'll be happy to join the slack channel. Thanks again for all of this.

lanking520 added a commit to lanking520/incubator-mxnet that referenced this issue Apr 29, 2019
* add fix in the code

* add unit test

* update comments
lanking520 added a commit that referenced this issue Apr 29, 2019
* clean up submodule (#14645)

* Scala/Java Predict API fix #14756 (#14804)

* add fix in the code

* add unit test

* update comments

* add fixes to code gen
@androuino
Author

androuino commented May 7, 2019

Hi @lanking520. After pulling your changes for the fix, I tried it right away, but I'm now getting a strange error using the ObjectDetection.java example class, which I altered to meet my requirements. It now looks like this:
https://gist.github.com/androuino/7808b6fdf05e3122a03f35c63d3a5f89

I followed the step by step tutorial here https://github.com/apache/incubator-mxnet/tree/master/scala-package/mxnet-demo/java-demo for running the demo.

and this is the error I'm getting:

SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
[16:52:43] src/nnvm/legacy_json_util.cc:209: Loading symbol saved by previous version v1.4.0. Attempting to upgrade...
[16:52:43] src/nnvm/legacy_json_util.cc:217: Symbol successfully upgraded!
Exception in thread "main" java.lang.IllegalStateException: Invalid output shapes, expected: Vector(DataDesc[data,(1,3,512,512),float32,NCHW]).length, actual: ArrayBuffer((yolov30_slice_axis1_output,(1,100,1)), (yolov30_slice_axis2_output,(1,100,1)), (yolov30_slice_axis3_output,(1,100,4))).length.
    	at org.apache.mxnet.infer.ObjectDetector.getImageClassifier(ObjectDetector.scala:202)
    	at org.apache.mxnet.infer.ObjectDetector.<init>(ObjectDetector.scala:52)
    	at org.apache.mxnet.infer.javaapi.ObjectDetector.<init>(ObjectDetector.scala:58)
    	at mxnet.ObjectDetection.runObjectDetectionSingle(ObjectDetection.java:77)
    	at mxnet.ObjectDetection.main(ObjectDetection.java:86)

The image I used for testing is 512x512 in size, and the model that I've trained is also set to 512.
This is my train_yolo3.py script: https://gist.github.com/androuino/af5212923534a204b155c01b3bacb7f1
Then I ran the script with this command: python train.py --gpus 0 --batch-size 2 --data-shape 512

If you want the files that I'm using, including the test image, I could email them to you. I can't upload them publicly, so I'd prefer to email them to you directly.

Thanks for any enlightenment you could give me about this error, or at least tell me if I did something wrong or missed something in the code: https://gist.github.com/androuino/7808b6fdf05e3122a03f35c63d3a5f89 .

access2rohit pushed a commit to access2rohit/incubator-mxnet that referenced this issue May 14, 2019
* add fix in the code

* add unit test

* update comments
haohuanw pushed a commit to haohuanw/incubator-mxnet that referenced this issue Jun 23, 2019
* add fix in the code

* add unit test

* update comments