This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

add symbol.SwapAxis operator, just can do Forward(). #502

Closed
wants to merge 9 commits into from

Conversation

starimpact
Contributor

This is an early version; it can run now, but it hasn't been tested.
This is just a preview version, meant to let you have a look and give me some ideas.

@starimpact
Contributor Author

I am sorry I did not see your comments.

@starimpact
Contributor Author

Thanks for your suggestions.

@starimpact
Contributor Author

I have finished the new code based on your suggestions.
The Backward pass is also completed.
Please check the code for me and give me more suggestions.
Thank you very much. :smile:

@@ -38,15 +38,15 @@ ADD_CFLAGS =
#---------------------------------------------

# whether use CUDA during compile
-USE_CUDA = 0
+USE_CUDA = 1
Member

Please change this back to the default, as most users don't have CUDA.

@tqchen
Member

tqchen commented Nov 6, 2015

Thanks for the contribution. I have made a few comments on the code. In general:

  • Code style: we use the Google C++ code style; you can reproduce the linter check locally with
make lint

@starimpact
Contributor Author

Thanks very much for your suggestions.

…swapaxis files. Some code style changes. Function is ready to go.
@starimpact
Contributor Author

The function is ready to go; please check it.
There may still be a few minor code style problems.
😆 😄 😸 😈

@starimpact
Contributor Author

I have come across a problem:
SwapAxis(..., dim1=2, dim2=3) works well on both CPU and GPU.
SwapAxis(..., dim1=3, dim2=2) only works on CPU, but fails on GPU with the following error:

INFO:root:Start training with [gpu(0)]
[13:46:12] ./dmlc-core/include/dmlc/logging.h:208: [13:46:12] ./mshadow/mshadow/./tensor_blob.h:530: Check failed: (this->shape_.Size()) == (shape.Size()) TBlob.get_with_shape: new and old shape do not match total elements
[13:46:12] ./dmlc-core/include/dmlc/logging.h:208: [13:46:12] src/engine/./threaded_engine.h:295: [13:46:12] ./mshadow/mshadow/./tensor_blob.h:530: Check failed: (this->shape_.Size()) == (shape.Size()) TBlob.get_with_shape: new and old shape do not match total elements
terminate called after throwing an instance of 'dmlc::Error'
  what():  [13:46:12] src/engine/./threaded_engine.h:295: [13:46:12] ./mshadow/mshadow/./tensor_blob.h:530: Check failed: (this->shape_.Size()) == (shape.Size()) TBlob.get_with_shape: new and old shape do not match total elements
Aborted (core dumped)

@starimpact
Contributor Author

std::accumulate is not recognized by nvcc.

@tqchen
Member

tqchen commented Nov 8, 2015

Oh, yep. Maybe creating our own version of the prod function is easier. Then I think it is OK. Thanks.
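For reference, a minimal sketch of what such a hand-rolled product helper could look like (the name ShapeProd and the raw-pointer signature are illustrative assumptions, not the PR's actual code); it avoids std::accumulate so nvcc can compile the header it lives in:

#include <cstddef>

// Hypothetical stand-in for std::accumulate with std::multiplies:
// multiply every axis length of a shape into a single element count.
inline std::size_t ShapeProd(const std::size_t *dims, std::size_t ndim) {
  std::size_t prod = 1;
  for (std::size_t i = 0; i < ndim; ++i) {
    prod *= dims[i];  // accumulate the product of each dimension
  }
  return prod;
}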

@tqchen
Member

tqchen commented Nov 8, 2015

For the error, as it indicates, the shape's size does not match the size of the TBlob. This is likely due to a shape initialization error, either in InferShape or in Shape2Five.
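To illustrate the invariant that check guards, here is a small sketch in plain C++ (not mshadow's actual code, and the 4-d shape is just an assumed example): swapping two axes must leave the total element count unchanged, so a shape produced by InferShape or Shape2Five with a different product of dimensions will trip exactly this error in get_with_shape.

#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

int main() {
  std::vector<std::size_t> in_shape = {1, 16, 30, 62};  // assumed input shape
  std::vector<std::size_t> out_shape = in_shape;
  std::swap(out_shape[2], out_shape[3]);                // swap the last two axes

  std::size_t in_size = 1, out_size = 1;
  for (std::size_t d : in_shape) in_size *= d;
  for (std::size_t d : out_shape) out_size *= d;

  // A buggy shape computation would break this equality and trigger
  // "new and old shape do not match total elements" in TBlob.get_with_shape.
  assert(in_size == out_size);
  return 0;
}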

@tqchen
Member

tqchen commented Nov 8, 2015

The platform-dependent behavior is likely due to some uninitialized memory (variable) that causes nondeterminism, but that is just my guess.

@tqchen
Member

tqchen commented Nov 8, 2015

Please resolve all the issues reported by cpplint; it prints detailed messages.

@tqchen
Member

tqchen commented Nov 8, 2015

For the output, we need to do

exec_c.forward()
out = exec_c.outputs[0].asnumpy()
print out

The error was because we did not call asnumpy to wait for the result, so the system started shutting down before the computation had started.

@starimpact
Contributor Author

I cannot see any comments from cpplint on the swapaxis files. Where are they?

@starimpact
Contributor Author

There are some errors...

=====69/70 cpp-header files passed check=====
src/operator/swapaxis-inl.h: 34 Errors of 4 Categories map={'whitespace': 29, 'runtime': 2, 'readability': 2, 'build': 1}
=====57/58 cpp-soruce files passed check=====
src/operator/swapaxis.cc: 2 Errors of 1 Categories map={'readability': 2}
=====40/40 python files passed check=====
2 files failed lint
make: *** [lint] Error 1

@tqchen
Member

tqchen commented Nov 8, 2015

Hmm, the error messages occur before these, while it scans through the files. You may need to scroll up a bit to see them.

@starimpact
Contributor Author

The CPU works right now, but not the GPU:

import mxnet as mx
import numpy as np

def test2():
    data_in = mx.symbol.Variable('data')
    conv = mx.symbol.Convolution(data=data_in, kernel=(3, 3), num_filter=16)
    datatmp = np.ones((1, 1, 32, 64))
    mxdata = mx.nd.array(datatmp)
    weightmp = np.ones((16, 1, 3, 3))
    mxweight = mx.nd.array(weightmp)
    biastmp = np.zeros(16)
    mxbias = mx.nd.array(biastmp)
    exe_c = conv.bind(ctx=mx.gpu(0), args=[mxdata, mxweight, mxbias])
    exe_c.forward()
    out = exe_c.outputs[0].asnumpy()
    print out

test2()

Error

[14:50:16] ./dmlc-core/include/dmlc/logging.h:208: [14:50:16] ./mshadow/mshadow/./tensor_blob.h:508: Check failed: Device::kDevMask == dev_mask_ && DataType<DType>::kFlag == type_flag_ TBlob.get: device type do not match specified type
[14:50:16] ./dmlc-core/include/dmlc/logging.h:208: [14:50:16] src/engine/./threaded_engine.h:295: [14:50:16] ./mshadow/mshadow/./tensor_blob.h:508: Check failed: Device::kDevMask == dev_mask_ && DataType<DType>::kFlag == type_flag_ TBlob.get: device type do not match specified type
terminate called after throwing an instance of 'dmlc::Error'
  what():  [14:50:16] src/engine/./threaded_engine.h:295: [14:50:16] ./mshadow/mshadow/./tensor_blob.h:508: Check failed: Device::kDevMask == dev_mask_ && DataType<DType>::kFlag == type_flag_ TBlob.get: device type do not match specified type
Aborted (core dumped)

@starimpact
Contributor Author

make lint all passes now!

…support. Change test function name to test_swapaxes.
std::vector<TShape> *out_shape,
std::vector<TShape> *aux_shape) const override {
int input_num = in_shape->size();
if (input_num == 0) {
Member

CHECK_EQ(in_shape->size(), 1);

@tqchen
Member

tqchen commented Nov 8, 2015

Thanks for the good job! I have a few last comments; please address them, and rebase to resolve the conflict with the current master:

http://mxnet.readthedocs.org/en/latest/contribute.html#how-to-resolve-conflict-with-master

The conflict might have something to do with commits you have in mshadow. If that is the case, the best way might be to reset mshadow's version, or to keep a copy of your files and do a clean fork.

@starimpact
Contributor Author

Done, please have a look!

@starimpact
Contributor Author

Oh, sorry, I have not addressed your newest comments yet.

@starimpact
Contributor Author

I think you forgot that I cannot push to your repository.

@starimpact
Contributor Author

Is there another way to pull in your newest commits without forking a second time?

@tqchen
Member

tqchen commented Nov 9, 2015

@starimpact
Contributor Author

Something goes wrong when I rebase onto your mxnet.

@starimpact
Contributor Author

I fetched your mxnet, but my repository gets changed back to an older version when I rebase.
Really strange.

@starimpact
Contributor Author

I have pushed the newest code to my forked mxnet. Can you check it?

@starimpact
Contributor Author

I need a little time to figure out the strange rebase problem.

@starimpact
Contributor Author

Can I do a merge instead?

@tqchen
Member

tqchen commented Nov 9, 2015

The general instructions are here: http://mxnet.readthedocs.org/en/latest/contribute.html#how-to-resolve-conflict-with-master

If you find files with conflicts, edit the files to merge the conflicts, and do a git add as indicated in the instructions.

@starimpact
Contributor Author

The situation is: I have committed a lot in my local repository, and when I rebase onto your newest master, my working directory content is reset to the first commit. What can I do?

@starimpact
Contributor Author

OK, I think I found the way....

@starimpact
Contributor Author

OK, now please check.

@tqchen
Member

tqchen commented Nov 9, 2015

Closed due to #519.

@tqchen tqchen closed this Nov 9, 2015