add WaveReader and WaveInfoReader#87
Conversation
|
@cskuangfj please review |
|
sorry @csukuangfj pls review |
| .def("Duration", &WaveInfo::Duration) | ||
| .def("NumChannels", &WaveInfo::NumChannels) | ||
| .def("BlockAlign", &WaveInfo::BlockAlign) | ||
| .def("DataBytes", &WaveInfo::DataBytes); |
There was a problem hiding this comment.
I am not sure whether the binding for Read should also be included.
| .def("DataBytes", &WaveInfo::DataBytes); | ||
|
|
||
| py::class_<WaveData>(m, "WaveData") | ||
| .def("Duration", &WaveData::Duration) |
There was a problem hiding this comment.
I know it's time-consuming to wrap every member of the class, but it's just copy and paste;
please also wrap other data members.
| .def("Done", &SequentialTableReader<WaveHolder>::Done) | ||
| .def("Key", &SequentialTableReader<WaveHolder>::Key) | ||
| .def("FreeCurrent", &SequentialTableReader<WaveHolder>::FreeCurrent) | ||
| .def("Value", &SequentialTableReader<WaveHolder>::Value) |
There was a problem hiding this comment.
I think py::return_value_policy::reference should be used for "Value",
If a function in C++ returns a pointer or a reference that does not owned by Python,
we should use py::return_value_policy::reference
|
thanks so much @csukuangfj ! Good to have pybind expert to help us.
…On Thu, Dec 19, 2019 at 10:31 AM csukuangfj ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In src/pybind/feat/wave_reader_pybind.cc
<#87 (comment)>:
> + .def("NumChannels", &WaveInfo::NumChannels)
+ .def("BlockAlign", &WaveInfo::BlockAlign)
+ .def("DataBytes", &WaveInfo::DataBytes);
+
+ py::class_<WaveData>(m, "WaveData")
+ .def("Duration", &WaveData::Duration)
+ .def("Data", &WaveData::Data);
+
+ py::class_<SequentialTableReader<WaveHolder>>(m, "SequentialWaveReader")
+ .def(py::init<>())
+ .def(py::init<const std::string&>())
+ .def("Open", &SequentialTableReader<WaveHolder>::Open)
+ .def("Done", &SequentialTableReader<WaveHolder>::Done)
+ .def("Key", &SequentialTableReader<WaveHolder>::Key)
+ .def("FreeCurrent", &SequentialTableReader<WaveHolder>::FreeCurrent)
+ .def("Value", &SequentialTableReader<WaveHolder>::Value)
I think py::return_value_policy::reference should be used for "Value",
If a function in C++ returns a pointer or a reference that does not owned
by Python,
we should use py::return_value_policy::reference
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#87?email_source=notifications&email_token=AAZFLOZHILE6FFCWZFPPHH3QZLMJNA5CNFSM4J4OIUO2YY3PNVWWK3TUL52HS4DFWFIHK3DMKJSXC5LFON2FEZLWNFSXPKTDN5WW2ZLOORPWSZGOCPXDKMI#pullrequestreview-334378289>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAZFLO5GPD6OTBJYFX62UWDQZLMJNANCNFSM4J4OIUOQ>
.
|
|
Something I am concerned about with this pybind stuff is the
documentation. How can users who are not pybind11 experts figure out the
usage?
…On Thu, Dec 19, 2019 at 10:38 AM csukuangfj ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In src/pybind/feat/wave_reader_pybind.h
<#87 (comment)>:
> @@ -0,0 +1,25 @@
+// pybind/feat/wave_reader_pybind.h
+
+// Copyright 2019 Microsoft Corporation (author: Xingyu Na)
+
+// See ../../COPYING for clarification regarding multiple authors
../../../COPYING
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#87?email_source=notifications&email_token=AAZFLO3ETI4QXFHWS6GQ75TQZLNBNA5CNFSM4J4OIUO2YY3PNVWWK3TUL52HS4DFWFIHK3DMKJSXC5LFON2FEZLWNFSXPKTDN5WW2ZLOORPWSZGOCPXDVNQ#pullrequestreview-334379702>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAZFLO73ABFAIH5G5W4JTXDQZLNBNANCNFSM4J4OIUOQ>
.
|
|
As for the documentation, a normal user who uses kaldi pybind in Python can get the help for |
|
Mm. I'm still concerned that it is kind of a pain to do that (not everyone does much interactive python programming). One possibility is to have a comment block above each pybind wrapping where we write it out as if it were a native python class definition, with function signatures and documentation (but of course no implementation)? I know it would be time consuming, but it could make a big difference to how easy it is to understand the code. |
|
I found this |
|
My instinct is that human-generated documentation would be more readable
and also easier (if more time-consuming) to maintain.
…On Thu, Dec 19, 2019 at 10:48 AM csukuangfj ***@***.***> wrote:
I found this
https://github.com/intel-isl/Open3D/blob/master/src/Python/open3d_pybind/docstring.h
yesterday and have not explored too far. I think it's maybe helpful.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#87?email_source=notifications&email_token=AAZFLO5HHVEMTLRHF34GHVLQZLOI7A5CNFSM4J4OIUO2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEHIGUVQ#issuecomment-567306838>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAZFLOZFHMWQWMBIOQZ7YALQZLOI7ANCNFSM4J4OIUOQ>
.
|
|
You can see the result of the generated documentation using the Open3D approach. The C++ code written by a human: m_io.def("write_image",
[](const std::string &filename, const geometry::Image &image,
int quality) {
return io::WriteImage(filename, image, quality);
},
"Function to write Image to file", "filename"_a, "image"_a,
"quality"_a = 90);and the corresponding documentation generated by the machine can be found at I am not sure whether that is what you wanted. |
|
OK, interesting. That is nice looking documentation.
On the other hand, I think it's probably the habit of most Kaldi users to
go looking in the code itself for
documentation (since that's where it's easiest to find normally). So
having a separate website just for the pybind
may not be so convenient.
Would be nice to see what others think.
…On Thu, Dec 19, 2019 at 10:54 AM csukuangfj ***@***.***> wrote:
You can see the result of the generated documentation using the Open3D
approach.
The *C++* code
<https://github.com/intel-isl/Open3D/blob/e4315a8033dc9e28386fc34d4896d9bb0a3bf91b/src/Python/open3d_pybind/io/class_io.cpp#L108>
written by a human:
m_io.def("write_image",
[](const std::string &filename, const geometry::Image &image,
int quality) {
return io::WriteImage(filename, image, quality);
},
"Function to write Image to file", "filename"_a, "image"_a,
"quality"_a = 90);
and the corresponding documentation generated by the *machine* can be
found at
http://www.open3d.org/docs/release/python_api/open3d.io.write_image.html#open3d.io.write_image
I am not sure whether that is what you wanted.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#87?email_source=notifications&email_token=AAZFLO6ME55QI3D2ZGF5GWTQZLO7TA5CNFSM4J4OIUO2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEHIG6QY#issuecomment-567308099>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAZFLO4RPM25HD7ZWLNYGW3QZLO7TANCNFSM4J4OIUOQ>
.
|
|
Oh yeah, sorry, I should have said SubMatrix (assuming you did not want to
copy the data).
…On Thu, Dec 19, 2019 at 2:23 PM Fangjun Kuang ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In src/pybind/feat/wave_reader_pybind.cc
<#87 (comment)>:
> +
+void pybind_wave_reader(py::module& m) {
+ m.attr("kWaveSampleMax") = py::cast(kWaveSampleMax);
+
+ py::class_<WaveInfo>(m, "WaveInfo")
+ .def("IsStreamed", &WaveInfo::IsStreamed)
+ .def("SampFreq", &WaveInfo::SampFreq)
+ .def("SampleCount", &WaveInfo::SampleCount)
+ .def("Duration", &WaveInfo::Duration)
+ .def("NumChannels", &WaveInfo::NumChannels)
+ .def("BlockAlign", &WaveInfo::BlockAlign)
+ .def("DataBytes", &WaveInfo::DataBytes);
+
+ py::class_<WaveData>(m, "WaveData")
+ .def("Duration", &WaveData::Duration)
+ .def("Data", &WaveData::Data);
https://github.com/danpovey/kaldi/blob/532f3845a804e85842670a6a248d1db3aac0c32f/src/matrix/kaldi-matrix.h#L743-L750
The constructor and destructor of MatrixBase are protected. You can never
create an object of MatrixBase directly. So the returned type is a
subclass of MatrixBase
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#87?email_source=notifications&email_token=AAZFLOZNS4FPXKOEQRYG5LTQZMHM5A5CNFSM4J4OIUO2YY3PNVWWK3TUL52HS4DFWFIHK3DMKJSXC5LFON2FEZLWNFSXPKTDN5WW2ZLOORPWSZGOCPXPNCQ#discussion_r359702555>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAZFLO3DGDJOKZSKG6A3IM3QZMHM5ANCNFSM4J4OIUOQ>
.
|
|
@csukuangfj comments resolved. I've left Read and Write methods out, since it is not necessary for feature extraction. |
| .def("Close", &SequentialTableReader<WaveHolder>::Close); | ||
|
|
||
| py::class_<RandomAccessTableReader<WaveHolder>>(m, "RandomAccessWaveReader") | ||
| .def(py::init<>()) |
There was a problem hiding this comment.
should the copy constructor also be wrapped?
|
These types of things should not have public copy constructors
…On Thu, Dec 19, 2019 at 3:37 PM Fangjun Kuang ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In src/pybind/feat/wave_reader_pybind.cc
<#87 (comment)>:
> +
+ py::class_<SequentialTableReader<WaveHolder>>(m, "SequentialWaveReader")
+ .def(py::init<>())
+ .def(py::init<const std::string&>(), py::arg("rspecifier"))
+ .def("Open", &SequentialTableReader<WaveHolder>::Open, py::arg("rspecifier"))
+ .def("Done", &SequentialTableReader<WaveHolder>::Done)
+ .def("Key", &SequentialTableReader<WaveHolder>::Key)
+ .def("FreeCurrent", &SequentialTableReader<WaveHolder>::FreeCurrent)
+ .def("Value", &SequentialTableReader<WaveHolder>::Value,
+ py::return_value_policy::reference)
+ .def("Next", &SequentialTableReader<WaveHolder>::Next)
+ .def("IsOpen", &SequentialTableReader<WaveHolder>::IsOpen)
+ .def("Close", &SequentialTableReader<WaveHolder>::Close);
+
+ py::class_<RandomAccessTableReader<WaveHolder>>(m, "RandomAccessWaveReader")
+ .def(py::init<>())
should the copy constructor also be wrapped?
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#87?email_source=notifications&email_token=AAZFLO5HSUYFBAV4D42LJPTQZMQDDA5CNFSM4J4OIUO2YY3PNVWWK3TUL52HS4DFWFIHK3DMKJSXC5LFON2FEZLWNFSXPKTDN5WW2ZLOORPWSZGOCPXU6FA#pullrequestreview-334450452>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAZFLO4NV7XYHZOBDW63VELQZMQDDANCNFSM4J4OIUOQ>
.
|
|
+2 |
|
I added iterator and dict reader for wave in |
|
My instinct is that from a user/caller's point of view everything should
reflect the namespace things would be in in Kaldi.
But I don't have a good feel for these things, so ready to hear any
opinions. Also of course this may be separable from the actual modules
they were originally declared in.
Anyway you guys should decide this because I haven't looked at the code
carefully enough to have a very informed opinion.
…On Thu, Dec 19, 2019 at 6:40 PM Xingyu Na ***@***.***> wrote:
I added iterator and dict reader for wave in matrix_reader.py. But the
module name is confusing. It's better to rename it to kaldi.reader or
create seperate submodules, such as kaldi.matrix.reader and
kaldi.feat.reader
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#87?email_source=notifications&email_token=AAZFLOZWYF52JTKQGLLUOCTQZNFRFA5CNFSM4J4OIUO2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEHJGD7I#issuecomment-567435773>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAZFLO4JO42XUY7EMQSHUZ3QZNFRFANCNFSM4J4OIUOQ>
.
|
|
I'd like to hear @csukuangfj and @DongjiGao 's opinion since they created the basics. |
|
@DongjiGao is, like me, a relative newcomer to python binding. I think @dogancan's opinion might carry more weight if he can share it. But glad to hear from Dongji too. |
|
I think @dogancan, the creator of PyKaldi, is more experienced. There are lots of useful code we can borrow from PyKaldi. For example, for the matrix part, PyKalid uses For the I/O reader/writer part,
are great references. |
|
We can also follow the directory structure what PyKalid is using. |
|
Isn't PyKaldi's directory structure similar to hours, just located in a
different place?
Regarding class Matrix:
What I think is best is that we encourage people to mainly deal with
NumPy/PyTorch arrays from Python, and not expose too much of class Matrix
other than its buffer interface. If the memory management gets too
complicated I am open to modifying Kaldi's matrix library to use shared
pointers. (In the short term I hope that won't be necessary.)
…On Thu, Dec 19, 2019 at 7:21 PM Fangjun Kuang ***@***.***> wrote:
We can also follow the directory structure what PyKalid is using.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#87?email_source=notifications&email_token=AAZFLO7PIAFXO5STCVBEIV3QZNKMLA5CNFSM4J4OIUO2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEHJJP2Q#issuecomment-567449578>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAZFLO4KYUOVMJVDVVW4VHDQZNKMLANCNFSM4J4OIUOQ>
.
|
|
Yes, PyKaldi's directory structure follows Kaldi's source code structure. |
|
Oh OK. That looks like a reasonable approach. I suppose if Dogan did it
then at least it's tested.
…On Thu, Dec 19, 2019 at 7:31 PM Xingyu Na ***@***.***> wrote:
I just checked util
<https://github.com/pykaldi/pykaldi/blob/master/kaldi/util/table.py>
module @dogancan <https://github.com/dogancan> shared in the group. They
have all type of readers wrapped in a kaldi.util module. That seems to be
a good practice. Other folders, e.g. matrix and feat only focus on the type
holder.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#87?email_source=notifications&email_token=AAZFLO5R7QCEXIJN3BVP55TQZNLR5A5CNFSM4J4OIUO2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEHJKIUQ#issuecomment-567452754>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAZFLOYWKYFSF32AOSDFRX3QZNLR5ANCNFSM4J4OIUOQ>
.
|
|
@danpovey how about finish this PR about wave as is and open another one to restructure the submodules? |
|
OK, will do that. Thanks for helping to push this forward! |
Members of wave data will be added upon future request. Currently it works like