Skip to content

add pybind for fst::StdVectorFst#88

Merged
danpovey merged 4 commits intodanpovey:pybind11from
mobvoi:fangjun-pybind11
Dec 20, 2019
Merged

add pybind for fst::StdVectorFst#88
danpovey merged 4 commits intodanpovey:pybind11from
mobvoi:fangjun-pybind11

Conversation

@csukuangfj
Copy link

At Mobvoi AI Lab in Beijing, we are trying to wrap Kaldi with Pybind11
so that we can reuse the components from Kaldi instead of reinventing the wheel.

We have a pipeline for LF-MMI AM training,
where the neural network part is from PyTorch and the loss function
is from Kaldi wrapped by Pybind11. The Fsts required by the loss function
are currently read in C++ from an rxfilename passed from the Python side.
The Fsts can be passed from Python to C++ directly once we have wrapped
FST to Python and we will open-source the updated loss function in the future.

There are existing python bindings for OpenFst, such as Pykaldi (using Clif) and
the official python binding (using Cython). So why do we need another one?
Since we use Pybind11 for Kaldi, it is very difficult, if not impossible, to pass a Python
object not wrapped by Pybind11 to the C++ part. Therefore, we need another python
binding for OpenFST.

As Kaldi is also going to be wrapped by Pybind11, we, at Mobvoi, would like to
open source what we have done and hope it would be beneficial to the community.

The python binding for OpenFst has not yet been finished and is still in progress.
We hope this pullrequrest would be a good starting point.

@danpovey
Copy link
Owner

This is amazing!
I don't anticipate that there will be any issues with this- I wouldn't be able to spot them anyway, TBH. Thinking whether I should merge immediately.

@danpovey
Copy link
Owner

Before I merge: we need to figure out what we'll do to wrap lattices. Those will be needed at some point. Would it make sense to template-ize some of the wrapping code? LMK what you think.

@csukuangfj
Copy link
Author

I agree.

Templates would be helpful when we wrap LatticeArc.

@@ -0,0 +1,111 @@
#!/usr/bin/env python
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a small detail -- wouldn't be better to have this as

#!/usr/bin/env python3

?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree with you, but I am not sure whether kaldi pybind supports only Python3.

Python 2.7 will not be maintained past 2020.

Is this a good time to switch to Python3 and to support only Python3 in the future for kaldi pybind?

@jtrmal
Copy link

jtrmal commented Dec 19, 2019 via email

@csukuangfj
Copy link
Author

thanks. I will change all python scripts to python3.

@csukuangfj
Copy link
Author

an initial version for SequentialNnetChainExampleReader, please see
https://github.com/mobvoi/kaldi/blob/fangjun-pybind11/src/pybind/nnet3/nnet_chain_example_pybind_test.py

for usage.

@csukuangfj
Copy link
Author

I'll fix the confilicts soon.

We are using `__iter__` method in Python, which calls Next()
immediately when Value() is returned. This problem does not
exist in C++. We use copy semantics currently to solve this problem.
@csukuangfj
Copy link
Author

We can always refactor the current OpenFst Pybind11 code to use templates.

I guess the current priority is to make LF-MMI working as soon as possible in kaldi pybind
and potentially with PyTorch. As LF-MMI requires only fst::StdVectorFst, we wrapped it
and its dependecies with non-template code, since this approach is much faster.

When the training pipeline is finished, we can consider to convert OpenFst wrapper
related code to templates to wrap Lattices so that we can also perform decoding in Python.

@danpovey
Copy link
Owner

danpovey commented Dec 20, 2019 via email

"non-const to enable things like shallow swap on the held object in "
"situations where this would avoid making a redundant copy.",
py::return_value_policy::reference)
py::return_value_policy::copy)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a little concerned about the waste here.
I think it would be more efficient to have the iterator call Next() before providing the value, but have a boolean member to see whether this is the first time next() was called, and if so, skip calling Next().

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. I'll change it.

@danpovey
Copy link
Owner

Mergng this to keep things moving, you can fix the iterator thing later.

@danpovey danpovey merged commit c66bd34 into danpovey:pybind11 Dec 20, 2019
@csukuangfj csukuangfj deleted the fangjun-pybind11 branch December 20, 2019 08:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants