Commit 973009a

Add more documentations (#71)
1 parent: 7c452ed

7 files changed, +102 -44 lines changed


README.md (+27)

```diff
@@ -1,3 +1,10 @@
+# Introduction
+
+The text_search project can be used to create ASR (automatic speech recognition) datasets from long-form audio and even longer texts.
+
+The core of text_search is a general audio alignment pipeline, which aims to align the audio files to the corresponding text and split them into short segments, while also excluding segments of audio that do not correspond exactly with the aligned text.
+
+
 # Installation
 
 ## With pip
@@ -36,3 +43,23 @@ python3 -c "import textsearch; print(textsearch.__file__)"
 We only set the environment variable `PYTHONPATH`.
 
 
+
+# Recipes
+
+- [libriheavy](examples/libriheavy)
+- [subtitle](examples/subtitle)
+
+
+# References
+More explanations are available in the following paper:
+
+```
+@misc{kang2023libriheavy,
+  title={Libriheavy: a 50,000 hours ASR corpus with punctuation casing and context},
+  author={Wei Kang and Xiaoyu Yang and Zengwei Yao and Fangjun Kuang and Yifan Yang and Liyong Guo and Long Lin and Daniel Povey},
+  year={2023},
+  eprint={2309.08105},
+  archivePrefix={arXiv},
+  primaryClass={eess.AS}
+}
+```
```

docs/source/getting-started/index.rst (+59)

```diff
@@ -0,0 +1,59 @@
+Getting started
+===============
+
+About
+-----
+
+The text_search project can be used to create ASR (automatic speech recognition) datasets from long-form audio and even longer texts.
+
+The core of text_search is a general audio alignment pipeline, which aims to align the audio files to the corresponding text and split them into short segments, while also excluding segments of audio that do not correspond exactly with the aligned text.
+
+Installation
+------------
+
+With pip
+********
+
+.. code-block:: bash
+
+  pip install fasttextsearch
+
+
+For developers
+**************
+
+Please use the following commands to install `fasttextsearch`_:
+
+.. code-block:: bash
+
+  pip install numpy
+
+  git clone https://github.com/k2-fsa/text_search
+  cd text_search
+
+  mkdir build
+  cd build
+  cmake ..
+  make -j
+  make test
+
+  # set PYTHONPATH so that you can use "import textsearch"
+
+  export PYTHONPATH=$PWD/../textsearch/python:$PWD/lib:$PYTHONPATH
+
+To test that you have installed `fasttextsearch`_ successfully, please run:
+
+.. code-block:: bash
+
+  python3 -c "import textsearch; print(textsearch.__file__)"
+
+It should print something like the following:
+
+.. code-block:: bash
+
+  /Users/fangjun/open-source/text_search/textsearch/python/textsearch/__init__.py
+
+.. hint::
+  We did not use either ``python3 setup.py install`` or ``pip install``.
+  We only set the environment variable ``PYTHONPATH``.
+
```
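
The `python3 -c "import textsearch; print(textsearch.__file__)"` check imports and executes the whole package. A minimal side-effect-free variant, sketched here under the assumption that only the lookup needs checking, uses `importlib.util.find_spec`, which resolves a top-level module through `sys.path` (including the `PYTHONPATH` entries exported during the developer install) without executing it. The helpers `locate_module` and `pythonpath_entries` are hypothetical and not part of textsearch.

```python
# Hypothetical helpers (not part of textsearch) for checking the install.
import importlib.util
import os
from typing import List, Optional


def locate_module(name: str) -> Optional[str]:
    """Return where a top-level module would be loaded from, or None.

    find_spec() searches sys.path (PYTHONPATH entries included) without
    executing the module, unlike ``import name``.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    if spec.origin is None:
        # Namespace packages have no single file; report their directories.
        return os.pathsep.join(spec.submodule_search_locations or [])
    return spec.origin


def pythonpath_entries() -> List[str]:
    """Split the PYTHONPATH environment variable into non-empty entries."""
    raw = os.environ.get("PYTHONPATH", "")
    return [entry for entry in raw.split(os.pathsep) if entry]
```

For example, `print(locate_module("textsearch"))` should report the same `__init__.py` path as the one-liner when the environment is set up correctly; if it prints `None`, `pythonpath_entries()` shows which directories were actually searched.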

docs/source/index.rst (+1, -1)

```diff
@@ -10,6 +10,6 @@ Welcome to fasttextsearch's documentation!
    :maxdepth: 2
    :caption: Contents:
 
-   ./install/index.rst
+   ./getting-started/index.rst
    ./tutorials/index.rst
    ./python-api/index.rst
```

docs/source/install/developers.rst (-35)

This file was deleted.

docs/source/install/index.rst (-7)

This file was deleted.

docs/source/python-api/index.rst (+13, -1)

```diff
@@ -3,7 +3,7 @@ Python API
 
 This section lists Python APIs in `fasttextsearch`_.
 
-.. currentmodule:: textsearch
+.. currentmodule:: textsearch.python.textsearch
 
 
 create_suffix_array
```
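
The API page lists `create_suffix_array` without explaining the data structure. As a rough illustration only (not textsearch's implementation; `naive_suffix_array` and `find_occurrences` are hypothetical names), a suffix array stores the start positions of all suffixes of a sequence in lexicographic order, so every occurrence of a pattern falls in one contiguous range that two binary searches can locate:

```python
# Illustrative only; not textsearch's create_suffix_array.
from typing import List, Sequence


def naive_suffix_array(seq: Sequence) -> List[int]:
    """Start positions of all suffixes of ``seq``, in lexicographic order.

    Sorting real suffix slices costs O(n^2 log n); fine for reading only.
    """
    return sorted(range(len(seq)), key=lambda i: seq[i:])


def find_occurrences(text: str, sa: List[int], pattern: str) -> List[int]:
    """Return the sorted start positions of ``pattern`` in ``text``.

    Suffixes that start with ``pattern`` are contiguous in ``sa``, so two
    binary searches over their length-m prefixes bound that range.
    """
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose prefix is >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    while lo < hi:  # first suffix whose prefix is > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])
```

For `"banana"`, `naive_suffix_array` returns `[5, 3, 1, 0, 4, 2]`, and searching it for `"ana"` yields `[1, 3]`. The naive construction is meant for reading; texts as long as the ones text_search targets need a faster construction.
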
```diff
@@ -25,3 +25,15 @@ get_nice_alignments
 -------------------
 
 .. autofunction:: get_nice_alignments
+
+align_queries
+-------------------
+.. autofunction:: align_queries
+
+get_longest_increasing_pairs
+----------------------------
+.. autofunction:: get_longest_increasing_pairs
+
+split_aligned_queries
+---------------------
+.. autofunction:: split_aligned_queries
```
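
The commit documents `get_longest_increasing_pairs` but does not state its semantics. Assuming it refers to the usual problem of picking the longest chain of (query position, target position) pairs that increases strictly in both coordinates, which keeps an alignment monotonic, an O(n log n) sketch could look like the following. `longest_increasing_pairs` and its signature are hypothetical, not the library's API.

```python
# Hypothetical sketch; not textsearch's get_longest_increasing_pairs.
from bisect import bisect_left
from typing import List, Tuple

Pair = Tuple[int, int]


def longest_increasing_pairs(pairs: List[Pair]) -> List[Pair]:
    """Return a longest chain of pairs strictly increasing in both coordinates.

    Sorting by (first ascending, second descending) stops two pairs with the
    same first coordinate from both entering the chain; a strictly increasing
    subsequence on the second coordinate then gives the answer.
    """
    order = sorted(pairs, key=lambda p: (p[0], -p[1]))
    tail_j: List[int] = []    # tail_j[k]: smallest last j of a chain of length k+1
    tail_idx: List[int] = []  # index into `order` of that chain's last pair
    parent: List[int] = [-1] * len(order)
    for idx, (_, j) in enumerate(order):
        k = bisect_left(tail_j, j)  # bisect_left keeps the j values strictly increasing
        if k > 0:
            parent[idx] = tail_idx[k - 1]
        if k == len(tail_j):
            tail_j.append(j)
            tail_idx.append(idx)
        else:
            tail_j[k] = j
            tail_idx[k] = idx
    # Walk parent links back from the end of the longest chain.
    chain: List[Pair] = []
    idx = tail_idx[-1] if tail_idx else -1
    while idx != -1:
        chain.append(order[idx])
        idx = parent[idx]
    chain.reverse()
    return chain
```

With `[(0, 0), (1, 2), (2, 1), (3, 3), (4, 4)]` the sketch returns `[(0, 0), (2, 1), (3, 3), (4, 4)]`. If non-strict increase were wanted instead, the tie-break would sort the second coordinate ascending and `bisect_left` would become `bisect_right`.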

docs/source/tutorials/index.rst (+2)

```diff
@@ -1,6 +1,8 @@
 Tutorials
 ============
 
+This section provides the following tutorials on the core concepts of text_search.
+
 .. toctree::
    :maxdepth: 2
 
```