[cli] support context biasing with ac automaton #2128

Merged
merged 2 commits into from
Nov 7, 2023

@cdliang11 cdliang11 commented Nov 7, 2023

Hotword list:

荣誉伟
北京
人民大会堂
嗨小奇
(base) ➜  wenet git:(chengdong-context) ✗ wenet --language chinese ../1000003_0f90da0d.wav --context_path ../context_path.txt --context_score 3.0
{'text': '日本内阁官房长官荣誉伟表示', 'confidence': 0.5924476210863998}
(base) ➜  wenet git:(chengdong-context) ✗ wenet --language chinese ../1000003_0f90da0d.wav                                                       
{'text': '日本内阁官房长官荣誉为表示', 'confidence': 0.8130293280382528}

@pkufool, we ported your code here.
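
For readers unfamiliar with the approach, the idea behind biasing with an AC (Aho-Corasick) automaton can be sketched roughly as below. This is a simplified illustration and not the code added in this PR: the class, method, and field names (`ContextState`, `ContextGraph`, `forward_one_step`, `finalize`, `biased_bonus`) are hypothetical, hotwords are assumed to be tokenized per character, and bonuses for hotwords that are only reachable as suffixes through failure links are ignored. Each matched token earns a provisional bonus of `context_score`; the bonus is locked in when a hotword completes and revoked when the match breaks.

```python
from collections import deque


class ContextState:
    """One trie node; all field names here are illustrative."""

    def __init__(self):
        self.next = {}           # token -> child state
        self.fail = None         # failure link (longest proper suffix in the trie)
        self.is_end = False      # True if a hotword ends at this node
        self.node_score = 0.0    # provisional bonus, revoked if the match breaks
        self.output_score = 0.0  # bonus locked in when a hotword completes here


class ContextGraph:
    def __init__(self, phrases, context_score=3.0):
        self.context_score = context_score
        self.root = ContextState()
        self.root.fail = self.root
        for phrase in phrases:
            if not phrase:
                continue  # skip empty hotwords
            node = self.root
            for token in phrase:
                node = node.next.setdefault(token, ContextState())
            node.is_end = True
        self._link()

    def _link(self):
        # Breadth-first pass: assign scores and failure links, parents first.
        queue = deque([self.root])
        while queue:
            node = queue.popleft()
            for token, child in node.next.items():
                acc = node.node_score + self.context_score
                if child.is_end:
                    child.output_score = acc  # completed hotword keeps its bonus
                else:
                    child.node_score = acc    # partial match, may be revoked
                fail = node.fail
                while fail is not self.root and token not in fail.next:
                    fail = fail.fail
                if node is not self.root and token in fail.next:
                    child.fail = fail.next[token]
                else:
                    child.fail = self.root
                queue.append(child)

    def forward_one_step(self, state, token):
        """Consume one token; return (score delta, next state)."""
        node = state
        while node is not self.root and token not in node.next:
            node = node.fail
        nxt = node.next.get(token, self.root)
        score = nxt.node_score - state.node_score + nxt.output_score
        return score, nxt

    def finalize(self, state):
        """Revoke any provisional bonus left at the end of the utterance."""
        return -state.node_score


def biased_bonus(graph, tokens):
    """Total bonus a token sequence would receive from the graph."""
    state, total = graph.root, 0.0
    for token in tokens:
        score, state = graph.forward_one_step(state, token)
        total += score
    return total + graph.finalize(state)


graph = ContextGraph(["荣誉伟", "北京", "人民大会堂", "嗨小奇"], context_score=3.0)
print(biased_bonus(graph, "日本内阁官房长官荣誉伟表示"))  # full hotword match
print(biased_bonus(graph, "日本内阁官房长官荣誉为表示"))  # partial match is revoked
```

In a decoder, the per-step score returned by `forward_one_step` would presumably be added to each hypothesis score so that candidates containing a hotword rank higher. How the PR wires this into the search is not shown in this sketch.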

# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from wenet.dataset.processor import __tokenize_by_bpe_model
Collaborator Author

torchaudio.utils.sox_utils.set_buffer_size(16500)

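This line in processor.py depends heavily on the runtime environment and is not guaranteed to run successfully everywhere. Should we consider moving the __tokenize_by_bpe_model function somewhere else?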

@robin1001
Collaborator

We should also add context support in recognize.py; let's do it in a future PR.

@robin1001
Collaborator

Have you formatted the code with yapf?

@robin1001
Collaborator

@kaixunhuang0 The new AC automaton will override the previous greedy search.

@robin1001 robin1001 self-assigned this Nov 7, 2023
@robin1001 robin1001 merged commit 5faf24b into main Nov 7, 2023
6 checks passed
@robin1001 robin1001 deleted the chengdong-context branch November 7, 2023 13:10
@Mddct
Collaborator

Mddct commented Nov 7, 2023

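Is it easy to support hotwords for paraformer? Only greedy search needs to be considered. The output length is already aligned to the text and there is no prefix, so it should be simpler. Could you add support when you have time? @cdliang11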

@cdliang11
Collaborator Author

Have you formatted the code with yapf?

Yes, it has been formatted.

@cdliang11
Collaborator Author

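Is it easy to support hotwords for paraformer? Only greedy search needs to be considered. The output length is already aligned to the text and there is no prefix, so it should be simpler. Could you add support when you have time? @cdliang11

OK, I'll take a look.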

