[Question] Where can I find the llama2_kor.model file used in merge_tokenizer.py? #20

Open

idjung96 opened this issue Oct 13, 2023 · 0 comments

idjung96 commented Oct 13, 2023

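While looking for tokenizer-related code, I came across the KULLM tokenizer and am glad to have found it to study.

As I understand it, the script adds /data/joon/kopora/lmdata/llama2_kor.model to the llama2 tokenizer. Is that right?

I'd like to know whether I need to build the sentencepiece model myself, or whether some of the project files have not been released yet.

I'm asking because I'm curious about the output of the last four lines of merge_tokenizer.py.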
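
For readers unfamiliar with this kind of merge, here is a minimal sketch of the general idea as I understand it: pieces from a second vocabulary are appended to the base vocabulary only if the base does not already contain them. This is an illustration only. It uses plain Python lists as stand-ins for SentencePiece piece tables, and the function name `merge_pieces` and both toy vocabularies are hypothetical, not taken from merge_tokenizer.py or from the real models.

```python
def merge_pieces(base_pieces, extra_pieces):
    """Append pieces from extra_pieces that base_pieces lacks, keeping order."""
    seen = set(base_pieces)
    merged = list(base_pieces)
    for piece in extra_pieces:
        if piece not in seen:
            merged.append(piece)
            seen.add(piece)
    return merged


# Hypothetical toy vocabularies (assumptions for illustration only).
llama_pieces = ["<unk>", "<s>", "</s>", "▁the", "▁to"]
korean_pieces = ["<unk>", "<s>", "</s>", "▁안녕", "하세요", "▁the"]

merged = merge_pieces(llama_pieces, korean_pieces)
print(len(llama_pieces), "->", len(merged))
```

In this sketch, existing pieces keep their original ids and new pieces are added at the end, which is why the embedding matrix of the base model would typically need to be resized after such a merge. Whether merge_tokenizer.py follows exactly this approach is an assumption on my part.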
