Inconsistent Output: Random Token Generation After Stage 3 Fine-Tuning #48

Open
venkat01010 opened this issue Sep 16, 2024 · 0 comments

@venkat01010

Hi, thank you for sharing the code and pre-trained model.

I am currently fine-tuning the ASR model on an internal dataset of 780 hours, using Stage 3 (Chain-of-Modality Instruction Fine-Tuning). After 50,000 training steps, I tested the model to check its performance, but it generates random tokens that are unrelated to the audio input.

I have prepared the fine-tuning data in the following format:

{"prefix": "You are an AI assistant whose name is SpeechGPT.\n- SpeechGPT is a intrinsic cross-modal conversational language model that is developed by Fudan University.  SpeechGPT can understand and communicate fluently with human through speech or text chosen by the user.\n- It can perceive cross-modal inputs and generate cross-modal outputs.\n", "plain_text": "[Human]: Can you transcribe the speech into a written format?. This is input : <sosp><739><317><453><104><800><894><693><52><424><267><177><44><235><553><360><821><739><127><245><523><818><718><766><544><766><370><553><649><739><6><650><649><739><840><145><614><739><954><739><566><739><285><739><566><739><754><739><498><872><359><87><164><91><544><407><111><621><128><665><991><162><62><246><894><317><640><828><768><700><362><800><319><542><445><800><678><491><800><104><650><747><931><800><566><223><177><544><710><389><423><800><104><108><896><931><877><800><470><821><104><108><404><542><313><607><269><908><246><828><691><794><639><647><640><174><52><424><267><235><892><73><338><877><622><608><800><259><453><521><382><691><821><167><104><778><404><542><75><788><269><908><640><993><177><839><225><823><800><179><931><428><800><754><498><324><338><359><104><693><312><325><281><62><209><246><267><714><609><823><27><655><650><179><498><931><359><462><104><693><382><691><794><75><874><167><104><693><521><382><763><876><233><407><499><407><334><548><621><128><99><991><823><800><640><124><127><640><768><700><362><640><542><794><788><640><660><351><788><435><908><640><816><325><748><872><336><359><87><296><714><446><544><21><293><804><359><579><764><296><839><804><254><504><27><579><104><650><896><498><338><359><800><907><514><70><542><272><313><607><748><931><877><800><85><204><280><668><830><70><768><700><362><432><768><586><884><793><794><788><614><640><92><804><293><eosp><eoh> [SpeechGPT]: you oh oh welcome back hey listen i have my i have my health care through healthcare.  what is this i keep hearing about add back to my social security"}

When I tested the model using one of the training samples, it generated a random sequence of text.
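Concretely, my test looks roughly like the sketch below: I rebuild the prompt exactly as it appears in training (prefix plus the human turn, cut right after the [SpeechGPT]: tag) and decode greedily. The checkpoint path here is a placeholder, and I am calling the plain Hugging Face transformers API rather than the repo's own inference script:

```python
# Rough sketch of the test, assuming a LLaMA-style checkpoint loadable with
# Hugging Face transformers. "output/stage3" and "stage3_train.jsonl" are
# placeholder paths; the repo's inference script may wrap this differently.
import json
from transformers import AutoModelForCausalLM, AutoTokenizer

ckpt = "output/stage3"  # placeholder: path to the Stage 3 fine-tuned model
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForCausalLM.from_pretrained(ckpt)

# Take one training sample and keep everything up to (and including) the
# [SpeechGPT]: tag, so the model only has to continue with the transcript.
with open("stage3_train.jsonl", encoding="utf-8") as f:
    sample = json.loads(f.readline())
cut = sample["plain_text"].index("[SpeechGPT]:") + len("[SpeechGPT]:")
prompt = sample["prefix"] + sample["plain_text"][:cut]

inputs = tokenizer(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=128, do_sample=False)
# Strip the prompt tokens and print only the continuation.
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```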

Could you please assist me in identifying the issue? I'm wondering if the data format is incorrect or if something is missing. Any suggestions would be greatly appreciated.
