Hi, thank you for sharing the code and pre-trained model.
I am currently fine-tuning the ASR model using an internal dataset of 780 hours. For this, I am employing Stage 3: Chain-of-modality Instruction Finetuning in my experiments. After running for 50,000 steps, I tested the model to check its performance. However, I am encountering an issue where the model generates random tokens that are unrelated to the audio input.
I have prepared the fine-tuning data in the following format:
```json
{"prefix": "You are an AI assistant whose name is SpeechGPT.\n- SpeechGPT is a intrinsic cross-modal conversational language model that is developed by Fudan University. SpeechGPT can understand and communicate fluently with human through speech or text chosen by the user.\n- It can perceive cross-modal inputs and generate cross-modal outputs.\n", "plain_text": "[Human]: Can you transcribe the speech into a written format?. This is input : <sosp><739><317><453><104><800><894><693><52><424><267><177><44><235><553><360><821><739><127><245><523><818><718><766><544><766><370><553><649><739><6><650><649><739><840><145><614><739><954><739><566><739><285><739><566><739><754><739><498><872><359><87><164><91><544><407><111><621><128><665><991><162><62><246><894><317><640><828><768><700><362><800><319><542><445><800><678><491><800><104><650><747><931><800><566><223><177><544><710><389><423><800><104><108><896><931><877><800><470><821><104><108><404><542><313><607><269><908><246><828><691><794><639><647><640><174><52><424><267><235><892><73><338><877><622><608><800><259><453><521><382><691><821><167><104><778><404><542><75><788><269><908><640><993><177><839><225><823><800><179><931><428><800><754><498><324><338><359><104><693><312><325><281><62><209><246><267><714><609><823><27><655><650><179><498><931><359><462><104><693><382><691><794><75><874><167><104><693><521><382><763><876><233><407><499><407><334><548><621><128><99><991><823><800><640><124><127><640><768><700><362><640><542><794><788><640><660><351><788><435><908><640><816><325><748><872><336><359><87><296><714><446><544><21><293><804><359><579><764><296><839><804><254><504><27><579><104><650><896><498><338><359><800><907><514><70><542><272><313><607><748><931><877><800><85><204><280><668><830><70><768><700><362><432><768><586><884><793><794><788><614><640><92><804><293><eosp><eoh> [SpeechGPT]: you oh oh welcome back hey listen i have my i have my health care through healthcare. 
what is this i keep hearing about add back to my social security"}
```
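For reference, here is a minimal sketch of how I assemble one such JSONL record. The field names (`prefix`, `plain_text`) and the markers (`<sosp>`, `<eosp>`, `<eoh>`) are exactly those in the sample above; the system prompt and unit IDs below are shortened placeholders, not the real values:

```python
import json

def make_asr_sample(system_prompt, unit_ids, transcript):
    # Discrete speech units are rendered as <id> tokens between <sosp>/<eosp>;
    # <eoh> closes the human turn, then the target transcript follows.
    units = "".join(f"<{u}>" for u in unit_ids)
    plain_text = (
        "[Human]: Can you transcribe the speech into a written format?. "
        f"This is input : <sosp>{units}<eosp><eoh> "
        f"[SpeechGPT]: {transcript}"
    )
    return {"prefix": system_prompt, "plain_text": plain_text}

# Placeholder prompt and units for illustration only.
sample = make_asr_sample("You are an AI assistant...", [739, 317, 453], "hello world")
line = json.dumps(sample)  # one line of the fine-tuning JSONL file
```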
When I tested the model using one of the training samples, it generated a random sequence of text.
Could you please assist me in identifying the issue? I'm wondering if the data format is incorrect or if something is missing. Any suggestions would be greatly appreciated.
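One sanity check I have been considering (a sketch, assuming access to the model's tokenizer vocabulary as a plain dict): if unit tokens such as `<739>` are not single entries in the vocabulary, the tokenizer splits them into characters during training, which could produce output unrelated to the audio. The helper name below is hypothetical:

```python
import re

def missing_unit_tokens(plain_text, vocab):
    """Return the unit tokens in a sample that the vocab lacks as whole tokens."""
    units = re.findall(r"<\d+>", plain_text)
    return sorted({u for u in units if u not in vocab})

# Toy vocabulary covering only two of the three units in this tiny example.
toy_vocab = {"<739>": 0, "<317>": 1}
print(missing_unit_tokens("<sosp><739><317><453><eosp>", toy_vocab))  # ['<453>']
```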