Strange prediction #37

Open
gaoCleo opened this issue Dec 4, 2024 · 0 comments
gaoCleo commented Dec 4, 2024

Hello, thank you for sharing your code. I encountered some issues while running the inference code. The video I tested with is 'v_0A6fEUxdDMk.mp4' from the test set of the VATEX dataset, which is a video of a chef making sushi.

First, I downloaded the pre-trained parameters from luoruipu1/Valley2-7b and ran the run_valley_llamma_v2.py file. The user prompt I used was "<video> Describe the video concisely", and the answer I got was "['10. Can you describe the scene in the video']".

Then, I used the same parameters to run the run_valley.py file, and the answer I received was "['10.']".

I'm not sure why this is happening. I haven't modified any code. Could it be that I used the wrong parameters or the wrong prompt format?

Next, I downloaded the parameters from Zhaoziwang/chinese_valley7b_v1 instead and ran the run_valley.py file with them. When I used the user prompt "请描述这个视频\n<video>" (in English: "Please describe this video"), the returned result was an empty string. When I moved the placeholder to the front, "<video>请描述这个视频\n", the result was repeated garbage characters.
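For easier reproduction, the runs described above can be written down as a small record. This is only a sketch: the `Run` class and the `placeholder_position` helper are hypothetical names that are not part of the Valley code, the prompt for the second run is assumed to be the same as for the first (it is not restated above), and the exact garbage output of the last run was not recorded, so it is left as `None`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Run:
    """One inference attempt from this report (hypothetical helper type)."""
    checkpoint: str
    script: str
    prompt: str
    output: Optional[str]  # None when the exact output text was not recorded


RUNS = [
    Run("luoruipu1/Valley2-7b", "run_valley_llamma_v2.py",
        "<video> Describe the video concisely",
        "10. Can you describe the scene in the video"),
    # Assumption: same prompt as the first run.
    Run("luoruipu1/Valley2-7b", "run_valley.py",
        "<video> Describe the video concisely",
        "10."),
    Run("Zhaoziwang/chinese_valley7b_v1", "run_valley.py",
        "请描述这个视频\n<video>",
        ""),
    # Output was repeated garbage characters; the exact text is not recorded.
    Run("Zhaoziwang/chinese_valley7b_v1", "run_valley.py",
        "<video>请描述这个视频\n",
        None),
]


def placeholder_position(prompt: str, token: str = "<video>") -> str:
    """Report where the video placeholder sits in the prompt."""
    stripped = prompt.strip()
    if token not in stripped:
        return "missing"
    if stripped.startswith(token):
        return "start"
    if stripped.endswith(token):
        return "end"
    return "middle"


if __name__ == "__main__":
    for run in RUNS:
        print(run.checkpoint, run.script,
              placeholder_position(run.prompt), repr(run.output))
```

Listing the runs this way shows that the placeholder was tried at both the start and the end of the prompt, and neither position produced a usable description.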

How can I run the Valley model correctly?
