Captions are very long and verbose #34

Open
Ali2500 opened this issue Apr 17, 2024 · 0 comments
Ali2500 commented Apr 17, 2024

Hi,

I'm running inference on a variety of videos with the luoruipu1/Valley2-7b model and the resulting captions are always very long and contain lots of repetitive text. For the two videos given in serve/examples I'm getting the following results:

First, we see a snowmobile driving through a snowy forest with trees in the background. The snowmobile is moving quickly and smoothly through the snow. Next, we see a person riding the snowmobile, enjoying the thrill of the ride. The snowmobile is equipped with tracks in the snow, indicating its path. Then, we see the snowmobile driving through a snowy field with trees in the background. The snowmobile is moving quickly and smoothly through the snow, leaving tracks behind. Finally, we see the snowmobile driving through a snowy field with trees in the background. The snowmobile is moving quickly and smoothly through the snow, leaving tracks behind. Throughout the video, we see the beauty of the snowy landscape and the excitement of the snowmobile ride. The video captures the essence of winter sports and the joy of exploring the snowy wilderness.

First, we see a black and white cat sitting on a toilet in a bathroom. The cat appears to be looking around and observing its surroundings. Next, we see the same cat sitting on the toilet, but this time it seems to be more focused on the toilet itself. The cat is still sitting on the toilet in the following shot, but it appears to be looking down at the floor. Then, we see the cat sitting on the toilet again, but this time it seems to be looking up at the ceiling. In the next shot, the cat is still sitting on the toilet, but it appears to be looking at the camera. Finally, we see the cat sitting on the toilet once more, but this time it seems to be looking down at the floor again. Throughout the video, the cat remains calm and composed, and it does not appear to be startled or disturbed by the presence of the camera.

Is the result supposed to be like this? I was hoping for a more concise caption that explains what is happening in the video in 2-3 sentences. I tried changing the text prompt, but it doesn't seem to make any difference to the result.
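For reference, the kind of output I'm after can be roughly approximated by post-processing the generated text. The snippet below is only a sketch of such a stopgap, not a fix for the model itself: `condense_caption` is a hypothetical helper (not part of Valley), and it assumes that repeated sentences differ only by a leading ordering word such as "Then," or "Finally,".

```python
import re


def condense_caption(caption: str, max_sentences: int = 3) -> str:
    """Drop repeated sentences and keep only the first few distinct ones.

    Hypothetical post-processing helper; it does not touch the model.
    """
    # Split on whitespace that follows sentence-ending punctuation.
    sentences = re.split(r"(?<=[.!?])\s+", caption.strip())
    seen = set()
    kept = []
    for sentence in sentences:
        # Build a comparison key: lowercase, drop a leading ordering word
        # ("First,", "Next,", "Then,", "Finally,"), then drop punctuation.
        key = re.sub(r"^(first|next|then|finally),\s*", "", sentence.lower())
        key = re.sub(r"[^a-z0-9 ]", "", key).strip()
        if not key or key in seen:
            continue  # skip empty fragments and exact repeats
        seen.add(key)
        kept.append(sentence)
        if len(kept) == max_sentences:
            break
    return " ".join(kept)
```

This only removes exact repeats after normalization, so near-duplicates (for example, the same sentence with "leaving tracks behind" appended) still get through. Ideally the model would produce a short caption directly.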

Ali2500 changed the title from "Caption quality is poor" to "Captions are very long and verbose" on Apr 17, 2024