I'm running inference on a variety of videos with the luoruipu1/Valley2-7b model, and the resulting captions are always very long and contain a lot of repetitive text. For the two videos given in serve/examples, I get the following results:
First, we see a snowmobile driving through a snowy forest with trees in the background. The snowmobile is moving quickly and smoothly through the snow. Next, we see a person riding the snowmobile, enjoying the thrill of the ride. The snowmobile is equipped with tracks in the snow, indicating its path. Then, we see the snowmobile driving through a snowy field with trees in the background. The snowmobile is moving quickly and smoothly through the snow, leaving tracks behind. Finally, we see the snowmobile driving through a snowy field with trees in the background. The snowmobile is moving quickly and smoothly through the snow, leaving tracks behind. Throughout the video, we see the beauty of the snowy landscape and the excitement of the snowmobile ride. The video captures the essence of winter sports and the joy of exploring the snowy wilderness.
First, we see a black and white cat sitting on a toilet in a bathroom. The cat appears to be looking around and observing its surroundings. Next, we see the same cat sitting on the toilet, but this time it seems to be more focused on the toilet itself. The cat is still sitting on the toilet in the following shot, but it appears to be looking down at the floor. Then, we see the cat sitting on the toilet again, but this time it seems to be looking up at the ceiling. In the next shot, the cat is still sitting on the toilet, but it appears to be looking at the camera. Finally, we see the cat sitting on the toilet once more, but this time it seems to be looking down at the floor again. Throughout the video, the cat remains calm and composed, and it does not appear to be startled or disturbed by the presence of the camera.
Is the result supposed to be like this? I was hoping for a more concise caption that explains what is happening in the video in 2-3 sentences. I tried changing the text prompt, but it doesn't seem to make any difference to the result.
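As a stopgap, the captions could be trimmed after generation. This does not address why the model ignores the prompt; it only shortens the text. Below is a minimal sketch, assuming each caption is a plain string. `condense_caption` and `max_sentences` are hypothetical names for illustration, not part of the Valley codebase:

```python
import re


def condense_caption(caption: str, max_sentences: int = 3) -> str:
    """Drop repeated sentences, then keep at most `max_sentences` of them."""
    # Split on sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", caption.strip())
    seen = set()
    kept = []
    for sentence in sentences:
        # Strip leading connectives ("Then,", "Finally,") so that they do not
        # hide an otherwise identical sentence, then normalize punctuation.
        key = re.sub(r"^(first|next|then|finally)\W+", "", sentence.lower())
        key = re.sub(r"\W+", " ", key).strip()
        if not key or key in seen:
            continue
        seen.add(key)
        kept.append(sentence)
        if len(kept) == max_sentences:
            break
    return " ".join(kept)
```

This only catches sentences that repeat word for word after normalization. Near-duplicates such as the cat "looking down at the floor" versus "looking down at the floor again" would still pass through.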
Ali2500 changed the title from "Caption quality is poor" to "Captions are very long and verbose" on Apr 17, 2024