Make a machine learning model to predict/extrapolate the scores #1475

glerzing · 2023-03-28T01:29:43Z

It would be nice to be able to guess the Tournesol score of videos that have not been compared yet.
We could also extrapolate the "final" score when there aren't many comparisons yet. This is harder to do if we want to be careful about biases, and may benefit from the insights of issue #1474.

We first need to decide which data to use for predictions. It is important to be careful about biases and to avoid relying on superficial signals. The main source of information should therefore be the actual content of the video, that is, the captions. We should also make use of the title, tags, topic category, description, and arguably the channel.

More controversial sources of information include the release date, the number of views, the number of likes, the number of subscribers of the channel, the number of comments or combinations of these (e.g. the ratio of likes per view, or the ratio of comments per view). For these, we may want to decide on a case-by-case basis.
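As an illustration of the combined features mentioned above, here is a minimal sketch of the two ratios. The function name and the dictionary keys are hypothetical, and returning `None` for videos with zero views is an assumption, not something decided in this issue.

```python
def engagement_ratios(views, likes, comments):
    """Compute likes-per-view and comments-per-view for one video.

    Hypothetical helper: the names and the None-on-zero-views behaviour
    are illustrative assumptions, not part of the Tournesol codebase.
    """
    if views <= 0:
        # The ratios are undefined without views, so flag them as missing.
        return {"likes_per_view": None, "comments_per_view": None}
    return {
        "likes_per_view": likes / views,
        "comments_per_view": comments / views,
    }
```

Whether these ratios should be fed to the model at all is still the case-by-case decision described above.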

For the model type, we will probably need to combine the results of several weak predictors. Sentence transformers could be fine-tuned to predict the score from a chunk of the captions, and maybe provide some measure of uncertainty. To combine the predictions for each chunk, we might use some type of weighted mean.
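One possible form of the weighted mean is inverse-variance weighting, sketched below. This is only one candidate, not a decided design: it assumes each chunk-level predictor outputs a score together with a variance as its uncertainty measure, and the function name and `eps` parameter are hypothetical.

```python
def combine_chunk_predictions(predictions, variances, eps=1e-9):
    """Combine per-chunk score predictions with an inverse-variance weighted mean.

    Assumption: each chunk comes with a predicted score and a variance
    expressing the predictor's uncertainty. Chunks with lower variance
    receive a larger weight.
    """
    if not predictions or len(predictions) != len(variances):
        raise ValueError("need one variance per prediction, and at least one chunk")
    # eps avoids a division by zero when a predictor reports zero variance.
    weights = [1.0 / (v + eps) for v in variances]
    total_weight = sum(weights)
    mean = sum(w * p for w, p in zip(weights, predictions)) / total_weight
    # Variance of the combined estimate if chunk errors were independent,
    # which is likely optimistic for chunks taken from the same video.
    combined_variance = 1.0 / total_weight
    return mean, combined_variance
```

For example, `combine_chunk_predictions([10.0, 20.0], [1.0, 4.0])` pulls the result toward the first chunk, since its variance is lower.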

glerzing self-assigned this Mar 28, 2023

glerzing added the Backend (Back-end code of Tournesol) and Research (This should be left for researchers to tackle.) labels Mar 28, 2023

glerzing mentioned this issue Mar 29, 2023

Topics classification #1468

Open
