Make a machine learning model to predict/extrapolate the scores #1475

Open
glerzing opened this issue Mar 28, 2023 · 0 comments
Labels: Backend (Back-end code of Tournesol); Research (This should be left for researchers to tackle.)


glerzing commented Mar 28, 2023

It would be nice to be able to guess the Tournesol score of videos that have not been compared yet.
It would also be useful to extrapolate the "final" score when there aren't many comparisons yet. This is harder to do if we want to be careful about biases, and it may benefit from the insights of issue #1474.

We first need to decide which data to use for predictions. It is important to be careful about biases and to avoid relying on superficial signals. The main source of information should therefore be the actual content of the video, i.e. the captions. We should also make use of the title, tags, topic category, description, and arguably the channel.

More controversial sources of information include the release date, the number of views, the number of likes, the number of subscribers of the channel, the number of comments, or combinations of these (e.g. the ratio of likes per view, or the ratio of comments per view). For these, we may want to decide on a case-by-case basis.
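
To make the ratio features mentioned above concrete, here is a minimal sketch. The function names (`safe_ratio`, `engagement_features`) and the dictionary keys are hypothetical, not existing Tournesol code, and returning `None` for an undefined ratio is only one possible convention.

```python
from typing import Dict, Optional


def safe_ratio(numerator: Optional[int], denominator: Optional[int]) -> Optional[float]:
    """Return numerator / denominator, or None when the ratio is undefined."""
    if numerator is None or denominator is None or denominator <= 0:
        return None
    return numerator / denominator


def engagement_features(
    views: Optional[int], likes: Optional[int], comments: Optional[int]
) -> Dict[str, Optional[float]]:
    """Build the ratio features (hypothetical names) from raw video counters."""
    return {
        "likes_per_view": safe_ratio(likes, views),
        "comments_per_view": safe_ratio(comments, views),
    }


if __name__ == "__main__":
    print(engagement_features(views=1000, likes=50, comments=10))
```

Keeping undefined ratios as `None` rather than `0.0` lets a downstream model distinguish "no data" from "no engagement", which matters if each of these features is accepted or rejected on a case-by-case basis.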

As for the model type, we will probably need to combine the results of several weak predictors. Sentence transformers could be fine-tuned to predict the score from a chunk of the captions, and possibly to provide a measure of uncertainty. To combine the per-chunk predictions, we might use some type of weighted mean.
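
One possible form of the weighted mean is inverse-variance weighting, sketched below. This is an assumption rather than a decided design: it presumes each chunk predictor outputs a score together with a variance estimate, and the combined variance is only meaningful if the chunk errors are roughly independent. The name `combine_chunk_predictions` and the `eps` parameter are hypothetical.

```python
from typing import Sequence, Tuple


def combine_chunk_predictions(
    scores: Sequence[float],
    variances: Sequence[float],
    eps: float = 1e-9,
) -> Tuple[float, float]:
    """Inverse-variance weighted mean of per-chunk score predictions.

    Returns (combined_score, combined_variance). The combined variance
    assumes independent chunk errors, which is an idealisation.
    """
    if len(scores) == 0 or len(scores) != len(variances):
        raise ValueError("scores and variances must be non-empty and the same length")
    if any(v < 0 for v in variances):
        raise ValueError("variances must be non-negative")

    # Chunks with a lower predicted variance get a larger weight.
    # eps avoids a division by zero for a zero-variance chunk.
    weights = [1.0 / (v + eps) for v in variances]
    total_weight = sum(weights)
    combined_score = sum(w * s for w, s in zip(weights, scores)) / total_weight
    combined_variance = 1.0 / total_weight
    return combined_score, combined_variance


if __name__ == "__main__":
    # Two confident chunks and one very uncertain chunk.
    print(combine_chunk_predictions([20.0, 30.0, -50.0], [4.0, 4.0, 400.0]))
```

With this weighting, a chunk whose predictor is very uncertain barely moves the combined score, so noisy parts of the captions (intro, sponsorship segment) would have limited influence, provided the uncertainty estimates are reasonably calibrated.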

@glerzing glerzing self-assigned this Mar 28, 2023
@glerzing glerzing added Backend Back-end code of Tournesol Research This should be left for researchers to tackle. labels Mar 28, 2023