[NLP] Consider adding distinction in research filter for automatically classified posts vs prediction based classification #76

ronentk · 2024-05-08T12:06:38Z

For example, something like this:

from enum import Enum


class SciFilterClassfication(Enum):
    NOT_CLASSIFIED = "not_classified"
    # For posts automatically classified as research
    # (for example, based on Citoid item types)
    RESEARCH_AUTO = "research_auto"
    # For posts predicted to be related to research
    RESEARCH_PRED = "research_pred"
    # For posts predicted to be unrelated to research
    NOT_RESEARCH = "not_research"

Instead of the current form:

class SciFilterClassfication(Enum):
    NOT_CLASSIFIED = "not_classified"
    RESEARCH = "research"
    NOT_RESEARCH = "not_research"

The rationale is:
1. It can help with filter evaluation by differentiating between easy cases (auto) and hard cases (pred).
2. We might want to use the information in the app to further organize the queue/UX.

What do you think, @ShaRefOh?

ShaRefOh · 2024-05-08T16:39:49Z

We can present the data in a meaningful way, but we should not evaluate it as a multi-label problem, since the true labels are binary by definition. What are the conditions for getting "research_auto"? I already have the types logged in the outcome dataset, so I can simply use them to run an evaluation that aggregates over that data.
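
A minimal sketch of such an evaluation, assuming the four-valued enum proposed above: predictions are collapsed to a binary research / not-research decision before being scored against the binary true labels, and accuracy is additionally reported per group. The helper names `to_binary` and `accuracy_by_source` and the `(predicted_label, true_is_research)` record layout are hypothetical, not taken from the repository, and treating NOT_CLASSIFIED as "not research" is an assumption.

```python
from collections import defaultdict
from enum import Enum


class SciFilterClassfication(Enum):
    NOT_CLASSIFIED = "not_classified"
    RESEARCH_AUTO = "research_auto"
    RESEARCH_PRED = "research_pred"
    NOT_RESEARCH = "not_research"


def to_binary(label):
    """Collapse a label to the binary decision (hypothetical helper).

    Assumption: NOT_CLASSIFIED counts as "not research".
    """
    return label in (
        SciFilterClassfication.RESEARCH_AUTO,
        SciFilterClassfication.RESEARCH_PRED,
    )


def accuracy_by_source(records):
    """Score binary correctness overall and per group (hypothetical helper).

    records: iterable of (predicted_label, true_is_research) pairs.
    Group "auto" holds RESEARCH_AUTO predictions; "pred" holds all others.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for predicted, truth in records:
        group = "auto" if predicted is SciFilterClassfication.RESEARCH_AUTO else "pred"
        correct = to_binary(predicted) == truth
        for key in ("overall", group):
            totals[key] += 1
            hits[key] += int(correct)
    return {key: hits[key] / totals[key] for key in totals}
```

The true labels stay binary throughout; the auto/pred split only determines which bucket a post is counted in.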

ronentk · 2024-05-08T16:59:20Z

item_types_whitelist = [
    "bookSection",
    "journalArticle",
    "preprint",
    "book",
    "manuscript",
    "thesis",
    "presentation",
    "conferencePaper",
    "report",
]


# If any item type is on the whitelist, pass automatically
if len(set(result.item_types).intersection(set(item_types_whitelist))) > 0:
    return SciFilterClassfication.RESEARCH

(https://github.com/Common-SenseMakers/sensemakers/blob/nlp-dev/nlp/desci_sense/shared_functions/filters/research_filter.py)
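
With the proposed enum, the same whitelist check could return RESEARCH_AUTO, leaving RESEARCH_PRED and NOT_RESEARCH to the prediction step. The sketch below is illustrative only: the function `classify` and the `predict_is_research` callback are hypothetical stand-ins, not the actual code in research_filter.py.

```python
from enum import Enum
from typing import Callable, Iterable


class SciFilterClassfication(Enum):
    NOT_CLASSIFIED = "not_classified"
    RESEARCH_AUTO = "research_auto"
    RESEARCH_PRED = "research_pred"
    NOT_RESEARCH = "not_research"


# Same item types as the whitelist quoted above.
ITEM_TYPES_WHITELIST = frozenset({
    "bookSection", "journalArticle", "preprint", "book", "manuscript",
    "thesis", "presentation", "conferencePaper", "report",
})


def classify(
    item_types: Iterable[str],
    predict_is_research: Callable[[], bool],
) -> SciFilterClassfication:
    """Hypothetical wrapper: whitelist first, prediction second."""
    # Any whitelisted item type passes automatically.
    if ITEM_TYPES_WHITELIST.intersection(item_types):
        return SciFilterClassfication.RESEARCH_AUTO
    # Otherwise defer to the prediction (assumed to be a zero-argument callable).
    if predict_is_research():
        return SciFilterClassfication.RESEARCH_PRED
    return SciFilterClassfication.NOT_RESEARCH
```

For example, `classify(["preprint", "webpage"], lambda: False)` returns `SciFilterClassfication.RESEARCH_AUTO` without consulting the prediction at all.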

ronentk · 2024-05-08T17:00:31Z

@ShaRefOh this condition holds for your annotations as well, right?

if len(set(result.item_types).intersection(set(item_types_whitelist))) > 0:
    return SciFilterClassfication.RESEARCH

ronentk added the enhancement New feature or request label May 8, 2024

ronentk self-assigned this May 8, 2024
