Merged
1 change: 1 addition & 0 deletions docs/tasks.md
@@ -485,6 +485,7 @@ The following tables give you an overview of the tasks in MTEB.
| [SinhalaNewsClassification](https://huggingface.co/datasets/NLPC-UOM/Sinhala-News-Category-classification) (Nisansa de Silva, 2015) | ['sin'] | Classification | s2s | [News, Written] | {'train': 3327} | {'train': 148.04} |
| [SinhalaNewsSourceClassification](https://huggingface.co/datasets/NLPC-UOM/Sinhala-News-Source-classification) (Dhananjaya et al., 2022) | ['sin'] | Classification | s2s | [News, Written] | {'train': 24094} | {'train': 56.08} |
| [SiswatiNewsClassification](https://huggingface.co/datasets/dsfsi/za-isizulu-siswati-news) (Madodonga et al., 2023) | ['ssw'] | Classification | s2s | [News, Written] | {'train': 80} | {'train': 354.2} |
| [SlovakHateSpeechClassification](https://huggingface.co/datasets/TUKE-KEMT/hate_speech_slovak) | ['slk'] | Classification | s2s | [Social, Written] | {'test': 1319} | {'test': 92.71} |
| [SlovakMovieReviewSentimentClassification](https://arxiv.org/pdf/2304.01922) ({ {S, 2023) | ['svk'] | Classification | s2s | [Reviews, Written] | {'test': 2048} | {'test': 366.17} |
| [SlovakSumRetrieval](https://huggingface.co/datasets/NaiveNeuron/slovaksum) | ['slk'] | Retrieval | s2s | [News, Social, Web, Written] | {'test': 600} | {'test': {'average_document_length': 2156.445, 'average_query_length': 143.59833333333333, 'num_documents': 600, 'num_queries': 600, 'average_relevant_docs_per_query': 1.0}} |
| [SouthAfricanLangClassification](https://www.kaggle.com/competitions/south-african-language-identification/) (ExploreAI Academy et al., 2022) | ['afr', 'eng', 'nbl', 'nso', 'sot', 'ssw', 'tsn', 'tso', 'ven', 'xho', 'zul'] | Classification | s2s | [Web, Non-fiction, Written] | {'test': 2048} | {'test': 247.49} |
33 changes: 33 additions & 0 deletions mteb/tasks/Classification/slk/SlovakHateSpeechClassification.py
@@ -0,0 +1,33 @@
from __future__ import annotations

from mteb.abstasks.AbsTaskClassification import AbsTaskClassification
from mteb.abstasks.TaskMetadata import TaskMetadata


class SlovakHateSpeechClassification(AbsTaskClassification):
    metadata = TaskMetadata(
        name="SlovakHateSpeechClassification",
        description="The dataset contains posts from a social network with human annotations for hateful or offensive language in Slovak.",
        reference="https://huggingface.co/datasets/TUKE-KEMT/hate_speech_slovak",
        dataset={
            "path": "TUKE-KEMT/hate_speech_slovak",
            "revision": "f9301b9937128c9c0b636fa6da203aeb046479f4",
        },
        type="Classification",
        category="s2s",
        modalities=["text"],
        date=None,

Review comment (Contributor) on `date=None`: required

        eval_splits=["test"],
        eval_langs=["slk-Latn"],
        main_score="accuracy",
        domains=["Social", "Written"],
        task_subtypes=["Sentiment/Hate speech"],
        license="cc-by-sa-4.0",
        annotations_creators="human-annotated",
        dialect=None,
        sample_creation="found",
        descriptive_stats={
            "n_samples": {"test": 1319},
            "avg_character_length": {"test": 92.71},
        },
    )
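
The review flags `date=None` as a field that must be filled in. As a rough illustration of that kind of check, the sketch below scans a plain dictionary of metadata values for entries that are missing or still `None`. The helper `find_unfilled_fields` and the `REQUIRED_FIELDS` tuple are hypothetical and not part of mteb; the set of fields that mteb actually enforces is not shown in this diff, so treat the list as an assumption.

```python
from __future__ import annotations

# Hypothetical list of fields a reviewer expects to be filled in.
# The real set enforced by mteb is not shown in this diff.
REQUIRED_FIELDS = ("date", "license", "annotations_creators", "sample_creation")


def find_unfilled_fields(metadata: dict[str, object]) -> list[str]:
    """Return the required fields that are missing or set to None."""
    return [name for name in REQUIRED_FIELDS if metadata.get(name) is None]


# Values copied from the TaskMetadata call in this diff.
slovak_hate_speech = {
    "date": None,
    "license": "cc-by-sa-4.0",
    "annotations_creators": "human-annotated",
    "dialect": None,
    "sample_creation": "found",
}

print(find_unfilled_fields(slovak_hate_speech))  # prints ['date']
```

Here `dialect=None` is not reported because it is not in the assumed required list, which matches the review: only `date` drew a "required" comment.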