Sinhala-Text-Simplification-Dataset-and-Evaluation

This repository contains the data and code for the paper "SiTSE: Sinhala Text Simplification Dataset and Evaluation".

Abstract of the Paper

Text simplification is a task that has been minimally explored for low-resource languages. Consequently, only a few manually curated datasets exist. In this paper, we present a human-curated, sentence-level text simplification dataset for the Sinhala language. Our evaluation dataset contains 1,000 complex sentences and 3,000 corresponding simplified sentences produced by three different human annotators. We model text simplification as a zero-shot, zero-resource sequence-to-sequence (seq-seq) task on the multilingual language models mT5 and mBART. We exploit auxiliary data from related seq-seq tasks and explore the possibility of using intermediate task transfer learning (ITTL). Our analysis shows that ITTL outperforms previously proposed zero-resource methods for text simplification. Our findings also highlight the challenges of evaluating text simplification systems and support the calls for improved metrics that measure the quality of automated text simplification systems and also suit low-resource languages.

Repository Contents

Data
mBART
mT5
README.md

brainsharks-fyp17/Sinhala-Text-Simplification-Dataset-and-Evaluation
