Kosovo-Parliament-Transcriptions

NOTE: The dataset is maintained exclusively on Hugging Face Datasets; this repository is used only for documentation.

The dataset comprises transcripts of speeches delivered by members of the Kosovo Assembly during parliamentary sessions from 2001 onward. The goal of this repository is to provide a resource for researchers and practitioners working in natural language processing or political discourse analysis.

Data source

The dataset was compiled from publicly available transcripts published on the current and former official websites of the Kosovo Assembly (https://kuvendikosoves.org/).

Data Preparation

The dataset was compiled by downloading the PDF transcripts and converting them to text using OCR. The resulting text was then cleaned to correct punctuation and spelling errors. Because PDF-to-text conversion is error-prone, the dataset may still contain typos and other errors, and it is therefore provided "as is". Note also that the dataset includes speeches delivered in languages other than Albanian.
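The exact cleaning steps are not documented here. Purely as an illustration, the sketch below shows the kind of generic post-OCR normalization one might apply to raw OCR output; the function name `normalize_ocr_text` and its rules are assumptions, not the author's pipeline.

```python
import re


def normalize_ocr_text(text: str) -> str:
    """Apply a few generic post-OCR fixes (hypothetical, not the dataset's actual pipeline)."""
    # Rejoin words hyphenated across a line break, e.g. "parla-\nmenti" -> "parlamenti".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Turn single line breaks into spaces; blank lines are kept as paragraph breaks.
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    # Collapse runs of spaces and tabs into a single space.
    text = re.sub(r"[ \t]+", " ", text)
    # Remove spaces before punctuation, e.g. "Kosovës ," -> "Kosovës,".
    text = re.sub(r" +([,.;:!?])", r"\1", text)
    return text.strip()
```

Note that the dehyphenation rule also joins genuine hyphenated compounds that happen to break at a line end, so rules like these trade some false corrections for fewer OCR artifacts.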

To do

Conduct additional quality assurance checks to identify and correct any remaining errors in the dataset.
Add a column for the language of the speech.
Add a column for the party of the speaker.
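For the planned language column, a crude first pass could count characters that are distinctive to each language. The sketch below is a hypothetical heuristic: the `MARKERS` table, the language codes, the treatment of any Cyrillic letter as Serbian, and the `min_hits` threshold are all assumptions, and a trained language-identification model would be more reliable.

```python
from collections import Counter

# Hypothetical marker characters for a crude diacritic-based heuristic.
MARKERS = {
    "sq": set("ëç"),     # Albanian
    "sr": set("čćšžđ"),  # Serbian/Bosnian in Latin script
    "tr": set("şğı"),    # Turkish
}


def guess_language(text: str, min_hits: int = 3) -> str:
    """Return a rough language code for a speech, or "unknown" if evidence is weak."""
    counts = Counter()
    for ch in text.lower():
        if "\u0400" <= ch <= "\u04ff":  # any Cyrillic letter counts towards Serbian
            counts["sr"] += 1
            continue
        for lang, chars in MARKERS.items():
            if ch in chars:
                counts[lang] += 1
    if not counts:
        return "unknown"
    lang, hits = counts.most_common(1)[0]
    return lang if hits >= min_hits else "unknown"
```

With the `datasets` library, such a function could be applied as `dataset.map(lambda ex: {"language": guess_language(ex["text"])})`, which adds a `language` column.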

Dataset structure

The dataset contains the following fields: text, speaker, date, id, num_tokens.
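As a minimal sketch of working with these fields, the helper below totals `num_tokens` per `speaker`, assuming `speaker` is a string and `num_tokens` an integer. The function name is hypothetical; it accepts any iterable of dict-like rows, which is what iterating over a Hugging Face `Dataset` yields.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def tokens_per_speaker(rows: Iterable[Mapping]) -> Dict[str, int]:
    """Sum the num_tokens field for each speaker."""
    totals: Dict[str, int] = defaultdict(int)
    for row in rows:
        totals[row["speaker"]] += row["num_tokens"]
    return dict(totals)
```

Assuming the data is exposed as a `train` split, this could be called as `tokens_per_speaker(dataset["train"])`.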

Usage

from datasets import load_dataset

dataset = load_dataset('Kushtrim/Kosovo-Parliament-Transcriptions')
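Once loaded, the data can be narrowed with `Dataset.filter`, which calls a predicate on each example. The predicate below is a hypothetical example that keeps longer speeches; the 500-token default is arbitrary.

```python
def is_long_speech(example: dict, min_tokens: int = 500) -> bool:
    """Keep speeches with at least `min_tokens` tokens (threshold chosen for illustration)."""
    return example["num_tokens"] >= min_tokens
```

Assuming a `train` split, usage would look like `dataset["train"].filter(is_long_speech, fn_kwargs={"min_tokens": 1000})`, where `fn_kwargs` passes extra keyword arguments to the predicate.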

License

The dataset is licensed under the MIT License.

Citation

If you use this dataset in your research, please consider citing this repository.

@misc{Kosovo-Parliament-Transcriptions,
  author = {Kushtrim Visoka},
  title = {Kosovo-Parliament-Transcriptions},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/Kushtrimvisoka/Kosovo-Parliament-Transcriptions}},
}
