Pravda Network Data Collection

This repository contains source data collected from the Pravda Network, a collection of Russian news websites published across multiple countries and languages. The data is updated automatically every hour.

Data Overview

Each gzip-compressed CSV file (.csv.gz) in the data/ directory corresponds to one sub-domain of the Pravda Network. The files contain metadata extracted from articles, including:

URL
Source title
Source URL
Canonical URL
OG:Title
OG:Description
Alternate language versions
Country
Publication date

Data Collection Methodology

The data is collected using an automated web scraper that:

Traverses all articles listed on each domain
Extracts metadata from article pages
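
The scraper's own code is not shown in this README, so the following is only a minimal sketch of what the metadata-extraction step could look like, using Python's standard-library html.parser. The names MetaExtractor and extract_metadata are hypothetical, and the sketch assumes the fields come from the usual link rel="canonical", link rel="alternate" hreflang, and Open Graph meta tags.

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collects canonical, Open Graph and alternate-language metadata (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.meta = {
            "Canonical": "",
            "OG:Title": "",
            "OG:Description": "",
            "Alternates": [],  # list of (url, language code) pairs
        }

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names are already lowercased.
        a = dict(attrs)
        if tag == "meta":
            prop = a.get("property")
            if prop == "og:title":
                self.meta["OG:Title"] = a.get("content") or ""
            elif prop == "og:description":
                self.meta["OG:Description"] = a.get("content") or ""
        elif tag == "link":
            rel = (a.get("rel") or "").lower().split()
            href = a.get("href") or ""
            if "canonical" in rel:
                self.meta["Canonical"] = href
            elif "alternate" in rel and a.get("hreflang"):
                self.meta["Alternates"].append((href, a["hreflang"]))


def extract_metadata(html: str) -> dict:
    """Parse one article page and return the metadata found in its tags."""
    parser = MetaExtractor()
    parser.feed(html)
    return parser.meta
```

Source title, source URL, country and publication date are left out here because this README does not say where on the page they come from.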

Repository Structure

data/
├── abkhazia.news-pravda.com.csv.gz
├── albania.news-pravda.com.csv.gz
├── algeria.news-pravda.com.csv.gz
└── ...
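
Because each file is named after its sub-domain, the sub-domain can be recovered from the file name alone. The sketch below assumes that every file follows the sub-domain.csv.gz pattern seen in the listing above; the function names are hypothetical.

```python
from pathlib import Path

SUFFIX = ".csv.gz"


def subdomain_from_path(path) -> str:
    """Return the sub-domain encoded in a data file name."""
    name = Path(path).name
    if not name.endswith(SUFFIX):
        raise ValueError(f"not a {SUFFIX} file: {name}")
    return name[: -len(SUFFIX)]


def list_subdomains(data_dir="data") -> list:
    """List the sub-domains that have a file in the data directory."""
    return sorted(subdomain_from_path(p) for p in Path(data_dir).glob("*" + SUFFIX))
```

For example, subdomain_from_path("data/albania.news-pravda.com.csv.gz") yields "albania.news-pravda.com".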

Automating Data Collection

This repository is updated hourly via an automated script. The update process:

Checks each domain for new articles
Appends new data to the appropriate CSV files
Commits and pushes changes to this repository
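
The update script itself is not included in this README. As a rough illustration of the "append only new articles" step, the sketch below deduplicates on the URL column and appends to a gzip-compressed CSV. The function names and the choice of URL as the deduplication key are assumptions; the header is the one given under File Format.

```python
import csv
import gzip
from pathlib import Path

HEADER = [
    "URL", "Source Title", "Source URL", "Canonical", "OG:Title",
    "OG:Description", "Alternates", "Country", "Publication Date",
]


def select_new_rows(existing_urls, scraped_rows):
    """Keep only rows whose URL has not been seen, preserving scrape order."""
    seen = set(existing_urls)
    fresh = []
    for row in scraped_rows:
        url = row["URL"]
        if url not in seen:
            seen.add(url)
            fresh.append(row)
    return fresh


def append_rows(path, rows):
    """Append rows to a .csv.gz file, writing the header if the file is new."""
    path = Path(path)
    write_header = not path.exists() or path.stat().st_size == 0
    # Mode "at" adds a new gzip member; Python's gzip module reads
    # multi-member files back as one continuous stream.
    with gzip.open(path, "at", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=HEADER)
        if write_header:
            writer.writeheader()
        writer.writerows(rows)
```

Appending a gzip member avoids recompressing the whole file on every hourly run, at the cost of a slightly worse compression ratio.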

File Format

All CSV files share the following header row:

URL,Source Title,Source URL,Canonical,OG:Title,OG:Description,Alternates,Country,Publication Date

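Example row (alternate-language URLs are separated by semicolons, each followed by its language code in parentheses; the publication date is an ISO 8601 timestamp in UTC):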

https://domain.com/article/123.html,Original Source,https://source.com,https://domain.com/canonical,Title,Description,https://alt1.com(en);https://alt2.com(fr),Country,2024-03-15T14:30:00Z
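
A minimal sketch of reading one of these files in Python follows. The helper names are hypothetical, and the parsing of the Alternates column assumes the url(lang);url(lang) layout seen in the example row, which is the only evidence of the format given here.

```python
import csv
import gzip
import re
from datetime import datetime

# Matches "https://alt1.com(en)": a URL followed by a language code in parentheses.
ALT_RE = re.compile(r"^(?P<url>.*)\((?P<lang>[^()]*)\)$")


def parse_alternates(field: str) -> list:
    """Split the Alternates column into (url, language) pairs."""
    pairs = []
    for item in field.split(";"):
        item = item.strip()
        if not item:
            continue
        m = ALT_RE.match(item)
        if m:
            pairs.append((m.group("url"), m.group("lang")))
        else:
            pairs.append((item, None))  # no language tag found
    return pairs


def parse_date(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as 2024-03-15T14:30:00Z."""
    # Before Python 3.11, fromisoformat() rejects a trailing "Z".
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def read_rows(path):
    """Yield each row of a .csv.gz file as a dict keyed by the header."""
    with gzip.open(path, "rt", newline="", encoding="utf-8") as fh:
        yield from csv.DictReader(fh)
```

A typical loop would be: for row in read_rows("data/albania.news-pravda.com.csv.gz"), then parse_alternates(row["Alternates"]) and parse_date(row["Publication Date"]).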

License and Attribution

This dataset is provided for research and analysis purposes. When using this data, please cite:

CheckFirst. Pravda Network Data Collection. GitHub Repository. https://github.com/CheckFirstHQ/pravda-network

Updates

This repository is automatically updated hourly. The last update timestamp can be found at the top of this README.

Maintained by CheckFirst
