Showcases visualizations and the code base for analyzing the common Japanese morphemes that appear in news articles.
Morphemes are the smallest units of meaning in a language.
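"Common" here means frequent: once each article has been split into morphemes, finding the common ones comes down to a frequency count. The sketch below is illustrative only. It assumes the text has already been segmented (in practice this is typically done with a Japanese morphological analyzer; which one this repo uses is not stated here), and `most_common_morphemes` is a hypothetical helper fed a toy, hand-segmented sentence rather than real data from this dataset.

```python
from collections import Counter


def most_common_morphemes(morphemes: list[str], n: int = 10) -> list[tuple[str, int]]:
    """Return the n most frequent morphemes with their counts.

    Hypothetical helper; ties keep the order in which morphemes were first seen.
    """
    return Counter(morphemes).most_common(n)


# Toy, hand-segmented input; not taken from the dataset.
# Segmenting raw text into morphemes is assumed to have happened already.
sample = ["政府", "は", "経済", "対策", "を", "発表", "し", "た", "。", "政府", "は"]
print(most_common_morphemes(sample, 2))  # [('政府', 2), ('は', 2)]
```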
Data was collected from NHK News (https://www3.nhk.or.jp).
Data collection period: 25 May 2024 to 4 July 2024
Common Japanese Morphemes in News (latest update: 30 July 2024)
Common Japanese Morphemes in News:
Located in the data folder.
Contains Japanese morpheme data collected from the NHK News website.
Total morphemes collected: 1,015,285
Contains the URLs of the news articles from which the morphemes were collected.
Total URLs collected: 896
The URLs in this file are paths relative to https://www3.nhk.or.jp; prepend that base URL to view the source article.
For example: https://www3.nhk.or.jp/news/html/20240523/k10014458551000.html
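A minimal sketch of turning a stored URL into a full link, assuming the stored values look like the path portion of the example above (e.g. `/news/html/20240523/k10014458551000.html`). `full_url` and `date_from_url` are hypothetical helpers, and reading the eight-digit path segment as the article's publication date is an assumption based only on the example URL's shape.

```python
import re
from datetime import date
from typing import Optional
from urllib.parse import urljoin

BASE_URL = "https://www3.nhk.or.jp"


def full_url(path: str) -> str:
    """Prefix a stored NHK path with the base URL.

    urljoin leaves values that are already absolute URLs unchanged.
    """
    return urljoin(BASE_URL, path)


def date_from_url(url: str) -> Optional[date]:
    """Return the date encoded as a /YYYYMMDD/ path segment, if any.

    Treating this segment as the publication date is an assumption.
    """
    match = re.search(r"/(\d{4})(\d{2})(\d{2})/", url)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    return date(year, month, day)


url = full_url("/news/html/20240523/k10014458551000.html")
print(url)                 # https://www3.nhk.or.jp/news/html/20240523/k10014458551000.html
print(date_from_url(url))  # 2024-05-23
```

Using `urljoin` rather than string concatenation means the helper works whether the stored value is a bare path or already a full URL.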
- Clone this repo: https://github.com/sakan811/Find-Common-Japanese-Character-From-News.git
- Open main.py
- Adjust the SQLite database name as needed:
sqlite_db = 'japan_news_test.db' # adjust as needed
- Run the script:
python main.py
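After the script has run, the results sit in the SQLite database named by `sqlite_db`. A minimal sketch for listing the most frequent morphemes with Python's built-in `sqlite3` module, assuming a hypothetical table `morphemes` with a `morpheme` column; the real schema is whatever main.py creates and may differ, so check it first (for example with `.schema` in the sqlite3 shell). The demo uses an in-memory database with toy rows so it runs on its own.

```python
import sqlite3


def top_morphemes(conn: sqlite3.Connection, limit: int = 10) -> list[tuple[str, int]]:
    """Count rows per morpheme in a hypothetical `morphemes` table.

    Table and column names are assumptions, not the repo's actual schema.
    Ties are broken alphabetically so the result order is deterministic.
    """
    cursor = conn.execute(
        "SELECT morpheme, COUNT(*) AS freq "
        "FROM morphemes GROUP BY morpheme "
        "ORDER BY freq DESC, morpheme LIMIT ?",
        (limit,),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    # Toy in-memory demo; replace ":memory:" with the sqlite_db file name
    # (e.g. "japan_news_test.db") to query real results.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE morphemes (morpheme TEXT)")
    conn.executemany(
        "INSERT INTO morphemes VALUES (?)", [("日本",), ("の",), ("日本",)]
    )
    print(top_morphemes(conn))  # [('日本', 2), ('の', 1)]
    conn.close()
```

Doing the counting in SQL with `GROUP BY` avoids loading all of the roughly one million morpheme rows into Python just to tally them.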
Data is scraped from NHK News daily, automated with GitHub Actions.