Building a news graph and services around it

Curator: Andrey Babynin (@maninoffice)

Team members: Maxim Dumenkov (@maxodum), Sergey Krivosheev (@FlameInBrain)

Project description: Building NLP models to analyze the impact of news on financial instruments, and constructing a graph to analyze the interconnections between financial entities

Project structure:

Research: repository with the .ipynb notebooks covering data collection, EDA and modeling experiments
Pipeline: repository containing the automated process for collecting new data, preprocessing it, training models and running inference
Services: repository containing the API for getting predictions from the final models, a Telegram bot as the user interface, and a Streamlit interactive dashboard with the EDA of the collected data

Project functionality:

API
- /model/predict_by_link - predict the impact of a news item on financial instruments from a link to the news item
- /model/predict_by_links_batch - predict the impact of news on financial instruments for a batch of news links (pass a .csv file with one link per row in the first column; the first row is skipped)
- /model/predict_by_text - predict the impact of a news item on financial instruments from the text of the news item
- /model/predict_by_texts_batch - predict the impact of news on financial instruments for a batch of news texts (pass a .csv file with one text per row in the first column; the first row is skipped)
Telegram Bot
- Get information about the service and the project
- Disclaimer (some minor details that must be kept in mind)
- Get prediction by link or text of the news
- Rate the app
- Get statistics of the app's rating and the users' comments
- Get information about a ticker from the MOEX top-100
- Get the graph of financial entities
Streamlit
- Interactive dashboard of EDA of collected data
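
The batch endpoints above expect a .csv file whose first row is skipped and whose first column holds one link or text per row. A minimal sketch of preparing such a file in Python is shown below. The function names and the header label "link" are hypothetical and not taken from the project; the reader function only mimics the documented file layout and is not the API's actual parser.

```python
import csv
import io


def build_batch_csv(items, header="link"):
    """Return CSV text: one header row, then one item per row in the first column.

    The header label is an assumption; the API description only says that
    the first row is skipped.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([header])  # first row, skipped on the API side
    for item in items:
        writer.writerow([item])  # each link or text goes in the first column
    return buffer.getvalue()


def read_batch_csv(csv_text):
    """Mimic the documented layout: skip the first row, take the first column."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    return [row[0] for row in rows[1:] if row]
```

Using the csv module instead of joining strings by hand keeps news texts that contain commas or quotes in a single column, which matters for /model/predict_by_texts_batch.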

Docker-compose content description:

Portainer: Container manager
Bot: Telegram Bot
- Dockerfile: setting up the environment and starting the app
API: FastAPI
- Dockerfile: setting up the environment and making executable the scripts that start the API with gunicorn and start Celery and Flower
Nginx: Proxy for API
Redis: Cache and message broker for Celery
Postgres: Database for users' ratings and reviews
Celery & Flower: Asynchronous task queue and its monitoring UI
Selenium: Parsing news texts from news websites
Prometheus & Grafana: Services monitoring
Filebeat & Elasticsearch & Kibana: Services logging

Build instructions:

Clone the repo on your machine

git clone https://github.com/FlameInBrainSerj/project_news_graph.git

In the Services/api and Services/bot folders, create .env files from .env.example with the following configuration

# bot/.env

# Needs to be created using BotFather
BOT_TOKEN=your_bot_token

DB_HOST=db
DB_PORT=5432
POSTGRES_DB=postgres
POSTGRES_USER=postgres
POSTGRES_PASSWORD=postgres

API_HOST=api
API_PORT=8000

# api/.env

REDIS_HOST=redis
REDIS_PORT=6379

SELENIUM_HOST=selenium

Get all the artifacts (the models and tokenizers folders) from the storage and place them in the Services/api/artifacts folder
Start the docker containers

docker compose up

Stop and remove the docker containers

docker compose down
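
Before starting the containers, it can help to confirm that both .env files define every variable listed above. The following is a minimal sketch of such a check, not part of the project: the function names and the REQUIRED_KEYS table are hypothetical, and the key lists are copied from the configuration shown in these instructions.

```python
# Keys taken from the bot/.env and api/.env listings in these instructions.
REQUIRED_KEYS = {
    "bot": [
        "BOT_TOKEN", "DB_HOST", "DB_PORT", "POSTGRES_DB",
        "POSTGRES_USER", "POSTGRES_PASSWORD", "API_HOST", "API_PORT",
    ],
    "api": ["REDIS_HOST", "REDIS_PORT", "SELENIUM_HOST"],
}


def parse_env(text):
    """Parse simple KEY=value lines, ignoring blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(service, env_text):
    """Return the required keys that are absent or empty for a service."""
    values = parse_env(env_text)
    return [key for key in REQUIRED_KEYS[service] if not values.get(key)]
```

For example, `missing_keys("api", pathlib.Path("Services/api/.env").read_text())` would return an empty list when the file is complete. The parser handles only plain KEY=value lines and does not cover quoting or variable expansion.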
