web_similarity

A tool to calculate the similarity between the contents of two websites.

To run the project locally

Change into the project folder and run the following command from the terminal to install the dependencies:

pip install -r requirements.txt

Then run run.py from the project directory with the following command and paste the URLs when prompted:

python run.py

Alternatively, refer to the notebook Web_similarity.ipynb, which can be executed directly on Colab, or click here.

The output is displayed in the terminal and is also written to a log file (app.log).
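For readers who want to reproduce the terminal-plus-file behaviour in their own scripts, here is a minimal sketch using Python's standard logging module. It is not taken from run.py: the function name build_logger, the logger name, and the message format are assumptions made for illustration, and only the file name app.log comes from this README.

```python
import logging
import sys


def build_logger(log_path="app.log", name="web_similarity"):
    """Return a logger that sends every message to the terminal and to log_path.

    Hypothetical helper: the name and defaults are illustrative, not the project's API.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate lines via the root logger

    # Attach handlers only once, even if build_logger is called repeatedly.
    if not logger.handlers:
        formatter = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        handlers = (
            logging.StreamHandler(sys.stdout),                # terminal output
            logging.FileHandler(log_path, encoding="utf-8"),  # log file output
        )
        for handler in handlers:
            handler.setFormatter(formatter)
            logger.addHandler(handler)
    return logger
```

Calling build_logger().info("some message") would then print the line and append it to app.log in the current directory.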

1. Web scraping: BeautifulSoup

2. Web content cleaning: clustering with sentence embeddings

3. Content similarity: a naive implementation of Sentence Mover's Distance with sentence embeddings
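To give a feel for step 3, the sketch below computes a relaxed variant of Sentence Mover's Distance. It is an assumption-laden illustration, not the code in this repository: it takes sentence embeddings as plain lists of floats (however they were produced), weights each sentence by a caller-supplied value such as its word count, and uses the relaxed lower bound in which each sentence moves all of its weight to its nearest counterpart, rather than solving the full optimal transport problem. The function names are hypothetical.

```python
import math


def normalize(weights):
    """Scale non-negative weights so they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


def one_sided_cost(src_vecs, src_weights, dst_vecs):
    """Cost of moving each source sentence entirely to its nearest target sentence."""
    return sum(
        w * min(math.dist(s, d) for d in dst_vecs)
        for s, w in zip(src_vecs, src_weights)
    )


def relaxed_smd(vecs_a, weights_a, vecs_b, weights_b):
    """Relaxed Sentence Mover's Distance between two documents.

    vecs_*    : list of sentence embeddings (each a list of floats)
    weights_* : one non-negative weight per sentence, e.g. its word count
    Returns the larger of the two one-sided costs, which is a lower bound
    on the exact earth mover's distance (Euclidean ground distance).
    """
    wa = normalize(weights_a)
    wb = normalize(weights_b)
    return max(
        one_sided_cost(vecs_a, wa, vecs_b),
        one_sided_cost(vecs_b, wb, vecs_a),
    )
```

A smaller distance means more similar content; two documents with identical sentence embeddings get a distance of 0. An exact version would replace the nearest-neighbour step with a linear program over the full transport matrix.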