Requirements

Python 3
Python packages:
- matplotlib
- scikit-learn
- pandas
- nltk
- requests
- json (part of the Python standard library, no separate install needed)
- numpy
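
As a quick sanity check before running anything, the packages listed above can be probed for importability. This is a minimal sketch and not part of the repository: the helper name `missing_packages` is hypothetical, and the import names are assumptions (`sklearn` is the import name of scikit-learn, and `requests` assumes that the "request" entry refers to the requests library).

```python
import importlib.util

# Import names assumed for the packages listed above.
# "sklearn" is the import name of scikit-learn; "requests" is an assumption
# about what the "request" entry refers to. json ships with Python itself.
REQUIRED = ["matplotlib", "sklearn", "pandas", "nltk", "requests", "json", "numpy"]


def missing_packages(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

Anything reported as missing can then be installed with your usual package manager before running `make` or `main.py`.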

Usage

$ make

If Make is not installed:

$ python3 main.py

Notes

Make sure the "data" folder contains the "nato" and "world-cup" folders and their associated data files.
Make sure you have the "bin" and "src" folders.
Go to the "src" folder and run the following commands:
- make clean
- make
- make install
This code implements the following paper: Paudel R, Kandel P, Eberle W. Detecting Spam Tweets in Trending Topics using Graph-Based Approach. (2019 March)
The implementation for Boididou et al. uses the source code available in their GitHub repository.
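
The folder checks in the notes above can be automated. The following is a rough sketch, assuming the folders sit directly under the directory you pass in; the notes do not say where "bin" and "src" live relative to the repository root, so adjust `EXPECTED_DIRS` to your checkout. The helper `missing_dirs` is hypothetical and not part of the repository.

```python
from pathlib import Path

# Folder names come from the notes above. Their location relative to the
# root is an assumption; edit this list to match your checkout.
EXPECTED_DIRS = ["data/nato", "data/world-cup", "bin", "src"]


def missing_dirs(root, expected=EXPECTED_DIRS):
    """Return the expected sub-folders that do not exist under root."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).is_dir()]


if __name__ == "__main__":
    problems = missing_dirs(".")
    for rel in problems:
        print(f"Missing folder: {rel}")
    if not problems:
        print("Folder layout looks complete.")
```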

Description

In this work, we implement an unsupervised, two-step, graph-based approach to detect anomalous tweets on trending topics. First, we extract the named entities (such as place, person, organization, product, event, or activity) present in each tweet and add them as key elements in the graph. Because tweets on a given topic share contextual similarity, we believe they also share the same or similar named entities. These named entities representing similar topics can have relationships (e.g., a shared ontology) among themselves, which we believe, if represented properly, will provide broader insight into the overall context of the topic. Using a well-known graph-based tool like GBAD, we then discover the normal and anomalous behavior of a trending topic.
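
To make the first step more concrete, here is a minimal sketch of a tweet-entity graph built with plain Python. It assumes the named entities have already been extracted by some recognizer, and it is not the graph layout used in the paper or the input format expected by GBAD, neither of which is described here. The function names and the sample input are hypothetical.

```python
from collections import defaultdict


def build_entity_graph(tweet_entities):
    """Build an undirected tweet-entity graph as an adjacency mapping.

    tweet_entities maps a tweet id to a list of (entity_text, entity_type)
    pairs that a named-entity recognizer has already produced.
    """
    graph = defaultdict(set)
    for tweet_id, entities in tweet_entities.items():
        tweet_node = ("tweet", tweet_id)
        graph[tweet_node]  # ensure tweets without entities still appear
        for text, etype in entities:
            entity_node = (etype, text.lower())
            graph[tweet_node].add(entity_node)
            graph[entity_node].add(tweet_node)
    return graph


def entity_support(graph):
    """Count how many tweets mention each entity node."""
    return {node: len(nbrs) for node, nbrs in graph.items() if node[0] != "tweet"}


# Made-up example input, not data from the paper.
sample = {
    1: [("Moscow", "place"), ("FIFA", "organization")],
    2: [("FIFA", "organization")],
    3: [("Free iPhone", "product")],
}
graph = build_entity_graph(sample)
support = entity_support(graph)
print(support)
```

Entities shared by many tweets on the topic end up with high support, while an entity that appears in a single tweet stands apart from the rest of the graph. Deciding what actually counts as anomalous is left to the graph-based tool.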
Second, we propose adding hyperlinked document information because anomalies that could not be detected from tweets alone could be detected using both the document and tweets. It is our assumption that a better understanding of patterns and anomalies associated with entities like person, place, or activity, cannot be realized through a single information source, but better insight can be realized using multiple information sources simultaneously. For instance, one can discover interesting patterns of behavior about an individual through a single social media account, but better insight into their overall behavior can be realized by examining all of their social media actions simultaneously. Analyzing multiple information sources for anomaly detection on Twitter has been explored in the past.
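
One way to picture the second step is a small labeled graph that joins a tweet, the document it links to, and the entities each of them mentions. This sketch assumes entities for both sources are already available. The node naming, the edge labels (`links_to`, `mentions`), and both helper functions are hypothetical illustrations rather than the representation used in the paper.

```python
def build_multi_source_edges(tweet_id, tweet_entities, doc_id, doc_entities):
    """Return labeled edges joining a tweet, its linked document, and entities."""
    tweet_node = f"tweet:{tweet_id}"
    doc_node = f"doc:{doc_id}"
    edges = [(tweet_node, "links_to", doc_node)]
    for entity in sorted({e.lower() for e in tweet_entities}):
        edges.append((tweet_node, "mentions", f"entity:{entity}"))
    for entity in sorted({e.lower() for e in doc_entities}):
        edges.append((doc_node, "mentions", f"entity:{entity}"))
    return edges


def shared_entities(edges):
    """Return the entity nodes mentioned by both the tweet and the document."""
    by_source = {"tweet": set(), "doc": set()}
    for src, label, dst in edges:
        if label == "mentions":
            by_source[src.split(":", 1)[0]].add(dst)
    return by_source["tweet"] & by_source["doc"]


# Made-up example: the tweet talks about the topic, the linked page does not.
edges = build_multi_source_edges(1, ["FIFA", "Moscow"], "a", ["Casino", "Bonus"])
print(shared_entities(edges))
```

A tweet whose linked document shares no entities with it has a different structure from one whose document discusses the same entities, which is the kind of difference that only becomes visible when both sources are in the same graph.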
This code generates graphs from tweets and news articles and runs a graph-based anomaly detection tool on the generated graphs for spam detection. It also implements the following three baseline approaches used in the paper for performance comparison:

- Benevenuto et al.
- Chen et al.
- Anantharam et al.

The results of the experiment are shown in the following table. Our graph-based approach outperforms all four baseline approaches in terms of recall and F1-score.
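
For reference, recall and F1-score for a binary spam/not-spam labeling can be computed as in the sketch below. It is written in plain Python for illustration, the function name is hypothetical, and the labels are made up; they are not results from the experiment.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for the positive (spam) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up labels: 1 = spam, 0 = not spam.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Recall measures the share of actual spam tweets that were flagged, and F1 is the harmonic mean of precision and recall, so a method can only score well on F1 if it does reasonably on both.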

If you have further inquiries, please email [email protected]


PriyanshuXcoder/TwitterTrendSpam
