- python (3.0)
- Python packages:
- matplotlib
- scikit-learn
- pandas
- nltk
- requests
- json (part of the Python standard library)
- numpy
$ make
$ python3 main.py
- Make sure the "data" folder contains the "nato" and "world-cup" folders and their associated data files
- Make sure you have the "bin" and "src" folders
- Go to the "src" folder and run the following commands:
$ make clean
$ make
$ make install
- This code implements the following paper: Paudel R, Kandel P, Eberle W. Detecting Spam Tweets in Trending Topics using Graph-Based Approach. (2019 March)
- The Boididou et al. baseline uses the source code available in their GitHub repository
Description
In this work, we implement an unsupervised, two-step, graph-based approach
to detect anomalous tweets on trending topics. First, we extract named entities
(like place, person, organization, product, event, or activity) present in the tweet
and add them as key elements in the graph. Because tweets on a given topic share
contextual similarity, we believe they also share the same or similar named entities.
These named entities representing similar topics can have relationships
(e.g., a shared ontology) among themselves, which we believe, if represented
properly, will provide broader insight into the overall context of the topic.
Using a well-known graph-based tool like GBAD, we
then discover the normal and anomalous behavior of a trending topic.
Second, we propose adding hyperlinked document information because anomalies that
could not be detected from tweets alone could be detected using both the document
and tweets. It is our assumption that a better understanding of patterns
and anomalies associated with entities like person, place, or activity, cannot be
realized through a single information source, but better insight can be realized
using multiple information sources simultaneously. For instance, one can discover
interesting patterns of behavior about an individual through a single social media
account, but better insight into their overall behavior can be realized by
examining all of their social media actions simultaneously. Analyzing multiple
information sources for anomaly detection on Twitter has been explored in the
past.
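
The first step above (named entities as key elements of the graph) can be sketched roughly as follows. This is a minimal illustration, not the code in this repository: it assumes the named entities have already been extracted (for example with nltk), and the function names `build_tweet_graph` and `to_graph_text`, the vertex and edge labels, and the sample tweets are all hypothetical. The text output is only loosely modeled on the "v"/"d" vertex and edge lines used by SUBDUE-style graph tools; the exact input format GBAD expects should be checked against its documentation.

```python
def build_tweet_graph(tweets):
    """Link each tweet to the named entities it mentions.

    `tweets` is a list with one entry per tweet; each entry is a list of
    (entity_text, entity_type) pairs produced by a separate NER step.
    Returns (vertices, edges): vertex ids are 1-based positions in `vertices`.
    """
    vertices = []      # vertex labels
    edges = []         # (source_id, target_id, edge_label)
    entity_ids = {}    # (text, type) -> vertex id, so shared entities are reused

    def add_vertex(label):
        vertices.append(label)
        return len(vertices)

    for entities in tweets:
        tweet_vertex = add_vertex("tweet")
        for text, entity_type in entities:
            key = (text.lower(), entity_type)
            if key not in entity_ids:
                # Assumption: the entity text is the vertex label.
                entity_ids[key] = add_vertex(text.lower())
            # Assumption: the entity type is the edge label.
            edges.append((tweet_vertex, entity_ids[key], entity_type))
    return vertices, edges


def to_graph_text(vertices, edges):
    """Serialize the graph as 'v <id> <label>' and 'd <src> <dst> <label>' lines."""
    lines = [f'v {i} "{label}"' for i, label in enumerate(vertices, start=1)]
    lines += [f'd {src} {dst} "{label}"' for src, dst, label in edges]
    return "\n".join(lines)


# Made-up sample input: three tweets on one topic, the last with an odd entity.
sample_tweets = [
    [("NATO", "organization"), ("Brussels", "place")],
    [("NATO", "organization")],
    [("NATO", "organization"), ("Free iPhone", "product")],
]

vertices, edges = build_tweet_graph(sample_tweets)
graph_text = to_graph_text(vertices, edges)
print(graph_text)
```

In this sketch, tweets that share an entity point to the same entity vertex, so the common structure of the topic becomes the frequent pattern and an unusual entity stands out as a rare attachment. Entities from a hyperlinked document could presumably be added in the same way, under a separate "document" vertex connected to its tweet.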
This code generates graphs from tweets and news, then runs a graph-based anomaly detection tool on the generated
graphs for spam detection. It also implements the following three baseline approaches used in the paper for
performance comparison:
- Benevenuto et al.
- Chen et al.
- Anantharam et al.
The results of the experiment are shown in the following table. Our graph-based approach outperforms all four baseline approaches in terms of recall and F1-score.

If you have further inquiries, please email [email protected]