Miguel edited this page Feb 17, 2021 · 20 revisions

SPYSCRAP

Welcome to the SpyScrap wiki!

Introduction

SpyScrap is an OSINT tool. Its main purpose is to collect information from different sources such as Google, Tinder, Twitter and others. It combines facial recognition methods to filter the results with natural language processing to extract important entities from the websites where the user appears. Its main strength is the use of machine learning algorithms for facial identification combined with scraping techniques across the Internet, which helps to provide more accurate results. As a distinctive feature, the tool can calculate a final score that indicates how much public exposure a user has on the Internet. It has two modules that can work independently: CLI and Web Interface. Both modules are built with Docker and are easy to deploy.

How it works

It uses Selenium to scrape sites such as Google and Yandex, and takes advantage of open APIs to collect information. It downloads images and data, then uses machine learning to identify the target person and filter the content through facial recognition. It uses spaCy for natural language processing. With all the information obtained, it is possible to create a profile of the target user.
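The facial-recognition filtering step can be pictured with a minimal sketch. It assumes each downloaded image has already been reduced to a numeric face embedding by some model; the wiki does not say which library SpyScrap uses for this. The function names, the sample vectors and the 0.6 threshold are all hypothetical and only illustrate the idea of keeping results that are close to the reference face.

```python
import math


def euclidean_distance(a, b):
    """Distance between two face embeddings of equal length."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def filter_by_face(reference, candidates, threshold=0.6):
    """Keep the candidates whose embedding lies within `threshold` of the reference.

    `candidates` maps an image name to its embedding. The threshold is an
    assumed value, not one taken from SpyScrap.
    """
    return [
        name
        for name, embedding in candidates.items()
        if euclidean_distance(reference, embedding) <= threshold
    ]


# Toy 3-dimensional embeddings; real models produce far longer vectors.
reference = [0.10, 0.20, 0.30]
candidates = {
    "result_1.jpg": [0.12, 0.19, 0.31],  # close to the reference
    "result_2.jpg": [0.90, 0.80, 0.10],  # far from the reference
}
matches = filter_by_face(reference, candidates)
print(matches)
```

A lower threshold makes the filter stricter, at the cost of discarding genuine matches taken from unusual angles or in poor lighting.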

SpyScrap is divided into two modules that can be used separately:

  1. CLI: Its main purpose is to collect information about a specific user. You can supply several pieces of information, but the only compulsory parameter is the name, name and surname, or nickname of the person to find. You can also provide an image, in which case the tool uses ML to filter the data according to the biometric information.

  2. Web Interface: This component makes the tool easier to use through a GUI. Its backend uses the core of the CLI. This module can search several sources at the same time and provides a total score indicating the target's exposure on the Internet.
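The wiki does not describe how the total exposure score is calculated, so the following is only one possible way to turn per-source results into a single number. It assumes the score is the capped number of confirmed hits per source, averaged and scaled to 0-100. The function name, the cap and the source keys are hypothetical.

```python
def exposure_score(hits_per_source, cap_per_source=10):
    """Return a 0-100 score from the number of confirmed hits in each source.

    Hits are capped per source so that one very noisy source cannot
    dominate the result. This formula is an assumption for illustration,
    not SpyScrap's actual scoring method.
    """
    if not hits_per_source:
        return 0.0
    capped = [min(max(hits, 0), cap_per_source) for hits in hits_per_source.values()]
    return 100.0 * sum(capped) / (cap_per_source * len(capped))


hits = {"google": 10, "twitter": 5, "instagram": 0, "boe": 0}
print(exposure_score(hits))
```

Under these assumptions, a target found in many sources scores higher than one with many hits in a single source, which matches the idea of measuring how widely a person is exposed.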

Architecture

CLI

The CLI is dockerized to ease deployment. Usage and installation instructions can be found on the CLI page of the wiki.

Web Interface

The GUI is dockerized to ease deployment. Usage and installation instructions can be found on the Web Interface page of the wiki.

Information Sources

Reverse Image Search

SpyScrap can search in different engines:

  • Yandex:
    • A Russian search engine that provides facial recognition. As input, the user must provide an image or an image URL.
  • Google:
    • The user must provide a name or nickname. Providing an image is optional but recommended.
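The input requirements of the two engines can be summarised as a small validation sketch. The function and parameter names are hypothetical and do not correspond to SpyScrap's real CLI options; they only encode the rules listed above.

```python
def validate_query(engine, name=None, image=None):
    """Check that a query carries the inputs the chosen engine needs.

    `image` stands for a local image path or an image URL.
    Returns the normalised query as a dict, or raises ValueError.
    """
    engine = engine.lower()
    if engine == "yandex":
        # Yandex searches by face, so an image (or image URL) is mandatory.
        if not image:
            raise ValueError("Yandex requires an image or an image URL")
    elif engine == "google":
        # Google searches by name; the image is optional but recommended.
        if not name:
            raise ValueError("Google requires a name or nickname")
    else:
        raise ValueError(f"unsupported engine: {engine}")
    return {"engine": engine, "name": name, "image": image}


print(validate_query("Google", name="Jane Doe"))
```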

Social Networks

  • Facebook
  • Instagram
  • Tinder
  • Twitter

Governmental Site

BOE (Boletín Oficial del Estado) is the official gazette of the Spanish government, where official information is published.
