# Social Record

> Distributed scraping and analysis pipeline for a range of social media platforms

**Table of contents**

- [About](#about)
- [Architectural overview](#architectural-overview)
- [Further reading](#further-reading)
  - [Detailed documentation](#detailed-documentation)
  - [Wanna contribute?](#wanna-contribute)
  - [List of contributors](#list-of-contributors)
  - [Deployment](#deployment)
- [Getting started](#getting-started)
  - [Requirements](#requirements)
  - [Preparation](#preparation)
  - [Scraper](#scraper)

## About

The goal of this project is to raise awareness about data privacy. The means to do so is a tool to scrape, combine, and analyze public data from multiple social media sources. <br>
The results will be available via an API, used for some kind of art exhibition.

## Architectural overview

You can find a more detailed overview [here](https://drive.google.com/a/code.berlin/file/d/1uE8oTku322-_eN3QGuiM4ayWZiRXfn9F/view?usp=sharing). <br>
Open it in draw.io and have a look at the different tabs "High level overview", "Distributed Scraper" and "Face Search".

## Further reading

### Detailed documentation

| part        | docs                                       | contact                                          |
| :---------- | :----------------------------------------- | :----------------------------------------------- |
| Api         | [`api/README.md`](api/README.md)           | [@jo-fr](https://github.com/jo-fr)               |
| Frontend    | [`frontend/README.md`](frontend/README.md) | [@lukas-menzel](https://github.com/lukas-menzel) |
| Postgres DB | [`db/README.md`](db/README.md)             | [@alexmorten](https://github.com/alexmorten)    |

### Wanna contribute?

If you want to join us in raising awareness for data privacy, have a look at [`CONTRIBUTING.md`](CONTRIBUTING.md).

### List of contributors

- @1Jo1 Josef Grieb
- @Urhengulas Johann Hemmann
- @alexmorten Alexander Martin
- @jo-fr Jonathan Freiberger
- @m-lukas Lukas Müller
- @lukas-menzel Lukas Menzel
- @SpringHawk Martin Zaubitzer

### Deployment

The deployment of this project to Kubernetes happens in [codeuniversity/smag-deploy](https://github.com/codeuniversity/smag-deploy) _(this is a private repo!)_.

## Getting started

### Requirements

| dependency                                                   | version                                                            |
| :----------------------------------------------------------- | :----------------------------------------------------------------- |
| [`go`](https://golang.org/doc/install)                        | `v1.13` _([go modules](https://blog.golang.org/using-go-modules))_ |
| [`docker`](https://docs.docker.com/install/)                  | `v19.x`                                                            |
| [`docker-compose`](https://docs.docker.com/compose/install/)  | `v1.24.x`                                                          |
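
A quick way to check the table above is to probe for each tool on your `PATH`; this is a convenience sketch, not part of the project's tooling, and it does not verify the exact versions:

```shell
# Sanity-check that the tools from the requirements table are installed;
# prints one "found"/"missing" line per tool (versions still need a manual check,
# e.g. with `go version` and `docker --version`)
for tool in go docker docker-compose; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "found: $tool"
  else
    echo "missing: $tool"
  fi
done
```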

### Preparation

If this is your first time running this:

1. Add `127.0.0.1 my-kafka` and `127.0.0.1 minio` to your `/etc/hosts` file
2. Choose a `<user_name>` for your platform of choice `<instagram|twitter>` as a starting point and run
   ```bash
   $ go run cli/main/main.go <instagram|twitter> <user_name>
   ```
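
Step 1 can be scripted; the sketch below parameterizes the target file so it can be dry-run against a scratch file first (in real use, set `HOSTS_FILE=/etc/hosts` and run with `sudo`):

```shell
# Sketch of step 1: append the Kafka and MinIO aliases if they are missing.
# HOSTS_FILE defaults to a scratch file so the snippet is safe to try out;
# point it at /etc/hosts (with sudo) to apply the change for real.
HOSTS_FILE="${HOSTS_FILE:-./hosts.local}"
touch "$HOSTS_FILE"
for host in my-kafka minio; do
  grep -q "127.0.0.1 $host" "$HOSTS_FILE" || echo "127.0.0.1 $host" >> "$HOSTS_FILE"
done
cat "$HOSTS_FILE"
```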

### Scraper

Run the instagram- or twitter-scraper in docker:

```bash
$ make run-<platform_name>
```
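
As an alternative to the Makefile target, the scraper CLI can also be wired into `docker-compose` directly. This sketch follows the compose service the project used earlier (the `cli/Dockerfile` path and `my-kafka:9092` address come from that setup; the placeholders are filled in by you):

```yaml
# Hypothetical docker-compose service for running the scraper CLI alongside Kafka
cli:
  build:
    context: "."
    dockerfile: "cli/Dockerfile"
  command: ["<instagram|twitter>", "<user_name>"]
  depends_on:
    - "my-kafka"
  environment:
    KAFKA_ADDRESS: "my-kafka:9092"
```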