A data pipeline that uses the ELT (Extract, Load, Transform) process to fetch, load, and transform San Francisco Police Department incident data, with the aim of generating insights.
New incidents occur daily, so I am trying to gather insights that will help us take better measures. Specifically, I am seeking answers to the following questions:
- What is the frequency of incidents over time?
- Which incidents happen most frequently?
- Which areas are crime hotspots in San Francisco?
- On which day and month do most incidents happen?
- What are the trends of incidents in districts over the years?
- How many active and closed cases are there?
- Infrastructure: Terraform
- Cloud: Google Cloud
- Data lake: Google Cloud Storage
- Data warehouse: BigQuery
- Orchestration: Prefect
- Data transformation: DBT
- Data visualization: Google Looker Studio
Here is a brief explanation of the pipeline:
- Using Google Cloud infrastructure
- Setting it up using Terraform
- Extracting data from the API servers
- Ingesting it into a Google Cloud Storage bucket
- Loading it into BigQuery, with basic checks
- Using DBT to transform the data
- Orchestrating the whole process with Prefect (a minimal sketch of this flow follows the list)
- Finally, using Looker Studio for analytics
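To make the orchestration concrete, here is a minimal sketch of that flow. It is not the repository's actual code: the function names, the Socrata endpoint, and the bucket and table identifiers are assumptions used only to illustrate the extract → GCS → BigQuery → dbt pattern that Prefect orchestrates.

```python
# Minimal sketch of the orchestration pattern, not the repo's actual code.
# The API endpoint, bucket, table ID, and dbt project path are assumptions.
import json
import subprocess

import requests
from google.cloud import bigquery, storage
from prefect import flow, task

API_URL = "https://data.sfgov.org/resource/wg3w-h783.json"  # assumed incidents endpoint
BUCKET = "incident-data-lake"                                # assumed GCS bucket name
TABLE_ID = "my-project.incidents.raw_incidents"              # assumed BigQuery table


@task(retries=2)
def extract(limit: int = 50_000) -> list[dict]:
    """Fetch raw incident records from the open-data API."""
    resp = requests.get(API_URL, params={"$limit": limit}, timeout=60)
    resp.raise_for_status()
    return resp.json()


@task
def load_to_gcs(records: list[dict], blob_name: str) -> str:
    """Stage the records in the data lake as newline-delimited JSON."""
    data = "\n".join(json.dumps(r) for r in records)
    storage.Client().bucket(BUCKET).blob(blob_name).upload_from_string(data)
    return f"gs://{BUCKET}/{blob_name}"


@task
def load_to_bigquery(uri: str) -> None:
    """Load the staged file into BigQuery with basic schema autodetection."""
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
        autodetect=True,
        write_disposition="WRITE_APPEND",
    )
    bigquery.Client().load_table_from_uri(uri, TABLE_ID, job_config=job_config).result()


@task
def transform_with_dbt() -> None:
    """Run the dbt models that build the analytics tables (assumed dbt/ directory)."""
    subprocess.run(["dbt", "run"], check=True, cwd="dbt")


@flow
def parent_etl_flow() -> None:
    records = extract()
    uri = load_to_gcs(records, "raw/incidents.json")
    load_to_bigquery(uri)
    transform_with_dbt()


if __name__ == "__main__":
    parent_etl_flow()
```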
| Column | Description |
|---|---|
incident_datetime | The date and time when the incident occurred |
incident_date | The date when the incident occurred |
incident_time | The time when the incident occurred |
incident_year | The year of the incident |
incident_month | The month of the incident (as a number) |
incident_month_name | The name of the month when the incident occurred |
incident_day | The day of the month when the incident occurred |
incident_day_of_week | The day of the week when the incident occurred |
report_datetime | The date and time when the incident was reported |
incident_id | A unique identifier for the incident |
incident_number | A unique identifier assigned to the incident by the reporting system |
report_type_code | A code representing the type of report made |
report_type_description | A description of the type of report made |
incident_code | A code representing the type of incident |
incident_category | The general category of the incident |
incident_subcategory | A more specific category of the incident |
incident_description | A brief description of the incident |
resolution | The outcome or resolution of the incident (e.g. arrest made, no further action) |
police_district | The police district in which the incident occurred |
filed_online | A string indicating whether the incident was reported online |
intersection | The intersection where the incident occurred |
analysis_neighborhood | The neighborhood in which the incident occurred, based on an analysis by the reporting system |
supervisor_district | The district number of the supervisor representing the area where the incident occurred |
supervisor_district_name | The name of the supervisor district where the incident occurred |
supervisor_name | The name of the supervisor representing the area where the incident occurred |
latitude | The latitude coordinate of the location where the incident occurred |
longitude | The longitude coordinate of the location where the incident occurred |
geo_location | A string containing both latitude and longitude coordinates of the location where the incident occurred |
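As an example of how these columns feed the questions above, the sketch below counts incidents per month in BigQuery. Only the column names come from the data dictionary; the `incidents.fact_incidents` table reference is an assumption, since the actual dataset and table names depend on the dbt models.

```python
# Hedged example, not the project's reporting code: the table reference below
# is an assumption; only the column names come from the data dictionary above.
from google.cloud import bigquery

client = bigquery.Client()

QUERY = """
SELECT
  incident_year,
  incident_month,
  incident_month_name,
  COUNT(DISTINCT incident_id) AS incident_count
FROM `incidents.fact_incidents`
GROUP BY incident_year, incident_month, incident_month_name
ORDER BY incident_year, incident_month
"""

for row in client.query(QUERY).result():
    print(row.incident_year, row.incident_month_name, row.incident_count)
```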
Here is the lineage graph:
- Make sure you have `docker`, `make`, and `git` installed on your PC.
- Clone the repo: `git clone https://github.com/rkscodes/incident_intelligence.git` and `cd incident_intelligence`.
- Create a virtual env with Python 3.9 and install Prefect: `conda create -n incidence python=3.9 prefect`
- Activate your env: `conda activate incidence`
- Refer to Setup Terraform (below) to set up the cloud infra.
- Update the `config.json` file with the same details as in `variables.tf` (a sketch of the kind of values expected follows this list).
- Run the Prefect server: `prefect server start`
- Register blocks and deployments using `make prefect_setup` (a sketch of the kind of blocks this registers also follows this list).
- Set the Prefect API URL: `prefect config set PREFECT_API_URL=http://127.0.0.1:4200/api`
- Run the Docker deployment `infra-docker-storage-docker` using the Prefect UI, or: `prefect deployment run parent-etl-flow/infra-docker-storage-docker`
- To run other deployments, install all dependencies using `make env_setup` (tested on Mac, optional).
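For reference, here is a minimal sketch of how a flow might read `config.json`. It is an illustration only: it assumes the file mirrors the `variables.tf` variables mentioned in this README (`project`, `region`, `zone`, `data_lake_bucket`); the repo's real keys may differ.

```python
# Hedged illustration only, not the repo's actual loader: it assumes config.json
# mirrors the variables.tf variables mentioned in this README; real keys may differ.
import json
from pathlib import Path

config = json.loads(Path("config.json").read_text())

# Fail fast if a value referenced by the flows is missing.
for key in ("project", "region", "zone", "data_lake_bucket"):
    if key not in config:
        raise KeyError(f"config.json is missing '{key}' (see variables.tf)")
    print(f"{key} = {config[key]}")
```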
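And here is a hedged sketch of the kind of Prefect blocks `make prefect_setup` could register: a GCP credentials block and a GCS bucket block via the `prefect-gcp` collection. The block names, key-file path, and bucket name are assumptions; the repo's Makefile defines what is actually registered.

```python
# Hedged sketch: block names, key-file path, and bucket name are assumptions;
# check the Makefile / block scripts for what is really registered.
from prefect_gcp import GcpCredentials
from prefect_gcp.cloud_storage import GcsBucket

# Credentials block pointing at the service-account key used by the flows.
creds = GcpCredentials(service_account_file="path/to/service-account.json")
creds.save("gcp-creds", overwrite=True)

# Bucket block for the data lake created by Terraform (data_lake_bucket).
GcsBucket(bucket="incident-data-lake", gcp_credentials=creds).save(
    "incident-bucket", overwrite=True
)
```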
Setup Terraform:
- Install Terraform on your PC.
- Clone this repo (if not already): `git clone https://github.com/rkscodes/incident_intelligence.git` and `cd incident_intelligence/terraform`.
- Initialise Terraform with: `terraform init`
- Refer to Setup GCP and follow all steps.
- Update the `project` var in `variables.tf` with your GCP project ID.
- Authenticate the service account first: `export GOOGLE_APPLICATION_CREDENTIALS={{path_to_application_credential}}` then `gcloud auth activate-service-account --key-file ${GOOGLE_APPLICATION_CREDENTIALS}`
- Generate an SSH key using: `ssh-keygen -t rsa -f ~/.ssh/<name> -C <username> -b 2048`
- Update the `gce_ssh_user` and `gce_ssh_pub_key_file` variables in `variables.tf` with the generated public key's username and file path.
- Optionally, modify `region`, `zone`, and `data_lake_bucket` if required.
- Make sure you have exported: `export GOOGLE_APPLICATION_CREDENTIALS={{path_to_application_credential}}`
- Run `terraform validate`, `terraform plan`, and `terraform apply`.
- Your infra should be up and running.
- One caveat: if you use `terraform destroy`, the VPC network is sometimes not destroyed, so destroy `my-network` manually in the Google Cloud UI.
- Write unit tests
I would like to express my sincere gratitude to the team behind the Data Engineering Zoomcamp course for providing me with the opportunity to enhance my skills in this field. The course has been an enriching and insightful experience, and I have gained a deeper understanding of the concepts and practices related to data engineering.
Additionally, I want to express my appreciation for the vibrant community on the course Slack channel. The discussions and interactions with fellow students have been an excellent source of support, and I have learned a great deal from their insights and experiences.