Dimensional Data Modeling - Week 1

This week's assignment works with the actor_films dataset. Your task is to write SQL table definitions and queries that model the dataset for efficient analysis: create new tables, define their data types, and write queries that populate those tables from actor_films.

Dataset Overview

The actor_films dataset contains the following fields:

actor: The name of the actor.
actorid: A unique identifier for each actor.
film: The name of the film.
year: The year the film was released.
votes: The number of votes the film received.
rating: The rating of the film.
filmid: A unique identifier for each film.

The primary key for this dataset is (actorid, filmid).

Assignment Tasks

DDL for actors table: Write the DDL for an actors table with the following fields:
- films: An array of structs, each with the following fields:
  - film: The name of the film.
  - votes: The number of votes the film received.
  - rating: The rating of the film.
  - filmid: A unique identifier for each film.
- quality_class: The actor's performance quality, determined by the average rating of the films released in the actor's most recent year. It is categorized as follows:
  - star: Average rating > 8.
  - good: Average rating > 7 and ≤ 8.
  - average: Average rating > 6 and ≤ 7.
  - bad: Average rating ≤ 6.
- is_active: A BOOLEAN field that indicates whether an actor is currently active in the film industry (i.e., making films this year).
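
The quality_class thresholds above map naturally onto a SQL CASE expression. The following is a minimal sketch, not a reference solution: it runs the expression against an in-memory SQLite table through Python's sqlite3 module, and it assumes a reduced actor_films table with only four columns. The column types, the helper name `classify`, and the choice to classify for one given year are all assumptions made for illustration.

```python
import sqlite3

# Assumed query: average each actor's ratings for one year, then bucket
# the average using the thresholds from the assignment text.
QUALITY_CLASS_SQL = """
SELECT actorid,
       CASE
           WHEN AVG(rating) > 8 THEN 'star'
           WHEN AVG(rating) > 7 THEN 'good'
           WHEN AVG(rating) > 6 THEN 'average'
           ELSE 'bad'
       END AS quality_class
FROM actor_films
WHERE year = ?
GROUP BY actorid
"""


def classify(rows, year):
    """Hypothetical helper: return {actorid: quality_class} for one year.

    rows is a list of (actorid, filmid, year, rating) tuples.
    """
    conn = sqlite3.connect(":memory:")
    # Reduced, assumed schema; the real dataset has more columns.
    conn.execute(
        "CREATE TABLE actor_films "
        "(actorid TEXT, filmid TEXT, year INTEGER, rating REAL)"
    )
    conn.executemany("INSERT INTO actor_films VALUES (?, ?, ?, ?)", rows)
    result = dict(conn.execute(QUALITY_CLASS_SQL, (year,)).fetchall())
    conn.close()
    return result
```

Because the WHEN branches are evaluated in order, each one only needs its lower bound: an average of exactly 8 falls through the first branch and lands in good, which matches the "> 7 and ≤ 8" rule.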
Cumulative table generation query: Write a query that populates the actors table one year at a time.
DDL for actors_history_scd table: Write the DDL for an actors_history_scd table with the following features:
- Implements Type 2 slowly changing dimension (SCD) modeling (i.e., includes start_date and end_date fields).
- Tracks quality_class and is_active status for each actor in the actors table.
Backfill query for actors_history_scd: Write a "backfill" query that can populate the entire actors_history_scd table in a single query.
Incremental query for actors_history_scd: Write an "incremental" query that combines the previous year's SCD data with new incoming data from the actors table.
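
The idea behind the Type 2 table can be sketched outside SQL: consecutive yearly snapshots in which quality_class and is_active are both unchanged collapse into one row spanning start_date to end_date. The sketch below is an illustration in plain Python, not the expected SQL answer. The function name `collapse_to_scd`, the use of plain years for start_date and end_date, and the assumption that the snapshots for one actor have no missing years are all hypothetical.

```python
def collapse_to_scd(snapshots):
    """Collapse one actor's yearly snapshots into Type 2 SCD rows.

    snapshots is a list of (year, quality_class, is_active) tuples.
    Assumes one snapshot per year with no gaps between years.
    """
    rows = []
    for year, quality_class, is_active in sorted(snapshots, key=lambda s: s[0]):
        last = rows[-1] if rows else None
        if (
            last is not None
            and last["quality_class"] == quality_class
            and last["is_active"] == is_active
        ):
            # Nothing tracked changed: extend the current streak.
            last["end_date"] = year
        else:
            # A tracked attribute changed: open a new row.
            rows.append(
                {
                    "quality_class": quality_class,
                    "is_active": is_active,
                    "start_date": year,
                    "end_date": year,
                }
            )
    return rows
```

A backfill applies this collapse to every year at once, while an incremental load only has to compare the newest snapshot with each actor's latest open row and either extend it or start a new one.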
