- High-Level Architecture
- Detailed Components
- Calibration Process
- Backend Processing Pipeline
- Frame Processing and Gaze Time Update
- Metric Calculation
- Performance Calculation
- Notification Alert System
- Statistics Calculation
- Data Management
```mermaid
graph TD
subgraph Client
UI[User Interface]
GVD[Gaze Vector Display]
GCW[Calibration Window]
STW[Screen Time Widget]
STS[Statistics Window]
CPR[Camera Permission Request]
RCK[Run-Time Control Keys]
end
subgraph Backend
CL[Core Logic]
GDM[Gaze Detection Engine]
GVC[Gaze Vector Calibration]
EGT[Eye Gaze Time Tracker]
BNS[Break Notification System]
SC[Statistics Calculator]
MC[Metric Calculator]
PC[Performance Calculator]
end
subgraph Data
UM[Usage Metrics]
end
UI <-->|Input/Output| CL
CPR -->|Permission Status| CL
RCK -->|Control Commands| CL
CL <--> UM
CL <--> GDM
CL <--> GVC
CL <--> EGT
CL --> BNS
CL <--> SC
CL <--> MC
CL <--> PC
BNS --> UI
CL --> GVD
CL --> GCW
CL --> STW
SC --> STS
PC --> UI
style UI fill:#f0f9ff,stroke:#0275d8,stroke-width:2px
style GVD fill:#f0f9ff,stroke:#0275d8,stroke-width:1px
style GCW fill:#f0f9ff,stroke:#0275d8,stroke-width:1px
style STW fill:#f0f9ff,stroke:#0275d8,stroke-width:1px
style STS fill:#f0f9ff,stroke:#0275d8,stroke-width:1px
style CPR fill:#f0f9ff,stroke:#0275d8,stroke-width:1px
style RCK fill:#f0f9ff,stroke:#0275d8,stroke-width:1px
style CL fill:#fff3cd,stroke:#ffb22b,stroke-width:2px
style GDM fill:#fff3cd,stroke:#ffb22b,stroke-width:1px
style GVC fill:#fff3cd,stroke:#ffb22b,stroke-width:1px
style EGT fill:#fff3cd,stroke:#ffb22b,stroke-width:1px
style BNS fill:#fff3cd,stroke:#ffb22b,stroke-width:1px
style SC fill:#fff3cd,stroke:#ffb22b,stroke-width:1px
style MC fill:#fff3cd,stroke:#ffb22b,stroke-width:1px
style PC fill:#fff3cd,stroke:#ffb22b,stroke-width:1px
style UM fill:#f2dede,stroke:#d9534f,stroke-width:1px
```
The client consists of two main components:
- Main Window Application: Runs in the foreground and provides the primary user interface.
- System Tray Application: Runs in the background within the OS system tray.
The backend consists of the following components:
- Core Logic
- Gaze Detection Engine
- Gaze Vector Calibration
- Eye Gaze Time Tracker
- Break Notification System
- Metric Calculator
- Performance Calculator
- Statistics Calculator
The data layer consists of:
- Usage Metrics: Stores data on the user's screen time.
For a detailed architectural overview of each component, please refer to the Detailed Component Architecture document.
```mermaid
graph TD
A[Start Calibration] --> B[Four-Point Gaze Capture]
B --> C{Capture Successful?}
C -->|Yes| D[Combine Gaze Points]
C -->|No| B
D --> E[Calculate Convex Hull]
E --> F[Apply Error Margin]
F --> G[Intersect with Screen Boundaries]
G --> H[Determine Final Calibration Points]
H --> I[End Calibration]
style A fill:#98FB98,stroke:#333,stroke-width:2px
style B fill:#87CEFA,stroke:#333,stroke-width:2px
style C fill:#FFA07A,stroke:#333,stroke-width:2px
style D fill:#87CEFA,stroke:#333,stroke-width:2px
style E fill:#87CEFA,stroke:#333,stroke-width:2px
style F fill:#87CEFA,stroke:#333,stroke-width:2px
style G fill:#87CEFA,stroke:#333,stroke-width:2px
style H fill:#87CEFA,stroke:#333,stroke-width:2px
style I fill:#98FB98,stroke:#333,stroke-width:2px
```
[Diagram: Four-Point Calibration Screen] Description: A full-screen view with four numbered green dots in the corners and center text guiding the user.
Four-Point Gaze Capture:
- The user looks at each green dot as it appears for 1.2 seconds.
- Multiple gaze points are captured for each corner.
Combine Gaze Points and Calculate Convex Hull:
- All captured gaze points are combined.
- A convex hull algorithm finds the smallest polygon enclosing all points.
Apply Error Margin:
- The convex hull is extended by the specified error margin (default: 150 pixels).
- This accounts for potential gaze tracking inaccuracies.
Intersect with Screen Boundaries:
- The extended convex hull is intersected with screen boundaries.
- The four corners of this intersection become the final calibration points.
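As a rough sketch of these four steps, assuming OpenCV is available: the function name, the outward-from-centroid expansion, and the bounding-box corner selection are illustrative assumptions, not the actual VisionGuard code.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>
#include <opencv2/imgproc.hpp>

// Illustrative post-processing of the captured gaze samples.
std::vector<cv::Point2f> computeCalibrationPoints(
    const std::vector<cv::Point2f>& gazePoints,  // all captured gaze samples
    cv::Size screen, float errorMargin = 150.0f) {
    // 1. Smallest convex polygon enclosing every captured gaze point.
    std::vector<cv::Point2f> hull;
    cv::convexHull(gazePoints, hull);

    // 2. Extend the hull by the error margin: push each vertex outward from
    //    the centroid (one simple way to apply the margin).
    cv::Point2f centroid(0.f, 0.f);
    for (const auto& p : hull) centroid += p;
    centroid *= 1.0f / static_cast<float>(hull.size());
    for (auto& p : hull) {
        cv::Point2f d = p - centroid;
        float len = std::sqrt(d.x * d.x + d.y * d.y);
        if (len > 1e-6f) p += d * (errorMargin / len);
    }

    // 3. Intersect with the screen by clamping vertices to its boundaries.
    for (auto& p : hull) {
        p.x = std::clamp(p.x, 0.f, static_cast<float>(screen.width));
        p.y = std::clamp(p.y, 0.f, static_cast<float>(screen.height));
    }

    // 4. The corners of the clipped region become the calibration points.
    cv::Rect2f box = cv::boundingRect(hull);
    return {box.tl(), {box.x + box.width, box.y},
            box.br(), {box.x, box.y + box.height}};
}
```

Offsetting each hull edge outward would handle elongated hulls more precisely than the radial expansion used here; the sketch favors brevity.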
The backend processing pipeline consists of the following stages:
- Image Input: Raw frame from the camera.
- Face Detection: Locates faces in the image.
- Facial Landmark Detection: Identifies key facial points.
- Head Pose Estimation: Determines the orientation of the head.
- Eye State Estimation: Checks if eyes are open or closed.
- Gaze Estimation: Calculates the gaze vector.
- Gaze Time Estimation: Determines if the gaze is on the screen.
- Screen Gaze Time Accumulation: Updates the total screen time.
- Usage Metrics Update: Records the latest usage data.
- Break Notification Trigger: Initiates break alerts if necessary.
The VisionGuard Core is powered by models from OpenVINO's Open Model Zoo to estimate the user's gaze and calculate the accumulated screen gaze time. The following models are integral to the backend, each playing a crucial role in the processing pipeline:
- Face Detection Model: Identifies the locations of faces within an image, serving as the first step in the gaze estimation process.
- Head Pose Estimation Model: Estimates the head pose in Tait-Bryan angles, outputting yaw, pitch, and roll angles in degrees, which are crucial inputs for the gaze estimation model.
- Facial Landmark Detection Model: Determines the coordinates of key facial landmarks, particularly around the eyes, which are necessary for precise gaze estimation.
- Eye State Estimation Model: Assesses whether the eyes are open or closed, an important factor in determining gaze direction.
- Gaze Estimation Model: Utilizes inputs from both eyes and the head pose angles to output a 3D vector representing the direction of the person's gaze in Cartesian coordinates.
For a detailed overview of the low-level architecture, refer to the wiki page.
The following diagram illustrates the processing pipeline in VisionGuard, demonstrating how different models interact to produce accurate gaze estimation:
```mermaid
graph TD
A[Image Input] --> B[Face Detection]
B --> |Face Image| C[Facial Landmark Detection]
B --> |Face Image| D[Head Pose Estimation]
C --> E[Eye State Estimation]
D --> |Head Pose Angles| F[Gaze Estimation]
C --> |Eye Image| F
E --> |Eye State| F
F -->|Gaze Vector| G[Gaze Time Estimation]
G --> H[Accumulate Screen Gaze Time]
%% Styling
style B fill:#FFDDC1,stroke:#333,stroke-width:2px
style C fill:#FFDDC1,stroke:#333,stroke-width:2px
style D fill:#FFDDC1,stroke:#333,stroke-width:2px
style E fill:#FFDDC1,stroke:#333,stroke-width:2px
style F fill:#FFDDC1,stroke:#333,stroke-width:2px
style G fill:#FFDDC1,stroke:#333,stroke-width:2px
style A fill:#C1E1FF,stroke:#333,stroke-width:2px
style H fill:#C1E1FF,stroke:#333,stroke-width:2px
```
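The dataflow in this diagram can be summarized in code. The types and stage functions below are illustrative stand-ins for VisionGuard's model wrappers (stubbed out so the sketch compiles); only the ordering and data dependencies are taken from the diagram.

```cpp
#include <optional>
#include <opencv2/core.hpp>

// Illustrative types; the real classes wrap OpenVINO inference requests.
struct HeadPose { float yaw = 0, pitch = 0, roll = 0; };      // Tait-Bryan angles, degrees
struct Eyes     { cv::Mat left, right; bool open = false; };

std::optional<cv::Rect> detectFace(const cv::Mat&) { return std::nullopt; }  // face detection model
HeadPose estimateHeadPose(const cv::Mat&)          { return {}; }            // head pose model
Eyes     detectEyes(const cv::Mat&)                { return {}; }            // landmarks + eye state models
cv::Point3f estimateGaze(const Eyes&, const HeadPose&) { return {}; }        // gaze estimation model

// Dataflow mirrors the diagram: face -> {landmarks/eyes, head pose} -> gaze.
std::optional<cv::Point3f> gazeFromFrame(const cv::Mat& frame) {
    auto face = detectFace(frame);
    if (!face) return std::nullopt;       // no face found in this frame
    cv::Mat faceImg = frame(*face);       // crop the detected face region
    HeadPose pose = estimateHeadPose(faceImg);
    Eyes eyes     = detectEyes(faceImg);
    if (!eyes.open) return std::nullopt;  // simplification: treat closed eyes as "no gaze"
    return estimateGaze(eyes, pose);      // 3D gaze vector in Cartesian coordinates
}
```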
The VisionGuard system processes each video frame to determine the user's gaze direction and update screen time metrics. Here's a high-level overview of the algorithm:
- Face and Gaze Detection: Detect faces and estimate gaze direction.
- Gaze Screen Intersection: Convert 3D gaze vector to 2D screen point.
- Gaze Time Update: Update screen time or gaze lost duration.
- Visual Feedback: Display detection results and metrics.
- Performance Tracking: Update resource utilization data.
```mermaid
flowchart TD
A[Process Frame] --> B[Detect Face & Estimate Gaze]
B --> C{Gaze on Screen?}
C -->|Yes| D{Eyes Open?}
C -->|No| E[Update Gaze Lost Time]
D -->|Yes| F[Accumulate Screen Time]
D -->|No| E
E --> G{Gaze Lost > Threshold?}
G -->|Yes| H[Reset Screen Time]
G -->|No| I[Update Metrics]
F --> I
H --> I
I --> J[Display Results]
style A fill:#f9f,stroke:#333,stroke-width:2px
style I fill:#bbf,stroke:#333,stroke-width:2px
style J fill:#bfb,stroke:#333,stroke-width:2px
```
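A compact sketch of the branching logic in this flowchart follows. The 30-second gaze-lost threshold is an illustrative default, not a documented VisionGuard value.

```cpp
#include <chrono>

// Per-frame gaze time update mirroring the flowchart above.
struct GazeTimeTracker {
    std::chrono::milliseconds screenTime{0};  // accumulated on-screen gaze time
    std::chrono::milliseconds gazeLost{0};    // continuous time without on-screen gaze
    std::chrono::milliseconds lostThreshold{std::chrono::seconds(30)};  // assumed default

    void onFrame(bool gazeOnScreen, bool eyesOpen, std::chrono::milliseconds dt) {
        if (gazeOnScreen && eyesOpen) {
            screenTime += dt;                               // Accumulate Screen Time
            gazeLost = std::chrono::milliseconds{0};
        } else {
            gazeLost += dt;                                 // Update Gaze Lost Time
            if (gazeLost > lostThreshold)
                screenTime = std::chrono::milliseconds{0};  // Reset Screen Time
        }
    }
};
```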
The following image illustrates how the gaze vector intersects with the screen:
This visualization helps in understanding how the 3D gaze vector is projected onto the 2D screen space.
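As a rough sketch, if the screen is modeled as the plane z = 0 in the calibrated coordinate frame, the projection reduces to a ray-plane intersection. The coordinate convention and function name here are assumptions for illustration.

```cpp
#include <cmath>
#include <optional>
#include <opencv2/core.hpp>

// Intersect the gaze ray (origin: eye position, direction: gaze vector)
// with the assumed screen plane z = 0.
std::optional<cv::Point2f> gazePointOnScreen(const cv::Point3f& eye,
                                             const cv::Point3f& gaze) {
    if (std::abs(gaze.z) < 1e-6f) return std::nullopt;  // gaze parallel to the screen
    float t = -eye.z / gaze.z;                          // solve eye.z + t * gaze.z = 0
    if (t < 0) return std::nullopt;                     // gaze points away from the screen
    return cv::Point2f(eye.x + t * gaze.x, eye.y + t * gaze.y);
}
```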
To determine if the gaze point is within the screen boundaries, VisionGuard uses a ray-casting algorithm, which is a common method for solving the point-in-polygon problem. Here's a visual representation of how this algorithm works:
Image credit: Wikimedia Commons
The algorithm works as follows:
- Cast a ray from the gaze point to infinity (usually along the x-axis).
- Count the number of intersections with the polygon's edges.
- If the count is odd, the point is inside the polygon; if even, it is outside.
This method is efficient and works for both convex and concave polygons, making it suitable for various screen shapes and calibration setups. The test also works in three dimensions, which is particularly useful for VisionGuard's 3D gaze estimation.
For more detailed information about this algorithm, please refer to the Point in polygon article on Wikipedia.
This algorithm provides a robust way to determine if the user's gaze is directed at the screen, allowing VisionGuard to accurately track screen time and manage break notifications.
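A standard implementation of the even-odd ray-casting test (the classic PNPOLY formulation) looks like this; VisionGuard's actual code may differ in details such as edge handling.

```cpp
#include <vector>
#include <opencv2/core.hpp>

// Cast a horizontal ray from p toward +x and toggle on each edge crossing;
// an odd number of crossings means the point is inside the polygon.
bool pointInPolygon(const cv::Point2f& p, const std::vector<cv::Point2f>& poly) {
    bool inside = false;
    for (size_t i = 0, j = poly.size() - 1; i < poly.size(); j = i++) {
        const cv::Point2f& a = poly[i];
        const cv::Point2f& b = poly[j];
        // Edge (a, b) straddles the ray's y-level, and the crossing lies to
        // the right of p: toggle the inside flag.
        if ((a.y > p.y) != (b.y > p.y) &&
            p.x < (b.x - a.x) * (p.y - a.y) / (b.y - a.y) + a.x)
            inside = !inside;
    }
    return inside;
}
```

Calling pointInPolygon with the projected gaze point and the four calibration corners yields the "Gaze on Screen?" decision in the frame-processing flowchart above.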
The Metric Calculator computes usage metrics:
- Total screen time
- Continuous gaze away durations
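A minimal accumulator for these two metrics might look as follows; the structure and field names are assumptions, not VisionGuard's actual types.

```cpp
#include <chrono>
#include <vector>

// Tracks total screen time and the durations of continuous gaze-away spans.
struct MetricCalculator {
    std::chrono::milliseconds totalScreenTime{0};
    std::vector<std::chrono::milliseconds> gazeAwayDurations;  // finished away spans
    std::chrono::milliseconds currentAway{0};                  // span still in progress

    void update(bool gazeOnScreen, std::chrono::milliseconds dt) {
        if (gazeOnScreen) {
            totalScreenTime += dt;
            if (currentAway.count() > 0) {                 // an away span just ended
                gazeAwayDurations.push_back(currentAway);
                currentAway = std::chrono::milliseconds{0};
            }
        } else {
            currentAway += dt;                             // extend the current away span
        }
    }
};
```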
The Performance Calculator analyzes system performance and resource usage:
- CPU Utilization: Tracks CPU usage of the VisionGuard application.
- Memory Usage: Monitors RAM consumption.
- Frame Processing Speed: Calculates frames per second for video processing.
- Inference Latency: Measures per-frame model inference latency in milliseconds.
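CPU and memory sampling are OS-specific, but the FPS and latency bookkeeping can be sketched portably. The exponential moving average below is an assumed smoothing choice, not necessarily what VisionGuard uses.

```cpp
#include <chrono>

// Smoothed frame-rate and inference-latency tracking.
struct PerformanceCalculator {
    double fps = 0.0;        // smoothed frames per second
    double latencyMs = 0.0;  // smoothed per-frame inference latency
    static constexpr double kAlpha = 0.1;  // EMA weight for the newest sample (assumed)

    void onFrame(std::chrono::steady_clock::duration frameInterval,
                 std::chrono::steady_clock::duration inferenceTime) {
        using ms = std::chrono::duration<double, std::milli>;
        double intervalMs = ms(frameInterval).count();
        if (intervalMs > 0.0)
            fps = (1.0 - kAlpha) * fps + kAlpha * (1000.0 / intervalMs);
        latencyMs = (1.0 - kAlpha) * latencyMs + kAlpha * ms(inferenceTime).count();
    }
};
```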
The Break Notification System manages alerts based on user settings and gaze behavior:
- Break Reminders: Triggered after prolonged screen time.
- Custom Alerts: User-defined notifications based on specific conditions.
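A threshold-based trigger for break reminders can be as simple as the sketch below; the 20-minute default interval and the notify callback are illustrative assumptions, not VisionGuard's actual settings.

```cpp
#include <chrono>
#include <functional>

// Fires a break reminder once continuous screen time crosses the threshold.
struct BreakNotifier {
    std::chrono::minutes breakInterval{20};  // user-configurable threshold (assumed default)
    std::function<void()> notify;            // e.g. a system tray notification

    void check(std::chrono::milliseconds accumulatedScreenTime) {
        if (notify && accumulatedScreenTime >= breakInterval)
            notify();
    }
};
```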
The Statistics Calculator generates comprehensive reports on usage patterns:
- Daily Usage Summary: Screen time, break frequency, and duration per day.
- Weekly Trends: Week-over-week comparisons of usage patterns.
- Mean Screen Time: Average screen time computed on a daily and weekly basis.
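As an illustration, computing the mean daily screen time over the retained records could look like this; the per-day record layout and date-string keys are assumptions, not the actual storage schema.

```cpp
#include <chrono>
#include <map>
#include <string>

// Hypothetical per-day usage record.
struct DailyUsage {
    std::chrono::minutes screenTime{0};
    int breakCount = 0;
};

// Mean daily screen time across whichever days are present (at most a week,
// given the retention policy described below).
std::chrono::minutes meanDailyScreenTime(const std::map<std::string, DailyUsage>& days) {
    if (days.empty()) return std::chrono::minutes{0};
    std::chrono::minutes total{0};
    for (const auto& [date, usage] : days) total += usage.screenTime;
    return total / static_cast<int>(days.size());
}
```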
The only data persisted and stored locally is the screen-time statistics. Weekly statistics are maintained; once data becomes older than a week, the stale entries are automatically cleared.
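A sketch of that retention rule, assuming entries are keyed by ISO "YYYY-MM-DD" date strings (the actual on-disk format is not specified in this document):

```cpp
#include <map>
#include <string>

// Drops all entries older than the cutoff (today minus seven days,
// computed by the caller).
void pruneOldEntries(std::map<std::string, double>& screenTimeByDate,
                     const std::string& cutoffDate) {
    // ISO date strings sort lexicographically, so everything strictly before
    // the cutoff can be erased in one range operation.
    screenTimeByDate.erase(screenTimeByDate.begin(),
                           screenTimeByDate.lower_bound(cutoffDate));
}
```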