- Download the OpenVINO Toolkit and install it locally.
- Clone this repository from this URL: https://github.com/mayor04/computer_pointer_controller.git
- Create a virtual environment in the working directory:

      cd computer_pointer_controller
      python3 -m venv venv

- Activate the virtual environment:

      source venv/bin/activate

From the main directory, run the command below. It runs with all the default values:

    python main.py

To run with custom values, specify them explicitly using the arguments below.
All arguments are optional, so the default configuration runs successfully without any of them. A default video is also provided.

Argument | Description |
---|---|
-m_fd | Path to the face detection model |
-m_pe | Path to the head pose estimation model |
-m_fl | Path to the facial landmark detector model |
-m_ge | Path to the gaze estimation model |
-i | Input source: a video file, an image file, or 'cam' |
-sv | Save the image or video file to the current directory |
-d | Device used for inference |
-dl | For drawing the visualisation lines |

To perform inference directly from a camera source, specify 'cam' as the input.
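
As a rough illustration, the arguments in the table could be declared with Python's `argparse` as shown below. This is a sketch, not the contents of `main.py`: the default values (`None`, `"CPU"`, the placeholder video path) and the use of `store_true` for `-sv` and `-dl` are assumptions.

```python
import argparse

# Hypothetical placeholder; the real default video path is not stated here.
DEFAULT_INPUT = "path/to/default_video.mp4"


def build_parser():
    """Build a parser mirroring the argument table (defaults are assumed)."""
    parser = argparse.ArgumentParser(description="Computer pointer controller")
    parser.add_argument("-m_fd", default=None,
                        help="Path to the face detection model")
    parser.add_argument("-m_pe", default=None,
                        help="Path to the head pose estimation model")
    parser.add_argument("-m_fl", default=None,
                        help="Path to the facial landmark detector model")
    parser.add_argument("-m_ge", default=None,
                        help="Path to the gaze estimation model")
    parser.add_argument("-i", default=DEFAULT_INPUT,
                        help="Video file, image file, or 'cam'")
    # Assumed to be on/off switches; the real script may expect a value.
    parser.add_argument("-sv", action="store_true",
                        help="Save the image or video file")
    parser.add_argument("-d", default="CPU",
                        help="Device used for inference")
    parser.add_argument("-dl", action="store_true",
                        help="Draw the visualisation lines")
    return parser


# Example: camera input with visualisation lines enabled.
example_args = build_parser().parse_args(["-i", "cam", "-dl"])
print(vars(example_args))
```

Under these assumptions, the equivalent command line would be `python main.py -i cam -dl`.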

Model load time:

Model | FP16 | FP32 |
---|---|---|
head-pose-estimation-adas-0001 | 109 | 123 |
landmarks-regression-retail-0009 | 48 | 49 |
gaze-estimation-adas-0002 | 134 | 118 |

Model inference time:

Model | FP16 | FP32 |
---|---|---|
head-pose-estimation-adas-0001 | 6.07 | 3.07 |
landmarks-regression-retail-0009 | 0.88 | 1.13 |
gaze-estimation-adas-0002 | 41.2 | 41.7 |
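
Figures like those in the tables can be gathered by timing the load and inference calls. The sketch below uses `time.perf_counter`. `load_model` and `infer` are hypothetical stand-ins rather than this project's functions, and reporting milliseconds is an assumption.

```python
import time


def time_call(fn, *args, **kwargs):
    """Run fn once and return (result, elapsed time in milliseconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms


# Hypothetical stand-ins for a real model loader and inference call.
def load_model(path):
    return {"path": path}


def infer(model, frame):
    return [0.0, 0.0]


model, load_ms = time_call(load_model, "model.xml")
_, infer_ms = time_call(infer, model, frame=None)
print(f"load: {load_ms:.2f} ms, inference: {infer_ms:.2f} ms")
```

For inference, averaging over many frames gives a steadier number than a single call.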

FP32 is single-precision floating-point arithmetic (32 bits per number); FP16 is half-precision (16 bits per number).
FP32 represents numbers more accurately, but FP32 models are larger and slower to compute with. Since deep learning does not need much precision, FP16 can be used in place of FP32 to reduce the time and energy spent computing, at little or no cost in accuracy.
Based on the results above, the FP16 and FP32 models can be used interchangeably without a significant difference in load time or inference time. In general, though, FP16 should need less storage space, memory bandwidth, and power, and should give lower inference latency and higher arithmetic speed, because each value occupies 16 bits in memory rather than 32.
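
The size and precision trade-off can be seen with the standard library alone. The sketch below packs one number in both formats using `struct`; the value `0.1` is an arbitrary example, not taken from these models.

```python
import struct

value = 0.1  # arbitrary example value

# 'e' is IEEE 754 half precision (FP16); 'f' is single precision (FP32).
half_bytes = struct.pack("e", value)
single_bytes = struct.pack("f", value)

# Unpack again to see what each format actually stored.
half_value = struct.unpack("e", half_bytes)[0]
single_value = struct.unpack("f", single_bytes)[0]

print("bytes per value:", len(half_bytes), "vs", len(single_bytes))
print("rounding error:", abs(half_value - value), "vs", abs(single_value - value))
```

FP16 takes half the space per value, and its rounding error for the same number is larger.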

- More than one face is detected: only the first detected face is used, and an error is printed.
- The user inputs a wrong file: an error is given.
- The user tries to save the result from a webcam.
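
The first two cases could be handled along these lines. This is only a sketch: `pick_face` and `check_input` are hypothetical helpers, not functions from this repository, and representing detections as a list of boxes is an assumption.

```python
import logging
import os

log = logging.getLogger(__name__)


def pick_face(faces):
    """Return the first detected face, logging an error if there are several."""
    if not faces:
        return None
    if len(faces) > 1:
        log.error("%d faces detected; using the first one only", len(faces))
    return faces[0]


def check_input(source):
    """Accept 'cam' or an existing file path; raise otherwise."""
    if source != "cam" and not os.path.isfile(source):
        raise FileNotFoundError(f"Input file not found: {source}")
    return source


# Two boxes detected: the first is kept and an error is logged.
print(pick_face([(0, 0, 10, 10), (5, 5, 20, 20)]))
```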