How I Deployed A Computer Pointer Controller Using Gaze Estimation.


Story

Imagine controlling your computer’s mouse pointer using nothing but your eyes and head pose. Well, stop imagining it.

In this post, I will walk you through a project I created with the Intel® OpenVINO toolkit, using the Gaze Estimation model to control the mouse pointer of my computer. The model estimates the direction of the user’s gaze, and the mouse pointer position changes accordingly. The project demonstrates how to run multiple models on the same machine and coordinate the flow of data between them.

TL;DR

Check out the code here: https://github.com/mmphego/computer-pointer-controller/tree/v1.0

How It Works

I used the Inference Engine API from Intel’s OpenVINO toolkit to build the project.

The gaze estimation model used requires three inputs:

The head pose
The left eye image
The right eye image.
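To make the three inputs concrete, here is a minimal sketch of how they could be packed into one input dictionary. It is not the code from the repo. The input names (left_eye_image, right_eye_image, head_pose_angles) and the 60x60 eye size follow my reading of the Open Model Zoo description of gaze-estimation-adas-0002, so treat them as assumptions and check them against the model you load. The helper names to_blob and build_gaze_inputs are hypothetical.

```python
import numpy as np

# Assumed from the Open Model Zoo model description; verify against your model.
EYE_SIZE = 60


def to_blob(eye_bgr, size=EYE_SIZE):
    """Resize an HxWx3 eye crop to size x size (nearest neighbour)
    and return it as a 1x3xsizexsize float32 array (NCHW layout)."""
    h, w = eye_bgr.shape[:2]
    rows = np.arange(size) * h // size  # source row for each output row
    cols = np.arange(size) * w // size  # source column for each output column
    resized = eye_bgr[rows][:, cols]
    return resized.transpose(2, 0, 1)[np.newaxis].astype(np.float32)


def build_gaze_inputs(left_eye, right_eye, yaw, pitch, roll):
    """Pack both eye crops and the head pose angles into one input dict."""
    return {
        "left_eye_image": to_blob(left_eye),
        "right_eye_image": to_blob(right_eye),
        "head_pose_angles": np.array([[yaw, pitch, roll]], dtype=np.float32),
    }
```

A real pipeline would more likely resize with cv2.resize; the index-based resize here only keeps the sketch free of extra dependencies.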

To get these inputs, I used three other OpenVINO models (the first three below) and fed their outputs to the gaze estimation model:

Face Detection

Implementation: https://github.com/mmphego/computer-pointer-controller/blob/bb5f13c6d2567c0856407db6c35b3fa6345f97c2/src/model.py#L156

face_Detection
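As a rough illustration of the post-processing step (a sketch, not the linked implementation), the function below filters detections by confidence and scales them to pixel coordinates. It assumes each detection row has the layout [image_id, label, conf, x_min, y_min, x_max, y_max] with coordinates normalised to [0, 1], which is how I understand the Open Model Zoo face detectors to report results. The function name faces_above_threshold is hypothetical; the 0.8 default mirrors the application's --prob_threshold default.

```python
def faces_above_threshold(detections, frame_w, frame_h, threshold=0.8):
    """Return (confidence, (x_min, y_min, x_max, y_max)) in pixels for every
    detection whose confidence reaches the threshold."""
    boxes = []
    for _image_id, _label, conf, x_min, y_min, x_max, y_max in detections:
        if conf < threshold:
            continue
        # Scale normalised coordinates to pixels and clamp them to the frame.
        box = (
            max(0, int(x_min * frame_w)),
            max(0, int(y_min * frame_h)),
            min(frame_w, int(x_max * frame_w)),
            min(frame_h, int(y_max * frame_h)),
        )
        boxes.append((conf, box))
    return boxes
```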

Facial Landmarks Detection

Implementation: https://github.com/mmphego/computer-pointer-controller/blob/bb5f13c6d2567c0856407db6c35b3fa6345f97c2/src/model.py#L239

facial_landmarks
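The landmarks are what turn a face crop into the two eye images. Below is a minimal sketch of that conversion, under the assumption that the model returns a flat list of normalised coordinates (x0, y0, x1, y1, ...) relative to the face crop and that the first two points are the eyes. The function name eye_boxes and the 15-pixel half size are hypothetical choices for illustration, not values from the repo.

```python
def eye_boxes(landmarks, face_w, face_h, half_size=15):
    """Return two (x_min, y_min, x_max, y_max) pixel boxes, one per eye,
    centred on the first two landmark points of the face crop."""
    boxes = []
    for i in range(2):
        # Landmarks are normalised to the face crop, so scale by its size.
        cx = int(landmarks[2 * i] * face_w)
        cy = int(landmarks[2 * i + 1] * face_h)
        boxes.append((
            max(0, cx - half_size),
            max(0, cy - half_size),
            min(face_w, cx + half_size),
            min(face_h, cy + half_size),
        ))
    return boxes
```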

Head Pose Estimation

Implementation: https://github.com/mmphego/computer-pointer-controller/blob/bb5f13c6d2567c0856407db6c35b3fa6345f97c2/src/model.py#L305

head_pose

Gaze Estimation

Implementation: https://github.com/mmphego/computer-pointer-controller/blob/bb5f13c6d2567c0856407db6c35b3fa6345f97c2/src/model.py#L422

all

Project Pipeline

The application coordinates the flow of data from the input, then among the different models, and finally to the mouse controller. The flow of data looks like this:
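In code, the same flow can be sketched as one function per frame. This is an illustration of the data dependencies only: the function name process_frame and the callables passed in are hypothetical stand-ins for the four models and the mouse controller, not the classes from the repo.

```python
def process_frame(frame, detect_face, find_eyes, estimate_head_pose,
                  estimate_gaze, move_mouse=None):
    """Run one frame through the pipeline and return the gaze vector,
    or None when no face is found."""
    face = detect_face(frame)                 # frame -> face crop
    if face is None:
        return None                           # nothing to estimate on
    left_eye, right_eye = find_eyes(face)     # face crop -> two eye crops
    head_pose = estimate_head_pose(face)      # face crop -> head pose angles
    gaze = estimate_gaze(left_eye, right_eye, head_pose)
    if move_mouse is not None:
        move_mouse(gaze[0], gaze[1])          # only x and y drive the pointer
    return gaze
```

Facial landmarks and head pose both depend only on the face crop, so they could run in either order; gaze estimation has to wait for both.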

Demo

vide-demo

Project Set Up and Installation

Directory Structure

tree && du -sh
.
├── LICENSE
├── main.py
├── models
│   ├── face-detection-adas-binary-0001.bin
│   ├── face-detection-adas-binary-0001.xml
│   ├── gaze-estimation-adas-0002.bin
│   ├── gaze-estimation-adas-0002.xml
│   ├── head-pose-estimation-adas-0001.bin
│   ├── head-pose-estimation-adas-0001.xml
│   ├── landmarks-regression-retail-0009.bin
│   └── landmarks-regression-retail-0009.xml
├── README.md
├── requirements.txt
├── resources
└── src
    ├── __init__.py
    ├── input_feeder.py
    ├── model.py
    └── mouse_controller.py

3 directories, 16 files
37M .

Setup and Installation

There are two (2) ways of running the project.

Download and install the Intel OpenVINO Toolkit on your machine.
- After you’ve cloned the repo, install the dependencies using this command: pip3 install -r requirements.txt
Run the project in the Docker image that I have baked Intel OpenVINO and the dependencies into.
- Run: docker pull mmphego/intel-openvino

Not sure what Docker is? Watch this.

For this project I used the latter method.

Models Used

I have already downloaded the models, which are located in ./models/. Should you wish to download your own models, run:

MODEL_NAME="<name of model to download>"
docker run --rm -ti \
--volume "$PWD":/app \
mmphego/intel-openvino \
bash -c "\
  /opt/intel/openvino/deployment_tools/open_model_zoo/tools/downloader/downloader.py \
  --name $MODEL_NAME"

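Models used in this project:

face-detection-adas-binary-0001
landmarks-regression-retail-0009
head-pose-estimation-adas-0001
gaze-estimation-adas-0002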

Application Usage

$ python main.py -h

usage: main.py [-h] -fm FACE_MODEL -hp HEAD_POSE_MODEL -fl
               FACIAL_LANDMARKS_MODEL -gm GAZE_MODEL [-d DEVICE]
               [-pt PROB_THRESHOLD] -i INPUT [--out] [-mp [{high,low,medium}]]
               [-ms [{fast,slow,medium}]] [--enable-mouse] [--show-bbox]
               [--debug] [--stats]

optional arguments:
  -h, --help            show this help message and exit
  -fm FACE_MODEL, --face-model FACE_MODEL
                        Path to an xml file with a trained model.
  -hp HEAD_POSE_MODEL, --head-pose-model HEAD_POSE_MODEL
                        Path to an IR model representative for head-pose-model
  -fl FACIAL_LANDMARKS_MODEL, --facial-landmarks-model FACIAL_LANDMARKS_MODEL
                        Path to an IR model representative for facial-
                        landmarks-model
  -gm GAZE_MODEL, --gaze-model GAZE_MODEL
                        Path to an IR model representative for gaze-model
  -d DEVICE, --device DEVICE
                        Specify the target device to infer on: CPU, GPU, FPGA
                        or MYRIAD is acceptable. Sample will look for a
                        suitable plugin for device specified (Default: CPU)
  -pt PROB_THRESHOLD, --prob_threshold PROB_THRESHOLD
                        Probability threshold for detections
                        filtering(Default: 0.8)
  -i INPUT, --input INPUT
                        Path to image, video file or 'cam' for Webcam.
  --out                 Write video to file.
  -mp [{high,low,medium}], --mouse-precision [{high,low,medium}]
                        The precision for mouse movement (how much the mouse
                        moves). [Default: low]
  -ms [{fast,slow,medium}], --mouse-speed [{fast,slow,medium}]
                        The speed (how fast it moves) by changing [Default:
                        fast]
  --enable-mouse        Enable Mouse Movement
  --show-bbox           Show bounding box and stats on screen [debugging].
  --debug               Show output on screen [debugging].
  --stats               Verbose OpenVINO layer performance stats [debugging].
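For readers who want to reproduce a similar command-line interface, here is a minimal argparse sketch covering a subset of the options above. It is reconstructed from the help text, not copied from main.py, so details such as the help strings and the exact nargs settings are my assumptions. The function name build_parser is hypothetical.

```python
import argparse


def build_parser():
    """Build a parser for a subset of the options shown in the help output."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("-fm", "--face-model", required=True)
    parser.add_argument("-hp", "--head-pose-model", required=True)
    parser.add_argument("-fl", "--facial-landmarks-model", required=True)
    parser.add_argument("-gm", "--gaze-model", required=True)
    parser.add_argument("-d", "--device", default="CPU")
    parser.add_argument("-pt", "--prob_threshold", type=float, default=0.8)
    parser.add_argument("-i", "--input", required=True)
    # nargs="?" matches the "[{high,low,medium}]" form in the usage line.
    parser.add_argument("-mp", "--mouse-precision", nargs="?",
                        choices=["high", "low", "medium"], default="low")
    parser.add_argument("-ms", "--mouse-speed", nargs="?",
                        choices=["fast", "slow", "medium"], default="fast")
    parser.add_argument("--enable-mouse", action="store_true")
    return parser
```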

Usage Example

To run the application, use the following command (assuming you have Docker installed):

xhost +;
docker run --rm -ti \
--volume "$PWD":/app \
--env DISPLAY=$DISPLAY \
--volume=$HOME/.Xauthority:/root/.Xauthority \
--volume="/tmp/.X11-unix:/tmp/.X11-unix:rw" \
--device /dev/video0 \
mmphego/intel-openvino \
bash -c "\
    source /opt/intel/openvino/bin/setupvars.sh && \
    python main.py \
        --face-model models/face-detection-adas-binary-0001 \
        --head-pose-model models/head-pose-estimation-adas-0001 \
        --facial-landmarks-model models/landmarks-regression-retail-0009 \
        --gaze-model models/gaze-estimation-adas-0002 \
        --input resources/demo.mp4 \
        --debug \
        --show-bbox \
        --enable-mouse \
        --mouse-precision low \
        --mouse-speed fast"
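The --mouse-precision and --mouse-speed options control how a gaze vector becomes a pointer movement. One plausible mapping is sketched below; the numeric values in PRECISION and SPEED and the function names are hypothetical and may differ from src/mouse_controller.py. The sketch takes the move function as a parameter so it runs without a display; with PyAutoGUI installed, pyautogui.moveRel could be passed in.

```python
# Hypothetical lookup tables: pixels moved per unit of gaze, and seconds per move.
PRECISION = {"high": 100, "low": 1000, "medium": 500}
SPEED = {"fast": 1, "slow": 10, "medium": 5}


def mouse_offset(gaze_x, gaze_y, precision="low"):
    """Scale the gaze x/y components into a relative pointer offset.
    Screen y grows downwards, so the gaze y component is negated."""
    scale = PRECISION[precision]
    return gaze_x * scale, -gaze_y * scale


def move_pointer(gaze_x, gaze_y, precision="low", speed="fast", move_rel=None):
    """Compute the offset and, if a move function is given, apply it."""
    dx, dy = mouse_offset(gaze_x, gaze_y, precision)
    if move_rel is not None:
        # e.g. move_rel=pyautogui.moveRel when PyAutoGUI is available
        move_rel(dx, dy, duration=SPEED[speed])
    return dx, dy
```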

Packaging the Application

We can use the Deployment Manager present in OpenVINO to create a runtime package from our application. These packages can be easily sent to other hardware devices to be deployed.

To deploy the application to various devices using the Deployment Manager, follow the steps below.

Note: Choose from the devices listed below.

DEVICE='cpu' # or gpu, vpu, gna, hddl
docker run --rm -ti \
--volume "$PWD":/app \
mmphego/intel-openvino bash -c "\
  python /opt/intel/openvino/deployment_tools/tools/deployment_manager/deployment_manager.py \
  --targets ${DEVICE} \
  --user_data /app \
  --output_dir . \
  --archive_name computer_pointer_controller_${DEVICE}"

OpenVINO API for Layer Analysis

The --stats flag queries the per-layer performance measures to show which layers are the most time-consuming (see the OpenVINO documentation for details).

xhost +;
docker run --rm -ti \
--volume "$PWD":/app \
--env DISPLAY=$DISPLAY \
--volume=$HOME/.Xauthority:/root/.Xauthority \
--volume="/tmp/.X11-unix:/tmp/.X11-unix:rw" \
--device /dev/video0 \
mmphego/intel-openvino \
bash -c "\
    source /opt/intel/openvino/bin/setupvars.sh && \
    python main.py \
        --face-model models/face-detection-adas-binary-0001 \
        --head-pose-model models/head-pose-estimation-adas-0001 \
        --facial-landmarks-model models/landmarks-regression-retail-0009 \
        --gaze-model models/gaze-estimation-adas-0002 \
        --input resources/demo.mp4 \
        --stats"
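Once the per-layer counters are available, ranking them is straightforward. The sketch below assumes a dictionary shaped like the one I understand the Inference Engine Python API returns from InferRequest.get_perf_counts(): layer name mapped to a dict with status, layer_type and real_time (in microseconds). The function name slowest_layers is hypothetical, and this is not necessarily how main.py reports the numbers.

```python
def slowest_layers(perf_counts, top=5):
    """Return up to `top` (layer_name, layer_type, real_time) tuples for
    executed layers, slowest first."""
    executed = [
        (name, stats["layer_type"], stats["real_time"])
        for name, stats in perf_counts.items()
        if stats.get("status") == "EXECUTED"  # skip layers that never ran
    ]
    return sorted(executed, key=lambda row: row[2], reverse=True)[:top]
```

With a loaded network, the input would come from something like exec_net.requests[0].get_perf_counts() after an inference call, assuming performance counting is enabled for the device.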

Edge Cases

Multiple People Scenario: If multiple people are detected in the video frame, the application uses only one face and reports results for that face alone.
No Face Detection: If no face is detected, the application skips the frame and informs the user.
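Both edge cases can be handled in one small selection step. The sketch below is one way to do it, not the repo's code: the function name pick_face is hypothetical, and choosing the highest-confidence face is my assumption, since the rule the application uses to pick a face is not stated here.

```python
import logging

logger = logging.getLogger("pointer-controller")


def pick_face(faces):
    """faces: list of (confidence, box) tuples for one frame.
    Return a single box, or None when the frame should be skipped."""
    if not faces:
        logger.warning("No face detected, skipping frame")
        return None
    if len(faces) > 1:
        logger.info("%d faces detected, using one", len(faces))
    # Assumption: prefer the detection the model is most confident about.
    return max(faces, key=lambda face: face[0])[1]
```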

Future Improvement

Intel® DevCloud: Benchmark the application on various devices to ensure optimum performance.
Intel® VTune™ Profiler: Profile my application and locate any bottlenecks.
Gaze estimation: We could revisit the logic that determines and calculates the coordinates, as it is a bit flaky.
Lighting conditions: We might use HSV-based pre-processing steps to minimize errors caused by different lighting conditions.
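To give an idea of what HSV-based pre-processing could look like, here is a small sketch that rescales the V (brightness) channel so that every frame has the same mean brightness. It is only an illustration of the idea using the standard library's colorsys module. The function name normalise_brightness and the 0.5 target are hypothetical, and a real implementation would more likely use OpenCV on whole frames.

```python
import colorsys


def normalise_brightness(pixels, target_v=0.5):
    """pixels: list of (r, g, b) floats in [0, 1].
    Scale the V channel so the mean brightness equals target_v,
    leaving hue and saturation untouched."""
    hsv = [colorsys.rgb_to_hsv(*pixel) for pixel in pixels]
    mean_v = sum(v for _, _, v in hsv) / len(hsv)
    if mean_v == 0:
        return list(pixels)  # fully black frame, nothing to scale
    gain = target_v / mean_v
    # Clip at 1.0 so bright pixels stay in range after scaling.
    return [colorsys.hsv_to_rgb(h, s, min(1.0, v * gain)) for h, s, v in hsv]
```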