How I Deployed A Computer Pointer Controller Using Gaze Estimation.
Story
Imagine controlling your computer's mouse pointer with nothing but your eyes and head pose. Well, stop imagining.
In this post, I will walk you through a project I created with the Intel® OpenVINO™ toolkit, using a Gaze Estimation model to control my computer's mouse pointer. The model estimates the gaze of the user's eyes, and the application changes the mouse pointer position accordingly. The project demonstrates how to run multiple models on the same machine and coordinate the flow of data between them.
TL;DR
Check out the code here: https://github.com/mmphego/computer-pointer-controller/tree/v1.0
How It Works
I used the Inference Engine API from Intel's OpenVINO Toolkit to build the project.
The gaze estimation model used requires three inputs:
- The head pose
- The left eye image
- The right eye image
To get these inputs, the application uses three other OpenVINO models (listed under Models Used below). The relevant implementations in src/model.py are linked here:
- Implementation: https://github.com/mmphego/computer-pointer-controller/blob/bb5f13c6d2567c0856407db6c35b3fa6345f97c2/src/model.py#L156
- Implementation: https://github.com/mmphego/computer-pointer-controller/blob/bb5f13c6d2567c0856407db6c35b3fa6345f97c2/src/model.py#L239
- Implementation: https://github.com/mmphego/computer-pointer-controller/blob/bb5f13c6d2567c0856407db6c35b3fa6345f97c2/src/model.py#L305
- Implementation: https://github.com/mmphego/computer-pointer-controller/blob/bb5f13c6d2567c0856407db6c35b3fa6345f97c2/src/model.py#L422
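To make the data flow concrete, here is a minimal sketch (not the repo's exact code) of feeding those three inputs to gaze-estimation-adas-0002 with the Inference Engine Python API; the input and output names below come from the model's Open Model Zoo documentation:
import numpy as np
from openvino.inference_engine import IECore

ie = IECore()
net = ie.read_network(
    model="models/gaze-estimation-adas-0002.xml",
    weights="models/gaze-estimation-adas-0002.bin",
)
exec_net = ie.load_network(network=net, device_name="CPU")

# Dummy inputs with the shapes the model expects: two 1x3x60x60 eye crops
# and a 1x3 vector of head-pose angles (yaw, pitch, roll) in degrees.
left_eye = np.zeros((1, 3, 60, 60), dtype=np.float32)
right_eye = np.zeros((1, 3, 60, 60), dtype=np.float32)
head_pose = np.array([[0.0, 0.0, 0.0]], dtype=np.float32)

results = exec_net.infer({
    "left_eye_image": left_eye,
    "right_eye_image": right_eye,
    "head_pose_angles": head_pose,
})
gaze_vector = results["gaze_vector"]  # Cartesian (x, y, z) gaze direction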
Project Pipeline
The application coordinates the flow of data from the input, then among the different models, and finally to the mouse controller. The flow of data looks like this: input frame → face detection → (facial landmarks + head pose estimation) → gaze estimation → mouse controller.
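A simplified sketch of one pass through that pipeline (hypothetical method names, not the repo's exact API; the actual implementation lives in main.py and src/model.py):
def process_frame(frame, face_detector, landmarks_model, pose_model, gaze_model, mouse):
    # 1. Detect and crop the face from the input frame.
    face_crop = face_detector.predict(frame)
    # 2. Locate the eyes on the face and crop each one.
    left_eye, right_eye = landmarks_model.predict(face_crop)
    # 3. Estimate the head-pose angles (yaw, pitch, roll).
    head_pose = pose_model.predict(face_crop)
    # 4. Combine both eye crops and the head pose into a gaze vector.
    gaze_x, gaze_y, _ = gaze_model.predict(left_eye, right_eye, head_pose)
    # 5. Translate the gaze vector into pointer movement.
    mouse.move(gaze_x, gaze_y)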
Demo
Project Set Up and Installation
Directory Structure
tree && du -sh
.
├── LICENSE
├── main.py
├── models
│ ├── face-detection-adas-binary-0001.bin
│ ├── face-detection-adas-binary-0001.xml
│ ├── gaze-estimation-adas-0002.bin
│ ├── gaze-estimation-adas-0002.xml
│ ├── head-pose-estimation-adas-0001.bin
│ ├── head-pose-estimation-adas-0001.xml
│ ├── landmarks-regression-retail-0009.bin
│ └── landmarks-regression-retail-0009.xml
├── README.md
├── requirements.txt
├── resources
└── src
├── __init__.py
├── input_feeder.py
├── model.py
└── mouse_controller.py
3 directories, 16 files
37M .
Setup and Installation
There are two ways of running the project:
- Download and install the Intel OpenVINO Toolkit. After you've cloned the repo, install the dependencies using this command:
pip3 install -r requirements.txt
- Run the project in the Docker image in which I have baked Intel OpenVINO and all dependencies:
docker pull mmphego/intel-openvino
Not sure what Docker is? Watch this.
For this project, I used the latter method.
Models Used
I have already downloaded the models; they are located in ./models/.
Should you wish to download the models yourself, run:
MODEL_NAME=<<name of model to download>>
docker run --rm -ti \
--volume "$PWD":/app \
mmphego/intel-openvino \
bash -c "\
/opt/intel/openvino/deployment_tools/open_model_zoo/tools/downloader/downloader.py \
--name $MODEL_NAME"
Models used in this project:
- Face Detection Model
- Facial Landmarks Detection Model
- Head Pose Estimation Model
- Gaze Estimation Model
Application Usage
$ python main.py -h
usage: main.py [-h] -fm FACE_MODEL -hp HEAD_POSE_MODEL -fl
FACIAL_LANDMARKS_MODEL -gm GAZE_MODEL [-d DEVICE]
[-pt PROB_THRESHOLD] -i INPUT [--out] [-mp [{high,low,medium}]]
[-ms [{fast,slow,medium}]] [--enable-mouse] [--show-bbox]
[--debug] [--stats]
optional arguments:
-h, --help show this help message and exit
-fm FACE_MODEL, --face-model FACE_MODEL
Path to an xml file with a trained model.
-hp HEAD_POSE_MODEL, --head-pose-model HEAD_POSE_MODEL
Path to an IR model representative for head-pose-model
-fl FACIAL_LANDMARKS_MODEL, --facial-landmarks-model FACIAL_LANDMARKS_MODEL
Path to an IR model representative for facial-
landmarks-model
-gm GAZE_MODEL, --gaze-model GAZE_MODEL
Path to an IR model representative for gaze-model
-d DEVICE, --device DEVICE
Specify the target device to infer on: CPU, GPU, FPGA
or MYRIAD is acceptable. Sample will look for a
suitable plugin for device specified (Default: CPU)
-pt PROB_THRESHOLD, --prob_threshold PROB_THRESHOLD
Probability threshold for detections
filtering(Default: 0.8)
-i INPUT, --input INPUT
Path to image, video file or 'cam' for Webcam.
--out Write video to file.
-mp [{high,low,medium}], --mouse-precision [{high,low,medium}]
The precision for mouse movement (how much the mouse
moves). [Default: low]
-ms [{fast,slow,medium}], --mouse-speed [{fast,slow,medium}]
The speed (how fast it moves) by changing [Default:
fast]
--enable-mouse Enable Mouse Movement
--show-bbox Show bounding box and stats on screen [debugging].
--debug Show output on screen [debugging].
--stats Verbose OpenVINO layer performance stats [debugging].
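The --mouse-precision and --mouse-speed options are mapped to pyautogui parameters in src/mouse_controller.py. Here is a minimal sketch of what that mapping likely looks like (assumed values; the exact implementation may differ):
import pyautogui

class MouseController:
    # Precision controls how far the pointer moves per gaze update;
    # speed controls the duration of each movement.
    PRECISION = {"high": 100, "low": 1000, "medium": 500}
    SPEED = {"fast": 1, "slow": 10, "medium": 5}

    def __init__(self, precision="low", speed="fast"):
        self.precision = self.PRECISION[precision]
        self.speed = self.SPEED[speed]

    def move(self, x, y):
        # Move relative to the current pointer position; the y-axis is
        # inverted because screen coordinates grow downwards.
        pyautogui.moveRel(x * self.precision, -1 * y * self.precision,
                          duration=self.speed)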
Usage Example
To run the application, run the following command (assuming you have Docker installed):
xhost +;
docker run --rm -ti \
--volume "$PWD":/app \
--env DISPLAY=$DISPLAY \
--volume=$HOME/.Xauthority:/root/.Xauthority \
--volume="/tmp/.X11-unix:/tmp/.X11-unix:rw" \
--device /dev/video0 \
mmphego/intel-openvino \
bash -c "\
source /opt/intel/openvino/bin/setupvars.sh && \
python main.py \
--face-model models/face-detection-adas-binary-0001 \
--head-pose-model models/head-pose-estimation-adas-0001 \
--facial-landmarks-model models/landmarks-regression-retail-0009 \
--gaze-model models/gaze-estimation-adas-0002 \
--input resources/demo.mp4 \
--debug \
--show-bbox \
--enable-mouse \
--mouse-precision low \
--mouse-speed fast"
Packaging the Application
We can use the Deployment Manager present in OpenVINO to create a runtime package from our application. These packages can be easily sent to other hardware devices to be deployed.
To deploy the application to various devices using the Deployment Manager, run the steps below.
Note: Choose from the devices listed below.
DEVICE='cpu' # or gpu, vpu, gna, hddl
docker run --rm -ti \
--volume "$PWD":/app \
mmphego/intel-openvino bash -c "\
python /opt/intel/openvino/deployment_tools/tools/deployment_manager/deployment_manager.py \
--targets ${DEVICE} \
--user_data /app \
--output_dir . \
--archive_name computer_pointer_controller_${DEVICE}"
OpenVino API for Layer Analysis
Query performance measures per layer to identify the most time-consuming layers: Read the docs.
xhost +;
docker run --rm -ti \
--volume "$PWD":/app \
--env DISPLAY=$DISPLAY \
--volume=$HOME/.Xauthority:/root/.Xauthority \
--volume="/tmp/.X11-unix:/tmp/.X11-unix:rw" \
--device /dev/video0 \
mmphego/intel-openvino \
bash -c "\
source /opt/intel/openvino/bin/setupvars.sh && \
python main.py \
--face-model models/face-detection-adas-binary-0001 \
--head-pose-model models/head-pose-estimation-adas-0001 \
--facial-landmarks-model models/landmarks-regression-retail-0009 \
--gaze-model models/gaze-estimation-adas-0002 \
--input resources/demo.mp4 \
--stat"
Edge Cases
- Multiple People Scenario: If multiple people appear in the video frame, the application always uses and reports results for only one face, even when multiple faces are detected (see the sketch after this list).
- No Face Detection: If no face is detected, the application skips the frame and informs the user.
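A hedged sketch of that single-face selection logic (an assumption about the approach, not the repo's exact code):
def select_face(detections, threshold=0.8):
    """Pick one face from (confidence, bounding_box) detection tuples."""
    faces = [d for d in detections if d[0] >= threshold]
    if not faces:
        # No face detected: the caller skips this frame and informs the user.
        return None
    # Keep only the most confident face, even if several were detected.
    return max(faces, key=lambda d: d[0])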
Future Improvements
- Intel® DevCloud: Benchmark the application on various devices to ensure optimum performance.
- Intel® VTune™ Profiler: Profile my application and locate any bottlenecks.
- Gaze estimation: We could revisit the logic for determining and calculating the pointer coordinates, as it is a bit flaky.
- Lighting conditions: We could add HSV-based pre-processing steps to minimize errors caused by varying lighting conditions.