Offline Video Surveillance using AWS DeepLens

I have recently been prototyping with an AWS DeepLens device. We had been toying with the idea of theme songs for people working at the office, and it seemed like fun to have the DeepLens automatically play the theme song of whoever enters the office. With this in mind, I set out to find a smooth development workflow for the DeepLens and the related AWS infrastructure, and to create a working prototype of the project. In this blog post, I’ll discuss the project itself and some of the things I learned along the way.

What is DeepLens?

DeepLens is a deep learning enabled video camera from AWS. Its main purpose is to offer developers an easy way to get hands-on experience in deep learning. The first version became available in 2018, and a second version with some hardware and software improvements was published in 2019.

The device is basically a small computer with a built-in 4MP video camera and it has the ability to deploy and run visual deep learning models created using SageMaker, Amazon’s tool for building such models. Amazon offers a wide variety of example projects for the DeepLens, such as bird classification, face detection, and activity recognition, that can be deployed to the device in a matter of minutes.

Behind the scenes, DeepLens uses AWS IoT Greengrass to deploy and run lambda functions on the device. The device can also be used as an ordinary Greengrass core instead of deploying projects through the DeepLens console.

The Standard Development Pipeline

When building a DeepLens project from scratch, the first thing that you need is a deep learning model. The models are built using Amazon SageMaker from which they can be transferred to the DeepLens device. It is also possible to select a model from a library of pre-trained models offered by AWS.

In addition to the trained model, a DeepLens project needs a lambda function (written in Python 2.7) that, at minimum, passes the visual data from the device’s camera to the model for inference and then publishes the results.
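The capture, inference, and publish cycle described above can be sketched roughly as follows. This is only an outline under stated assumptions: process_frame and its parameters are hypothetical names, and the device-specific calls are passed in as plain callables so that the sketch also runs away from the device.

```python
import json


def process_frame(get_frame, infer, publish, topic):
    """Run one capture -> inference -> publish cycle.

    get_frame: callable returning (ok, frame)
    infer:     callable mapping a frame to a JSON-serialisable result
    publish:   callable accepting topic= and payload= keyword arguments
    """
    ok, frame = get_frame()
    if not ok:
        # The camera had no frame ready, so skip this cycle.
        return None
    result = infer(frame)
    payload = json.dumps({"result": result})
    publish(topic=topic, payload=payload)
    return payload
```

On the device, the AWS sample projects fill these roles with awscam.getLastFrame() for the camera, a wrapper around the model's doInference() for inference, and the publish() method of greengrasssdk.client('iot-data') for publishing. A real lambda function would call something like process_frame in a loop.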

The model and the lambda function are then deployed to the DeepLens device using a dedicated DeepLens console in the AWS Management Console. Behind the scenes, the console creates a Greengrass deployment with certain hard-coded parameters and deploys it to the device. This is excellent for fast prototyping, but makes certain things a bit more complicated, as I will discuss later in this post.

AWS offers a nice selection of example projects which showcase not only the models and lambda functions deployed to the DeepLens, but also how the device can be used in connection with other parts of the AWS infrastructure, especially Amazon Rekognition. Most examples divide the image analysis into subtasks. Some of these are performed on the DeepLens device, after which the images that require further processing are uploaded to an S3 bucket and analysed using Rekognition.

As mentioned above, DeepLens uses Greengrass to run the projects on the device. When the device is registered to an account, AWS automatically sets it up as a Greengrass core. The project deployments are also basically Greengrass deployments using certain hard coded resources and settings. Being aware of this is useful, since examining the Greengrass deployment parameters makes it easier to understand what can and cannot be done without extra tweaking.

The Project

I wanted to build a quick prototype of a lightweight video surveillance system using DeepLens as an autonomous security camera. This would be done by pointing the camera at the door, performing face recognition on the people entering the office, and logging information about unidentified people. I took a privacy-oriented approach and wanted to build the system so that everything is done on the device. In other words, no uploading images of passersby to S3 for classification using Rekognition. In addition to identifying people, I wanted to be able to encrypt images containing unidentified faces locally before storing them in an S3 bucket, or failing that, to store the images locally to be retrieved later using an SSH connection over the local Wi-Fi network.

The first option that came to mind was to build a custom model in SageMaker using employee profile photos as training data. This would again have required uploading the images to the cloud, and seemed like it might prove to be pretty complex, so I did some more research and found a DeepLens project called OneEyeFaceDetection that does manage to perform face recognition on the DeepLens device. The project uses face-recognition, a wonderful Python library by Adam Geitgey, and I decided to try the same method. Geitgey’s library performs face recognition using a C++ machine learning toolkit called dlib, which includes excellent face detection and recognition functionality.

The great thing about this library is that there is no need to retrain the underlying model when adding people to or removing them from the list of known persons. The system works by first detecting faces in the image, and then calculating a feature vector for each detected face. After obtaining the feature vectors, we simply compare them to feature vectors generated from our reference photos of known persons, and find the closest one in terms of Euclidean distance, which for vectors coincides with the Frobenius norm [1] of their difference.

If none of the reference vectors are similar enough, we label the face as unknown. Adding a new person to the list of known people is as simple as generating a feature vector for that person, and adding it to the feature vector database, together with some id linking the feature vector to the identity of the person.
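The matching step can be illustrated with a small standalone sketch. It does not use the face-recognition library itself: the function names, the toy three-dimensional vectors, and the initials are made up for illustration, and the tolerance of 0.6 is an assumption modelled on what I understand to be the library's default. Real face encodings are much longer than three values.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def identify(face_vector, known_faces, tolerance=0.6):
    """Return the id of the closest known face, or "unknown".

    known_faces maps a person id to a reference feature vector.
    tolerance is the largest distance still accepted as a match.
    """
    best_id, best_distance = "unknown", tolerance
    for person_id, reference in known_faces.items():
        distance = euclidean_distance(face_vector, reference)
        if distance <= best_distance:
            best_id, best_distance = person_id, distance
    return best_id


# Toy reference database: id -> feature vector (hypothetical values).
known_faces = {"AB": [0.1, 0.2, 0.3], "CD": [0.9, 0.8, 0.7]}

# Adding a person is just another entry; no retraining is involved.
known_faces["EF"] = [0.5, 0.5, 0.0]

print(identify([0.12, 0.21, 0.29], known_faces))  # closest to "AB"
print(identify([5.0, 5.0, 5.0], known_faces))     # too far from everyone
```

The first query lands next to the reference vector for "AB", while the second is further than the tolerance from every reference and is labelled "unknown".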

Using this library makes it easy to perform face recognition on the DeepLens device, although at the cost of not being able to take advantage of the GPU processing capabilities of the device. I am aware that this is not really how the DeepLens is primarily meant to be used, but it is a commercially available device that works well with AWS, which makes it an interesting hardware alternative to custom devices.

Problems and Solutions

I encountered a few problems while working on the project. The first was figuring out how to get the face-recognition library installed on the device. At that point I was completely new to the device and didn’t really have a clear picture of the environment in which the lambda functions are executed, so a certain amount of trial and error over an SSH connection was involved.

Installing the face-recognition library requires connecting to the device over SSH, installing cmake (and possibly other supporting libraries), building the dlib library from source, and finally installing the face-recognition Python library itself.

After this I was able to use face-recognition in the SSH session, but it was still inaccessible to the lambda function, probably because of the Greengrass container in which the lambda is executed. I fixed this simply by copying the necessary libraries to another directory that is visible to the lambda functions (/usr/lib/python2.7/dist-packages/), but there are probably other ways of doing this as well.

Accessing S3 would have required adding new policies to the AWSDeepLensGreengrassGroupRole, which is defined automatically when the DeepLens device is registered to an AWS account. I disliked the idea of editing an existing role for the purpose, and ended up creating a new role for the project. To have the DeepLens use my newly created role, I needed to go to the Group settings in the IoT Greengrass console, where the Group role is defined.
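As a rough illustration of what such a role needs, here is a minimal sketch that builds a policy document allowing uploads to a single bucket. The helper name and the bucket name are placeholders rather than the ones used in the project, and the role must additionally trust the Greengrass service so that the group can assume it.

```python
import json


def s3_write_policy(bucket_name):
    """Build an IAM policy document allowing uploads to one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                # Object-level permission, so the ARN ends in /*.
                "Resource": "arn:aws:s3:::{}/*".format(bucket_name),
            }
        ],
    }


# "my-deeplens-captures" is a placeholder bucket name.
print(json.dumps(s3_write_policy("my-deeplens-captures"), indent=2))
```

Restricting the policy to s3:PutObject on one bucket keeps the device's permissions as narrow as the use case allows: it can store images but not read or delete them.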

Even if you do decide to edit the existing role instead of creating a new one, the Group settings are worth checking out. They might give some insight into why something is not working as expected in your DeepLens project. You can for instance set the user ID and group ID that are used to run the lambda functions, and enable or disable containerisation.

If you want to have write access to the filesystem on the DeepLens while keeping the containerisation, you can also add write permissions for the lambda functions in the Resources section in the Greengrass group console. This is a bit more fiddly though, since whenever a new DeepLens project is deployed, the resources available to the lambda function are reset to the default settings and need to be adjusted again.

The End Result

Here’s what the project feed from the DeepLens looks like with face recognition enabled. The bounding box is produced by the face-recognition library, and the initials are stored together with the feature vectors.

The next phase is to detect visually when the door opens. With this addition, the door can be used as a trigger for the face detection, instead of keeping the face detection algorithm running all the time. One way to do this is by using the floodfill algorithm from the OpenCV library. Floodfill basically starts from a specific pixel in the image, and then proceeds to colour all connected pixels that are close enough in colour, similar to the ‘paint bucket’ fill tool in image editing software.

If we limit the examined area to the top part of the door and its surroundings, we can fill the door, measure the filled area, and use the area to decide whether the door is open or not. If the area is too small, the door must be open. The following image shows the idea. I’m limiting the flood filled area to the top part of the door to minimise false positives caused by objects between the camera and the door. How well this works depends on the colouring of the door and its surroundings, and various other factors. Below you can see the system in action.
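To make the door check concrete, here is a pure-Python stand-in for the flood fill step, not the OpenCV call itself. It compares every pixel with the seed pixel, which resembles OpenCV's fixed-range mode as far as I know. The function names, the toy grey-value grids, the tolerance, and the area threshold are all made up for illustration. With OpenCV, cv2.floodFill reports the number of filled pixels as its first return value, so the same area test applies.

```python
from collections import deque


def flood_fill_area(image, seed, tolerance):
    """Count pixels connected to seed whose value is within tolerance of it.

    image is a list of rows of grey values and seed is a (row, col) tuple.
    Every pixel is compared with the seed value, using 4-connectivity.
    """
    rows, cols = len(image), len(image[0])
    seed_value = image[seed[0]][seed[1]]
    visited = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and (nr, nc) not in visited
                    and abs(image[nr][nc] - seed_value) <= tolerance):
                visited.add((nr, nc))
                queue.append((nr, nc))
    return len(visited)


def door_is_open(image, seed, tolerance, min_closed_area):
    """Treat the door as open when the filled area is too small."""
    return flood_fill_area(image, seed, tolerance) < min_closed_area


# Toy crops of the top of the door: 200 = door colour, 40 = dark hallway.
closed = [[200, 201, 199, 200],
          [200, 200, 202, 198]]
ajar = [[200, 40, 40, 40],
        [201, 40, 40, 40]]

print(door_is_open(closed, (0, 0), 5, 6))  # fill covers the whole crop
print(door_is_open(ajar, (0, 0), 5, 6))    # fill stops at the dark gap
```

In the closed crop the fill reaches all eight pixels, so the area stays above the threshold. In the ajar crop it stops after two pixels, which is what signals an open door.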


All in all, I’m quite happy with the results of this experiment. The theme song functionality isn’t implemented yet, but with working face recognition, it’s only a matter of sending a message to another device. Realising that the DeepLens basically functions as a Greengrass core made it easier to find solutions to the problems I encountered. Of course, the way I built the project means that I’m intentionally ignoring the deep learning capabilities of the device, since I’m not taking advantage of SageMaker models and the GPU computing capabilities that the device offers. Then again, being able to use the device for this sort of prototyping as well is an added bonus that increases its value for me.


[1] https://en.wikipedia.org/wiki/Matrix_norm#Frobenius_norm