I was speaking with a leader in industrial robotics about some of the challenges they face in maintaining compliance with GDPR. One concern they raised was protecting personally identifiable information when images are being sent to the cloud at 5 Hz.

My initial thought was that, since these robots are performing object detection on-device, why not blur within the bounding box around a person pictured? However, I learned the problem was more subtle.

Blurring the whole bounding box would also hide things we want to see: it may be of interest to identify an object held in the hand of a worker on the factory floor, for example. Let's also assume that personally identifiable information is not limited to photos with faces; otherwise, we could get away with face detection and occlusion.

My partner and I have also been working on a project to develop an AI personal trainer, in which we make use of PoseNet. As I went about my day, it occurred to me that this model is much more specialized for localizing parts of the human body, so I started experimenting.

With the use case of tracking hands in mind, I ran PoseNet on some video of dancers. Using OpenCV, we get a fast implementation that masks everything but the hands.
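To make the masking step concrete, here is a minimal sketch rather than the code behind the demo. It assumes each frame is a NumPy array of shape height x width x channels and that the wrist positions are already known as (x, y) pixel coordinates. The function name `mask_all_but_circles` is hypothetical, and the sketch uses plain NumPy indexing; with OpenCV, the same mask could be drawn with `cv2.circle`.

```python
import numpy as np


def mask_all_but_circles(frame, centers, radius):
    """Black out every pixel that is farther than `radius` from all `centers`.

    frame:   H x W x C uint8 image.
    centers: iterable of (x, y) pixel coordinates, e.g. detected wrists.
    radius:  circle radius in pixels (one fixed size, as in the quick demo).
    """
    height, width = frame.shape[:2]
    ys, xs = np.ogrid[:height, :width]  # column and row index grids
    keep = np.zeros((height, width), dtype=bool)
    for cx, cy in centers:
        # Mark pixels inside the circle around this center.
        keep |= (xs - cx) ** 2 + (ys - cy) ** 2 <= radius ** 2
    masked = np.zeros_like(frame)  # start from an all-black frame
    masked[keep] = frame[keep]     # copy back only the pixels we keep
    return masked
```

Building the result from a black frame means anything not explicitly kept is hidden by default, which seems like the safer failure mode when the goal is privacy.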

[Demo: tracking a body part with PoseNet]

Since PoseNet does not annotate every image frame with the location of a wrist, some frames were discarded. Additionally, you'll notice the dancer moves his wrist in front of his face, so the face shows through the mask at that point. Also, for this quick demo, I chose one fixed size for the circular mask.
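The frame-discarding rule might look something like the sketch below. It assumes a pose is a dictionary shaped like the TensorFlow.js PoseNet output, with a `keypoints` list whose entries carry a `part` name, a `score`, and a `position` with `x` and `y`. The helper name `wrist_centers` and the 0.5 score threshold are my own assumptions, not values from the demo.

```python
WRIST_PARTS = ("leftWrist", "rightWrist")


def wrist_centers(pose, min_score=0.5):
    """Return (x, y) pixel centers for wrists detected with enough confidence.

    An empty list means no usable wrist was found, so the caller can
    discard the frame instead of sending it unmasked.
    """
    centers = []
    for keypoint in pose["keypoints"]:
        if keypoint["part"] in WRIST_PARTS and keypoint["score"] >= min_score:
            position = keypoint["position"]
            centers.append((round(position["x"]), round(position["y"])))
    return centers
```

A frame for which this returns an empty list would simply be skipped.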

The natural next steps would include exploring ways to improve the quality of inference for the wrists, or perhaps using continuity in time to guess the most likely position in frames where the wrist is missing. In fact, using the configuration of the rest of the body, we can make a pretty good guess, i.e. the wrist bone is connected to the arm bone :). This additional information would also help refine our mask when the face and hands are framed closely together. Finally, a simple rule that scales the mask with the relative size of the figure pictured would help keep a tight crop around the hand and whatever we expect it to hold.
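Two of these ideas are simple enough to sketch. The code below is illustrative only: `fill_gaps` uses linear interpolation as one possible reading of "continuity in time", and `mask_radius` uses shoulder width as a stand-in for the size of the figure. Both function names and the 0.6 ratio are assumptions and have not been tuned on real footage.

```python
import math


def fill_gaps(track):
    """Linearly interpolate missing wrist positions in a per-frame track.

    track: list with one entry per frame, either an (x, y) tuple or None.
    Gaps before the first or after the last known position stay None.
    """
    filled = list(track)
    known = [i for i, point in enumerate(track) if point is not None]
    for start, end in zip(known, known[1:]):
        (x0, y0), (x1, y1) = track[start], track[end]
        for i in range(start + 1, end):
            t = (i - start) / (end - start)  # fraction of the way through the gap
            filled[i] = (x0 + t * (x1 - x0), y0 + t * (y1 - y0))
    return filled


def mask_radius(left_shoulder, right_shoulder, ratio=0.6):
    """Scale the mask radius with the figure, using shoulder width as a proxy."""
    return ratio * math.dist(left_shoulder, right_shoulder)
```

For example, `fill_gaps([(0, 0), None, (4, 2)])` places the missing wrist at the midpoint `(2.0, 1.0)`. A fuller version could also use the elbow and forearm direction to guess the wrist when interpolation alone is not enough.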

PoseNet is fast and specialized for identifying keypoints on the body. This makes the model a natural choice when we are interested in analyzing human behavior. Using the fact that bodies move smoothly in time, along with their geometric regularities, we can answer questions like these more efficiently and precisely.