Metal Toad’s winter 2018 hackathon was all about Virtual Reality and Augmented Reality (VR and AR). At its most basic, AR means taking a camera feed and adding extra information to it, as in Pokémon Go or the AR Stickers in the Pixel camera.
The idea
When the theme of VR/AR was announced in September, I was reading an article about vOICe technology, a sensory substitution system that converts camera images into sound, and was intrigued. What if we could overlay images on what the camera saw? Then you could have audio-based AR for everyone without needing to look at a screen all the time.
Upon researching this, I discovered that learning to use a sensory substitution technology can take weeks, months, or even years of training, which is not exactly ideal for a two-day hackathon. But the idea of using AR with audio was stuck in my head, and I wanted to use it.
Eventually, I realized that this idea could offer a solution to an everyday problem: remembering people’s names (something I’m not great at). So I settled on using image recognition to give audio feedback of who you were looking at. Knock-Knock was born.
Implementation
Team Knock-Knock consisted of Vaughn Hawk, Oren Goldfarb, and myself. We didn’t have any knowledge of how to do facial recognition or video processing. But we leaned into what we did know: Amazon Web Services (AWS). We discovered they had a facial recognition service, Rekognition (part of the AWS Machine Learning suite of products), and, best of all, it could take a video stream from Kinesis Video Streams.
Now that we had tools, we found some tutorials and AWS Training documents. We divided the AWS stack and started building. What we built would eventually look like the illustration below.
Once the AWS infrastructure was running, we set about pair programming on the Raspberry Pi. We put together a simple yet effective device: you aim its camera at a person, and it recognizes their face and reads out their name.
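To illustrate the step where a recognition result becomes a name, here is a minimal Python sketch. It is not the hackathon code. It assumes the result arrives as a JSON record shaped like the face search output of a Rekognition stream processor (FaceSearchResponse, MatchedFaces, ExternalImageId), and that each face was indexed with the person's name as its external image ID. The function name, the similarity threshold, and the sample record are all hypothetical.

```python
import json


def best_match_name(record_json, min_similarity=80.0):
    """Return the ExternalImageId of the most similar matched face, or None.

    Assumes a record shaped like Rekognition stream processor output;
    the 80.0 threshold is an arbitrary value chosen for illustration.
    """
    record = json.loads(record_json)
    best_name, best_score = None, min_similarity
    for result in record.get("FaceSearchResponse", []):
        for match in result.get("MatchedFaces", []):
            score = match.get("Similarity", 0.0)
            name = match.get("Face", {}).get("ExternalImageId")
            # Keep only the strongest match at or above the threshold.
            if name and score >= best_score:
                best_name, best_score = name, score
    return best_name


# Hypothetical sample record with two candidate matches for one detected face.
sample = json.dumps({
    "FaceSearchResponse": [
        {
            "MatchedFaces": [
                {"Similarity": 85.0, "Face": {"ExternalImageId": "john_roe"}},
                {"Similarity": 97.2, "Face": {"ExternalImageId": "jane_doe"}},
            ]
        }
    ]
})

print(best_match_name(sample))
```

On the device, the returned name would then be handed to whatever text-to-speech tool reads it aloud.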
Along the way we decided that returning the name wasn’t enough. We wanted to get information about the person we were interacting with. So we set up a simple database with Metal Toad names and titles. Then, when the recognition data came back, the device would check DynamoDB and read out the person’s name and title as well.
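The lookup can be pictured with a small Python sketch. This is an assumption-laden illustration rather than the code we ran: a plain dictionary stands in for the DynamoDB table, and the key format, field names, and function name are hypothetical. With boto3, the dictionary lookup would typically be replaced by a get_item call on the table.

```python
def build_announcement(face_key, directory):
    """Build the phrase to read aloud for a recognized face.

    `directory` is a stand-in for the DynamoDB table: a mapping from a
    hypothetical face key to a record with "name" and "title" fields.
    """
    entry = directory.get(face_key)
    if entry is None:
        # No record for this face: fall back to the raw key.
        return face_key
    return f"{entry['name']}, {entry['title']}"


# Hypothetical directory contents for illustration only.
directory = {
    "jane_doe": {"name": "Jane Doe", "title": "Developer"},
}

print(build_announcement("jane_doe", directory))
print(build_announcement("unknown_face", directory))
```

Falling back to the raw key keeps the device talking even when someone has been indexed for recognition but not yet added to the directory.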
Tough problems
We had two major problems working on this stack.
We thought about using a different tool, but with limited time, we came up with another way to speed this up. We crafted some fake JSON records and put them directly into Kinesis Firehose. These records filled up the 1 MB buffer and caused Kinesis Firehose to output a file. The fake records were constructed so that we could filter them out from the real data. This cut the delay to 20 seconds. Still slow, but much better for our demo.
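The padding trick can be sketched in a few lines of Python. This is a reconstruction under stated assumptions, not the original code: the marker field name is_filler, the padding size, and both function names are hypothetical. The sketch only builds and filters the records. Actually sending them to Firehose (for example with boto3) is left out.

```python
import json

FILLER_FLAG = "is_filler"  # hypothetical marker field for fake records


def make_filler_records(target_bytes=1024 * 1024, pad_size=1000):
    """Create newline-delimited fake JSON records totaling >= target_bytes."""
    records, total = [], 0
    while total < target_bytes:
        line = json.dumps({FILLER_FLAG: True, "pad": "x" * pad_size}) + "\n"
        records.append(line)
        total += len(line.encode("utf-8"))
    return records


def real_records(lines):
    """Parse newline-delimited JSON and drop anything marked as filler."""
    kept = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if not record.get(FILLER_FLAG):
            kept.append(record)
    return kept


filler = make_filler_records()
# Mix a few filler lines with one hypothetical real record.
mixed = filler[:3] + ['{"name": "jane_doe"}\n']
print(len(filler), real_records(mixed))
```

Marking the fake records with an explicit field keeps the filter on the reading side to a single check, which suits a demo under time pressure.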
Next steps
If we keep working on this project for real world distribution, there are some problems we encountered that need to be solved:
Learning for the future
This was the team’s first time developing something that used Kinesis Video Streams, Kinesis Data Streams, Kinesis Firehose, DynamoDB, and Rekognition all together. Within just 48 hours, we were able to get familiar and comfortable working with these technologies. Knowing that we can leverage our existing AWS expertise to integrate more Amazon services, and do it quickly, gave us all a lot of confidence going forward.
During the demos, Metal Toad’s CTO Tony said, “We have clients who want to try using facial recognition technology, but they’re hesitant because it seems too hard.” Maybe it was a little hard, but I didn’t really focus on the difficulty. I simply saw a problem that sounded fun to solve, broke it down into manageable tasks, and worked through each step. We had problems, and we worked through them. Working as a team, we found that the difficulty of a problem is irrelevant; what matters is the process, and an Agile approach makes even large challenges manageable.
Whether we use the facial recognition features again or not, the experience of moving data through the Kinesis suite of tools will be a step forward as Metal Toad continues to build more IoT and data analysis projects for our clients.