June 15th, 2021
It has been a while since the last time I posted a tutorial, or anything at all. Basically, life happened, and I decided not to post rather than share low-quality content. Today I'll walk you through a computer vision project that takes your live video input and translates your blinks into the Morse alphabet, so you can blink short and long to write messages.
The source code for the project is here. I also used this awesome tutorial as a boilerplate to start from; if you want to learn more about computer vision applications, you can check out the creator's channel through that link. So without further ado, let's dive right in.
To begin, I want to explain the MediaPipe library a little. "MediaPipe offers open source cross-platform, customizable ML solutions for live and streaming media." That definition is from their own website, and it sums up what you can do with the library shortly and cleanly. They offer several other solutions that run on different platforms, and I'll cover all of them in a future post. The feature we'll use today is called "Face Mesh". This solution gives us a face landmark map with the 468 most important landmarks that can be seen on a human face. Using that map, we'll calculate the ratio between some particular points on the face, and with that information we'll detect whether the person on camera blinked or not.
In the picture above you can see the points I mentioned earlier. With that kind of access to the landmarks of a human face, you can build all sorts of detectors related to facial expressions. Another cool library that offers many of them built in is OpenFace, which is also worth checking out.
To detect whether an eye is blinking, we use the "EAR", which stands for "eye aspect ratio". To calculate an eye's EAR value we need access to 6 landmarks around that eye. You can see how the EAR is calculated, and which landmarks we need, from the link here.
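The EAR formula itself is simple enough to sketch in a few lines. Here is a minimal version, assuming each landmark is an (x, y) tuple; the ordering of the six points (corners first, then upper and lower lid pairs) follows the usual EAR convention, not necessarily the exact variable names in the repo:

```python
import math


def ear(p1, p2, p3, p4, p5, p6):
    """Eye aspect ratio.

    p1 and p4 are the horizontal eye corners; p2/p3 sit on the upper
    lid and p6/p5 on the lower lid. When the eye closes, the two
    vertical distances shrink while the horizontal one stays roughly
    constant, so the ratio drops toward zero.
    """
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])

    return (dist(p2, p6) + dist(p3, p5)) / (2.0 * dist(p1, p4))
```

An open eye typically gives a value around 0.2 to 0.3, and a closed eye drops close to 0, which is what makes the EAR a handy blink signal.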
Now let's move on to the coding part. First we do the imports we need; I also defined a function to play around with the frame size. We first define a previous-time variable so we can calculate the FPS later. Then, in order, we define the drawing object so we can draw between landmarks, create the FaceMesh object, and finally set up our drawing specs.
We then define 2 arrays to keep old EAR values, 1 array to keep letters, 1 bool to track whether the signal is long, 1 variable for the blink duration, 1 for the non-blinked margin, 1 to track whether the eyes are currently blinked, 1 string to hold the Morse code, and a last one to hold the found letter.
A dictionary to keep the Morse alphabet.
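Since the post records blinks as S (short) and L (long), I'm assuming the dictionary keys use those letters rather than the usual dots and dashes; the mapping itself is just the standard international Morse alphabet:

```python
# International Morse alphabet, with "." written as "S" and "-" as "L"
# to match the short/long blink labels used in this project.
MORSE = {
    "SL": "A",   "LSSS": "B", "LSLS": "C", "LSS": "D",  "S": "E",
    "SSLS": "F", "LLS": "G",  "SSSS": "H", "SS": "I",   "SLLL": "J",
    "LSL": "K",  "SLSS": "L", "LL": "M",   "LS": "N",   "LLL": "O",
    "SLLS": "P", "LLSL": "Q", "SLS": "R",  "SSS": "S",  "L": "T",
    "SSL": "U",  "SSSL": "V", "SLL": "W",  "LSSL": "X", "LSLL": "Y",
    "LLSS": "Z",
}
```

Looking a pattern up is then a single `MORSE.get(code)` call, which returns `None` for a pattern that isn't a valid letter.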
The main loop of our project: read the source image, upscale the input by 50%, and convert from OpenCV's BGR to RGB, because MediaPipe expects it in that form. Then we calculate the FPS by dividing 1 by the difference between the current time and the previous time.
If there are landmarks, then for each face we first draw the connections between the landmarks and then calculate the EAR values.
If the counter is bigger than 10, reset it to zero; otherwise, increment it by 1. If the previous-EAR arrays already hold 10 values, replace the element at the counter's position with the new EAR value and set the long check to true.
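In other words, the old EAR values live in a small circular buffer. Here is my interpretation of that bookkeeping as a sketch (the function and variable names are assumptions, not the repo's):

```python
def update_ear_buffer(buffer, count, new_ear, size=10):
    """Circular buffer of the last `size` EAR values.

    While the buffer is still filling up, append; once it is full,
    overwrite the slot at `count` instead of growing. The counter wraps
    back to zero after `size` slots. Returns (new_count, is_full),
    where is_full corresponds to the "long check" in the post.
    """
    if len(buffer) < size:
        buffer.append(new_ear)
    else:
        buffer[count] = new_ear
    count = 0 if count >= size - 1 else count + 1
    return count, len(buffer) == size
```

Once `is_full` is true, the newest EAR value can meaningfully be compared against the recent history to spot the sudden drop that a blink causes.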
If the long check holds and the current EAR value is below the 9 older EAR values, detect a blink and increment "blinkedFor" by 1. Otherwise, if a blink just ended: if it lasted longer than 0.8 seconds, count it as an L (long), else as an S (short). If no blink is in progress, increment the not-blinked time by 1 instead; once the not-blinked duration exceeds 2 seconds (2 times the FPS), look the accumulated code up in the Morse dictionary: if it maps to a letter, add that letter to the letter array; if not, just clear the code.
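That branching is easier to follow as code. Below is a sketch of one per-frame update of the blink-to-Morse state machine, with the 0.8 s and 2 s thresholds from the paragraph above; the `state` dict and function name are mine, not the repo's:

```python
def step(is_blinking, state, fps, morse_table):
    """One per-frame update of the blink -> Morse state machine.

    state holds: blinkedFor (closed-eye frames), notBlinkedFor (open-eye
    frames), morseCode (S/L pattern in progress), letters (decoded so far).
    A closed-eye streak longer than 0.8 s becomes an 'L', a shorter one
    an 'S'; 2 s with no blink ends the letter and looks it up.
    """
    if is_blinking:
        state["blinkedFor"] += 1
        state["notBlinkedFor"] = 0
    else:
        if state["blinkedFor"] > 0:
            # A blink just ended: classify it by its duration in seconds.
            duration = state["blinkedFor"] / fps
            state["morseCode"] += "L" if duration > 0.8 else "S"
            state["blinkedFor"] = 0
        state["notBlinkedFor"] += 1
        if state["notBlinkedFor"] > 2 * fps and state["morseCode"]:
            # 2 seconds of open eyes: the letter is complete.
            letter = morse_table.get(state["morseCode"])
            if letter:
                state["letters"].append(letter)
            state["morseCode"] = ""
    return state
```

Measuring the pause in frames and comparing against `2 * fps` is what the post means by "2 times as FPS": at, say, 30 frames per second, 60 open-eyed frames equal the 2-second gap that ends a letter.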
Then, finally, we take the letter array, turn it into a string, and put it on the page. A small surprise, though: if the text contains my GF's name (Ece), it automatically adds <3 to the end. :D
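The final assembly step, easter egg included, could look like this; the function name is mine, and I'm assuming a case-insensitive check for the name:

```python
def render_message(letters):
    """Join the decoded letters into the display string; if the message
    contains "Ece", append a heart as an easter egg."""
    text = "".join(letters)
    if "ECE" in text.upper():
        text += " <3"
    return text
```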
So that's it, everyone. I hope the tutorial was easy to understand and interesting. I'll keep publishing regularly again, without a fixed-day restriction this time. I hope to see you in the next ones. :)