Can’t believe I am using ML!

I am still a newbie with p5.js. Last year, in my leisure time, I had tried to make a simple platform game with keypress actions to move a character around the screen. So when it came to applying an ML library for ‘body as controller’, there was more curiosity than fear of using a new library or attempting something new. It was delightfully easy to start with ml5 and PoseNet. My only foray into AI/ML till now was building a perceptron program, coding along with Daniel Shiffman’s video.

I have reused code shared in the Coding Train video and modified it to try out newer ways to click and scroll. I didn’t want to set the benchmark too high for myself and fall into the trap of chasing it. So for the first exercise I thought up a simple experiment of making bubbles float around (reusing code from the tutorial video) and using body gestures to click them. I am not yet touching the confidence scores, nor looking at fine-tuning the positions of the pose keypoints.

I did learn about and try the lerp() function, and liked how it lets us smoothen the movement of pose keypoints by taking the midway point between the predicted and the current position. It makes the jumps a little less jittery. I also liked capturing click events using the dist() function from Dan’s videos. It was strikingly effective, and I couldn’t believe how quickly it solved the problem of checking whether both wrist keypoints are inside a bubble.
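The lerp() smoothing idea can be sketched roughly like this (a minimal sketch, not my actual code; lerp() is built into p5.js but is defined inline here so the snippet is self-contained, and the names are illustrative):

```javascript
// p5.js provides lerp(a, b, t) built in; defined here for a self-contained demo.
const lerp = (a, b, t) => a + (b - a) * t;

// Smoothed wrist position carried over between draw() frames.
let wrist = { x: 0, y: 0 };

// Move the stored position partway toward the freshly predicted keypoint.
// amt = 0.5 takes the midway point, which damps PoseNet's frame-to-frame jitter.
function smoothKeypoint(predicted, amt = 0.5) {
  wrist.x = lerp(wrist.x, predicted.x, amt);
  wrist.y = lerp(wrist.y, predicted.y, amt);
  return wrist;
}
```

Calling this every frame with the latest PoseNet prediction makes the drawn keypoint glide toward the new position instead of snapping to it.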




The action is to cross your wrists to make an ‘X’ pose. Where the two keypoints of the left and right wrist intersect any of the floating bubbles, the translucent bubble changes colour to white, indicating the CLICK. The code checks the distance between each wrist keypoint and the bubble’s centre, and qualifies it as a CLICK event if that distance is within the bubble’s radius.

So with only one wrist in place, the program doesn’t consider it a CLICK event until the second wrist is brought into the area engulfed by the bubble.
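The two-wrist check described above can be sketched like this (a hedged sketch, assuming illustrative names; dist() is built into p5.js but is defined inline here so the snippet stands alone):

```javascript
// p5.js provides dist(x1, y1, x2, y2) built in; defined here for the demo.
const dist = (x1, y1, x2, y2) => Math.hypot(x2 - x1, y2 - y1);

// A bubble counts as "clicked" only when BOTH wrist keypoints fall inside it,
// i.e. each wrist's distance to the bubble's centre is within the radius.
function isClicked(bubble, leftWrist, rightWrist) {
  const leftIn = dist(leftWrist.x, leftWrist.y, bubble.x, bubble.y) < bubble.r;
  const rightIn = dist(rightWrist.x, rightWrist.y, bubble.x, bubble.y) < bubble.r;
  return leftIn && rightIn;
}
```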

Present View –

Code View –

Video –




The action is to rotate your elbow as if you are turning a wheel while holding one edge of it. The keypoint of the right wrist defines the angle, which is read with arctangent and translated into a SCROLL. The SCROLL value can be mapped to the rotation angle described. I attempted to use both the wrist and elbow points for accuracy, but couldn’t work that into the code.

The gesture is similar to browsing a carousel, where you repeatedly rotate the carousel to bring the next horse towards you. I got the chance to use the push() and pop() functions here to draw the rectangle, and used translate() and rotate() for positioning and orienting it. The translate() put the origin at the centre of the screen and made the angle detection possible. The challenge here was working with trigonometry and arriving at a common understanding of whether degrees or radians are passed into the arguments.
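The angle-to-scroll idea can be sketched as follows (a minimal sketch under assumed names, not my actual code; map() mirrors p5.js’s built-in map() and is defined inline so the snippet runs on its own):

```javascript
// p5.js provides map(value, start1, stop1, start2, stop2) built in.
const map = (v, a, b, c, d) => c + ((v - a) / (b - a)) * (d - c);

// Read the wrist's angle around the screen centre and map it to a scroll value.
// Math.atan2 returns radians in (-PI, PI]; p5's rotate() also defaults to
// radians (angleMode(RADIANS)), which was the source of my degrees-vs-radians
// confusion.
function wristToScroll(wrist, centreX, centreY, maxScroll = 100) {
  const angle = Math.atan2(wrist.y - centreY, wrist.x - centreX);
  return map(angle, -Math.PI, Math.PI, -maxScroll, maxScroll);
}
```

A wrist straight to the right of the centre gives angle 0, which maps to a scroll of 0; rotating the wrist then pushes the scroll value towards either extreme.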

Present View –

Code View –

Video –


Just because you have ears 😛


A simple gesture of tapping your ear to your shoulder to register a click has been tried with the BlazeFace model/detector from Google’s TensorFlow library. A boolean flag has been used to capture the click event: the eyes are shown on the “MOUSEDOWN” event and made to disappear on the “MOUSEUP” event.
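The boolean-flag idea can be sketched like this (a hedged sketch, not the original code: the trigger condition is approximated here by the tilt of the line between the two ear landmarks that BlazeFace reports, and the threshold and names are illustrative assumptions):

```javascript
// The flag: true between "MOUSEDOWN" and "MOUSEUP", i.e. while the head is
// tilted far enough that an ear approaches the shoulder.
let mouseIsDown = false;

// Estimate head tilt from the angle of the line joining the two ear landmarks.
// With the head upright the line is roughly horizontal (tilt near 0); tapping
// an ear toward a shoulder rotates it past the threshold.
function updateClickState(leftEar, rightEar, thresholdRad = 0.5) {
  const tilt = Math.atan2(rightEar.y - leftEar.y, rightEar.x - leftEar.x);
  mouseIsDown = Math.abs(tilt) > thresholdRad; // eyes are drawn while true
  return mouseIsDown;
}
```

In draw(), the sketch would simply show the eyes while the flag is true and hide them when it flips back to false.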

Present View –

Code View –

Video –



This experiment has given me some confidence in using computer vision. I can now think about all the possibilities of creating simple experiments around the body gestures and interactions we embody on a regular basis, and what’s possible in using AI to classify them and make inferences. It made me think about using this to give commands to a digital assistant. I could just signal to Alexa to change a song by winding my hand, or something along those lines.


References and code reused from:


The Coding Train: Multiple hands detection for p5.js coders. Accessed 8 Sept 2021.