Machine Learning: PoseNet

By: April and Randa
Link to sketch code:

Our goal was to explore and understand how PoseNet works and how it can be altered. We don’t have a lot of coding experience, so we decided to take this slowly and really understand the tutorial by Daniel Shiffman. Once we understood it, we wanted to see what the limitations were and what we could build using this platform.


Following the tutorial, we went step by step to create the canvas and set up the video capture. Before moving on, we played with a few of the filters provided in the p5.js library.


Once that was set up, we moved on to setting up PoseNet. Using the TensorFlow.js PoseNet model, the computer estimates our poses in real time and displays skeleton-like graphics over our bodies. PoseNet can estimate either a single pose or multiple poses, but for simplicity we experimented with single-pose estimation, which detects only one person in a video.

We copied the URL for the ml5.js library into the p5.js editor’s index.html file: <script src="" type="text/javascript"></script>

Importing this library allows the program to detect and trace keypoint positions on the face and body using the webcam. One can then create interactive art or graphics that respond to body and face movement.

All keypoints are indexed by part id. The parts and their ids are:

When it detects a person, it returns a pose with a confidence score and an array of keypoints indexed by part id, each with a score and position.
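As a rough illustration of that structure, the sketch below hard-codes a small sample pose and looks a keypoint up by its part name. The part names follow the standard PoseNet id ordering; the scores and positions are invented, and `samplePose` and `findPart` are hypothetical names used only for this example.

```javascript
// The 17 PoseNet part names, listed in id order (index 0 is the nose).
const PART_NAMES = [
  "nose", "leftEye", "rightEye", "leftEar", "rightEar",
  "leftShoulder", "rightShoulder", "leftElbow", "rightElbow",
  "leftWrist", "rightWrist", "leftHip", "rightHip",
  "leftKnee", "rightKnee", "leftAnkle", "rightAnkle"
];

// A made-up pose in the shape described above: an overall score plus
// one keypoint per part, each with its own score and position.
const samplePose = {
  score: 0.92,
  keypoints: PART_NAMES.map((part, id) => ({
    part: part,
    score: 0.9,
    position: { x: 100 + id * 10, y: 50 + id * 5 }
  }))
};

// Hypothetical helper: return the keypoint for a part name, or null.
function findPart(pose, partName) {
  const id = PART_NAMES.indexOf(partName);
  return id === -1 ? null : pose.keypoints[id];
}

const leftEar = findPart(samplePose, "leftEar");
console.log(leftEar.part, leftEar.position.x, leftEar.position.y);
```

Looking parts up by name rather than by number keeps the rest of a sketch readable, at the cost of one extra lookup per frame.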


This information allowed us to identify the position of each pose keypoint, along with the confidence score for detecting it. After retrieving this information, we played around with the code and tried assigning shapes to different positions and keypoints. We also learned that to make shapes scalable, their size needs to be a variable defined by the distance between the nose and an eye.
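A minimal sketch of that scaling idea, assuming the nose and eye positions are plain objects with `x` and `y` fields. The helper names, the `scale` factor and the sample positions are made up for illustration; in a p5.js sketch the built-in dist() could replace the `distance` helper.

```javascript
// Straight-line distance between two points.
function distance(ax, ay, bx, by) {
  return Math.hypot(bx - ax, by - ay);
}

// Hypothetical helper: size a shape from the nose-to-eye distance so it
// grows as the face gets closer to the camera. `scale` is a tuning value.
function shapeSize(nose, eye, scale) {
  return distance(nose.x, nose.y, eye.x, eye.y) * scale;
}

// Invented sample positions: the same face far from and near the camera.
const farSize = shapeSize({ x: 200, y: 200 }, { x: 203, y: 196 }, 2);
const nearSize = shapeSize({ x: 200, y: 200 }, { x: 230, y: 160 }, 2);
console.log(farSize, nearSize); // the near face produces the larger size
```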


This is another example we explored that creates wobbly eyes.


We were both really blown away by how accurate PoseNet’s location detection was. We both played with it alone on our computers in class, and it was eerie how often it was right.

However, there is lower accuracy in detecting keypoints of the body with multiple people than there is with just one person. We played around with a few demos and noticed the lag time increase when we were both in view. In the video above, the clown nose was juggling between both our noses, trying to follow the more confident output. It was actually kind of a fun effect; it felt like playing digital ping pong with our noses.
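One way to picture the juggling: if a sketch only ever draws on the most confident pose, the nose will jump between people as their scores change from frame to frame. The snippet below is a sketch of that selection, assuming each detected pose is an object with a `score` field. The function name and sample values are invented, and this is not necessarily how the demo we used was written.

```javascript
// Return the pose with the highest overall confidence score,
// or null when nothing was detected.
function mostConfidentPose(poses) {
  let best = null;
  for (const pose of poses) {
    if (best === null || pose.score > best.score) {
      best = pose;
    }
  }
  return best;
}

// Two invented frames: the more confident person changes between them.
const frameA = [{ name: "April", score: 0.81 }, { name: "Randa", score: 0.64 }];
const frameB = [{ name: "April", score: 0.55 }, { name: "Randa", score: 0.72 }];
console.log(mostConfidentPose(frameA).name, mostConfidentPose(frameB).name);
```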

Now that we had played with the demos, it was time to see what we could do with them.



Unfortunately, we forgot to hit record on this part because we were having a lot of fun trying to figure out how to target other points in the PoseNet library, add silly alien antennae, and understand how the example code for the googly eyes worked.

Here is the breakdown of what we went through to become extraterrestrials.

Step 1: Locate where the ears are
Using the example code from the Hour of Code tutorial and the index of part ids, we could easily locate the ears: Left Ear 3 and Right Ear 4. As a placeholder, we drew two ellipses over the ears to make sure it was working.

Step 2: Create antennae above our ear location
Now that we had red ellipses floating on top of our ears, we needed to figure out a way to move them up above the ears. Since we just wanted to move them up, we only needed to change the Y axis. In the line that places the ellipses over the location of the ears, we made one slight change: we multiplied the ‘ear1Y’ variable by 0.5 (and did the same for ‘ear2Y’), which gave us enough height.

ear(ear1X, ear1Y*0.5);
ear(ear2X, ear2Y*0.5);
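Multiplying a y coordinate by 0.5 moves the point halfway towards the top of the canvas, so the amount of lift depends on how low the ear is in the frame. The comparison below is only a sketch of that behaviour next to an alternative that subtracts a fixed offset; `liftByOffset` and the sample heights are hypothetical and were not part of our sketch.

```javascript
// The approach used in our sketch: halve the y coordinate.
function liftByHalving(earY) {
  return earY * 0.5;
}

// Alternative sketch: subtract a fixed offset (in pixels) and never go
// above the top edge of the canvas (y = 0).
function liftByOffset(earY, offset) {
  return Math.max(0, earY - offset);
}

// Invented ear heights near the top and near the bottom of the canvas.
for (const earY of [100, 400]) {
  console.log(earY, liftByHalving(earY), liftByOffset(earY, 80));
}
```

With halving, an ear at y = 100 is lifted by 50 pixels while an ear at y = 400 is lifted by 200; the fixed offset lifts both by the same 80 pixels.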

To create the antennae we simply changed the width and height so that the ellipse would be long and skinny, and changed the red colour to an RGB value that gives us bright green.

function ear(x, y, size, n) {
  // bright green fill for the antennae
  fill(164, 244, 66);
  // long, skinny ellipse: 5 wide, 100 tall
  ellipse(x, y, 5, 100);
}

Step 3: Bring in the googly eyes
The googly eyes were part of the demo code included in the Hour of Code tutorial we watched on PoseNet. All we changed was the frameCount multiplier, which we set to 2 so the eyes would spin faster, and the eye colour, which we changed to green.

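function eye(x, y, size, n) {
  // multiplying frameCount by 2 makes the pupil spin faster
  let angle = frameCount * 2;
  // outer eye
  ellipse(x, y, size, size);

  // green pupil, offset from the centre so it circles around
  fill(164, 244, 66);
  ellipse(x+cos(angle*n)*size/5, y+sin(angle*n)*size/5, size/2, size/2);
}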

Step 4: Add a filter to the video capture
The last thing we did to achieve this strange alien look was to add a filter to the draw function.

filter(INVERT);

Final Obduction #OOTD


Information sources

Overall, we had a lot of fun playing with this. There is so much more we can do with it, and through this learning experience we noticed how achievable it was for us to use it. Neither of us knows much about JavaScript, but we were able to figure out how it worked and how to start making changes. We think it would be interesting to continue exploring this tool.

PoseNet Experimentations

Workshop 6 Notes

PoseNet Experimentations: Replacing the mouse with different tracked body points, and using gesture as a type of controller through PoseNet js.


PoseNet js is position tracking software that uses real-time data analysis to track 17 key points on your body. The key points are translated into a JavaScript object that records the pixel position of each point on the webpage. These experiments attempt to see if these pixel positions can replace the computer mouse functionality and create new control gestures through tracked body movement.


I introduced myself to PoseNet through Daniel Shiffman’s Coding Train example that uses PoseNet with p5.js. This example was thorough and explored the base concepts of:

  1. What does PoseNet track?
  2. How do you extrapolate that data?
  3. How can you use the key data points analyzed through computer vision in relation to another?

A clip of myself following Daniel Shiffman’s PoseNet example. Note how the code is written to isolate certain keypoints in “keypoints[1]” etc. 

PoseNet uses computer vision to document the following points in the form of “pixels” on your body with a declared accuracy. The library for PoseNet allows you to identify the poses of multiple people in a webcam. My explorations will be focussing on actions and gestures from one participant.

A list of the key points on the body that PoseNet analyzes and creates positions for in relation to the video.

This data is then translated into JSON, which the PoseNet library instantiates as a data structure at the beginning of your code.

A screenshot of the JSON that PoseNet provides that documents the coordinates.

The JSON gives you x and y coordinates that indicate the position of each point being tracked. Because of the way the JSON is structured, you can always reliably grab points to use as inputs.

I found that the points are very similar to p5.js mouseX and mouseY coordinates. I decided to pursue a series of small experiments to see how to use gestures as a mouse movement on the screen.

1) Screen position & interaction

I wanted to see how I could use the space of the screen to interact with my body that was analyzed for data points on a webcam. I wanted to explore raising my hand and see if I could activate an event within a part of the screen. Interestingly enough, PoseNet does not track the point of a user’s hand but is able to track their wrist. I decided to experiment with the wrist point as it was the closest to the hand gesture I wanted to explore.

I used the method in Shiffman’s tutorial to isolate my wrist (key point 10). I then divided the webcam screen into 4 quadrants. I found the x and y coordinates of my right wrist. If the coordinates of my right wrist were located within the top left* quadrant, an ellipse would appear on my wrist.

*Note, the webcam image is a flipped image of the user. The right hand is shown on the left side of the screen.
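The quadrant test can be sketched as a small function. This is a minimal version under assumptions: the canvas is 640 by 480, the wrist position is invented, and `quadrantOf` is a hypothetical name. Which side counts as "left" depends on whether the video is mirrored, as noted above.

```javascript
// Return which quarter of the canvas a point falls in.
function quadrantOf(x, y, canvasWidth, canvasHeight) {
  const horizontal = x < canvasWidth / 2 ? "left" : "right";
  const vertical = y < canvasHeight / 2 ? "top" : "bottom";
  return vertical + "-" + horizontal;
}

// Invented right-wrist position on a 640x480 canvas.
const wrist = { x: 120, y: 90 };
const showEllipse = quadrantOf(wrist.x, wrist.y, 640, 480) === "top-left";
console.log(showEllipse); // draw the ellipse on the wrist only when true
```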

Video documenting myself activating the top left quadrant with my hand. 

I expanded on this idea by seeing how I could use this gesture as a controller to stop and start an action.  I chose to draw an ellipse centred on my nose. Upon page load, the colour of the ellipse would randomly be chosen. If the coordinates of my right wrist were located within the top left quadrant, the colour of my nose ellipse would start to change.

Changing the colour of my nose by using the gesture action of raising my right wrist to randomly choose the red, green, and blue values.

When the wrist leaves the quadrant, the nose takes on the last colour that was randomly chosen. This was a quick proof of concept. The loop happens so quickly that the user has little control over which colour is selected.

The right wrist point was an interesting key point to choose. I found myself always tracking my hand rather than my wrist. Upon testing, I noticed myself having to raise my hand higher than I anticipated to activate the quadrant. I do not think this is because of the tracking accuracy, but rather a discrepancy between the human and the machine understanding of where my wrist starts and ends.

2) Buttons (almost)

I was able to isolate a specific quadrant on the page and could start and stop actions. I was curious to see if I could emulate a button click, and give further didactic information through HTML inputs.  

I created a button that would change the colour of the nose upon hover. I found the coordinates of the button, the width and height, and then compared this to the value of the nose pose. For this experiment, I chose the nose rather than the wrist because I wanted a one to one relationship with the control of the action and the gesture (rather than having a different body part control the output of another point).

As in the previous experiment, p5.js generated a random colour for the nose. When the nose overlapped the button on the page labelled “change colour of nose”, the nose would change colour.

PoseNet tracking my nose and interacting with a button created in p5.js

This button is an artificial stand-in since no mouse press event is fired. It is simply tracking through position. I considered attempting to feign a mouse press, but chose not to because this design already produced the desired result.
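The position-only "hover" described in this section comes down to a point-in-rectangle test. A minimal sketch, assuming the button is described by its top-left corner plus a width and height; the button geometry, nose positions and the name `isInsideRect` are invented for illustration.

```javascript
// True when the point (px, py) lies inside the rectangle, edges included.
function isInsideRect(px, py, rect) {
  return px >= rect.x && px <= rect.x + rect.width &&
         py >= rect.y && py <= rect.y + rect.height;
}

// Invented button geometry and two invented nose positions.
const button = { x: 20, y: 20, width: 160, height: 40 };
const noseOnButton = { x: 100, y: 35 };
const noseElsewhere = { x: 320, y: 240 };

console.log(isInsideRect(noseOnButton.x, noseOnButton.y, button));
console.log(isInsideRect(noseElsewhere.x, noseElsewhere.y, button));
```

In a sketch, a new random colour would be picked only on frames where the test returns true.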

3) Mapping

My next experiment took gesture further through the exploration of mapping. I found the nose ellipse colour to be an effective representation of action on the page. I chose to map the coordinates of my hand across the screen to the colour of the nose ellipse.

A screenshot showing the coordinates of the right elbow.

A video of my wrist controlling the colour of my nose through a sweeping gesture. 

I enjoyed how this emulated a slider. p5.js has the capability to include a slider for volumes and other values, such as RGB. This gesture almost replaces the functionality. The next step for this gesture would be to “lock” the value of the slider rather than just having a continuously mapped value.
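The mapping and the proposed "lock" can be sketched together. This is an illustration under assumptions, not the code from the experiment: `mapRange` is a hand-written stand-in for p5.js's map() with clamping added, and the `slider` object and its `locked` flag are hypothetical.

```javascript
// Linear re-mapping of a value from one range to another, clamped to
// the output range.
function mapRange(value, inMin, inMax, outMin, outMax) {
  const t = (value - inMin) / (inMax - inMin);
  const clamped = Math.min(1, Math.max(0, t));
  return outMin + clamped * (outMax - outMin);
}

// Hypothetical "slider" state: follows the wrist until it is locked.
const slider = { value: 0, locked: false };

function updateSlider(wristX, canvasWidth) {
  if (!slider.locked) {
    slider.value = Math.round(mapRange(wristX, 0, canvasWidth, 0, 255));
  }
  return slider.value;
}

updateSlider(320, 640);   // wrist in the middle of a 640px-wide canvas
slider.locked = true;     // e.g. set by a second gesture
updateSlider(600, 640);   // ignored while locked
console.log(slider.value);
```

The locked value could then feed one colour channel of the nose ellipse while the wrist is free to move away.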

4) Two body points interacting together

For this experiment, I wanted to explore two parts: the first, how was it possible to create an “interaction” with two points; the second, is it possible to trigger an output like a sound?

To achieve this, I chose to see if I could use my knee and my elbow to create a gong sound when they “collided” together.

I started off by finding the coordinates for the knee and the elbow. The x and y coordinates provide very specific points; they are not a range, nor is there an identifiable threshold given. Unlike in the previous experiments, these two sets of coordinates are not tied to the screen and would need to respond in relation to each other. This posed a bit of an issue: how would I determine if the knee and elbow were colliding? It would be too difficult and unintuitive to require the knee and elbow to have exactly the same coordinates. I decided that the knee would need to surpass the y coordinate of the elbow marginally to indicate a connection between the two points.

To start the visual representation of this experiment, I did not draw anything on the screen. When the knee and the elbow “interacted”, an ellipse would be drawn on both the knee and the elbow.

A video of my knee and elbow attempting to trigger the drawing of two ellipses.

I found that this was effective every other attempt. I added ten extra pixels in each direction to see if this would help expand the range. This helped only marginally.
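One way to express a collision range like this is a tolerance box around one point. The sketch below is a variation on the idea rather than the exact condition used in the experiment; `pointsCollide`, the tolerance values and the sample positions are assumptions.

```javascript
// Treat two tracked points as "touching" when they are within
// `tolerance` pixels of each other on both axes.
function pointsCollide(a, b, tolerance) {
  return Math.abs(a.x - b.x) <= tolerance &&
         Math.abs(a.y - b.y) <= tolerance;
}

// Invented knee and elbow positions that are close but not identical.
const knee = { x: 305, y: 262 };
const elbow = { x: 298, y: 255 };

console.log(pointsCollide(knee, elbow, 10)); // within a 10px tolerance
console.log(pointsCollide(knee, elbow, 5));  // too strict for these points
```

A larger tolerance makes the gesture easier to trigger but also more likely to fire by accident, so the value would need tuning in front of the camera.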

My next step was coordinating sound. I found a free gong sound file that I loaded into p5. This, unfortunately, did not work as I wanted. Since I was triggering the sound through the .play() function, whenever my knee and elbow stayed collided for more than 1/60 of a second, the .play() function was executed again on each frame. This was the beginning of a journey of callbacks and booleans that I decided not to include, as it was deterring me from the exploration of PoseNet.
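For reference, one common boolean pattern for this kind of problem is to fire only when the collision starts, not on every frame it lasts. The sketch below is not the code from this experiment: `makeEdgeTrigger` is a hypothetical helper, and a simple counter stands in for the sound's play() call.

```javascript
// Build a trigger that calls `action` once each time the condition turns
// from false to true, instead of on every frame it stays true.
function makeEdgeTrigger(action) {
  let wasActive = false;
  return function update(isActive) {
    if (isActive && !wasActive) {
      action();
    }
    wasActive = isActive;
  };
}

// Stand-in for a sound's play() call: here it only counts how often it ran.
let playCount = 0;
const gong = makeEdgeTrigger(() => { playCount += 1; });

// Simulated frames: the collision lasts three frames, stops, then
// happens again for one frame.
[false, true, true, true, false, true].forEach((colliding) => gong(colliding));
console.log(playCount);
```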

The knee and the elbow colliding was not as satisfying as an action, or as useful as a controller, as I would have liked. The action itself is very enjoyable, and having the sound of a gong would turn the body into an instrument. This action would work better if the sound was continuously playing and the body controlled the volume or other aspects of the song, rather than controlling the “starting” and “stopping” of the audio.


PoseNet provides a useful data model that allows people to locate their key body points on a screen. The points are accurate coordinates that can be used to interact with the screen without having to use a mouse. The experiments I conducted are initial explorations into how it is possible to use the data from PoseNet as inputs. The explorations can be used as basic models for common mapping functions, mouse clicks, and multiple input events. The output of the colour-changing nose can be replaced with outputs such as sliders, changing filters, or starting and stopping events. PoseNet is a powerful tool that can take complex physical data and create a virtual map of coordinates for people to interact with online.

My next steps in these explorations would be to explore more complex and combined patterns of interaction. All of these explorations are single actions that provide one result. PoseNet offers the ability to use up to seventeen points of data from one person, which can be multiplied with every person interacting on the same screen. The multiple poses could lead to interesting group cooperation exercises, such as all participants raising their left wrist to create an output. The basic experiments of input and output still apply, though the logic would need to change to account for multiple bodies.

Overall, gesture-based controllers seem futuristic but the technology has become extremely accessible through PoseNet and built-in webcams. PoseNet allows an easy introduction to browser-based computer vision applications. These experiments offer a very basic understanding and introduction into common gesture interactions that now become possible through PoseNet.


PoseNet documentation

How to Build a Gesture Based Game Controller 

Real-Time Estimation Using PoseNet



In a world full of interactive devices we find ourselves surrounded with sensors, joysticks, screens, etc. For this assignment, we decided to explore a different kind of input that does not require the user to press any buttons or screens, instead, we wanted to explore using the camera and user’s face to send commands to a computer. FaceTracking is a p5 application that uses the computer’s camera and an algorithm to read and understand the user’s face. This application is currently set to play a sound with the user’s face movement, however, any function can be added to this application.



In this project, we will explore using the user’s face, as a controller to send commands to the device. This tool is a basic prototype but has potential to be scaled to include any number of functions that run based on the user’s face manipulation and movement.

This tool is based on the FaceTracking application using Haar Detection technique, which uses an algorithm that contours the user’s eyes, nose, mouth, eyebrows, and chin. Each element of the face is given a number and then a vector is drawn connecting the numbers. Using this tool, we were able to make a simple beat player that the user can play simple music with.  


Link to Code


Design Process

High-Level Computer Vision focuses on a complex analysis of images. When talking about CV and faces, there are three major sections:

1)      Detection: spotting the difference between a face and non-face,

2)      Recognition: distinguishing different faces,

3)      Tracking: a combination of detection and recognition over time.

We wanted to explore the face tracking option and create a controller using our faces. We started with Kyle McDonald’s Face Tracking Example.



We found these class notes from McDonald, which explain all you need to know about CV and faces. OpenCV uses a Haar detection technique, developed by Paul Viola and Michael Jones in 2001. Haar detection can be used for any two-dimensional object, but it cannot handle any significant rotation or skew. It is also very limited in the colour variation it can handle. There is a video about HP computers that could not follow Black faces.

The face tracking example identifies 70 points on the user’s face.

We took the key points that lay out the face’s elements and drew contours around them. We didn’t notice a lot of change with the eyes, eyebrows, or the nose, but we were able to rotate the general contour of the shape. So we took the two points that define the edges of the face and compared them with each other. By comparing their Y positions, we were able to identify whether the face was tilting in either direction. After that, we added music to each direction so that the user could play music by moving their head.
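The Y comparison can be sketched as follows. This is an illustration under assumptions rather than our exact code: the `deadZone` threshold, the function name and the sample points are invented, and which direction is labelled "left" or "right" depends on whether the camera image is mirrored.

```javascript
// Compare the heights of the two outer face points to guess the tilt.
// `deadZone` (pixels) keeps small wobbles from counting as a tilt.
function tiltDirection(leftEdge, rightEdge, deadZone) {
  const difference = leftEdge.y - rightEdge.y;
  if (difference > deadZone) return "left";   // left edge is lower on screen
  if (difference < -deadZone) return "right"; // right edge is lower on screen
  return "level";
}

// Invented edge points for three head positions.
console.log(tiltDirection({ x: 200, y: 260 }, { x: 400, y: 220 }, 15));
console.log(tiltDirection({ x: 200, y: 220 }, { x: 400, y: 262 }, 15));
console.log(tiltDirection({ x: 200, y: 240 }, { x: 400, y: 245 }, 15));
```

Each non-level result could then start the beat assigned to that direction.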


Tools & Materials Used

P5.JS online editing tool


Two Mp3 files






Determining whether the face was angled up or down was slightly confusing. We used a calculation to determine the exact point at which the controller would be activated, but realized after re-thinking our logic that we did not need it.


Future Steps

Future iterations could include a new beat every time the user opens the page, so others could make a variety of beats together in front of the camera on their own devices. The webcam could also track more than one face and provide a random mix of different beat sets per controller, since it currently only recognizes one.


Useful links to look into:


OpenCV Website:

Kyle McDonald’s Class Note :

Kyle McDonald’s CV Examples:

OpenCV Face Detection: Visualized

How To Avoid Facial Recognition:

Face Detection For Beginners:

Subverting Body Tracking

Veda Adnani, Nick Alexander, Amreen Ashraf

Our response includes a slide deck linked here.

We examined the field of countering computer vision (with a focus on face detection), began to speculate on further developments, and considered research and design projects.


For our research on computer vision, we used a top-down approach. We started out trying to understand what “computer vision” is and what its implications are. Computer vision is the name given to a series of technologies that help a computer see. The human eye is central to how we use our visual understanding of the world to piece together information; in the same way, the camera is the eye of the computational device.

As of 2019, computer vision is all around us. Our smartphones, apps, social media, banks and other industries, use computer vision every day in aiding humans to carry out tasks with computational devices.

In Class Activity:

We started out by doing the class activity, which was to research our topic. Some of the apps we looked at were those used commercially, like the newly acquired app “…” by L’Oréal. We also looked at “faces in new media”, a list by Kyle McDonald (Face in new media art Kyle McDonald). The list comprises artists using computer vision in new and novel ways. In this list, there is a section on intervention which highlights artists using computer vision to counter tracking, thereby subverting these technologies.


We conducted a broad range of research to understand face tracking used by industries and governments to not just collect data but to classify humans and other potential uses of computer vision. Some initial concepts we jotted down were:

  1. Deepface: using facial recognition AI algorithms to alert or highlight when being detected.
  2. Blockchain: using blockchain technologies to scramble and save data on different databases for security.
  3. Physical: Using physical objects or clothing to misdirect.


One interesting thing we came across in this phase of the research was the way governments across the world are using computer vision. Privacy International is an NGO that does a lot of work on the legality of the ways in which computer vision is currently being implemented.



Instagram Face Filters:

Our first and most basic experiment was experimenting with Instagram face filters to understand the extent to which they can be used to alter, modify or even transform the face. One of the most striking filters that we found is shown below. It is called “Face Patch” and it gradually eliminates all the features from the user’s face leaving them only with a blank patch of skin and the outline of their head. We leave this finding open to your interpretation.


Beating Apple’s Facial Recognition

We tried deceiving Apple’s “True Depth” Face ID by using photographs, however this did not work. What did work was when we tried using a mirror to detect the face, and we found this odd since a mirror is a flat surface and cannot convey depth. Yet it somehow managed to cheat the software and unlock the device.


Modiface :

We experimented with Modiface, an AR app that uses facial recognition to mock up different cosmetic products on the wearer’s face. A range of brands like INGLOT use this platform to advertise their products, but what caught our attention was the app’s ability to remove any scars and blemishes on the user’s face, even ones that the user was unaware of. It also allowed the user to change their eye colour if they desired. This was quite disturbing, and a rude awakening to the lengths that the beauty and cosmetics industry goes to in promoting vanity and unrealistic aesthetic perfection.



Free and Accessible Resources:

Accessible and free sources for body tracking are easy to find. Simple but robust face tracker tools made by independent developers, like CLM Face Tracker or Tracking.js, are available with a minimum of web searching. More robust face tracking technology, such as that developed by Intel and Microsoft, is easily accessible by businesses. Body tracking code such as PoseNet can also be found very easily.

For those who care to look, face and body tracking is widely available and can be adapted to a user’s purpose with no oversight.


Deceive v/s Defeat:

Through our research process, we came across two possible scenarios to subvert face recognition products. The first one was to “deceive” the intelligence into thinking that the user was someone else, and the second one was to “defeat” the system by rendering the user unidentifiable using certain tactics. Our findings below cover both of these possibilities.


In light of the examples listed below, we see an emerging need for subversion. Our identities, faces and bodies are sacred and personal. But we are constantly being violated by multiple entities, and it is unfair to be subjected to this kind of surveillance unknowingly. Where does this impending lack of trust leave humankind?


Amazon Rekognition:


Amazon claims “Real-time face recognition across tens of millions of faces and detection of up to 100 faces in challenging crowded photos.” It was recently caught secretly licensing this facial recognition software to multiple state governments in the USA. With its real-time tracking and the ability to analyze several camera feeds in multiple cities simultaneously, this is a serious concern for privacy and consent with government surveillance entities.




Using the same technology provided by Amazon Rekognition, Butterfleye is a B2C facial recognition device that was built to help businesses get to “know their customers better”. Every time a customer enters a business establishment like a coffee shop, salon or bank, the person serving the customer is immediately given a bank of data including the customer’s personal details, preferences and purchase history. They claim it’s a way for businesses to become more “efficient” and serve customers better, but where does this leave any possibility of privacy for the average human being?


SenseTime: Viper Surveillance System


SenseTime is a Chinese company that focuses on AI-based facial recognition systems. It is currently the most highly valued entity of its kind in the world, at a net worth of 3 billion dollars. Its flagship product, the Viper Surveillance System, detects faces in crowded areas and is used most by the government. What is shocking is that the government uses this technology the most in provinces with dense Muslim populations to track “terrorist” activity. However, its claims for doing so are far different.


Government claims across the globe:

Most governments are employing facial recognition software for various reasons. Some claim it is to find missing children, others claim it is to prevent and stop human trafficking. However, the actual uses are far from the truth they project.


AI-Generated Human Faces

New Website Generates Fake Photos of People Using AI Technology

AI-assisted image editing is used in the creation of “deepfakes” (a portmanteau of “deep learning” and “fake”) which are high-quality superimpositions of faces onto bodies. Generative Adversarial Networks have also been used to generate high-quality human faces, which, using face tracking technology, can be made to seem to be speaking in real-time.

Video forensics can be used, or image metadata can be extracted and analyzed, to identify AI-generated faces and videos.

How does one evade these various entities?

Classifiers v/s Detectors:


One of the key differences in surveillance systems is that between classifiers and detectors. Classifiers work towards categorizing pre-determined objects and are commonly used in face surveillance systems such as Apple’s True Depth Face ID, which uses 30,000 touch points to identify faces. Detectors have to locate and determine objects themselves, i.e. create their own bounding boxes, and are used in areas like autonomous driving vehicles.

NSAF: Hyphen Labs


Hyphen Labs is a multidisciplinary lab which focuses on using technological tools to empower women of colour. They use human-centred design and speculative design methodologies in the aid of prototyping technologies. They have developed a concept called Neurospeculative Afrofuturism which integrates computational technologies, virtual reality and neuroscience to aid in the design of prototypes. HyperFace is a prototype which uses many faces drawn onto a scarf to misdirect the use of computer vision in data collection and profiling. It uses the data points used by tracking software to graphically design a scarf which has many of these points. It also uses certain colours which are not recognized by this software.


Glasses that confuse surveillance:

Researchers at Carnegie Mellon University have devised a pair of glasses that “perturb” or confuse facial recognition systems.



Facial Camouflage that disturbs surveillance:

A team of researchers at Stanford University led by Dr. Jiajun Lu has devised facial camouflage patterns to confuse cameras. This pattern renders the face unidentifiable from various angles, distances, lighting conditions and so on. They are experimenting with “living tattoos” for the face to create long-term solutions to fight surveillance.



NIR LED Glasses, Caps or Burqa:


A low cost and feasible way to avoid any facial surveillance system is using Near Infrared LED lights. The lights are practically invisible to the naked human eye and when designed well in a prototype they can go unnoticed. The lights successfully blind cameras. The first prototype was a pair of eyeglasses designed by professors Isao Echizen and Seiichi Gohshi of Kogakuin University, and since then various prototypes ranging from caps to burqas have been made. The lights are inexpensive and available on Sparkfun.


URME Mask:


The URME mask is a $400 mask sold at cost by its founder to help people evade surveillance. When worn, it is extremely realistic, and the only time a wearer can be detected is when the lack of lip movement is noticed.

Facial Weaponization suite:


Facial Weaponization is a series of modelled masks created in revolt to the political spectrum of facial surveillance. The masks are made in workshops using aggregated data from participants that are unrecognizable by biometric facial surveillance systems.



In addition to exploring existing forms of facial countermeasures (like CV Dazzle), we considered utilizing the technology against itself. We imagined a digital mask that superimposed itself over any image recognized as a face taken by a device it was installed on, scrambling it and rendering it useless for facial data. We also considered bio-powered Near-Infrared LED stickers that could be placed subtly on a face and powered by body electricity.


Group: Erman & Jing


  • Strategy:  

Project 1: Birthday Filter.

March 6th is the birthday of one of my best friends. She lives in China. I always want to come up with a way to celebrate our unbreakable friendship bond with a very special gift. I thought the best and most memorable gift to give her would be a birthday filter I made. We used p5 javascript and HTML to build a computational visualization experience based on Kyle McDonald’s CV examples.



Project 2: Mixing Face

Try mixing traditional art skills with your digital painting process for unique-looking imagery, says illustrator Jean-Sébastien Rossbach. Mixing Face is a filter that mixes your face with traditional art pieces, built with HTML, p5 javascript and ml5 javascript. It will change your skin tone using the colour and shape in that painting. What paint to use is a matter of personal preference and style. Mix your face with art pieces you choose and you don’t need to use Photoshop this time.



Software and libraries:

  • Text Editor
  • Download p5 javascript libraries.
  • Download ml5 javascript libraries.
  • Cyberduck
  • Documentation:

Project one is built based on Kyle McDonald’s CV examples

Experience 1: Kyle McDonald’s CV examples

We played with a collection of interactive examples using p5.js through the link(CV examples) Kate gave us. The examples are meant to serve as an introduction to CV and the libraries we can use. The examples in this link use p5.js to access live video. All examples are self-contained and can be run independently, so we tried all the examples and tried to learn the p5.js code.

The examples I liked most are the nose theremins and light painters, which use our body as a pointer in p5.js. One key feature of this experiment is that it allows people to use their body parts as pointers, instead of the mouse.


(experiences of trying example code online)

Beyond the example code, I made a few changes:

To change the amplification

input.amplification = 2;

To track other body parts:

Change the code “input.part = ‘nose’;” to the other body part you want to track:


(syntax for input.part)

The Creatability experiments include several musical instruments. Having multiple interaction modes can make creative coding projects more expressive and engaging.


(experience with Creatability musical instruments)

Instead of having body poses as input only, I wanted to have some output for the overall experiment.

(Things needed to build this online project)

We then found that Tensorflow.js and Tone.js were beyond our capability, and we couldn’t find example code online for triggering music. We decided to go back to our original idea of a birthday filter.

We used photoshop to create the images we needed for the filter. We downloaded pictures of 3 celebrities my friend loves and wrote some words.


(Filter image1: birthday hat)


(Filter image2: background)


(Filter image3: boy1)


(Filter image4: boy2)


(Filter image5: boy3)

I also added a birthday song in p5 javascript.

Experience 2 with Processing

We found two interesting sketches. One of them is Daniel Shiffman’s motion detection. The other is Abhinav Kumar’s colorDrawing. They both work with Processing.

These are the sketches we used: ColorDrawing and Motion Detection.

Motion Detection: This application detects motion in the camera image. Motion appears in white and turns to black when the motion stops. A created object follows the motion. After seeing this application we decided that if we made some changes so that it left a track behind, we could draw on the screen with our motions.

We could make a few changes in the code, like changing the colour, shape and speed of the object.

ColorDrawing: This application basically had the feature that we could not build with the Motion Detection app. After you select a colour by clicking on it, it starts drawing lines in that colour and follows objects of the same colour in the view. If you click on another colour, it starts drawing with that colour and keeps the previous lines as they are. It was hard to draw or write in sync with the camera because the sides are switched, but with some practice it could be done.



We made a few changes in the code. It was easy to change the size and shape of the tracing object.

We also tried to combine the two sketches, customizing the motion tracking app first. What we wanted was colouring with motion. We focused on motion detection and tried to modify its code; however, the two pieces of code did not match and each attempt gave an error.


Image. One of our trials and errors. A red dot appears and does not move with motion. You can see its code here.


  • Insights:  

I imagined this tool could be used also for video calling. As we use emojis in our chats, we can create instant and live emojis while we are using our camera. We can combine features of the codes we found. When we use the camera, our creation can follow our body parts and can appear when other people or another object appears. You can create a mask or a make-up on your face and can keep it while you are seen on camera. Digital game design is also a possibility. There are many possibilities for CV for colour, motion, face tracking; however, lack of experience and knowledge with coding was a drawback.   

Experience 1: Kyle McDonald’s CV examples_Nose theremins

  1. ml5.js does not depend on p5.js and you may also use it with other libraries.
  2. If you need to run the examples offline you must download the p5.js library and ml5 library or any other library you need.
  3. Link the library you are using in the html file. For example, the url for the ml5.js library to copy into an index.html file is: <script src="" type="text/javascript"></script>
  4. PoseNet on TensorFlow.js runs in the browser, so no pose data ever leaves a user’s computer.
  5. PoseNet can be used to estimate either a single pose or multiple poses, meaning there is a version of the algorithm that can detect only one person in an image/video and one version that can detect multiple persons in an image/video.
  • Information sources:

Next Steps:

It would be nice to make an app or web page where people can draw pictures. Their live camera images could be used, and they could draw just with their hand or body motions. The objects around them could be their colour palette. Saving the images and stopping and activating the brush would be necessary. Different filters could give different art outcomes and create different experiences. With some practice, painting with motion, image filters and additional images could be fun for video use.



Computer Vision & Graphics Explorations

Exploring PoseNet & ML5


Workshop insights:

During the workshop I was part of a group that explored PoseNet, which allows for real-time human pose estimation in the browser using the tensorflow.js library. Read more about it here. We were able to test PoseNet in the demo browser, and during explorations I noticed that the program would slow down when using its multiple pose capture feature. Additionally, I noticed that the skeleton drawn was pretty accurate regardless of how form-fitting or loose one’s clothing was. At the time we were not able to test the effect of different colors of clothing as, coincidentally, all four of us had worn varying shades of gray. We attempted to download the Github repository found here; however, we had a lot of trouble running the code. It requires a lot of dependencies and setup that we didn’t quite understand.

When I couldn’t get the demo working locally on my laptop, I tried following the Coding Train Hour of Code tutorial on using PoseNet that is available here. In the tutorial Daniel Shiffman uses ml5.js and p5.js. ml5.js is a tensorflow.js wrapper that makes PoseNet and tensorflow.js more accessible for intermediate coders or people who haven’t had much experience with tensorflow.js. The tutorial is, however, not suitable for people who haven’t used p5.js before, although in the video Shiffman links to other videos for complete beginners.

Insights from the tutorial:

In this tutorial I learned:

What is ml5.js? A wrapper for tensorflow.js that makes machine learning more approachable for creative coders and artists. It is built on top of tensorflow.js, accessed in the browser, and requires no dependencies installed apart from regular p5.js libraries. Learn more here

NOTE: To use ml5.js you need to be running a local server. If you don’t have a localhost setup you can test your code in the p5.js web editor, where you’ll need to create an account.

You can create your own Instagram-like filters! The aim of the tutorial was to create a clown nose effect where a red nose would follow your nose on screen. In theory, once you master this tutorial you can create different effects like adding a pair of sunglasses. I learned about the p5.js filter() function, which applies a filter to an image or video. I tested out THRESHOLD, which converts each pixel to black or white depending on whether it is below or above a certain threshold, and GRAY, which converts the video to greyscale. Usage is filter(THRESHOLD) or filter(GRAY);
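To make the THRESHOLD idea concrete, here is a minimal sketch in plain JavaScript of what a threshold filter does to a flat array of RGBA pixel values. The function name thresholdPixels, the brightness weights and the default level of 0.5 are assumptions chosen for illustration; p5.js's own implementation may differ in its details.

```javascript
// Hypothetical helper, not part of p5.js: turns every pixel of a flat
// RGBA array (values 0-255) into pure black or pure white.
function thresholdPixels(rgba, level = 0.5) {
  const out = new Uint8ClampedArray(rgba.length);
  const cutoff = level * 255;
  for (let i = 0; i < rgba.length; i += 4) {
    // Assumed brightness formula: a weighted sum of red, green and blue.
    const brightness =
      0.2126 * rgba[i] + 0.7152 * rgba[i + 1] + 0.0722 * rgba[i + 2];
    const value = brightness >= cutoff ? 255 : 0;
    out[i] = value;
    out[i + 1] = value;
    out[i + 2] = value;
    out[i + 3] = rgba[i + 3]; // alpha is left unchanged
  }
  return out;
}
```

A dark pixel and a bright pixel passed through this function come back as black and white respectively, which is the effect seen when calling filter(THRESHOLD) on the video.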

Pros & Cons of using a pre-trained model vs. a custom model? When using a pre-trained model like PoseNet a lot of the work has already been done for you. Creating a custom model is beneficial only if you are looking to capture a particular pose, e.g. if you want to train the machine on your own body, but in order to do this you will need tons of data. Think thousands or even hundreds of thousands of images, or 3D motion capture to get it right. You could crowdsource the images; however, you have to think about issues of copyright and your own bias about who is in the images and where they are in the world. It is imperative to be ethical in your thinking and choices.

Another issue to keep in mind is the diversity of your source images, as a lack of it may cause problems down the line when it comes to recognizing different genders or races. Pre-trained models are not infallible either, and it is recommended that you test out models before you commit to them.

What are keypoints? These are 17 datapoints that PoseNet returns and they reference different locations in the body/skeleton of a pose. They are returned in an array where the indices 0 to 16 reference a particular part of the body as shown below:

Id Part
0 nose
1 leftEye
2 rightEye
3 leftEar
4 rightEar
5 leftShoulder
6 rightShoulder
7 leftElbow
8 rightElbow
9 leftWrist
10 rightWrist
11 leftHip
12 rightHip
13 leftKnee
14 rightKnee
15 leftAnkle
16 rightAnkle

Each entry in the array also carries additional information for the pose: the confidence score and the x,y co-ordinates of the keypoint. These keypoints are important as they are how you will determine where to generate your filter or effects, e.g. a clown nose.
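As a rough illustration of working with the keypoint array, the sketch below looks up a keypoint by its part name and ignores weak detections. The object shape (part, score, position.x, position.y) is assumed from the accessors and console readings shown in this post; the helper name findKeypoint, the 0.5 cutoff and the sample positions are made up for the example.

```javascript
// Hypothetical helper: find a keypoint by part name, returning null
// when the part is missing or its confidence score is too low.
function findKeypoint(pose, partName, minScore = 0.5) {
  const match = pose.keypoints.find((kp) => kp.part === partName);
  if (!match || match.score < minScore) {
    return null;
  }
  return match;
}

// Sample data: scores are taken from the readings in this post,
// positions are invented for illustration only.
const samplePose = {
  keypoints: [
    { part: "leftEye", score: 0.99, position: { x: 350, y: 200 } },
    { part: "rightEar", score: 0.41, position: { x: 260, y: 215 } },
    { part: "leftShoulder", score: 0.01, position: { x: 420, y: 400 } },
  ],
};

const eye = findKeypoint(samplePose, "leftEye");
const shoulder = findKeypoint(samplePose, "leftShoulder");
// eye is the leftEye keypoint; shoulder is null because its score is too low
```

Filtering on the score like this is one way to avoid drawing effects on parts that PoseNet is only guessing at, such as the shoulders in the seated example.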


source: TensorFlow here


Some keypoint readings and confidence scores recorded from the motion capture of the image above of me sitting down are listed here. These results are printed to the console and are shown with the array expanded: 0.99 “leftEye”, 0.84 “rightEye”, 0.97 “leftEar”, 0.41 “rightEar”, 0.01 “leftShoulder”, 0.00 “rightShoulder” … 0.02 “leftHip”.

Once I determined that ml5 was working correctly, I drew the clown nose: a red ellipse drawn at the x and y co-ordinates of my nose. To do this I used the keypoint data at index 0 of the array, which corresponds to the nose info. To access this data I first needed to access index 0 of the poses array, which holds all the detected poses. This gives me the latest pose. Once I had the latest pose, I used the following to update the global variables noseX and noseY, e.g.

noseX = poses[0].pose.keypoints[0].position.x

noseY = poses[0].pose.keypoints[0].position.y

The result:


The nose following crashes when you go off screen! You need an if statement that checks whether at least one pose has been found before reading it. When no pose is found, the nose remains stuck at the last place you were on screen.
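The check described above can be sketched as a small function. It assumes the same poses structure used earlier in this post (poses[0].pose.keypoints[0].position); the name latestNose is hypothetical.

```javascript
// Hypothetical helper: returns the nose position from the latest pose,
// or the previous position when no pose was detected in this frame.
function latestNose(poses, previous) {
  if (poses.length > 0) {
    const nose = poses[0].pose.keypoints[0].position;
    return { x: nose.x, y: nose.y };
  }
  // Reading poses[0] on an empty array would throw, so keep the old value.
  return previous;
}
```

In a p5.js sketch this could be called from the pose callback, storing the result in the noseX and noseY globals.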



The red nose is too bouncy! I noticed that the red nose was a little jumpy as it moved from position to position. To fix this, I used the lerp function to smooth the values so that the nose doesn’t jump immediately to a new position. The value to use in the lerp function depends on what looks good to you. I tried 0.2 at first but this was too choppy, so I upped it to 0.5. Since I knew how to detect the nose, I added tracking for an additional keypoint, my left eye, which is at keypoint index 1.
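For readers who want to see what the smoothing does numerically, here is a small sketch. lerpValue uses the standard linear interpolation formula (the same idea as p5.js's lerp); smoothTowards is a hypothetical helper name, and the amount of 0.5 is just the value mentioned above.

```javascript
// Linear interpolation: amt = 0 returns start, amt = 1 returns stop.
function lerpValue(start, stop, amt) {
  return start + (stop - start) * amt;
}

// Hypothetical helper: move the drawn position part of the way
// towards the newly detected position on every frame.
function smoothTowards(current, target, amt) {
  return {
    x: lerpValue(current.x, target.x, amt),
    y: lerpValue(current.y, target.y, amt),
  };
}

// Two frames of smoothing from (0, 0) towards a detection at (100, 40).
const frame1 = smoothTowards({ x: 0, y: 0 }, { x: 100, y: 40 }, 0.5); // { x: 50, y: 20 }
const frame2 = smoothTowards(frame1, { x: 100, y: 40 }, 0.5); // { x: 75, y: 30 }
```

Each frame closes a fixed fraction of the remaining gap, so the drawn nose approaches the detected position gradually instead of jumping to it.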


Red nose is out of proportion! I learned that the distance between keypoints is bigger when you are closer to the camera and smaller when you are further away. Because I drew the nose at a fixed size, it looked really big when I was far away and really small when I was close. To fix this, I needed to estimate the camera distance and draw the nose in proportion to the distance between my eye and my nose keypoints. This corrects the proportions so that up close the nose is big, and far away it shrinks.
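One possible way to compute the proportional size is sketched below. It assumes keypoint positions are plain objects with x and y, and the scale factor is a made-up tuning parameter rather than a value from the tutorial.

```javascript
// Hypothetical helper: the nose diameter grows and shrinks with the
// on-screen distance between the nose and eye keypoints.
function noseDiameter(nose, eye, scale = 1) {
  const distance = Math.hypot(eye.x - nose.x, eye.y - nose.y);
  return distance * scale;
}

// Close to the camera the keypoints are far apart, so the nose is larger.
const nearSize = noseDiameter({ x: 0, y: 0 }, { x: 30, y: 40 }); // 50
// Further away the same keypoints are closer together.
const farSize = noseDiameter({ x: 0, y: 0 }, { x: 3, y: 4 }); // 5
```

The result can then be used as the width and height of the ellipse drawn at the nose position.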


Proportions are off



Fixed proportions

It is possible to continue adding effects, e.g. I could create sunglasses or a hat to go with my red nose. However, I did not like this approach, as it works best only for selfies and not full body poses: there are too many keypoints to keep track of when attempting to create a unique effect at each point, especially with the addition of lerping. You can also create an effect where there is no keypoint. For example, there is no keypoint for the top of your head, but you can use the positions of the right and left eyes to determine where a hat should go.
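The hat idea could be sketched roughly as follows. This is only one way to estimate a position that has no keypoint: it takes the midpoint between the eyes and moves up by a multiple of the eye distance. The helper name hatAnchor and the lift factor of 1.5 are assumptions, and the sketch assumes an upright head (no tilt).

```javascript
// Hypothetical helper: estimate where a hat should sit using only
// the two eye keypoints.
function hatAnchor(leftEye, rightEye, lift = 1.5) {
  const midX = (leftEye.x + rightEye.x) / 2;
  const midY = (leftEye.y + rightEye.y) / 2;
  const eyeDistance = Math.hypot(leftEye.x - rightEye.x, leftEye.y - rightEye.y);
  // In canvas coordinates y grows downwards, so "up" means subtracting.
  return { x: midX, y: midY - eyeDistance * lift, eyeDistance };
}

const anchor = hatAnchor({ x: 60, y: 100 }, { x: 40, y: 100 });
// anchor.x is 50 and anchor.y is 70 for these made-up eye positions
```

Because the offset is based on the eye distance, the hat position scales with how close the person is to the camera, in the same way as the nose size.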

Video Classification Example

I was toying around with the idea of having the algorithm detect an image in a video, so I explored video classification. It quickly dawned on me that this was a case for a custom model, as the pre-trained model seemed to work best only when generic objects were in view, e.g. at times it recognized my face as a basketball, my hand as a band-aid, my hair as an abaya etc. I also noticed that if I brought the objects closer to the screen, the detection was slightly better. Below are some of my findings using MobileNet Video Classification in p5.


Ideation & Exploring PoseNet with Webcam:

I wanted to leverage the power of PoseNet to track poses in music videos but also subvert its usage to create a trivia game that I called Name That Singer. The idea was to create a video that showed only the pose skeletons dancing, and a viewer would have to guess who the singer was based on the poses on the screen. I chose a viral video, Beyonce’s Single Ladies, that I assumed would be easy to figure out. I didn’t take into account how fast the dancers in the video were moving, and this made it hard to determine which song was playing when the skeletons were showing on the screen.

For this part, I decided not to use the lerping function to create a unique effect and instead used the pre-determined functions in ml5.js for PoseNet Webcam to capture the skeleton. These pre-determined functions were beneficial in this case: my points and skeletons are identical in aesthetic, so I was able to cut down on the coding needed. I followed the tutorial here, and instead of using the webcam I loaded my own videos.

Below are some screenshots from my testing. I also tested the poses when filters such as threshold, invert, and blur were added to the video and found that the tracking was really good. Even with cartoons.


Artists/Creative Coding Projects:

Chris Sugrue – She is an artist and programmer working across the fields of interactive installations, audio-visual performances, and experimental interfaces. website


source: Chris Sugrue

Delicate Boundaries – Light bugs crawl off a computer screen onto human bodies as people touch the screen, exploring how our bodies would interact with the virtual world if the world in our digital devices could move into our physical world.

I liked this project [Delicate Boundaries] because it explores beyond the computer screen. It could be cool to do something like this with PoseNet where, instead of just mapping onto the screen, poses are mapped onto the body.


Real-Time Human Pose Estimation in the Browser with TensorFlow.js – here

PoseNet with webcam in ML5.js – here

Github code: here

Monstrous Anonymity

  • Strategy:

This week the goal was to take a step back and work not with technology, but against it (obviously still with, but, you know). My background in photography has always made me slightly fascinated by facial recognition, particularly since the liquify tool in Photoshop started to incorporate facial recognition in order to streamline beauty retouching. This, as a tool, generally promotes problematic and harmful ideas around beauty, but can also be used to create somewhat monstrous manipulations of the face. I wanted to explore where this technology starts to break down: when does the manipulation stop registering as a face? When does my face stop being my face to something like google? The ultimate goal here is to make a photoshop action that would take a normal photo of a face and make it not recognizable as human or not identifiable as a specific person.

  • Documentation:

I started off by turning on the facial recognition in my google photos. It took the better part of a day to scrub through all of my photos (26,260), but once set up google auto-configures a series of albums that are groupings of the same face. The first photos google has access to of me are from 2010, which as an interesting side note is 3-4 years before I started transitioning, but you can’t fool google!

Or can you???

I have been told in passing that the forehead and general symmetry of the face are the things to manipulate to try and confuse google computer vision, so I made a few different liquify presets, then reapplied each one to the same photo and uploaded the results until google stopped recognizing the face as me. I liked the idea of using the facial recognition of photoshop to confuse the facial recognition of google, so I wanted to keep the parameters to the parts of the face that can be directly targeted by photoshop.

(as an aside, the rediscovery of many photos from the last 10 years was not always pleasant, so I wouldn’t necessarily recommend it as something to go into in an unconsidered way if you’re a person who may experience *feelings*)


Starting Photo!


Experiment ONE:

run Once (still me)


Run Twice  (STILL ME)


Run Three Times (NOT ME)


Experiment TWO

It took 5 rounds of this effect to get to a place where google wouldn’t see me.








A disturbing Gif:

Experiment THREE







Another Gif:

Here are the photoshop actions for people to play with themselves!

  • Insights:

This experiment felt less insightful and more inspiring of more questions. I’m curious as to what the results would be if I were to play with colour editing, or noise, or transparencies. It was surprisingly difficult, or rather, the warping effects felt as though they needed to be quite extreme in order to be effective, which was unexpected. It feels as though there should be more errors if the parameters for what registers as my face are so broad, like other people should be getting caught in that net, but they are not. This is part of why I am curious about the manipulation of colour or noise in a photograph for potential further tests. When I was sharing the results with some friends, one of them mentioned that the eyebrow bridge between your eyes is very crucial in how our faces get read, but that part of the face is not targetable by the liquify panel so would be much harder to incorporate into an action.

  • Information sources:

Photoshop’s 2015.5’s new Face Aware Liquify for Portrait Retouching –

  • Next Steps:

It would be nice to make a site where people upload photos of themselves and get back a series of results in which they are unidentifiable. Or maybe even simpler, just a gallery for uploading other people’s photos too, after they have run the actions in photoshop. I envisioned this primarily as a weird little art project, so it would be interesting to display the results together. If I was going to make it much larger I would try to incorporate some of the colour testing to get more interesting photo outputs, and would be interested to do more precise testing to see if I can discover more information about exactly where the line for identifiability is.

Brief Overview of Artificial Neural Network Systems

Last week I gave myself a crash course on Neural Networks and Machine Learning in an attempt to digest the concepts behind the core algorithms and processes of artificial neural networks, machine learning and agents of artificial intelligence. The topics are often discussed from a mathematical perspective; I wanted to review a set of readings and make my own interpretation of what I found in terms of programming terminology for the sake of applying the concepts in my research.

Artificial Neural Networks (ANN) are assemblies of artificial neurons, sometimes called units, which are not designed to imitate actual neurons but modeled after biological findings of how the brain works. We are not quite at a stage of scientific discovery where we can say with certainty how neurons work individually or within a network in the brain but scientists have nonetheless created digital models of their theories of how neurons communicate. These ideas have carried over to computer science as inspiration for how to develop and model algorithms that seemingly have the capacity to perceive, learn, and make decisions based off of sensory data.

Above: a model of the neuron (Cichocki)

There are many artificial models of the neuron, but the general idea is that neurons take in a numerical array of information (often binary) and pass the array through a weighted threshold in order to determine an output value. The output is also represented numerically, often as a value between 0 and 1 or -1 and 1, in binary or analog format. The output is dependent on the weighted value threshold, which determines how ‘confident’ the neuron is that the input passes or fails its threshold (e.g. if at least 60% of the inputs are 1’s, output a 1 signal). In some models, the output can be fed back to a weighting algorithm within the neuron to determine whether the weighting threshold should be modified and to what degree.
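In programming terms, the threshold neuron described above might look like the following sketch. This is my own simplified interpretation, not a model taken from the readings: the function name thresholdNeuron and the equal weights of 0.2 are assumptions chosen to mirror the "60% of the inputs" example.

```javascript
// A minimal threshold neuron: multiply each input by its weight,
// add the results, and output 1 if the sum reaches the threshold.
function thresholdNeuron(inputs, weights, threshold) {
  const weightedSum = inputs.reduce(
    (sum, value, i) => sum + value * weights[i],
    0
  );
  return weightedSum >= threshold ? 1 : 0;
}

// Five binary inputs with equal weights of 0.2 each, so the weighted
// sum is simply the fraction of inputs that are 1.
const equalWeights = [0.2, 0.2, 0.2, 0.2, 0.2];
const fires = thresholdNeuron([1, 1, 1, 1, 0], equalWeights, 0.6); // 1 (80% are on)
const silent = thresholdNeuron([1, 1, 0, 0, 0], equalWeights, 0.6); // 0 (40% are on)
```

Changing the weights makes some inputs count for more than others, which is the part that a learning process would adjust.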

Above: ANN arrangements: a) FeedForward b) FeedBack c) cellular arrangement (Cichocki)

Neurons can be arranged in a number of ways within an Artificial Neural Network. Their outputs are fed to the inputs of other neurons, which can form arrangements such as the examples above. FeedForward is essentially a chain of neurons, while FeedBack incorporates neurons dedicated to returning incoming information back into the chain. (Artificial) cellular arrangements involve neurons with multiple, nonlinear connections with each other.


In many neural networks, these chains occur in layers such as described in the feedforward multi-layer Perceptron model above, one of many models used for designing neural networks. Multiple neurons with very similar input domains and tasks are arranged to communicate with neighboring layers but not with neurons on the same layer. These networks can have multiple layers, but apparently run most effectively with three layers for most purposes. The layers themselves do not have to have the same number of neurons as any of the other layers. This arrangement of neurons allows the Perceptron to analyze data in chunks instead of individually at a ‘pixel’ scale.

For example, in an image recognition process a specialized neural network might be tasked to analyze the direct capture of data from a camera in one layer (possibly split across red, green, and blue channels), determine small clusters where brightness values form discrete lines or patterns in a second layer, then determine if these clusters form a particular shape in a third, then finally feed this information to a single neuron to determine how closely the collective analysis from the third layer matches a ‘learned’ model for a cat (many machine learning algorithms undergo a ‘training process’ on labeled examples to prepare a neural network model to recognize certain features; those that instead learn from unlabeled data are considered ‘unsupervised machine learning algorithms’).
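The layered arrangement described above can be sketched as a small forward pass, where each layer's outputs become the next layer's inputs. This is an illustrative toy under my own assumptions, not code from the readings: the sigmoid activation, the bias term, and the names layerForward and networkForward are choices made for the example, and the zero weights stand in for values that training would normally set.

```javascript
// Squashes any number into the range 0 to 1.
function sigmoid(x) {
  return 1 / (1 + Math.exp(-x));
}

// One layer: every neuron sees all inputs from the previous layer,
// but neurons on the same layer do not talk to each other.
function layerForward(layer, inputs) {
  return layer.map((neuron) => {
    const sum = neuron.weights.reduce(
      (acc, weight, i) => acc + weight * inputs[i],
      neuron.bias
    );
    return sigmoid(sum);
  });
}

// The whole network: feed the signal through the layers in order.
function networkForward(layers, inputs) {
  return layers.reduce((signal, layer) => layerForward(layer, signal), inputs);
}

// Two inputs, a hidden layer of three neurons, and one output neuron.
// Layers do not need to have the same number of neurons.
const toyNetwork = [
  [
    { weights: [0, 0], bias: 0 },
    { weights: [0, 0], bias: 0 },
    { weights: [0, 0], bias: 0 },
  ],
  [{ weights: [0, 0, 0], bias: 0 }],
];

const toyOutput = networkForward(toyNetwork, [1, 0]); // [0.5] with all-zero weights
```

With untrained (all-zero) weights the output sits at 0.5, meaning the network has no opinion either way; training would adjust the weights so the final neuron moves towards 1 for a match and towards 0 otherwise.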

In essence, ANNs use the collective processing power of many smaller units in order to solve higher level problems. The fact that these machine learning algorithms are being brought up in our Body-Centric class is rather eye-opening with regards to how such technologies might be integrated with bodily technologies. A friend of mine referred me to an example of neural networks being integrated with prosthetics, where the prosthetic limb interpreted incoming electrical signals and gradually learned to perform actions reflective of its user’s intentions. If I remember correctly, I believe he was describing imitative machine learning, where the machine tries to imitate particular ‘memories’ that are implanted in it during its training process; perhaps such a device could imitate human hand motions. I wonder if and how some of these concepts could be carried forward to some of the sensory technologies we experimented with in the past, including vision, the EMG sensor, etc.


Cichocki, Andrzej. “Neural Networks for Optimization and Signal Processing” Chichester, NY: J. Wiley, 1993. Print.

Buduma, Nikhil. “Fundamentals of Deep Learning” Sebastopol, CA: O’Reilly Media, Inc., 2017. Print.

Castano, Arnaldo Perez. “Practical Artificial Intelligence” Apress, 2018. Print.