Explorations #3

Technologies and short projects

Slit Scan with ml5.js

These last couple of weeks I have been looking at a few different techniques. One of the things I tried out was slit scanning, which I wanted to combine with ml5.js. With this experiment I wanted to observe whether the trained model could still recognize faces if the participant stood still.

This did not seem to work so well. The slit scan worked, but the way the frame was spliced by p5.js in the code made it hard for the ml5 system to work.
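The slit-scan core itself is simple to sketch in plain JavaScript. Below is a minimal, hypothetical version of what the p5.js sketch was doing (the function names are mine, not from the original sketch): each incoming video frame contributes one vertical column of pixels to the output image, where frames are flat RGBA arrays as in p5's pixels[].

```javascript
// Copy column srcX of a w×h RGBA frame into column dstX of an output buffer.
function copySlit(frame, out, w, h, srcX, dstX) {
  for (let y = 0; y < h; y++) {
    const si = 4 * (y * w + srcX); // source byte offset (4 bytes per pixel)
    const di = 4 * (y * w + dstX); // destination byte offset
    for (let c = 0; c < 4; c++) out[di + c] = frame[si + c];
  }
}

// Build a slit-scan image from a sequence of frames: frame i contributes
// its centre column to output column i (wrapping when the canvas is full).
function slitScan(frames, w, h) {
  const out = new Uint8ClampedArray(4 * w * h);
  const mid = Math.floor(w / 2);
  frames.forEach((frame, i) => copySlit(frame, out, w, h, mid, i % w));
  return out;
}
```

In a live p5 sketch the frames would come from the webcam capture each draw() call; because every output column belongs to a different moment in time, a face in the result is smeared unless the sitter holds perfectly still, which is what made the experiment hard for the face model.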


CLM Tracker

This week we had also started looking at CLM tracker.

“clmtrackr is a javascript library for fitting facial models to faces in videos or images. It currently is an implementation of constrained local models fitted by regularized landmark mean-shift, as described in Jason M. Saragih’s paper.” (https://github.com/auduno/clmtrackr)




CLM tracker is better at tracking the face, and it works from many points on the face. Because of this detailed mapping, CLM tracker has some unique capabilities for mapping interesting shapes onto the face. This code was released in 2016 and is almost a precursor to some of the filter technology that exists now.


Its key unique feature is this mapping of shapes onto the face.


Sometimes the tracker works well; however, most of the time it takes a while to locate the face. It isn’t as steady as ml5, which makes it hard to code with, as we found out while trying to map shapes onto each key point.


One of the unique examples built with clmtrackr is the emotion reader, which recognizes an emotion based on where each key point is expected to sit on a particular face. The interesting insight for me was that having my facial emotions recognized encouraged me to contort my face into different expressions to take on different emotions. I also wanted to make some notes on what the potential of this kind of technology could be in the future.
This led me to research the topic further, to see what is out there in terms of the pros and cons of this type of technology. As clmtrackr is an open-source library compatible with p5.js, it isn’t as sophisticated as what is out there commercially. Emotion recognition is a subfield of the facial recognition industry, which is said to grow from 19 billion to 37 billion USD. (https://neurodatalab.com/blog/how-do-technologies-recognize-our-emotions-and-why-it-is-so-promising/)


Open frameworks

One of our goals throughout this process was also to look at openFrameworks, a creative-coding toolkit written in C++. Earlier in the course we had watched some of Zach Lieberman’s videos, which gave us interesting insights into the openFrameworks platform. We downloaded openFrameworks and Xcode, the IDE used for the code. The advantage of starting to use openFrameworks is that we move away from the browser, which is the main constraint with p5.js.


We tried out the examples. The first one was the blob example which is great for hand tracking.

Body as Controller / Making Web Games

Objective: Learn how one’s body or body parts can be used to interact with objects on the screen, where the body’s gestures become the input or the controller.

Interacting with objects on a screen:

We hit a snag and discovered that there’s not much documentation on using openFrameworks with the Kinect, and that the body tracking we wanted to do is possible with the Kinect V2 and the Microsoft SDK, which is exclusive to Windows. We also don’t have access to a Kinect V2. As an alternative, Daniel Shiffman uses the Kinect V1’s depth tracking of a single pixel to “track a body part” in theory; however, this method isn’t as intuitive or seamless as PoseNet. The problem with PoseNet is that it is difficult to differentiate unique poses, as all the poses are stored in an array and you get back the latest pose. Perhaps I could play with the interval between saving poses, but I think this may interfere with the real-time tracking aspect.
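The “interval between saving poses” idea can be sketched in plain JavaScript. This is a hypothetical helper, not part of the ml5 API: the live pose callback keeps firing at full rate (preserving real-time tracking), but a pose is only recorded when enough time has passed since the last saved one, so the saved poses stay distinguishable.

```javascript
// Record at most one pose per interval; all other poses pass through untouched.
class PoseSampler {
  constructor(intervalMs) {
    this.intervalMs = intervalMs;
    this.lastSaved = -Infinity;
    this.samples = [];
  }

  // In a real sketch this would be called from the poseNet 'pose' callback,
  // with nowMs coming from millis() or Date.now().
  onPose(pose, nowMs) {
    if (nowMs - this.lastSaved >= this.intervalMs) {
      this.samples.push(pose);
      this.lastSaved = nowMs;
    }
  }
}
```

With, say, a 100 ms interval, poses arriving every frame still drive the display, while `samples` holds a sparser sequence that is easier to compare pose-to-pose.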

Hand-position detection.

  1. Web-based using p5.js and PoseNet

I began with a tutorial on p5.play.js, a game library for p5.js. The good thing about this library is that you don’t have to do the math: it calculates where to position your sprite based on its x, y position and the velocity you set for the sprite, so when you change the x, y position the library does the calculation for you.
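Roughly, the calculation the library handles for you is a per-frame position update from velocity. A simplified sketch of the idea (not p5.play’s actual code):

```javascript
// Each frame, advance a sprite's position by its velocity.
function step(sprite) {
  sprite.position.x += sprite.velocity.x;
  sprite.position.y += sprite.velocity.y;
  return sprite;
}
```

So setting `sprite.velocity.y` to a positive value makes it drift down the canvas a little more every frame without you ever touching the position directly.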

Part 1. I worked through Allison Parrish‘s p5.play tutorial and did the following activities: making a single sprite, sprites on the move, following the mouse, mouse events, multiple sprites, events on multiple sprites.

So far I really like how the library works and the functionality it provides. I was able to mimic gravity, causing rectangle sprites to fall down, bounce back up when they hit the ground, disappear when the mouse was waved over them, set a score when a sprite disappeared, and handle multiple sprites on the screen. I wasn’t too sure how to add my own function to the Sprite class. I may have to create a custom class that extends Sprite, or maybe a custom class that holds a Sprite as a variable.

From my explorations today, I’d like to look into how I can remove the spr.onMouseOver() and use a body part as the mouseover so that for example when you wave your hand over a brick it would disappear. I’d also like to continuously randomize the position of the bricks so that they aren’t static and are more challenging to swat.

I finished working on the second part of Allison Parrish’s p5.play tutorials, covering: sprite groups, collisions – overlap(), collide(), group collisions, collision callbacks, images and animations.

Things I learned:

  1. The overlap() and collide() functions determine the kind of interaction two sprites can have. overlap() allows one sprite to move on top of the other, while collide() makes the second sprite solid, blocking the other sprite’s path. How it works:

spr1.collide(spr2); // sprite 1 collides with sprite 2, which blocks its path

spr1.overlap(spr2); // sprite 1 can move on top of sprite 2

  2. The Group() class allows you to create multiple similar sprites and store them together, almost like an array. It uses the add() and remove() functions to build the group.
  3. To create an animated sprite, you need three image variations of the sprite to be played in sequence to create the animated effect.
  4. To keep score and display points, you can use a global score variable and update it whenever a sprite is removed.
  5. Collision callbacks allow you to call a function whenever two sprites overlap or collide, so if a player overlaps a coin you can have a getCoin() function, and if a player runs into a wall you can have a getAHeadache() function. The callback takes the player sprite and the other sprite as parameters.
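In plain JavaScript, a collision callback boils down to an overlap test plus a function call with both sprites. A minimal sketch assuming simple axis-aligned rectangles (an illustration of the pattern, not the p5.play implementation):

```javascript
// True when two axis-aligned rectangles {x, y, w, h} overlap.
function overlaps(a, b) {
  return a.x < b.x + b.w && b.x < a.x + a.w &&
         a.y < b.y + b.h && b.y < a.y + a.h;
}

// Invoke the callback once per overlapping pair, as in
// player.overlap(coins, getCoin) in p5.play.
function checkOverlap(player, others, callback) {
  for (const other of others) {
    if (overlaps(player, other)) callback(player, other);
  }
}
```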

Constraints: Because PoseNet updates the poses so quickly, I think it isn’t a good choice for a game where you need complete accuracy. I was thinking of making a snake game, but I can’t figure out how to use a body part to control the snake, because the body part doesn’t move in a straight line and the snake needs to. I might come back to this idea, but I’ll begin with a simple paddle game where the hand position controls a paddle. I will keep the y position constant and just have the x position change according to where the player is holding their hands, so that they don’t have to keep their hands in a certain position, which simplifies the interaction.

Step 1. Creating the paddle.

I was having trouble getting the PoseNet model to track my wrists; the shapes would jump all over the screen. Only the nose and eyes worked well. I think this may be due to the area the camera can see and also the light in the room: since my face is closer to the camera, it is easier and clearer to track. I switched to using Google’s Creatability library and had better results; however, this meant that I couldn’t track both hands to create two paddles.

The paddle is created using a simple rectangle shape whose y value is constant i.e. 380 and whose x value is updated depending on the x value of the right wrist.


Step 2. Creating Walls

When the user’s hand goes off the screen the paddle follows off the screen too. To prevent this I created 4 wall sprites set to collide with the paddle so that it remains in the canvas. For this I created a group of sprites called walls and set the player to collide with any sprite from the wall group.
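The wall sprites solve this through collisions; the same constraint can also be written directly as a clamp on the paddle’s x position, a hypothetical alternative to the sprite-based walls:

```javascript
// Keep the paddle's left edge between 0 and (canvasWidth - paddleWidth).
function clampPaddleX(x, paddleWidth, canvasWidth) {
  return Math.min(Math.max(x, 0), canvasWidth - paddleWidth);
}
```

The sprite walls have the advantage of also interacting with the ball, which a bare clamp does not.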



Step 3. Adding a bouncing ball

  • Set the ball to fall; when it collides with the paddle it stops falling. To test the interaction, I set the two to change colour upon colliding.

The ball sprite before collision with paddle


The ball sprite after collision with paddle

I was able to get the ball to bounce off the paddle and the walls post-collision; however, I wasn’t able to get the ball to speed up. Whenever I added the speed, it would bounce and go off the screen. See: bouncing ball sprite.

Step 4. Adding complexity

I was able to speed up the ball a little bit by upping the velocity in the y direction. To add some complexity, I added a killer group that reduces the size of the paddle upon collision, killers are randomly generated every time a player successfully bounces the ball. I also added a coin group, that allows the player to score points if they collect falling coins.

Step 5. Controlling using the body

I switched out the mouseX for pX, which is updated using the x position of the wrist, tracked using Google’s Creatability library. The value is used to update the velocity of the player. I noticed that when using one’s hand as the controller, the paddle becomes a little jittery and the movement isn’t as fluid as when using mouseX. This could be because of the light in the room; moving away from the webcam seems to improve things a bit. Also, running the webcam and the p5.play.js library at the same time seems to overwhelm the browser, and I get lags that didn’t show up when I was just tracking the mouse.

player.velocity.x = (pX - player.position.x) * 0.1;
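The line above is proportional control: each frame the paddle’s velocity is a fixed fraction (0.1) of its remaining distance to the hand’s x position, so it eases toward the target instead of snapping, which also smooths out some of the tracking jitter. A standalone sketch of the same idea:

```javascript
// Velocity proportional to the remaining distance to the target.
function followVelocity(targetX, currentX, gain = 0.1) {
  return (targetX - currentX) * gain;
}

// Apply the update repeatedly; the position converges on the target,
// closing 10% of the remaining gap each step.
function settle(targetX, x, steps, gain = 0.1) {
  for (let i = 0; i < steps; i++) x += followVelocity(targetX, x, gain);
  return x;
}
```

A larger gain makes the paddle more responsive but more jittery; a smaller gain makes it smoother but laggier.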


I ended up having a lot of trouble trying to run the Creatability library with p5.play.js, and ended up switching back to PoseNet on ml5.js. I also switched tracking to the nose instead of the wrist, as this proved to be much smoother. Unfortunately the webcam is still flipped, so having the skeleton showing as the player plays is a little disorienting, as movements are mirrored; adding the drawing actions to the translation matrix that flips the video didn’t change anything.
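One possible workaround for the flipped-webcam issue (an assumption on my part, not something tested here): instead of flipping the whole canvas with translate/scale(-1, 1), mirror the tracked x coordinate itself before using it, and leave the drawing untransformed.

```javascript
// Reflect an x coordinate across the vertical centre line of the canvas.
function mirrorX(x, canvasWidth) {
  return canvasWidth - x;
}
```

The player’s rightward movement then maps to rightward paddle movement even though the raw camera coordinates are mirrored.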

link to code on github: here



Explorations #2


Weeks 3 and 4 were dedicated to running short experiments and continuing research on projects that were interesting in the way they combined the body and computer vision.

I have divided the blogpost up into:

  1. Mini-experiments
  2. Research

1. Mini-Experiments

For the mini-experiments I started out by simply sketching out body points using PoseNet and the ml5.js library.



I referenced an example sketch from the PoseNet library to draw a skeleton first. The idea of the sketch was to understand how the points connect, as a basis for further exploration. I want to see how I could make two points collide in order to make something happen. This “something” could take the form of a sound or a visual.

I continued the exploration by focusing on facial tracking, as I am drawn to the idea of facial recognition.


These snippets of trying out the code were to help with understanding all trackable points.

Another experiment I tried was bodyPix(). This code tracks only the body and masks out anything that is not the body. I wanted to see if, instead of the dull black background, I could use a function to load an image that would act as the background. Since the sketch is real-time and dynamic, always moving, the code redraws the colour pixels every frame.


The code was pretty good at recognizing the body, but it wasn’t steady enough; it kept tracking the painting behind me as part of the body as well.


The code for this was:

let video, bodypix, segmentation;

// segmentation options (these values are the ml5 defaults)
const options = { outputStride: 16, segmentationThreshold: 0.5 };

function setup() {
  createCanvas(320, 240);

  // load up your video
  video = createCapture(VIDEO);
  video.size(width, height);
  video.hide(); // hide the video element, and just show the canvas
  bodypix = ml5.bodyPix(video, modelReady);
}

function modelReady() {
  bodypix.segment(gotResults, options);
}

function gotResults(err, result) {
  if (err) {
    console.error(err);
    return;
  }
  segmentation = result;

  image(video, 0, 0, width, height);
  image(segmentation.maskBackground, 0, 0, width, height);

  // keep requesting segmentations so the sketch stays live
  bodypix.segment(gotResults, options);
}


Using an image as the background did not work, nor did colours other than black. This is something I would like to come back to in the coming weeks for further exploration.
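For reference, here is how the image-background idea could work at the pixel level, a hedged sketch rather than working ml5 code: composite per pixel, taking the video pixel wherever the mask marks the person and the background-image pixel everywhere else. Treating an opaque mask pixel as “person” is an assumption about the mask format, not documented ml5 behaviour.

```javascript
// videoPx, bgPx, maskPx and out are flat RGBA arrays of equal length,
// as in p5's pixels[].
function composite(videoPx, bgPx, maskPx, out) {
  for (let i = 0; i < out.length; i += 4) {
    const person = maskPx[i + 3] > 0; // assumed: opaque mask pixel = person
    const src = person ? videoPx : bgPx;
    for (let c = 0; c < 4; c++) out[i + c] = src[i + c];
  }
  return out;
}
```

In a sketch, this would run inside the segmentation callback, with `bgPx` coming from a loadImage() call.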


2. Further research

Body, Movement, Language:AI sketches with Bill T.Jones

This experiment was built with TensorFlow by the Google creative team together with the famous choreographer Bill T. Jones. It was remarkable in the sense that it used real-time tracking and speech recognition as a way to track and move words in real time. This opens up possibilities for storytelling through this medium.



Space-Time Correlations Focused in Film Objects and Interactive Video 

I wanted to see how I could also understand these mini-experiments through a theoretical lens. I really wanted to understand the idea of time and space through this medium. I also got interested in the applications of these technologies and the future of cinema. Will the future of cinema change due to our experimentations with these new technologies?

A number of contemporary and more recent art projects have transformed film material into interactive virtual spaces, in order to break through the traditional linear quality of the moving image and the perception of time, and at the same time to represent, or to visualise, the spatial aspects of time. In the times of resampling, the concentration on a relatively old picture medium and its transformation into a space-time phenomenon open to interactive experience does not seem surprising. The results of these experimental works exploring and shifting the parameters of the linear film are often oddly abstract and quite expressive in their formal composition, and consciously elude simple legibility.

Body Tracking Experiments

Real-time Human Body Tracking Explorations

Interacting/ Creating with your face: Painting with my nose.

This exploration was inspired by this article that explores turning one’s nose into a pointer & Experiments by Google which explores creative tools accessible for differently-abled people.


Track your nose and use it to draw a picture.

Questions I’m trying to answer:

In thinking of this idea, I was wondering: how could I make it so that the person, using only their nose as a controller, can start and stop the drawing action, so that it doesn’t end up being one continuous line drawing all the time?

Solution 1: Maybe I could track the distance of the nose from the webcam, so that whenever the person leans forward or backward the drawing action would turn on and off.

Solution 2: Another idea would be to use a different part of the body to toggle the drawing action.


Creatability, though it makes working with PoseNet much simpler, is restrictive: in simplifying the process, it only allows you to track one body point at a time. I will be returning to the PoseNet with ml5 tutorial, which does the same thing but with less abstraction.

One thing I like about Creatability is that, since a lot of the code is abstracted, you can refer to parts using strings such as ‘nose’ or ‘rightEye’ instead of having to remember indexes into the pose array as with PoseNet. The library would work well in instances where you only need to track a single part of the body.

Next I tried using leaning back and forth to change the background of an image. Below are the results of my explorations. To determine the distance from the screen, I calculated the distance between my eye and my nose: when a face is closer to the screen the distance between the two is larger, and it is smaller when the person leans back. I then hid the video and tested leaning back and forth to change the background of my sketch, as seen in the green and red. I set a range whereby, when the distance was less than 25 or greater than 70, an isDrawing boolean was set to false.
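The lean check described above can be sketched as a small predicate. The thresholds (25 and 70) are the trial-and-error values from the experiment, and the function name mirrors the isDrawing boolean mentioned above:

```javascript
// Eye-to-nose distance acts as a proxy for distance from the camera:
// drawing stays on only inside the comfortable middle range.
function isDrawing(eyeNoseDist, far = 25, near = 70) {
  return eyeNoseDist >= far && eyeNoseDist <= near;
}
```

Leaning far back (small distance) or very close in (large distance) both switch drawing off, which is what produced the green/red background change.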


I was not able to create the drawing-lines-with-my-nose effect, because for some reason the drawing action is flipped, so there is a mirroring effect that makes it confusing when trying to think and draw. Additionally, I couldn’t figure out how to make a continuous line instead of dots. When tracking the mouse I would have been able to use the mouseX, mouseY and pmouseX, pmouseY positions, but I wasn’t able to do that while tracking my nose. I believe this would be possible if I created an array, stored all the points where my nose had been, and then used the draw function to draw those points instead. I decided to continue exploring other things. Below is a screenshot from my attempts at drawing with my nose.
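The array-of-points fix proposed above can be sketched like this (hypothetical helper names): store every nose position, then render line segments between consecutive points instead of isolated dots, which is what p5's pmouseX/pmouseY give you for free with the mouse.

```javascript
// Append the latest tracked nose position to the trail.
function addPoint(trail, x, y) {
  trail.push({ x, y });
  return trail;
}

// Pairs of consecutive points, i.e. the segments a draw() loop would
// render with line(a.x, a.y, b.x, b.y).
function segments(trail) {
  const segs = [];
  for (let i = 1; i < trail.length; i++) segs.push([trail[i - 1], trail[i]]);
  return segs;
}
```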


Observations: Using leaning as an interaction turned out not to be so great. It was hard to control the reaction to the movements, because you had to be still and lean at the same time. Additionally, it just didn’t feel natural, especially for drawing. Perhaps a different body part would have been better, although I feel this just adds more complexity to the interaction.

Reading through past projects and works:

Text Rain (1999) by Camille Utterback is an art installation where one’s body is tracked and used to interact with digital text. The piece uses black-or-white colour thresholding to detect where the body is and then animates the text accordingly. I think it would be interesting to see if I can achieve something similar with the body-tracking technologies we have today. It might be harder to achieve with PoseNet, as it doesn’t give you the skeleton region; perhaps it would work with the Kinect. The animation below is an example from the PoseNet sketchbook that tracks a point on the body as a person dances, then draws animated text along that path. I will look into this further in our interaction explorations for weeks 5 & 6.


Technology as a tool vs medium? 

In my explorations this week I keep coming across artists and technologists who are asking “what can the technology do for me”, and the idea of this body-tracking tech as a tool to convey meaning, e.g. “We quickly discovered that PoseNet was only interesting to Bill if it helped him convey meaning. The tech wasn’t an end in itself, it was only useful to him as a tool for artistic expression.” I also came across the same sentiment on Utterback’s website, and it is something I will be reading more about this week. Utterback links to this chapter as a reference for her Text Rain project on her site: The Tool Model: Augmenting the Expressive Power of the Hand, from Inventing the Medium – Principles of Interaction Design as a Cultural Practice. “A lesson from the design of the google experiment was that the technology itself shouldn’t be the star of the show…push the boundaries of digital interaction design beyond the current standard of clicks, presses, and taps” – Maya Man








This blog post is set to be an exploration for what will be a 12-week study on computer vision, specifically “using the body as a controller”. In the first two weeks we conducted broad research which involved looking at:

1- Technology

2-Lit review

3-Artists in Computer Vision

4-Computer vision in mobile technology

5-Ethics and design of computer vision



We got the chance to experiment with the Kinect 1, thanks to the library written by Daniel Shiffman for using the Kinect in conjunction with Processing (using Kinect with Processing). We set up the Kinect and ran the example code provided in the library. This helps in understanding the code as well as quickly trying out possibilities of using the Kinect in creative coding. An interesting question which arose during our experimentation with the Kinect was the idea of using it as a camera to capture narrative in real time, akin to a film camera.

2- Research (Literature Review)-

There are many artists and designers who have used the idea of computer vision in a variety of ways to explore the possibilities of computer vision in combination with machine learning. Below is a brief list of research we conducted to learn different approaches taken around the subject.


Fei-Fei Li: If We Want Machines to Think, We Need to Teach Them to See

“Computer vision, Li argues, is the key enabling technology for all of AI. Understanding vision and building visual systems is really understanding intelligence”.

Computer Eyesight Gets a Lot More Accurate, by John Markoff, August 18, 2014. https://bits.blogs.nytimes.com/2014/08/18/computer-eyesight-gets-a-lot-more-accurate/

“Machine vision has countless applications, including computer gaming, medical diagnosis, factory robotics and automotive safety systems. Recently a number of carmakers have added the ability to recognize pedestrians and bicyclists and stop automatically without driver intervention”.


If you’re a darker-skinned woman, this is how often facial-recognition software decides you’re a man. https://www.media.mit.edu/articles/if-you-re-a-darker-skinned-woman-this-is-how-often-facial-recognition-software-decides-you-re-a-man/

“Automated systems are not inherently neutral. They reflect the priorities, preferences, and prejudices—the coded gaze—of those who have the power to mold artificial intelligence”.


Cornell University CS5670:Intro to Computer Vision

Cornell University CS5670: Intro to Computer Vision Instructor: Noah Snavely

This is a pdf from a class offered on Computer Vision. It gives a useful guide to what computer Vision entails.


  1. Will computer vision ever be as good as human vision?

Most probably not, as human vision is better at perception. However, due to some shortcomings in human perception, computer vision can detect a vast number of things and aids humans in many activities. I picked a few slides and have attached the areas I find the most interesting in this vast field.


This may not be much of a surprise, since most phones these days use face detection technology to unlock our phones. The interesting question was: why do phones detect faces in real time? I couldn’t find a satisfactory answer, but I did find plenty of information on the applications that have spurred face detection in particular, especially from social media companies like Snapchat.

Computer Vision in Mobile Phones

Computer vision and the future of mobile devices

Google’s project tango



This area was extremely exciting to me from the exploration angle. In the last few years we’ve seen a rise of facial filters and facial tracking. Recently Facebook’s company Instagram allowed computer vision artists to become developers and release facial filters.

2. What is the current state of affairs?

A lot of the applications have been developed in the last five years, hence making this a very active field of research.

For a list of companies and research active in this field David Lowe’s website maintains a comprehensive list. This list is mostly dedicated to corporations actively working in the field.

Most of these companies work on building large databases to aid machine learning and training. A glance at the list, and further into the websites, was personally informative in understanding how corporations are using this technology.

3. Artists in Computer Vision

Golan Levin

Golan Levin is, in many ways, the artist best known for combining computer vision with creative coding in ways that question interaction.

Golan Levin has also been instrumental in collaborating with many artists within this field. His work is the basis on which a lot of social media filters were built.

For a list of his work his website flong provides a list of all his projects.

One of my favourite projects is :


Kyle McDonald


This video gives a great introduction to his work.

Dan Oved

Dan Oved’s final project at NYU’s ITP


Background: for my final project for Intro to Physical Computing and Design for Digital Fabrication at ITP, I created Presence, a kinetic sculpture that moved in the direction of a viewer’s gaze. It used a gaze-detection convolutional neural network to estimate gaze position for a single user. I showed this installation at the 2017 ITP Winter Show.

Lisa Jamhoury – Artist


Lisa Jamhoury works with facial and pose recognition software to build interactive art through creative coding.

Her work with the Kinect 2 helps connect its libraries to different editors such as p5.js and Processing.

Isabel Hirama – algoRhythmic Salsa (PoseNet AI Assisted Salsa Lessons)

Real-time feedback using a laptop’s camera to teach salsa.

Stanford and MIT both lead the way in terms of research into neural networks, machine learning and training of computer vision.

Stanford Vision Lab


MIT Media Lab



Lisa Jamhoury – Artist


Lisa is a senior experience designer in machine intelligence at Adobe. Her work uses data from the Kinect 2 through Kinectron, a library and API she developed. This is an interesting application in terms of the ability to control multiple Kinects in multiple locations.






Joy Buolamwini


Joy Buolamwini is a poet of code who uses art and research to illuminate the social implications of artificial intelligence. She founded the Algorithmic Justice League to create a world with more ethical and inclusive technology.


Gender Shades is a project by Joy Buolamwini which focuses on the idea of the “coded gaze”. Joy’s research focuses on the biases within these systems stemming from the training datasets that were available. The research data showed that these classifiers are more likely to misclassify gender for darker skin shades.

Zach Lieberman


Zach Lieberman is a co-creator of openFrameworks. His most famous project is the EyeWriter, which uses the eyes to draw. The design of this project focuses on harnessing computer vision and augmented reality to enable people with disabilities to draw in real time.



3- Computer Vision in Mobile Technology

Computer vision and machine learning models have gotten better in the last five years. One of the reasons is the rise of mobile technology and its various applications, including social media platforms. In the last few months of 2019, platforms like Instagram have opened up to artists making “new filters”. This included the release of Facebook’s proprietary software, Spark AR. I downloaded Spark AR to understand how it works; its face tracking is very good. I tried out the features, and this is interesting as it opens up the field to artists with no previous coding knowledge.

Trying out Spark AR face tracker



Spark AR tutorial


There are many artists using this platform, including Zach Lieberman. I found this a potential area for further research. To understand it further I conducted an experiment where I would try out a different filter every day and post it on my social media. As somebody who is very averse to the idea of selfies, I felt extremely comfortable behind these “masks”.

This made me think of research questions and thoughts that I would like to carry forward for this class.

a) With body- and face-tracking models becoming extremely sophisticated and consumable via social media, what does it mean for the future of the human body?

b) The idea of digital avatars and cyborg identities being controlled by the human body.

c) The body as art. The rise of facial recognition artists building AR filters which are now displayed on the body via mobile phone cameras. Are bodies going to be repositories for art?

d) With large corporations like Facebook opening this up to large masses of people, what are the ethics of the body being involved? Who owns the data of the body? How is this data further used by large corporations to train machine vision?

4-Ethics and Design

This field of science and technology brings about ethical questions of the idea of body as controller.

One of the goals of the first week was also to seek out those conducting research and making works within the umbrella of computer vision that don’t necessarily fit the status quo and hence are not in the “mainstream” discussions. In my further research I specifically wanted to see if there was a feminist discussion around this field. At first I was getting the same resources as we have listed above; it required a bit of digging within multiple resources to find the kind of work that was interesting.

Our goal for further research was to steer clear of a status-quo-driven, American-dominated field. Furthermore, my personal research interest during this independent study was to find work which uses a feminist lens on topics like computer vision.

I came across DeepLab.

Deep Lab is a congress of cyberfeminist researchers, organized by STUDIO Fellow Addie Wagenknecht to examine how the themes of privacy, security, surveillance, anonymity, and large scale data aggregation are problematized in the arts, culture and society.

Facial recognition software is everywhere and users readily give out information. In terms of surveillance, and of the body and face as data, as a designer entering this topic it is very pertinent to be aware and mindful of how this will impact all those who use these features.

Computer Vision & Graphics Lit Review

Computer Vision & Graphics Notes

Week 1 – 2: Literature and tech review & exploration of related works and related projects.

Objective: Discover what is possible with real-time pose tracking, learn about what technologies and devices can be used and how they can be used, put together a reference list/document of our discoveries, understand real-time pose tracking better. Discover more diverse, BIPOC creators working in the field and explore critical discourse.

Applications Beyond Identifying Things In Our Physical World (Visual Descriptors of Humankind)

McNeal, M. (2015, August 07). Fei-Fei Li: If We Want Machines to Think, We Need to Teach Them to See. Retrieved May 24, 2019, from https://www.wired.com/brandlab/2015/04/fei-fei-li-want-machines-think-need-teach-see/

This article describes Stanford University’s Vision Lab projects and plans for AI research, at the time of the article the lab was working on creating AI first responders to help save lives during crisis. Computer vision faces many challenges because vision is humankind’s most complicated cognitive ability and it is critical to how we understand the world.

“Today, computers can spot a cat or tell us the make, model, and year of a car in a photo, but they’re still a long way from seeing and reasoning like humans and understanding context, not just content. (A bat on a youth baseball field and at a crime scene has two very different meanings.)”

Key Insights:

We are still a long way from computers seeing and reasoning like humans and understanding context, not just content.

“Understanding vision and building visual systems is really understanding intelligence…And by see, I mean to understand, not just to record pixels.”

Computer vision isn’t just about identifying things in our physical environment but can actually reveal details and provide insights on things that we don’t even know yet:

Every day, the Internet generates what Li calls the “dark matter of the digital age”—trillions of images and videos and other scraps of digital minutiae. More than 85 percent of content on the Web is multimedia imagery—and it’s a chaotic mess. “There is a fundamental reason that we need to understand this,” she says. “The recording of our lives, our daily activities, our relationships—be it my personal life or what’s going on in society—is in these contents.”

Computer vision can have various beneficial applications, such as monitoring and combating the effects of climate change, building smart homes, and aiding the medical field, but it can also be used for nefarious purposes such as intrusive surveillance. An important factor in combating this is diversity in technology research and AI, which helps ensure checks and balances when creating AI.

“Every technology can be an enabler of vices,” she says, “but as a scientist you have to have that social awareness and be very aware of these potential risks.”

Facial Recognition Software is Bad at Recognizing Faces

This article describes MIT research that found that facial recognition software performs poorly on darker-skinned women, failing roughly one in three times, compared with over 99% accuracy when the subjects were lighter-skinned males.

Buolamwini, J. (2018, February 13). If you’re a darker-skinned woman, this is how often facial-recognition software decides you’re a man – MIT Media Lab. Retrieved May 24, 2019, from https://www.media.mit.edu/articles/if-you-re-a-darker-skinned-woman-this-is-how-often-facial-recognition-software-decides-you-re-a-man/

Key insights:

Machine learning algorithms can discriminate based on classes like race and gender because of a lack of diversity in the datasets used to train them, which were composed of roughly 79–87% lighter-skinned subjects. Limited datasets can impact the effectiveness of artificial intelligence, which might in turn heighten bias against individuals as AI becomes more widespread.

We evaluate 3 commercial gender classification systems using our dataset and show that darker-skinned females are the most misclassified group (with error rates of up to 34.7%).

The maximum error rate for lighter-skinned males is 0.8%. The substantial disparities in the accuracy of classifying darker females, lighter females, darker males, and lighter males in gender classification systems require urgent attention if commercial companies are to build genuinely fair, transparent and accountable facial analysis algorithms.
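The core of the study's method is simple to sketch: stratify a classifier's predictions by subgroup and compare error rates rather than reporting one overall accuracy number. Here is a minimal Python illustration of that idea; the predictions below are invented for demonstration, and only the stratify-then-compare approach reflects the study:

```python
# Hedged sketch of a per-subgroup error audit. The data is made up;
# the point is that a single overall accuracy can hide large
# disparities that only appear when results are split by subgroup.

def error_rate(results):
    """results: list of (predicted, actual) label pairs."""
    wrong = sum(1 for pred, actual in results if pred != actual)
    return wrong / len(results)

# Hypothetical classifier outputs, stratified by subgroup.
by_group = {
    "darker-skinned female": [("M", "F"), ("F", "F"), ("M", "F")],
    "lighter-skinned male":  [("M", "M"), ("M", "M"), ("M", "M")],
}

for group, results in by_group.items():
    print(f"{group}: {error_rate(results):.1%} error")
```

An audit like this only surfaces bias if the test set itself contains enough examples of each subgroup, which loops back to the dataset-diversity problem above.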

What role will AI play in the future of our systems? At the moment, AI-based systems aren't being used to make high-stakes decisions such as determining a person's prison sentence, but they are being used to identify subjects against hot lists of people suspected of gang activity or of having open warrants. So what does that mean when the algorithms are biased and faulty?

Cornell University: Intro to Computer Vision Class

Course: http://www.cs.cornell.edu/courses/cs5670/2017sp/lectures/lec00_intro.pdf

* Goals of Computer Vision:

  • Compute the 3D world, e.g. animated films
  • Recognize objects & people, e.g. Terminator
  • “Enhance” images, e.g. CSI
  • Forensics – fingerprint readings from photos?
  • “Improve” photos, e.g. removing noise, computational photography (filters)

* Currently used in:

  • OCR – optical character recognition
  • Face detection, e.g. smartphone cameras, Facebook photo tagging
  • Face recognition, i.e. runs a face against a database and compares
  • Vision-based biometrics, e.g. the “Afghan Girl” was identified using her iris pattern
  • Columbia and Maryland universities, with the Smithsonian Institution, made Leafsnap – an electronic field guide that identifies tree species using high-res images of leaves
  • Bird identification – image recognition
  • Special effects – shape and motion capture
  • 3D face tracking, e.g. Snapchat filters
  • Sports, e.g. tracking players, like in Olympic swimming when names are overlaid on lanes
  • Games (Wii, Kinect) & vision-based interaction, e.g. for assistive tech
  • Smart cars
  • Medical imaging, e.g. image-guided surgery
  • Robotics – Mars Rover, Amazon picking robots, Amazon delivery drones
  • Virtual & Augmented Reality

* Difficulties that Computer Vision & Graphics face

  • Viewpoint variations, e.g. looking at pictures of a car from different angles
  • Intra-class variations, e.g. different models of a car over the years
  • Illumination (light), i.e. how much or how little light there is on the subject
  • Scale, i.e. how small or big the subject is can distort understanding
  • Occlusion, e.g. if a cow is behind some trees, the computer won't detect it
  • Background clutter
  • Motion
  • Computers can't deduce context like humans can. Computers are great at the easy things, but when context is complex or cultural, humans are better.

Source: Intro to Computer Vision & Graphics, Cornell University


Stephanie Dinkins

Transmedia artist and associate professor who creates platforms for dialog about artificial intelligence (AI) as it intersects race, gender, aging, and our future histories. Her work highlights the need for diversity in the creation of AI, as artificial intelligence already permeates our daily lives in fields such as criminal justice, healthcare, education, and employment, where it affects diverse communities in different ways.

Addresses: questions of bias in AI, consciousness, data sovereignty and social equity.

Project of Note:

Not The Only One (N'TOO) is a voice-interactive AI that acts as a storyteller, telling a multigenerational tale of one black American family. The bot is trained on conversations and interviews between three generations of Dinkins's family (herself, her aunt, and her niece) and is an amalgamation of their experiences, histories, and unique life stories.

Tamiko Thiel

Thiel is a visual artist whose work looks at the intersection of space, place, and cultural memory. Her work draws on her experience as an American of mixed German and Japanese descent living between Germany, Japan, and the US.

Project of Note:

Land of Cloud: a virtual reality and 3D sound installation (2017–2018), which I thought was an eerie yet beautiful way to question our obsessive use of, and overdependence on, technology, where it seems like we worship our tech, especially smartphones.


“Three days journey beyond Space and Time lies the Land of Cloud. The people there are silent. They communicate not through speech, gesture or gaze, but instead through strange and wondrous “cloud mirrors.” These devices keep them in constant contact with their deity, The Cloud, in whose image they are created – their bodies are themselves composed of softly billowing clouds.

The Land of Cloud is a beautiful garden, but the Cloud People are oblivious to their surroundings. They stare into their devices, motionless, spellbound by whispers from The Cloud. The garden slowly envelops them in its boughs.

If you visit the Land of Cloud, you will hear a susurration of voices when you enter the space. If you place your head inside the head of a cloud person, walking up to them, sitting or lying down next to them, you will hear that each one repeats its own mantra, given by the Cloud Deity.”

Note: Thiel provides a list of VR/AR theory and suggested readings on her site that have informed her own practice, which may be interesting to look at.

Hyphen-Labs is a team of international women who combine art and technology to protect, represent, and honour women of color (source). I thought their NeuroSpeculative AfroFeminism project was an interesting way of using virtual reality as radical self-care, especially when a lot of women of color never take time for, or feel like they cannot take time for, their own self-care.

Project of note:

NeuroSpeculative AfroFeminism:

“Enter Brooks’ Salon, a beauty salon of the future, and peruse a line of speculative low- to high-tech beauty products while waiting for your appointment to get a set of “Octavia Electrodes,” transcranial extensions designed to make the brain’s synapses more excitable and primed to increase neuroplasticity.” source.

Note: Toronto-based OCAD grad and creative technologist Michelle Cortese worked on this project. Her work blurs the lines between artist and engineer and encompasses physical computing, art direction, fabrication, and graphics programming. source

Indiewire write-up on the project – details the installation experience well.


Intro to Processing & Kinect by Daniel Shiffman

We used the Kinect v1 and connected it through the Kinect library in Processing. Below are some experiments we conducted using Daniel Shiffman's Processing and Kinect videos and examples from the library.

I learned that the Kinect v1 has an emit-and-read system: an infrared pattern is projected from the device, and when it reflects off a surface it is read back by an infrared camera, while a separate RGB camera captures colour. The Kinect v1 may give inaccurate readings on the depth of a pixel because the depth camera and the RGB camera are separate, so their readings are slightly offset from each other; this was improved in the v2.

For our explorations we mostly focused on the point cloud example and were particularly interested in testing whether the Kinect could detect through windows to see a person's surroundings. We found that it could not, and deduced that the transparent glass let the IR light pass through and may have refracted it; additionally, the nearby buildings were still too far away for the Kinect to detect any IR light bouncing back, if any. Below is a video of our window test; the areas where the screen goes black are where we were attempting to “look” outside the window.
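The point cloud example essentially turns each depth pixel into a 3D point. A minimal Python sketch of that conversion is below; the constants are the commonly cited OpenKinect community approximations for the Kinect v1, not values calibrated for our device, so treat them as illustrative:

```python
# Hedged sketch of how a Kinect v1 raw depth reading becomes a 3D
# point, mirroring what the Processing point cloud example does.
# Constants are community approximations, not calibrated values.

def raw_depth_to_meters(raw_depth):
    """Convert an 11-bit Kinect v1 raw depth value to metres."""
    # 2047 means "no reading" -- e.g. the IR light never bounced back,
    # which is what produced the black areas in our window test.
    if raw_depth < 2047:
        return 1.0 / (raw_depth * -0.0030711016 + 3.3309495161)
    return 0.0

def depth_to_world(x, y, depth_m,
                   fx=594.21, fy=591.04, cx=339.5, cy=242.7):
    """Project pixel (x, y) at depth_m metres into camera space,
    using approximate depth-camera intrinsics (focal lengths fx, fy
    and optical centre cx, cy)."""
    world_x = (x - cx) * depth_m / fx
    world_y = (y - cy) * depth_m / fy
    return (world_x, world_y, depth_m)
```

For example, a pixel at the optical centre maps straight down the z-axis: `depth_to_world(339.5, 242.7, 1.0)` gives `(0.0, 0.0, 1.0)`. Doing this for every pixel in the depth image is what produces the point cloud.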

We also found that the Kinect was able to detect better when the camera was pointed at flat surfaces, as seen in the second test below.

We also tested Snapchat's new gender-swap filters and concluded never to trust another digital photograph again! LOL. The filter has also drawn some criticism from the trans community for transphobia.


The filters have been used by many in the cisgender community as fun and gimmicky, but have been viewed critically by those in the trans community, especially when they are accompanied by derogatory comments, mostly from straight men, e.g. in this Twitter post that swapped the genders of famous footballers. Views are conflicted: some see the filter as making a joke out of transitioning, while others think it has actually helped those who have been considering transitioning to see what is possible.


Computer Vision (creative) in Africa

I wanted to explore the use of computer vision and graphics in Africa's creative sector. It was difficult to come across many artists; however, I found several game designers, fashion designers, and filmmakers who were using virtual reality. I learned that in Africa virtual reality is mostly used for storytelling, particularly around themes of African mythology and legends. VR provides a sense of agency for African creatives, who can challenge dominant narratives of Africa through their work. I also learned that, at the moment, South Africa is leading the continent in the use of virtual reality, and this momentum was fuelled by the corporate sector's use of AR and VR in advertising. There is also a barrier to access due to the cost of VR headsets, so the technology is still out of reach for most people at the consumer level. I will continue to look at the use of computer vision and graphics in Africa, particularly as I am interested in the storytelling aspects of the technology. The following links are some of the resources I will be looking at further in the coming weeks; updates will be added to this post.

Cool things I’m continuing to check out:

Celebration of Black History Month in virtual reality by Oculus.

South African gallery dedicated to virtual reality and art.

The brilliantly disturbing future of virtual reality

Breaking the borders of the art canvas in AR art

Digital Lab Africa (DLA)

Africa in Virtual Reality

Africans want in on virtual reality

Electric South – South Africa

City Dreams by Bodys Isek Kingelez

New Dimensions – Virtual Reality Africa