PubNub & PoseNose GitHub Working App Link (if testing by yourself, you can use your phone as a second webcam)
Objective & Concept
Through PoseNet I wanted to explore networking with computer vision to visualize multiple people watching a screen together. This project tracks the position of the nose on every unique browser page and shares the data amongst the connected users. The tracking lets each user see where the other users are physically positioned. This creates a spatially aware sensation, either encouraging users to follow other “noses” or to move away and claim their own space on the browser page.
I followed along with Daniel Shiffman’s Coding Train tutorial, where he explores what PoseNet is, what data it provides, and how that data can be visualized. In his example, he visualizes the nose as an ellipse that follows the user across the screen.
The most interesting (and surprisingly simple) part of PoseNet is that it essentially turns your body into what could be perceived as “cursors” on the screen (x and y coordinates).
There are a few different examples of how to expand on this data with P5, such as creating a light nose. These PoseNet examples are interesting because they use the body as a controller. I had previously explored PoseNet in my Body Centric class in this blog post here, where I expanded upon this concept.
For this project, to emphasize the ubiquity of the webcam, I wanted to explore what it would look like to have multiple people visualized on the screen together.
Starting with Shiffman’s code for tracking the nose, I created the ellipse that follows the user’s nose in the open browser.
I then found the x and y coordinates of the nose and connected my page to PubNub to publish and subscribe to the position of the ellipse.
I followed along with a previous example from my Creation and Computation class in the fall that tracks how many users are on a webpage using PubNub. In that example, every user on the page sends the x and y coordinates of their cursor when they click. The code then averages all the users’ coordinates and draws lines to that average point.
I connected my nose-tracking code to PubNub and sent the coordinates of my nose. I chose the nose because it is the center of the face and most accurately depicts where the user is in relation to their screen. Upon receiving the subscribed data, I would check whether there was a “new nose”. If there was, that user would be added to the active user array of “All Noses”. Every time a message was received from PubNub, I would check whether the sender’s ID was already in the array and, if so, update their coordinates on the screen.
Two noses chasing each other on the page.
The code then loops through the array and draws an ellipse at each user’s received coordinates. When a user leaves, their ellipse stays behind, leaving a trace of all the users who have been active on the page.
Along with the x and y coordinates, I also sent PubNub the RGB values of the user’s nose ellipse. This differentiated the users on the page and allowed each user to uniquely identify themselves in others’ browsers.
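The bookkeeping described above can be sketched as a small amount of plain JavaScript, separate from the p5 drawing code. The function and field names here are hypothetical, not the project’s actual identifiers; each PubNub message is assumed to carry an id, nose coordinates, and an RGB color.

```javascript
// Active users on the page: the "All Noses" array.
const allNoses = [];

// Called for every message received from PubNub.
function handleMessage(msg) {
  // Look for an existing entry with this sender's id.
  let nose = allNoses.find((n) => n.id === msg.id);
  if (!nose) {
    // A "new nose": add the user to the active array.
    allNoses.push({ id: msg.id, x: msg.x, y: msg.y, color: msg.color });
  } else {
    // A known nose: just update its position.
    nose.x = msg.x;
    nose.y = msg.y;
  }
}

// In the p5 draw() loop, every nose in the array is drawn each frame,
// so ellipses persist as traces after their users leave.
function drawNoses(sketch) {
  for (const n of allNoses) {
    sketch.fill(n.color[0], n.color[1], n.color[2]);
    sketch.ellipse(n.x, n.y, 30, 30);
  }
}
```

Because entries are never removed from the array, a departed user’s last position keeps being drawn, which produces the trace effect.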
Video documentation of moving around another nose.
The interaction of the two noses was interesting because it prompted either an aversion to the noses overlapping or an urge to make the dots touch. Moving your nose across the screen was not as direct as expected: the motion was laggy, which prompted jerky movements from the users.
This experiment was an initial exploration into mapping multiple physical spaces together into a virtual space. In further iterations, I would make an action occur when the noses overlap on the browser page. I chose not to do this yet because I did not want to make anything that could be interpreted as a game mechanic; I wanted to see what the actual reaction would be amongst a few users. In another iteration, I would track other parts of the body, such as the eyes or wrists. Tracking only the nose was effective for capturing the user’s facial position, which is the most common angle seen while sitting at a webcam (working at a computer).
Overall, I am interested in further exploring networked computer vision technologies to examine the multiplicity of spaces and existences we inhabit simultaneously.
My interest in ML5 is focused on real-time animated effects. Compared to professional software such as Adobe Character Animator, making real-time face animation with ML5 is simpler and more customizable. Though the result may not be as polished, it is a great choice for designers and coders producing visual work.
I found it easier to just use the p5 editor; however, the ML5 script needs to be added to the HTML file in the p5 editor (the fourth “script” tag).
The model used is poseNet. It allows real-time human pose estimation: it can track where, for example, my eyes, nose, and hands are, and visual work can then be built on those positions.
Then I set up the canvas and draw functions in the p5 editor, and used a gray filter to add more fun.
Then I programmed poseNet into my code. Once everything was set up, ml5 recognized “object, object, object (which should be my face)…” from the webcam.
After some research, I learned that the keypoints from the nose to the feet are indexed 0 to 16 in poseNet. The left eye and the right eye are 1 and 2.
The first try:
As the gif shows, if I move out of the frame, the circles are not able to track back.
The second try solved it: (if (poses.length > 0))
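The guard can be sketched as a small helper (the function name here is hypothetical; the shape of the result follows ml5’s poseNet output, where `poses[0].pose.keypoints` is an array of 17 keypoints, each with a position):

```javascript
// Return the {x, y} position of a keypoint, or null when no pose is detected.
function getKeypointPosition(poses, index) {
  // Without this check, reading poses[0] throws once I step out of frame
  // and the poses array comes back empty.
  if (poses.length > 0) {
    return poses[0].pose.keypoints[index].position; // { x, y }
  }
  return null; // caller keeps drawing the last known position
}
```

Returning null lets the draw loop keep the circles where they were, so tracking resumes cleanly when a pose is detected again.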
In fact, I could call the project successful at this point; however, I wanted to make it more polished.
In the third try, I tested the lerp function, and instead of a set size, the size of the ellipses is defined by the “distance”, which allows the ellipses to grow or shrink as I move forward and backward:
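The math behind this third try can be shown outside p5 (p5 provides `lerp()` and `dist()` with these signatures; the helper names and the 0.2 smoothing factor are assumptions for illustration):

```javascript
// Linear interpolation: move a fraction amt of the way from start to stop.
function lerp(start, stop, amt) {
  return start + (stop - start) * amt;
}

// Euclidean distance between two points.
function dist(x1, y1, x2, y2) {
  return Math.hypot(x2 - x1, y2 - y1);
}

// Each frame, ease the drawn position toward the detected position so the
// circles glide instead of jumping.
function smooth(current, target) {
  return lerp(current, target, 0.2);
}

// Size the ellipses by the distance between the two eyes: the further apart
// they appear on screen, the closer the face is to the camera.
function eyeDistance(leftEye, rightEye) {
  return dist(leftEye.x, leftEye.y, rightEye.x, rightEye.y);
}
```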
Eavesdropper is a web application that actively listens for a user’s name spoken in conversation. The application uses a voice-to-text API that transcribes conversations within listening range. The transcriptions are analyzed, and if someone’s name is said, the application sounds an alert noting that that person will be notified by text message. Simultaneously, a clip of what was being said is saved. The user then receives a text message and can go see what was being said about them.
Building upon my previous voice-to-text “DIY Siri” project, I wanted to play with the concept “what if my computer could notify me when it heard something specific?”. I initially thought it would be interesting to build directly off of the Wolfram Alpha API from the DIY Siri project to notify me if something specific was searched. From there I decided to isolate the working parts and start with the concept “the application hears a specific word, the user gets a notification”. I chose names as the trigger because they are rare enough that the trigger would not fire frequently. This matters because both IFTTT and Adafruit IO have data sending and receiving limits: IFTTT allows sending up to 100 text messages a month, and Adafruit IO allows updating channels 30 times a minute.
I started by taking my existing code from DIY Siri and removing the PubNub server integration. I then changed the code to analyze the transcripts of what was being said and, if my name was mentioned, log this information.
My next step was to connect my Adafruit IO channel to the page. I created a new feed titled “overheard” with two channels: listening and transcripts. Listening would indicate whether or not my name was overheard, and transcripts would save whatever was being said about me.
After creating those two channels, I connected my voice-to-text API to Adafruit to see if I could save the value “true” and the transcript of the conversation. I tested with “if my name is included in this transcript, send the data to Adafruit”. This was successful.
Following Adafruit’s guidance, I created an applet of my own to connect this feed to my phone. I chose the “this” (trigger) to be Adafruit IO, and the “that” (action) to be an SMS text message. On the Adafruit side, I selected to monitor the feed “overheard” and the channel “listening”: if “listening” equaled “true”, send a text message. The UX of IFTTT made it simple to connect the two platforms.
With all of the parts now connected, I started testing my application. At first, I was not receiving text messages. This was because I was sending Adafruit a boolean value, not a string: the “equal to” check on the IFTTT side compares the channel value to the string “true”. Once I changed the value I was sending to Adafruit to a string, I was able to receive a text message.
Once I received a text message, I promptly received six in a row. I realized that the voice alert that played upon hearing my name was speaking my name out of the speakers, which my application then picked up. This created an infinite loop of alerts: “Alert alert Olivia has been notified that you mentioned her and received a text message”. I attempted to stop the recursive loop by turning off the voice recognition and restarting it, but each time a new voice recognition object is instantiated, the user must explicitly grant microphone permission again. A quick fix, so that I could continue development, was to not use my name in the text-to-speech alert: I chose “O Prior has been notified” rather than Olivia.
For the UX/UI of the application, I chose a simple button. When the application was not actively listening, a button labelled “Shhhhhhh!” would appear. When clicked, a microphone permissions prompt would request access. Once the application was listening to the room, the entire screen would turn black to be “inconspicuous”. The stop button was designed to be black and appears when the cursor hovers over it. If the name Olivia is heard in conversation, a .gif plays showing someone pressing an alert button. The video and message loop twice before returning to a black screen.
Video demo of Eavesdropper
One challenge I faced was attempting to connect two channels to the IFTTT applet. I wanted to additionally send the transcript as data in the SMS notification, but the applet connected to Adafruit only allowed the data of one channel to be used in the SMS. Due to the setup of the applet, I could only compare against direct values (such as greater than, is not equal to, etc.), which prevented me from using the transcript channel as a trigger to send the message. Alternatively, I could have set up the applet to send a message any time the transcript channel was updated. With this method, I would have to worry about character length and substring the message to ensure the data did not exceed the SMS character limit. I did not want to cut the transcript short, so I chose the boolean method. If users want to see what was said about them, they can investigate the transcript channel, using the time the text message was sent as an indicator of what was said at that moment.
The other challenge I noted was with the speech-to-text API. I had to use a function that checked many different iterations of “Olivia”, including different capitalizations and punctuation variants. This was only an issue once, so all of the iterations may not be necessary, but the function would be incredibly useful if this program were adapted to listen for multiple keywords. The program loops through the transcript and checks it against a list of words kept in an array; the array can be modified to store whatever the user wants to be notified of in a conversation.
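A sketch of that keyword check (the function name is hypothetical): normalizing case and stripping punctuation on each word covers the “Olivia”/“OLIVIA,”/“olivia.” variants in one pass, and the trigger array can hold any set of words.

```javascript
// Words that should trigger a notification.
const triggerWords = ["olivia"];

// Return true if any trigger word appears in the transcript, ignoring
// capitalization and trailing punctuation.
function containsTrigger(transcript, words = triggerWords) {
  const tokens = transcript
    .toLowerCase()
    .split(/\s+/)
    .map((t) => t.replace(/[.,!?]/g, ""));
  return tokens.some((t) => words.includes(t));
}
```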
Next steps & conclusion
The next step for this project would be to find a way to use the data from both channels. I think customized messages drawn from the triggering conversation would provide a better experience for the user and would keep the application from being redundant.
Overall, IFTTT is an intuitive platform with simple, accessible logic that lets many people create bespoke triggers and responses. Some limitations could become an issue at a grander scale, such as the SMS limit of 100 messages per month. This web application is a simple demo of what can be achieved, with plenty of potential to be adapted into more bespoke applications.