PubNub & PoseNose

Olivia Prior
Ubiquitous Computing
Experiment 6

PubNub & PoseNose
GitHub
Working App Link (if you are testing by yourself, you can use your phone as a second webcam)

Nose tracking using PoseNet and PubNub

Objective & Concept

Through PoseNet I wanted to explore networking with computer vision to visualize multiple people watching a screen together. This project tracks the position of the nose on every unique browser page and shares the data amongst the connected users. The tracking makes users aware of the physical positions of the other users. This creates a spatially aware sensation: users are either encouraged to follow the other “noses” or to move away and create their own space on the browser page.

Process

I followed along with Daniel Shiffman’s Coding Train tutorial, where he explores what PoseNet is, what data it provides, and how that data can be visualized. In his example, he draws an ellipse that follows the user’s nose around the screen.

The most interesting (and surprisingly simple) part of PoseNet is that it turns your body into what could be perceived as “cursors” on the screen (x & y coordinates).
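As a rough sketch (not my exact code), reading the nose out of ml5’s PoseNet and drawing an ellipse on it looks something like this:

let video;
let noseX = 0;
let noseY = 0;

function setup() {
  createCanvas(640, 480);
  video = createCapture(VIDEO);
  video.hide();

  // Load PoseNet on the webcam feed
  const poseNet = ml5.poseNet(video, () => console.log('PoseNet ready'));
  poseNet.on('pose', (poses) => {
    if (poses.length > 0) {
      // Keypoint 0 is the nose; its position is just an x & y on the canvas
      noseX = poses[0].pose.keypoints[0].position.x;
      noseY = poses[0].pose.keypoints[0].position.y;
    }
  });
}

function draw() {
  image(video, 0, 0, width, height);
  fill(255, 0, 0);
  noStroke();
  ellipse(noseX, noseY, 20, 20);
}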

There are a few different examples of how to expand on this data with P5, such as creating a light nose. These examples are interesting because they use the body as a controller. I had previously explored PoseNet in my Body Centric class in this blog post here, where I expanded upon this concept.

For this project, to emphasize the ubiquity of the webcam, I wanted to explore what it would look like to have multiple people visualized on the screen together.

Starting with Shiffman’s code for tracking the nose, I created the ellipse that follows the user’s nose around the open browser page.

I then found the x and y coordinates of the nose and connected my page to PubNub to publish and subscribe to the position of the ellipse.
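A hedged sketch of the publish side, using the PubNub v4 JavaScript SDK (the keys, channel name, and message shape are placeholders, and the SDK is assumed to be loaded on the page):

// Each browser gets a rough unique id and a colour (the colour is described further below)
const myId = Math.floor(Math.random() * 100000).toString();
const myR = Math.floor(Math.random() * 256);
const myG = Math.floor(Math.random() * 256);
const myB = Math.floor(Math.random() * 256);

const pubnub = new PubNub({
  publishKey: 'pub-c-xxxxxxxx',   // placeholder keys
  subscribeKey: 'sub-c-xxxxxxxx',
  uuid: myId
});

// Called whenever there is a new nose position to share
function publishNose() {
  pubnub.publish({
    channel: 'noses',
    message: { id: myId, x: noseX, y: noseY, r: myR, g: myG, b: myB }
  });
}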

I followed along in a previous example from my Creation and Computation class in the fall that tracks how many users are on a webpage using PubNub. In this example, every user loaded on the page sends the x and y coordinates of their cursor on the webpage when they click. The code then takes the average of all the user coordinates and draws lines to the average point of all the cursors.

I connected my nose-tracking code to PubNub and sent the coordinates of my nose. I chose the nose because it is the center of the face and most accurately depicts where the user is in relation to their screen. Upon receiving the subscribed data, I check whether there is a “new nose”. If there is, that user is added to the active user array of “all noses”. Every time a message from PubNub is received, I check whether the sender’s ID is already in the array, and if so, I update the coordinates of where they are on the screen.
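One way to sketch that bookkeeping (I describe it as an array of “all noses”; here it is written as an object keyed by each nose’s id, which amounts to the same check):

// Every unique nose, keyed by the sender's id
const allNoses = {};

pubnub.addListener({
  message: (event) => {
    const nose = event.message; // { id, x, y, r, g, b }
    // A new nose gets added; an existing nose has its coordinates updated
    allNoses[nose.id] = nose;
  }
});

pubnub.subscribe({ channels: ['noses'] });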

 

Two noses chasing each other on the page. 

The code then loops through the array and draws an ellipse at the received (or sent) coordinates of each user’s nose. When a user leaves, their ellipse stays in place, which leaves a trace of all the users that have been active on the page.

Along with the x & y coordinates, I also sent PubNub the RGB values of the user’s nose ellipse. This differentiates the users on the page and also allows each user to uniquely identify themselves on the others’ browsers.
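Put together, the draw loop looks roughly like this (building on the sketches above):

function draw() {
  image(video, 0, 0, width, height);

  noStroke();
  for (const id in allNoses) {
    const nose = allNoses[id];
    fill(nose.r, nose.g, nose.b); // each user's own colour
    ellipse(nose.x, nose.y, 20, 20);
  }
  // Ids are never removed from allNoses, so when a user leaves,
  // their ellipse stays at its last known position as a trace
}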

Video documentation of moving around another nose.

Results

The interaction of the two noses was interesting because it prompted either an aversion to letting the noses overlap or an urge to make the dots touch. Moving your nose along the screen was not as direct as expected: the motion was laggy, which prompted jerky movements from the users.

This experiment was an initial exploration into mapping multiple physical spaces together into a virtual space. In further iterations, I would make an action occur when the positions of the noses overlap on the browser page. I chose not to do so this time because I did not want to make anything that could be interpreted as a game mechanic; I wanted to see what the actual reaction would be amongst a few users. In another iteration, I would also track other parts of the body, such as the eyes or wrists. Tracking only the nose was effective for capturing the facial position of the user, which is the most common angle of someone sitting at a webcam (while working at a computer).

Overall, I am interested in exploring networked computer vision technologies further, in a pursuit to examine the multiplicity of spaces and existences we inhabit simultaneously.

Resources

PoseNet Coding Train

Creation and Computation

Body Centric Blog Post 

PubNub

The Real-Time Virtual Sunglasses

My interest in ML5 is focused on real-time animated effects. Compared to professional software such as Adobe Character Animator, making real-time face animation with ML5 is simpler and more customizable. Though the result may not be as highly finished, it is a great choice for designers and coders producing visual work.

I found it easier to just use the p5 editor; however, the ML5 script needs to be put in the HTML file in the p5 editor (the fourth “script” tag).
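Something along these lines in index.html (the version numbers are placeholders; the last line is the ML5 script):

<script src="https://cdnjs.cloudflare.com/ajax/libs/p5.js/0.7.3/p5.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/p5.js/0.7.3/addons/p5.dom.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/p5.js/0.7.3/addons/p5.sound.min.js"></script>
<script src="https://unpkg.com/ml5@0.2.3/dist/ml5.min.js"></script>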

The model used is poseNet. It allows real-time human pose estimation: it can track, for example, where my eyes, nose, and hands are, and visual work can then be built on those positions.


Then I set up the canvas and draw functions in the p5 editor; I used the gray filter to add more fun.


Next I programmed poseNet into my code. When everything is settled, we can see that ml5 logs “Object, Object, Object… (which should be my face)” from the webcam.
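A rough sketch of where the code is at this point (the callback simply stores and logs the detected poses):

let video;
let poses = [];

function setup() {
  createCanvas(640, 480);
  video = createCapture(VIDEO);
  video.size(width, height);
  video.hide();

  // Load PoseNet on the webcam feed and keep the latest results
  const poseNet = ml5.poseNet(video, () => console.log('Model ready'));
  poseNet.on('pose', (results) => {
    poses = results;
    console.log(poses); // prints Object, Object, Object… — one per detected pose
  });
}

function draw() {
  image(video, 0, 0, width, height);
  filter(GRAY); // the gray filter mentioned above
}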


After some research, I learned that the body parts from the nose down to the feet are indexed 0 to 16 in poseNet. The left eye and the right eye are 1 and 2.
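For reference, assuming the standard PoseNet ordering, the indices are as follows, with the two eyes read out of the first detected pose:

// 0 nose, 1 leftEye, 2 rightEye, 3 leftEar, 4 rightEar,
// 5 leftShoulder, 6 rightShoulder, 7 leftElbow, 8 rightElbow,
// 9 leftWrist, 10 rightWrist, 11 leftHip, 12 rightHip,
// 13 leftKnee, 14 rightKnee, 15 leftAnkle, 16 rightAnkle
const leftEye = poses[0].pose.keypoints[1].position;
const rightEye = poses[0].pose.keypoints[2].position;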


The first try:


In the first try, if I moved out of the frame, the circles were not able to track back when I returned.

The second try solved it by only drawing when a pose is actually detected: if (poses.length > 0).
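With that guard in place, the draw function looks roughly like this (the ellipse size is just a stand-in for the sunglasses lenses):

function draw() {
  image(video, 0, 0, width, height);
  filter(GRAY);

  // Only draw when PoseNet actually sees someone, so the circles
  // recover after I leave the frame and come back
  if (poses.length > 0) {
    const leftEye = poses[0].pose.keypoints[1].position;
    const rightEye = poses[0].pose.keypoints[2].position;
    fill(0);
    noStroke();
    ellipse(leftEye.x, leftEye.y, 80, 80);
    ellipse(rightEye.x, rightEye.y, 80, 80);
  }
}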


In fact, I could call the project successful at this point; however, I wanted to make it more finished.

In the third try, I tested the lerp function, and instead of a set size, the size of the ellipses is defined by the “distance”, which allows the ellipses to become larger or smaller as I move forward and backward:
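Roughly, the third try looks like this (the lerp amount and the use of the eye-to-eye distance as the diameter are a reconstruction of the idea, not the exact values):

let lX = 0, lY = 0, rX = 0, rY = 0; // smoothed eye positions

function draw() {
  image(video, 0, 0, width, height);
  filter(GRAY);

  if (poses.length > 0) {
    const leftEye = poses[0].pose.keypoints[1].position;
    const rightEye = poses[0].pose.keypoints[2].position;

    // lerp() eases each circle toward the newest detection, smoothing the jitter
    lX = lerp(lX, leftEye.x, 0.2);
    lY = lerp(lY, leftEye.y, 0.2);
    rX = lerp(rX, rightEye.x, 0.2);
    rY = lerp(rY, rightEye.y, 0.2);

    // The eye-to-eye distance grows as I move toward the camera,
    // so using it as the diameter scales the "lenses" with my face
    const d = dist(lX, lY, rX, rY);
    fill(0);
    noStroke();
    ellipse(lX, lY, d, d);
    ellipse(rX, rY, d, d);
  }
}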


 

Reference:

https://ml5js.org/docs/PoseNet

The Coding Train

 

Eavesdropper

Workshop Notes #5
Eavesdropper
Ubiquitous Computing
Olivia Prior

Github
Web Application 

Web application “Eavesdropper” home page.

Concept

Eavesdropper is a web application that actively listens for the mention of a user’s name spoken in conversation. The application uses a voice to text API that transcribes conversations within listening range. The transcriptions are analyzed, and if someone’s name is said, the application sounds an alert noting that that person is going to be notified through a text message. Simultaneously, the clip of what was being said around the user is saved. The user then receives a text message and can go see what was being said about them.

Objective

Eavesdropper is an exploration into creating an applet on the platform If This Then That (IFTTT) using Adafruit IO. The purpose of IFTTT is to pair a trigger with a response. I wanted to create an accessible and customizable web page that anyone could use as an applet. The JavaScript voice to text API analyzes what is being said in the space where the application is open. If the transcribed text contains the name, the page sends two pieces of data to Adafruit IO: the snippet of conversation containing the user’s name and a “true” statement. IFTTT is linked with Adafruit IO; if the channel data matches “true”, the applet sends a text message to the associated user letting them know that someone is mentioning them in conversation. The web application simultaneously uses the text to voice API to announce a message to the party that set off the trigger. This applet is simple to set up, allowing anyone to create a transcription analyzer that can notify them of anything they wish.

Process

Building upon my previous voice to text “DIY Siri” project, I wanted to play around with the concept “what if my computer could notify me if it heard something specific?”. I initially thought that it would be interesting to build directly off of the Wolfram Alpha API from the DIY Siri project to notify me if something specific was searched. From there I decided that I wanted to isolate the working parts and start with the concept of “the application hears a specific word, the user gets a notification”. I chose to use names as a trigger because they are rare enough that the trigger would not be sent frequently. This is important because both IFTTT and Adafruit IO have data sending and receiving limits: IFTTT can send up to 100 text messages a month, and Adafruit IO can update channels up to 30 times a minute.

I started off by using my existing code from DIY Siri and removing the PubNub server integration. I then changed the code to analyze the transcripts of what was being said: if my name was mentioned, log this information.

Iterating through the transcript to detect if the string “Olivia” was picked up
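A simplified sketch of that check, using the browser’s SpeechRecognition API (the overheard() helper is a placeholder name for the Adafruit IO call sketched further below):

const recognition = new webkitSpeechRecognition();
recognition.continuous = true;
recognition.interimResults = false;

recognition.onresult = (event) => {
  for (let i = event.resultIndex; i < event.results.length; i++) {
    const transcript = event.results[i][0].transcript;
    console.log(transcript);
    if (transcript.includes('Olivia')) {
      overheard(transcript); // my name was heard: send the snippet onwards
    }
  }
};

recognition.start();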

My next step was to connect my Adafruit IO channel to the page. I created a new feed titled “overheard” with two channels: listening, and transcripts. Listening would indicate whether or not my name was overheard, and transcripts would save whatever was being said about me.

After creating those two channels, I connected my voice to text API to Adafruit to see if I would be able to save the value “true” and the transcript of the conversation. I tested with “if my name is included in this transcript, send the data to Adafruit”. This was successful.
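A hedged sketch of that step over Adafruit IO’s HTTP API (the username, AIO key, and feed keys are placeholders; this also fills in the overheard() helper used above):

const AIO_USER = 'my-username';
const AIO_KEY = 'my-aio-key';

function sendToAdafruit(feedKey, value) {
  return fetch(`https://io.adafruit.com/api/v2/${AIO_USER}/feeds/${feedKey}/data`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'X-AIO-Key': AIO_KEY
    },
    body: JSON.stringify({ value: value })
  });
}

function overheard(transcript) {
  sendToAdafruit('overheard.listening', 'true'); // sent as a string, which matters later
  sendToAdafruit('overheard.transcripts', transcript);
}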

Following the guidance from Adafruit, I created an applet of my own to connect this feed to my phone. I chose the “this” (trigger) to be Adafruit IO, and the “then that” (action) to be an SMS text message. On the Adafruit side, I selected to monitor the feed “overheard” and the channel “listening”. If “listening” was equal to the value “true”, then send a text message. The UX of IFTTT made it simple to connect the two platforms.

 

Step 1 in the IFTTT applet: find the “This” platform, Adafruit
Monitor the channel listening – overheard. If the channel has the value true, send an SMS trigger
Message that will be sent in the text message.

 

I started testing my application with all of the parts now connected. At first, I was not receiving text messages. This was because I was sending Adafruit a boolean value and not a string. The “equal to” on the IFTTT side of the platform was comparing the channel value to the string “true”. I changed the value of what I was sending to Adafruit to a string and was able to receive a text message.

Screenshot of receiving multiple notifications in a row from the recursive loop.

Once I received a text message, I ended up receiving six in a row. I realized that the text to voice alert that played upon hearing my name was vocalizing my name out of the speakers, which my application was then picking up. This created an infinite loop of alerts: “Alert alert, Olivia has been notified that you mentioned her and received a text message”. I attempted to stop the recursive loop by turning off the voice recognition and restarting it. The issue was that each time a new voice recognition object is instantiated, explicit permission from the user to activate their microphone is required. A quick fix, so that I could continue development, was to not use my name in the text to voice alert from my speakers. I chose to have it say “O Prior has been notified” rather than using my name, Olivia.

Screenshot of the recursive loop picking up the text to voice message.

For the UX/UI of the application, I chose to use a simple button. When the application is not actively listening, a button appears that says “Shhhhhhh!”. If the button is clicked, a microphone permissions prompt displays requesting access. Once the application is listening to the room, the entire screen turns black to be “inconspicuous”. The stop button is also black and appears when the cursor hovers over the element. If the name Olivia is heard in conversation, a .gif file plays showing someone pressing an alert button. The gif and message loop twice before returning to a black screen.

Video demo of Eavesdropper

Challenges

One challenge I faced was attempting to connect two channels to the IFTTT applet. I wanted to additionally send the transcript as data in the SMS notification. The applet connected to Adafruit only allowed the data of one channel to be used in the SMS. Due to the setup of the applet, I could only compare against direct values (such as greater than, is not equal to, etc.). This prevented me from using the transcript channel as a trigger to send the message. Alternatively, I could have set up the applet so that it sent a message any time the transcript channel was updated. With this method, I would have to be concerned with character length and substring the message to ensure that the data would not exceed the character limit for the SMS. I did not want to cut the transcript short, so I chose to use the boolean method. If the user wanted to see what was being said about them, they could investigate the transcript channel and use the time the text message was sent as an indicator of what was being said about them at that moment.

The other challenge I noted was with the voice to text API. I had to use a function that checked many different iterations of “Olivia”. This included all the different capitalizations of Olivia with different punctuation. This was only an issue once, so all of the iterations may not be necessary. The function that I used would be incredibly useful if this program were adapted to listen for multiple keywords. The program loops through the transcript and checks for strings from a list of words that are kept in an array. The array can be modified to store whatever the user would want to be notified of in a conversation.
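A simplified version of that keyword check (the exact variants in the array are up to the user):

// The array can hold any words (or name variants) to be notified about
const keywords = ['Olivia', 'olivia', 'OLIVIA'];

function containsKeyword(transcript) {
  // true if any keyword appears anywhere in the transcript
  return keywords.some((word) => transcript.includes(word));
}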

Next steps & conclusion

The next step for this project would be to find a way to use the data from both of the channels. I think having customized messages drawn from the triggering conversation would provide a better experience for the user and would keep the application from feeling redundant.

This applet is an initial exploration into connecting a web application to IFTTT. Adafruit IO is a useful technology for networking Internet of Things (IoT) devices. In another iteration, it would be interesting to see how this applet could be applied to a bespoke IoT device, such as an Arduino voice to text sensor. The JavaScript voice to text API is still in beta, and even though it is improving, issues such as frequent permission prompts become a problem if the desired goal is continuous audio listening in a space. The JavaScript API is not a replacement for tools such as a Google Home or Alexa for this reason.

Overall, IFTTT is an intuitive platform that uses simple and accessible logic, allowing many people to create bespoke triggers and responses. Some limitations, such as the SMS limit of 100 messages per month, may become an issue if an applet were used at a grander scale. This web application is a simple demo of what can be achieved, with lots of potential to adapt it to more bespoke applications.

References & Research 

Configuring IFTTT using Adafruit to SMS

Function for finding multiple strings