PubNub & PoseNose

Olivia Prior
Ubiquitous Computing
Experiment 6

PubNub & PoseNose
GitHub
Working App Link (if testing by yourself, you can use your phone as a second webcam)

Nose tracking using PoseNet and PubNub

Objective & Concept

Through PoseNet I wanted to explore networking with computer vision to visualize multiple people watching a screen together. This project tracks the position of the nose on every unique browser page and shares that data amongst the connected users. The tracking makes each user aware of the physical positions of the other users. This creates a spatially aware sensation, either encouraging users to follow the other "noses" or prompting them to move away and create their own space on the browser page.

Process

I followed along with Daniel Shiffman's Coding Train tutorial, where he explores what PoseNet is, what data it gives you, and how you can visualize that data. In his example, he visualizes the nose as an ellipse that follows the user around the screen.

The most interesting (and surprisingly simple) part of PoseNet is that it turns your body into what could be perceived as "cursors" on the screen (x & y coordinates).
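
To make that concrete, here is a minimal sketch of how the nose becomes a coordinate pair, assuming the ml5.js PoseNet wrapper used in the Coding Train tutorial (the variable names are mine, not the exact code in my repo):

```javascript
// Minimal sketch: reduce the body to a nose "cursor" (x & y), assuming ml5.js PoseNet.
let video;
let noseX = 0;
let noseY = 0;

function setup() {
  createCanvas(640, 480);
  video = createCapture(VIDEO);
  video.hide();

  // Load PoseNet against the webcam feed and listen for pose estimates
  const poseNet = ml5.poseNet(video, () => console.log('PoseNet ready'));
  poseNet.on('pose', (poses) => {
    if (poses.length > 0) {
      // Each detected pose exposes named keypoints, including the nose
      noseX = poses[0].pose.nose.x;
      noseY = poses[0].pose.nose.y;
    }
  });
}

function draw() {
  // The nose is now just a coordinate pair that can be treated like a cursor
  fill(255, 0, 0);
  noStroke();
  ellipse(noseX, noseY, 30, 30);
}
```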

There are a few different examples of how to expand on this data with P5, such as creating a light nose. These PoseNet examples are interesting because they use the body as a controller. I had previously explored PoseNet in my Body Centric class, in this blog post here, where I expanded upon this concept.

For this project, to emphasize the ubiquity of the webcam, I wanted to explore what it would look like to have multiple people visualized on the screen together.

Starting with Shiffman's code for tracking the nose, I created the ellipse that follows the user's nose around the open browser page.

I then found the x and y coordinates of the nose and connected my page to PubNub to publish and subscribe to the position of the ellipse.
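
In outline, the publish step looks something like this (a simplified sketch: the keys, channel name, and message shape are placeholders rather than the exact code in the repo):

```javascript
// Sketch of publishing the nose position, assuming the PubNub JavaScript SDK.
const myID = 'nose-' + Math.floor(Math.random() * 100000); // one id per open browser

const pubnub = new PubNub({
  publishKey: 'pub-c-xxxx',   // placeholder keys
  subscribeKey: 'sub-c-xxxx',
  uuid: myID
});

// Called whenever PoseNet reports a new nose position
function sendNose(noseX, noseY) {
  pubnub.publish({
    channel: 'noses',
    message: { id: myID, x: noseX, y: noseY }
  });
}
```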

I followed a previous example from my Creation and Computation class in the fall that tracks how many users are on a webpage using PubNub. In that example, every user loaded on the page sends the x and y coordinates of their cursor when they click. The code then takes the average of all the user coordinates and draws lines to that average point.

I connected my nose-tracking code to PubNub and sent the coordinates of my nose. I chose the nose because it is the center of the face and most accurately depicts where the user is in relation to their screen. Upon receiving the subscribed data, I would check to see if there was a "new nose". If there was, that user would be added to the active user array of "All Noses". Every time a message from PubNub was received, I would check whether the sender's ID was already in the array and, if so, update the coordinates of where they are on the screen.
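
Roughly, the receiving side amounts to the sketch below. I describe an array of "All Noses" above; the sketch keeps them in an object keyed by user ID instead, which is an assumption on my part, but the logic is the same:

```javascript
// Sketch of the subscribe side: track every unique nose that has published.
const allNoses = {}; // "All Noses": one entry per unique browser/user id

pubnub.addListener({
  message: (event) => {
    const nose = event.message;
    if (!allNoses[nose.id]) {
      // A new nose: add this user to the list of active noses
      allNoses[nose.id] = nose;
    } else {
      // A known nose: update where that user is on the screen
      allNoses[nose.id].x = nose.x;
      allNoses[nose.id].y = nose.y;
    }
  }
});

pubnub.subscribe({ channels: ['noses'] });
```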

 

Two noses chasing each other on the page. 

The code then loops through the array and draws an ellipse at the received coordinates of each user's nose. When a user leaves, their ellipse stays on the page, leaving a trace of all the users that have been active.
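
The drawing step is then a loop over every known nose; skipping the background() call each frame is what leaves those traces behind (again a sketch, with the per-user colour values described in the next paragraph folded in):

```javascript
// Sketch of the draw loop: one ellipse per nose, persisting as a trace.
function draw() {
  // No background() call, so ellipses from users who have left stay visible
  for (const id in allNoses) {
    const nose = allNoses[id];
    fill(nose.r || 255, nose.g || 255, nose.b || 255); // per-user colour, if it was sent
    noStroke();
    ellipse(nose.x, nose.y, 30, 30);
  }
}
```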

Along with the x & y coordinates, I also sent the RGB values of each user's nose ellipse to PubNub. This differentiates the users on the page and allows each user to uniquely identify themselves in others' browsers.

Video documentation of moving around another nose.

Results

The interaction of the two noses was interesting because it prompted either an aversion to the noses overlapping or an urge to make the dots touch. The action of moving your nose along the screen was not as direct as expected. The motion was laggy, which prompted jerky movements from the users.

This experiment was an initial exploration into mapping multiple physical spaces together into a virtual space. In further iterations, I would make an action occur when the positions of the noses overlap on the browser page. I chose not to do this here because I did not want to make anything that could be interpreted as a game mechanic; I wanted to see what the actual reaction would be amongst a few users. In another iteration, I would include other parts of the body to track, such as the eyes or wrists. Tracking only the nose was effective for capturing the facial position of the user, which is the most common angle seen when sitting at a webcam (while working at a computer).

Overall I am interested in exploring networked computer vision technologies further, in pursuit of examining the multiplicity of spaces and existences we inhabit simultaneously.

Resources

PoseNet Coding Train

Creation and Computation

Body Centric Blog Post 

PubNub

Eavesdropper

Workshop Notes #5
Eavesdropper
Ubiquitous Computing
Olivia Prior

GitHub
Web Application 

Web application “Eavesdropper” home page.

Concept

Eavesdropper is a web application that actively listens for the mention of a user's name in conversation. The application uses a voice to text API that transcribes conversations within listening range. The transcriptions are analyzed, and if someone's name is said, the application sounds an alert noting that that person is going to be notified through text message. Simultaneously, the clip of what was being said is saved. The user then receives a text message and can go see what was being said about them.

Objective

Eavesdropper is an exploration into creating an applet on the platform If This Then That (IFTTT) using Adafruit IO. The purpose of IFTTT is to pair a trigger with a response. I wanted to create an accessible and customizable web page that anyone could use as an applet. The JavaScript voice to text API analyzes what is being said in the space where the application is open. If the name is detected in the text, the page sends two pieces of data to Adafruit IO: the snippet of conversation containing the user's name and a "true" value. IFTTT is linked with Adafruit IO; if the channel data matches "true", the applet sends a text message to the associated user letting them know that someone is mentioning them in conversation. The web application simultaneously uses the text to voice API to announce a message to the party that set off the trigger. This applet is simple to set up, allowing anyone to create a transcription analyzer that can notify them of anything they wish.

Process

Building upon my previous voice to text "DIY Siri" project, I wanted to play with the question "what if my computer could notify me when it heard something specific?". I initially thought it would be interesting to build directly off of the Wolfram Alpha API from the DIY Siri project and notify me if something specific was searched. From there I decided to isolate the working parts and start with the concept of "the application hears a specific word, the user gets a notification". I chose to use names as the trigger because they are rare enough that the trigger would not fire frequently. This is important because both IFTTT and Adafruit IO have data sending and receiving limits: IFTTT can send up to 100 text messages a month, and Adafruit IO can update channels at most 30 times a minute.

I started by taking my existing code from DIY Siri and removing the PubNub server integration. I then changed the code to analyze the transcripts of what was being said and, if my name was mentioned, log that information.

Iterating through the transcript to detect if the string “Olivia” was picked up
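
The check in that screenshot is along these lines; the sketch below assumes the webkitSpeechRecognition setup carried over from DIY Siri, with my own variable names:

```javascript
// Sketch of listening for the trigger name in each new transcript.
const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
const recognition = new SpeechRecognition();
recognition.continuous = true;
recognition.interimResults = false;

recognition.onresult = (event) => {
  // Look at each new result's transcript as it arrives
  for (let i = event.resultIndex; i < event.results.length; i++) {
    const transcript = event.results[i][0].transcript;
    if (transcript.toLowerCase().includes('olivia')) {
      console.log('Name overheard:', transcript);
      // ...later, this is where the data gets sent off to Adafruit IO
    }
  }
};

recognition.start();
```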

My next step was to connect my Adafruit IO channel to the page. I created a new feed titled “overheard” with two channels: listening and transcripts. Listening would indicate whether or not my name was overheard, and transcripts would save whatever was being said about me.

After creating those two channels, I connected my voice to text API to Adafruit to see if I would be able to save the value “true” and the transcript of the conversation. I tested with “if my name is included in this transcript, send the data to Adafruit”. This was successful.
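
One way to send those two values is Adafruit IO's HTTP API; my project may differ in the details, but the sketch below shows the shape of it (the username, key, and feed keys are placeholders):

```javascript
// Sketch of pushing values to the "listening" and "transcripts" feeds
// through Adafruit IO's HTTP API. Credentials and feed keys are placeholders.
const AIO_USERNAME = 'your-username';
const AIO_KEY = 'your-aio-key';

function sendToFeed(feedKey, value) {
  return fetch(`https://io.adafruit.com/api/v2/${AIO_USERNAME}/feeds/${feedKey}/data`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'X-AIO-Key': AIO_KEY
    },
    body: JSON.stringify({ value: value })
  });
}

// Called when the trigger name is heard in a transcript
function nameOverheard(transcript) {
  sendToFeed('listening', 'true');       // IFTTT compares this against the string "true"
  sendToFeed('transcripts', transcript); // the clip of what was being said
}
```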

Following guidance from Adafruit, I started to create an applet of my own to connect this feed to my phone. I chose the "this" (trigger) to be Adafruit IO, and the "then that" (action) to be an SMS text message. On the Adafruit side, I selected the feed "overheard" and monitored the channel "listening". If "listening" was equal to "true", a text message would be sent. The UX of IFTTT made it simple to connect the two platforms together.

 

Step 1 in IFTTT applet, find the “This” platform Adafruit
Monitor the channel listening – overheard. If the channel has the value true, send an SMS trigger
Message that will be sent in the text message.

 

I started testing my application with all of the parts connected. At first, I was not receiving text messages. This was because I was sending Adafruit a boolean value rather than a string: the "equal to" comparison on the IFTTT side compares the channel value to the string "true". I changed the value I was sending to Adafruit to a string and was then able to receive a text message.

Screenshot of receiving multiple notifications in a row from the recursive loop.

Once I received a text message, I ended up receiving six in a row. I realized that the text-to-voice alert that played upon hearing my name was vocalizing my name out of the speakers, which my application then picked up. This created an infinite loop of alerts: "Alert alert, Olivia has been notified that you mentioned her and received a text message". I attempted to stop the recursive loop by turning off the voice recognition and restarting it, but each time a new voice recognition object is instantiated, explicit permission from the user to activate their microphone is required. A quick fix, so that I could continue development, was to not use my name in the text to voice alert from my speakers. I chose to say "O Prior has been notified" rather than using my name, Olivia.

Screenshot of the recursive loop picking up the text to voice message.

For the UX/UI of the application, I chose to use a simple button. When the application was not actively listening, a button would appear that said "Shhhhhhh!". If the button was clicked, a microphone permissions prompt would display requesting access. Once the application was listening to the room, the entire screen would turn black to be "inconspicuous". The stop button was designed to be black and appears only when the cursor hovers over the element. If the name Olivia is heard in conversation, a .gif plays showing someone pressing an alert button. The .gif and message loop twice before returning to a black screen.

Video demo of Eavesdropper

Challenges

One challenge I faced was attempting to connect two channels to the IFTTT applet. I wanted to additionally send the transcript as data in the SMS notification. The applet connected to Adafruit only allowed the data of one channel to be used in the SMS. Due to the setup of the applet, I could only compare on direct values (such as greater than, is not equal to, etc.), which prevented me from using the transcript channel as a trigger for the message. Alternatively, I could have set up the applet so that it sent a message any time the transcript channel was updated. With that method, I would have to worry about character length and substring the message to ensure the data would not exceed the SMS character limit. I did not want to cut the transcript short, so I chose the boolean method. If users want to see what was being said about them, they can investigate the transcript channel and use the time the text message was sent as an indicator of what was being said at that moment.

The other challenge I noted was with the voice to text API. I had to use a function that checked many different iterations of "Olivia", including different capitalizations and punctuation. This was only an issue once, so all of the iterations may not be necessary. The function I used would, however, be incredibly useful if this program were adapted to listen for multiple keywords: the program loops through the transcript and checks it against a list of words kept in an array. The array can be modified to store whatever the user wants to be notified of in a conversation.
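
As a sketch, that check looks something like this; the array can hold whatever words (or variants of them) the user wants to listen for:

```javascript
// Sketch of the multi-keyword check: swap the array contents for any words
// the user wants to be notified about.
const keywords = ['Olivia', 'olivia', 'Olivia.', 'Olivia,', 'Olivia?'];

function containsKeyword(transcript) {
  // True if any variant in the list appears in the transcript
  return keywords.some((word) => transcript.includes(word));
}

// Usage: if (containsKeyword(transcript)) { nameOverheard(transcript); }
```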

Next steps & conclusion

The next step for this project would be to find a way to use the data from both channels. I think having customized messages drawn from the triggering conversation would provide a better experience for the user and would stop the application from being redundant.

This applet is an initial exploration into connecting a web application to IFTTT. Adafruit IO is a useful technology for networking Internet of Things (IoT) devices. In another iteration, it would be interesting to see how this applet could be applied to a bespoke IoT device, such as an Arduino voice to text sensor. The JavaScript voice to text API is still in beta, and even though it is improving, issues such as frequent permission prompts become a problem if the desired goal is continuous audio listening in a space. For this reason the JavaScript API is not a replacement for tools such as a Google Home or Alexa.

Overall, IFTTT is an intuitive platform that uses simple and accessible logic, allowing many people to create bespoke triggers and responses. Some limitations would become an issue at a grander scale, such as the SMS limit of 100 messages per month. This web application is a simple demo of what can be achieved, with plenty of potential to adapt into more bespoke applications.

References & Research 

Configuring IFTTT using Adafruit to SMS

Function for finding multiple strings

 

DIY Siri

Ubiquitous Computing Process Journal #4
GitHub
Website Application
By Olivia Prior

DIY Siri web application screenshot

Concept

DIY Siri is a web application that combines the browser's built-in JavaScript speech recognition and speech synthesis functionality with the Wolfram Alpha API to answer questions asked by voice and respond back through sound. The web application prompts the user to ask a question out loud and gives a spoken answer in return. The application is open to all questions being asked, so the user is able to view and listen to the responses given to anyone else using the application in parallel. This web application is a small demonstration of how the ubiquitous smart devices that occupy households can be accessible for customization.

Objective

With this assignment, I was curious about how one actually constructs their own Siri, Alexa, or Google Home through readily available APIs online. In another class, I had explored the JavaScript speech to text functions through continuous listening, and found that the functionality was clunky and would time out. When we were introduced to APIs in class, I thought it would be an interesting experiment to see 1) whether using explicit commands such as start and stop buttons would lend itself to a smoother use of the JavaScript functions, and 2) whether it really is that simple to connect JavaScript to an "all-knowing" API like Wolfram Alpha to craft a simple at-home assistant.

Tools & Software

– Text editor of your choice
– Wolfram Alpha API developer key
– PubNub account for connecting the web page to the Wolfram Alpha API
– jQuery

Process

The base JavaScript code I used closely follows this tutorial, which demonstrates many different ways the API can be applied. The tutorial shows how to start a recording, transcribe the voice to text as someone is speaking, pause/stop the recording, and save the recording to local notes. Once a recording is saved to the notes, the user can "listen" to those notes through the browser's text to speech function.

My first step was to isolate the steps from "I am talking and asking a question" to "I need to send this question off into the ether".

I created a speech recognition object and two buttons: the first a record button and the second a stop button. I made the buttons' visibility toggle for UX purposes: only one of ask or stop is pressable at a time.

Video demo of toggling the button 

The first button started active listening through ".start()", a method available on the speech recognition object.

Screenshot of instantiating the speech recognition object.

The second button had an on-click event that executed ".stop()". While listening, the recognition object would transcribe what the user was saying and put it into a text area on the page. This worked surprisingly well. When I pressed stop, the microphone would turn off.
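
Put together, the ask/stop interaction is roughly the following sketch (the element ids are placeholders, not necessarily the ones in my markup):

```javascript
// Sketch of the ask/stop buttons wrapped around the speech recognition object.
const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
const recognition = new SpeechRecognition();
recognition.interimResults = true;

const askButton = document.getElementById('ask');
const stopButton = document.getElementById('stop');
const questionBox = document.getElementById('question');

askButton.addEventListener('click', () => {
  recognition.start();                      // begin actively listening
  askButton.style.display = 'none';         // only one button is visible at a time
  stopButton.style.display = 'inline-block';
});

stopButton.addEventListener('click', () => {
  recognition.stop();                       // turn the microphone off
  stopButton.style.display = 'none';
  askButton.style.display = 'inline-block';
});

recognition.onresult = (event) => {
  // Transcribe the spoken question into the text area
  questionBox.value = event.results[0][0].transcript;
};
```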

The next step was tying this into the Wolfram Alpha code we had used with p5.js in class. I took the code with my PubNub module that connected my account to the Wolfram Alpha Developer API, took the value of what I had transcribed into the text area on the page, and sent it as a message through PubNub. Just as if I had typed what I was saying, I received a response to my question from Wolfram Alpha, which I output onto the web page.
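
In outline, that step looks like the sketch below, assuming the in-class setup where a PubNub Function passes the question to the Wolfram Alpha API and publishes the answer back (the keys, channel names, and the speakAnswer helper are placeholders):

```javascript
// Sketch of sending the question through PubNub and listening for the answer.
const pubnub = new PubNub({
  publishKey: 'pub-c-xxxx',   // placeholder keys
  subscribeKey: 'sub-c-xxxx',
  uuid: 'diy-siri-client'
});

pubnub.addListener({
  message: (event) => {
    const answer = event.message;  // the response relayed from Wolfram Alpha
    document.getElementById('answer').textContent = answer;
    speakAnswer(answer);           // hypothetical helper, shown in the next step
  }
});

pubnub.subscribe({ channels: ['wolfram-answers'] });

// Send the transcribed question exactly as if it had been typed
function askQuestion(questionText) {
  pubnub.publish({ channel: 'wolfram-questions', message: questionText });
}
```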

My next step was to connect the text to speech so that the answer could be played back audibly. I created a new "SpeechSynthesisUtterance" object, which is a standard JavaScript object, passed the message from Wolfram Alpha into it, and handed it to the browser's ".speak()" function; the browser responded out loud with the answer to my question.

Screenshot of the text to speech object being instantiated.
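
That playback step amounts to a few lines with the standard SpeechSynthesis API (speakAnswer is the same hypothetical helper name used in the previous sketch):

```javascript
// Sketch of voicing the Wolfram Alpha answer with the SpeechSynthesis API.
function speakAnswer(answerText) {
  const utterance = new SpeechSynthesisUtterance(answerText);
  utterance.rate = 1;                      // default speaking rate
  window.speechSynthesis.speak(utterance); // the browser reads the answer aloud
}
```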

DIY Siri Demo

Challenges & Next Steps 

While testing this web application, I sent a link to my friend in a different province. I was fixing my CSS when all of a sudden I heard my browser start talking. The way I had set up my PubNub server meant that everyone with access to the application could listen to whatever was being asked. Initially, I started to fix the issue. Upon reflection, I realized that the ability to listen to whatever anyone is asking brings up interesting connotations of security and surveillance, especially in an open source context. I decided to keep this behaviour and test it out in class to see what the reception would be from my classmates.

A next step I considered was to store previously asked questions so that the user could quickly re-ask them if needed, such as "what is the weather". Once I discovered that my web application could listen to anyone's question, anywhere, I decided that this was more of a listening tool than an asking application. If I were to enforce individual privacy in this application, I would consider storing the frequently asked questions in the local browser storage.

Since I was attracted to the idea of listening, I would make it more apparent that multiple people are on the application asking questions. That makes it a much more collaborative experience and could be elevated into a more polished art piece. Currently, this application lies in the space between tool and commentary and needs refining touches on either end to make it a more complete experience. Until then, this is a simple, basic DIY Siri that allows you to ask questions through voice.

References and Resources

 

Speech to Text Tutorial 

Mozilla Speech to Text Documentation

Process Journal #1: XBee + Metronomes

XBee Metronome
Process Journal #1
Olivia Prior

LCD screen

 

In this class, our assignment was to explore XBee radio communication through the concept of high ("H") and low ("L") commands as incoming data. Each of us was given an XBee radio and an XBee breakout board to experiment with transmitting and receiving radio commands. The final assignment was to set up our XBee radios so that an actuator of our choice on an Arduino would respond to a "metronome" count transmitted from an accompanying XBee.

Part 1: Testing with the XBee Radios

Step 1: Internally sending and receiving

The first step was to connect my XBee radio to CoolTerm and test whether my Arduino Micro would respond to high and low commands. I opened CoolTerm, connected to the serial port my Arduino was hooked up to, and typed "H" and "L" commands to turn an LED on and off. This worked without any problems.

Using CoolTerm as the serial connection, I type H to turn the LED on and L to turn it off.
Step 2: Connecting to a friend
During my initial testing, many students in the class had issues connecting to another XBee radio. I paired with a classmate in the studio; we set our ATID to the same network, and I changed my ATDL to match their ATMY. At first my LED was not responding immediately, so we were unsure if the connection was working. We then realized there was a small lag in the communication, but we were satisfied knowing we could transmit and receive commands to each other's Arduinos.
To make this completely wireless, I changed the code to use "Serial1", allowing my Arduino to be disconnected from the machine.

A classmate and I testing out our transmitting and receiving with each other. They are transmitting “H” and “L” to control the LED light on the Arduino. 

 Step 3: Playing with new actuators
I removed the single LED and connected a small strip of addressable LED lights to my circuit. I jumped 5V to one side of the circuit so as not to overpower the XBee breakout board on the other side, which requires 3.3V. Using the same "H" and "L" commands as before, I turned the strip lights on and off. I used the Adafruit NeoPixel library to control these LEDs.

Turning the LED lights on and off using “H” and “L” commands. 

Step 4: Changing the code

I was inspired by the realization that Philips Hue lights use a type of radio communication as controls. I have a few of my own and wanted to see if I could create a very simplified version. I copied the "H" and "L" command code, but rather than simply turning the lights on and off, I used different keys to change the colour of the strip. In the video below, I use the commands "R" for red, "G" for green, "B" for blue, "P" for pink, "Y" for yellow, "O" for orange, and "W" for white.

Creating a simplified version of the Philips Hue light by transmitting simple letter commands to control the hue of the LED strip.

Part 1 overall outcome

At this point in my radio experimentation journey, the most exciting part is the ability to control one or multiple other devices by transmitting commands. The mock "Philips Hue" test has me quite inspired to create smaller bespoke lamps for my own bedroom.

When testing by myself, the simple commands for turning actuators on and off do not feel that different from what I can already do with an Arduino alone.

When testing with others, I found it interesting to see the differences in lag depending on the radio. The classmate I was testing with had about a second of delay on their commands, which led to confusion when attempting a proof of concept. The lag only went one way; when I sent commands, their LED would turn on and off almost instantly. I wonder if this has anything to do with the machine speed on either side.

Part 2: Metronome

For this part of the experiment, I wanted to count how many beats per minute the metronome was outputting. I decided on this after choosing my output, an LCD display.

Step 1: Choosing an output

For this experiment, I rifled through my toolkit to see what I had that would be new and interesting to play with. I found an LCD (liquid crystal display) that I had inherited from my dad's electronics supply and decided to use it.

LCD LiquidDisplay

I found readily available documentation for the LCD screen on Adafruit. I followed the tutorial and connected the screen to my Arduino Uno. I was able to get a "hello world" screen up, counting the seconds that had passed.

LCD screen displaying the LiquidCrystal example which prints out “hello, world!” and on the second line counts how many seconds the program has been on.

Step 2: Connecting the XBee and LED lights

I then moved the connections to my Arduino Micro and used the same code from my initial experimentation that made the addressable LED lights turn on and off. Rather than turning the LEDs fully on and off, I changed their brightness; switching them fully on and off drew too much power from the circuit, which caused the LCD screen to flicker. As well, I printed "high" on the screen for the high command and "low" for the low command.

LCD screen connected to CoolTerm, and receiving “High” and “Low” commands to change the input of the screen, and as well the brightness of the LED strip. 

Step 3: Counting the beats

Because I was testing this at home, I wrote a pseudo metronome in my code to mimic different speeds of a count.

Mock metronome for testing purposes

I would change the value of the metronome to get varying results. I counted the milliseconds that passed between counts, divided 60000 by that interval, and multiplied by two to take the offbeat into account. I took this count and printed it out to the LCD screen.

LCD screen displaying a rudimentary BPM of the program

Step 4: What are you dancing to?

I took the BPM of the mock metronome and created statements that would print out what type of music correlates to that BPM. If the BPM is lower than 60 or higher than 200, a message of "slow is also good" or a warning of "Don't dance too fast!" appears.

 

LCD screen showing the bpm of the program & what type of dance music is associated with that beat.

The one bug I have yet to fix is that within the first couple of counts the program detects a BPM that is half of the real value. It quickly corrects itself on the second round of metronome counts.

Part 2 overall

I found this program interesting because it was not simply turning something "on" or "off". It dynamically takes in data and prints it out accordingly.

Testing the screen with the real metronome did not give data as specific as I expected. The range of the metronome was 100–5000 milliseconds, while my code is optimized for a range of 200–1500 milliseconds. This is not necessarily a bad thing, as it requires someone to move slowly through the range rather than just cranking the potentiometer in either direction.

Overall, the experiment was an interesting exercise in communication. It was fun to see what other folks in the class did with the metronome data: some interpreted the data to count every fourth beat, while others used it to control servo motors. Because of the shared rhythm of the metronome, the work altogether seemed to connect in a collage-like way.

Liquid Crystal: https://www.arduino.cc/en/Reference/LiquidCrystal

Tutorial for LCD screen: https://learn.adafruit.com/adafruit-arduino-lesson-11-lcd-displays-1

Different music types: https://accusonus.com/sites/default/files/media/blog/Electronic_music_tempo-.pdf