Sign Language with AI

Code to this project: https://github.com/imaginere/UbiComp_Exp6

After looking through all the examples I felt a bit stumped: most of them seemed like finished products, and repurposing their code felt beyond my current skills.

What appealed to me was using the AI engine as an input device and teaching it different hand gestures; the idea was to recognize the sign alphabet shown below.

sign-chart
Copyright: Cavallini & Co. Sign Language Chart Poster

The KNNClassification example included a rock-paper-scissors demo that was already doing hand recognition, which made it the perfect starting point.

It took some tinkering with the source code to figure out what it was doing and how the HTML was rendering things from the .js file. This took a fair amount of time, as most of the code looked very alien and used some very concise shortcut methods to keep it compact.

I had to stop and go back to Nick’s video to understand what TensorFlow and ml5 were doing and how it all fits together, because my code kept breaking and only the initial three values kept showing. A little mind mapping of the TensorFlow and ml5 ecosystem cleared the fog and I could finally get the engine working with five values. Understanding how the two connected helped me see what I was trying to do with the code, since most of my work until then had been copy-pasting snippets.

mind map
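For reference, here is a rough sketch of the pieces the KNNClassification example wires together, reduced to the parts I ended up using: a MobileNet feature extractor feeding ml5’s KNN classifier. The gesture labels and the button wiring are placeholders of my own, not the example’s actual code.

let featureExtractor, knnClassifier, video;

function setup() {
  createCanvas(640, 480);
  video = createCapture(VIDEO);
  video.size(width, height);
  video.hide();
  // MobileNet turns each video frame into a feature vector for the KNN
  featureExtractor = ml5.featureExtractor('MobileNet', () => console.log('model ready'));
  knnClassifier = ml5.KNNClassifier();
}

function draw() {
  image(video, 0, 0, width, height);
}

// called from a button per gesture, e.g. addExample('Stop')
function addExample(label) {
  const features = featureExtractor.infer(video);
  knnClassifier.addExample(features, label);
}

function classify() {
  if (knnClassifier.getNumLabels() === 0) return;
  const features = featureExtractor.infer(video);
  knnClassifier.classify(features, gotResults);
}

function gotResults(err, result) {
  if (err) return console.error(err);
  console.log(result.label, result.confidencesByLabel); // e.g. { Stop: 1, ThumbsUp: 0, ... }
  classify(); // keep classifying continuously
}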

The idea for using ml5 and p5:
I now had the framework I wanted to use. I did not do the traditional chart, because my idea was to use hand gestures instead, like Stop, Attention, Thumbs Up, etc. This would let me make a game of it: two players train their own sets and ping-pong signs back and forth, and the winner is the one who gets the most precise predictions while switching between signs before the timer runs out.

These are the charts I used to get the icons from:

01df

I mapped these in the window:

204

The next part was training the different symbols to recognize the hand gestures. This was fairly straightforward, but I could not get the AI to load the data set I had saved; I tried changing the permissions, but it did not work when I clicked load, so I kept having to retrain the data sets to get them to function.
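For reference, ml5’s KNNClassifier does expose save and load calls. A minimal sketch of what that looks like, assuming a placeholder file name and that the saved JSON is served by the same local server as the sketch:

// save the trained examples: downloads a JSON file (here "myGestures.json")
function saveDataset() {
  knnClassifier.save('myGestures');
}

// load it back in a later session; the path is relative to the local server
function loadDataset() {
  knnClassifier.load('./myGestures.json', () => {
    console.log('dataset loaded, labels:', knnClassifier.getNumLabels());
  });
}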

The next thing I wanted to try was getting p5 to trigger an animation based on whichever gesture was at 100%. I tested this with a very simple example: a rotating cube that would turn based on the number that was at 100%. This was testing the concept, but the ultimate goal was to make a meter that would act like a gauge showing where the most accurate recognition was.

206
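Roughly, the trigger looked something like the sketch below, continuing from the KNN sketch earlier: the classification callback keeps the label that reaches 100% confidence, and draw() spins a cube one way or the other depending on that label. The gesture names here are placeholders.

let currentLabel = '';
let angle = 0;

function setup() {
  createCanvas(400, 400, WEBGL);
}

// classification callback, replacing gotResults() from the KNN sketch above
function gotResults(err, result) {
  if (result && result.confidencesByLabel) {
    if (result.confidencesByLabel[result.label] === 1) {
      currentLabel = result.label; // only react when a gesture hits 100%
    }
  }
  classify();
}

function draw() {
  background(0);
  // spin one way for "Stop", the other way for "ThumbsUp"
  if (currentLabel === 'Stop') angle += 0.05;
  if (currentLabel === 'ThumbsUp') angle -= 0.05;
  rotateY(angle);
  normalMaterial();
  box(100);
}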

This was a proof of concept where the circle would later be replaced with a meter. I tried building the meter purely in p5, but it was far too much math for such a simple shape, so I made an image in Illustrator and imported it instead.

207
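A minimal sketch of how the imported graphic could be driven: load the Illustrator export with loadImage() and rotate it by the classifier’s confidence. The file name and the hard-coded 0–1 confidence value are assumptions.

let needleImg;
let confidence = 0.75; // would be updated from the classification callback

function preload() {
  needleImg = loadImage('needle.png'); // placeholder file name
}

function setup() {
  createCanvas(400, 300);
  imageMode(CENTER);
}

function draw() {
  background(255);
  push();
  translate(width / 2, height - 40);
  // sweep the needle from -90° to +90° as confidence goes from 0 to 1
  rotate(map(confidence, 0, 1, -HALF_PI, HALF_PI));
  image(needleImg, 0, -needleImg.height / 2);
  pop();
}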

The final result does not have a working meter or a game, but both are edging towards the final outcome: a game where two users return the gesture they are given and then trigger a new one. If they are unable to return the gesture in time, they lose a point. Each time a gesture is returned, the timer gets faster on the next round, until it is too hard to keep up and one person loses a point. The game ends at 10 points.

The concept for the game:

209

The above is a mockup of what I would like to create with the AI engine. I still have a long way to go to realize this, but I have some of the pieces in place. It would use ml5->p5->PubNub->p5.

 

References:

https://github.com/ml5js/ml5-examples/tree/release/p5js/KNNClassification/KNNClassification_Video

Special Thanks to Omid Ettehadi for help with understanding the code.
Icons Designed by Freepik

PubNub & PoseNose

Olivia Prior
Ubiquitous Computing
Experiment 6

PubNub & PoseNose
GitHub
Working App Link (if you are testing by yourself, you can use your phone as a second webcam)

Nose tracking using PoseNet and PubNub

Objective & Concept

Through PoseNet I wanted to explore networking with computer vision to visualize multiple people watching a screen together. This project tracks the position of the nose on every unique browser page and shares the data amongst the connected users. The tracking lets users become aware of the physical positions of the other users. This creates a spatially aware sensation, either encouraging users to follow the other “noses” or to move away and create their own space on the browser page.

Process

I followed along with Daniel Shiffman’s Coding Train tutorial, where he explores what PoseNet is, what data it returns, and how you can visualize that data. In his example, he visualizes the nose with an ellipse that follows the user around the screen.

The most interesting (and surprisingly simple) part of PoseNet is that it simply changes your body into what could be perceived as “cursors” on the screen (x & y coordinates).
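The core of that idea fits in a short sketch, roughly as it appears in the Coding Train example: ml5’s poseNet reports keypoints, and the nose’s x/y drive an ellipse.

let video, poseNet;
let noseX = 0, noseY = 0;

function setup() {
  createCanvas(640, 480);
  video = createCapture(VIDEO);
  video.size(width, height);
  video.hide();
  poseNet = ml5.poseNet(video, () => console.log('PoseNet ready'));
  poseNet.on('pose', (poses) => {
    if (poses.length > 0) {
      noseX = poses[0].pose.nose.x; // ml5 exposes named keypoints like "nose"
      noseY = poses[0].pose.nose.y;
    }
  });
}

function draw() {
  image(video, 0, 0, width, height);
  fill(255, 0, 0);
  noStroke();
  ellipse(noseX, noseY, 30, 30);
}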

There are a few different examples of how to expand on this data with P5, such as creating a light nose. These examples with PoseNet are interesting because they use the body as a controller. I had previously explored PoseNet in my Body Centric class in this blog post here, where I expanded upon this concept.

For this project, to emphasize the ubiquity of the webcam, I wanted to explore what it would look like to have multiple people visualized on the screen together.

Starting with Shiffman’s code for tracking the nose, I used it to create the ellipse that follows the user’s nose in the open browser.

I then found the x and y coordinates of the nose and connected my page to PubNub to publish and subscribe to the position of the ellipse.

I followed along in a previous example from my Creation and Computation class in the fall that tracks how many users are on a webpage using PubNub. In this example, every user loaded on the page sends the x and y coordinates of their cursor on the webpage when they click. The code then takes the average of all the user coordinates and draws lines to the average point of all the cursors.

I connected my nose-tracking code to PubNub and sent the coordinates of my nose. I chose the nose because it is the center of the face and most accurately depicts where the user is in relation to their screen. Upon receiving the subscribed data, I would check whether there was a “new nose”; if so, that user would be added to the active user array of “All Noses”. Every time a message from PubNub was received, I would check whether that ID was already in the array and, if it was, update the coordinates of where that user is on the screen.
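A minimal sketch of that publish/subscribe logic, assuming a PubNub keyset, a channel named "noses", and a message shape with id, x, y and rgb fields (the field names and keys are my own placeholders; wiring into setup()/draw() is omitted):

const pubnub = new PubNub({
  publishKey: 'pub-c-xxxx',
  subscribeKey: 'sub-c-xxxx',
  uuid: 'my-unique-id'
});

const allNoses = {}; // every nose seen so far, keyed by the sender's id

pubnub.addListener({
  message: (event) => {
    const m = event.message;
    // new nose or returning nose: store / update its latest position and colour
    allNoses[m.id] = { x: m.x, y: m.y, rgb: m.rgb };
  }
});
pubnub.subscribe({ channels: ['noses'] });

// called whenever PoseNet reports a new nose position for the local user
function publishNose(x, y, rgb) {
  pubnub.publish({ channel: 'noses', message: { id: pubnub.getUUID(), x, y, rgb } });
}

// called from draw(): every nose stays on screen, leaving a trace of past users
function drawNoses() {
  for (const id in allNoses) {
    const n = allNoses[id];
    fill(n.rgb[0], n.rgb[1], n.rgb[2]);
    ellipse(n.x, n.y, 30, 30);
  }
}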

 

Two noses chasing each other on the page. 

The code then loops through the array and draws an ellipse with the received/sent coordinates of the user’s noses. When the user leaves, the ellipse stays there which shows a trace of all the users that have been active on the page.

Along with the x & y coordinates, I also sent PubNub the RGB values of the user’s nose ellipse. This was to differentiate the different users on the page and also allow users to uniquely identify themselves on others’ browsers.

Video documentation of moving around another nose.

Results

The interaction of the two noses was interesting because it prompted either an aversion to the different noses overlapping or an urge to make the dots touch. The action of moving your nose along the screen was not as direct as expected: the motion was laggy, which prompted jerky movements from the users.

This experiment was an initial exploration into mapping multiple physical spaces together into a virtual space. In further iterations, I would make an action occur if the positions of the noses overlapped on the browser page. I chose not to do that this time because I did not want to make anything that could be interpreted as a game mechanic; I wanted to see what the actual reaction would be amongst a few users. In another iteration, I would include other parts of the body to track, such as the eyes or wrists. Tracking only the nose was effective for tracking the facial position of the user, which is the most common angle seen when sitting at a webcam (while working at a computer).

Overall I am interested in exploring networked computer vision technologies further, in pursuit of examining the multiplicity of spaces and existences we inhabit simultaneously.

Resources

PoseNet Coding Train

Creation and Computation

Body Centric Blog Post 

PubNub

Intro to ML5_Exploring StyleTransfer Example

Process Journal #6

2

Learning: ML5&P5

Computer vision seems very interesting to me. I checked the four videos posted on Canvas and started this assignment by going through all the examples available on GitHub and the ML5 website. I just want to use a tensorflow.js model to implement some functions on the web, so I don’t need to learn tensorflow.js systematically; I found I just need to use a ready-made model packaged as an NPM package, such as MobileNet (image classification), coco-ssd (object detection), PoseNet (human pose recognition), or speech-commands (voice recognition). The NPM pages for these models have detailed code examples that you can copy. There are also a number of third-party off-the-shelf model packages, such as ML5, which includes pix2pix, SketchRNN and other fun models. We were asked to build upon one of the existing ML5 examples by changing the graphic or hardware input or output. I found the StyleTransfer example quite interesting, so I decided to work on it. I had already had the chance to explore PoseNet in the Body-Centric course, so for this project I decided to explore something different. I still used the webcam and pictures, and decided to experiment with the style transfer example in ML5.

Concept:

It is an expansion on the style transfer example in ml5, where users select the paintings of their favourite artists as the material, within a limited range of choices, and the selected painting changes the style of the real-time image, producing a unique abstract painting video.

Process:

First, the position detector detects the movement of the object. When the object moves to the visual center of the camera system, the detector immediately sends a signal to the image acquisition part to trigger the pulse.

Then, according to a predetermined program and delay, the image acquisition section sends pulses to the camera and lighting system, and both the camera and the light source are turned on.

The camera then starts a new scan. It opens the exposure mechanism before starting a new frame scan, and the exposure time can be pre-set. The lighting source is turned on at the same time, and the lighting time should match the camera’s exposure time.

At this point, the screen scanning and output officially begin. The image acquisition part obtains a digital image or video through A/D conversion. The obtained digital image/video is stored in the memory of the processor or computer, and the processor then processes, analyzes and recognizes the image.

StyleTransfer example:

Step 1: allow StyleTransfer to use the camera

screen-shot-2019-04-19-at-5-01-35-pm

Step 2: select the artwork

screen-shot-2019-04-19-at-5-16-00-pm

(Start and stop the transfer process)

Step 3: check the newly synthesized video

screen-shot-2019-04-19-at-5-01-55-pm

In the example there is only one painting, and the composition of the video is too abstract. I wanted to give users more choices, so I tried several other paintings to see if they produce different effects.

a

screen-shot-2019-04-19-at-5-20-29-pm

Now I have a big problem: there is no big difference in the colour and composition of the video between the chrysanthemum painting and the abstract painting. I don’t know what the problem is. Then I tried an abstract painting in blue.

screen-shot-2019-04-19-at-5-28-53-pm

screen-shot-2019-04-19-at-5-28-04-pm

The difference is so small that I don’t know what to do with the picture; the naked eye can only see very slight differences. But I’m still going to build the framework so users can choose different images.
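A minimal sketch of that framework, assuming one pre-trained model folder per painting; in ml5, each style is its own trained model, so swapping only the reference image shown on the page will not change the output. The folder names and key bindings here are placeholders.

let video;
let styles = {};
let currentStyle = 'wave';   // folder names are assumptions
let resultImg;
let isTransferring = false;

function setup() {
  createCanvas(320, 240);
  video = createCapture(VIDEO);
  video.size(320, 240);
  video.hide();
  resultImg = createImg('');
  resultImg.hide();
  // one ml5 styleTransfer instance per painting the user can choose
  styles.wave = ml5.styleTransfer('models/wave', video, () => console.log('wave ready'));
  styles.udnie = ml5.styleTransfer('models/udnie', video, () => console.log('udnie ready'));
}

function keyPressed() {
  if (key === '1') currentStyle = 'wave';
  if (key === '2') currentStyle = 'udnie';
}

function transferLoop() {
  styles[currentStyle].transfer((err, result) => {
    if (result) resultImg.attribute('src', result.src);
    if (isTransferring) transferLoop();
  });
}

function mousePressed() {
  isTransferring = !isTransferring;  // click to start/stop, as in the example
  if (isTransferring) transferLoop();
}

function draw() {
  image(isTransferring ? resultImg : video, 0, 0, 320, 240);
}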

1

Users select the paintings of the artists they like as the material, and the selected painting changes the style of the real-time image, producing a unique abstract painting video.

2

References:

https://www.npmjs.com/package/@tensorflow-models/posenet

https://github.com/ml5js/ml5-examples/tree/release/p5js

https://canvas.ocadu.ca/courses/28083/pages/intro-to-ml5?module_item_id=226079

Exploring PoseNet & P5.js & ml5.js

For this workshop I chose to continue my explorations of PoseNet, which allows for real-time human pose estimation in the browser, having started learning the framework in the Body-Centric Technologies class. I wanted to try working with images and PoseNet, but I couldn’t get the PoseNet model to track whenever I used still images… my workaround was to group the images into a video. My idea was to explore how women pose on magazine covers by comparing poses from different fashion magazine covers. Below is a video showing the final results of poses that were captured when working with the PoseNet-with-webcam example.

I found that the body tracking worked well only if I had the size of the video set to a width of 640 pixels and a height of 480 pixels, which were the dimensions used in the ml5 examples.

What is ml5.js? A wrapper for tensorflow.js that makes machine learning more approachable for creative coders and artists. It is built on top of tensorflow.js, accessed in the browser, and requires no dependencies installed apart from regular p5.js libraries.

NOTE: To use ml5.js you need to be running a local server. If you don’t have a localhost setup you can test your code in the p5.js web browser – you’ll need to create an account.

I also found that the multi-pose tracking seemed to cap at a maximum of 3 poses whenever there were more than 3 poses in frame. Additionally, the model’s skin colour affected the tracking, so that at times some body parts were not tracked. The model’s clothes also affected whether some parts were tracked or not; at times the model’s limbs were ignored, or the clothes were tracked as additional limbs. The keypoints seemed to be detected all the time, but the lines for the skeleton were not always completed. What are keypoints? These are 17 data points that PoseNet returns, referencing different locations in the body/skeleton of a pose. They are returned in an array where indices 0 to 16 each reference a particular part of the body, e.g. index 0 contains results about the nose, such as the x, y coordinates and the detection confidence.
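A short sketch of reading that keypoints array, using the same ml5 poseNet setup as the examples; it simply logs the nose keypoint for every detected pose.

let video, poseNet;

function setup() {
  noCanvas();
  video = createCapture(VIDEO);
  video.size(640, 480);
  poseNet = ml5.poseNet(video, () => console.log('model ready'));
  poseNet.on('pose', (poses) => {
    poses.forEach((p) => {
      const keypoints = p.pose.keypoints;   // 17 entries, indices 0-16
      const nose = keypoints[0];            // index 0 is the nose
      console.log(nose.part, nose.position.x, nose.position.y, nose.score);
    });
  });
}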

Below are some of the images I tested with:

mag_covers

I’d like to continue working on this; however, I would like to explore using OpenPose, which is a framework like PoseNet that provides more tracked keypoints than PoseNet’s 17. From my work with PoseNet so far, I find it most useful in cases where you aren’t drawing a skeleton but are doing something more with the keypoint data, e.g. the right eye is at this x and y position, so perform a certain action.

I tried some of the other ml5 examples; however, I wasn’t satisfied with the results. I was particularly interested in the style transfer and the interactive text generator, but I found that in order for them to be useful to me, I would have to train my own custom models, and I didn’t have the time or an adequate dataset to do this.

I also tried out the video classification example, where I toyed with the idea of having the algorithm detect an image in a video. It quickly dawned on me that this was a case for a custom model, as the pre-trained model seemed to work best only when generic objects were in view; at times it recognized my face as a basketball, my hand as a band-aid, my hair as an abaya, etc. I also noticed that if I brought objects closer to the screen, the detection was slightly better. Below are some of my findings using MobileNet video classification in p5.

mnet-768x1386

Pros & cons of using a pre-trained model vs. a custom model? When using a pre-trained model like PoseNet with tensorflow.js, a lot of the work has already been done for you. Creating a custom model is beneficial only if you are looking to capture a particular pose, e.g. if you want to train the machine on your own body, but in order to do this you will need tons of data. Think thousands or even hundreds of thousands of images, or 3D motion capture, to get it right. You could crowdsource the images; however, you have to think of issues of copyright and your own bias about who is in the images and where they are in the world. It is imperative to be ethical in your thinking and choices.

Another issue to keep in mind is the diversity of your source images, as this may cause problems down the line when it comes to recognizing different genders or races. Pre-trained models, too, are not infallible, and it is recommended that you test out models before you commit to them.

Do you think you can “Sing”?

Code: https://webspace.ocad.ca/~3170557/UbiquitousComputing/Week6/CanYouSing.rar

untitled-1

I started this experiment by going through all the examples available on the ML5 website. I found the webcam classification quite interesting, but also difficult to work with because of how sensitive it was to background objects. In addition, I had already had the chance to explore PoseNet in the Body-Centric course, so for this project I decided to explore something different. Moving away from webcams and pictures, I decided to experiment with the pitch detection example in ML5. Having already done work in digital signal processing (DSP), I found it quite fascinating how quickly and accurately the software was able to identify the musical note in the piano example. I wanted to modify the example to create a user interaction with the algorithm, using the existing data to provide a tool for practicing musical skills.

“Can You Sing?” is an expansion on the piano example, where the user can select the note they are trying to mimic in order to practice specific notes. The software indicates when the user has successfully mimicked the sound by highlighting the key in green. Only then is the user allowed to select another key and repeat the experience.

To do that, I had to create a way for the user to select the note that they wanted to mimic. I divided the height of the keys in two: the top half looks for black keys and the bottom half looks for white keys. Every time a mouse button is pressed, the Y location of the mouse is checked. If it is in the bottom half of the piano shape, then based on the X location of the mouse a white key is selected. If the Y position is in the top half, then based on the X position a black key is selected.

untitled-2
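A minimal sketch of that key-picking logic; the piano geometry, note names, and drawing are placeholders of my own, not the example’s actual variables.

const whiteNotes = ['C4', 'D4', 'E4', 'F4', 'G4', 'A4', 'B4'];
const blackNotes = ['C#4', 'D#4', null, 'F#4', 'G#4', 'A#4', null];
const pianoX = 40, pianoY = 60, keyWidth = 60, pianoHeight = 200;

let selectedNote = null;

function setup() {
  createCanvas(500, 320);
}

function draw() {
  background(220);
  // white keys as simple rectangles (black keys omitted to keep the sketch short)
  for (let i = 0; i < whiteNotes.length; i++) {
    fill(255);
    rect(pianoX + i * keyWidth, pianoY, keyWidth, pianoHeight);
  }
  fill(0);
  text('Selected: ' + selectedNote, pianoX, pianoY + pianoHeight + 30);
}

function mousePressed() {
  const col = floor((mouseX - pianoX) / keyWidth);
  if (col < 0 || col >= whiteNotes.length) return;
  if (mouseY > pianoY + pianoHeight / 2) {
    selectedNote = whiteNotes[col];     // bottom half: white key under the cursor
  } else if (blackNotes[col]) {
    selectedNote = blackNotes[col];     // top half: the black key in that column, if any
  }
}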

After the key is selected, the user’s voice is converted into a note and drawn on the screen. Only if the user’s input is the same as the selected note does the note change colour to green. After that, the user is asked to select another note.

untitled-3
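The note detection itself follows ml5’s pitch detection example fairly closely. A minimal sketch, assuming the CREPE model folder that example ships with ('./model/') and p5.sound’s freqToMidi() to turn the detected frequency into the nearest note:

let audioContext, mic, pitch;
let detectedMidi = null;

function setup() {
  noCanvas();
  audioContext = getAudioContext();
  mic = new p5.AudioIn();
  mic.start(startPitch);
}

function startPitch() {
  pitch = ml5.pitchDetection('./model/', audioContext, mic.stream, getPitch);
}

function getPitch() {
  pitch.getPitch((err, frequency) => {
    if (frequency) {
      detectedMidi = freqToMidi(frequency);
      // comparing detectedMidi % 12 against the selected key ignores the octave
    }
    getPitch(); // keep polling the microphone
  });
}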

In the other examples, I also found the voice output very interesting, so I added it to this program: every time the user matches the selected key, a voice says “Nicely Done!”. This was purely so that I could also explore this feature of ML5.

untitled-4

Setting up the example was very challenging. For some reason the software would not always receive data from the microphone, which I found irritating. I had to restart the browser each time I made changes to the code. I wasn’t able to figure out whether the problem was the libraries or just the local server, but it took a few tries each time to run the application.

But the most challenging part of the program was identifying the location of each note on the screen so that the user could choose their desired note by clicking on it on the piano shape. After a few tries I got a good understanding of how the keys were drawn and was able to use the same technique to identify which position is associated with which key. I did end up dividing the keys in half based on their Y location just to simplify the separation between the black and the white keys.

What I found useful was how the algorithm could detect the note independent of the octave. Plus, the speed at which it was able to process the data, and its accuracy, make it an ideal tool for musicians. It could simply be used as an online tuner for almost all musical instruments, which I find quite useful.

References:

ML5 – https://ml5js.org/

ML5 Examples – https://github.com/ml5js/ml5-examples

ML5 Pitch Detection Documentation – https://ml5js.org/docs/PitchDetection

Tossing The Ball Around

doc4

GitHub

This is a fairly simple engagement with PoseNet. Tossing The Ball Around is a ball rendered in P5 that changes size based on the space between the user’s arms.

I started by messing around with some more ambitious ideas. I looked at trying to get PoseNet to create animated skeletons using .gif files in place of images. However, it seems that PoseNet doesn’t integrate with animated gifs easily.

After this I spent a fair bit of time trying to put together what I was thinking of as an “AR Costume”.

docu1

I played with PoseNet and brought the code back to a place where I had access to all of the key points, and to a place where I was comfortable modifying things.

Then I went to the p5 Reference site for some tutorials on how to make particles. I imagined a series of animated streamers emanating from every key point, covering the user like a suit of grass.

docu2

I was able to create a particle system but I was not able to implement it in a way I was happy with, with multiple instances of it emanating from multiple points on the body.

I settled on experimenting with finding origins for drawings that were outside of the key points returned by PoseNet. I took the key points from the wrists and drew an ellipse at the midpoint between them. I used the dist() function to measure the distance between the user’s two wrists and return a circle that changes size based on the user’s movements. The effect is similar to holding a ball or a balloon that constantly adjusts itself.

docu3
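A minimal sketch of that wrist-to-wrist ball, assuming the usual ml5 poseNet + webcam setup used throughout these projects:

let video, poseNet;
let poses = [];

function setup() {
  createCanvas(640, 480);
  video = createCapture(VIDEO);
  video.size(width, height);
  video.hide();
  poseNet = ml5.poseNet(video, () => console.log('model ready'));
  poseNet.on('pose', (results) => { poses = results; });
}

function draw() {
  image(video, 0, 0, width, height);
  if (poses.length > 0) {
    const pose = poses[0].pose;
    const lw = pose.leftWrist;
    const rw = pose.rightWrist;
    // the ball's origin is the midpoint between the wrists
    const cx = (lw.x + rw.x) / 2;
    const cy = (lw.y + rw.y) / 2;
    // its diameter follows the distance between them, so it grows and shrinks
    // as the arms open and close
    const d = dist(lw.x, lw.y, rw.x, rw.y);
    fill(255, 200, 0);
    ellipse(cx, cy, d, d);
  }
}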

I played around with coloration and making the image more complex, but I decided to quit while I was ahead.

Throughout the process, I tried to make use of coding techniques I had learned throughout the year. When I had first approached programming I knew nothing beyond the very basics. In coding this project I tried to use concepts like arrays, passing variables into functions, and object-oriented programming. I still have a lot more work to do to get comfortable, but this project demonstrated to me how far I’ve come.

Resources consulted:

https://ml5js.org/

https://ml5js.org/docs/PoseNet

https://github.com/tensorflow/tfjs-models/tree/master/posenet

https://p5js.org/reference/

The Real-Time Virtual Sunglasses

My interest in ML5 is focused on real-time animated effects. Compared to professional software such as Adobe Character Animator, making real-time face animation with ML5 is more customizable and simpler. Though the result may not be as highly finished, it is a great choice for designers and coders producing visual work.

I found it easier to just use the p5 editor; however, the ML5 script needs to be added to the HTML file in the p5 editor (the fourth “script” tag).

screenshot-2019-04-17-at-10-35-38

The model used is poseNet. It allows real-time human pose estimation: it can track, for example, where my eyes, nose and hands are, and visual work can then be built on those positions.

screenshot-2019-04-17-at-10-50-26

Then I set up the canvas and draw functions in the p5 editor; I used the gray filter to add more fun.

screenshot-2019-04-17-at-10-35-50

Next, I programmed poseNet into my sketch. When everything is set up, we can see that ml5 recognizes “object, object, object (which should be my face)…” from the webcam.

screenshot-2019-04-17-at-11-15-11

After some research, I learned that the keypoints from nose to feet are indexed 0 to 16 in poseNet. The left eye and the right eye should be 1 and 2.

screenshot-2019-04-17-at-13-47-03

The first try:

screenshot-2019-04-17-at-14-15-31

01-2019-04-17-17_16_35

As the gif shows, if I move out of the frame the circles are not able to track back.

The second try solved it: (if (poses.length > 0))

screenshot-2019-04-17-at-14-19-31

02-2019-04-17-17_17_10

In fact, I could call my project successful at this point; however, I wanted to make it more finished.

In the third try, I tested the lerp function, and instead of a fixed size, the size of the ellipses is defined by the “distance”, which allows the ellipses to become larger or smaller as I move forward and backward:

screenshot-2019-04-17-at-16-27-10

03-2019-04-17-17_17_31

04-2019-04-17-17_17_50
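Putting the three tries together, a minimal sketch of the final behaviour might look like this: keypoints 1 and 2 are the eyes, the guard on poses.length fixes the tracking-loss issue from the first try, and the lens size follows the distance between the eyes so the glasses scale as the face moves toward or away from the camera.

let video, poseNet;
let poses = [];

function setup() {
  createCanvas(640, 480);
  video = createCapture(VIDEO);
  video.size(width, height);
  video.hide();
  poseNet = ml5.poseNet(video, () => console.log('model ready'));
  poseNet.on('pose', (results) => { poses = results; });
}

function draw() {
  image(video, 0, 0, width, height);
  filter(GRAY); // the gray filter mentioned earlier

  if (poses.length > 0) {  // guard so the circles don't freeze when tracking is lost
    const k = poses[0].pose.keypoints;
    const leftEye = k[1].position;
    const rightEye = k[2].position;
    // lens size follows the distance between the eyes
    const d = dist(leftEye.x, leftEye.y, rightEye.x, rightEye.y);
    fill(0);
    ellipse(leftEye.x, leftEye.y, d, d);
    ellipse(rightEye.x, rightEye.y, d, d);
  }
}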

 

Reference:

https://ml5js.org/docs/PoseNet

The Coding Train

 

LSTM Poetry with Text-to-Speech

For the final week’s project using ml5.js, I put together an LSTM text generator with a poetry model and a text-to-speech generator.

Github: https://github.com/vulture-boy/lstmPoetry
(there are a few extra models on the Github that you can access by modifying the code slightly; just change the model folder to load)

You can check it out here: https://vulture-boy.github.io/lstmPoetry/
[The text-to-speech seems to be giving the webhost some issues and only works some of the time. I would recommend downloading it from the GitHub.]

To accomplish this, I scraped poetry from a website and followed the tutorial listed on ml5: Training a LSTM

Scraping

webscraperchrome

I used Web Scraper in Chrome to get the text information I needed to train the machine learning process. I needed to create a text file containing the information I wanted the algorithm to learn from, but I didn’t want to go through the laborious process of manually collecting it from individual web pages or Google searches. A web scraper automates the task: the only information required is a ‘sitemap’ that you put together using Web Scraper’s interface, picking out the HTML elements where the text, links and data of interest are located, to describe to the scraper how to navigate the page and what to collect.

scrapepoems

After the process is complete (or if you decide to interrupt it), you can export a .csv containing the data collected by the Web Scraper process and copy the column(s) containing the desired data into a .txt file for the training process to use.

Training the Process

In order to prepare my computer for training, I had to install a few packages in my Windows 10 PowerShell, namely Chocolatey, Python 3, and a few Python packages (pip, TensorFlow, a virtual environment). It’s worth noting that in order to install these I needed to enable remote scripts: by default, Windows 10 prevents you from running scripts inside PowerShell for safety reasons.

Installing Python3 (inc. Powershell setup)
Installing Tensorflow

capture

Once I had the packages installed, I ran the train.py file included in the training package repository on a .txt file collating all the text data I had collected via web scraping. Each epoch denotes one full presentation of the data to the process, and the time/batch value denotes how many seconds each batch took. The train_loss parameter indicates how accurate the process’ prediction was against the input data: the lower the value, the better the prediction. There are also several hyperparameters that can be adjusted to improve the quality of the result and the time it takes to process (Google has a description of this here). I used the default settings for my first batch on the poetry:

  • With 15 minutes of scraped data (3500 iterations, poem paragraphs), it took about 15 minutes to process.
  • For a second batch, I collected about 30 minutes of data from a fanfiction website (227650 iterations, sentence and paragraph sizes), and I believe it took a little over 3 hours.
  • I adjusted the hyperparameters as recommended in the ml5 training instructions for 8 MB of data on another 15-minute data set containing an entire novel (55000 iterations, 360 chapters) and chose to run the process on my laptop instead of my desktop computer. The average time/batch was ~7.5, much larger than my desktop’s average of ~0.25 with default settings. This was going to take approximately five days to complete, so I aborted the process. I tried again using default settings on my laptop: the iterations increased from 55000 to 178200, but the batch time was a respectable 0.115 on average.

scrape

The training file on completion creates a model folder, which can be substituted for any other LSTM model.

Text-to-Speech

One of the contributed libraries for p5.js is the p5.speech library. It is easily integrated into existing p5.js projects and has comprehensive documentation on its website. For my LSTM generator, I created a voice object and a few extra sliders to control the voice’s pitch and playback speed, as well as a playback button that reads the output text. Now I can listen to beautiful machine-rendered poetry!
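A minimal sketch of how the generator and p5.speech could fit together; the model folder path and element ids are assumptions of my own, and ml5 calls the generator charRNN in recent releases (LSTMGenerator in older ones):

let rnn;
const voice = new p5.Speech();

function setup() {
  noCanvas();
  rnn = ml5.charRNN('./models/poetry/', () => console.log('model ready')); // path is an assumption
  select('#generate').mousePressed(generatePoem);
}

function generatePoem() {
  rnn.generate({
    seed: select('#seed').value(),
    temperature: 0.5,
    length: 400
  }, (err, result) => {
    if (err) return console.error(err);
    select('#output').html(result.sample);
    // pitch and rate could come from the sliders mentioned above
    voice.setPitch(1.0);
    voice.setRate(0.9);
    voice.speak(result.sample);  // read the generated poetry aloud
  });
}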

Here’s a sample output:

The sky was blue and horry The light a bold with a more in the garden, Who heard on the moon and song the down the rasson’t good the mind her beast be oft to smell on the doss of the must the place But the see though cold to the pain With sleep the got of the brown be brain. I was the men in the like the turned and so was the chinder from the soul the Beated and seen, Some in the dome they love me fall, to year that the more the mountent to smocties, A pet the seam me and dream of the sease ends of the bry sings.

Eavesdropper

Workshop Notes #5
Eavesdropper
Ubiquitous Computing
Olivia Prior

Github
Web Application 

Eavesdropper home page
Web application “Eavesdropper” home page.

Concept

Eavesdropper is a web application that actively listens for the mention of a user’s name spoken in conversation. The application uses a voice-to-text API that transcribes conversations within listening range. The transcriptions are analyzed, and if someone’s name is said, the application sounds an alert noting that that person is going to be notified by text message. Simultaneously, the clip of what was being said around the user is saved. The user then receives a text message and can go see what was being said about them.

Objective

Eavesdropper is an exploration into creating an applet on the platform If This Then That (IFTTT) using Adafruit IO. The purpose of IFTTT is to pair a trigger with a response. I wanted to create an accessible and customizable web page that anyone could use as an applet. The JavaScript voice-to-text API analyzes what is being said in the space where the application is open. If the text contains the name, it sends two pieces of data to Adafruit IO: the snippet of conversation containing the user’s name and a “true” statement. IFTTT is linked with Adafruit IO; if the channel data matches “true”, the applet sends a text message to the associated user letting them know that someone is mentioning them in conversation. The web application simultaneously uses the text-to-voice API to announce a message to the party that set off the trigger. This applet is simple to set up, allowing anyone to create a transcription analyzer that can notify them of anything they wish.

Process

Building upon my previous voice-to-text “DIY Siri” project, I wanted to play around with the concept of “what if my computer could notify me if it heard something specific?”. I initially thought it would be interesting to build directly off of the Wolfram Alpha API from the DIY Siri project and notify me if something specific was searched. From there I decided to isolate the working parts and start with the concept of “the application hears a specific word, the user gets a notification”. I chose names as the trigger because they are rare enough that the trigger would not be sent frequently. This is important because both IFTTT and Adafruit IO have data sending and receiving limits: IFTTT allows sending up to 100 text messages a month, and Adafruit IO allows updating channels 30 times a minute.

I started off by using my existing code from DIY Siri and removing the PubNub server integration. I then changed the code to analyze the transcripts of what was being said and, if my name was mentioned, log this information.

Iterating through the transcript to detect if the string “Olivia” was picked up
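A minimal sketch of that check, assuming the browser’s Web Speech API (webkitSpeechRecognition) that the voice-to-text relies on; the keyword list is illustrative:

const keywords = ['Olivia', 'olivia', 'Olivia.', 'Olivia?', 'olivia,'];

const recognition = new webkitSpeechRecognition();
recognition.continuous = true;
recognition.interimResults = false;

recognition.onresult = (event) => {
  const transcript = event.results[event.results.length - 1][0].transcript;
  const heard = keywords.some(word => transcript.includes(word));
  if (heard) {
    console.log('Name overheard:', transcript);
    // here the project sends "true" and the transcript to Adafruit IO
  }
};

recognition.start();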

My next step was to connect my Adafruit IO channel to the page. I created a new feed titled “overheard” with two channels: listening, and transcripts. Listening would indicate whether or not my name was overheard, and transcripts would save whatever was being said about me.

After creating those two channels, I connected my voice to text API to Adafruit to see if I would be able to save the value “true” and the transcript of the conversation. I tested with “if my name is included in this transcript, send the data to Adafruit”. This was successful.
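A minimal sketch of sending those two values to Adafruit IO over its REST API; the username, key and feed keys are placeholders, and the project may well have used a client library instead:

const AIO_USER = 'YOUR_USERNAME';
const AIO_KEY = 'YOUR_AIO_KEY';

function sendToAdafruit(feed, value) {
  fetch(`https://io.adafruit.com/api/v2/${AIO_USER}/feeds/${feed}/data`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'X-AIO-Key': AIO_KEY
    },
    body: JSON.stringify({ value: value })
  });
}

function onNameOverheard(transcript) {
  sendToAdafruit('listening', 'true');       // IFTTT compares this against the string "true"
  sendToAdafruit('transcripts', transcript); // the snippet of conversation that contained the name
}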

Following the guidance from Adafruit, I started to create an applet of my own to connect this feed to my phone. I chose the “this” (trigger) to be Adafruit IO, and the “then that” (action) to be an SMS text message. On the Adafruit side, I selected the feed “overheard” and the channel “listening” to monitor: if “listening” was equal to “true”, send a text message. The UX of IFTTT made it simple to connect the two platforms.

 

Step 1 in IFTTT applet, find the “This” platform Adafruit
Monitor the channel listening – overheard. If the channel has the value true, send an SMS trigger
Message that will be sent in the text message.

 

I started testing my application with all of the parts now connected. At first, I was not receiving text messages. This was because I was sending Adafruit a boolean value and not a string: the “equal to” comparison on the IFTTT side compares the channel value to the string “true”. I changed the value I was sending to Adafruit to a string and was able to receive a text message.

Screenshot of receiving multiple notifications in a row from the recursive loop.

Once I received a text message, I ended up receiving six in a row. I realized that the text-to-voice alert that played upon hearing my name was vocalizing my name out of the speakers, which my application then picked up. This created an infinite loop of alerts: “Alert alert, Olivia has been notified that you mentioned her and received a text message”. I attempted to stop the recursive loop by turning the voice recognition off and restarting it, but each time a new voice recognition object is instantiated, explicit permission from the user to activate their microphone is required. A quick fix, so that I could continue development, was to not use my name in the text-to-voice alert from my speakers: I chose “O Prior has been notified” rather than my name, Olivia.

Screenshot of the recursive loop picking up the text-to-voice message.

For the UX/UI of the application, I chose to use a simple button. When the application is not actively listening, a button appears that says “Shhhhhhh!”. If the button is clicked, a microphone permissions prompt requests access. Once the application is listening to the room, the entire screen turns black to be “inconspicuous”. The stop button is black and appears only when the cursor hovers over the element. If the name Olivia is heard in conversation, a .gif plays showing someone pressing an alert button. The video and message loop twice before returning to a black screen.

Video demo of Eavesdropper

Challenges

One challenge I faced was attempting to connect two channels to the IFTTT applet. I wanted to additionally send the transcript as data in the SMS notification, but the applet connected to Adafruit only allowed the data of one channel to be used in the SMS. Due to the setup of the applet, I could only compare direct values (such as greater than, is not equal to, etc.), which prevented me from using the transcript channel as a trigger to send the message. Alternatively, I could have set up the applet so that it sent a message any time the transcript channel was updated. With that method, I would have to worry about character length and substring the message to ensure that the data would not exceed the SMS character limit. I did not want to cut the transcript short, so I chose the boolean method. If users want to see what was being said about them, they can investigate the transcript channel and use the time the text message was sent as an indicator of what was being said at that moment.

The other challenge I noted was with the voice-to-text API. I had to use a function that checked many different variations of “Olivia”, including different capitalizations and punctuation. This was only an issue once, so all of the variations may not be necessary. The function I used would be incredibly useful if this program were adapted to listen for multiple keywords: the program loops through the transcript and checks for strings from a list of words kept in an array, and the array can be modified to store whatever the user wants to be notified of in a conversation.

Next steps & conclusion

The next step for this project would be to find a way to use the data from both channels. Customized messages drawn from the triggering conversation would, I think, provide a better experience for the user and stop the application from being redundant.

This applet is an initial exploration into connecting a web application to IFTTT. Adafruit IO is a useful technology for networking Internet of Things (IoT) devices. In another iteration, it would be interesting to see how this applet could be applied to a bespoke IoT device, such as an Arduino voice-to-text sensor. The JavaScript voice-to-text API is still in beta and, even though it is improving, issues such as repeated permission prompts become a problem if the goal is continuous audio listening in a space. For this reason the JavaScript API is not a replacement for tools such as a Google Home or Alexa.

Overall, IFTTT is an intuitive platform that uses simple and accessible logic, allowing many people to create bespoke triggers and responses. Some limitations may be an issue at a grander scale, such as the SMS limit of 100 messages per month. This web application is a simple demo of what can be achieved, with plenty of potential to adapt it to more bespoke applications.

References & Research 

Configuring IFTTT using Adafruit to SMS

Function for finding multiple strings

 

ValenClick – Jingpo and April

1st Idea:

 

Code: https://github.com/jli115/GoogleHome

 

Design Concept:

 

This time we considered doing something fun with the IFTTT functions. After researching current projects online, we found it is possible to link Google Home to “this” using Google Assistant, and then, by linking “that” to the Webhooks service in IFTTT, we can control the output by speaking directly to the Google Home.

 

The set-up of Google Home:

In “this”:
1

2

 

Webhook Configuration:

 

3

 

The URL should be the IP address+index.php?state={{Text Field}}

We were inspired by the project “Google Home – Control DIY Devices”. This project shows how to control multiple IoT devices with Google Home using PHP. However, considering we were planning to use the Arduino Feather as the output, the configuration would be a little different. We changed the “that” service to Adafruit and expected to control the result using the toggle block in Adafruit.

4

So far, the logic of the data flow we are planning is:

Google Home > Google Assistant > IFTTT > Webhook > PHP > Turn on/off the light

or

Google Home > Google Assistant > IFTTT > Adafruit Send Data > Adafruit > Arduino Feather Board > Turn on/off the light

 

Challenges:

  1. Arduino code

After we went over the Adafruit IO feed and dashboard basics guides we learnt in class, we all agreed that the most challenging part of this project would be getting the Arduino code working with the Adafruit platform. We found an online resource in the Adafruit Learn section; it covers the basics of using Adafruit IO, showing how to turn an LED on and off from Adafruit IO using any modern web browser.

2. Google Home

We failed to connect to the Google Home SSID from the school’s Wi-Fi settings. We guess Google Home cannot connect to public Wi-Fi.

Step 1: Adafruit IO Setup:

5

6

 

Step 2: Arduino Wiring

7

Step 3: Arduino Setup

Have the Adafruit IO Arduino library installed and open the Arduino example code.
8

We followed the tutorial step by step. When we compiled the code, it didn’t work, so we kept adding the libraries indicated by the Arduino IDE.
9

 

 

2nd Idea:

Code: https://github.com/jli115/ValenClick

Design Concept:

Considering the timeframe of this homework, we had to change our minds and go for something easier to approach and more manageable. As Valentine’s Day is around the corner, we thought of relating the project to it. We believe that for some people, telling someone they love them can be very hard, but breaking up with someone they no longer feel anything for can be even harder. Hence our project “ValenClick”: users can send their love, or not, to anyone with just one click… in a funny way.

10

The interface is super clear and simple: users just need to click the right or the left side of the screen to send one of two different emails to their receivers.

IFTTT configuration:

The centre of the image is about 680px.
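A minimal sketch of that click split, assuming two IFTTT Webhooks events (here named "send_love" and "send_breakup") and a placeholder key; the actual applets may be wired differently:

const IFTTT_KEY = 'YOUR_IFTTT_WEBHOOKS_KEY';

function trigger(eventName) {
  fetch(`https://maker.ifttt.com/trigger/${eventName}/with/key/${IFTTT_KEY}`, {
    method: 'POST'
  });
}

document.addEventListener('click', (e) => {
  // the image is centred around x ≈ 680px, so split the screen there
  if (e.clientX < 680) {
    trigger('send_love');      // left side: send the love email
  } else {
    trigger('send_breakup');   // right side: send the break-up email
  }
});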

11

Configure the left IFTTT applets:
12 13

Configure the right IFTTT applets:

14 15

Test: 16

Reference:

https://www.hackster.io/phpoc_man/google-home-control-diy-devices-3be448

https://learn.adafruit.com/adafruit-io-basics-digital-output/network-config