DIY Siri

Ubiquitous Computing Process Journal #4
GitHub
Website Application
By Olivia Prior

DIY Siri web application screenshot

Concept

DIY Siri is a web application that combines JavaScript's built-in speech to text and text to speech functionality with the Wolfram Alpha API in an attempt to answer questions asked through voice and respond back through sound. The web application prompts the user to ask a question by “speaking out loud” and returns a spoken, “text to speech” answer. The application is open to all questions being asked, so the user is able to view and listen to the responses given by anyone else using the application in parallel. This web application is a small demonstration of how the ubiquitous smart devices that occupy households can be made accessible for customization.

Objective

With this assignment, I was curious about how one actually constructs their own Siri, Alexa, or Google Home through readily available APIs online. In another class, I had explored the JavaScript speech to text functions through continuous listening, and I found that the functionality was clunky and would time out. When this subject of APIs was introduced in class, I thought it would be an interesting experiment to see 1) whether using explicit commands such as start and stop buttons would lend itself to a smoother use of the JavaScript functions, and 2) whether it is really that simple to connect JavaScript to an “all knowing” API like Wolfram Alpha to craft a simple at-home assistant.

Tools & Software

– Text editor of your choice
– Wolfram Alpha API developer key
– PubNub account for connecting the web page to the Wolfram Alpha API
– jQuery

Process

The base JavaScript code follows this tutorial closely, which demonstrates the many different uses that could be applied. The tutorial shows the ability to start a recording, transcribe the voice to text as someone is speaking, pause/stop the recording, and save the recording to local notes. Once the recording is saved to the local notes, the user has the ability to “listen” to those notes through the browser's audio-based text to speech function.

My first step was to isolate the steps from “I am talking and asking a question” to “I need to send this question off into the ether”.

I created a speech recognition object and two buttons: the first a record (“ask”) button and the second a stop button. For UX purposes I made the buttons' visibility toggle, so the user can only press ask or stop at any given moment.

Video demo of toggling the button 

The first button began actively listening through the JavaScript function “.start()”, which is available on the speech recognition object.

Screenshot of instantiating the speech recognition object.

The second button had an on-click event that executed “.stop()”. While started, the recognition object transcribed what the user was saying into a text area on the page, and this worked surprisingly well. When I pressed stop, the microphone would turn off. A rough sketch of this wiring is below.
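A minimal sketch of how this step might look, assuming standard Web Speech API usage; the element IDs and toggle styling here are my own placeholders, not necessarily those from the project:

// Chrome ships the constructor under a webkit prefix
const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
const recognition = new SpeechRecognition();
recognition.continuous = true;     // keep listening until .stop() is called
recognition.interimResults = true; // surface words as they are recognized

const askButton = document.getElementById('ask');         // hypothetical IDs
const stopButton = document.getElementById('stop');
const transcript = document.getElementById('transcript'); // a <textarea>

askButton.addEventListener('click', () => {
  recognition.start();              // microphone on
  askButton.style.display = 'none'; // toggle: only "stop" is pressable
  stopButton.style.display = 'inline-block';
});

stopButton.addEventListener('click', () => {
  recognition.stop();               // microphone off
  stopButton.style.display = 'none';
  askButton.style.display = 'inline-block';
});

// As results arrive, write the recognized words into the text area
recognition.addEventListener('result', (event) => {
  transcript.value = Array.from(event.results)
    .map((result) => result[0].transcript)
    .join('');
});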

The next step was tying it into the Wolfram Alpha code we had used with p5.js in class. I took the code containing my PubNub module, which connected my account to the Wolfram Alpha developer API, grabbed the transcribed value from the text area on the page, and sent it as a message through PubNub. Just as if I had typed what I was saying, I received a response to my question from Wolfram Alpha, which I output onto the web page.
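A hedged sketch of this step, assuming the Wolfram Alpha lookup happens inside a PubNub Function attached to the channel; the keys, channel name, message shape, and element ID are placeholders of my own:

const pubnub = new PubNub({
  publishKey: 'pub-c-...',   // your PubNub publish key
  subscribeKey: 'sub-c-...', // your PubNub subscribe key
});

pubnub.subscribe({ channels: ['wolfram'] });

// Send the transcribed question off into the ether
function askWolfram(question) {
  pubnub.publish({
    channel: 'wolfram',
    message: { text: question },
  });
}

// Every client subscribed to the channel receives the answer, which is
// why anyone on the page can hear everyone else's questions
pubnub.addListener({
  message: (event) => {
    const answer = event.message.text;
    document.getElementById('answer').textContent = answer;
    speak(answer); // hand the reply to text to speech (next step)
  },
});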

My next step was to connect the text to speech so that the answer could play back audibly. I created a new “SpeechSynthesisUtterance” object, which is a standard JavaScript object, passed the message from Wolfram Alpha into it, and handed it to the browser's speechSynthesis.speak() function; the browser responded with the spoken answer to my question.

Screenshot of the text to speech object being instantiated.
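The text to speech step is only a few lines; a sketch, using the speak() helper name assumed in the PubNub sketch above:

// Hand the Wolfram Alpha reply to the browser's speech synthesis
function speak(text) {
  const utterance = new SpeechSynthesisUtterance(text);
  utterance.lang = 'en-US'; // optionally pick a language/voice
  window.speechSynthesis.speak(utterance);
}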

DIY Siri Demo

Challenges & Next Steps 

While testing this web application, I sent a link to a friend in a different province. As I was fixing my CSS, I suddenly heard my browser start talking. The way I had set up the PubNub server meant that everyone with access to the application had the ability to listen to whatever was being asked. Initially I started to fix the issue, but upon reflection I realized that the ability to listen to whatever anyone is asking raises interesting connotations of security and surveillance, especially in an open source context. I decided to keep this behaviour and to test it out in class to see what the reception would be from my classmates.

A next step I considered was storing previously asked questions so that the user could quickly press and re-ask common ones, such as “what is the weather”. Once I discovered that my web application could listen to anyone's question, anywhere, I decided that this was more of a listening tool than an asking application. If I were to enforce individual privacy on this application, I would consider storing the frequently asked questions in the local browser storage, as sketched below.
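A small sketch of what that could look like, assuming a storage key of my own choosing; nothing here ever leaves the user's browser:

// Persist asked questions locally so they stay private to this browser
function saveQuestion(question) {
  const saved = JSON.parse(localStorage.getItem('askedQuestions') || '[]');
  saved.push(question);
  localStorage.setItem('askedQuestions', JSON.stringify(saved));
}

// Read them back to build a list of quick "re-ask" buttons
function loadQuestions() {
  return JSON.parse(localStorage.getItem('askedQuestions') || '[]');
}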

Since I was attracted to the idea of listening, I think I would make it more apparent that multiple people were on the application asking questions. That would make it a much more collaborative experience and could elevate it to a more polished art piece. Currently, this application lies in the space between tool and commentary and needs refining touches at either end of that spectrum to make it a more complete experience. Until then, this is a simple, basic, DIY Siri that allows you to ask questions through voice.

References and Resources

Speech to Text Tutorial

Mozilla Speech to Text Documentation
