Silent Signals
Breaking the Language Barrier with Technology
Priya Bandodkar, Jignesh Gharat, Nadine Valcin
‘Silent Signals’ is an experiment that aims to break down language barriers between users in different locations by enabling them to send and receive text messages in the language of the intended recipient(s) using simple gestures. The gesture detection framework is based on poseNet, and the experiment uses PubNub to send and receive the messages. It is intended to be a seamless interaction in which users’ bodies become controllers and triggers for the messages. It does away with the keyboard as an input and takes communication into the physical realm, engaging people in embodied interactions. It can involve multiple users, regardless of the physical distance between them.
When we first started thinking about communication, we realized that between the three of us, we had three different languages: Priya and Jignesh’s native Hindi, Nadine’s native French, and English, which all three of us shared as a second language. We imagined teams collaborating on projects across international borders, isolated seniors who may only speak one language and globetrotting millennials who forge connections throughout the world. How could we enable them to connect across language barriers?
Our first idea was to build a translation tool that would allow people to text one another seamlessly in two different languages. This would involve the use of a translation API such as Cloud Translation by Google (https://cloud.google.com/translate/) that has the advantage of automatic language detection through artificial intelligence.
We then thought that it would be more natural and enjoyable for each user to be able to speak their preferred language directly, without the intermediary of text. That would require a speech-to-text API and a text-to-speech API. The newly released Web Speech API (https://wicg.github.io/speech-api/) would fit the bill, as would the Microsoft Skype Translator API (https://www.skype.com/en/features/skype-translator/), which has the added benefit of direct speech-to-speech translation in some languages. Unfortunately, that functionality is not available for Hindi.
Diagram: translation between Language A and Language B
When we discovered that there are already several translation apps on the market, we decided to push the concept one step further by enabling communication without the use of speech, and started looking into visual communication.
Source: Emojipedia (https://blog.emojipedia.org/ios-9-1-includes-new-emojis/)
Derived from the Japanese term for “picture character”, the emoji has grown exponentially in popularity since its online launch in 2010. More than 6 billion emojis are exchanged every day, and 90% of regular emoji users rated emoji messages as more meaningful than simple texting (Evans, 2018). Emojis have become part of our vocabulary as they proliferate, and a single icon can at times express relatively complex emotions and ideas.
Sign languages allow the hearing impaired to communicate. We also use our hands to express concepts and emotions. Every culture has a set of codified hand gestures that have specific meanings.
American Sign Language
Source: Carleton University (https://carleton.ca/slals/modern-languages/american-sign-language/)
Source: Social Mettle (https://socialmettle.com/hand-gestures-in-different-cultures)
At the same time, we started thinking about how we could use technology, as the three of us shared a desire to make our interactions more intuitive and natural.
“Today’s technology can sometimes feel like it’s out of sync with our senses as we peer at small screens, flick and pinch fingers across smooth surfaces, and read tweets “written” by programmer-created bots. These new technologies can increasingly make us feel disembodied.”
Paul R. Daugherty, Olof Schybergson and H. James Wilson
Harvard Business Review
This preoccupation is by no means original. Gestural control technology is being developed for many different applications, especially as part of interfaces with smart technology. In the Internet of Things, it serves to make interactions with devices easy and intuitive, having them react to natural human movements. Google’s Project Soli, for example, uses hand gestures to control different functions on a smart watch.
Two of the challenges in implementing this approach to technology are that there is currently no standard format for body-to-machine gestures and that gestures and their meanings vary from country to country. For example, while the thumbs up gesture pictured above has a positive connotation in the North American context, it has a vulgar connotation in West Africa and the Middle East.
The original concept was a video chat that would include visuals or text (in the user’s language), triggered by gestures of the chat participants. We spent several days attempting to use different tools to achieve that result before Nick Puckett informed us that what we were trying to achieve was nearly impossible via PubNub. This left us with the rather unsatisfactory option of the user only being able to see themselves on screen. We nevertheless forged ahead with a modified concept that had these parameters:
- Using the body and gestures for simple online communications
- Creating a series of gestures with codified meanings for simple expressions that can be translated into three different languages
Source: ml5.js (https://ml5js.org/reference/api-PoseNet/)
We used poseNet, a machine learning model for real-time human pose estimation. It tracks 17 nodes on the body using the webcam and creates a skeleton that corresponds to human movements. Using the node positions tracked by poseNet, we were able to define the relationship of different body parts to one another, measure their relative distances and translate those relationships into code.
poseNet tracking nodes
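As a rough illustration of turning node positions into a gesture test, here is a minimal sketch. It assumes a pose object shaped like the output commonly returned by poseNet (a `keypoints` array whose entries carry `part`, `score` and `position`). The two gestures, the function names and the 0.5 ratio are hypothetical and are not taken from the project's code.

```javascript
// Build a lookup of part name -> {x, y} from a poseNet-style keypoints array.
function keypointsByPart(pose) {
  const parts = {};
  for (const kp of pose.keypoints) {
    parts[kp.part] = kp.position;
  }
  return parts;
}

// Straight-line distance between two points in pixels.
function distance(a, b) {
  return Math.hypot(a.x - b.x, a.y - b.y);
}

// Hypothetical gesture: both wrists raised above the nose.
// Image coordinates grow downward, so "above" means a smaller y value.
function isHandsUp(pose) {
  const p = keypointsByPart(pose);
  return p.leftWrist.y < p.nose.y && p.rightWrist.y < p.nose.y;
}

// Hypothetical gesture: wrists brought close together.
// The threshold is a fraction of the shoulder width (assumed ratio), so the
// test scales with body size and with distance from the camera.
function isHandsTogether(pose, ratio = 0.5) {
  const p = keypointsByPart(pose);
  const shoulderWidth = distance(p.leftShoulder, p.rightShoulder);
  return distance(p.leftWrist, p.rightWrist) < ratio * shoulderWidth;
}

// Hand-written example pose (only the parts used above are included).
const examplePose = {
  keypoints: [
    { part: 'nose', score: 0.9, position: { x: 320, y: 100 } },
    { part: 'leftShoulder', score: 0.9, position: { x: 380, y: 200 } },
    { part: 'rightShoulder', score: 0.9, position: { x: 260, y: 200 } },
    { part: 'leftWrist', score: 0.8, position: { x: 400, y: 60 } },
    { part: 'rightWrist', score: 0.8, position: { x: 240, y: 50 } },
  ],
};

console.log(isHandsUp(examplePose));       // true
console.log(isHandsTogether(examplePose)); // false
```

Comparing distances to the shoulder width rather than to fixed pixel values is one way to keep a rule usable for people of different sizes standing at different distances from the webcam.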
As we continued to develop the code, we soon realised that poseNet tracking was rather unstable and at times finicky, as it relied purely on the pixel information it received from the camera. The output fluctuated depending on several factors, such as the lighting, the contrast of clothing, the background and the user’s distance from the screen. Consequently, a gesture would not always be captured when these external factors were unfavourable. Dark clothing and skin seemed to be particularly problematic.
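One common way to soften this kind of jitter, sketched below, is to ignore frames where the relevant nodes have a low confidence score and to trigger a gesture only after it has been seen for several consecutive frames. This is not the approach described in the project; the function names, the 0.5 score cut-off and the frame counts are assumptions chosen for illustration.

```javascript
// True only if every listed part is present with a score at or above minScore.
// Expects a poseNet-style pose: { keypoints: [{ part, score, position }] }.
function hasConfidentParts(pose, partNames, minScore = 0.5) {
  return partNames.every((name) => {
    const kp = pose.keypoints.find((k) => k.part === name);
    return kp !== undefined && kp.score >= minScore;
  });
}

// Returns an update function to call once per frame with a boolean.
// It returns true exactly once: on the frame where the gesture has been
// detected for `requiredFrames` frames in a row. Any miss resets the count.
function createHoldDetector(requiredFrames = 10) {
  let streak = 0;
  return function update(detectedThisFrame) {
    streak = detectedThisFrame ? streak + 1 : 0;
    return streak === requiredFrames;
  };
}

// Example: a gesture must be held for 3 frames before it fires.
const update = createHoldDetector(3);
const frames = [true, true, false, true, true, true, true];
const fired = frames.map((seen) => update(seen));
console.log(fired); // [false, false, false, false, false, true, false]
```

In a draw loop, the boolean passed to `update` could be the result of a gesture test combined with `hasConfidentParts`, so that a flickering or poorly lit frame resets the count instead of sending a message.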
We originally had 10 gestures coded, but the challenge of integrating them all was that their parameters sometimes interfered or overlapped with one another. To avoid this, we kept only 5 in the prototype. We had to be mindful of using parameters that were precise enough not to overlap with other gestures, yet broad enough to account for the fact that people with different body types would perform these gestures in slightly different ways.
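The overlap problem can be made visible in code. The sketch below, with hypothetical gesture names and rules, accepts a gesture only when exactly one rule matches, and includes a small helper that lists the cases where several rules fire at once. It assumes each rule receives a lookup of part name to `{x, y}` built from the poseNet nodes.

```javascript
// Hypothetical rules. y grows downward, so "above" means a smaller y value.
const gestureRules = [
  { name: 'bothHandsUp', test: (p) => p.leftWrist.y < p.nose.y && p.rightWrist.y < p.nose.y },
  { name: 'leftHandUp', test: (p) => p.leftWrist.y < p.nose.y && p.rightWrist.y > p.rightShoulder.y },
  { name: 'rightHandUp', test: (p) => p.rightWrist.y < p.nose.y && p.leftWrist.y > p.leftShoulder.y },
];

// Returns the single matching gesture name, or null when no rule matches
// or when more than one rule matches (an ambiguous pose is ignored).
function classifyGesture(parts, rules) {
  const matches = rules.filter((rule) => rule.test(parts)).map((rule) => rule.name);
  return matches.length === 1 ? matches[0] : null;
}

// Tuning helper: for a list of sample poses, return the rule names that
// fired together, one entry per sample where more than one rule matched.
function findOverlaps(samples, rules) {
  return samples
    .map((parts) => rules.filter((rule) => rule.test(parts)).map((rule) => rule.name))
    .filter((names) => names.length > 1);
}

// Hand-written sample: left wrist raised, right wrist below the shoulder.
const leftRaised = {
  nose: { x: 320, y: 100 },
  leftShoulder: { x: 380, y: 200 },
  rightShoulder: { x: 260, y: 200 },
  leftWrist: { x: 400, y: 50 },
  rightWrist: { x: 250, y: 300 },
};

console.log(classifyGesture(leftRaised, gestureRules)); // "leftHandUp"
```

Running `findOverlaps` over poses recorded from several people is one way to check whether a rule set is both precise and tolerant enough before adding more gestures.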
Since there are very limited resources dealing with p5.js and PubNub together, we had difficulty finding code examples to help us resolve some of the coding problems we encountered. The most notable of these was publishing the graphic messages we designed (instead of text) so that they would be superimposed on the recipient’s interface. We thus only managed to display graphics on the sender’s interface and send text messages to the recipient.
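One possible way around this limitation, offered only as a sketch, is to publish a short gesture identifier instead of an image and let each recipient look up the matching text and graphic locally in their own language. The gesture names, image file names, channel name and function names below are hypothetical. The `client` argument is assumed to be any object with a PubNub-style `publish({ channel, message })` method; a stub stands in for it here so the example runs on its own.

```javascript
// Hypothetical catalogue shared by every participant.
const MESSAGES = {
  hello: { en: 'Hello', fr: 'Bonjour', hi: 'नमस्ते', image: 'hello.png' },
  yes: { en: 'Yes', fr: 'Oui', hi: 'हाँ', image: 'yes.png' },
  thankYou: { en: 'Thank you', fr: 'Merci', hi: 'धन्यवाद', image: 'thank-you.png' },
};

function isKnownGesture(gestureId) {
  return Object.prototype.hasOwnProperty.call(MESSAGES, gestureId);
}

// The payload carries only identifiers, so it stays small and language-neutral.
function buildGestureMessage(gestureId, senderId) {
  if (!isKnownGesture(gestureId)) {
    throw new Error(`Unknown gesture: ${gestureId}`);
  }
  return { gestureId, senderId };
}

// Sends the payload through a PubNub-style client (assumed interface).
function publishGesture(client, channel, gestureId, senderId) {
  return client.publish({ channel, message: buildGestureMessage(gestureId, senderId) });
}

// On the receiving side: resolve the payload to text in the recipient's
// language (falling back to English) and to a locally stored image name.
function resolveIncoming(message, language) {
  if (!isKnownGesture(message.gestureId)) return null;
  const entry = MESSAGES[message.gestureId];
  return {
    from: message.senderId,
    text: entry[language] !== undefined ? entry[language] : entry.en,
    image: entry.image,
  };
}

// Stub client that records what would have been published.
const sent = [];
const stubClient = { publish: (params) => { sent.push(params); return Promise.resolve(); } };

publishGesture(stubClient, 'silent-signals', 'hello', 'priya');
console.log(resolveIncoming(sent[0].message, 'fr').text); // "Bonjour"
```

With this design, the recipient's sketch would preload its own copies of the graphics and draw the one named in `image` when a message arrives, so no image data needs to travel over the channel.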
CODE ON GITHUB
- Participants expressed that it was a unique and satisfying experience to engage in this form of embodied interaction using gestures.
- The users appreciated that we developed our own set of gestures to communicate instead of confining ourselves to existing sign languages.
We would like to complete the experience by publishing image messages, with corresponding translations, to recipients rather than relying on the text interface.
Oliveira, Joana. “Emoji, the New Global Language?” In Open Mind. https://www.bbvaopenmind.com/en/technology/digital-world/emoji-the-new-global-language/. Accessed online November 14, 2019.
Evans, Vyvyan. Emoji Code: the Linguistics behind Smiley Faces and Scaredy Cats. Picador, 2018. https://us.macmillan.com/excerpt?isbn=9781250129062. Excerpt accessed online November 15, 2019.
Daugherty, Paul R., Olof Schybergson, and H. James Wilson. “Gestures Will Be the Interface for the Internet of Things.” In Harvard Business Review, 8 July 2015. https://hbr.org/2015/07/gestures-will-be-the-interface-for-the-internet-of-things. Accessed online November 12, 2019.
Oved, Dan. “Real-time Human Pose Estimation in the Browser with TensorFlow.js.” In Medium, 2018. https://medium.com/tensorflow/real-time-human-pose-estimation-in-the-browser-with-tensorflow-js-7dd0bc881cd5. Accessed online November 10, 2019.