The research idea was inspired by Dushan’s Project 2, the sound drawing project. There are also many sound visualization projects in which sound is converted into images. I wondered whether there is a way to convert images back into sound accurately, so that we can achieve two-way communication between sound and images. The research shows that sound and images can be translated into each other through audio frequency and the sound spectrum. Solving the problem involves three challenges, or steps. This research presentation investigates each step and collects useful information and tools. The research suggests that it is possible to make the idea come true.
What is Sound Visualization?
Music visualization, a feature found in electronic music visualizers and media player software, generates animated images based on a piece of music. The images are usually generated and rendered in real time and synchronized with the music as it is played.
Visualization techniques range from simple ones to elaborate ones. The changes in the music’s loudness and frequency spectrum are among the properties used as input to the visualization.
Brief History of Sound Visualization
The first electronic music visualizer was the Atari Video Music introduced by Atari Inc. in 1976, and designed by the initiator of the home version of Pong, Robert Brown. The idea was to create a visual exploration that could be implemented into a Hi-Fi stereo system. It is described in US 4081829. Music visualization was first pioneered in Great Britain by Fred Judd.
Music and audio players were available on early home computers; for example, Sound to Light Generator (1985, Infinite Software) used the ZX Spectrum’s cassette player. The 1984 movie Electric Dreams prominently made use of one, although as a pre-generated effect rather than one calculated in real time. One of the first modern music visualization programs was the open-source, multi-platform Cthugha (1994).
Subsequently, computer music visualization became widespread in the mid to late 1990s with applications such as Winamp (1997), Audion (1999), and SoundJam (2000). By 1999, there were several dozen non-trivial freeware music visualizers in distribution.
In particular, MilkDrop by Ryan Geiss, G-Force by Andy O’Meara, and Advanced Visualization Studio (AVS) by Nullsoft became popular music visualizations. AVS is part of Winamp and has recently been open-sourced, and G-Force was licensed for use in iTunes and Windows Media Center and is presently the flagship product for Andy O’Meara’s software startup company, SoundSpectrum. The real distinction between music visualization programs such as Geiss’ MilkDrop and other forms of music visualization, such as music videos or a laser lighting display, is a visualization program’s ability to create different visualizations for each song every time the program is run.
- 1. Converting sound to frequency spectrum
- 2. Converting frequency spectrum to images
- 3. Converting images/photos back to sound
Introduction to Audio Frequency, Spectrogram and FFT
An audio frequency is characterized as a periodic vibration whose frequency is audible to the average human. It is the property of sound that most determines pitch and is measured in hertz (Hz). The generally accepted standard range of audible frequencies is 20 to 20,000 Hz, although the range of frequencies individuals hear is greatly influenced by environmental factors. Frequencies below 20 Hz are generally felt rather than heard, assuming the amplitude of the vibration is great enough. Frequencies above 20,000 Hz can sometimes be sensed by young people. High frequencies are the first to be affected by hearing loss due to age and/or prolonged exposure to very loud noises.
A spectrogram, or sonogram, is a visual representation of the spectrum of frequencies in a sound or other signal as they vary with time or some other variable. Spectrograms are sometimes called spectral waterfalls, voiceprints, or voicegrams. Spectrograms can be used to identify spoken words phonetically and to analyze the various calls of animals. They are used extensively in music, sonar, radar, speech processing, and seismology. The instrument that generates a spectrogram is called a spectrograph. The sample outputs on the right show a select block of frequencies going up the vertical axis, and time on the horizontal axis.
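As a rough illustration of how a spectrogram is built, the sketch below cuts a signal into overlapping frames and computes the frequency magnitudes of each frame, giving one column per time step. The function names, the frame size, the hop size, and the Hann window are illustrative assumptions, not taken from any of the tools named in this research.

```python
import cmath
import math

def dft_magnitudes(frame):
    """Magnitudes of the first n/2 DFT bins of one frame (naive O(n^2) DFT)."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)))
        for k in range(n // 2)
    ]

def spectrogram(samples, frame_size=64, hop=32):
    """Return one magnitude column per frame, indexed as result[frame][bin]."""
    columns = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = samples[start:start + frame_size]
        # A Hann window reduces leakage between neighbouring frequency bins.
        windowed = [
            s * (0.5 - 0.5 * math.cos(2 * math.pi * i / (frame_size - 1)))
            for i, s in enumerate(frame)
        ]
        columns.append(dft_magnitudes(windowed))
    return columns

def tone(bin_index, length, frame_size=64):
    """Hypothetical test tone whose frequency falls exactly on one DFT bin."""
    return [math.sin(2 * math.pi * bin_index * t / frame_size) for t in range(length)]

# A low tone followed by a higher tone: the brightest bin moves over time.
signal = tone(4, 256) + tone(12, 256)
spec = spectrogram(signal)
first_peak = spec[0].index(max(spec[0]))
last_peak = spec[-1].index(max(spec[-1]))
```

Here `first_peak` and `last_peak` are the strongest frequency bins in the first and last frames. Plotting `spec` with time on the horizontal axis and bin index on the vertical axis gives the layout described above.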
A fast Fourier transform (FFT) is an algorithm to compute the discrete Fourier transform (DFT) and its inverse. A Fourier transform converts time (or space) to frequency and vice versa; an FFT rapidly computes such transformations. As a result, fast Fourier transforms are widely used for many applications in engineering, science, and mathematics. The basic ideas were popularized in 1965, but some FFTs were known as early as 1805. Fast Fourier transforms have been described as “the most important numerical algorithm[s] of our lifetime.”
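To make the forward and inverse transforms concrete, here is a minimal sketch of a recursive radix-2 FFT and its inverse. It is one textbook variant (Cooley-Tukey), chosen for brevity; the function names are illustrative, and it assumes the input length is a power of two.

```python
import cmath

def fft(samples):
    """Recursive radix-2 Cooley-Tukey FFT; the length must be a power of two."""
    n = len(samples)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(samples[0])]
    even = fft(samples[0::2])  # transform of the even-indexed samples
    odd = fft(samples[1::2])   # transform of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def ifft(spectrum):
    """Inverse FFT using the identity ifft(X) = conj(fft(conj(X))) / n."""
    n = len(spectrum)
    transformed = fft([x.conjugate() for x in spectrum])
    return [x.conjugate() / n for x in transformed]

signal = [1.0, 2.0, 0.0, -1.0, 0.5, 0.0, 3.0, 1.0]
spectrum = fft(signal)     # time domain -> frequency domain
recovered = ifft(spectrum) # frequency domain -> time domain
```

The round trip `ifft(fft(signal))` returns the original samples up to floating-point error, which is the property that makes a two-way conversion between sound and its spectrum possible in principle.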
3D Spectrum of Sound
Step 1. Converting sound to frequency spectrum
We can achieve this step by using one of the many existing spectrum analyzers.
A spectrum analyzer measures the magnitude of an input signal versus frequency within the full frequency range of the instrument. The primary use is to measure the power of the spectrum of known and unknown signals. The input signal a spectrum analyzer measures is electrical; however, the spectral compositions of other signals, such as acoustic pressure waves and optical light waves, can be considered through the use of an appropriate transducer. Optical spectrum analyzers also exist, which use direct optical techniques such as a monochromator to make measurements.
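In software, the same measurement can be sketched by transforming a block of audio samples and labelling each result with its frequency in hertz. The example below is a minimal sketch under assumed values: the function name, the 8000 Hz sample rate, and the two test tones are hypothetical, and a naive DFT stands in for the faster FFT a real analyzer would use.

```python
import cmath
import math

def magnitude_spectrum(samples, sample_rate):
    """Return (frequency_hz, magnitude) pairs for the bins below the Nyquist frequency."""
    n = len(samples)
    pairs = []
    for k in range(n // 2):
        coeff = sum(
            samples[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)
        )
        # Bin k corresponds to k * sample_rate / n hertz.
        pairs.append((k * sample_rate / n, abs(coeff) / n))
    return pairs

sample_rate = 8000  # assumed sample rate in Hz
samples = [
    math.sin(2 * math.pi * 440 * t / sample_rate)           # strong 440 Hz tone
    + 0.3 * math.sin(2 * math.pi * 1000 * t / sample_rate)  # weaker 1000 Hz tone
    for t in range(400)
]
spectrum = magnitude_spectrum(samples, sample_rate)
peak_hz, peak_magnitude = max(spectrum, key=lambda pair: pair[1])
```

With 400 samples at 8000 Hz, each bin is 20 Hz wide, so both test tones fall exactly on a bin and the strongest bin is the 440 Hz one.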
Step 2. Converting frequency spectrum to images
The Minim library in Processing contains many examples that convert sound frequency data, and related representations such as the DFT and FFT, into images.
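Independently of Minim, the core of this step can be sketched in a few lines: scale each frequency magnitude to a pixel brightness and lay the time steps out as image columns. The code below is an illustrative assumption, not Minim's API. It uses linear scaling (real visualizers often use a logarithmic scale) and writes the plain-text PGM grayscale format so that no image library is needed.

```python
def spectrogram_to_pixels(columns):
    """Map magnitudes (columns[time][bin]) to rows of 0-255 grayscale pixels.

    Row 0 holds the highest frequency bin, so low frequencies sit at the
    bottom of the image, as in a conventional spectrogram.
    """
    peak = max(max(col) for col in columns) or 1.0  # avoid dividing by zero
    n_bins = len(columns[0])
    rows = []
    for b in reversed(range(n_bins)):
        rows.append([int(255 * col[b] / peak) for col in columns])
    return rows

def to_pgm(rows):
    """Serialize pixel rows as a plain-text PGM (P2) image."""
    header = f"P2\n{len(rows[0])} {len(rows)}\n255\n"
    body = "\n".join(" ".join(str(v) for v in row) for row in rows)
    return header + body + "\n"

# Hypothetical input: two time steps, three frequency bins each.
columns = [[0.0, 4.0, 0.0], [0.0, 0.0, 4.0]]
pixels = spectrogram_to_pixels(columns)
pgm_text = to_pgm(pixels)
```

Writing `pgm_text` to a file with a `.pgm` extension produces an image that most viewers can open. Only magnitudes are stored, so the phase of each frequency is lost at this stage.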
Step 3. Converting images/photos back to sound
How does it work?
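One possible approach, offered here as an assumption rather than a finished method, is additive synthesis: treat each image column as a slice of time and each row as a frequency, then sum sine waves whose loudness follows the pixel brightness. The function name, sample rate, column duration, and frequency range below are all hypothetical choices. Because an image stores only brightness and not phase, the result approximates the original sound rather than reproducing it exactly.

```python
import math

def image_to_samples(rows, sample_rate=8000, column_seconds=0.05,
                     f_min=200.0, f_max=2000.0):
    """Additive synthesis from 0-255 grayscale rows; rows[0] is the top row."""
    n_rows = len(rows)
    n_cols = len(rows[0])
    # Linearly spaced frequencies: top row plays f_max, bottom row plays f_min.
    if n_rows == 1:
        freqs = [f_min]
    else:
        step = (f_max - f_min) / (n_rows - 1)
        freqs = [f_max - r * step for r in range(n_rows)]
    per_column = max(1, round(sample_rate * column_seconds))
    samples = []
    for c in range(n_cols):
        for i in range(per_column):
            t = (c * per_column + i) / sample_rate
            value = sum(
                (rows[r][c] / 255) * math.sin(2 * math.pi * freqs[r] * t)
                for r in range(n_rows)
            )
            samples.append(value / n_rows)  # keep the output within [-1, 1]
    return samples

# Hypothetical 3x2 image: a bright bottom-left pixel, then a bright top-right pixel.
image = [
    [0, 255],
    [0, 0],
    [255, 0],
]
samples = image_to_samples(image)
```

For this image, the first 0.05 seconds contain a 200 Hz tone and the next 0.05 seconds contain a 2000 Hz tone. The samples could then be written to a WAV file, for example with Python's standard `wave` module, to be played back.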
References & Links
GRID multi-touch sound visualization : https://vimeo.com/26226875
Processing work of sound visualization : https://vimeo.com/58704425