Web Audio – All Aboard!

In this post I will talk about Mozilla’s Audio Data API. I will cover where we have come from and demonstrate some of the incredible results that have already been achieved. I will also talk about why audio in the browser is so important, take a look at where we are headed and explore some of the ways in which we hope to reach our goals.

Buckle Up!

It is difficult to know just where to begin. The ground that has been covered by so few in so short a time is truly remarkable. Web audio has jolted into motion like the cars on a white-knuckle fairground ride. Audio in the browser has gone from a simple tag in a web page for playing sound to a usable Data API allowing JavaScript to read and write audio between the browser and the sound card.

This means being able to read and visualize audio data from a song in a web page without any plug-ins like Flash. It means generating music and speech based on dynamic website content. It means blowing the doors of research, creativity and accessibility wide open for everyone!

For now, this strange and sudden velocity remains an experimental experience for those who dare ride the crazy roller-coaster of innovation, but the potential creative and technical energy building as we climb to the top is becoming harder and harder to ignore.

HISTORY

Out of The Blue

At the true beginning of this creative branch stands a man commissioned to sound the drumbeat of “Upgrade the Web”. Chris Blizzard is very good at connecting people. It was his ability to inspire and align people around a common goal that helped form the vibrant Processing.js community: a highly charged super-community throwing out wide, arcing sparks of innovation like a furious Tesla coil of creativity.

Back in early 2009, before the public release of the

Champion Energy is the otherworldly force generated by a person who has the stubborn tenacity to latch on to an idea and push on it so hard that it breaks the very fabric of reality, allowing them to displace the future and pull it ever closer towards them. In the Processing.js channel, we respectfully refer to this force as “David Humphrey”.

The Right Kind of Wrong

Dave Humphrey is the wrong man for the job. He clearly knows precious little about audio and music theory. But anyone who knows Dave will confirm his unspoken mantra: ‘What is not, exists to be conquered.’ In other words: ‘The gaps are there to be filled.’

It was in this spirit, in the winter of 2009, that Dave insisted, “Al, I am going to make audio data available to JavaScript in the browser.” Doubtfully, I responded, “Hmmmm, OK. I think it is going to be a lot harder than you anticipate, Dave.”

Within a couple of weeks, Dave had me downloading patches for Firefox and building versions of Minefield that spat out audio data into the console. Not only was I taken aback… I was excited. Very, very, veeeeery excited!

Having studied music technology at college and spent the first phase of my career working in music and video studios, mastering audio and writing scores for documentaries, I had always dreamed of having fine control over raw audio in a language as simple as JavaScript. In two weeks, some guy who knew very little about audio had magically produced an array of seemingly random numbers. I was quickly able to confirm that these numbers were exactly the right kind of data we needed!

The World’s First Everything

It was not long before things became fun. We were already creating Canvas visualizations using audio data, and by early January Dave had opened the door to generating audio signals that could be played out over the sound card. I used the collective code so far to create the world’s first JavaScript-generated music, completing the synthesis cycle: taking math and turning it into music in the browser, without any plug-ins.

The convergence of technical skill and creative talent in the Processing.js channel gave rise to a swarm of intense activity. We rapidly built against the new functionality: visualizing audio from video streams, analyzing the frequency content of audio with fast Fourier transforms and building subtractive audio synthesizers, to mention but a few.

Not only were the right people and the right skills converging, the right technologies were also merging. Recent web technologies like JIT compilers, WebGL, Canvas and SVG Filters were multiplying the awesomeness of our online applications quicker than a ’70s arcade power-up. After the pre-release success of WebGL, adding audio data to the web page is, to many (including Flash developers I have met while giving talks on HTML5), the final nail in the coffin for black-box browser technologies like Flash.

Hacks, Meets, Demos & The W3C

We recently held the first Processing.js Panel in Boston at The Bocoup Loft. Ben Fry (Processing), John Resig (jQuery), Chris Blizzard (Mozilla), David Humphrey (Seneca) and Corban Brook (Processing.js) discussed the Processing.js project at length, and the fun did not end there. Mozilla stuck around with a few of their graphics boffins, who worked with the Processing.js team from Seneca to fix performance bugs we had discovered while working on the project. Seeing a browser vendor like Mozilla get behind an open source library and reap the rewards was greatly edifying, and it demonstrated an important path to sustainability in the open source movement.

We were fortunate to have Doug Schepers from the W3C SVG Working Group at the Processing.js Panel 2010. We demoed our work so far to Doug. He immediately recognized the creative potential and accessibility use-cases for audio data in the browser, and began to set the process in motion for an Audio Data Incubator Group at the W3C.

I am writing this post as I sit in the lobby of the Marriott hotel in North Carolina with David Humphrey, Kathy Leung and Andor Salga from Seneca. Yesterday we gave talks on WebGL and the Audio Data API at the WWW2010 conference.

It was great to see the response to our work. The smiles and nods as we demonstrated JavaScript audio software development were not only encouraging, but also an indicator that people understood the concept and approved of the use-cases for audio data access in the browser. I want to show you some video captures of the live demos we shared.

DEMO FRIDAY!

HTML5 3D FFT Visualization with CubicVR

Charles Cliffe created this excellent WebGL FFT visualization using the CubicVR 3D Engine, which he first wrote in C and then ported to JavaScript… because he’s the man. It reads the audio data without any plug-ins using Mozilla’s Audio Data API and converts the data to… PURE AWESOME!
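The heart of a visualization like this is turning each frame of samples into a frequency spectrum. The sketch below is a generic iterative radix-2 Cooley-Tukey FFT in plain JavaScript that returns one magnitude per frequency bin. It is not Charles’s CubicVR code: the function name `magnitudeSpectrum` is hypothetical, and it assumes the frame length is a power of two.

```javascript
// Generic radix-2 Cooley-Tukey FFT sketch (not the demo's actual code).
// Takes a real-valued frame whose length is a power of two and returns
// the magnitudes of the first N/2 frequency bins.
function magnitudeSpectrum(samples) {
  const n = samples.length;
  if (n === 0 || (n & (n - 1)) !== 0) {
    throw new Error("frame length must be a power of two");
  }
  const re = Float64Array.from(samples);
  const im = new Float64Array(n);

  // Bit-reversal permutation so the butterflies can run in place.
  for (let i = 1, j = 0; i < n; i++) {
    let bit = n >> 1;
    for (; j & bit; bit >>= 1) j ^= bit;
    j ^= bit;
    if (i < j) {
      [re[i], re[j]] = [re[j], re[i]];
      [im[i], im[j]] = [im[j], im[i]];
    }
  }

  // Butterfly passes over sub-transforms of size 2, 4, ..., n.
  for (let size = 2; size <= n; size <<= 1) {
    const angle = (-2 * Math.PI) / size;
    for (let start = 0; start < n; start += size) {
      for (let k = 0; k < size / 2; k++) {
        const wr = Math.cos(angle * k); // twiddle factor e^(-2*pi*i*k/size)
        const wi = Math.sin(angle * k);
        const a = start + k;
        const b = a + size / 2;
        const tr = re[b] * wr - im[b] * wi;
        const ti = re[b] * wi + im[b] * wr;
        re[b] = re[a] - tr;
        im[b] = im[a] - ti;
        re[a] += tr;
        im[a] += ti;
      }
    }
  }

  // Only the first half is kept: for real input the rest mirrors it.
  const mags = new Float64Array(n / 2);
  for (let k = 0; k < n / 2; k++) mags[k] = Math.hypot(re[k], im[k]);
  return mags;
}
```

Bin k corresponds to k × sampleRate / N Hz, so with an assumed 1024-sample frame at 44,100 Hz each bin is about 43 Hz wide. A visualizer would then map these magnitudes (often on a log scale) to bar heights or 3D geometry.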

WebGL HTML5 JavaScript Beat Detection

As if somehow his first swing at awesomeness didn’t hit home hard enough, Charles Cliffe went and added beat detection in JavaScript. Watch JavaScript lock in to the beat and turn the WebGL cogs in time with the music. Yes… the web is this awesome!
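One common, simple approach to beat detection (not necessarily the one Charles used) compares the energy of the current frame with the average energy of the recent past, and flags a beat when it jumps well above that average. The sketch below assumes a history of 43 frames (about one second of 1024-sample frames at 44.1 kHz) and a threshold factor of 1.3; both values and the class name `EnergyBeatDetector` are illustrative assumptions.

```javascript
// Hypothetical energy-based beat detector; the window size and threshold
// are illustrative assumptions, not values taken from the demo.
class EnergyBeatDetector {
  constructor(historyLength = 43, threshold = 1.3) {
    this.historyLength = historyLength;
    this.threshold = threshold;
    this.history = []; // energies of the most recent frames
  }

  // Returns true when this frame's energy exceeds threshold x the recent
  // average. Always returns false until the history window has filled.
  process(frame) {
    let energy = 0;
    for (const s of frame) energy += s * s;

    let isBeat = false;
    if (this.history.length === this.historyLength) {
      const average =
        this.history.reduce((sum, e) => sum + e, 0) / this.history.length;
      isBeat = energy > this.threshold * average;
      this.history.shift(); // drop the oldest frame's energy
    }
    this.history.push(energy);
    return isBeat;
  }
}
```

Raising the threshold makes the detector fire only on strong transients; many detectors also split the energy into frequency bands so that, for example, kick drums and hi-hats are tracked separately.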

Nihilogic’s HTML5 Audio-Data Visualizations

Truly stunning canvas-based, Winamp-style audio-data visualizations created by Jacob Seidelin using our Audio Data API. See http://blog.nihilogic.dk/2010/04/html5-audio-visualizations.html for more details.

HTML5 Audio with Realtime JavaScript Low Pass Filter

An excellent demonstration by Corban Brook, showing how JavaScript can be used to filter audio signals. The JavaScript reads from a first (muted) audio tag, passes the signal through a low-pass IIR filter (also implemented in JavaScript), and then writes the processed signal to a second audio tag, all in real time, frame by frame… SLICK!
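The exact filter design in Corban’s demo is not described here, so the sketch below swaps in the simplest IIR low-pass, a one-pole filter y[n] = y[n-1] + α(x[n] − y[n-1]). The key detail for frame-by-frame processing is that the filter’s memory survives from one frame to the next, with one state per channel because multi-channel frames are interleaved. `OnePoleLowPass` and `attachFilter` are hypothetical names. The wiring in `attachFilter` assumes the names from later revisions of the Mozilla wiki page (`loadedmetadata` with `mozChannels`/`mozSampleRate`, the `MozAudioAvailable` event’s `frameBuffer`, `mozSetup`, `mozWriteAudio`); the experimental builds changed over time, so treat those as assumptions. The function is only defined here, never called.

```javascript
// One-pole IIR low-pass: y[n] = y[n-1] + alpha * (x[n] - y[n-1]).
// A simple stand-in, not necessarily the filter used in the demo.
class OnePoleLowPass {
  constructor(cutoffHz, sampleRate, channels = 1) {
    // RC-style smoothing coefficient for the chosen cutoff frequency.
    const dt = 1 / sampleRate;
    const rc = 1 / (2 * Math.PI * cutoffHz);
    this.alpha = dt / (rc + dt);
    this.channels = channels;
    this.state = new Float64Array(channels); // per-channel memory across frames
  }

  // Filters an interleaved frame (L, R, L, R, ... for stereo).
  process(frame) {
    const out = new Float32Array(frame.length);
    for (let i = 0; i < frame.length; i++) {
      const ch = i % this.channels;
      this.state[ch] += this.alpha * (frame[i] - this.state[ch]);
      out[i] = this.state[ch];
    }
    return out;
  }
}

// Browser-only wiring, ASSUMING the experimental Mozilla Audio Data API
// names listed above. Defined for illustration; not invoked here.
function attachFilter(sourceAudio, outputAudio, cutoffHz) {
  let filter = null;
  sourceAudio.addEventListener("loadedmetadata", () => {
    outputAudio.mozSetup(sourceAudio.mozChannels, sourceAudio.mozSampleRate);
    filter = new OnePoleLowPass(
      cutoffHz, sourceAudio.mozSampleRate, sourceAudio.mozChannels);
  });
  sourceAudio.addEventListener("MozAudioAvailable", (event) => {
    // A robust player would queue any samples mozWriteAudio did not accept.
    if (filter) outputAudio.mozWriteAudio(filter.process(event.frameBuffer));
  });
}
```

A one-pole filter rolls off gently (6 dB per octave); a biquad would give a steeper slope and a resonance control, at the cost of a few more multiplies per sample.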

In-browser Synthesizer and Sequencer with Envelope and Filter control

Corban’s in-browser sequencer demo uses his new JavaScript DSP library ‘PJSAudio’ (http://github.com/corbanbrook/pjsaudio) to generate tones and sequence musical patterns on the open web. The PJSAudio Oscillator module generates a signal, which is sent through a low-pass filter and an ADSR envelope and then written to the computer’s sound card through the HTML5 audio tag. Is there no stopping this guy!?
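The oscillator-to-envelope part of that signal chain can be sketched without PJSAudio itself. The helpers below (`sawtooth`, `adsrGain`, `renderNote`) are hypothetical and do not reproduce PJSAudio’s API; the filter stage is left out for brevity, the envelope is linear, and all times are in seconds.

```javascript
// Hypothetical helpers illustrating an oscillator -> envelope chain;
// they do not reproduce PJSAudio's actual API.

// Naive (non-band-limited) sawtooth oscillator with values in [-1, 1).
function sawtooth(frequency, sampleRate, length, startSample = 0) {
  const out = new Float32Array(length);
  for (let i = 0; i < length; i++) {
    const phase = ((startSample + i) * frequency) / sampleRate;
    out[i] = 2 * (phase - Math.floor(phase)) - 1;
  }
  return out;
}

// Linear ADSR gain at time t for a note released at time releaseAt.
function adsrGain(t, { attack, decay, sustain, release }, releaseAt) {
  // Level reached by time x while the note is still held.
  const held = (x) => {
    if (x < attack) return x / attack;
    if (x < attack + decay) return 1 - ((x - attack) / decay) * (1 - sustain);
    return sustain;
  };
  if (t < releaseAt) return held(t);
  if (t >= releaseAt + release) return 0;
  // Fade linearly from the level reached at release time down to zero.
  return held(releaseAt) * (1 - (t - releaseAt) / release);
}

// Render one note: oscillator output scaled sample by sample by the envelope.
function renderNote(frequency, sampleRate, envelope, releaseAt) {
  const length = Math.ceil((releaseAt + envelope.release) * sampleRate);
  const samples = sawtooth(frequency, sampleRate, length);
  for (let i = 0; i < length; i++) {
    samples[i] *= adsrGain(i / sampleRate, envelope, releaseAt);
  }
  return samples;
}
```

A sawtooth is a common starting point for subtractive synthesis because it is rich in harmonics for the filter to carve away. This naive version aliases at high pitches, which band-limited oscillators avoid.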

“Bloop”: An HTML5 Instrument inspired by Brian Eno’s Bloom App for the iPhone

This is a demonstration of a simple touch-screen video instrument I made with Processing.js. The instrument runs in the Firefox web browser with no plug-ins and was inspired by Brian Eno’s Bloom application for the iPhone. I used Firefox’s Audio Data API to generate tones with JavaScript and push them to the audio stream of the sound card. The key of the music changes at set intervals, and a loop buffer repeats the notes just played.
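The two musical mechanisms described here, a key that changes at set intervals and a loop buffer that replays recent notes, can be sketched in a few lines. Everything below is an illustrative assumption rather than Bloop’s code: MIDI-style note numbers, a major scale, and the hypothetical names `noteToFrequency`, `snapToKey`, `keyAt` and `NoteLoop`.

```javascript
// Convert a MIDI-style note number to Hz (A4 = note 69 = 440 Hz).
function noteToFrequency(note) {
  return 440 * Math.pow(2, (note - 69) / 12);
}

// Snap a note down to the nearest degree of a major scale rooted at `root`.
const MAJOR_SCALE = [0, 2, 4, 5, 7, 9, 11];
function snapToKey(note, root) {
  const offset = (((note - root) % 12) + 12) % 12; // semitones above the root's pitch class
  let degree = 0;
  for (const step of MAJOR_SCALE) if (step <= offset) degree = step;
  return note - offset + degree;
}

// Cycle through a list of roots, moving to the next one every interval.
function keyAt(timeSeconds, intervalSeconds, roots) {
  return roots[Math.floor(timeSeconds / intervalSeconds) % roots.length];
}

// Hypothetical loop buffer: remembers the last `capacity` notes and
// returns them oldest first for replay.
class NoteLoop {
  constructor(capacity) {
    this.capacity = capacity;
    this.notes = [];
  }
  record(note) {
    this.notes.push(note);
    if (this.notes.length > this.capacity) this.notes.shift();
  }
  playback() {
    return this.notes.slice();
  }
}
```

For example, a touch mapped to note 66 with the current key rooted at 60 (C) snaps to 65 (F) and sounds at `noteToFrequency(65)`, about 349 Hz. Keeping every note in key is one simple way to make an instrument pleasant regardless of where the player touches.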

Thanks to NotMasterYet & Corban Brook for crucial tweaks to my buffering code, you guys rock!

HTML5 Touch-screen Video Instrument in Firefox

My demo would not have been complete without the ability to create sound by touch. In this example I am rendering black squares on a white background so the touch points are visible on the low-quality webcam. Apart from the styles, the code is the same as in the example above. I am using the HP TouchSmart TX2 with Felipe Gomes’ Firefox Touch API.

Are We There Yet?

YES! The demos above are not the work of a commercially funded research group. The work has been done by members of the Processing.js community in the spare minutes and the small hours. We have traveled all the way from an idea to an API in a matter of months. We have proven that web technology is ready to deliver audio data access. We have seen a steady influx of the audio-visual community into the Processing.js channel, a community whose thirst for simpler APIs that can break cross-platform boundaries is being quenched by the open web.

If this is as exciting to you as it is to me, and you would like to visualize audio data and/or generate audio signals in Firefox, connect to server irc.mozilla.org and join the #Processing.js channel. You can read more about Mozilla’s Audio Data API at: http://wiki.mozilla.org/Audio_Data_API.

WHY AN AUDIO DATA API IS SO IMPORTANT

The current web applications specification is broad, and clearly powerful. But it will never be complete without a greater granularity of control over audio. And here is why…

The web is not just about the sharing of scientific papers. The web is more than just the commercialization of information into subscription services. The web is bigger than email. The web is more than the sum of its technical parts. It attempts to capture and engage all of our human knowledge, spirit and creativity; in all of its ugliness, in all of its beauty and in all of its functionality.

A web browser that allows such fine-grained control over graphics using tools like Canvas and WebGL, yet provides no equivalent control over audio data, is a very lopsided web browser. In human terms, web browsers have always been lopsided. They reflect a specialized facet of ‘the human requirement’. This is unfortunate, as the web can potentially encompass a far more balanced and expressive set of features, encapsulating our humanity. Fortunately, the modern movement towards a more human browser appears to have gained significant velocity in the right direction (though many may disagree).

Audio Data Use-Cases

Allowing finer control over audio data unlocks previously hidden potential in the browser. The number of great use-cases for an Audio Data API in the browser is simply staggering. I have gathered some ideas into the following use-case list. I am sure there are a few obvious cases I have missed, and many creative cases I would not have dreamed up in a million years. If you have ideas I can add to this list, I would love to hear from you!

Music / Sound Effects

  • Synthesis
  • Composition, Sequencing
  • Analysis
  • Games
  • Signal Processing
  • Audio Effects, Reverb, EQ
  • Surround Sound Data Audition
  • Web-DJing
  • Teaching
  • Web-Instruments
  • Online Music Editing
  • Website Ambience
  • Radio Streaming
  • Wave-tracing, bouncing audio off 3D objects for simulations and games
  • Remote AJAX Musical Performance Groups

Video Processing

  • Speech to text sub-titles
  • Frequency spectrography
  • Vocal frequency enhancement
  • AV Mixing
  • Rear speaker surround signal processing (I will demo this in another post)
  • De-ess, limit, compress poorly mastered content on the fly
  • Adjust levels of cinematic sequences compensating for the listener’s environment
  • Noise Removal

Accessibility

  • Text to Speech Synthesis
  • Vocal help in web applications
  • Exporting Wikipedia pages to MP3 for external device playback
  • Speech recognition from a microphone source
  • Rendering 3D Worlds to audio for the blind
  • Audible cues for rolling over web forms
  • Speech & audio based games
  • Event cues such as DOMContentLoaded, page loading, etc
  • Audible IDEs for visually impaired developers
  • Accessible error handling events

Implementation Time-Window

You have seen in the demos above that JavaScript is fast enough to handle real-time frequency analysis, quick enough to generate music on the fly and can be combined with technologies like WebGL without melting your CPU. Web technology is ready.
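To put “fast enough” in numbers: whatever analysis or synthesis runs on a frame of N samples must finish in less time than that frame takes to play at the sample rate f_s. Assuming, for illustration only, 1024-sample frames at 44.1 kHz and a radix-2 FFT:

```latex
t_{\text{frame}} = \frac{N}{f_s} = \frac{1024}{44\,100\ \text{Hz}} \approx 23.2\ \text{ms},
\qquad
\text{butterflies per FFT} = \frac{N}{2}\log_2 N = 512 \times 10 = 5120
```

A few thousand butterfly operations every 23 ms is a modest amount of arithmetic, which leaves room in each frame for drawing and other work.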

Let that really sink in for a minute.

“Web technology is ready.”

  • Hardware devices are already more than powerful enough to process audio data.
  • Browser software is smart enough to JIT compile code, bringing JavaScript performance much closer to that of C++.
  • Internet users seek web services that leverage the latest web-application APIs in the browser, because they work without any downloads or installs.
  • Commercial enterprises are realizing the financial advantage of developing applications for a web platform, a platform on which the APIs are well documented, view-source-able and quick to implement.
  • Browser vendors are taking standards seriously and paying attention to innovative open source communities.

While discussing the Audio Data API, Doug Schepers pointed out that the business of specification and implementation involves focusing one’s energy at the right time as well as in the right place. At present, the opportunity to innovate, specify and implement is aided, at least in part, by community and financial pressure on browser vendors to implement standards.

The outcome of WWWW2 (the second World Wide Web War, Microsoft vs. Netscape) was a steep decline in standards implementation. Why? Because Microsoft had nobody left to compete with. After all, it makes no business sense to expend resources adhering to an alien standards body when your users are mute and your competition has burnt to ashes.

Fortunately Microsoft are not in that position today. Thank God! They have mechanisms in place to listen to their users, they care about standardizing their browser and they are delivering higher standards of work with each iteration. I mean… come on… they are actually iterating now!!

This shifting force of competition between browser vendors sends technoclysmic combobulations of awesome-source-energy pulsating and colliding through the fibrous nervous system of the common electronic consciousness. What I mean to say is: the competition between browser vendors to ‘make the grade’ fosters innovation across the web.

But as Doug rightly points out, this may not always be the case. What happens if another ‘for-profit’ giant gains almost complete dominance over the browser market? The thought is sobering.

And so…

I believe this is a crucial point in the history of the web. I only hope the developer community will remain focused, work hard and finish what it started. The current web applications specification is powerful. But it is incomplete without the public ability to read and write audio data in the browser.

The technology is ready,

The community is thirsty.

GOALS

As web developers, what are the technical requirements we are looking for from an audio application interface in the browser? The simple API we have drafted is powerful enough to support the full range of audio applications on the web. The API is extremely lightweight, making browser implementation relatively trivial.

So far there are five basic components that make up the Audio Data API.

  1. Reading from an audio source
  2. Writing to an audio source
  3. Reading the buffer state
  4. DOM event model
  5. C++ FFT Spectrum Analysis
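To make the writing and buffer-state components concrete, here is a sketch of a tone generator in the style of the examples on the Mozilla wiki page: it writes samples to an `<audio>` element and uses the current playback position to decide how much to write next. The method names `mozSetup`, `mozWriteAudio` and `mozCurrentSampleOffset` are taken from later revisions of that wiki page and may differ from the draft builds described here. The helper names `samplesToWrite` and `startTone`, the half-second prebuffer and the 100 ms timer are illustrative assumptions. Only `samplesToWrite` is plain JavaScript; `startTone` needs a browser with the experimental API and is defined but never called in this sketch.

```javascript
// How many new samples to write so that about `prebufferSize` samples
// stay queued ahead of the current playback position.
function samplesToWrite(playbackOffset, writePosition, prebufferSize) {
  return Math.max(0, playbackOffset + prebufferSize - writePosition);
}

// Browser-only sketch ASSUMING the experimental Audio Data API names above.
// Generates a mono sine tone and keeps the output buffer topped up.
function startTone(frequency, sampleRate = 44100, prebufferSize = sampleRate / 2) {
  const output = new Audio();
  output.mozSetup(1, sampleRate); // 1 channel at the chosen sample rate
  let writePosition = 0;
  let pending = null; // samples that mozWriteAudio has not accepted yet

  return setInterval(() => {
    // Retry leftovers from the previous tick before generating more.
    if (pending) {
      const written = output.mozWriteAudio(pending);
      writePosition += written;
      if (written < pending.length) {
        pending = pending.subarray(written);
        return;
      }
      pending = null;
    }
    const count = samplesToWrite(
      output.mozCurrentSampleOffset(), writePosition, prebufferSize);
    if (count === 0) return;
    const samples = new Float32Array(count);
    for (let i = 0; i < count; i++) {
      samples[i] = Math.sin((2 * Math.PI * frequency * (writePosition + i)) / sampleRate);
    }
    const written = output.mozWriteAudio(samples);
    writePosition += written;
    if (written < count) pending = samples.subarray(written);
  }, 100);
}
```

Tying the write amount to the playback position keeps latency bounded by the prebuffer size: a larger prebuffer survives timer hiccups better but makes the tone respond more slowly to changes.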

A more technical description of the API, including the methods used to access audio data, can be found here, along with downloadable builds of Firefox+audio (these will not overwrite your existing Firefox installations). A list of tests and demonstrations to get you started can also be found at the bottom of the page.

Holding The Door Open

As a community of open source hackers, we want to keep both the initiative and the API 100% open. This openness will allow the implementers at every browser vendor to be ‘on the same page’. We would do well to learn from the mistakes of the web’s past and work closely together, not as members of warring browser tribes, but as a community of audio enthusiasts. We welcome developers working on the source of Chrome, Safari, Opera, Internet Explorer and Firefox to get involved, experiment, share results and work towards creating a rock-solid API that will enrich the web for many years to come.

If we are going to develop one web-standard together… the right way, let it be the inherently collaborative domain of audio, music, and speech synthesis.

With A Little Help From Our Friends

Audio innovation on the web needs to be about more than just an army of web geeks. We need musicians, audio engineers, game developers and accessibility pioneers to join forces with web developers and drive this effort forward. A strong community possessing a broad range of specializations will help to define existing best practices in audio software and will help keep the momentum of the Audio Data API from getting caught in the red tape of standardization before it is ready to launch.

A CALL TO ACTION

Not only is this space important, it is exciting. The Audio Data API is largely uncharted territory, with hundreds of ‘firsts’ unclaimed. So many APIs, libraries and software applications remain unwritten. If you are interested in bringing audio data to the web, why wait for someone else to get there before you?!

Performers: Come develop the first web-based networked musical instruments. Start remote music groups whose members can tour in different cities.

Audio Engineers: Your expertise is crucial in making the web sound right at the subtle levels that most people cannot even hear.

Software Engineers & Game Designers: Use your existing knowledge of audio programming best practices and let your creativity flow in the exciting, experimental world of online audio synthesis and web-based games.

DJs, VJs: Come and create visualizations and web-based music mixers that run content from your servers at home. Stop rushing to set up: just plug in your laptop, open your web browser and jam!

Web Designers: Create beautiful soundscapes to capture the attention of your users. Add subtle sound effects to give your website an unprecedented level of depth.

The technology is ready,

The community is thirsty.

THE FUTURE OF WEB AUDIO @ THE BOCOUP LOFT

Bocoup is holding an event called The Future of Web Audio at the Bocoup Loft on the 12th of May from 6:30pm to 10pm. We will be serving free pizza and beer. This event is aimed at audio software developers, game developers, audio engineers and anyone else interested in the Audio Data API. We will demonstrate the progress so far with live code examples and then open up the floor for discussion. Come voice your opinions on audio development, make suggestions, heckle, and share the creative ideas and words of wisdom that will be invaluable to the implementation of audio standards on the web for years to come.

We look forward to seeing you!

– F1LT3R (Alistair G. MacDonald)
