Jason Freeman recently created Piano Etudes, an audio-enabled GWT application integrated with Flash. We're happy to say that Jason had a chance to stop by to talk about the development strategy he used to add audio to Piano Etudes. Check it out below.
As a faculty member at the Georgia Tech Center for Music Technology, I recently developed Piano Etudes, a musical application built with GWT, along with a student of mine, Akito Van Troyer. Since Piano Etudes is about music, audio was a top priority for us throughout the development process. In this post, I’d like to share what we’ve learned from our experiences with audio and GWT and explain how we implemented audio functionality.
There is currently no built-in audio support in GWT; nevertheless, it’s fairly simple to add audio to a GWT app. Several audio APIs for GWT exist in various stages of development, including Fred Sauer’s gwt-voices, CodeLathe’s GWT SoundManager, Jeffrey Miller’s gwt-sound, and the sound classes in the GWT Incubator. Most of these libraries in turn rely upon the Flash Sound API instead of the inconsistent audio support found natively in web browsers.
For Piano Etudes, we found it easier to build our own GWT classes to support the audio features we needed, instead of using one of the APIs listed above. We wrote JSNI methods to access Scott Schiller’s SoundManager2 JavaScript Sound API. SoundManager2, which similarly relies upon Flash, is a clean and elegant API that Scott has been fanatically supporting and updating. A couple of the GWT audio libraries also use SoundManager2, though they only support a subset of its API.
Regardless of how you choose to implement audio in your GWT app, we've discovered a few key principles while working on Piano Etudes that you may want to consider:
Because audio support in GWT is rapidly evolving, the API you choose to use today may not be the one you use a year from now. So it’s essential to wrap your audio functionality in your own audio classes. Then if you change implementations later, you’ll only have to tweak a few classes instead of auditing your entire code base.
For Piano Etudes, we created two GWT Java classes to handle all audio functionality. These are the only classes in our code base that assume a particular audio implementation and are also the only classes that contain JSNI methods.
One class, called SoundManager, includes static JSNI methods for initializing SoundManager2 and configuring its global parameters.
For example, SoundManager.setDebugMode() enables or disables debugging:
public static native void setDebugMode(boolean b) /*-{
  // forward the flag to the global SoundManager2 object
  $wnd.soundManager.debugMode = b;
}-*/;
Each instance of our other class, called Sound, represents a single audio file to be played back in GWT. This class wraps methods such as play, pause, stop, getting and setting panning, and getting and setting playback position.
It was important for us to hide SoundManager2’s implementation details from the rest of our GWT code. Both SoundManager2 and parts of our GWT application code do their own bookkeeping, so the two can easily interfere with each other if the code isn't kept separate. For instance, SoundManager2 requires that an ID string be assigned to each sound object, and it uses that ID in many of its methods. Our Sound class handles this ID string internally and privately. The constructor generates a unique ID string for each instance of the Sound object:
public Sound(String fileName) {
  soundID = CustomIDGenerator.getUniqueId();
  // additional initialization code
}
The Sound object then uses this ID in other methods. For example:
public native void stop() /*-{
  $wnd.soundManager.stop(this.@net.jasonfreeman.pianoetudes.client.sound.Sound::soundID);
}-*/;
No other application code needs to be aware of this ID string; it can use instances of Sound in a more object-oriented, implementation-neutral manner.
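To make the isolation idea concrete, here is a minimal plain-Java sketch rather than the actual Piano Etudes code. The names AudioBackend, RecordingBackend, and SoundSketch are hypothetical, and the recording backend merely stands in for a real one that would forward each call to SoundManager2 through JSNI. It shows that callers only ever hold a sound object, while the ID string stays private to the wrapper.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical backend interface: the only layer that deals in ID strings.
interface AudioBackend {
    void create(String id, String url);
    void play(String id);
    void stop(String id);
}

// Stand-in backend that records calls. A real implementation would
// forward each call to the underlying audio library instead.
class RecordingBackend implements AudioBackend {
    final List<String> calls = new ArrayList<>();

    public void create(String id, String url) { calls.add("create " + id + " " + url); }
    public void play(String id) { calls.add("play " + id); }
    public void stop(String id) { calls.add("stop " + id); }
}

// Application-facing wrapper: one instance per audio file.
class SoundSketch {
    private static int nextId = 0;      // simple counter used to build unique IDs

    private final AudioBackend backend;
    private final String soundID;       // never exposed to callers

    SoundSketch(AudioBackend backend, String fileName) {
        this.backend = backend;
        this.soundID = "sound" + (nextId++);
        backend.create(soundID, fileName);
    }

    void play() { backend.play(soundID); }
    void stop() { backend.stop(soundID); }
}
```

With this shape, swapping audio implementations means writing a new AudioBackend; code that calls play() and stop() on a sound object does not change.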
In Piano Etudes, the timing of audio events is important. Typically, each measure of moveable music is stored as a separate audio file, so when audio files do not play back at the correct times, there is an audible gap or jump between measures of music. Audio timing is similarly important in applications where sound effects are triggered by user actions or where multiple audio files must play back together in sync.
Neither JavaScript nor Flash is known for precise timing, so there is an upper bound on timing precision within GWT as well. But there are a few strategies that do improve precision:
Both Flash and GWT have limited capabilities when it comes to audio. You can play sounds, jump around within sounds, combine them together, query waveform data, and perform other basic tasks. Newer versions of Flash do support some sample-level manipulation of audio data, but it would be tough to use these features from JavaScript or, by extension, from GWT. (In fact, it’s tough to use these features at all in their present form.)
So what if your GWT app requires advanced audio features such as sample-accurate timing, sound synthesis, digital signal processing, or mixdowns to audio files? You could implement audio functionality in a Java applet instead of through Flash, taking advantage of a Java-based audio API such as JSyn, JavaSound, or Minim to access additional features unavailable in GWT. But if you’re like me, the headaches associated with Java applets are one of the reasons you’re using GWT in the first place, so that is not an enticing solution.
Fortunately, there is another alternative: render your audio on the server. This successfully avoids the limitations of client-side systems, though it does introduce other issues related to scalability, latency, and bandwidth. While it’s not a universal solution, it is a valuable strategy for your toolbox.
Here’s a simple use case: in Piano Etudes, we wanted users to be able to download the music they created as an MP3 file. In GWT and Flash there is no obvious way to render an audio file to disk or to convert it into an MP3. So instead, we handle this task server side.
When a user clicks a button to download the MP3 of their music, GWT sends a representation of that music to the server; for each audio track, it describes the audio clips in the track and the timings of those clips. We post the data as JSON via HTTP to a PHP script, but this could be done just as easily via any other client-server communication paradigm supported in GWT. Here’s some code for the client side:
// open the server response in a new browser window
FormPanel mp3Form = new FormPanel("_blank");
HorizontalPanel panel = new HorizontalPanel();
// carry the JSON description of the music in a hidden form field
Hidden mp3Data = new Hidden("data", timeline.getAsJSON());
panel.add(mp3Data);
mp3Form.setWidget(panel);
mp3Form.setAction(URL.encode(SERVER_SCRIPT_URL));
mp3Form.setMethod(FormPanel.METHOD_POST);
When this form is submitted, it sends a JSON array to the server representing the audio clips and timings for each track of music. The JSON data is sent as a hidden field in an HTML form. The result obtained from the server (the MP3 file) is returned to a new web browser window, keeping the GWT app open in the current window.
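As a rough illustration of what a method like timeline.getAsJSON() might produce, the plain-Java sketch below serializes tracks into nested arrays. The shape (an array of tracks, each an array of [fileName, duration] pairs) is an assumption inferred from how the PHP excerpt later in this post reads the data. The JsonClip and TimelineJson names are hypothetical, and the escaping covers only quotes and backslashes. In a real GWT app, the classes in com.google.gwt.json.client would likely be a better fit than hand-built strings.

```java
import java.util.List;

// Hypothetical clip record: an audio file name and its duration in seconds.
class JsonClip {
    final String fileName;
    final double duration;

    JsonClip(String fileName, double duration) {
        this.fileName = fileName;
        this.duration = duration;
    }
}

class TimelineJson {
    // Wrap a string in double quotes, escaping backslashes and quotes only.
    // Control characters are not handled in this sketch.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Serialize tracks as [[["file", duration], ...], ...].
    static String toJson(List<List<JsonClip>> tracks) {
        StringBuilder sb = new StringBuilder("[");
        for (int t = 0; t < tracks.size(); t++) {
            if (t > 0) sb.append(",");
            sb.append("[");
            List<JsonClip> track = tracks.get(t);
            for (int c = 0; c < track.size(); c++) {
                if (c > 0) sb.append(",");
                JsonClip clip = track.get(c);
                sb.append("[").append(quote(clip.fileName))
                  .append(",").append(clip.duration).append("]");
            }
            sb.append("]");
        }
        return sb.append("]").toString();
    }
}
```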
When the server-side script receives the data from the client, it processes that data to create and run a simple script in a computer-music language called RTCmix. (Classic scripting languages for computer music, such as CSound and RTCmix, are great for rendering audio server-side.) The audio file output by the RTCmix script is then converted to an MP3 using LAME, and that file is returned to the client browser, which downloads it. Here is a simplified excerpt from our PHP script (our entire script is only 100 lines of code):
// initialize the rtcmix script
$audio = tempnam("/tmp", "PianoEtude"); // output file
$script = "set_option(\"AUDIO_OFF\", \"CLOBBER_ON\")\n"; // non-real-time
$script .= "rtsetparams(44100, 2)\n"; // sampling rate and number of output channels
$script .= "rtoutput(\"$audio\")\n"; // output to temp file
$script .= "load(\"" . $CMIX_LIB_PATH . "\")\n"; // library path for rtcmix

// create an rtcmix command to place each audio clip
$timeline = json_decode($data);
foreach ($timeline as $track) {
  $insertPoint = 0;
  foreach ($track as $soundObject) {
    $soFileName = $soundObject[0];
    $soDur = $soundObject[1];
    $script .= "rtinput(\"$soFileName\")\n";
    // insert the audio clip into the output audio file
    $script .= "STEREO($insertPoint, 0, DUR(), 0.5, 0.5, 0.5)\n";
    $insertPoint += $soDur;
  }
}

// execute the rtcmix script
$scriptFile = tempnam("/tmp", "script");
$handle = fopen($scriptFile, 'w');
fwrite($handle, $script);
fclose($handle);
$cmd = "$rtcmix < $scriptFile";
$result = shell_exec($cmd);

// convert to MP3
$mp3File = tempnam("/tmp", "mp3");
$cmd = "$lame $audio $mp3File";
$result = shell_exec($cmd);

// return result
readfile($mp3File);
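The placement logic in the excerpt above is easy to check in isolation. The sketch below restates just the nested loop in plain Java, under the assumption that each clip carries a file name and a duration in seconds. ClipSpec and PlacementSketch are hypothetical names, and the emitted rtinput and STEREO lines simply mirror the strings from the PHP. Each track restarts at time zero, and each clip begins where the previous clip in the same track ended.

```java
import java.util.List;

// Hypothetical clip description: file name plus duration in seconds.
class ClipSpec {
    final String fileName;
    final double duration;

    ClipSpec(String fileName, double duration) {
        this.fileName = fileName;
        this.duration = duration;
    }
}

class PlacementSketch {
    // Build the per-clip script lines, one rtinput/STEREO pair per clip.
    static String commands(List<List<ClipSpec>> timeline) {
        StringBuilder script = new StringBuilder();
        for (List<ClipSpec> track : timeline) {
            double insertPoint = 0; // every track starts at time zero
            for (ClipSpec clip : track) {
                script.append("rtinput(\"").append(clip.fileName).append("\")\n");
                script.append("STEREO(").append(insertPoint)
                      .append(", 0, DUR(), 0.5, 0.5, 0.5)\n");
                insertPoint += clip.duration; // next clip follows this one
            }
        }
        return script.toString();
    }
}
```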
You can also use client-side and server-side audio together. For example: instead of just having the web browser download this server-generated MP3 file to disk, our GWT app could use a client-side audio API to play it back inside of the app.
By using a variety of GWT or JavaScript APIs to connect with the Flash Sound API, you can add rich audio features to your GWT apps; for more advanced functionality, GWT apps can communicate with a server to render audio remotely. Regardless of the exact implementation(s) you choose, keeping your GWT app optimized and your audio code structurally isolated will ensure that audio performs well in your app and can incorporate future improvements to audio support in the browser.