Pure Data, Android audio, and random stuff

I am happy to announce that my new OpenSL library, opensl_stream, is now one of the officially recommended resources on the Android High Performance Audio page. The library is a thin wrapper on top of OpenSL that presents a simple, callback-driven API for audio development, as well as a credible illusion of synchronized input and output. (Android treats input and output as separate streams, and so a credible illusion of synchronized I/O is the best we can hope for right now.)

This new library was derived from the original OpenSL glue of Pd for Android, which in turn was adapted from Victor Lazzarini’s opensl_io library. The latest version is a significant improvement on the original OpenSL glue: it is more robust as well as more efficient, and it uses a near-optimal heuristic for synchronizing input and output streams (more on that in an upcoming post).

The API is in fact identical to that of the original OpenSL glue. I actually wrote about it before, but I’ll do it again here for completeness. The API consists of just a handful of functions, plus a function prototype for the audio processing callback of your app. In order to use it, include opensl_stream.h and define an audio processing callback that looks like this:

void process(void *context, int sample_rate, int buffer_frames,
     int input_channels, const short *input_buffer,
     int output_channels, short *output_buffer) {
  // Read input samples from input_buffer; perform signal processing magic;
  // write output samples to output_buffer.
}
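
For example, a minimal passthrough callback might look like the following. This is just my own illustration (the function name is arbitrary), and it assumes that the sample buffers are interleaved by channel, which is the usual layout for short-based audio buffers on Android:

static void passthrough(void *context, int sample_rate, int buffer_frames,
    int input_channels, const short *input_buffer,
    int output_channels, short *output_buffer) {
  // Copy channel 0 of each input frame to every output channel;
  // write silence if there is no input.
  int i, j;
  for (i = 0; i < buffer_frames; ++i) {
    short s = (input_channels > 0) ? input_buffer[i * input_channels] : 0;
    for (j = 0; j < output_channels; ++j) {
      output_buffer[i * output_channels + j] = s;
    }
  }
}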

The context pointer can point to any data structure that your processing callback needs to do its job. You supply it to the opensl_open function when you create your OpenSL stream, along with the desired number of input and output channels as well as the processing function you just defined:

OPENSL_STREAM *opensl_open(
    int sample_rate, int input_channels, int output_channels,
    int callback_buffer_size, opensl_process_t process, void *context);

Ownership of the context remains with the caller, i.e., you are responsible for releasing any resources associated with the context pointer when you are done with this stream. The sample rate and callback_buffer_size are supplied by the AudioManager class as of Android 4.2; if you are targeting earlier versions, read the documentation in opensl_stream.h for hints on how to choose a good configuration.

The return value of opensl_open is an opaque pointer to the data structure that opensl_stream uses internally. You simply store it in a variable and pass it to the other functions in this library. opensl_start and opensl_stop are the usual transport functions. When you are done with your stream, make sure to call opensl_close on it in order to release its resources (except for the context pointer, which remains your responsibility). That’s pretty much it.
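
To make the lifecycle concrete, here is a minimal sketch that puts the pieces together, using the passthrough callback above. The context struct, the error handling, and the parameter values are my own; in a real app, the sample rate and callback buffer size should come from AudioManager on Android 4.2 or later, as discussed above.

#include <stdlib.h>
#include "opensl_stream.h"

typedef struct {
  float volume;  // whatever state your callback needs; owned by the caller
} audio_state;

void audio_demo(void) {
  audio_state *state = malloc(sizeof(audio_state));
  state->volume = 1.0f;
  // Placeholder parameters: 44100Hz, mono in, stereo out, 64 frames per callback.
  OPENSL_STREAM *stream =
      opensl_open(44100, 1, 2, 64, passthrough, state);
  if (!stream) {         // assuming a NULL return signals failure
    free(state);
    return;
  }
  opensl_start(stream);  // the callback now fires on the audio thread
  // ... keep the stream running while the app is active ...
  opensl_stop(stream);
  opensl_close(stream);  // releases the stream's resources...
  free(state);           // ...but the context remains our responsibility
}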

The header file opensl_stream.h contains further documentation. A sample project is available at github.com/nettoyeurny/opensl_stream_sample. Have fun!

Thanks to Jan Berkel’s patch, building and packaging externals for Pd for Android has become much simpler. Instead of building externals, renaming them according to Pd’s naming scheme, packaging them for deployment, and writing code for unpacking and installing them, it is now enough to just build them with the Android build tools.

The latest version of libpd (as of 12/28/2012) includes Jan’s patch, which causes Pd to look for the file names that ndk-build produces. That means that the Android tools can now automatically build and package binaries for multiple architectures, and the Android installer will automatically choose the right one for a given device. The only additional step that’s needed is to add the location of the binaries to Pd’s search path, but if your project uses PdService, you already get that for free.

Consequently, I have removed a few hacks that only served to support the old, awkward way of deploying externals. This may break existing projects that use their own externals, but they’ll be straightforward to fix. I believe the disruption will be minor and worth the payoff. The result is a much cleaner design that everybody will benefit from.

For an example of how to use externals in an Android project, check out the JNI folder of the ScenePlayer sample project, especially Android.mk. That’s all it takes to include externals; there’s no need to do anything on the Java side.

The latest version of Pd for Android (as of 12/28/2012) supports low-latency audio for compliant Android devices. When updating your copy, make sure to pull the latest version of both pd-for-android and the libpd submodule from GitHub.

At the moment, only Galaxy Nexus, Nexus 4, and Nexus 10 provide a low-latency track for audio output. In order to hit the low-latency track, an app must use OpenSL, and it must operate at the correct sample rate and buffer size. Those parameters are device dependent (Galaxy Nexus and Nexus 10 operate at 44100Hz, while Nexus 4 operates at 48000Hz; the buffer size is different for each device).

As is its wont, Pd for Android papers over all those complexities as much as possible, providing access to the new low-latency features when available while remaining backward compatible with earlier versions of Android. Under the hood, the audio components of Pd for Android will use OpenSL on Android 2.3 and later, while falling back on the old AudioTrack/AudioRecord API in Java on Android 2.2 and earlier.

Configuring audio for low latency

The class org.puredata.android.io.AudioParameters recommends audio parameters. From the point of view of most app developers, the most important methods are init and suggestSampleRate. The AudioParameters class must be explicitly initialized. The PdService class does this on creation, so you won’t have to worry about this if you only use AudioParameters after binding to PdService. If you aren’t using PdService, or if you want to use AudioParameters before binding to PdService, you need to call AudioParameters.init(this) early in the life of your activity. The init method is safe to call more than once, and so it won’t hurt to initialize AudioParameters just in case.

If you access AudioParameters before it has been initialized, then it will initialize itself with default parameters. Those parameters are perfectly safe to use, but they won’t enable the low-latency track.

So, in order to target the low-latency track, just initialize the AudioParameters class with AudioParameters.init(this) early on (if necessary), then get the required sample rate with AudioParameters.suggestSampleRate() and use it when configuring the audio components. Of course, this means that your Pd patch must work at the recommended sample rate (or you can prepare one patch for each possible sample rate and load the appropriate one).

You won’t have to worry about the buffer size. When using OpenSL, the audio components of Pd for Android will automatically choose the appropriate buffer size, i.e., the initAudio methods of PdAudio and PdService will ignore the buffer size parameter.

Sample code

The CircleOfFifths app that comes with Pd for Android illustrates how to configure libpd for low latency.

I am happy to announce that the OpenSL components of libpd are now official; I recently merged them into the master branches of libpd and Pd for Android.

When I wrote up my reasoning behind the OpenSL support of libpd in June, I thought I was done, but then, due to the number of devices and Android versions out there, it took me a while to test the new code to my satisfaction. That turned out to be most fortunate indeed, because the appearance of Android 4.2 in November introduced two major changes. First, it changed the timing requirements of the buffer queue callbacks, requiring a complete revision of my take on threading in the native OpenSL library. Second, it introduced a new Java API for querying audio capabilities of the device.

The first change is entirely under the hood and does not affect the public API of libpd or Pd for Android. The second change, however, is both good news and bad news. It’s good news because it takes the guesswork out of the configuration of OpenSL. Readers who looked at the original version of my OpenSL library will be relieved to see that the awkward heuristics for choosing buffer sizes are gone. The bad news is that the new capabilities are only available in Java, so that it is no longer possible to have a neat, self-contained native library for streaming audio with OpenSL. Instead, any streaming OpenSL library will have to consist of a native component and an auxiliary Java component that configures the native component.

Moreover, the new Java API calls require an application context. As far as I know, there is no good way to statically obtain the current context, and so it is impossible to implicitly query audio capabilities in static initializers (please correct me if I’m wrong). Rather, the AudioParameters class needs to be explicitly initialized once an application context is available.

In other words, this change does affect the public API of Pd for Android, albeit in a fairly minor way. Here are the most important points:

  • The AudioParameters class takes advantage of new features of Android 4.2 when available, but it remains backward compatible with older Android versions as well, all the way back to Android 1.5.
  • Existing code will continue to work as before; if AudioParameters isn’t explicitly initialized, it will initialize itself with safe default parameters.
  • The default parameters are perfectly fine, unless you are targeting the new low-latency features of recent Nexus devices.
  • PdService now initializes AudioParameters when it is created, so there’s no need to change anything if you use PdService and only access AudioParameters after your activity has bound to PdService.
  • If you want to use low-latency features and you don’t want to use PdService, then you need to explicitly call AudioParameters.init(this); early in the life of your activity, e.g., in the onCreate method. It is perfectly safe to call init more than once.

Proper documentation of this and other changes will follow soon. Stay tuned!

This is the second part of a rant on “Inventing on Principle” by Bret Victor. I find it necessary to speak up because misguided ideas from this talk seem to be getting traction, especially at Khan Academy.

One of the key points of “Inventing on Principle” is that software development tools ought to visualize the effects of code in real time, so that developers will have instant visual feedback on any change they make. This is an interesting twist on the old idea of visual programming, visualizing data instead of visualizing code. Bret then shows off some nifty tools for tweaking parameters in graphics and games.

If he had left it at that, it would have been an entirely respectable presentation. But no, instant visualization of data has to be elevated to a general principle that will usher in world peace and cure the common cold. Or something. Anyway, the wheels come off around minute 17 or so, when he explains how to implement binary search with his techniques.

He begins with a brief look at an awkward implementation of binary search, claiming that current development tools require programmers to “play computer”, i.e., to imagine the effect of every line of code on the data.

Let’s first address the elephant in the room: Unless Bret’s development tools come with a magical do-what-I-want button, they can only visualize the effect of a line of code after I enter it. How would I write a line of code without first imagining what it’ll do? Indeed, how would I come up with an algorithm like binary search unless I first imagine it in its entirety? We can dismiss this entire section of the talk on grounds of causality alone, but let’s suspend our disbelief because there’s plenty more to object to.

The next step is a demonstration of how one would implement binary search with Bret’s visual tools. Essentially, the idea seems to be that you stumble from one misbegotten line to the next and hope that the tools will alert you to the myriad mistakes you make along the way. The worst part is that he writes the body of the search loop first and tries to tweak the effect of a single iteration. This is not even playing computer, it’s playing debugger. If that’s how you code, then your methodology is broken and no development tool will save you.

So, let’s do this right. As is often the case in software engineering, the correct approach is completely counterintuitive. The crucial point is that when writing a loop, you need to focus on what the loop won’t change, i.e., the loop invariant. Let’s see how this works in practice, in Python.

def binary_search(key, array):
  # Returns the index of the first occurrence of key in array, or -1 if key
  # is not in array. array must be sorted in ascending order.

  low = 0
  high = len(array) - 1

  # Loop invariant: If key is in array, then the index i of the first
  # occurrence of key satisfies low <= i <= high.
  # The loop invariant is trivially satisfied at the beginning.

  while low < high:
    mid = (low + high) // 2  # floor division
    # Note that low <= mid < high.

    if array[mid] < key:
      low = mid + 1
      # Since array is sorted and array[mid] < key, the first index of key
      # must be at least mid + 1. Moreover, low is now strictly larger than
      # before.
    else:
      high = mid
      # Since the array is sorted and array[mid] >= key, the first index of
      # key must be less than or equal to mid. Moreover, high is now strictly
      # smaller than before.

    # The loop invariant still holds. Moreover, since the difference between
    # high and low has decreased, this loop will terminate eventually.

  # At this point, the loop invariant still holds, and we have low >= high.
  # Now we only have to determine whether we've actually found key. Note that
  # this behaves correctly if array is empty.
  if low == high and array[low] == key:
    return low
  else:
    return -1

The comments show the thought process that I go through when implementing something like binary search. At no point do I try to imagine what’s happening to all the data. Instead, I track a small number of logical statements and make sure that they remain satisfied. Moreover, I make sure that the algorithm behaves correctly on empty input, an important special case that Bret doesn’t even mention.

I’m not playing computer. Rather, I’m proving a little theorem that says that the code does what it’s supposed to do. None of this is original; all of it should be part of the basic toolbox of any software engineer. If it isn’t, then we need better education, not better tools.

Note: Usually I suppress the urge to rant in this venue, but this has been keeping me up at night, so I’ll make an exception. A multi-part exception, in fact.

Every once in a while I come across a link to the talk “Inventing on Principle” by Bret Victor, usually accompanied by words like “brilliant” or “inspiring”. I have a few words as well: infuriating, misguided. That’s too bad because Bret has an impressive resume and lots of interesting ideas and I entirely approve of the gist of the talk. The rhetoric, however, is galling, and many of the examples are misleading at best.

Now, I’m expecting a certain amount of annoyance whenever the topic turns to user interfaces. For some reason, a moralistic tone seems to be de rigueur in UI circles (of all the books on interface design that I’ve read, only “Don’t Make Me Think” by Steve Krug strikes a tone that I can relate to). I have no idea why. If you have new insights into human-computer interaction, good for you. Can’t you present your ideas without first denigrating existing solutions?

Still, if I were to write a rant every time somebody dumps on venerable Unix tools, I’d never have time for anything else. What pushed me over the edge in this case was the choice of words (“barbaric” — really?) and the number and nature of misrepresentations.

For example, around minute 28 or so, he shows this famous picture of Dennis Ritchie and Ken Thompson and their PDP-11.

This picture fills me with awe. These are the people whose ideas defined technology as we know it, and much of what’s wrong with technology today can be traced to willful ignorance of their work. Bret, however, presents this picture as evidence of a dark past, making their humble teletype seem like some medieval instrument of torture. (Alright, I’m exaggerating. This is a rant, okay?)

Next, he claims that C was designed for teletypes. Now that’s just wrong. C and Unix go together, and one of the core Unix principles is that everything is a file (Unix isn’t entirely consistent that way, but that’s another topic). From the point of view of C, the keyboard of the teletype is just another input file; the printer is just another output file. There’s absolutely nothing in C that’s specific to teletypes.

Okay, that wasn’t so bad. What really upset me, however, was the next claim: “Every time you’re using a console or terminal window, you’re emulating a teletype.” Bzzzt, wrong again. This one is actually bad because it is a complete misrepresentation of the nature of computers.

Fundamentally, computers are symbol manipulation machines. Symbols go in, an algorithm runs, symbols come out. That’s our basic model of computation, going back to Turing and Babbage. You can obfuscate this simple core by generating input symbols with a mouse and displaying output symbols graphically, but deep down it’s still symbols in, algorithm, symbols out.

Punch cards, teletypes, and terminal windows are all expressions of this basic pattern. It’s not that command lines are a holdover from the days of teletypes. The truth is that there’s a platonic ideal of computation, and teletypes and command lines are just different reifications of it. That’s why command lines are here to stay. (This doesn’t mean that they’re the right tool for all purposes, of course, and some of Bret’s tools look very nice indeed; all I’m saying is that it’s profoundly misguided to reject command lines as obsolete.)

Moreover, I have yet to see a user interface that matches the elegant simplicity of Unix pipes. I remember learning about pipes as one of the defining moments in my education. It felt like a revelation. Show me an interface that can compete with pipes in terms of power, concision, and flexibility. Until then, I’ll be emulating a teletype, thank you very much.

With all the prerequisites for OpenSL support in place, the last step was to make the new functionality accessible in a way that’s transparent to existing Android projects. After giving the matter much thought, I settled on the following solution: Extend the existing Java API with native methods for controlling OpenSL; provide two different native binaries, one implementing the new methods for Android 2.3 or later, and one leaving them as no-ops for Android 2.2 or earlier.

The new methods are quite simple. First, there is a method that indicates whether the new methods provide meaningful functionality: boolean implementsAudio(). If this method returns false, Pd for Android falls back on the old Java components for audio. Second, there is a pair of methods for creating and destroying the audio components: int openAudio(int inputChannels, int outputChannels, int sampleRate) and void closeAudio(). The openAudio method was actually there before, but the original version only set the audio parameters of Pd; the new version creates the corresponding OpenSL objects as well. Third, there are the usual transport methods: int startAudio(), int pauseAudio(), and boolean isRunning().

None of these new methods will actually concern developers who build apps on top of Pd for Android; just like the audio processing methods that existed before, they will only be called by the PdAudio class that hides all technicalities of the audio setup. PdAudio now supports both the old and the new audio components, but its public API remains unchanged. It also automatically polls the message queue, so that the new take on receiving messages from Pd will also be transparent to developers.

Coda

When I started contemplating OpenSL support for libpd, I didn’t even see how this could possibly work without breaking existing projects. With the joy of hindsight, however, all the changes seem perfectly straightforward, maybe even obvious. This gives me a good feeling about the new additions as well as the original design.

The one (very minor) drawback is that non-Android applications using the Java bindings now work slightly differently: developers have to remember to poll the message queue if they want to receive messages from Pd. I believe, however, that the improved performance of the audio processing methods will more than make up for this inconvenience.

Moreover, although it wasn’t the goal, it turns out that this revision opens up a number of exciting new possibilities for non-Android uses of the Java components. To wit, there’s no reason why this technique, swapping out audio binaries underneath the same Java API, should be limited to Android. There’s no reason why we can’t have binaries that implement audio with JACK or PulseAudio for Linux, CoreAudio for Macs, or ASIO for Windows. Alternatively, one might just implement it once, using PortAudio, and compile it for all major platforms. (I already have a PortAudio prototype that seems to work nicely.) The Processing branch might also benefit from this.

The most important aspect of OpenSL support for libpd is that the audio thread is not attached to the Java virtual machine. This has two consequences that affect the design of the Java bindings. First, thread synchronization can no longer be done in Java. That’s easy to fix; just get rid of locks on the Java side and use pthread mutexes in the JNI layer instead. Second, the audio processing methods of libpd can no longer invoke message callbacks in Java when Pd sends messages to libpd’s receiver objects. Instead, it now writes a binary representation of Pd messages to a lock-free ring buffer, and the Java API has a new method, pollMessageQueue, that reads messages from the ring buffer and invokes the corresponding callbacks in Java.

From the point of view of Android developers, this change makes no difference because the new version of the audio components of Pd for Android will automatically poll the message queue every 20ms as long as the audio is playing; all messages will be delivered almost instantaneously, and existing projects will work unchanged. In particular, my book remains up to date, except for one minor subsection that provides some background on Pd for Android. Most importantly, all sample code in the book is still current.

The new message queuing mechanism is factored in such a way that it’s easy to reuse. There’s a new pair of source files, z_queued.[hc], that installs the queuing mechanism and provides a new set of message hooks:
EXTERN t_libpd_printhook libpd_queued_printhook;
EXTERN t_libpd_banghook libpd_queued_banghook;
EXTERN t_libpd_floathook libpd_queued_floathook;
EXTERN t_libpd_symbolhook libpd_queued_symbolhook;
EXTERN t_libpd_listhook libpd_queued_listhook;
EXTERN t_libpd_messagehook libpd_queued_messagehook;

EXTERN t_libpd_noteonhook libpd_queued_noteonhook;
EXTERN t_libpd_controlchangehook libpd_queued_controlchangehook;
EXTERN t_libpd_programchangehook libpd_queued_programchangehook;
EXTERN t_libpd_pitchbendhook libpd_queued_pitchbendhook;
EXTERN t_libpd_aftertouchhook libpd_queued_aftertouchhook;
EXTERN t_libpd_polyaftertouchhook libpd_queued_polyaftertouchhook;
EXTERN t_libpd_midibytehook libpd_queued_midibytehook;

int libpd_queued_init();
void libpd_queued_release();
void libpd_queued_receive();

If you use libpd in C or C++ and would like to add the new queuing mechanism, the transition is straightforward: Use libpd_queued_init instead of libpd_init, and assign your message callbacks to the queued hooks instead of the original hooks. Call libpd_queued_receive in order to poll the queue and invoke callbacks, and call libpd_queued_release when you’re done with libpd. That’s all.
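
For illustration, here is a rough sketch of that transition in C. The print hook signature is taken from z_libpd.h, and everything else follows the declarations above; the function names setup, poll, and teardown are just placeholders for wherever these calls live in your application:

#include <stdio.h>
#include "z_libpd.h"
#include "z_queued.h"

// Invoked for everything Pd prints, from whatever thread calls
// libpd_queued_receive.
static void my_print(const char *s) {
  fputs(s, stdout);
}

void setup(void) {
  libpd_queued_printhook = my_print;  // queued hook instead of libpd_printhook
  libpd_queued_init();                // instead of libpd_init
  // ... open patches, start audio, etc. ...
}

void poll(void) {
  // Call this periodically (e.g., from a GUI timer) to drain the ring buffer
  // and invoke the callbacks assigned above.
  libpd_queued_receive();
}

void teardown(void) {
  libpd_queued_release();  // when you're done with libpd
}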

Note: This post has been superseded by a new post on a revised version of this API: New opensl_stream library, Part I: The API

One of the reasons it took me so long to implement OpenSL support for libpd is that I was dreading the complexity of OpenSL. I tried reading the documentation more than once, but my eyes glazed over every time. Fortunately, Victor Lazzarini came along and posted a great tutorial on streaming audio with OpenSL.

Now that I had some sample code that took care of the setup, configuration, and cleanup of OpenSL objects, it was straightforward to revise it for my purposes. The main change I made was to equip OpenSL with a simple, callback-driven API inspired by the JACK audio connection kit, much like the Java-based API that I put together when I first started working on Pd for Android.

In order to use this library, you need to implement an audio processing callback that reads a buffer of input samples and writes a buffer of output samples. The exact signature looks like this:
typedef void (*opensl_process_t)(void *context, int sRate, int bufFrames, int inChans, const short *inBuf, int outChans, short *outBuf);
In addition to the input and output buffers, the parameters include the sample rate, buffer size in frames, and the number of input and output channels, as well as a pointer that may be used to store additional context information between callbacks.

Once the processing callback is in place, you can open an OpenSL I/O stream by specifying the desired sample rate, channel configuration, buffer size, and processing callback:
OPENSL_STREAM *opensl_open(int sRate, int inChans, int outChans, int bufFrames, opensl_process_t proc, void *context);
The context pointer can be NULL if the callback needs no additional information. The return value will be an opaque pointer representing the OpenSL I/O stream on success, or NULL on failure.

(By the way, I should point out that there is no such thing as an OpenSL I/O stream. Much like Android’s Java API, OpenSL has objects for either input or output, but not both. One of the purposes of this API is to tie them together to create a reasonable illusion of duplex audio.)

Now that you have an OpenSL stream, you can start and pause it with the following pair of functions, whose return values are zero on success:
int opensl_start(OPENSL_STREAM *p);
int opensl_pause(OPENSL_STREAM *p);

When you’re done with the stream, you should close it in order to release the resources that it takes up:
void opensl_close(OPENSL_STREAM *p);
After you’ve closed the stream, the pointer is no longer valid.
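
For completeness, here is a compact usage sketch of this API; the callback name and parameter values are mine, and the flow simply mirrors the declarations above:

// Assumes the header from the opensl branch that declares OPENSL_STREAM,
// opensl_process_t, and the functions below has been included.

static void render(void *context, int sRate, int bufFrames,
    int inChans, const short *inBuf, int outChans, short *outBuf) {
  int i;
  for (i = 0; i < bufFrames * outChans; ++i) {
    outBuf[i] = 0;  // write silence; a real callback would synthesize audio here
  }
}

void run_audio(void) {
  OPENSL_STREAM *p = opensl_open(44100, 1, 2, 512, render, NULL);
  if (!p) return;   // NULL indicates failure
  opensl_start(p);  // returns zero on success
  // ... audio runs ...
  opensl_pause(p);
  opensl_close(p);  // after this, p is no longer valid
}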

That’s all there is to it. I hope that this API will be useful beyond its original purpose, and that it will lower the barrier to entry for OpenSL development. The code is available from the opensl branch of Pd for Android: https://github.com/libpd/libpd/tree/opensl/jni

A few weeks ago, I decided to add support for OpenSL ES to Pd for Android. OpenSL ES is the native audio API of Android as of Android 2.3 (Gingerbread), and supporting it makes sense since it promises to reduce the overhead that the audio thread has to deal with.

The entire project felt a bit like squaring the circle because it required changing pretty much the entire JNI layer while leaving the public Java API untouched. After all, there are a fair number of existing projects that I didn’t want to break, and I also wanted to make sure that my recent book would remain relevant.

The good news is that I managed to meet those goals, while simultaneously opening up exciting new use cases for the Java bindings of libpd. In particular, the new version of libpd transparently supports all versions of Android. It uses OpenSL when available and defaults to the previous solution otherwise.

I’ll elaborate on the various components in a series of upcoming posts. I believe that some of them may be interesting in their own right. In the meantime, you can check out the upcoming revision of Pd for Android at GitHub. (Make sure to say git checkout opensl as well as git submodule update after cloning pd-for-android.) Any feedback will be appreciated!
