Andy Lee sent me a bunch of excellent feedback about FlexTime, and let me know about a strange, 100% reproducible crashing bug. If you configure FlexTime such that both the ending cue of one activity and the starting cue of the one that follows are “Speak Text” cues, then the application crashes.
First thought: damn, I’m glad I put a beta out. Second thought: good lord, what have I done!?
Unfortunately, the bug is not in my code. I was able to reproduce the problem quite easily with the simplest of command line tools:
You may not have realized that it was quite so simple to accomplish spoken text on a Mac. Unfortunately, the simplicity is deceptive, since compiling and running the above tool results in a nasty crash:
SpeakString has been around for a long time: long before Mac OS X, and long before CoreAudio, where the crash appears to be happening. I would guess it didn’t use to crash, but when it was ported to Mac OS X, something got overlooked and now it leads to whammy land.
OK, so how do I work around the problem? It is clearly related to attempting to speak text while some text is already speaking. Maybe if I could coddle the Speech Manager a little bit, I could prevent it from crashing.
From the Speech Synthesis Manager Reference documentation for SpeakString, we see that the behavior for overlapping speech is (supposed to be) very well defined:
Translation: what we’re doing is supposed to work. But maybe by overdoing it we can achieve the desired goal. If Mac OS X falls down on the “interrupting immediately” behavior, perhaps we can manually stop any previous sound to help it keep its bearings. According to the documentation, calling “SpeakString(NULL)” should effectively cancel playback. Unfortunately, injecting it into my simple crash case changes nothing. Worse, when I add it to my live application, I observe a new failure path. The text “pure virtual method called” is printed to the console, with the following backtrace:
Well, this can is getting wormier and wormier. It is starting to look like I won’t be able to take advantage of the ease and simplicity of SpeakString. Ten years ago, sure. But in 2006 SpeakString es muy sucky. It’s probably time to start looking at the more advanced speech API, where I’m responsible for managing my own speech channels. With responsibility also comes (we hope) the ability to save ourselves from certain doom.
But let’s say I just need to stick with SpeakString, because I have a demo in 5 minutes, or users are just screaming bloody murder about this bug. There is a crude workaround that takes all the asynchronous fun out of speech, but also prevents the crash. By explicitly waiting for the Speech Manager to be done with any previous speech, I can prevent it from maiming itself:
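Here is a minimal sketch of that wait-then-speak idea, written in portable C rather than against the real framework. The two function-pointer types and the `sim_*` helpers are hypothetical stand-ins I made up for illustration. On Mac OS X, I would expect thin wrappers around `SpeechBusy()` and `SpeakString()` to take their place, but treat that mapping as an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the Speech Manager (assumption: on Mac OS X,
   wrappers around SpeechBusy() and SpeakString() would play these roles). */
typedef int (*BusyFn)(void);              /* > 0 while any speech is active */
typedef int (*SpeakFn)(const char *text); /* starts async speech; 0 on success */

/* Wait for any earlier speech to finish, then start the next utterance.
   This blocks the caller, which is exactly the drawback of the workaround. */
int speak_when_idle(BusyFn busy, SpeakFn speak, const char *text)
{
    assert(busy != NULL && speak != NULL && text != NULL);
    while (busy() > 0) {
        /* Busy-wait. A real app would at least sleep briefly here. */
    }
    return speak(text);
}

/* --- Simulated speech manager, for illustration only --- */
int sim_polls_left = 0; /* polls remaining until the current speech ends */
int sim_spoken = 0;     /* number of utterances started so far */

int sim_busy(void)
{
    if (sim_polls_left > 0) {
        sim_polls_left--;
        return 1;
    }
    return 0;
}

int sim_speak(const char *text)
{
    (void)text;
    sim_spoken++;
    sim_polls_left = 3; /* pretend each utterance takes 3 polls to finish */
    return 0;
}
```

Calling `speak_when_idle(sim_busy, sim_speak, "one")` and then `speak_when_idle(sim_busy, sim_speak, "two")` starts the second utterance only after the simulated first one reports that it is no longer busy. The second call spins in the loop in the meantime, which is where the stalls come from.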
This also “works” in FlexTime, for some definition of “working.” But it can cause hideous stalls in the playback UI, since I’m blocking there for an indeterminate length of time. Passable in a beta release, but not acceptable for a finished product.
Sigh. I’m going to have to do real work. But you don’t have to. RSSafeSpeaker is a simple singleton class designed to make worry-free overlapping speech easy for the Cocoa programmer. Instead of trying to manage a number of open speech channels, this class takes the approach that it’s “good enough” to just allocate and deallocate a channel for every speech made. Obviously for some purposes this will not be suitable, and you’ll want to manage a pool of open channels. For the “everyday, get this done easily” use though, I hope you’ll find this class handy. Rewriting our previous example using RSSafeSpeaker:
No crashes! And I get to use NSString. Everything is better. This is a good example of a situation where the shortcomings of Apple’s API caused me grief and made me go to a lot of extra work. But it’s also a situation where the extra work won’t be for naught. It’s a good idea for me to use the “deeper” speech APIs, because it’s inevitable that I’ll want finer control over the playback effects in my application. It was just a lot easier to choose “SpeakString” as the quickest solution. If anything else persnickety comes up, I’ll be in an excellent position to respond quickly and effectively. All in all, time well spent!
Oh, and in case anybody was worried, I did report the crashing bug to Apple (rdar://problem/4633582).
Update: Oh man, don’t I feel like a dork! I somehow missed the presence of NSSpeechSynthesizer altogether. Thanks to Jim Correia for pointing it out to me via email. It does seem to work, and doesn’t crash. Of course, now that I’ve got my own infrastructure in place, I might as well keep using it, since it will ultimately give me more control over the playback options. But NSSpeechSynthesizer does seem a better choice for most purposes.
It looks like each NSSpeechSynthesizer corresponds to a “speech channel,” so if you actually want to overlap voices (instead of just causing the previous speech to be canceled), you’d need to allocate multiple speech synths (much as my RSSafeSpeaker allocates a speech channel for each request).
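To make the channel-per-request idea concrete, here is a rough sketch of just the bookkeeping, in portable C with simulated channels. This is not RSSafeSpeaker's actual code: `SimChannel`, `speak_on_new_channel`, `speech_done`, and `active_channels` are names invented for this illustration. A real version would presumably create a channel with `NewSpeechChannel`, speak on it with `SpeakText`, and arrange for `DisposeSpeechChannel` once a speech-done callback fires.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CHANNELS 8

/* Simulated speech channel (hypothetical; a real one would be an opaque
   SpeechChannel handle owned by the Speech Manager). */
typedef struct {
    int in_use;
    const char *text;
} SimChannel;

SimChannel sim_channels[MAX_CHANNELS];

/* Allocate a fresh channel for this utterance and start "speaking" on it.
   Returns the channel index, or -1 if every slot is busy. */
int speak_on_new_channel(const char *text)
{
    assert(text != NULL);
    for (int i = 0; i < MAX_CHANNELS; i++) {
        if (!sim_channels[i].in_use) {
            sim_channels[i].in_use = 1;
            sim_channels[i].text = text;
            return i;
        }
    }
    return -1;
}

/* Called when an utterance finishes: release its channel. In a real
   implementation, a speech-done callback would trigger the disposal. */
void speech_done(int channel)
{
    assert(channel >= 0 && channel < MAX_CHANNELS);
    sim_channels[channel].in_use = 0;
    sim_channels[channel].text = NULL;
}

/* Number of utterances currently in flight. */
int active_channels(void)
{
    int n = 0;
    for (int i = 0; i < MAX_CHANNELS; i++) {
        n += sim_channels[i].in_use;
    }
    return n;
}
```

Because each request gets its own channel, a second utterance never has to interrupt or wait for the first. The cost is an allocation and a disposal per utterance, which is the “good enough” trade-off described earlier.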
Thanks again to Jim for sharing this! I am embarrassed to have overlooked it…