String Theory

February 26th, 2006

Every good API (or language) should have a fundamental, powerful, impossible-to-ignore encapsulation of a primary building block in computer programming: the string. Wikipedia defines a string as a “sequence of various simple objects.” Wow! That’s an indulgent definition, but kind of nice in its flexibility. When such robustness is properly implemented in a string object or type, it facilitates effortless and worry-free manipulation by developers.

Even back when it was generally presumed that only white, English-speaking dudes with lab coats would ever need to make sense of the bits of text that went into and came out of a computer program, there was an alarming lack of consensus on how that data should be represented. I’m sure some of you are old enough (or just unlucky enough) that you’ve had to stare down a chunk of data, scratch your head and ask, “Is this an ASCII or EBCDIC string?” Mac programmers with a history of Carbon programming are no doubt familiar with the ever-tedious task of converting strings between Pascal and C-style formats. As the de facto standard programming language of the Mac slowly shifted, the format expected by Apple APIs followed suit at varying paces. For those of you lucky enough to have been born into a world where strings are magical objects that handle Chinese characters as easily as Roman type: you should count your good 时运!

Cocoa programmers are among the luckiest programmers in the world. NSString is a comprehensive, powerful encapsulation of Unicode strings. It’s because of this encapsulation that I was able to copy the Chinese characters in the above paragraph out of Safari and paste them into MarsEdit without thinking twice about whether it would work. OK, that was a lie. I was a little bit concerned, but I didn’t need to be! It was an old-man paranoia, based on years of living under the repressive restrictions of a world without Unicode. Now, it just works! And the developers of Safari and MarsEdit probably didn’t think twice about it, either.

But the biggest winners here are the users. José and Aslög, who have spent their lives tweaking the spelling of their names to suit fussy computer programs and DMV clerks, are more and more likely to see their names pleasantly represented in full by our electronic friends. And their sacrifice has been slight compared to those whose names require a complete reworking to satisfy the technical ineptitude of Western technology. While the Windows and Linux platforms have also made great strides in recent years, it continues to be my experience that the Mac “just works” in far more circumstances than other platforms do.

It was enough of a gift for Apple to take care of all the encoding, multi-byte issues, allocation, etc. But they also snuck NSString into nooks and crannies of the system where only a developer could appreciate them. The flexibility of NSString is shown off in a few unexpected ways that not all developers are familiar with.

I Can See You

Most Cocoa developers are very familiar with the handy NSLog function. This little beauty is the modern replacement for traditional “printf debugging.” Some circumstances call for “step by step” debugging, while at other times it’s faster to spew information out to the console, hoping to get a clue as to why your program is misbehaving. Like printf, NSLog takes a template string and a variable number of value parameters which are then formatted and inserted into the template as requested by the developer. In addition to the standard printf codes for converting things like long integers to text, Foundation gives us a super-powerful code, “%@”, for converting an arbitrary object to text. So if you’re confused as to why your array of fruits is not showing up as expected, you can simply add a line to your program at an opportune point:

NSLog(@"My fruits: %@", fruitArray);

When you run your program, you get this output conveniently displayed in the console (or Xcode run log):

2006-02-26 11:56:35.939 MyApplication[25666] My fruits: (Apple, Banana, Pear)

Aha! The strawberries are missing! This powerful “object inspection” facility is enabled by a simple message, “description,” which all Cocoa objects respond to by returning a human-readable representation of the object in NSString format. The quality of the description ranges from “not very useful” to “exceedingly informative,” depending on whether the particular object’s class has overridden the default implementation, which simply spits out the class and a hex-formatted representation of the object’s address in memory. The promise that something will be returned is powerful enough to make “printf debugging” of objects a lot easier than it otherwise would be.
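To make the “description” hook concrete, here is a minimal sketch of a class that overrides it. The Fruit class, its instance variables, and the output format are all invented for illustration; they are assumptions, not anything that ships with Cocoa.

```objc
#import <Foundation/Foundation.h>

// Hypothetical model class, used only for illustration.
@interface Fruit : NSObject
{
    NSString *name;
    int ripeness;   // arbitrary 0-10 scale, also an assumption
}
- (id)initWithName:(NSString *)aName ripeness:(int)aRipeness;
@end

@implementation Fruit

- (id)initWithName:(NSString *)aName ripeness:(int)aRipeness
{
    self = [super init];
    if (self != nil) {
        name = [aName copy];
        ripeness = aRipeness;
    }
    return self;
}

// %@ sends -description to the object and inserts the returned string.
- (NSString *)description
{
    return [NSString stringWithFormat:@"<Fruit %@, ripeness %d>", name, ripeness];
}

#if !__has_feature(objc_arc)
// Only needed under manual retain/release.
- (void)dealloc
{
    [name release];
    [super dealloc];
}
#endif

@end
```

With this in place, logging an instance created with the name Strawberry and a ripeness of 7 via NSLog(@"My fruit: %@", fruit) should print “My fruit: <Fruit Strawberry, ripeness 7>” after the usual timestamp prefix, instead of the default class name and address.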

String from What?

Once you become accustomed to NSLogging everything under the sun, it can come as quite a surprise that a few common Cocoa types don’t bend so easily as the rest. Examples of such types are NSRect, NSSize, and NSRange. These compromised “objects” are implemented as plain C structs rather than as proper objects, presumably as a performance consideration. Since they’re not objects, there is no freebie “description” method enabling them to be effortlessly passed as arguments to NSLog. But these values are just as likely to run amok and cause bugs in your program, so what do you do when it’s time to spray their contents out to the log? Something I’ve witnessed developers doing on more than one occasion is to simply revert to printf-style inspection:

NSLog(@"My rectangle: origin = (%f, %f), size = (%f, %f)", 
	myRect.origin.x, myRect.origin.y, myRect.size.width, myRect.size.height);

Which yields:

My rectangle: origin = (0.000000, 0.000000), size = (300.000000, 500.000000)

Clearly, this works. And it’s a fine habit to get into if you’re training for the typing Olympics, or you’re masochistic, or both. You might ask yourself why a group of engineers who designed such an elegant, introspective runtime environment would allow these essential building blocks to slip through the cracks. The answer? They didn’t. A number of handy utilities exist not only to convert these basic types to string format, but also to convert them back into their native struct format as you see fit. Using NSStringFromRect, we can replace the lengthy log statement above with this:

NSLog(@"My rectangle: %@", NSStringFromRect(myRect));

Which yields:

2006-02-26 12:19:45.578 MyApplication[29788] My rectangle: {{0, 0}, {300, 500}}

The developers at Apple acknowledge the universal usefulness of NSString, and therefore provide a number of these utilities, allowing you to “objectify” many fundamental types: NSStringFromRange, NSStringFromSize, NSStringFromClass, NSStringFromSelector, etc.
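As a rough sketch of how these functions fit together, the following shows a few of them side by side, plus a round trip through NSRangeFromString. The helper names RangeSurvivesRoundTrip and DescribeSomeValues are made up for this example, and NSStringFromSize assumes the Mac version of Foundation.

```objc
#import <Foundation/Foundation.h>

// Returns YES when a range survives the trip to NSString and back.
static BOOL RangeSurvivesRoundTrip(NSRange range)
{
    NSString *text = NSStringFromRange(range);      // e.g. @"{2, 5}"
    NSRange restored = NSRangeFromString(text);
    return NSEqualRanges(range, restored);
}

// Collects the string forms of a few non-object values, ready for logging.
static NSArray *DescribeSomeValues(void)
{
    return [NSArray arrayWithObjects:
        NSStringFromRange(NSMakeRange(2, 5)),
        NSStringFromSize(NSMakeSize(300, 500)),
        NSStringFromClass([NSString class]),
        NSStringFromSelector(@selector(description)),
        nil];
}
```

Passing the result of DescribeSomeValues() to NSLog with %@ prints all four strings at once, which is handy when several structs need inspecting in a single log line.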

Not Just for Show

This objectification of basic types is very handy for logging purposes, and a great tool to keep in your belt for tackling everyday programming problems. Types like NSRect and NSSize can’t be tossed around willy-nilly in Cocoa’s collection classes, but NSString can! So when you find yourself needing to store a list of NSRanges in your Cocoa object, don’t revert to your old C ways of dynamically allocating an array of structs. Simply objectify the ranges and stick them in a mutable array:

[mRangesToWatch addObject:NSStringFromRange(newRange)];

When it comes time to examine a range, just convert it back to its native format:

NSRange oldRange = NSRangeFromString([mRangesToWatch objectAtIndex:0]);

This even works with nil values. NSRangeFromString(nil) produces an empty range.

Chris Liscio is a super, mega, ultra groovy Mac developer who points out that in some cases, this easy path to objectification is too inefficient for practical use. He recently authored a custom wrapper of NSRect, precisely so he could maintain an array of these objects. In his case, the array of rectangles needs to be constantly reviewed and compared with other rectangles, so the constant conversion to and from string format would have been ridiculous. But in many instances, taking advantage of the built-in NSStringFromBlah functions will prove to be both an easier-to-implement and easier-to-maintain solution than writing your own custom wrapper.

Update: Michael Tsai comments below about the speed of such a wrapper vs. NSValue – essentially bringing to light my obliviousness to using NSValue … the way it’s documented! I’d always just associated NSValue with NSNumber. It makes my recommendations above to use NSStrings for such a purpose sound kind of stupid. C’est la vie! Thanks Michael for reminding us of yet another elegant feature of the API.
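A minimal sketch of the NSValue approach mentioned in the update, assuming a mutable array of ranges like the one used earlier. The helper names RememberRange and RangeAtIndex are hypothetical; valueWithRange: and rangeValue are the NSValue methods doing the work, and valueWithRect: and rectValue play the same role for NSRect on the Mac.

```objc
#import <Foundation/Foundation.h>

// Box a range in an NSValue so it can live in a Cocoa collection.
static void RememberRange(NSMutableArray *ranges, NSRange range)
{
    [ranges addObject:[NSValue valueWithRange:range]];
}

// Unbox the range stored at the given index.
static NSRange RangeAtIndex(NSArray *ranges, NSUInteger index)
{
    return [[ranges objectAtIndex:index] rangeValue];
}
```

Unlike the string version, nothing here is formatted or parsed; the struct's bytes are stored and handed back as-is.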

It’s on the House

Apple deserves a standing ovation for their clever use of NSString throughout the Cocoa API. Just about every method that calls for text as input or output does so in the form of NSString. So long as we’re in the cozy confines of Cocoa, we can sit back and bask in the glow of our convenient, precious little class. But Apple really went the extra mile by “toll-free bridging” it to CFString, its leaner, procedural cousin in the CoreFoundation framework. What this means is that the utility of NSString essentially reaches not only across the Cocoa API, but across the vast number of Apple APIs that rely on the CoreFoundation framework, and therefore use CFStringRef as their basic unit of text manipulation. This makes it easy for Cocoa developers to take advantage of lower-level libraries without jumping through hoops to do so. If you see a CFStringRef and wish it were an NSString so you could easily copy it or add it to an array, “just cast it.” Casting a CFStringRef to an NSString* (from Cocoa code) feels dangerous and dirty, but it’s actually the right thing to do, and is a heck of a convenience. Casting in the opposite direction is also handy and equally valid. Just be sure to manage your retain counts. The best things in life are free. Thanks, Apple.
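The cast in both directions might look like the sketch below. The function names are invented for illustration. One assumption to flag loudly: the __bridge keyword belongs to ARC, which arrived after this article was written. Under the manual retain/release of the time, a plain cast such as (NSString *)cfText is all that is needed, and the __bridge form is used here only so the sketch also compiles with ARC enabled.

```objc
#import <Foundation/Foundation.h>

// Treat a CFStringRef as an NSString without copying or converting it.
static NSUInteger LengthViaCocoa(CFStringRef cfText)
{
    NSString *text = (__bridge NSString *)cfText;   // no ownership transfer
    return [text length];
}

// Treat an NSString as a CFStringRef and hand it to a CoreFoundation call.
static CFIndex LengthViaCoreFoundation(NSString *text)
{
    return CFStringGetLength((__bridge CFStringRef)text);
}
```

Neither function retains or releases anything, so ownership stays with the caller, which is the “manage your retain counts” caveat in its simplest form.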

7 Responses to “String Theory”

  1. Michael Tsai Says:

    Is Chris’s wrapper faster than NSValue?

  2. Jonathan Wight Says:

    NSValue already has wrappers for NSRect, NSPoint, NSSize and NSRange. I’d like to think that it is more efficient than converting the structs to strings.

  3. Chris Liscio Says:

    Michael: Probably no different, actually. When Daniel and I originally spoke about this, neither of us even thought to use NSValue. I just hacked out my own class because I didn’t know any better, and using an NSString just seemed wrong for my needs.

    So I guess I’m not such a super mega, ultra groovy developer after all… ;)

  4. Daniel Jalkut Says:

    Thanks guys for bringing this to light quickly so we can hopefully avoid an avalanche of comments reminding me that I’m an idiot :) I’ve updated the original entry above to include a reaction to Michael’s comment. Let this be a lesson to all onlookers: blogging is dangerous – especially when you assume a tone of knowing something worth sharing with the world :)

  5. Johannes Fortmann Says:

    I love bridging: it should be used more thoroughly throughout the API. For example, why isn’t CFRunLoop bridged with NSRunLoop? Wherever there’s a 1:1 relation between objects, there should also be a bridge.

    Bridging is very easy, by the way: since an Objective C object is nothing more than a struct with an isa pointer as first member, it’s sufficient to just set that right after initialization.

    An excerpt from a class I once wrote:

    struct Font
    {
    public:
        Font(const std::string&, int fontsize);
        void Print(const std::string& text);
    private:
        void *isa;
        unsigned int texture;
        unsigned int displaylist;
    };

    This is a C++ class (in a standard cpp file). The constructor sets isa to an Objective-C class with the same variable layout. Of course, this doesn’t work with “real” C++ classes, which have their vtable in the place where the isa pointer should be for the Objective-C class. With this class, you can call [font drawString:@"blah"] as well as font->Print(std::string("blah")).

    Now, that was off-topic :-)

  6. Kevin Ballard Says:

    Johannes: NSRunLoop is not bridged to CFRunLoop because it’s a wrapper, not an equivalency. NSRunLoop has functionality not present in CFRunLoop, which presumably requires extra ivars, which means it cannot be bridged (since bridging objects requires having the same underlying memory structure).

  7. Daniel Jalkut Says:

    Kevin: couldn’t they just put padding bytes in for any unused ivars on CFRunLoop? Not saying they should do so without thinking carefully, but the mere mismatch in ivars doesn’t seem like a complete deal breaker.
