reCAPTCHA – Combining Distributed Problem Solving with a Web Service

I ran into an interesting project this morning called reCAPTCHA. In the spirit of distributed computing solutions, such as folding@home, it tackles a difficult problem by splitting it up and farming the pieces out. What makes this interesting is that instead of having computers solve the problem, people do.

reCAPTCHA actually solves two problems at once. The project pipelines words that a book-scanning OCR effort failed to recognize into a freely available web service for verifying your humanity: a CAPTCHA. Instead of each CAPTCHA puzzle being a necessary but regrettable waste of human effort, reCAPTCHA harnesses this otherwise lost resource. How brilliant is that!

The web service looks very interesting to me. I’m due to revisit a submission form soon that contains a CAPTCHA I wrote several years ago and that I know has been broken. The application also contains a very extensive blacklist, so the weakness of the CAPTCHA has never been enough of a problem to warrant replacing it, but I’m curious to see what difference this service will make.

php|tek Slides

Well, php|tek is over. It was a great conference and I’m really glad I went. This was my first PHP conference.

One of my main goals was to meet some of the folks at php|architect. I’ve been writing the Test Pattern column for them for over a year now. I was finally able to put faces to the names that I keep seeing in my email box. Actually, one of the cooler things about the conference was meeting people who I’m familiar with from online, either from forums, IRC, mailing lists or blogs.

I’d like to thank everyone who attended my sessions. Here are the slides from each session:

Let Your Properties be Properties

There is a coding pattern that I see (and have used) in PHP code that defines generic methods on a class for setting and getting properties.

function set($name, $value);
function get($name);

Google code search for examples

Sometimes there are ancillary methods to deal with unsetting, checking for existence, setting via an array, or dealing with references in PHP 4. They can really clutter up the definition of a class. That’s not good. All this code is fairly standard, too, but it gets duplicated in every class that does this. That’s not good, either.

Oh, I’ll solve this problem by making a base class, some may say. Wrong. This is a very feeble reason to spend your one shot at inheritance. Trust me, I know; I’ve done it.

I think the idea is to make the class extensible. But PHP is perfectly happy to let you set new properties on an object.

$obj->foo = 'bar';

So why not just do this?

Another variation of this pattern is to use setXXX($name, $value) or setYYY($name, $value) methods. This happens a lot with “options” or “vars” or “properties.” It also happens on request wrapper classes. To me, this looks like there is an object begging to get out for each XXX and YYY.

$obj->xxx->prop = 'foo';
$obj->yyy->prop = 'foo';
$obj->zzz->prop = 'foo';

This eliminates a slew of property manipulation methods and leaves the original class free to implement its true purpose. Methods of the form getXXX($name) and setXXX($name, $value) should be the solution of last resort.
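That refactoring can be sketched roughly as follows. The class and property names here are made up for illustration; the point is that each group of values gets its own object instead of a family of string-keyed setter methods.

```php
<?php

// Hypothetical sketch: a request wrapper exposes one object per
// group of values, instead of setVar()/setOption() methods that
// take the name as a string parameter.
class Request
{
    public $vars;
    public $options;

    public function __construct()
    {
        // stdClass instances accept arbitrary properties, so each
        // group gets free-form storage of its own.
        $this->vars = new stdClass();
        $this->options = new stdClass();
    }
}

$request = new Request();
$request->vars->page = 2;        // instead of $request->setVar('page', 2)
$request->options->cache = true; // instead of $request->setOption('cache', true)
```

In a real class you might use something richer than stdClass for each group, but even this bare version clears the property plumbing out of the wrapper.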

Since I started eliminating these in my own code in favor of direct properties, intermediate objects, or __set and __get, I’ve seen nothing but positive results. Try it. You may like it, too. Let your properties be properties.
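For reference, a minimal sketch of the __get/__set approach; the class name and properties are hypothetical. Callers still write plain property syntax, but the class keeps control of storage:

```php
<?php

// Hypothetical example: __get/__set forward unknown property access
// to a private array, so callers can write $config->timeout = 30
// instead of $config->set('timeout', 30).
class Config
{
    private $data = array();

    public function __set($name, $value)
    {
        $this->data[$name] = $value;
    }

    public function __get($name)
    {
        return isset($this->data[$name]) ? $this->data[$name] : null;
    }

    public function __isset($name)
    {
        return isset($this->data[$name]);
    }

    public function __unset($name)
    {
        unset($this->data[$name]);
    }
}

$config = new Config();
$config->timeout = 30;  // looks like a plain property to the caller
echo $config->timeout;  // prints 30
```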

From reading the comments, I think there was some confusion about what I meant in this post. I am not talking about using naked properties instead of accessor methods. I’m not talking about accessor methods at all in this post. A specific accessor method, such as

setFoo($value);

where the name of the property is part of the method name, is very different from

set('foo', $value);

where you pass the name of the property as a string parameter. It’s only the latter pattern, where the property name is an actual parameter to the method, that I am talking about in this post and that I think should generally be refactored.

Where do you get your Wi-Fi?

Sometimes you just have to get out of the house or out of the office. And some of those times, you have to use the internet as well.

I’ve collected a (short) list of places around town that have Wi-Fi. I usually use the Wi-Fi at a small local coffee shop. They are open longer hours than the public library and they don’t glare at you for your beverage. For the times when the coffee shop is not open, like late evening, I’ve been going to McDonalds.

Unfortunately, I went to McDonalds this evening and they seem to have started charging for their WiFi. They want $2.95 for a two-hour block. Now, I actually wouldn’t mind paying that. I had my credit card out. But I’d just shelled out $5-something for a greasy meal that I really didn’t want and only bought so I could sit there and use the WiFi. I don’t think I’ve gone to McD of my own volition for years, at least not after 10:00 am when they stop serving breakfast burritos. That is, until I found out they had WiFi.

After I thought about it, I put my credit card away. I took my $2.95 and headed over to a local bar/pizza place that has WiFi. I invested in a beer that I didn’t really want so that I could use their WiFi. Well, now that I have it, it’s really not so bad.

Actually, I’ve done a couple of php|arch columns from here before. And that’s what I have to do tonight: finish up a Test Pattern column and do some work on my 3 php|tek talks. When I’m done, I’ll leave an extra $3 in the tip jar.

That’s where I get my Wi-Fi. Where do you get yours?

On the Perils of Inline API Documentation

Travis Swicegood has a post questioning the value of the docblock. I have a deep sympathy with this sentiment.

Even on projects with extensive generated documentation, I find that kind of documentation to be of extremely low value. The problem with inline API documentation is that there is no sense of priority. Developers are encouraged to document every element that can possibly be documented. Then, when the documentation is generated, there is no way to distinguish the important from the noise.

Another problem is restating the obvious. When a property or method really is well named, there may not be much more to say in the docblock. But the mechanics of docblocks invite the programmer to say it anyway. So you end up with comments like “$widget the widget.”
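A made-up example of the kind of docblock I mean; every line of the comment repeats what the well-named signature already says:

```php
<?php

// Hypothetical classes, invented purely to illustrate a docblock
// that restates the obvious and adds no information.
class Widget
{
}

class Dashboard
{
    private $widget;

    /**
     * Sets the widget.
     *
     * @param Widget $widget the widget
     */
    public function setWidget(Widget $widget)
    {
        $this->widget = $widget;
    }
}
```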

Duplication is also a significant problem. If you inherit from a base class or you implement an interface, there is a tendency to copy and paste the docblock from the parent to the children. This is an obvious maintenance problem. In fact, this is one of the major problems with comments. If comments restate what the code does, it’s a form of code duplication. When the code is changed, it requires changes in the comments. In this sense, comments make code harder to maintain.

Lately, I’ve been trying to curb the attractive nuisance aspect of docblock comments by replacing docblocks with comments like

// Definition in parent


// docblock intentionally omitted.

It would be nice if there were standard abbreviations for things like these. The idea being to eliminate the attractive nuisance of the commentable element by placing a comment there that is not a docblock, and then commenting only the elements where you have something to say.

Well, if docblocks are so bad, what is the alternative? I’ve tried a few, including using a wiki for API documentation. Here is the problem: if inline documentation creates duplication between the comment and the code, that maintenance problem becomes significantly worse with distance. If the documentation is external to the code, that distance really harms the ability to keep the documentation in sync with the code. So docblocks are bad, but everything else is worse.

Docblocks were popularized by Sun and Java (or maybe Donald Knuth). But when Sun was documenting its Java APIs with docblocks, it had a professional documentation team to do the work. When teams without a dedicated documenter role use these tools, I think they fall short.

So what is the solution? I’d like to see better techniques and conventions for solving the problems of docblocks: avoiding duplication, avoiding restating the obvious, and avoiding the tendency to docblock everything that’s docblockable. Maybe there are more successful documenters out there who are handling these API documentation anti-patterns better than I am. I’d sure like to hear how.

Software Development Team Diversity

Matt Mullenweg has a post about Hiring Diversity. A successful software project must fulfill many competing goals and meet a wide variety of challenges. Diversity is the combined arms of software development. In my personal experience, diverse teams perform better. A diverse team allows the most appropriate team member to step up to a challenge, and the interaction between team members with different points of view helps a project balance competing goals.

Technical Background

A team needs members with a technical background. These guys are the ones that ensure everything works. They can write the hard bits of code, keep everything performing, and make sure the code is maintainable. Without these guys, you will have problems. On the other hand, sometimes those with a technical background view problem solving as a puzzle, where the solution is an end in itself.
In my experience, team members with a less technical background are more empathic toward the user community. They view problem solving as a means to a goal. A customer focus is what makes a project successful. Software can be successful despite technical problems, but it cannot be successful if it doesn’t meet the customer’s needs. Having some non-technical people helps prevent your resident technocrats from building a system that will service 10,000 concurrent users but doesn’t meet the needs of the 329 users you have today.
Perhaps at the extreme end of this spectrum is the extreme programming practice of including a customer on the development team.

Age and Experience

In expert versus novice programmers, I make the distinction between skill level and experience. Age, skill and experience often correlate, but not always. Your young apprentice programmers are important to have on a team. They can react faster. Sometimes not knowing better is a bonus. They’ll do things more senior team members don’t want to do. They do things they’re not supposed to do, sometimes for the benefit of the team.
I think people underestimate how long it takes to actually become an expert, as I mentioned in my previous post. But they also underestimate how similar all software projects are, and they overestimate the impact of their toolset. Generally, the technologies in the laundry lists that form most job listings are not the critical success factors for a project.
People want a programmer who can hit the ground running, who is familiar with all of the team’s tools, languages, and libraries. But this is a monoculture as well. Sometimes you need people who aren’t from your toolset culture to point out where you have swallowed the hook too deeply or to introduce new tools and techniques from their alternate background. Sometimes their learning process can teach you something.
While some experience is valued in programmers, age sometimes isn’t. However, you need the older, more stable people as well. They keep the young ones from repeating the mistakes of the past. They keep the project on an even keel. It helps to have an older perspective on the team. I think they help keep attention focused on what matters. The tension between the older and younger team members helps keep the team on track.

Traditional Diversity Factors

I have no opinion on the impact of race. I have an opinion about women on software teams. I wish there were more of them. I have worked on software development teams with diverse nationalities. I regard that experience as positive. Different cultures have different tolerances for risk, different respect for authority or tradition. I think cultural diversity on a team is a good thing.

Managing Diversity

Over the years, there have been many methods proposed of matching people with the jobs they are best at. The solution that works best is to ask them. People gravitate toward the tasks that require their special skill sets. Good team management means hiring diversely and allowing that migration. On a well managed, diverse team, people will self select the jobs they are best at. As a result, the diverse team is more resilient and I believe potentially more successful.

The Problem with Markup Languages

Chris Shiflett has a post today, Allowing HTML and Preventing XSS. The problem is how to allow users to format their contributed content without introducing security vulnerabilities. The answer is usually some sort of markup language or filtering and sanitization of HTML.

BBCode was designed for this purpose. There is no actual standard, but the core syntax seems fairly uniform. It’s good for those used to forums, where it seems to be the norm.

HTML markup is nice because it is a standard, even if varying subsets are supported. Learning a little HTML isn’t going to hurt anyone, at least for the next 20 years or so. The problem is that HTML was never intended to be hand edited. The syntax is not the most inviting, and different HTML-like markup languages handle whitespace differently than the HTML standard.

Wiki markup syntaxes were designed to be human friendly. The main problem I have with wiki syntax is that there is no standard. It seems like every wiki has a different way to formulate a link, for example. I guess there is some progress with Wiki Creole, but I still have a bad taste in my mouth.

The other problem I have with wiki markup is that I find it to be non-deterministic. When I edit any given wiki and try to use more than basic formatting, I never know what I am going to get. Most of the markup processing engines for these wikis are impenetrable morasses of regular expressions. It can be hard to gauge interactions. Are you really sure they are secure?

Speaking of impenetrable morasses of regular expressions, have you ever looked at WordPress’ input path? I’m sure everyone with a WordPress blog who likes to blog about PHP code knows that it is a code eater. I’ve been particularly disappointed with WordPress in this area. Most of the “code formatting” plugins still have problems protecting code from WordPress’ heavy hand.

But the WordPress preg_replace gauntlet doesn’t just mangle code. I have a post which has been sitting in draft mode for several weeks because I can’t figure out how to give it the proper markup. WordPress is somehow taking my perfectly balanced input markup and producing “unbalanced” output markup. I haven’t yet tracked down the problem to either submit a fix or to do a good bug report. Frankly, I’m not looking forward to trudging through all those regular expressions.

In Chris’ post, he takes the regular expression approach. Folks in the comments have pointed out a few problems with his approach, including the problem of interleaved tags. If you can’t tell by now, I am not a fan of the regular expression gauntlet approach to markup languages. I prefer a defined syntax and a traditional computer science style parser (which may use regular expressions).
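As a hedged sketch of what I mean by the parser approach, here is a whitelist filter built on PHP’s DOM extension rather than a regular expression gauntlet. The tag and attribute whitelist is a toy policy for illustration only; a real filter would also need to validate URL schemes in href attributes, among other things.

```php
<?php

// Illustrative whitelist filter using a real HTML parser (PHP's
// DOM extension) instead of regular expressions. The whitelist
// below is a toy policy, not a complete one.
function sanitize_html($html)
{
    $allowed = array(
        'p'      => array(),
        'em'     => array(),
        'strong' => array(),
        'a'      => array('href'),
    );

    $doc = new DOMDocument();
    // The @ suppresses warnings about malformed user input; the
    // <div> wrapper gives us a single node to serialize later.
    @$doc->loadHTML('<div>' . $html . '</div>');

    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//*') as $node) {
        $tag = strtolower($node->nodeName);
        if (in_array($tag, array('html', 'body', 'div'))) {
            continue; // wrapper elements added by the parser
        }
        if ($node->parentNode === null) {
            continue; // already detached along with a removed ancestor
        }
        if (!isset($allowed[$tag])) {
            // Disallowed element: keep its text, drop its markup.
            $node->parentNode->replaceChild(
                $doc->createTextNode($node->textContent), $node);
        } else {
            // Allowed element: strip any attribute not whitelisted.
            foreach (iterator_to_array($node->attributes) as $attr) {
                if (!in_array($attr->name, $allowed[$tag])) {
                    $node->removeAttribute($attr->name);
                }
            }
        }
    }

    $out = '';
    foreach ($xpath->query('//div')->item(0)->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

Because the parser builds a real tree first, interleaved or unclosed tags get normalized before the policy is applied, which is exactly the class of problem that trips up regex filters.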

The other must-have is a preview option. With so much variation in markup languages, not having a preview leaves the user to play Russian roulette with their submitted content. I’ve talked about that before in the usability of input filtering. This is another area where WordPress leaves the user high and dry.

The complex input path in WordPress, combined with its reliance on global variables, seems to leave it unable to do an in-page preview. The admin area preview is an IFRAME, so it launches a separate request. The various live preview plugins are JavaScript based and don’t work when JavaScript is disabled. They also don’t pass the input through the same input path that WordPress uses, so they are not a true preview.

I don’t mean for this to be a WordPress rant; on the whole, I like WordPress. Rather, I just wanted to point out how hard it can be to do input filtering that is safe, reliable, deterministic, and usable.

Firefox Extensions for Web Developers

I prefer Safari for my casual web browsing on the Mac, but for web development, nothing beats Firefox. (Firefox beats IE hands down on Windows.) Firefox’s openness and the Firefox plugin architecture means that there is little that you cannot find out about a web page with a Firefox add-on. I’ve tried a bunch of different Firefox extensions for web development. Here are the ones that I find most useful and that I use on a regular basis.

DOM Inspector

Yes, yes, it comes installed with Firefox, but let’s not forget the basics. The DOM Inspector allows you to see what is actually going on in your web document. It lets you browse DOM nodes, style sheets, or JavaScript objects. You select a node by drilling down, by searching, or by clicking on it, although the UI for selecting a node with your mouse is just plain lousy. Once you’ve chosen your subject, the DOM Inspector can show you the box model information for that node, the style sheets associated with the node, the computed CSS styles, or the JavaScript object.

Web Developer Extension

Chris Pederick’s Web Developer extension has been out for a long time and is the plugin I am most familiar with. It is really the Swiss Army knife of web developer tools. It is so feature packed that I am still finding new things that it does. Unfortunately, the UI is also so cluttered that I am still finding new things that it does.

This add-on can slice and dice a web page every which way. It can outline a variety of DOM elements, for example drawing an outline around all block elements on a page. This can be nice for lining things up. The Display Line Guides option is also a good way to verify alignment, not to mention Display Ruler, or Display Page Magnifier for fine detail.

This extension has dozens of reports, each one geared toward diagnosing a particular kind of problem. Some of them are external, such as sending your URL to a validation service. Some are internal, such as showing a dump of all of the page’s active cookies. Unfortunately, many of these options open up in a new tab, taking the focus off of the page that you are trying to work with. It can be hard to tell which options do this. There is an option for having the tabs open in the background, but this is not the default.

The View Style Information option is particularly nice. You can point to any element on the page and the extension will display the element tree along with ids and classes. If you click on an element, it will display only the style rules that apply to that element. This beats the drill down approach in the DOM inspector, although it doesn’t show box model information or computed style information this way.

The web developer extension can change things as well as inspect them. You can go into a mode where you can edit your CSS or HTML in real time for immediate feedback. This is great for testing out small changes. For the PHP developer, the extension has a variety of options for manipulating cookies and forms. There are also a variety of ways to enable or disable certain elements on the page.

Install Web Developer Extension

Tamper Data

Tamper Data is Live HTTP Headers on steroids. Tamper Data records the HTTP request headers and HTTP response headers for each request that the browser makes. Not only that, it allows you to “tamper” with the requests before they are sent out, editing headers or form values behind the scenes. Tamper Data can present a graph of the requests involved with loading a web page. Tamper Data is great for security testing and page-loading performance tuning.

Install Tamper Data Extension

FireBug

FireBug, ah what can I say but wow! According to their web site:

Firebug integrates with Firefox to put a wealth of development tools at your fingertips while you browse. You can edit, debug, and monitor CSS, HTML, and JavaScript live in any web page.

Firebug has considerable overlap with the extensions I’ve mentioned so far. It doesn’t necessarily duplicate all of their functions, but the ones it does, it does really well. It goes way beyond in some cases. There is really no point in me talking about Firebug’s features, because the website already does such a good job at it. They’ve impressed this jaded old developer.

If you haven’t tried this one yet, seriously, go get it right now.

Install FireBug Extension

ColorZilla

ColorZilla adds a small eyedropper tool to the bottom left corner of the window. You can use this tool to inspect colors on the current web site. Double clicking it brings up a color picker and some other color related tools.

Install ColorZilla Extension

Multiple Profiles

Ok, I lied. There are a few situations where I use Firefox for casual browsing. Some web sites just won’t work with Safari, or don’t work well with Safari. For these, I pull up Firefox. I don’t want my casual browsing tools to clutter up my web development experience, and I don’t want my web development tools to clutter up my casual browsing experience. The solution is to create multiple profiles in Firefox. I have one for web development and another for normal surfing. I have Firefox ask me to select a profile on start up. This extra step would be annoying for a primary browser, but it doesn’t seem too bad for a secondary browser.

Setting up my Mac series

Firefox is not mac specific, but this is actually the latest installment in my setting up my Mac series.

  1. How to Transfer Mac OS X Application Data between Computers
  2. Free Software for Mac OS X
  3. FireFox Extensions for Web Developers
  4. UPCOMING: Configuring Boot Camp and Parallels

Yahoo YUI wins JavaScript Library Wars

There is huge web development news from Yahoo today: Yahoo is offering free hosting for YUI components, both JavaScript and CSS. I’ve been favoring YUI, and this is a great boon. One big drawback to AJAX is page-loading performance. I’m betting that the Yahoo infrastructure can serve these files far faster than most people’s servers, that the files are much more likely to be cached, and that, by being located on a different domain, they circumvent per-domain connection limits in the browser. By offering hosting, Yahoo turns YUI into a true shared library for the internet.

Delphi for PHP

I have to comment on this week’s announcement of Delphi for PHP. I was a Delphi programmer for about 5 years before taking up PHP about 6 years ago. What a convergence.

I have a great fondness and respect for the old Object Pascal based Delphi. Delphi’s VCL has been influential, inspiring the GUI components in Java. And, of course, Anders Hejlsberg went on to put a huge stamp on C# and .NET that would be familiar to any Delphi programmer.

I’ve always admired this approach of extending the language syntax to make common things easy and for the integration between the language and the tools. In Delphi, this was evidenced by the excellent properties support. Six years later, this is the feature I miss the most in PHP. This language extension approach has seen its culmination in C# and LINQ. It almost pains me to say it, but the cutting edge of commercial language design is at Microsoft now.

On the other hand, I’ve never had that much respect for Borland as a company. We were big enough to have Borland representatives come to our office and try to sell us their products. They were terrible at the mechanics of selling into big companies. I was in their beta programs. I went to their conferences. I’ve never had any sense that they know what they are doing business-wise. Inprise? What were they thinking? Now here they are, having just gotten their asses kicked by Eclipse in the Java IDE space, and what are they working on? They release an IDE for PHP, just as Zend is embracing Eclipse in the PHP space. Brilliant!

I don’t quite know what Delphi means now. To me, it’s always been an IDE plus Object Pascal. What is it now? I also don’t quite know what Borland has become. Is it CodeGear now? I guess that the Delphi for PHP IDE comes from Quadram and their now discontinued QStudio product. And the VCL is their WCL (no linkage found). Anytime I’ve been touched by the corporate entity that was Borland, confusion ensued. I’m confused now.

It appears that the PHP version of the VCL will be released as open source. There is nothing at the SourceForge project yet, but I’ll be interested to see what it looks like, if only for old times’ sake.

The Delphi tool approach was to serialize an object-based representation of an application, offer tools to create that serialized representation, and load that representation at run time. In Delphi, that serialization was done into the form files (.DFM). I’ll be interested to see how Delphi for PHP does it. Perhaps this is an area where the Eclipse PHP Development Tools project can learn. I know that I definitely had Delphi in mind when I was writing my column on object serialization for this month’s php|architect.
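A minimal sketch of that serialize-then-load idea in PHP; the class names here are made up, and a real tool would stream a much richer component model:

```php
<?php

// Hypothetical sketch of the Delphi .DFM idea: a design tool
// serializes a component tree, and the runtime loads it back and
// builds the UI from it.
class Component
{
    public $name;
    public $properties;
    public $children = array();

    public function __construct($name, array $properties = array())
    {
        $this->name = $name;
        $this->properties = $properties;
    }
}

// "Design time": build the form description and serialize it,
// the way Delphi streams a form into its .DFM file.
$form = new Component('LoginForm', array('caption' => 'Log in'));
$form->children[] = new Component('UserNameEdit', array('maxLength' => 40));
$stream = serialize($form);

// "Run time": reconstitute the description and consult it.
$loaded = unserialize($stream);
echo $loaded->children[0]->properties['maxLength']; // prints 40
```

In practice the stream would live in a file or database rather than a variable, and the runtime would walk the tree instantiating real widgets, but the round trip is the heart of the approach.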

Meanwhile, if you want to see the Delphi influence in PHP with code that you can download today, take a look at the Prado framework, which I imagine to be like the VCL for PHP, but without the supporting IDE.

This is a space I’ll definitely be watching.

Managing Open Source Projects

I ran across How Open Source Projects Survive Poisonous People (video) and Producing Open Source Software (book). Anyone know of any other interesting open source project management resources?

Free Software for Mac OS X

The software that comes with OS X is very capable. The mundane applications that come with OS X, such as the Finder, Preview, and Disk Utility can do some surprising things. I’ve been using Macs for 20 years and I’m still learning new tricks for these programs.

But, the installed apps can’t do everything. As part of setting up my new Mac, I’ve had to install a small set of very useful, dare I say essential software. This is the list of everything that I installed on my Mac for one reason or another.

Everything on this list is free as in beer. These are only things that are perpetually useful: nothing expires, and if something is a limited version, the limited version is still useful. This is not a list of things I think you should look at, nor any kind of best-of or exhaustive list. These are just the things that I actually use. (With a noted exception or two.)

This is Part II in my “Setting up my Mac” series. See Part I: How to Transfer Mac OS X Application Data between Computers.

Video Codecs For Quicktime

QuickTime is the native video format for the Mac. However, there are many different video file formats floating around on the web. Fortunately, QuickTime is modular and there are many free components available for playing these formats. I think this list covers the most popular.

Plays Windows Media Player files in QuickTime (Except those that have DRM).
Plays some .avi files in QuickTime.
Adds support for AAC audio.
Maybe this is redundant with Perian?

Video Players

In a rare show of lameness, the built-in QuickTime Player cannot play QuickTime video full screen unless you pay to upgrade to QuickTime Pro. This has always bugged me. Can you say nickel and dime? I’d rather they roll the price into the cost of OS X or my computer, if necessary. Fortunately, this is a restriction on the player, not on the QuickTime framework. Third-party players can play full screen, although perhaps at the expense of some QuickTime Player niceties like the remote control. I’ve installed these additional players.

This tiny player plays QuickTime full screen and not much else.
Nice Player
This is a more capable QuickTime player. I’ve had the video and audio tracks get out of sync, though. Still evaluating.
Real Player
Necessary to play anything in Real format, or to play Real streams in your browser.
A very capable player that does not rely on QuickTime or its plugins.
Flash Player
OS X comes with flash player installed, but you might want to upgrade. View your current flash version number.

I don’t do much with video, but these players and the prior QuickTime plugins have handled everything that I’ve ever wanted to do.


At $400, and without the typical Windows PC OEM discounts or the student discounts available to some, Microsoft Office represents a significant investment, especially if you just need occasional word processing or just want to view Microsoft Office documents that people send you. NeoOffice is a Mac-native version of OpenOffice. I have Microsoft Office, but NeoOffice is still useful for opening the OpenOffice-formatted documents that people from the open source community sometimes send me.

BBEdit has been around in the Mac community for a long time. TextWrangler is its free but commercial quality and very capable little brother.


The Finder does a pretty good job compressing and uncompressing zip files. (You knew it did that, right?) However, there are about a zillion different compression formats that might arrive at your doorstep via the magic of the internet.

Stuffit Expander
Stuffit is the time-honored way to uncompress stuff. However, this long-standing Macintosh institution has fallen into disrepute lately. To download Stuffit, you have to surrender your email address, and they do use it. I’ve installed it anyway, thanks to a throwaway email address.
The Unarchiver
I’ve switched to the free and open source Unarchiver as my primary de-compressor. So far, so good.

Chat Programs

iChat is nice, but there are more chat protocols out there than AOL and Jabber. I’ve also managed to collect a few different online profiles. You can reach me at procatajeff on AOL.

Allows you to connect to multiple chat protocols and multiple accounts at the same time. It doesn’t have all of the features of the native chat programs, but it is worth it to only have to run one program.
IRC client for Mac OS X.
Yahoo Messenger
The Yahoo Messenger for the Mac has far fewer features than its Windows cousin, but it’s not as loaded with advertisements as the Windows version, either.
MSN Messenger
I don’t use MSN at all, but if you did…
AOL Messenger
Again, there are some AOL features you can’t get through iChat. I almost always use Adium instead of AOL Messenger or even iChat.

Web Browsers

I use Safari for 99% of my web browsing. However, I install the major alternative browsers, too.

There are still some sites that do not work with Safari. For those, Firefox can usually get you in.
Same rendering engine as FireFox, but a more “mac-like” user interface.
I only use it to check web pages to see if they’re rendering correctly.
NetNewsWire Lite
Great feed reader. I’m a registered user of the full version. This was probably the best-value software purchase I’ve ever made. I started with the Lite version, although I’ve forgotten what the differences are by now.

File Transfer

The Finder will do FTP, but CyberDuck does more. I use it for the synchronization capability. This program has always been a little buggy and never quite reached the level of stability that I would like, but I use it anyway.
I don’t do much with Torrents, but when you run across them, use this.
An alternative BitTorrent client. I haven’t used this one yet, but I’m gonna give it a try next time I want to download a torrent.


There are tons of haxies, so-called maintenance utilities, and customizers for OS X. I don’t use any of them. Bad memories from the System 6 extension days, I guess. Here are a couple of utilities I do use.

Menu Meters
Monitor CPU, Memory and Disk usage as well as network activity in the menu bar. Very nice. Running this on my old machine was a major contributor to my decision to purchase a new one.
Disk Inventory X
A graphical breakdown of how your disk is being used. A kinda shaky 1.0, but be prepared to get an education after you run it and see where your disk space is going. Keep it around for when you need to find some free space.
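If you just want the numbers without the graphics, a quick `du` one-liner in the Terminal gives a rough version of the same breakdown. This is only a sketch; adjust the path and the count to taste:

```shell
# Show the ten largest items directly under the home directory, sizes in KB.
du -sk "$HOME"/* 2>/dev/null | sort -rn | head -10
```
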

Associating Files with Applications

Most of the applications on this list overlap in terms of the file formats that they can open. Sometimes, though, the wrong program will open when you double-click on a file or download something. RCDefault allows you to edit the associations between file types and data types and the applications that can use them. You can do this in the Finder to a certain extent, but RCDefault gives you more options and puts it all in one place.

Anything Else

I put this list together to keep track of what I need to install after I rebuild my machine. Take a look at part I of my setting up a Mac series.

If there is something you think I should take a look at let me know in the comments. (But keep it in the free or perpetually useful spirit of this post.)

Best of Luck

How to Transfer Mac OS X Application Data between Computers

It’s been a long time coming, but I finally got a new Mac. I’ve personally owned a Mac of one sort or another since 1987, but I didn’t start using a Mac full time for work until around 2000. I’ve been going through the process of setting up the new machine.

I decided to start from scratch on the new machine, rather than use the migration assistant. The previous machine had been the subject of countless experiments and upgrades. I wanted to start from a clean slate. I chose to reinstall all software and just transfer data files from the old machine to the new one.

I am recording the process on this blog to remind myself for next time and also hoping it might help someone else trying to do the same thing. I’ll probably do this again when I migrate from Tiger (OS X 10.4) to Leopard (OS X 10.5).

This guide may favor the Unix geek, but I’ll try to keep it non-geeky. If you’re uncomfortable with anything here, use the Apple migration assistant instead. These instructions represent what I actually did to move between machines. Your situation may be different. Use these instructions at your own risk. When in doubt, use the Apple supplied migration assistant. Always make backups of your old data.

I’ll assume that you can get your old mac and your new Mac talking on a network (You don’t even need a cross over cable), and that you can figure out how to enable file sharing on your old system so you can transfer your files over.

Do a Clean Install

My Mac came ready to go. All I had to do was turn it on and answer a few networking and registration questions and I was, um, productive, making comic books for the kids and playing with the iSight. However, I decided to wipe the hard drive and do a complete re-install.

There were a few reasons for this. One was to be able to play with impunity for a period of time, knowing that I could wash away my mistakes and experiments. Another was to be able to do a custom install. This laptop drive is fairly small for what I want to do with it. During my custom install, I omitted a bunch of printer drivers, trial apps and language translations to save a gig or two. It’s never going to be easier to do this than now. Third, I just wanted to make sure that I could rebuild the system from scratch, while I still had a warranty and tech support available.

I’m extremely conservative with my work system. I rely on it and I want it to work when I need it. I feel its better to allocate a fixed amount of time now to learn how to rebuild, rather than have spend an indeterminate amount of time with it in the event of some mishap.

When I re-installed, the first user I created on the new machine had the same short name as the primary user on the old machine. I haven’t tested these techniques for moving user accounts with different names, but overall I think they should work.

Moving your keychain

The first thing to do after you get your new machine is to copy your keychain from your old computer to the new one. The keychain contains all of your passwords. It’s also one of the few centralized databases on the Mac that you can’t just regenerate. It’s best to migrate it before you launch any programs on the new machine that might require authentication.

You can open the Keychain Access application to view and manage your passwords. Each User’s keychain is stored in their ~/Library/Keychains directory. (The ~ means this directory is a subdirectory of your user home directory.)

I just copied the login.keychain file from the old system and replaced the one on the new system. I would recommend logging out and logging back in after replacing the old file.
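For the Unix-inclined, the copy can be done from the Terminal. This is only a sketch: OLD_HOME stands for wherever the old machine’s home folder is mounted, and the demo below uses throwaway directories so nothing real is touched.

```shell
# Throwaway stand-ins for the two home folders; in a real migration,
# OLD_HOME would be the old Mac's home mounted over file sharing.
OLD_HOME=$(mktemp -d); NEW_HOME=$(mktemp -d)
mkdir -p "$OLD_HOME/Library/Keychains" "$NEW_HOME/Library/Keychains"
touch "$OLD_HOME/Library/Keychains/login.keychain"   # stand-in for the real keychain

# The actual step: replace the new machine's login keychain with the old one.
cp "$OLD_HOME/Library/Keychains/login.keychain" \
   "$NEW_HOME/Library/Keychains/login.keychain"
ls "$NEW_HOME/Library/Keychains"
```
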

You may also want to migrate your system-level keychain. This is located in /Library/Keychains/system.keychain. Notice that this is not in your user directory, but is a subdirectory of your main hard drive. I didn’t bother to migrate this one, but rather re-entered the handful of passwords that it contained. If you do overwrite this file, make sure you look at its ownership & permissions via Get Info in the Finder first, and restore the permissions after you are done.

You may want to run Keychain First Aid, on the File menu of the Keychain Access application, after this process just to make sure everything is OK.

Move your Cookies

I could never figure out why some people are so paranoid about cookies. Here is your chance to get rid of them all. Well, I don’t wear a tin foil hat; I want to keep my cookies. My cookies only take up 1.5MB after years of browsing on my Mac. Having a smaller cookie file probably won’t make my browsing experience any better, and there are so many it really isn’t worth trying to sift through them. For me, the best option is to migrate the whole cookie file.

Safari and WebKit cookies are stored in ~/Library/Cookies/cookies.plist. Copying this file from the old machine to the same location on the new machine will transfer all of your cookies.

I don’t do anything cookie-worthy in any browser except Safari, so I didn’t bother migrating any of the cookies in the alternative browsers I have installed. If you have FireFox cookies that you want to preserve, I believe they are located in FireFox’s Application Support folder and will migrate just fine using the generic Migrating a Mac OS X Application instructions below.

Migrating User Data and Documents

Moving your user data is easy. Just open your home folder on the old machine and copy all of the subdirectories you see to the new machine except for the ~/Library directory. Actually, you could copy the ~/Library directory wholesale, too, but the purpose of this post is to start with a clean slate of application settings and support files, and most of these live in the Library directory.

The typical folders you will copy over are ~/Desktop, ~/Documents, ~/Movies, ~/Music and ~/Pictures. There shouldn’t be anything of consequence in these directories on the new machine. You can probably just replace them. You may also want to copy your ~/Public and ~/Sites directories if you have anything in them.
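Sketched as shell commands, with the same caveat as before: the demo uses temporary directories in place of the two home folders, and on a real migration OLD_HOME would point at the old machine’s mounted home.

```shell
# Stand-ins for the old and new home folders.
OLD_HOME=$(mktemp -d); NEW_HOME=$(mktemp -d)
mkdir -p "$OLD_HOME/Documents"
echo "hello" > "$OLD_HOME/Documents/notes.txt"       # stand-in data file

# Copy each standard data folder, skipping any that don't exist.
for d in Desktop Documents Movies Music Pictures Public Sites; do
    if [ -d "$OLD_HOME/$d" ]; then
        cp -R "$OLD_HOME/$d" "$NEW_HOME/"            # -R copies recursively
    fi
done
ls "$NEW_HOME/Documents"
```
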

You want to copy over your data files before you launch any applications that might use that data, for example iPhoto or iTunes.

Migrating a Mac OS X Application

Almost all native OS X applications use the same file organization. The settings for almost any application can be transferred by looking in two places.

The ~/Library/Application Support directory contains folders with the same name as each application. Copy the folders from the old machine to the same location on the new machine for the applications whose support data you want to keep.

The ~/Library/Preferences directory contains many individual files (and sometimes a folder or two). The files have a Java-style naming convention. Just copy over the preference files for each application whose preferences you want to retain. Watch out, though: some applications, such as iTunes, have more than one preference file.

As part of my clean slate initiative, I only migrated 5 or 6 preference files from my most used and most configured applications, such as iTunes, Safari and Adium.

Mac OS X applications are fairly liberal with these files. In order to conform to the Mac programming guidelines, any OS X application should be able to regenerate a fresh preference file or a fresh application support file if its files are missing. Deleting these files can be a good way to “reset” an application to its standard defaults.

Moving Safari

Safari takes a bit of special consideration to migrate. Safari stores bookmarks, browser history, form auto-fill values, and other data in a special folder located at ~/Library/Safari. Copying this folder to the same location on the new system will preserve this information.

Safari Doesn’t have a folder under ~/Library/Application Support, but don’t forget to copy the safari preferences file from ~/Library/Preferences/

Moving Mail

MailApple’s also requires some special consideration. Mail stores its mail database in ~/Library/Mail. Copying this directory along with the preferences file at ~/Library/Preferences/ will transfer your mail.

If you are transferring from a 10.3 system, note that there was a major change in file format for the mail database between 10.3 and 10.4. Mail will automatically upgrade your mail files the first time you run it; however, it will not delete the old files. This Apple tech note describes how to delete the unused files from your Mail directory.

Moving iTunes

Moving iTunes depends on your iTunes Music Folder location and on whether you have the “Copy files to iTunes Music folder when adding to library” setting enabled. Both settings are in the Advanced panel of the iTunes preferences. Fortunately, there are already some pretty good guides on how to do this: there is a Moving iTunes Music Folder tech note from Apple, augmented by instructions from the HiFi Blog.

Following these instructions, I was able to transfer my iTunes Music without any problems.

Double Check your Library folder

Some applications store data in subdirectories of the Library folder other than Preferences and Application Support. You may wish to peruse the subfolders of the Library directory in your user directory for these stray bits of data. The Apple file organization document can help tell you what they are and help you decide if you should copy them over. I didn’t copy anything over, except for a couple of exceptions which I’ve already enumerated above.

Don’t copy the ~/Library/Cache directory. This will just be regenerated on the new machine.

Additionally, there is a system-wide /Library folder on your hard drive. You may want to scan this folder for system-wide settings that you want to transfer over. Again, as part of my clean install, I did not transfer anything from this directory, although I recognized a few bits of software that I needed to install on the new machine.

Unix Stuff

If you’ve accessed the unix side of Mac OS X, you may have a variety of things to move or at least to re-install. These things are beyond the scope of this blog post, but you might want to look for custom settings in /etc or custom installed software in /usr/local or data files in /var. I’ll have a sequel to this post which covers these issues in more detail.

Repairing Permissions

Copying files between systems could end up causing some file permission and ownership problems. I try to keep my files inside my user home directory, and so far I haven’t had any problems. Your mileage may vary. It’s probably a good idea to run the Verify Disk Permissions or Repair Disk Permissions commands in the Disk Utility application.

Rebuilding a System or Restoring from Backup

You can also use this guide to rebuild a Mac OS X installation, not just to copy from one system to another. The OS X installer has an “Archive and Install” option. If you have enough disk space, you can install a fresh copy of OS X and start from scratch. The installer will copy your old files into an archive directory. Then, you can copy your applications and data from the Archive folder to their proper places.

May you never need to use this guide to restore from a backup, but the same instructions apply. You do back up, right?

More Later

This is the first in a series about setting up a new Mac. I’ll have the next installment ready in a couple of days.

I’ve certainly mis-explained some things here. I’ve probably gotten a few things wrong and have definitely omitted important details. Proceed at your own risk. Please share your experiences moving applications in the comments. Best of Luck.

OOP is Mature, not Dead

I ran across an interesting series of blog posts by Karsten Wagner claiming that OOP is dead (part 2 and part 3). The premise behind these posts is that OOP has failed to deliver and that it is on the decline in favor of more functional or meta-programming techniques. Maybe it’s true that the discussion of the merits of OOP is on the decline. At least if you read reddit.

However, OOP is not on the decline. Quite simply, it has become mature. The discussion may be on the decline because almost every language that anyone actually uses implements a core set of OOP features. OOP has won its arguments. Good luck taking a language mainstream without it.

Oh, yeah, there are some OOP features that are still controversial or unusual. There is the single versus multiple inheritance debate, or perhaps Ruby’s open classes. But, I think these things have a way of cross-pollinating across the popular languages when they make sense.

A good example of this cross-pollination is happening now with properties, accessor methods and the uniform access principle. Language support for declared accessor methods is slowly creeping across all of the major languages. Not that Objective-C is all that popular, but Objective-C 2.0 adds support for ’em. Even stodgy old Java is considering language-level property support.

Sadly, PHP does not yet have language support for declared properties with accessor methods. What are __get and __set? They’re property missing handlers, not accessor methods. You can simulate accessor methods with them, but that is a poor solution for most applications. There is no way to support differing visibility, for example protected setters and public getters. Property not found handlers are prohibitively verbose to write, have a poor performance profile, have no capability for reflection, cause interoperability problems, and have inheritance edge case gotchas (not present in the java beans model, for example). My hope is to see good language support for properties in PHP 6.

Closures may not be object oriented, but they seem to be undergoing that same language cross-pollination. That seems to be a pretty good sign that they are useful. It doesn’t have to be closures OR objects; it can be closures AND objects. We can use each when they make sense.

Closures are another wish list item for PHP 6. PHP is almost wired for them with its callback pseudo-type. Everywhere you can use a callback in PHP, you could use a closure. I’d like to see the callback Pinocchio become a real boy like integer or boolean. The cool thing is that with PHP’s weak typing, the string and array forms of the callback pseudo-type can automatically be converted to a native closure type when needed.

As I said, the core OOP features that most programmers use are in all the mainstream languages. The interesting part is how they handle the OOP edge cases. This is the space where the framework developers live. As I wrote in culture of objects, PHP has some problems here. In some ways I think Ruby’s support for edge cases is exactly what allows a framework such as Rails to be built in it, although I’m not familiar enough with Ruby to say for certain.

I think addressing some of these issues in PHP 6 will make it a Ruby killer for web applications. It isn’t necessary to be perfect here, just to be good enough and allow the larger community, distribution, and stability to take over. Unfortunately, there is a long lead time here. If PHP 6 were to add support for declared accessor methods, closures, and late static binding — my top three framework enablers — it would still be at least 2-3 years before PHP 6 was sufficiently deployed and the frameworks could adapt to the new features.

In the meantime, while the PHP culture may have problems, the Ruby culture may not be without its own. The influx of programmers from Lisp and Smalltalk, two languages that did not go “mainstream,” may prevent Ruby from going mainstream. Take a look at The Impending Ruby Fracture. Isn’t this one of the things that happened to Smalltalk and Lisp? I’m still not convinced that Ruby hasn’t inherited many of the same maintenance problems from its Perl heritage. Just like English, huh? Only time will reveal Ruby’s maintenance characteristics. I give it about 2 to 3 years for today’s Rails systems to hit full legacy mode. How long do you think it will take for top-notch Unicode support in Ruby?

Obviously PHP 6 is all about teh unicode. Including an opcode cache is going to be an important performance and adoption driver. However, I’d like to see more progress on framework enablers. I really want to see these in the next major PHP deployment cycle and not in the PHP 7 deployment cycle. Are there framework enablers other than closures, declared property accessors and late static binding that I have overlooked?

I have high hopes for PHP 6 as a mature and mainstream language.

php | architect back issue bargains

I’ve been writing the Test Pattern column in php | architect for a few months now. I’ve been enjoying it because it lets me explore topics in more depth than I could here on my blog. Although, its more challenging and writing is not easy for me.

So far I think my best two columns have been Organizing For Change and Dependency Injection. These are my favorites at least.

Why do I bring this up? Because today, as part of their 7-day promo fest, back issues are 50% off. That means you could pick up the back issues with my best columns for a measly $1.99 apiece in PDF form. (He blogs shamelessly.) I put a lot of effort into those columns; I’m proud of them and I want you to read them. :)

Looking forward to 2007

Well, I’m finally back in town after the holidays. Let me tell you, I’m glad to be home. Between multiple holidays and taking my grandma to her cancer treatments in Ann Arbor, I was gone far too much of last month.

My Grandma is doing well. They used an experimental new procedure called radio frequency ablation to remove the metastatic colon cancer tumors from her lungs. This procedure is amazing compared to the standard treatment. The doctors at the University of Michigan were impressive. We’ll know the results in a couple months when her lungs look a little less like scrambled eggs. We’re hopeful.

I’m not much for retrospectives. Looking forward into 2007, I have a few major goals. I joined a gym today. I’m going to get a new laptop and refresh my development environment next week after MacWorld. I want to get at least a beta release of WACT out by May. I have to prepare for php|tek. I need to find a new place to live by this fall. (Ann Arbor?) I want to move by the end of the year.

I loved all my Christmas and birthday gifts this year. (My birthday is December 28th.) This year I pointed everyone to my wishlist and I ended up with a ton of good books to read. Jason Gillmore from Apress also sent me some web development books. My to-read stack for 2007 includes:

  • The Promise of Sleep – A survey of the subject of sleep for laymen, written by a top sleep researcher. I’m almost done with this one. This book has a bunch of sleep deprivation horror stories and a good survey of what is known about sleep, which is not much. It’s incredible that we know so little about something we spend so much time doing. It’s also amazing how many people have easily treatable sleep disorders and don’t even know it. Do you snore?
  • Don’t make me Think – Looks like a nice overview book on web usability.
  • Domain Driven Design – Recommended by Jason and Marcus. How did I get this far without reading this book?
  • Da Vinci Code – Wasn’t on my wishlist, but I’ll read it anyway. I read so little fiction these days. Where is a beach when you need one?
  • Getting Things Done – I’m almost through this one. It is a testimony to the power of the ideas this book expresses that so many people recommend it, despite its being so incredibly dull. Useful? Yes. Inspiring? No. But then, I’ve read enough of these self-help / personal-productivity books for a lifetime. Anyone want to buy a Franklin Planner? I used mine until I got a cell phone.
  • Practical Subversion – I’m really liking subversion. If you haven’t tried it, do so. I’m hoping to combine this with Greg Beaver’s book, The PEAR installer manifesto — the book on my wishlist I most wanted that I didn’t get, to create a new deployment process.
  • Pro CSS Techniques – A CSS book that tackles maintainability? I’m really looking forward to this one.
  • Pro MySQL – The last MySQL book I read was a couple years ago, yet I use it almost every day. I’m due for a refresh. This one looks good.
  • Pro PHP Security – Never hurts to brush up. This one looks like it has a lot on encryption, SSL and SSH; not strong areas for me.
  • Pattern-Oriented Software Architecture Volume 2 – The first volume, A System of Patterns, is one of my “always within reach when developing” books. Nice to add to the set.

Thanks for the books, guys. I’ll have in-depth reviews of some of these here in the future.

Happy New Year.

PDO versus MDB2

I was just putting together a small test program and I thought I would try using PDO. I really haven’t done anything serious with PDO, just tried it a couple of times. After recompiling PHP to include the MySQL driver for PDO, I coded up the first version of my test program:

$db = new PDO('mysql:host=localhost;dbname=example', 'example', 'secret');
$tags = $db->prepare("
    SELECT FROM bookmark_tags, tags
    WHERE bookmark_tags.bookmark_id = ? AND = bookmark_tags.tag_id
    ORDER BY");
$bookmarks = $db->prepare("SELECT * FROM bookmarks ORDER BY Title");
$bookmarks->execute();
echo "<ul>\n";
while ($bookmark = $bookmarks->fetchObject()) {
    echo "<li>{$bookmark->title} ";
    $tags->execute(array($bookmark->id));
    while ($tag = $tags->fetchObject()) {
        echo $tag->name, " ";
    }
    echo "</li>\n";
}
echo "</ul>\n";

    Unfortunately, this didn’t work and it took me a few minutes to figure out why. Actually, I still don’t know exactly why it doesn’t work, but I did find a way to make it work: by using two separate connections, one for each prepared statement. It doesn’t seem like you can have two active statements at the same time on the same connection. I find this hard to believe, so I’m probably doing something wrong.

    The other thing I didn’t care for with this PDO code is the non-standard method of iteration with the while loop. Well, the while loop is perfectly standard if you are coming from the PHP 4 style functional DB APIs. However, it doesn’t seem to fit in with the PHP 5 Iterator and foreach integration. PDO doesn’t seem to provide a distinct result set object, or a method of iterating over a result set using the standard PHP Iterator interface.

    Now, I can understand why this may be the case. The PDO interface seems to be designed to bind to php variables. Thats not going to work with the Iterator interface. However, I am not using that mode and don’t want to use that mode. It would be nice to be able to acquire an iterator for an example of use such as the one above without having to use fetchAll and ArrayIterator.

    Using the Iterator style makes it easier for me to decouple my code from the data source and makes it easier to write test cases for that code.

    I was a little bit disappointed. So I moved on to MDB2 with the same code …

require_once 'MDB2.php';
require_once 'MDB2/Iterator.php';
$db = MDB2::connect("mysql://example:secret@localhost/example");
$tagLookup = $db->prepare("
    SELECT FROM bookmark_tags, tags
    WHERE bookmark_tags.bookmark_id = ? AND = bookmark_tags.tag_id
    ORDER BY");
$bookmarkFinder = $db->prepare("SELECT * FROM bookmarks ORDER BY Title");
$bookmarks = new MDB2_Iterator($bookmarkFinder->execute(), MDB2_FETCHMODE_OBJECT);
echo "<ul>\n";
foreach ($bookmarks as $bookmark) {
    echo "<li>{$bookmark->title} ";
    $tags = new MDB2_Iterator($tagLookup->execute($bookmark->id), MDB2_FETCHMODE_OBJECT);
    foreach ($tags as $tag) {
        echo $tag->name, " ";
    }
    echo "</li>\n";
}
echo "</ul>\n";

I have about the same level of non-experience with MDB2 as with PDO, but my code worked perfectly on the first try and allowed me to use Iterator, which will be helpful in the next stage of my test.

I was a bit impressed. I’ll see how well it handles the next stage of what I want to do.

(And yes, I know there is no error checking in the above code.)

Why is PHP Code Considered Hard to Maintain?

Tobias Schlitt describes Tim Bray’s talk at the International PHP Conference. (PDF slides) Tim compares PHP, Java, and Rails along several dimensions. One of those dimensions is maintainability. Tim ranks PHP as least maintainable, Rails in the middle, and Java as most maintainable.

This is not a surprising ranking. After all, Tim is from Sun, and the maintainability complaint is common in anti-PHP rants. I’m not trying to suggest that Tim is anti-PHP, far from it, it seems. I’m just using his ranking as a springboard to ask questions.

Chances are that your average Java jockey or C scientist’s first exposure to PHP is to download one of the popular PHP applications. These are usually the product of some open source mega-project with developers of varying degrees of skill. Our engineer-by-day spends a few evenings with the program. The code is not technically outstanding.

How can something like this be so popular, he asks? Yet, the software is successful by definition. Nobody downloads unsuccessful open source applications. The technocrat, heavily invested in his own technical prowess, faced with successful yet technically inferior code, experiences cognitive dissonance. The only thing to do is to belittle the successful, but surely offensive, code. “I could write better code than this,” he says, or “this code sucks,” or “this is unmaintainable.”

It is easy to dismiss these gripes inside the PHP community. After all, those of us using PHP professionally can write maintainable code in PHP. Ask any programmer and they will tell you, “My code is maintainable.” Who writes all of this unmaintainable code, anyway?

Let’s take this gripe at face value for a moment. Why is PHP code considered hard to maintain? Is it the language that produces code that is hard to maintain, or is it that the popular ambassadors of the language happen to be programs that are hard to maintain?

Another common “PHP sucks” complaint is that PHP doesn’t scale. When you are talking about traffic, there are all sorts of counterexamples. Personally, I’m dying to learn the story behind those .php extensions on YouTube. But, this post is not about requests per second.

Another kind of scalability is team size. I think that when some people complain that PHP doesn’t scale, what they mean is that PHP doesn’t scale to large development teams or large projects. Now we are back to the maintainability issue.

What is it about PHP that makes people think that it is not suitable for larger development teams?

The criticisms of maintainability and scalability generally come from outside the PHP community. But, there is a common complaint from within the PHP community.

It is hard to find a PHP wish list that doesn’t include namespaces. It comes up again and again.

Sometimes users request a feature without explicitly making their true desires and intentions known. They say “I want feature X,” but what they really mean is “solve problem Y.” Good programmers can hear the request for X, but make the jump to solving Y.

When people ask for the namespace feature, the problem they want to solve is integrating code from multiple parties. I wonder if the frequency of this request is a signal of a problem in this department? Perhaps one that requires more than just namespaces to solve? Is the namespace request a proxy for a larger problem?

What is it about PHP that makes it hard to integrate code written by multiple parties, whether they be different developers or different organizations?
