The Right Tool for the Job (Part IV)

17 May 2011

##Database

This choice was really the first one I actually had to ruminate on. Once I wrote down my highest priority criteria for database engine selection, I was able to come to a clear conclusion. Let me take you through the process.

###SQLAlchemy Support

Because I’m using Pylons, which uses SQLALchemy by default, I wanted to be sure that whatever I use will work with SQLAlchemy. Here were my options with this criteria:

###Open Source

I intend to release this project under an open source license, so the database I choose should be open source (it doesn’t have to be, but I want it to be). Let us look at the licenses for these databases:

DB2: Proprietary IBM product
Drizzle: Fork of MySQL – GPL
Firebird: Non-standard open source license
SQL Server: Microsoft
MySQL: GPL
Oracle: Propriety beast
PostgreSQL: Self titled
SQLite: Public domain
ASE: Proprietary

###Levenshtein Distance

This is the one that makes the difference. I intend to go to great lengths to make sure that there is no unintentional duplication of data. The very heart of this application relies on the intricate relationships modeled in the database. In order to eliminate human error, the ability to do text search using Levenshtein distance as an aid to the user in entering data. There is some work going on to implement this for SQLite, and there are several UDFs available for MySQL. But PostgreSQL has supplied fuzzy string matching.

Based on my criteria, PostgreSQL will be my database of choice. MySQL would be a possibility as well, so long as the Levenshtein distance UDF is installed, but with the wealth of other features PostgreSQL brings to the table, and the looming Oracle monster, PostgreSQL is a clear winner.