My friend recently asked me why I don’t use explicit foreign key references when writing schemas for relational databases.
To be clear, I do not avoid the concept of foreign keys, just the SQL
REFERENCES key word that enforces it. I think the “informal reference” approach, where some columns are clearly named to represent the foreign key of another table, is the simpler way to implement them.
Foreign Key Use Cases
I think there are two primary advantages to using explicit foreign key references.
- They make database access inherently safer by preventing certain types of ill-formed data from being stored in foreign key columns.
- They allow the database to offer certain relationship-aware features, specifically cascade operations.
Both of these advantages are irrefutable facts. You will indeed have a “safer, more full-featured database environment” with explicit foreign keys. But in my experience building applications on top of relational databases, the magnitude of each advantage is insignificant.
Use Case 1: Improved Safety
Foreign keys enforce that only well-formed data is committed to the column.
Let’s consider what it means for an application to break that constraint. We’re talking about trying to store the wrong type of data or invalid data in a column. Any joins (actually any features at all) involving the column will be utterly and completely broken.
This is a basic, critical error that even the most bare bones smoke test would detect immediately. I’m not even talking about automated testing - executing the code in any meaningful way will obviously not work as expected.
Foreign keys are but one way an application can enforce data safety, and I believe that they’re one of the weakest. The additional safety that a foreign key brings is just the ability to discover a class of blatantly obvious errors slightly earlier than a smoke test. Realistically, they never detect bugs that tests wouldn’t.
It turns out this advantage is naught but a smidgen of safety.
Use Case 2: Cascade Operations
There are actually two cascade features enabled due to foreign key references, but
ON UPDATE CASCADE isn’t particularly useful.
The other one is quite handy:
ON DELETE CASCADE offers to drop recursively any related rows. But when it comes to databases, making the destruction of data “easy” and “automatic” is generally the exact opposite of what I’m looking for.
Deletion can be the most dangerous aspect of an application. A standard business application may even forego hard deletion altogether in favor of permanent soft deletion. That’s how valuable data is.
Even if cascading deletion were available separate from foreign key references, I wouldn’t use it because it seeks to make deletion far too implicit.
In a real world application, foreign keys fail to deliver on both, the promise of improved data safety, as well as the ergonomics of cascade operations.
There are some reasons to skip out on foreign keys on purpose, too. For example, they can make migrations that require setting up orphaned rows extremely difficult to perform. Separately, the database no longer incurs the performance penalty of validating reference constraints, although I doubt this has any impact for the majority of projects.
I don’t find any of those reasons particularly convincing, though. At the end of the day, my reasoning for skipping on explicit foreign keys is just this: in the absence of a solid reason to use an abstraction, I will usually choose not to use it.
Is there no place for foreign keys?
Not all systems have “basic smoke tests”. Not all applications are maintained well enough to put the right data in the right column, even in the simple case. If you’re working on such a legacy or rotting system, I can see how foreign keys could provide some basic guarantees regarding data integrity. I hope they’re paying you enough to wade through the muck.