Minor Mismatch
When using a database in an application, there are many ways the idioms from the database ecosystem can disagree with the idioms in the programming language used to write the application. The object-relational impedance mismatch is one such example of this, but that is mostly about differences between relational objects (i.e. rows in tables) and programming language type systems.
Sometimes, the differences are just cosmetic. But this can still cause issues with developer ergonomics, opportunities for subtle bugs or even lint violations for codebases on the stricter side. Here I want to highlight the specific issue when PostgreSQL column naming conventions disagree with field naming conventions in JavaScript / TypeScript types.
I Made a Mess
Consider the following TypeScript type for tracking issues in a ticketing system:
import * as typeorm from 'typeorm';
@typeorm.Entity()
export class Ticket {
@typeorm.PrimaryColumn({ type: 'uuid' })
id!: number;
@typeorm.Column()
owner!: string;
@typeorm.Column()
description!: string;
@typeorm.CreateDateColumn({ type: 'timestamp with time zone' })
createdAt!: Date;
@typeorm.UpdateDateColumn({ type: 'timestamp with time zone' })
updatedAt!: Date;
@typeorm.Column({ type: 'timestamp with time zone', nullable: true })
resolvedAt?: Date;
}
Using TypeORM with the defaults, a developer would end with a table of the form:
tracker=> \d ticket
Table "tracker.ticket"
Column | Type | Collation | Nullable | Default
-------------+--------------------------+-----------+----------+---------
id | uuid | | not null |
owner | character varying | | not null |
description | character varying | | not null |
createdAt | timestamp with time zone | | not null | now()
updatedAt | timestamp with time zone | | not null | now()
resolvedAt | timestamp with time zone | | |
Indexes:
"PK_d9a0835407701eb86f874474b7c" PRIMARY KEY, btree (id)
"IDX_4fd0fa28cf982e5252b358caa9" btree (owner, "updatedAt")
Note the idiomatic JavaScript field names (camelCase
) being directly ported
as column names. This causes problems when using them naively in PostgreSQL:
tracker=> SELECT owner, description, createdAt FROM ticket WHERE createdAt = updatedAt;
ERROR: column "createdat" does not exist
LINE 1: SELECT owner, description, createdAt FROM ticket WHERE creat...
^
HINT: Perhaps you meant to reference the column "ticket.createdAt".
A somewhat reasonable response to this may be "but TypeORM writes all my
queries". However, poking around in the database console (psql
here)
should happen all the time, both during development and when evaluating
migrations, feature usage and performing other other one-off tasks. In order
to actually use the console, the columns need to be quoted to preserve the
casing:
tracker=> SELECT owner, description, "createdAt" FROM ticket WHERE "createdAt" = "updatedAt";
owner | description | createdAt
------------------+--------------------------------+-------------------------------
[email protected] | Implement protobuf parsing | 2021-10-23 18:41:59.909022+00
[email protected] | Explicitly require API version | 2021-10-25 19:22:13.919551+00
(2 rows)
This is not great. Due to PostgreSQL handling of identifiers, having
column names be all lowercase is the common convention. In order to
differentiate word boundaries, snake_case
is a natural choice. For example
here we'd use created_at
instead of createdAt
.
How Do I Get out of the Mess?
In TypeORM, a column rename is as simple as
@typeorm.CreateDateColumn({ type: 'timestamp with time zone', name: 'created_at' })
createdAt!: Date;
so we can just commit this and run the auto-generated migration? Right? RIGHT? Maybe. But probably not. The generated migration is harmless enough:
await queryRunner.query(`ALTER TABLE "ticket" RENAME COLUMN "createdAt" TO "created_at"`);
However, there is almost always a need for migrations to be compatible with both the "old code" and the "new code". The most common reason for this is to enable fast rollbacks if the new code contains a defect. However, it's also common for applications to be deployed with rolling updates or with blue green deploys. In either case, the "old code" and the "new code" can both be running at literally the same time. So whether it's rollback safety or deployment strategy, it's easy to construct a strong argument that database migrations can be applied with the currently running version of the code.
How does this come into play here? Renaming to created_at
means that any
query in the currently running version of the code that references createdAt
would just break. At this point, we might just throw up our hands and regret
the fact that our columns have camelCase
names. However,
transactional DDL and PostgreSQL views can save us here!
The First Migration
We can create a view that pretends to be the ticket
table while
maintaining the presented column names. Simultaneously, the underlying table
can rename the columns. (In order to make a view named ticket
, we first
need to rename the table.) Doing any of these changes alone would represent
a broken state for the database, however PostgreSQL allows us to wrap all of
these DDL operations into a transaction!
The aforementioned changes require five statements:
-- Start a transaction
BEGIN;
-- Rename all the COLUMNs using `camelCase`
ALTER TABLE
ticket
RENAME COLUMN
"createdAt" TO created_at;
ALTER TABLE
ticket
RENAME COLUMN
"updatedAt" TO updated_at;
ALTER TABLE
ticket
RENAME COLUMN
"resolvedAt" TO resolved_at;
-- Rename the TABLE so that the VIEW can take its name
ALTER TABLE
ticket
RENAME TO
ticket_actual;
-- Create a VIEW to present the newly renamed column names, but also
-- present the new names
CREATE VIEW
ticket AS
SELECT
id,
"owner",
"description",
created_at AS "createdAt",
updated_at AS "updatedAt",
resolved_at AS "resolvedAt",
created_at,
updated_at,
resolved_at
FROM
ticket_actual;
-- Commit the transaction
COMMIT;
To sanity check, notice that even after changing ticket
to a view, reads
and writes continue to work:
tracker=> \d+ ticket
View "tracker.ticket"
Column | Type | Collation | Nullable | Default | Storage | Description
-------------+-----------------------------+-----------+----------+---------+----------+-------------
id | uuid | | | | plain |
...
View definition:
SELECT ticket_actual.id,
ticket_actual.owner,
...
FROM ticket_actual;
tracker=> INSERT INTO
tracker-> ticket (id, "owner", "description", "createdAt", "updatedAt")
tracker-> VALUES
tracker-> (
tracker(> '9ce77e1f-1390-4a9e-8142-dd5eb257b554',
tracker(> '[email protected]',
tracker(> 'Vendor in 3rd party protobuf schema',
tracker(> '2021-10-24 12:14:15.000173+00',
tracker(> '2021-10-24 06:37:11.130880+00'
tracker(> );
INSERT 0 1
tracker=> SELECT "owner", "description", "createdAt" FROM ticket;
owner | description | createdAt
-------------------+-------------------------------------+-------------------------------
[email protected] | Deprecate GET-form of DELETE | 2021-10-22 13:31:48.467759+00
[email protected] | Implement protobuf parsing | 2021-10-23 18:41:59.909022+00
[email protected] | Compare DB purge strategies | 2021-10-23 12:13:07.682559+00
[email protected] | Explicitly require API version | 2021-10-25 19:22:13.919551+00
[email protected] | Vendor in 3rd party protobuf schema | 2021-10-24 12:14:15.000173+00
(5 rows)
This migration can be coupled with a code change that starts to use the
newly introduced columns in the view: created_at
, updated_at
and
resolved_at
. During the next deployment (when migrations also run), the
"old code" will use the old column names (which are now aliases in the view)
and the "new code" will use the new column names. Due to the way we've
structured the view, these are both compatible.
The VIEW
and TABLE
should be interchangeable to application
code1.
The Second Migration
Once the previous deploy has stabilized (i.e. we know it won't be rolled back)
we know the old column names createdAt
, updatedAt
and resolvedAt
no
longer need to be kept around. As a result, we can remove the view and
restore the table back with its original name:
-- Start a transaction
BEGIN;
-- Remove the VIEW
DROP VIEW ticket;
-- Restore the TABLE to the original name
ALTER TABLE
ticket_actual
RENAME TO
ticket;
-- Commit the transaction
COMMIT;
All things being equal, application queries going through a TABLE
instead
of a VIEW
should be preferred. If a VIEW
can be removed, it
should be as soon as possible. Relying on a VIEW
in a codebase over a longer
time span can lead to inefficient queries caused by a mismatch between
developer assumptions about the exposed columns and the actual columns being
transformed in the view. The more complex the view is, the more likely this is
to be true.
Conclusion
Transactional DDL allows to do things that would otherwise be impossible to do without downtime. Combining this with PostgreSQL helpers for separating storage from presentation, it's possible to make cosmetic renames to tables and columns safely. It's always crucial to ensure migrations can safely interact with the versions of the codebase immediately before and after the migration is intended to run. The above approach shows a generic approach that allows for surfacing "before and after" column views at the same time. Relying on tools like TypeORM can provide a lot of value, but they can also cause problems if automated outputs like migrations aren't inspected closely.
- Unless the code uses
SELECT *
, which may cause breakage due to three new alias columns in theVIEW
. ↩