Effective Use of SQL Identity Columns (Oracle & Postgres)
The concept of SQL tables with an auto incremental column for its
primary key it’s obviously not new, as it is easily implemented using
sequences. Now, on December of 2003 the SQL:2003 standard
introduced the identity column type, along with the syntax GENERATED
AS IDENTITY
, which is currently supported by virtually all RDBMS
providers. Let’s take a quick look at how we can declare primary key
columns with auto generated identifiers, without relying on
vendor-specific syntax!
Plain Sequences
This is the most straightforward way to produce primary key values. Sequences generate unique integers and can be incremented by an arbitrary number like 1, 20, -1, etc. Consider the following table:
CREATE TABLE student (
id INTEGER,
name VARCHAR(100) NOT NULL,
CONSTRAINT pk_student PRIMARY KEY (id)
);
We could then create a sequence to generate valid data for our id
primary key column:
CREATE SEQUENCE seq_student;
/* Postgres specific syntax (i.e. the nextval function).
Oracle's equivalent would be seq_student.NEXTVAL */
INSERT INTO student (id, name) VALUES (nextval('seq_student'), 'Victoria');
INSERT INTO student (id, name) VALUES (nextval('seq_student'), 'Manuel');
This approach works fine but assumes some sort of nomenclature is in
place—you know which sequence belongs to the student
table due to
its name alone. Another developer might mistakenly use a different
sequence, an ad hoc value, or even drop the sequence entirely (no
errors will be thrown). Without any corresponding documentation there
really is no way to tell at a glance how id
values are produced,
perhaps they’re managed in a different system altogether.
Attached Sequences
Auto incremental primary key columns are so common that Postgres
offers a shorthand. We could redefine the student
table like
so:
CREATE TABLE student (
id SERIAL, --shorthand type to create unique identifier columns
name VARCHAR(100) NOT NULL,
CONSTRAINT pk_student PRIMARY KEY (id)
);
/* No need to explicitly define a sequence.
Later INSERT statements can omit the primary key column: */
INSERT INTO student (name) VALUES ('Victoria');
INSERT INTO student (name) VALUES ('Manuel');
--Deleting the table also removes the associated sequence:
DROP TABLE student;
In contrast, Oracle offers no shorthands and requires the declaration of a sequence beforehand, using the NEXTVAL function as the default value for the column:
CREATE SEQUENCE seq_student;
CREATE TABLE student (
id INTEGER DEFAULT seq_student.NEXTVAL,
name VARCHAR2(100) NOT NULL,
CONSTRAINT pk_student PRIMARY KEY (id)
);
INSERT INTO student (name) VALUES ('Victoria');
INSERT INTO student (name) VALUES ('Manuel');
These last two examples make the table student
depend on a certain
sequence, which can’t be dropped on a whim. The table declaration
itself makes it very clear how the primary key values are generated,
and thus our intentions more evident.
Of course, other RDBMS vendors have their own unique ways of generating unique values (see what I did there?), such as MySQL’s AUTO_INCREMENT attribute, which also works in H2:
CREATE TABLE student (
id INT AUTO_INCREMENT,
name VARCHAR(100) NOT NULL,
CONSTRAINT pk_student PRIMARY KEY (id)
);
INSERT INTO student (name) VALUES ('Maria');
It’s important to keep in mind that auto generated clauses do not necessarily create the primary key constraint automatically.
Identity Columns
Identity columns are the ultimate way to declare auto generated
primary key columns, using good ol’ standard SQL (ISO/IEC 9075) and
nothing else. Here’s what the student
table could be rewritten as,
using the identity clause:
CREATE TABLE student (
id INTEGER GENERATED BY DEFAULT AS IDENTITY,
name VARCHAR(100) NOT NULL,
CONSTRAINT pk_student PRIMARY KEY (id)
);
INSERT INTO student (name) VALUES ('Paige');
SELECT id FROM student WHERE name = 'Paige';
--> 1
INSERT INTO student (id, name) values (1, 'Robert');
--> Constraint violation error
--No dangling sequences either:
DROP TABLE student; --purge (Oracle)
Customized Sequence Properties
Because identity columns have an implicit sequence attached to it,
they can be tailored similarly to the CREATE SEQUENCE
statement:
CREATE TABLE student (
id INTEGER GENERATED ALWAYS AS IDENTITY (
START WITH 25 INCREMENT BY 25 --Override default options
),
name VARCHAR2(100) NOT NULL
);
INSERT INTO student (name) VALUES ('Ben');
INSERT INTO student (name) VALUES ('Wendy');
SELECT id FROM student WHERE name = 'Wendy';
--> 50
ALWAYS or BY DEFAULT?
The identity column clause has two different flavors: identifiers can
be generated ALWAYS
or BY DEFAULT
. The former instructs the
database to always generate a value, and will actually throw an error
if you do try to insert one yourself:
CREATE TABLE test_error (
id_col INTEGER GENERATED ALWAYS AS IDENTITY
);
INSERT INTO test_error (id_col) VALUES (1);
--> ORA-32795: cannot insert into a generated always identity column
Ultimately it all comes down to taste. Personally, I find the BY
DEFAULT
behaviour a bit more forgiving (say you need to change an ID
for whatever reason).
ORM Support
Object–relational mapping providers have great support for identity
columns out-of-the-box. In fact, it’s the default for primary key
properties in Entity Framework Core. Here’s how we could map the
student
table to a JPA Entity, using GenerationType.IDENTITY:
package org.danielaguilar.samples;
import jakarta.persistence.*;
@Entity
public class Student {
@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
private int id;
private String name;
// ...
}
Conclusion
Identity columns are a great and universal way to declare primary key columns for our tables. Because it’s part of the SQL standard, this particular syntax works with almost all the major RDBMS vendors, hopefully decreasing the cognitive load necessary when working on multiple systems.