The concept of SQL tables with an auto incremental column for its primary key it’s obviously not new, as it is easily implemented using sequences. Now, on December of 2003 the SQL:2003 standard introduced the identity column type, along with the syntax GENERATED AS IDENTITY, which is currently supported by virtually all RDBMS providers. Let’s take a quick look at how we can declare primary key columns with auto generated identifiers, without relying on vendor-specific syntax!

Plain Sequences

This is the most straightforward way to produce primary key values. Sequences generate unique integers and can be incremented by an arbitrary number like 1, 20, -1, etc. Consider the following table:

CREATE TABLE student (
  id INTEGER,
  name VARCHAR(100) NOT NULL,
  CONSTRAINT pk_student PRIMARY KEY (id)
);

We could then create a sequence to generate valid data for our id primary key column:

CREATE SEQUENCE seq_student;
/* Postgres specific syntax (i.e. the nextval function).
Oracle's equivalent would be seq_student.NEXTVAL */
INSERT INTO student (id, name) VALUES (nextval('seq_student'), 'Victoria');
INSERT INTO student (id, name) VALUES (nextval('seq_student'), 'Manuel');

This approach works fine but assumes some sort of nomenclature is in place—you know which sequence belongs to the student table due to its name alone. Another developer might mistakenly use a different sequence, an ad hoc value, or even drop the sequence entirely (no errors will be thrown). Without any corresponding documentation there really is no way to tell at a glance how id values are produced, perhaps they’re managed in a different system altogether.

Attached Sequences

Auto incremental primary key columns are so common that Postgres offers a shorthand. We could redefine the student table like so:

CREATE TABLE student (
  id SERIAL, --shorthand type to create unique identifier columns 
  name VARCHAR(100) NOT NULL,
  CONSTRAINT pk_student PRIMARY KEY (id)
);

/* No need to explicitly define a sequence.
Later INSERT statements can omit the primary key column: */
INSERT INTO student (name) VALUES ('Victoria');
INSERT INTO student (name) VALUES ('Manuel');

--Deleting the table also removes the associated sequence:
DROP TABLE student;

In contrast, Oracle offers no shorthands and requires the declaration of a sequence beforehand, using the NEXTVAL function as the default value for the column:

CREATE SEQUENCE seq_student;
CREATE TABLE student (
  id INTEGER DEFAULT seq_student.NEXTVAL,
  name VARCHAR2(100) NOT NULL,
  CONSTRAINT pk_student PRIMARY KEY (id)
);

INSERT INTO student (name) VALUES ('Victoria');
INSERT INTO student (name) VALUES ('Manuel');

These last two examples make the table student depend on a certain sequence, which can’t be dropped on a whim. The table declaration itself makes it very clear how the primary key values are generated, and thus our intentions more evident.

Of course, other RDBMS vendors have their own unique ways of generating unique values (see what I did there?), such as MySQL’s AUTO_INCREMENT attribute, which also works in H2:

CREATE TABLE student (
  id INT AUTO_INCREMENT,
  name VARCHAR(100) NOT NULL,
  CONSTRAINT pk_student PRIMARY KEY (id)
);

INSERT INTO student (name) VALUES ('Maria');

It’s important to keep in mind that auto generated clauses do not necessarily create the primary key constraint automatically.

Identity Columns

Identity columns are the ultimate way to declare auto generated primary key columns, using good ol’ standard SQL (ISO/IEC 9075) and nothing else. Here’s what the student table could be rewritten as, using the identity clause:

CREATE TABLE student (
  id INTEGER GENERATED BY DEFAULT AS IDENTITY,
  name VARCHAR(100) NOT NULL,
  CONSTRAINT pk_student PRIMARY KEY (id)
);

INSERT INTO student (name) VALUES ('Paige');
SELECT id FROM student WHERE name = 'Paige';
--> 1
INSERT INTO student (id, name) values (1, 'Robert');
--> Constraint violation error

--No dangling sequences either:
DROP TABLE student; --purge (Oracle)

Customized Sequence Properties

Because identity columns have an implicit sequence attached to it, they can be tailored similarly to the CREATE SEQUENCE statement:

CREATE TABLE student (
  id INTEGER GENERATED ALWAYS AS IDENTITY (
    START WITH 25 INCREMENT BY 25 --Override default options
  ),
  name VARCHAR2(100) NOT NULL
);

INSERT INTO student (name) VALUES ('Ben');
INSERT INTO student (name) VALUES ('Wendy');

SELECT id FROM student WHERE name = 'Wendy';
--> 50

ALWAYS or BY DEFAULT?

The identity column clause has two different flavors: identifiers can be generated ALWAYS or BY DEFAULT. The former instructs the database to always generate a value, and will actually throw an error if you do try to insert one yourself:

CREATE TABLE test_error (
  id_col INTEGER GENERATED ALWAYS AS IDENTITY
);

INSERT INTO test_error (id_col) VALUES (1);
--> ORA-32795: cannot insert into a generated always identity column

Ultimately it all comes down to taste. Personally, I find the BY DEFAULT behaviour a bit more forgiving (say you need to change an ID for whatever reason).

ORM Support

Object–relational mapping providers have great support for identity columns out-of-the-box. In fact, it’s the default for primary key properties in Entity Framework Core. Here’s how we could map the student table to a JPA Entity, using GenerationType.IDENTITY:

package org.danielaguilar.samples;

import jakarta.persistence.*;

@Entity
public class Student {

	@Id
	@GeneratedValue(strategy = GenerationType.IDENTITY)
	private int id;
	
	private String name;
	
	// ...
}

Conclusion

Identity columns are a great and universal way to declare primary key columns for our tables. Because it’s part of the SQL standard, this particular syntax works with almost all the major RDBMS vendors, hopefully decreasing the cognitive load necessary when working on multiple systems.