Public Identifiers, UUIDs and a Tiny SEO Fix
A recent question from my friend and colleague Mohammad got me thinking about the way we identify data in web applications.
While working on the DBIC component of a REST API, he came across the term enumeration attack. In this type of attack, an attacker systematically guesses resource identifiers in order to access data they shouldn’t be able to see.
For example, if your API exposes URLs like this:
    GET /users/123
    GET /users/124
    GET /users/125
then it’s easy for someone to try a large range of identifiers and see what they get back.
Mohammad’s question was simple:
Should we replace sequential IDs with UUIDs? And if we do, should we index the UUID column?
As is often the case, the answer turned out to be “it depends”.
Two Different Types of Data
The first thing I realised is that not all data objects have the same requirements.
Some objects are naturally public.
For example, books on a publishing website are intended to be discovered. In fact, you probably want people to be able to guess their URLs:
    /books/design-patterns-in-modern-perl
In this case, a human-readable slug makes perfect sense.
Other objects are private by nature. User accounts, orders, invoices and API resources generally shouldn’t be enumerable. In those cases, a UUID is often a better choice:
    /users/550e8400-e29b-41d4-a716-446655440000
The important observation is that slugs and UUIDs solve different problems.
- Slugs are for humans (and, perhaps, search engines).
- UUIDs are for machines.
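As a small illustration of the "for humans" side, here is a minimal sketch of how a slug like the one above could be derived from a title. The `slugify` name and its rules are assumptions made for this example, and it handles ASCII letters and digits only.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a title into a lowercase, hyphen-separated slug.
# Only ASCII letters and digits survive; everything else becomes a separator.
sub slugify {
    my ($title) = @_;
    my $slug = lc $title;
    $slug =~ s/[^a-z0-9]+/-/g;   # collapse runs of other characters into one hyphen
    $slug =~ s/^-+|-+$//g;       # trim leading and trailing hyphens
    return $slug;
}

print slugify('Design Patterns in Modern Perl'), "\n";
# prints: design-patterns-in-modern-perl
```

Two different titles can produce the same slug, which is one more reason a slug on its own makes a poor identifier for anything that needs to be unique.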
Database Design
A common question is whether a UUID should replace the primary key.
In most cases, I don’t think it should.
My preferred design is:
    CREATE TABLE users (
        id   BIGINT PRIMARY KEY,
        uuid UUID NOT NULL UNIQUE
    );
The integer primary key remains the internal identifier used for joins and foreign keys.
The UUID becomes the public identifier exposed through APIs.
This gives you the best of both worlds:
- Small, efficient foreign keys.
- Fast joins.
- Unguessable public identifiers.
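If the application looks records up by UUID, and an API that exposes UUIDs in its URLs certainly will, then the UUID column should be indexed. In practice, declaring it UNIQUE is usually enough: databases such as PostgreSQL, MySQL and SQLite enforce a UNIQUE constraint with an index, so a separate CREATE INDEX is rarely needed.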
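The schema above leaves open where the UUID values come from. As a minimal sketch, assuming a Unix-like system that provides /dev/urandom, a random (version 4) UUID can be built in plain Perl. The `new_uuid_v4` name is hypothetical, and in a real application a maintained CPAN module or the database's own UUID generator is likely to be a more robust choice.

```perl
use strict;
use warnings;

# Hypothetical helper: build a random (version 4) UUID string.
# Assumes /dev/urandom exists, so this is a Unix-only sketch.
sub new_uuid_v4 {
    open my $fh, '<:raw', '/dev/urandom'
        or die "Cannot open /dev/urandom: $!";
    my $read = read($fh, my $bytes, 16);
    close $fh;
    die "Short read from /dev/urandom" unless defined $read && $read == 16;

    my @b = unpack 'C16', $bytes;
    $b[6] = ($b[6] & 0x0f) | 0x40;   # set the version field to 4
    $b[8] = ($b[8] & 0x3f) | 0x80;   # set the variant bits to binary 10

    my $hex = join '', map { sprintf '%02x', $_ } @b;
    return join '-', unpack 'A8 A4 A4 A4 A12', $hex;
}

print new_uuid_v4(), "\n";
```

Only six of the 128 bits are fixed, so the remaining 122 random bits are what make these identifiers impractical to guess.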
The Hybrid Approach
Thinking about this reminded me that many large sites use a hybrid approach.
Amazon product URLs contain both a human-readable title and a stable identifier:
    /Design-Patterns-Modern-Perl/dp/B0XXXXX123
The ASIN is what really identifies the product.
The title is there for humans.
Stack Overflow does something similar:
    /questions/12345/how-do-i-index-a-uuid-column
Again, the question ID is authoritative. The title is helpful context.
My Line of Succession website uses the same idea.
A person page looks like this:
    /p/2b5998-the-prince-william-prince-of-wales
The important part is the identifier:
    2b5998
The rest is descriptive text.
This turns out to be particularly useful for royalty because titles change constantly. Someone might be “Prince William”, then “The Prince of Wales”, and eventually “King William V”.
By separating identity from presentation, old links continue to work regardless of title changes.
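As a rough sketch of how that separation might look in code, the function below splits a person path into its identifier and its descriptive text. It is not the site's actual routing code: `parse_person_path` is a hypothetical name, and the sketch assumes the identifier is everything between `/p/` and the first hyphen.

```perl
use strict;
use warnings;

# Hypothetical parser: split a person path into identifier and descriptive text.
# Assumes the identifier never contains a hyphen.
sub parse_person_path {
    my ($path) = @_;
    my ($id, $text) = $path =~ m{\A/p/([^-/]+)(?:-([^/]*))?\z}
        or return;                       # not a person path
    return { id => $id, text => $text // '' };
}

for my $path (qw(
    /p/2b5998-the-prince-william-prince-of-wales
    /p/2b5998-king-william-v
    /p/2b5998
)) {
    my $parsed = parse_person_path($path);
    print "$parsed->{id}\n";             # prints 2b5998 each time
}
```

Because only the identifier takes part in the lookup, the descriptive text can change as often as the title does without breaking existing links.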
A Tiny Bug
While thinking about all of this, I discovered a small bug in Line of Succession.
The site allows any descriptive text after the identifier. These URLs all resolve to the same person:
    /p/2b5998-the-prince-william-prince-of-wales
    /p/2b5998-prince-billy
    /p/2b5998-fred
The application correctly ignores the descriptive text and uses only the identifier.
However, there was a problem.
The page was generating its canonical URL from the incoming request path rather than from the person record.
That meant a request for:
    /p/2b5998-prince-billy
generated:
    <link rel="canonical" href="https://lineofsuccession.co.uk/p/2b5998-prince-billy">
which is obviously not the canonical URL.
The fix was surprisingly small:
     sub canonical( $self ) {
       if ($self->request->is_date_page) {
         return '/' . $self->canonical_date;
    +  } elsif($self->request->is_person_page) {
    +    return '/p/' . $self->request->person->slug;
       } else {
         return $self->request->path;
       }
     }
At the same time I simplified another method by making it reuse the canonical URL logic.
The result was a six-line patch that fixed the SEO issue and made the code slightly cleaner.
Those are my favourite kinds of fixes.
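To see the effect of the change in isolation, here is a self-contained sketch of the same idea. It is an illustration under stated assumptions rather than the site's code: the `%slug_for` hash stands in for the person record lookup, and `canonical_path` is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the database: identifier => current canonical slug.
my %slug_for = (
    '2b5998' => '2b5998-the-prince-william-prince-of-wales',
);

# Return the canonical path for a request path. Person pages are rebuilt from
# the stored record; anything else falls back to the path as requested.
sub canonical_path {
    my ($path) = @_;
    if ($path =~ m{\A/p/([^-/]+)}) {
        my $slug = $slug_for{$1};
        return "/p/$slug" if defined $slug;
    }
    return $path;   # a real application would probably return a 404 for unknown people
}

print canonical_path('/p/2b5998-prince-billy'), "\n";
# prints: /p/2b5998-the-prince-william-prince-of-wales
```

Whatever descriptive text arrives in the request, the canonical path now comes from the record, so every variant of the URL points search engines at a single page.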
Future Improvements
The fix also revealed an emerging abstraction in the code.
At the moment, various parts of the application know how to construct URLs for different object types.
A cleaner approach would be to give objects responsibility for generating their own URLs.
I’m considering a HasURL role that would require an object to provide an identifier and optionally a prefix, and then build the URL automatically.
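One possible shape for that idea, purely as a sketch: core Perl has no role system, so this approximation uses plain inheritance, and a real implementation would more likely use a proper role library. The method names `url_identifier`, `url_prefix` and `url`, and the `Person` class, are all assumptions made for this example.

```perl
use strict;
use warnings;

package HasURL;
use Carp qw(croak);

# Optional: classes may override this to add a path prefix such as "p".
sub url_prefix { return '' }

# Build the URL from the optional prefix and the required identifier.
sub url {
    my ($self) = @_;
    croak ref($self) . ' must provide url_identifier'
        unless $self->can('url_identifier');
    my $prefix = $self->url_prefix;
    return join '/', '', (length($prefix) ? $prefix : ()), $self->url_identifier;
}

package Person;
our @ISA = ('HasURL');   # stand-in for composing a role

sub new {
    my ($class, %args) = @_;
    return bless { %args }, $class;
}
sub url_prefix     { return 'p' }
sub url_identifier { return $_[0]{slug} }

package main;

my $person = Person->new(slug => '2b5998-the-prince-william-prince-of-wales');
print $person->url, "\n";
# prints: /p/2b5998-the-prince-william-prince-of-wales
```

With something like this in place, the canonical URL method would only need to ask the object for its URL instead of knowing how each type of page is addressed.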
That’s a job for another day.
For now, a small question about UUIDs led to a useful discussion about public identifiers, a review of URL design, and a tiny production fix. Not bad for an afternoon’s work.
