Public Identifiers, UUIDs and a Tiny SEO Fix

The titles (and URLs) of Prince William over time


A recent question from my friend and colleague Mohammad got me thinking about the way we identify data in web applications.

While working on the DBIC component of a REST API, he came across the term enumeration attack. In this type of attack, an attacker systematically guesses resource identifiers in order to access data they shouldn’t be able to see.

For example, if your API exposes URLs that end in a sequential numeric ID, then it’s easy for someone to try a large range of identifiers and see what they get back.
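To make the risk concrete, here is a small Python sketch. It is not the API from the original discussion: the `ORDERS` store, the `fetch_order` function and the ID values are hypothetical stand-ins for an endpoint keyed on a sequential ID.

```python
import uuid

# Hypothetical data store keyed by sequential integer IDs.
ORDERS = {1001: "alice's order", 1002: "bob's order", 1003: "carol's order"}


def fetch_order(order_id):
    """Stand-in for an API endpoint that looks a record up by its ID."""
    return ORDERS.get(order_id)


def enumerate_ids(start, stop):
    """Try every ID in a range and keep the ones that resolve."""
    found = {}
    for candidate in range(start, stop):
        record = fetch_order(candidate)
        if record is not None:
            found[candidate] = record
    return found


found = enumerate_ids(1000, 1010)
print(sorted(found))  # prints [1001, 1002, 1003]

# The same records keyed by random UUIDs give an attacker no range to walk.
UUID_ORDERS = {str(uuid.uuid4()): value for value in ORDERS.values()}
```

Ten guesses are enough to find every record in the sequential store, whereas the UUID keys have no predictable neighbours to step through.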

Mohammad’s question was simple:

Should we replace sequential IDs with UUIDs? And if we do, should we index the UUID column?

As is often the case, the answer turned out to be “it depends”.

Two Different Types of Data

The first thing I realised is that not all data objects have the same requirements.

Some objects are naturally public.

For example, books on a publishing website are intended to be discovered. In fact, you probably want people to be able to guess their URLs.

In this case, a human-readable slug makes perfect sense.

Other objects are private by nature. User accounts, orders, invoices and API resources generally shouldn’t be enumerable. In those cases, a UUID is often a better choice.

The important observation is that slugs and UUIDs solve different problems.

  • Slugs are for humans (and, perhaps, search engines).
  • UUIDs are for machines.
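The difference is easy to see side by side. The following Python sketch is illustrative only: `slugify` is a hypothetical helper and the book title is made up, while `uuid.uuid4()` is the standard library's random (version 4) UUID generator.

```python
import re
import uuid


def slugify(title):
    """Lower-case a title and join its alphanumeric runs with hyphens."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "-".join(words)


# A public object gets a readable, guessable identifier.
book_slug = slugify("An Example Book Title")
print(book_slug)  # prints an-example-book-title

# A private object gets a random identifier that carries no meaning.
order_uuid = str(uuid.uuid4())
```

The slug can be reconstructed by anyone who knows the title, which is exactly what you want for a book and exactly what you don't want for an invoice.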

Database Design

A common question is whether a UUID should replace the primary key.

In most cases, I don’t think it should.

My preferred design is to keep the integer primary key and add a separate UUID column alongside it.

The integer primary key remains the internal identifier used for joins and foreign keys.

The UUID becomes the public identifier exposed through APIs.

This gives you the best of both worlds:

  • Small, efficient foreign keys.
  • Fast joins.
  • Unguessable public identifiers.

If the application regularly searches by UUID then the UUID column should be indexed. In practice, declaring it UNIQUE will usually create the appropriate index automatically.
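As a minimal sketch of that design, the following uses Python's built-in `sqlite3` module with an in-memory database. The `orders` table and its column names are assumptions made for illustration, not the schema from the original discussion, and other databases will differ in detail.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,   -- internal key for joins and foreign keys
        public_uuid TEXT NOT NULL UNIQUE,  -- public identifier exposed by the API
        description TEXT NOT NULL
    )
    """
)

public_id = str(uuid.uuid4())
conn.execute(
    "INSERT INTO orders (public_uuid, description) VALUES (?, ?)",
    (public_id, "example order"),
)

# Look the row up by its public identifier, as an API handler would.
row = conn.execute(
    "SELECT id, description FROM orders WHERE public_uuid = ?",
    (public_id,),
).fetchone()

# PRAGMA index_list shows the index SQLite created for the UNIQUE constraint.
indexes = conn.execute("PRAGMA index_list('orders')").fetchall()
```

Here `indexes` contains a single automatically created unique index on `public_uuid`, so no separate `CREATE INDEX` statement is needed in SQLite.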

The Hybrid Approach

Thinking about this reminded me that many large sites use a hybrid approach.

Amazon product URLs contain both a human-readable title and a stable identifier.

The ASIN is what really identifies the product.

The title is there for humans.

Stack Overflow does something similar. Its question URLs contain a numeric question ID followed by a slug of the title.

Again, the question ID is authoritative. The title is helpful context.

My Line of Succession website uses the same idea.

A person page looks like this:

    https://lineofsuccession.co.uk/p/2b5998-the-prince-william-prince-of-wales

The important part is the identifier:

    2b5998

The rest is descriptive text.

This turns out to be particularly useful for royalty because titles change constantly. Someone might be “Prince William”, then “The Prince of Wales”, and eventually “King William V”.

By separating identity from presentation, old links continue to work regardless of title changes.
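Separating the two parts of such a URL takes very little code. This Python sketch is not the site's implementation: `split_person_path` is a hypothetical helper, and it assumes the identifier never contains a hyphen, so everything after the first hyphen is descriptive text.

```python
def split_person_path(path, prefix="/p/"):
    """Return (identifier, descriptive_text) for a path like /p/<id>-<text>."""
    if not path.startswith(prefix):
        raise ValueError(f"unexpected path: {path!r}")
    rest = path[len(prefix):]
    # Assumption: the identifier contains no hyphen of its own.
    identifier, _, text = rest.partition("-")
    if not identifier:
        raise ValueError(f"no identifier in path: {path!r}")
    return identifier, text


identifier, text = split_person_path("/p/2b5998-the-prince-william-prince-of-wales")
print(identifier)  # prints 2b5998
```

Only `identifier` is used to find the record, so the descriptive text is free to change whenever the title does.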

A Tiny Bug

While thinking about all of this, I discovered a small bug in Line of Succession.

The site allows any descriptive text after the identifier. These URLs all resolve to the same person:

    /p/2b5998-the-prince-william-prince-of-wales
    /p/2b5998-prince-billy
    /p/2b5998-fred

The application correctly ignores the descriptive text and uses only the identifier.

However, there was a problem.

The page was generating its canonical URL from the incoming request path rather than from the person record.

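That meant a request for a path with arbitrary descriptive text, such as /p/2b5998-fred, generated a canonical link that echoed that same path, which is obviously not the canonical URL.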

The fix was surprisingly small: build the canonical URL from the person record instead of from the request path.

At the same time I simplified another method by making it reuse the canonical URL logic.

The result was a six-line patch that fixed the SEO issue and made the code slightly cleaner.
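The shape of the bug and its fix can be sketched in a few lines of Python. This is an illustration rather than the actual patch: the `Person` class, its `slug` field and both function names are hypothetical, and only the URLs come from the site.

```python
from dataclasses import dataclass

BASE_URL = "https://lineofsuccession.co.uk"


@dataclass
class Person:
    identifier: str
    slug: str  # hypothetical field holding the current descriptive text

    def canonical_url(self):
        """Build the canonical URL from the record, never from the request."""
        return f"{BASE_URL}/p/{self.identifier}-{self.slug}"


def canonical_from_request(request_path):
    """The buggy approach: echo whatever path the visitor asked for."""
    return BASE_URL + request_path


def canonical_from_record(person):
    """The fixed approach: ask the person record for its own URL."""
    return person.canonical_url()


william = Person("2b5998", "the-prince-william-prince-of-wales")
buggy = canonical_from_request("/p/2b5998-fred")
fixed = canonical_from_record(william)
```

With the first function, every variant of the path advertises itself as canonical. With the second, all of them point back to the same URL.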

Those are my favourite kinds of fixes.

Future Improvements

The fix also revealed an emerging abstraction in the code.

At the moment, various parts of the application know how to construct URLs for different object types.

A cleaner approach would be to give objects responsibility for generating their own URLs.

I’m considering a HasURL role that would require an object to provide an identifier and optionally a prefix, and then build the URL automatically.
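One possible shape for such a role, sketched in Python as an abstract base class rather than in the site's own code. The names `url_identifier`, `url_prefix` and `url`, and the example `Person` class, are all assumptions made for illustration.

```python
from abc import ABC, abstractmethod


class HasURL(ABC):
    """Anything that can build its own URL from an identifier and a prefix."""

    url_prefix = ""  # optional: classes may override this

    @property
    @abstractmethod
    def url_identifier(self):
        """Required: the identifying part of the URL."""

    @property
    def url(self):
        return f"{self.url_prefix}/{self.url_identifier}"


class Person(HasURL):
    url_prefix = "/p"

    def __init__(self, identifier, slug):
        self.identifier = identifier
        self.slug = slug

    @property
    def url_identifier(self):
        return f"{self.identifier}-{self.slug}"


william = Person("2b5998", "the-prince-william-prince-of-wales")
print(william.url)  # prints /p/2b5998-the-prince-william-prince-of-wales
```

A class that inherits from `HasURL` without providing `url_identifier` cannot be instantiated, which mirrors the way a role refuses to compose when a required method is missing.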

That’s a job for another day.

For now, a small question about UUIDs led to a useful discussion about public identifiers, a review of URL design, and a tiny production fix. Not bad for an afternoon’s work.

2 thoughts on “Public Identifiers, UUIDs and a Tiny SEO Fix”

  1. When you wrote that you can use anything after the ID, I would urge caution:


    /p/2b5998-the-prince-william-prince-of-wales
    /p/2b5998-prince-billy
    /p/2b5998-fred

    I hit this same issue with a very large company (most readers would know their name) where someone who didn’t understand Catalyst wrote their own router. It simply dropped segments it didn’t recognize. That meant, for this very public company, I could share the url, https://very.public.company/categories/curtis-poe-is-an-idiot/1234, and as long as “1234” was a valid category id, the URL would properly resolve.

    I have tested this on your website and you have the same issue. You can visit https://lineofsuccession.co.uk/p/2b5998-the-prince-william-prince-of-wales-is-a-very-fine-gentleman and it works. Or you can replace ‘-is-a-very-fine-gentleman’ with something less, um, appropriate, and you potentially have a PR disaster on your hands.

    For your site, maybe not a big deal. For a huge public company, it’s a very big deal. (Note: that very public company has had this bug for years because the person who wrote the custom router rejected a request to fix it)

    1. I agree completely. I wrote about this very problem fifteen years ago.

      https://blog.dave.org.uk/2011/04/independent-urls.html

      The fix I mentioned above gets halfway to fixing this (it stops Google from indexing the wrong slug) but, yes, it really needs another level where the page redirects to the correct URL.

      And I notice that the Independent story that I was writing about still exists on their website – but the incorrect URL now redirects to the correct one.
