Adding structured data with Perl

JSON-LD for books

If you have a website, then it’s very likely that you would like as many people as possible to see it. One of the best tools for achieving that is to ensure that your site is returned close to the top of as many search results pages as possible.

In order to do that, you really have two targets:

  1. Ensure the search engines know what your website is about
  2. Ensure the search engines think your website is an important source of information on the topics it covers

The second item on the list is mostly about getting other websites on the same topic to link to you – and it is outside the scope of this post. In this post, I want to talk about a good way to ensure search engines know what your site is about.

Of course, the search engines have invested a lot of money in working that out for themselves. They scan the text on your site and processes it to extract the meaning. But there are various ways you can make it easier for them. And they like sites that make their lives easier.

One of the most powerful ways to achieve this is to add structured data to your site. That means adding extra mark-up to your web pages which explains what the page is about. On the Schema.org website, you can find dozens of “things” the you can describe in structured data – for example, here is the definition of the Person entity. Each entity has a number of (largely optional) properties which can be included in structured data about an object of that type. Each property can be a string or another structured data entity. Additionally, entities are arranged in a hierarchy, so one entity can be based on another, more generic, entity. A Person, for example, inherits all of the properties of a Thing (which is the most generic type of entity). This is a lot like inheritance in Object-Oriented Programming.

Perhaps most usefully, the definition of each entity type ends with some examples of how structured data about an entity of that type could be added to an HTML document. The examples cover three formats:

  1. Microdata. This involved adding a lot of new attributes to various elements in the HTML (you might also add more <span> and <div> elements in order to have a place to put these attributes. These attributes have names like “itemscope”, “itemtype” and “itemprop”.
  2. RDFa. This looks a lot like microdata, but the attributes have different names – “vocab”, “typeof” and “property”.
  3. JSON-LD. This is different from the other two formats. It is not added to the existing mark-up, but it is a separate element which contains JSON defining the entities on the page.

Because it is completely separate to the existing mark-up, I find JSON-LD to be easier to work with. And for that reason, I wrote MooX::Role::JSON_LD which makes it easy to generate JSON-LD for classes that are based on Moo or Moose. Let’s look at a simple example of using it to add Person JSON-LD to a web page of a person. We’ll assume we already have a Person class that we use to provide the data on a web page about a person. It has attributes first_name, last_name and birth_date.

We start with some configuration. We load the role and define two subroutines which tell us which entity type we’re working with and which attributes we want to include in the JSON-LD. The code might look like this:

We can now use our Person class like this:

This produces the following output:

This looks pretty good. But, sadly, it’s not valid JSON-LD. In the Schema.org Person entity, the relevant properties are called “givenName”, “familyName” and “birthDate”. Obviously, if we were designing our class from scratch, we could create attributes with those names. But often we’re adding features to existing systems and we don’t have that luxury. So the role allows us to change the names of attributes before they appear in the JSON-LD. We need to look more closely at the json_ld_fields() subroutine. It defines the names of the attributes that will appear in the JSON-LD. It returns an array reference and each element of the array contains a string which is the name of an attribute. But one of these elements can also contain a hash reference. In that case, the key of the hash is the name of the property we want to appear in the JSON-LD and the value is the name of the matching attribute in our class. So we can redefine our subroutine to look like this:

And now we get the following JSON-LD:

Which is now valid.

There’s one other trick we can use. We’ve seen the Schema.org Person entity has a “firstName” and “lastName” properties which map directly onto our “first_name” and “last_name” attributes. But the Person entity inherits from the Thing entity and that has a property called “name” which might be more useful for us. So perhaps we want to combine the “first_name” and “last_name” attributes into the single JSON-LD property. We can do that by changing our json_ld_fields() subroutine again:

In this version, we’ve added the “name” as the key of a hashref and the value is an anonymous subroutine that is passed the object and returns the name by concatenating the first and last names separated by a space. We now get this JSON-LD:

Using this approach, allows us to build arbitrary JSON-LD properties from a combination of attributes from our object’s attributes.

Let’s look at a real-world example (and the reason why I was reminded of this module’s existence earlier this week.

I have a website called ReadABooker. It’s about the books that compete for the Booker Prize. Each year, a shortlist of six novels is announced and, later in the year, a winner is chosen. The winning author gets £50,000 and all of the shortlisted novels get massively increased sales. It’s a big deal in British literary circles. I created the website a few years ago. It lists all of the events (the competition goes back to 1969) and for each year, it lists all of the shortlisted novels. You can also see all of the authors who have been shortlisted and which of their shortlisted novels have won the prize. Each novel has a “Buy on Amazon” button and that link includes my associate ID – so, yes, it’s basically an attempt to make money out of people who want to buy Booker shortlisted novels.

But it’s not working. It’s not working because not enough people know about the site. So last week I decided to do a bit of SEO work on the site. And the obvious improvement was to add JSON-LD for the book and author pages.

The site itself is fully static. It gets updated twice a year – once when the shortlist is announced and then again when the winner is announced (the second update is literally setting a flag on a database row). The data about the novels is stored in an SQLite database. And there are DBIx::Class classes that allow me to access that data. So the obvious place to add the JSON-LD code is in Booker::Schema::Result::Book and Booker::Schema::Result::Person (a person can exist in the database if they have been an author, a judge or both).

The changes for the Person class were trivial. I don’t actually hold much information about the people in the database.

The changes in the Book class have one interesting piece of code:

The link between a book and its author is obviously important. But in the database, that link is simply represented by a foreign key in the book table. Having something like “author : 23” in the JSON-LD would be really unhelpful, so we take advantage of the link between the book and the author that DBIx::Class has given us and call the json_ld_data() method on the book’s author object. This method (which is added to any class that uses the role) returns the raw data structure which is later passed to a JSON encoder to produce the JSON-LD. So by calling that method inside the anonymous subroutine that creates the “author” attribute we can reuse that data in our book data.

The Person class creates JSON-LD like this:

And the Book class creates JSON-LD like this:

There were two more changes needed. We needed to get the JSON-LD actually onto the HTML pages. The site is created using the Template Toolkit and the specific templates are author.html.tt and title.html.tt. Adding the JSON-LD to these pages was as simple as adding one line to each template:

And

We haven’t mentioned the json_ld_wrapped() method yet. Let me explain the hierarchy of the three main methods that the role adds to a class.

  • json_ld_data() returns the raw Perl data structure that contains the data that will be displayed in the JSON-LD
  • json_ld() takes the value returned from json_ld_data() and encodes it into a JSON document
  • json_ld_wrapped() takes the JSON returned from json_ld() and wraps it in the <script type="application/ld+json">..</script> tag that is used to embed JSON-LD in HTML. This is the method that you usually want to call from whatever is generating your HTML

And that’s how I added JSON-LD to my website pretty easily. I now need to wait and see just how effective these changes will be. Hopefully thousands of people will be buying books through my site in the coming weeks and I can sit back and stop having to write code for a living.

It’s the dream!

How about you? Which of your websites would benefit from the addition of a few carefully-crafted pieces of JSON-LD?

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.