Adding structured data with Perl

If you have a website, then it’s very likely that you would like as many people as possible to see it. One of the best tools for achieving that is to ensure that your site is returned close to the top of as many search results pages as possible.

In order to do that, you really have two targets:

Ensure the search engines know what your website is about
Ensure the search engines think your website is an important source of information on the topics it covers

The second item on the list is mostly about getting other websites on the same topic to link to you – and it is outside the scope of this post. In this post, I want to talk about a good way to ensure search engines know what your site is about.

Of course, the search engines have invested a lot of money in working that out for themselves. They scan the text on your site and process it to extract the meaning. But there are various ways you can make it easier for them. And they like sites that make their lives easier.

One of the most powerful ways to achieve this is to add structured data to your site. That means adding extra mark-up to your web pages which explains what the page is about. On the Schema.org website, you can find dozens of “things” that you can describe in structured data – for example, here is the definition of the Person entity. Each entity has a number of (largely optional) properties which can be included in structured data about an object of that type. Each property can be a string or another structured data entity. Additionally, entities are arranged in a hierarchy, so one entity can be based on another, more generic, entity. A Person, for example, inherits all of the properties of a Thing (which is the most generic type of entity). This is a lot like inheritance in Object-Oriented Programming.

Perhaps most usefully, the definition of each entity type ends with some examples of how structured data about an entity of that type could be added to an HTML document. The examples cover three formats:

Microdata. This involves adding a lot of new attributes to various elements in the HTML (you might also add more <span> and <div> elements in order to have a place to put these attributes). These attributes have names like “itemscope”, “itemtype” and “itemprop”.
RDFa. This looks a lot like microdata, but the attributes have different names – “vocab”, “typeof” and “property”.
JSON-LD. This is different from the other two formats. It is not added to the existing mark-up, but it is a separate element which contains JSON defining the entities on the page.

Because it is completely separate to the existing mark-up, I find JSON-LD to be easier to work with. And for that reason, I wrote MooX::Role::JSON_LD which makes it easy to generate JSON-LD for classes that are based on Moo or Moose. Let’s look at a simple example of using it to add Person JSON-LD to a web page of a person. We’ll assume we already have a Person class that we use to provide the data on a web page about a person. It has attributes first_name, last_name and birth_date.

We start with some configuration. We load the role and define two subroutines which tell us which entity type we’re working with and which attributes we want to include in the JSON-LD. The code might look like this:

with 'MooX::Role::JSON_LD';

sub json_ld_type { 'Person' };

sub json_ld_fields { [ qw[ first_name last_name birth_date ] ] };

We can now use our Person class like this:

use feature 'say';
use Person;

my $bowie = Person->new({
  first_name => 'David',
  last_name  => 'Bowie',
  birth_date => '1947-01-08',
});

say $bowie->json_ld;

This produces the following output:

{
   "@context" : "http://schema.org/",
   "@type" : "Person",
   "first_name" : "David",
   "last_name" : "Bowie",
   "birth_date" : "1947-01-08"
}

This looks pretty good. But, sadly, it’s not valid Schema.org JSON-LD. In the Schema.org Person entity, the relevant properties are called “givenName”, “familyName” and “birthDate”. Obviously, if we were designing our class from scratch, we could create attributes with those names. But often we’re adding features to existing systems and we don’t have that luxury. So the role allows us to change the names of attributes before they appear in the JSON-LD.

We need to look more closely at the json_ld_fields() subroutine. It defines the names of the attributes that will appear in the JSON-LD. It returns an array reference and, in the simplest case, each element of the array is a string which is the name of an attribute. But any element can also be a hash reference. In that case, the key of the hash is the name of the property we want to appear in the JSON-LD and the value is the name of the matching attribute in our class. So we can redefine our subroutine to look like this:

sub json_ld_fields {
    [
      { givenName  => 'first_name' },
      { familyName => 'last_name' },
      { birthDate  => 'birth_date' },
    ]
}

And now we get the following JSON-LD:

{
  "@context" : "http://schema.org/",
  "@type" : "Person",
  "givenName" : "David",
  "familyName" : "Bowie",
  "birthDate" : "1947-01-08"
}

Which is now valid.

There’s one other trick we can use. We’ve seen the Schema.org Person entity has “givenName” and “familyName” properties which map directly onto our “first_name” and “last_name” attributes. But the Person entity inherits from the Thing entity and that has a property called “name” which might be more useful for us. So perhaps we want to combine the “first_name” and “last_name” attributes into a single JSON-LD property. We can do that by changing our json_ld_fields() subroutine again:

sub json_ld_fields {
    [
      { birthDate => 'birth_date' },
      { name => sub { $_[0]->first_name . ' ' . $_[0]->last_name } },
    ]
}

In this version, we’ve added the “name” as the key of a hashref and the value is an anonymous subroutine that is passed the object and returns the name by concatenating the first and last names separated by a space. We now get this JSON-LD:

{
  "@context" : "http://schema.org/",
  "@type" : "Person",
  "birthDate" : "1947-01-08",
  "name" : "David Bowie"
}

This approach allows us to build arbitrary JSON-LD properties from any combination of our object’s attributes.
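
To make the three kinds of field definition concrete, here is a minimal sketch of how a list like the one returned by json_ld_fields() could be turned into a data structure. It is only an illustration using core Perl, not the role's actual implementation: the My::Person class and the resolve_fields() helper are hypothetical names invented for this example.

```perl
use strict;
use warnings;
use feature 'say';
use JSON::PP;

# Hypothetical stand-in for a Moo class; not part of MooX::Role::JSON_LD.
package My::Person {
    sub new        { my ($class, %args) = @_; return bless {%args}, $class }
    sub first_name { $_[0]{first_name} }
    sub last_name  { $_[0]{last_name} }
    sub birth_date { $_[0]{birth_date} }
}

# Illustrative helper (an assumption, not the role's real code): resolve a
# list of field definitions against an object.
sub resolve_fields {
    my ($obj, $fields) = @_;
    my %data;
    for my $field (@$fields) {
        if (ref($field) eq 'HASH') {
            for my $property (sort keys %$field) {
                my $source = $field->{$property};
                $data{$property} = ref($source) eq 'CODE'
                    ? $source->($obj)   # value computed by a subroutine
                    : $obj->$source;    # attribute published under a new name
            }
        }
        else {
            $data{$field} = $obj->$field;   # attribute name used as-is
        }
    }
    return \%data;
}

my $bowie = My::Person->new(
    first_name => 'David',
    last_name  => 'Bowie',
    birth_date => '1947-01-08',
);

my $data = resolve_fields($bowie, [
    { birthDate => 'birth_date' },
    { name => sub { $_[0]->first_name . ' ' . $_[0]->last_name } },
]);

say JSON::PP->new->canonical->pretty->encode($data);
```

A plain string copies an attribute under its own name, a hashref with a string value renames it, and a hashref with a subroutine computes the value from the object.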

Let’s look at a real-world example (and the reason why I was reminded of this module’s existence earlier this week).

I have a website called ReadABooker. It’s about the books that compete for the Booker Prize. Each year, a shortlist of six novels is announced and, later in the year, a winner is chosen. The winning author gets £50,000 and all of the shortlisted novels get massively increased sales. It’s a big deal in British literary circles. I created the website a few years ago. It lists all of the events (the competition goes back to 1969) and for each year, it lists all of the shortlisted novels. You can also see all of the authors who have been shortlisted and which of their shortlisted novels have won the prize. Each novel has a “Buy on Amazon” button and that link includes my associate ID – so, yes, it’s basically an attempt to make money out of people who want to buy Booker shortlisted novels.

But it’s not working. It’s not working because not enough people know about the site. So last week I decided to do a bit of SEO work on the site. And the obvious improvement was to add JSON-LD for the book and author pages.

The site itself is fully static. It gets updated twice a year – once when the shortlist is announced and then again when the winner is announced (the second update is literally setting a flag on a database row). The data about the novels is stored in an SQLite database. And there are DBIx::Class classes that allow me to access that data. So the obvious place to add the JSON-LD code is in Booker::Schema::Result::Book and Booker::Schema::Result::Person (a person can exist in the database if they have been an author, a judge or both).

The changes for the Person class were trivial. I don’t actually hold much information about the people in the database.

with 'MooX::Role::JSON_LD';

sub json_ld_type { 'Person' }

sub json_ld_fields {
  [
    qw/name/,
  ];
}

The changes in the Book class have one interesting piece of code:

with 'MooX::Role::JSON_LD';

sub json_ld_type { 'Book' }

sub json_ld_fields {
  [
    { name => 'title' },
    { author => sub { $_[0]->author->json_ld_data } },
    { isbn => 'asin' },
  ];
}

The link between a book and its author is obviously important. But in the database, that link is simply represented by a foreign key in the book table. Having something like “author : 23” in the JSON-LD would be really unhelpful, so we take advantage of the link between the book and the author that DBIx::Class has given us and call the json_ld_data() method on the book’s author object. This method (which is added to any class that uses the role) returns the raw data structure which is later passed to a JSON encoder to produce the JSON-LD. So by calling that method inside the anonymous subroutine that creates the “author” property we can reuse that data in our book data.

The Person class creates JSON-LD like this:

{
   "@context" : "http://schema.org/",
   "@type" : "Person",
   "name" : "Theresa Mary Anne Smith"
}

And the Book class creates JSON-LD like this:

{
   "@context" : "http://schema.org/",
   "@type" : "Book",
   "author" : {
      "@context" : "http://schema.org/",
      "@type" : "Person",
      "name" : "Theresa Mary Anne Smith"
   },
   "isbn" : "B086PB2X8F",
   "name" : "Office Novice"
}
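
It is worth seeing why the nested value has to be the raw data structure rather than the already-encoded JSON. The sketch below uses only the core JSON::PP module and hand-built hashes (it does not use the role or the real database classes) to compare the two. The variable names are just for illustration.

```perl
use strict;
use warnings;
use feature 'say';
use JSON::PP;

my $json = JSON::PP->new->canonical;

# Stand-in for what the author's json_ld_data() is described as returning.
my $author_data = {
    '@type' => 'Person',
    name    => 'Theresa Mary Anne Smith',
};

# Nesting the data structure gives a nested JSON object.
my $nested = $json->encode({
    '@type' => 'Book',
    name    => 'Office Novice',
    author  => $author_data,
});

# Nesting already-encoded JSON gives a string full of escaped quotes instead.
my $double_encoded = $json->encode({
    '@type' => 'Book',
    name    => 'Office Novice',
    author  => $json->encode($author_data),
});

say $nested;
say $double_encoded;
```

In the second version the author is a JSON string rather than an object, so a consumer would not see a Person entity at all.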

There were two more changes needed. We needed to get the JSON-LD actually onto the HTML pages. The site is created using the Template Toolkit and the specific templates are author.html.tt and title.html.tt. Adding the JSON-LD to these pages was as simple as adding one line to each template:

[% author.json_ld_wrapped -%]

And

[% book.json_ld_wrapped -%]

We haven’t mentioned the json_ld_wrapped() method yet. Let me explain the hierarchy of the three main methods that the role adds to a class.

json_ld_data() returns the raw Perl data structure that contains the data that will be displayed in the JSON-LD
json_ld() takes the value returned from json_ld_data() and encodes it into a JSON document
json_ld_wrapped() takes the JSON returned from json_ld() and wraps it in the <script type="application/ld+json">..</script> tag that is used to embed JSON-LD in HTML. This is the method that you usually want to call from whatever is generating your HTML
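
The three layers can be sketched with nothing more than the core JSON::PP module. This is a hand-rolled illustration of the idea, not the role's own code: person_data(), person_json() and person_wrapped() are hypothetical names that stand in for json_ld_data(), json_ld() and json_ld_wrapped() respectively.

```perl
use strict;
use warnings;
use feature 'say';
use JSON::PP;

# Layer 1: a plain Perl data structure.
sub person_data {
    return {
        '@context' => 'http://schema.org/',
        '@type'    => 'Person',
        name       => 'David Bowie',
    };
}

# Layer 2: the same data encoded as a JSON document.
sub person_json {
    return JSON::PP->new->canonical->pretty->encode(person_data());
}

# Layer 3: the JSON wrapped in the script element used to embed JSON-LD in HTML.
sub person_wrapped {
    return qq{<script type="application/ld+json">\n}
         . person_json()
         . qq{</script>};
}

say person_wrapped();
```

Keeping the layers separate means the data can be reused inside another entity, the JSON can be served on its own, and the wrapped version can be dropped straight into a template.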

And that’s how I added JSON-LD to my website pretty easily. I now need to wait and see just how effective these changes will be. Hopefully thousands of people will be buying books through my site in the coming weeks and I can sit back and stop having to write code for a living.

It’s the dream!

How about you? Which of your websites would benefit from the addition of a few carefully-crafted pieces of JSON-LD?

2 thoughts on “Adding structured data with Perl”

Kees Reuzelaar says:

13 January, 2025 at 10:37

Please write an update with the results of this change. Did it work to get more visitors?

Dave Cross says:

28 January, 2025 at 16:02

  I definitely will. But I suspect it’s going to be a long slog before I have any success. There are a lot of companies who have a lot more resources than I do and who invest a lot of money in SEO around searches for books. Imagine how seriously a company like Amazon takes this?

Perl Hacks

Just another Perl Hacker's blog
