Building Planets with Perlanet and GitHub

Planets

Far out in the uncharted backwaters of the unfashionable end of the western spiral arm of the Galaxy lies a small unregarded yellow sun. Orbiting this at a distance of roughly ninety-two million miles is an utterly insignificant little blue green planet whose ape-descended life forms are so amazingly primitive that they still think digital watches are a pretty neat idea.

Douglas Adams – The Hitchhiker’s Guide to the Galaxy

I don’t still wear a digital watch, but I do like other things that are almost as unhip. In particular, I pine for the time about twenty years ago when web feeds looked like they were about to take over the world. Everyone had their favourite feed reader (I still miss Google Reader) and pretty much any useful web site would produce one or more web feeds that you could subscribe to and follow through your feed reader. For a few years, it was almost unthinkable to produce a web site without publishing a feed which included the changes to the site’s content.

Then, at some point, that changed. It wasn’t that web feeds vanished overnight. They still exist for many sites. But they are no longer ubiquitous. You can’t guarantee they’ll exist for every site you’re interested in. I remember people saying that social media would replace them. I was never convinced by that argument but, interestingly, one of the first times I noticed them vanishing was when Twitter removed their web feed of a user’s posts. They wanted people to use their AP instead (so I wrote twitter-json2atom that turned their API’s JSON into an Atom feed – I suspect it no longer works). Honestly, I think the main reason for the fall in popularity of web feeds was that people wanted you to read their content on their web sites where the interesting content was surrounded by uninteresting adverts.

But, as I said, not all web feeds vanished. There are still plenty of them out there (often, I expect because the sites’ owners don’t realise they’re there or don’t know how to turn them off). And that means the web feed-driven technologies of the early 2000s can still be useful.

One such piece of technology is the feed aggregator. I remember these being very popular. You would create a web site and configure it with a list of web feeds that you were interested in. The site would be driven by a piece of software that every few hours would poll the web feeds in the configuration and use the information it found to create a) a web page made up of information from the feeds and b) another feed that contained all of the information from the source feeds. The most popular software for building these sites was called Planet Planet and was written in Python (it seems to have vanished sometime in the last twenty years, otherwise I would link to it). When I wrote a Perl version, I called it (for reasons I now regret) Perlanet.

I still use Perlanet to build planet sites. And they’re all listed at The Planetarium. Recently, I’ve started hosting all my planets on GitHub Pages, using GitHub Actions to rebuild the sites periodically. I thought that maybe other people might be old-skool like me and might want to build their own planets – so in the rest of this post I’ll explain how to do that, using Planet Perl as an example.

The first thing you’ll need is a GitHub account and a repo to store the code for your planet. I’m going to assume you know how to set those up (in the interest of keeping this tutorial short). You only actually need two files to create a planet – a config file and a template for the web site.

Here’s part of the config for Planet Perl:

I’ve tried to make it self-explanatory. At the top, there are various config options for the output (the web page and the aggregated feed) and, below, are details of the feeds that you want to aggregate. Let’s look at the output options first.

  • title and description: these are both strings that you can include on the web page that is created. They’re also used in the aggregate feed that is produced
  • url: this is where the web page will be available on the web
  • author: this contains details of the person publishing the aggregated site and feed. The Twitter handle is optional
  • entries: is the maximum number of entries that your output will contain in total
  • entries_per_feed: is the maximum number of entries that you will use from each of your feeds. This is to stop your output being swamped with entries from a particularly busy feed. This can be omitted, in which case there will be no limit
  • opml_file: OPML stands for “Outline Processor Markup File”. It used to be trendy to publish an OPML file which is a machine-readable data file which contains a list of the feeds that you are aggregating. These days, no-one cares. If you omit this setting, the file won’t be created.
  • page: this contains details of the web page you create. The template is the name of a template file that is used to create the HTML page (more on that below) and file is where the output page is written. If you keep the value used in my example, then things will work well with GitHub Pages as we’ll see later
  • feed: this contains details of the aggregate feed we create. You can choose a format (Atom or RSS) and the filename. Again, the default filename will work well with GitHub Pages
  • google_ga: if this value exists, then it will be used as the Google Analytics identifier for the web page that is created
  • cutoff_duration: this is another way to control which entries are used in your output feed. Any entries that were published longer ago than this period of time will be ignored

Then we have the section of the config file that defines the feeds that we are going to aggregate. Each feed has three data items:

  • feed: the URL of the feed
  • title: a string to use to describe the feed
  • web: the URL of the feed’s original web page

And that’s all you need for the config file. Create that, put it in a file called “perlanetrc” and add it to your repo.

The other file you need is the template for the HTML page. This is usually called “index.tt”. The one I use for Planet Perl is rather complicated (there are all sorts of Javascript tricks in it). The one I use for Planet Davorg is far simpler – and should work well with the config file above. I suggest going with that initially and editing it once you’ve got everything else working.

I said those are the only two files you need. And that’s true. But the site you create will be rather ugly. My default web page uses Bootstrap for CSS, but you’ll probably want to add your own CSS to tweak the way it looks – along with, perhaps, some Javascript and some images. All of the files that you need to make your site work should be added to the /docs directory in your repo.

Having got to this stage, we can test your web site. Well, we’ll need to install Perlanet first. There are two ways to do this. You can either install it from CPAN along with all of its (many) dependencies – using “cpan Perlanet” or there’s a Docker image that you can use. Either way, once you have the software installed, running it is as simple as running “perlanet”. That will trundle along for a while and, when it has finished, you’ll find new files called “index.html” and “atom.xml” in the /docs directory. My favourite way to test the output locally is to use App::HTTPThis. Having installed this program, you can just run “http_this docs” from the repo’s main directory and then visit http://localhost:7007/index.html to see the site that was produced (or http://localhost:7007/atom.xml to see the feed.

You now have a system to build your new planet. You could run that on a server that’s connected to the internet and set up a cronjob to regenerate the file every few hours. And that’s how I used to run all of my planets. But, recently, I’ve moved to running them on GitHub Pages instead. And that’s what we’ll look at next.

There are two parts to this. We need to configure our repo to have a GitHub Pages site associated with it and we also need to configure GitHub Actions to rebuild the site every few hours. Let’s take those two in turn.

Turning on GitHub Pages is simple enough. Just go to the “Pages” section in your repo’s settings. Choose “GitHub Actions” as the deployment source and tick the box marked “Enforce HTTPS”. Later on, you can look at setting up a custom domain for your site but, for now, let’s stick with the default URL which will be https://<github_username>.github.io/<repo_name>. Nothing will appear yet, as we need to set up GitHub Actions next.

Setting up a GitHub Action workflow is as simple as adding a YAML file to the /.github/workflows directory in your repo. You’ll obviously have to create that directory first. Here’s the workflow definition for Planet Perl (it’s in a file called “buildsite.yml”, but that name isn’t important).

The first section of the file defines the events that will trigger this workflow. I have defined three triggers:

  1. Pushing a commit. I could be cleverer here and only work when certain files are changed (for example, the config or the index.tt)
  2. On a schedule. My example runs at 37 minutes past the hour every four hours (so at 04:37, 08:37, etc.)
  3. Manually. The “workflow_dispatch” trigger adds a button to the repo’s “Actions” page on GitHub allowing you to run the workflow manually, whenever you want

Following that, we define the jobs that need to be run and the steps that make up those jobs. We have two jobs – one that builds the new version of the site and one that deploys that new site to GitHub Pages. Remember how I mentioned earlier that there is a Perlanet container on the Docker Hub? Well, you’ll see that the build job runs on that container. This is because pulling a container from the Docker Hub is faster than using a standard Ubuntu container and installing Perlanet.

The steps in these jobs should be pretty self-explanatory. Basically, we check out the repo, run “perlanet” to build the site and then deploy the contents of the /docs directory to the GitHub Pages server.

Once you’ve created this file and added it to your repo, you’ll see details of this workflow on the “Actions” tab in your repo. And whenever you push a change or when a scheduled run takes place (or you press the manual run button) you’ll see logs for the run and (hopefully) your web site will update to contain the latest data.

I reckon you can get a new planet up and running in about half an hour. Oh, and if you label your repo with the topic “perlanet”, then it will automatically be added to The Planetarium.

So, what are you waiting for? What planet would you like to build?

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.