Category Archives: CPAN

More RPM Stuff

It’s been a while since I wrote anything here. If anyone is keeping score, I’ve probably failed the Iron Man challenge of posting something every ten days.

I don’t have much to add here either, but I thought some of you might be interested in a quick tweak I made to my spreadsheet of CPAN RPMs available for Fedora. It now lists all of the RPMs available across all of the repositories that I use and shows you which version of the module is available. I’ve also added the current CPAN version for all of these modules.

This gives me the information I need to do a few things that I’ve wanted to do for a while. In particular I should be able to script the automatic removal of RPMs from my repository when the official Fedora repository catches up with the version I’m carrying. I can also easily identify CPAN modules where the latest Fedora version (from any of the repositories) is lagging behind the CPAN version.
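The "lagging behind" check could be sketched roughly as follows. This is a minimal illustration in Python, not the code from my Github project; the module names, the version numbers and both helper functions are hypothetical, and it assumes plain decimal version strings such as 0.93 (dotted versions like v1.2.3 would need different handling).

```python
from decimal import Decimal


def parse_version(version):
    """Convert a decimal version string such as '0.93' into a comparable number.

    Assumption: only simple decimal versions appear. Dotted versions
    (v1.2.3) and developer releases (1.02_01) are not handled here.
    """
    return Decimal(version)


def lagging_modules(repo_versions, cpan_versions):
    """Return modules whose packaged version is older than the CPAN version."""
    lagging = {}
    for module, repo_version in repo_versions.items():
        cpan_version = cpan_versions.get(module)
        if cpan_version is None:
            continue  # no CPAN data for this module, so nothing to compare
        if parse_version(repo_version) < parse_version(cpan_version):
            lagging[module] = (repo_version, cpan_version)
    return lagging


# Hypothetical data: module name -> version string.
repo = {"Example::One": "0.88", "Example::Two": "1.10"}
cpan = {"Example::One": "0.93", "Example::Two": "1.10"}

for module, (have, latest) in sorted(lagging_modules(repo, cpan).items()):
    print(f"{module}: packaged {have}, CPAN has {latest}")
```

The same comparison run the other way round (official repository version at least as new as mine) would give the list of RPMs that I could drop from my own repository.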

As always, the code is available on Github and patches are very welcome.

Update: And here’s another spreadsheet covering CPAN RPMs available for Centos.

META.yml and Building RPMs

An email has flooded in. It was in response to my piece about Building RPMs from CPAN Distributions and it was from Andreas Koenig. Andreas runs PAUSE, which is the service CPAN authors use to upload stuff to CPAN, so he knows what he’s talking about when it comes to CPAN (and many other matters). He says this:

It’s not correct that the META.yml contains the exact list of dependencies. The META.yml is not the authoritative source for them. The reason behind is that dependencies do differ across architectures. Exceptions to this rule may declare dynamic_config=0. In order to obtain the real list of dependencies you must run your Makefile.PL or Build.PL. Recent Module::Build provides a MYMETA.yml after Build.PL has run. You could use that instead. MakeMaker always had the dependency as a comment in the Makefile.

He is, of course, right. My previous article skipped blithely over some of the more gnarly corners of this problem. I should point out that Gabor and I discussed some of these over the weekend but it’s almost certainly worthwhile going into a little more detail.

It’s true that a static META.yml file can’t deal with all of the possibilities. Here are a number of examples of areas that need to be looked at in more detail.

Environmental differences
This is the area that Andreas is talking about. And the Padre problem I mentioned on Monday is one example of this. Padre runs on several different platforms. And some dependencies will only be required on certain platforms. For example the Win32::API module is only required if the module is being installed on Windows.

But it’s not just different operating systems or architectures that cause problems like this. If you’re trying to use Plack on a server with Apache 2 installed, you’ll need Apache2::Request. If your server has Apache 1 installed you’ll need Apache::Request. In each case, you won’t need the request module for the Apache version that you aren’t using. As things stand, the META.yml for Plack doesn’t list either of the Apache request modules, but a more intelligent system could work out which one of them is required and add that one to the list of dependencies.

“Choose One” requirements
Some modules exist simply as a way of allowing the user to choose between one of a number of implementations of a feature. A good example is JSON::Any. There are (at least) three different JSON modules on CPAN (JSON, JSON::DWIW and JSON::XS). Different systems will have different ones installed. JSON::Any allows a program to use any JSON module and not care which of them is installed. But how do you model that dependency? If you make any (or all) of the supported modules a required dependency, you rather miss the whole point of the module. JSON::Any’s META.yml ignores the problem and leaves it to the Makefile.PL to work out what to do. The Fedora RPM for this module takes a weird approach and makes JSON::XS a required dependency. Even if META.yml could support this mode of working, RPM doesn’t have this feature.

Added features
Some modules have optional requirements. That is, if certain other modules are installed then the module gains more features. One example is the Template-XML distribution. Template-XML contains a plugin (Template::Plugin::XML) for the Template Toolkit. Template::Plugin::XML is a wrapper around a number of XML processing modules. If a particular module (for example, XML::DOM) is installed then Template::Plugin::XML allows the user to use XML::DOM for XML processing. It works similarly for XML::LibXML, XML::Parser, XML::RSS, XML::Simple and XML::XPath. None of them are required; different functionality is turned on for each one that is installed. You don’t have to configure Template::Plugin::XML at all to work with these modules. It just works if a particular module is installed. If, at a later date, you remove that module then Template::Plugin::XML removes the features supported by that module.

This seems to be somewhere where I have philosophical differences with the Fedora RPM packaging team. I believe that all of these modules should be seen as optional and therefore shouldn’t be listed as dependencies in the META.yml or the RPM. The Fedora team disagrees. They want each RPM to depend on all of the modules it needs in order to have as many features as possible. The Template-XML RPM therefore requires all of the XML processing modules I listed above. That seems wrong to me.

META.yml supports the concept of “recommended modules”. I think that these optional modules should be listed there. But I don’t believe that RPM has a similar feature.

So there are a few problems that I see with the META.yml approach. In the face of these issues I should probably back down slightly from my previous position that META.yml is the definitive way to get a list of dependencies. What I now believe is that parsing META.yml will give you a better position to start from than parsing the Perl code and extracting all of the “use” statements.

But I hadn’t previously heard of the MYMETA.yml that Andreas mentioned in his email. That’s certainly a way to get round the environmental differences I listed above. I don’t think it solves the other two issues though.

Are there any other corner cases that I’ve missed? Does anyone else have any opinions on building RPMs from CPAN distributions?

Useful RPM Stuff

I forgot to mention this yesterday. I’ve set up a github project (see http://github.com/davorg/rpm_stuff) where I’ll dump bits and pieces that I’m writing to make my RPM-building life easier.

The first utility I’ve uploaded there is called can_rpmbuild. You pass it an RPM spec file and it tells you which of the dependencies are available from the RPM repositories that you use and which of them you’ll have to build yourself.
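To give a feel for the idea, here is a rough sketch of that kind of check. It is written in Python for illustration and is not how can_rpmbuild itself is implemented: the spec text, the `AVAILABLE` set and both function names are hypothetical, the set stands in for a real query against your repositories, and it assumes one requirement per Requires or BuildRequires line.

```python
import re

# Matches 'Requires:' and 'BuildRequires:' lines and captures the first token,
# which drops any trailing version constraint such as '>= 0.93'.
REQUIRES_RE = re.compile(r"\s*(?:Build)?Requires:\s*(\S+)")


def spec_requirements(spec_text):
    """Collect the requirement names from a spec file's text."""
    names = []
    for line in spec_text.splitlines():
        match = REQUIRES_RE.match(line)
        if match:
            names.append(match.group(1))
    return names


def split_by_availability(spec_text, available):
    """Split requirements into those already packaged and those still missing."""
    have, missing = [], []
    for name in spec_requirements(spec_text):
        (have if name in available else missing).append(name)
    return have, missing


# Hypothetical spec fragment and repository contents.
SPEC = """\
Name:           perl-Example
BuildRequires:  perl(Test::More)
Requires:       perl(Moose) >= 0.93
Requires:       perl(TryCatch)
"""
AVAILABLE = {"perl(Test::More)", "perl(Moose)"}

have, missing = split_by_availability(SPEC, AVAILABLE)
print("available:", have)
print("build yourself:", missing)
```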

I’m sure it can be improved. Patches welcome of course. And please let me know if you find it useful.

Building RPMs from CPAN Distributions

Regular readers will know that in the past I’ve shown some interest in building RPMs from CPAN distributions. It’s been a while since I did much work in this area (although I do still release the occasional module to my RPM repository).

Over the weekend I was at FOSDEM and I attended Gabor’s talk on packaging CPAN modules for Linux distributions. This has rekindled my interest in this area and I spent most of the train journey back from Brussels hacking around the area.

There’s one thing that has been bothering me in particular recently. The standard RPM building mechanism (or, at least, the way it’s configured in Fedora and Centos) does something incredibly brain-dead when trying to work out what other modules the current module depends on. It does it by parsing the source code and looking for “use” statements. This means that a module that might only be used in really obscure cases is going to be listed as a mandatory requirement for your module.
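To make the problem concrete, here is a deliberately naive scanner of the kind described above. It is a Python sketch, not the script that RPM actually runs; the distribution layout, the `Example` module names and the short pragma list are all made up for illustration.

```python
import re

# Matches 'use Module::Name' at the start of a line. 'use 5.010;' is skipped
# because the captured name must start with a letter or underscore.
USE_RE = re.compile(r"^\s*use\s+([A-Za-z_][\w:]*)", re.MULTILINE)

# A few common pragmas that are not separate dependencies (not a complete list).
PRAGMAS = {"strict", "warnings", "utf8", "lib", "base", "parent", "constant"}


def scan_distribution(files):
    """Return every module named in a 'use' statement in any file."""
    found = set()
    for source in files.values():
        for name in USE_RE.findall(source):
            if name not in PRAGMAS:
                found.add(name)
    return sorted(found)


# Hypothetical distribution: the second file is only ever loaded on Apache 1.
FILES = {
    "lib/Example.pm": (
        "package Example;\nuse strict;\nuse warnings;\nuse File::Spec;\n1;\n"
    ),
    "lib/Example/Handler/Apache1.pm": (
        "package Example::Handler::Apache1;\nuse strict;\n"
        "use Apache::Request;\n1;\n"
    ),
}

print(scan_distribution(FILES))
```

The scanner has no way of knowing that the Apache 1 handler will never be loaded on most systems, so Apache::Request ends up in the list alongside the modules that really are needed everywhere.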

Gabor and I actually saw an example of this over the weekend when the Fedora packaging team raised a bug against Padre because it requires Win32::API. Padre, of course, only uses Win32::API when being used on Windows. And for that reason Win32::API is not listed as a dependency in its META.yml.

And that’s, of course, where the RPM builders should be going to get a list of dependencies. META.yml contains the list of other modules that the author wants the module to depend on. This should be seen as the definitive list. Of course, there might be errors in that list – but that should be addressed by raising a bug against the module.

I’ve poked at this problem a few times, trying to work out how the RPM system parses the code and trying to replace that with code that looks at META.yml instead. But the RPM system uses a baroque system of interdependent macros and eventually they all lead to a piece of rather clunky Perl code. So each time I’ve approached this problem, I’ve backed off again.

The problem became more urgent when I wanted to package Plack for Fedora. Plack supports all sorts of hosting environments and therefore includes “use” statements loading a number of modules that most people will never use. Fedora includes Apache2, so Apache::Request (which is for Apache1) will never be available. It’s not listed in META.yml, but it is used by one of the modules. The RPM build system was therefore insisting that it should be present. An impasse was reached.

Then I decided to turn the problem on its head. RPM building has two steps. You create a spec file for the RPM and then you build the RPM using the spec file and your original tarball. I started wondering if I could ensure that the spec had all of the requirements (from the META.yml). Once I’d done that I would only need to find some way to turn off the RPM build system’s default behaviour.

People packaging CPAN modules for Fedora (and Centos) use a program called ‘cpanspec’ to generate spec files. I started digging into the code there in order to find out how to insert the list of correct dependencies.

Only to find that it has already been done. cpanspec is already doing the right thing and generating a list of ‘Requires’ statements from the data in META.yml.

Then all I needed to do was to see if I could turn off the (broken) default RPM build behaviour which was adding spurious extra dependencies. That proved to be easy too. It’s just a case of adding %__perl_requires %{nil} to your .rpmmacros file.

So now all of my RPMs will have only the correct dependencies listed. This makes me very happy.

I suppose I should go back and rebuild all of the older ones too.

Oh, and because I’ve worked out a really easy way to generate this – here’s a spreadsheet listing which CPAN modules are available as RPMs for Fedora. I plan to keep this list up to date (and make it much longer). [Link now fixed]

p.s. More about my trip to FOSDEM and the Perl marketing push there over the next couple of days.

Important SpamAssassin Update

Do you run SpamAssassin to remove spam from your mailbox? Have you noticed a fall in the amount of mail you’ve received in the last couple of days? I mean all mail, not just spam.

You might well be falling foul of this problem. The rules in the latest version of SpamAssassin available from CPAN include this rule.

##{ FH_DATE_PAST_20XX
header   FH_DATE_PAST_20XX      Date =~ /20[1-9][0-9]/ [if-unset: 2006]
describe FH_DATE_PAST_20XX      The date is grossly in the future.
##} FH_DATE_PAST_20XX

And here are the scores for that rule:

score FH_DATE_PAST_20XX 2.075 3.384 3.554 3.188 # n=2

Basically, any mail with a date header greater than 2009 is given a score of about 3. Depending on your local configuration and other rules that the mail might trigger, that might well increase the chance of legitimate mail being marked as spam.
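If you want to convince yourself, the pattern from the rule is easy to try out. This is just a quick Python check of the regular expression itself, not a reproduction of how SpamAssassin evaluates header rules; the `rule_fires` helper and the two sample Date headers are made up for the test.

```python
import re

# The pattern from the FH_DATE_PAST_20XX rule quoted above.
DATE_PAST_20XX = re.compile(r"20[1-9][0-9]")


def rule_fires(date_header):
    """Return True if the pattern matches anywhere in the Date header."""
    return DATE_PAST_20XX.search(date_header) is not None


# Two sample headers, one on either side of the new year.
for header in (
    "Thu, 31 Dec 2009 23:59:00 +0000",
    "Fri, 01 Jan 2010 00:01:00 +0000",
):
    print(header, "->", "matches" if rule_fires(header) else "no match")
```

The 2009 date slips through because the third digit has to be between 1 and 9, whereas anything dated 2010 or later matches.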

So have a look in your spam folder. You might find some good stuff in there.

Of course, I might be the only person who has been caught by this. It’s possible that you all knew of the existence of sa-update and you’ve all got it running in a cronjob so you’re using the latest rules and not the eighteen-month-old set that I have. I’ll be setting that up today.

This has been a public service announcement for people who are as stupid as me.

Perlanet Improvements

I can be a bit of a lazy open source author at times. I love it when someone improves my code and then just emails me patches. I love it even more now that I’m using Github so that people can just fork my code and send me pull requests.

That happened over the weekend. I got a pull request from cycles saying that he’d been refactoring Perlanet and asking if I’d be interested in merging his changes. These refactorings had come out of a mini-hackathon that had been run by the North West England Perl Mongers. I knew that they had been interested in using Perlanet to power the Iron Man aggregator, but that the monolithic nature of the code made it hard for them to subclass the bits that they needed to change.

So what cycles has done is to take my code and break it down into a number of smaller functions (and a couple more classes) so that people can subclass it and override methods where they want different behaviour to the defaults that I use. I can see this leading to a little ecosystem of Perlanet subclasses where people release their favourite tweaks to CPAN. I see this as a good thing.

I liked cycles’ changes so I merged them into my repository. They’re available there now if you want to look. I haven’t yet released a new version to CPAN as there are a few things I want to check out first.

Firstly, this new update seems to have increased the version of Moose required. With the version of Moose that I was running on my system (0.88), a lot of the tests were failing. Updating to 0.93 solved that problem. I need to work out exactly what the problem was and update the “requires” line in Build.PL appropriately.

Secondly, the newer version of Moose gave a deprecation warning when used with CHI. Updating to the latest version of CHI fixed that (but that, in turn, meant upgrading a few modules in the Log::Any family). This is all starting to get a bit too close to the bleeding edge of CPAN for a module that I want as many people as possible to use.

Finally, cycles had started to use TryCatch in the module. Not, of course, that I object to high quality exception handling in my code, but this is another module that isn’t yet in general use. It’s something that you won’t find in a “standard” (whatever that means) Perl installation.

I’m in the process of building RPMs of all of the missing modules (or later versions for the modules where the Fedora/Centos build is just lagging CPAN a bit). They’ll be available from rpm.mag-sol.com in the next few days.

Currently I’m leaning towards just releasing the new version and hoping that the people who want to use it will have enough enthusiasm that they won’t complain about the updated and new modules that are required. But I thought it would be interesting to ask for your opinions too.

As a user of CPAN modules, how do you decide when a module is too cutting edge for you to use? Do you just install newer versions of modules automatically when an installation asks you to? Or are you a little more careful than that? Would the constraints in this latest version of Perlanet prevent you from using it?

And as a writer of CPAN modules, how cutting edge do you allow yourself to be? Are you happy to release stuff that only works with the very latest versions of Moose or other fast-moving modules? Or do you like to ensure that your stuff is usable by people who might be a little behind the curve?

I should make it clear that I’m very grateful for the work that cycles did and I’m not disparaging his efforts at all. I’m just dithering a bit about how cutting edge I want to be.

Perlanet Update

Maybe it’s just me, but when I know that people are using my code it galvanises me into improving it. Following the discovery that people were actually using Perlanet, I’ve made quite a few releases over the last week or so. I thought people might be interested in what I’ve been doing.

Release 0.30 was a big one. I incorporated a lot of the improvements that Alex had made in his fork. Most important was probably switching to URI::Fetch instead of LWP::UserAgent. This means that we can now cache the feeds and only re-request them when they have changed.

Release 0.31 documented the caching feature. It also removed some annoying debugging output unless the program was running in a console. I also tweaked the output for failed requests and thereby introduced a nasty bug that wasn’t fixed for some days.

Release 0.32 added some better help for the ‘perlanet’ command line program and added a lot more of Alex’s fixes and improvements. I also tried running the code through both ‘perltidy’ and ‘perlcritic’ and made some changes based on their suggestions.

Release 0.33 featured a much improved test suite. It also fixed the nasty bug that I introduced in 0.31. If you were using a cache and a feed hadn’t changed from the previous request then it wasn’t processed at all. I found this bug whilst working on the test suite. Testing is good, boys and girls!

Release 0.34 went out in the last hour. Overnight I got a bug report about the caching support. I’ve been using CHI to provide caching, and because not everyone wants to use caching I had marked it as an optional dependency in Build.PL. But I was loading it in the code whether or not it was being used. So if you didn’t have it installed, Perlanet didn’t work at all. I’ve now made it truly optional – it’s only loaded if required. And if you try to use caching but don’t have CHI installed everything still works – just without a cache.
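The general pattern looks something like the sketch below. It is written in Python purely to show the shape of the idea and is not Perlanet’s code; the `Aggregator` class, the `load_optional` helper and the module names passed in are all hypothetical.

```python
import importlib


def load_optional(module_name):
    """Import a module on demand, returning None if it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


class Aggregator:
    """Toy aggregator that only touches its cache backend when asked to."""

    def __init__(self, use_cache=False, cache_module="example_cache_backend"):
        # Nothing is imported unless caching was requested, so users who
        # never cache never need the backend installed.
        self.cache = load_optional(cache_module) if use_cache else None

    def caching_enabled(self):
        return self.cache is not None


# The backend is missing: everything still works, just without a cache.
print(Aggregator(use_cache=True, cache_module="no_such_backend").caching_enabled())

# Caching was not requested: the backend is never loaded at all.
print(Aggregator(use_cache=False).caching_enabled())
```

The important part is that the import happens inside the code path that needs it, rather than unconditionally at the top of the file.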

That got me thinking. And the version currently in github applies the same principle to OPML support using XML::OPML::SimpleGen. Not everyone wants to generate an OPML file, so I shouldn’t force everyone to install that module. That will be in release 0.35 which will go out in the next couple of days. I’m also thinking of doing the same for the HTML::Tidy[1] and HTML::Scrubber support.

I still have some more of Alex’s patches to apply. But I’m considering how to make things like filter support into an optional add-on. I’ve tried to get some discussion of these features going on the Perlanet mailing list. If you’re interested in Perlanet, please subscribe to the list and get involved in the discussions.

[1] I should point out that Perlanet already has support for HTML::Tidy, but installing HTML::Tidy is a bit of a black art currently. The RT queue seems to imply that the module has been abandoned. Does anyone want to offer to take it over from Andy?

Moose or No Moose

I’ve known about Moose for some time. The first time I talked about it in a training course was at the Teach-In back in the summer of 2007. It’s been part of my training courses ever since.

But even though I was telling people about Moose in my training courses, I wasn’t really making use of it in my own code. Obviously as a freelancer, in my day-job I’m constrained to using whatever technologies my clients are using and I have yet to be paid to work on a project that is already using Moose (although I’ve suggested that a few clients switch to it).

That leaves my own code. Which is largely my CPAN modules. Until recently I hadn’t used Moose for any of them. The first one I released that used Moose was Guardian::OpenPlatform::API. That was a new module which used Moose right from the start. But I knew that eventually I’d want to go back and refactor my existing modules to use Moose wherever it was appropriate.

Having gone to YAPC::Europe this year, I was subjected to another week of people telling me face to face just how cool Moose was. So I decided that it was time to bite the bullet and start refactoring.

The next module I tried was Parse::RPM::Spec. I chose this for two reasons. Firstly, it’s very much an attribute-driven module. Apart from the initial parsing of the spec file, objects of this class exist simply to return the values of their attributes. This makes it a great match for Moose. In fact converting it to use Moose was largely a case of removing code. The second reason for choosing it was more pragmatic – as far as I know no-one is using the module so if I broke anything I wouldn’t have hordes of angry users on my back.

Impressed by how easy that conversion was, I moved on to Array::Compare. This module holds a special place in my heart as it was my very first CPAN module. I don’t think it’s a particularly useful module. Its algorithm is pretty basic and, other than the project that I originally wrote it for, I’ve never used it in any production code. I often use it to try out new techniques. I released version 2.00 on August 9th (the change of implementation seemed to justify bumping the major version number).

Yesterday I got an RT ticket asking me to stop using Moose. The ticket is pretty clear that for several uses (it specifically mentions command line usage) the extra overhead added by Moose leads to an unacceptable performance hit.

I’m not sure which way to go on this. As a developer I like using Moose. Moose makes it much easier to write object oriented code in Perl. And I think that over the next year or so more and more CPAN developers will be using Moose in their code. Already if you’re writing something reasonably complex in Perl there’s a good chance that you’ll be using a module that uses Moose. As time goes on the percentage of Perl code that relies on Moose will increase. But if it really imposes an unacceptable performance hit for smaller applications, do I really want to force developers into using Moose sooner than they want to?

I think I’m going to put my plans to move stuff to Moose on hold until I’ve thought this through a bit more. But I’d be interested in hearing other people’s opinions. If you’re a CPAN author, are you planning to move your modules to Moose? And if you’re an application developer, have you started to avoid CPAN modules which force you to use Moose?

Let me know what you think.

Update: Well, didn’t this entry start a lot of discussion? As well as the comments here, there have been a number of other posts that reference my post. Here are the ones I’ve seen (in no particular order):

I’ll add more as I come across them. Of course, the more discussion I see, the more unclear my decision becomes.

But thanks for the interest. The discussion has been fascinating.