Symbol::Approx::Sub

You can do some really stupid things with Perl if you put your mind to it. In this article, Dave Cross explains why and how he wrote one of the most pointless modules on CPAN.

This article originally appeared in issue #20 of The Perl Journal.

A version of this article also appeared in Games, Diversions & Perl Culture: Best of the Perl Journal

What is it?

Symbol::Approx::Sub is a Perl module which allows you to successfully call subroutines, even if you spell their names wrong. Using it can be as simple as adding the line use Symbol::Approx::Sub; to your scripts. Once you have done this, you never have to worry about spelling your subroutine names correctly again. For example, if a short script loads the module, defines a subroutine called foo and then calls &few, you will call the subroutine &foo, even though you have spelt it incorrectly as &few.

Why is it?

This is obviously a very stupid thing to want to do, so what made me decide to write it?

In July 2000 I was at The Perl Conference in Monterey, California and attended Mark-Jason Dominus’ “Tricks of the Wizards” tutorial. In this class, he explains a number of concepts that can take your Perl programs to a new level of complexity and elegance. The most important of these concepts are typeglobs and the AUTOLOAD function. It was the first time that I’d really tried to understand either of these concepts and, thanks to Dominus’ clear explanations, I began to understand their power.

One example that Dominus uses in this class is a demonstration of how you could use AUTOLOAD to catch misspelt function names and perhaps do something about it. He showed a slide containing code something like this:
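As a rough sketch of that idea (not Dominus’ actual slide), the following stubs out get_real_name_of_sub with a fixed lookup table. The %real_name hash and the sample subroutine foo are assumptions made purely for illustration:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the "black box": a fixed lookup table
# instead of any real fuzzy matching.
my %real_name = (few => 'foo');

sub get_real_name_of_sub {
    my ($called) = @_;
    $called =~ s/.*:://;              # strip the package qualifier
    return $real_name{$called};
}

sub foo { return "In foo" }

our $AUTOLOAD;

sub AUTOLOAD {
    return if $AUTOLOAD =~ /::DESTROY$/;    # never guess at destructors
    my $name = get_real_name_of_sub($AUTOLOAD)
        or die "Undefined subroutine $AUTOLOAD called";
    no strict 'refs';
    my $code = \&{"main::$name"};
    goto &$code;                      # jump to the real sub, keeping @_
}

print few(), "\n";                    # ends up in foo()
```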

On the following slide, he goes into some detail about what a really bad idea this would be and how it would make your code completely unmaintainable.

But it was too late. I was already thinking about how I could write the “black box” get_real_name_of_sub function and put it into a module which could be used in any Perl program.

How does it work?

During the twelve-hour flight home from Monterey, I thrashed out the implementation details.

Let’s list what needs to happen in order for this module to work:

  • When the module is loaded we need to install an AUTOLOAD function in our calling package.
  • When our AUTOLOAD is called we need to get a list of all of the subroutines in our calling package.
  • We need to compare each of those subroutine names with the one that the user called incorrectly and choose the most likely candidate.
  • We then need to call the chosen subroutine.

The key to the first two stages was the other main topic of Dominus’ talk – typeglobs.

In a Perl package, the symbol table is an object known as a stash. This object is very similar to a normal hash. The keys are the names of the various objects in the package and the values are references to the typeglobs. A typeglob is a data structure which contains references to each of the objects of that name which exist in the package. Each typeglob contains a space for a reference to one of each of Perl’s data types: scalar, array, hash, subroutine, filehandle, format and (another) typeglob. You know that in a Perl program you can have variables called $a, @a and %a and they are all completely separate. Now we can see that they all live in the same typeglob.

The first item in our list is achieved with a useful typeglob trick. You can assign values (which should be references) to the various slots in a typeglob. This has the effect of aliasing the typeglob’s name to the value that you have taken a reference to. For example, if you write *a = \@array_with_a_really_long_name; then @a will become an alias to @array_with_a_really_long_name and any changes you make to @a will actually happen to the other array. You can do this with any typeglob object (including subroutines). The two objects don’t even have to be in the same package, as the following example shows.

In this example we create a subroutine called foo in package other. We then alias that subroutine to &main::bar.  This means that within the main package, if we call bar we actually call &other::foo. This is basically the way that the Exporter module works.
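A minimal sketch of that cross-package alias, using the package and subroutine names from the description above (the return value is made up for illustration):

```perl
use strict;
use warnings;

package other {
    sub foo { return "In other::foo" }
}

# Put a reference to &other::foo into the CODE slot of *main::bar.
*main::bar = \&other::foo;

print bar(), "\n";    # calling bar in main really calls &other::foo
```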

Therefore when our module is loaded (i.e. in the import subroutine) we alias our caller’s AUTOLOAD function to one in our module. We know what our AUTOLOAD needs to do, but how do we get a list of subroutines in the calling package?
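The installation step might be sketched as follows. My::Approx is a hypothetical stand-in for the real module, and its AUTOLOAD only reports what it caught rather than doing any matching:

```perl
use strict;
use warnings;

# My::Approx is a hypothetical package name used only for this sketch.
package My::Approx {
    our $AUTOLOAD;

    sub AUTOLOAD {
        # Perl sets $AUTOLOAD in the package where this sub was
        # compiled, even when it is reached through an alias.
        return "caught call to $AUTOLOAD";
    }

    sub import {
        my $caller = caller;          # the package that loaded us
        no strict 'refs';
        *{"${caller}::AUTOLOAD"} = \&AUTOLOAD;
    }
}

# "use My::Approx;" would call this automatically for a module
# that lives in its own file.
My::Approx->import;

print few(), "\n";
```

Note that this simply overwrites any AUTOLOAD the calling package already has.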

Let’s look at a simple typeglob example. The next piece of code declares three package variables and a subroutine. We then write a simple foreach loop to print out the contents of the %main:: stash. If you run this program then between the various standard filehandles (like STDIN and STDOUT) and other standard variables (like @INC and %ENV) you’ll see a, b, c and d which are the names of our package objects.
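A minimal sketch of such a program, assuming the three variables are a scalar, an array and a hash (the types are a guess; only the names a, b, c and d come from the text):

```perl
use strict;
use warnings;

our ($a, @b, %c);                     # three package variables
sub d { return 'A subroutine' }       # and one subroutine

# %main:: is the stash (symbol table) of package main.
foreach my $name (sort keys %main::) {
    print "$name\n";
}
```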

Having established a list of the typeglobs, our next task is to work out which of them contain subroutines. For this we can use a mechanism called the *FOO{THING} syntax. In the same way that scalar names always start with a $ and array names always start with a @, when we refer to the whole typeglob, the name must start with a *. *FOO therefore refers to the typeglob called FOO (which will contain $FOO, @FOO, %FOO and &FOO). By using the *FOO{THING} syntax you can find out whether the typeglob FOO contains an object of type THING. THING can be SCALAR, ARRAY, HASH, CODE, IO, FORMAT or GLOB. We can therefore find out which of the typeglobs in our current package contain a subroutine by checking the CODE slot of each one.
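One way to sketch that check is shown below. The helper name subs_in_package is an assumption, and the typeglobs are looked up by name rather than taken straight from the stash values, since some Perl versions store things other than typeglobs in a stash:

```perl
use strict;
use warnings;

our ($a, @b, %c);
sub d { return 'A subroutine' }

# Hypothetical helper: return the names in a package whose typeglob
# has something in its CODE slot.
sub subs_in_package {
    my ($package) = @_;
    no strict 'refs';
    my @subs = sort grep { defined *{"${package}::$_"}{CODE} }
                    keys %{"${package}::"};
    return @subs;
}

print "$_\n" for subs_in_package('main');
```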

We’re now at the stage where we have installed our AUTOLOAD function and when it is called, we know that we can get a list of subroutines. Within the AUTOLOAD function, the name of the function that was called is in the $AUTOLOAD variable. All we need to do is carry out some sort of fuzzy matching on the set of function names and the misspelt function name to find the best match.

Actually, that’s not as simple as it sounds. I didn’t want to have to write my own fuzzy matching algorithm so I decided to borrow someone else’s. Perl comes with a standard library module called Text::Soundex. What this does is to convert any string passed to it to a letter and three digits. This is what I initially used to do my fuzzy matching.

The module gets the soundex value for the misspelt function and then gets the soundex values for each of the subroutines in the caller’s package. If none match then it mimics Perl’s standard “undefined subroutine called” error message. If one matches then that is the required subroutine. But what if more than one matches (which is quite possible with a relatively weak fuzzy match like soundex)?

I thought about this for a while before deciding that the only option would be to pick one at random. I really couldn’t see any other reasonable approach.
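The matching and the random choice can be sketched like this. The soundex_code function is a simplified, hand-rolled Soundex written for illustration, not the Text::Soundex implementation, and the helper names and candidate list are assumptions:

```perl
use strict;
use warnings;

# Simplified Soundex: the first letter followed by three digits.
sub soundex_code {
    my ($name) = @_;
    my $s = uc $name;
    $s =~ tr/A-Z//cd;                 # keep letters only
    return '' if $s eq '';
    my $first = substr($s, 0, 1);
    $s =~ tr/AEHIOUWYBFPVCGJKQSXZDTLMNR/00000000111122222222334556/;
    $s =~ tr/0-6//s;                  # collapse runs of the same digit
    $s = substr($s, 1);               # drop the first letter's own code
    $s =~ tr/0//d;                    # vowels and similar carry no code
    return substr($first . $s . '000', 0, 4);
}

# Hypothetical helper: candidates with the same code as the wanted name.
sub soundex_matches {
    my ($wanted, @candidates) = @_;
    my $code = soundex_code($wanted);
    return grep { soundex_code($_) eq $code } @candidates;
}

# Hypothetical helper: pick one match at random, or nothing at all.
sub choose_at_random {
    my @matches = @_;
    return unless @matches;
    return $matches[ rand @matches ];
}

my @matches = soundex_matches('few', qw(foo bar baz fee));
print "Candidates: @matches\n";
print "Chosen: ", choose_at_random(@matches), "\n";
```

Both foo and fee share a code with few here, which is exactly the situation where a random pick comes into play.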

So that’s pretty much how the original version of the module worked. I called it Sub::Approx and released it to CPAN in the summer of 2000.

People started to talk to me about the module, and one of the most common things they said was “Really interesting idea, but you should do the fuzzy matching using Some::Other::Module”. After fielding a number of these requests, I decided to do something about it.

Version 0.05 of Sub::Approx included what I called “fuzzy configurability” (or sometimes “configurable fuzziness”, depending on my mood). To implement this I had a lot of help from Leon Brocard. We decided to make the process of matching a subroutine more modular. To do this we introduced the concept of a matcher. This is a subroutine which is called with the name of a subroutine that we’re trying to match and the list of subroutines in the package. The matcher must return a list of the subroutine names which match the required name by some sort of fuzzy logic. We supplied three standard matchers which used Text::Soundex, Text::Metaphone and String::Approx. You could therefore now use Sub::Approx like this:

and matching would be carried out using Text::Metaphone instead of the standard Text::Soundex.

To make it even more flexible, we allowed you to define your own matching subroutines and use them, by passing a reference to the subroutine to Sub::Approx. This would look like this:
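How the reference is handed to Sub::Approx is not shown here, and this sketch does not guess at it. It only illustrates the matcher contract described above, with reverse_matcher as a hypothetical name:

```perl
use strict;
use warnings;

# Hypothetical matcher: receives the wanted name followed by the list
# of subroutine names, and returns the names that match.
sub reverse_matcher {
    my ($wanted, @subs) = @_;
    my $reversed = reverse $wanted;   # scalar context reverses the string
    return grep { $_ eq $reversed } @subs;
}

my @found = reverse_matcher('oof', qw(foo bar baz));
print "@found\n";
```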

In this case, if your subroutine doesn’t exist, the matcher will search for a subroutine whose name is the reverse of the subroutine you have tried to call.

One last feature was the ability to define your own chooser function. This is the function which decides what to do if more than one subroutine matches the name of the called subroutine. This function is passed a list of matching subroutine names and should return the name of the one it chooses. The default chooser still picks one at random, but you can define your own like this:
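As a sketch of the chooser contract only (first_chooser is a hypothetical name, and passing it to Sub::Approx is left out):

```perl
use strict;
use warnings;

# Hypothetical chooser: receives the matching names, returns one of them.
sub first_chooser {
    my @matches = @_;
    return $matches[0];
}

print first_chooser(qw(foo fee)), "\n";
```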

This example will always choose the first item in the list of matching subroutines.

Symbol::Approx::Sub

This was how things remained until the end of September when I gave a lightning talk on Sub::Approx at YAPC::Europe. After the talk, a number of discussions took place which changed the shape of Sub::Approx. These changes included:

  • RFC 324 was drafted. This suggested that in Perl 6, the AUTOLOAD function should be renamed to AUTOGLOB and called when any typeglob object which doesn’t exist is called. This would allow us to create Scalar::Approx, Array::Approx, etc.
  • A mailing list was set up to discuss Sub::Approx and associated matters. You can subscribe to the list at //www.astray.com/mailman/listinfo/subapprox/. [Note: Don’t look for it now. It hasn’t existed for many years.]
  • The typeglob walking code from Sub::Approx was abstracted out into a new module called GlobWalker. This was so it could be reused in Scalar::Approx and friends. Later, I discovered the Devel::Symdump module on CPAN which did much the same thing and converted to using that.
  • We realised that if we were going to produce Scalar::Approx and friends, we would be “polluting” a number of module namespaces. After some discussion on the modules and subapprox mailing lists we decided on the name Symbol::Approx::Sub. (Note: I like the fact that the new name has the word “Symbol” in it as it means that we can also call it The::Module::Formerly::Known::As::Sub::Approx.)

This, then, is the current situation. Symbol::Approx::Sub version 1.60 is currently on CPAN.

Robin Houston has started work on a Symbol::Approx::Scalar module. Variables are trickier than subroutines for two reasons.

  • There is currently no AUTOLOAD facility for variable access. Robin is getting round this by tying the scalar variables.
  • Most variables in most scripts are lexical variables rather than package variables and therefore don’t live in typeglobs. Robin (who knows more about Perl internals than I do) is therefore writing a PadWalker module which does the same for lexical variables as GlobWalker (or Devel::Symdump) does for typeglobs.

You can find early versions of these modules together with a talk that Robin gave to London.pm at //www.kitsite.com/~robin/.

Future Plans

On the mailing list, we are already planning Symbol::Approx::Sub version 2.0. Features that have been mentioned include:

  • Separating the matcher component out into two separate stages: canonisation and matching. Canonisation would take a subroutine name and return some kind of canonical version. This may include removing underscores or converting all characters to lower case. This introduces the possibility of chained canonisers, each of which carries out one transformation.
  • A plugin architecture for canonisers, matchers and choosers. This would make it easy for other people to produce their own modules which work with Symbol::Approx::Sub.
  • Trying to find a way around the problem of what happens when our calling package already defines an AUTOLOAD function.
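The first of these ideas, chained canonisers, could be sketched roughly as below. Nothing here reflects an actual version 2.0 interface. The names canonise, canonical_matches and @canonisers are all hypothetical, and the two transformations are the ones mentioned in the list:

```perl
use strict;
use warnings;

# Each canoniser carries out exactly one transformation.
my @canonisers = (
    sub { my ($name) = @_; $name =~ tr/_//d; return $name },   # drop underscores
    sub { my ($name) = @_; return lc $name },                  # lower case
);

# Run a name through every canoniser in the chain, in order.
sub canonise {
    my ($name, @chain) = @_;
    $name = $_->($name) for @chain;
    return $name;
}

# Matching then becomes a plain comparison of canonical forms.
sub canonical_matches {
    my ($wanted, $chain, @subs) = @_;
    my $target = canonise($wanted, @$chain);
    return grep { canonise($_, @$chain) eq $target } @subs;
}

print join(', ', canonical_matches('GetName', \@canonisers,
                                   qw(get_name set_name))), "\n";
```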

Through all of this, I have yet to find a real use for the module. As far as I can see, it’s simply a very good demonstration of just how far you can go with Perl in doing things that would be almost impossible in other languages.

Of course, if you think you have an interesting use for it, please let the mailing list know.