Perl for the People

A Modest Proposal for Increasing Perl Usage Using a Combination of Legal and Technological Means

Dave Cross

dave@dave.org.uk
http://www.dave.org.uk

Abstract

This paper addresses the problems of Perl advocacy. We touch on the fact that Perl usage is currently believed to be falling and discuss some of the reasons for this. We then propose a combination of legal and technological steps which together will ensure that the usage of Perl increases once more. These proposals hinge on minority discrimination legislation and Perl's ability to do almost anything with a program's source code. We survey other programming languages to discover whether they have the same features and finally look at the effects that these solutions will have on the computer industry.

The Problem

Over the last year or so there has been a rising level of concern within the Perl community that we are losing the advocacy battle. During the 1990s, Perl's use as the de facto standard for CGI programming meant that there were huge increases in Perl usage in the corporate world. These gains are now being eaten away as many large corporations are moving away from Perl to technologies like Java Servlets, PHP and even ASP.

There are a number of reasons for this. Two of the most obvious are:

There have been many sterling efforts to combat this trend. One particularly good example of this is O'Reilly's promotion of Perl Success Stories[1]. Also, when the Perl Mongers took over the pm.org domain they made it into a Perl advocacy resource[2]. The Perl Advocacy mailing list[3] is a good place to discuss these initiatives.

The Solution in Brief

This paper proposes a two-pronged attack on this problem that has the potential to dramatically increase the use of Perl across all areas of the computing industry. We suggest using a combination of legal and technological means to bring about this goal.

We will start by considering the kinds of legislation required to define new standards of accessibility that programming languages should reach in a perfect world. We will then look at technological solutions that can be applied to ensure that Perl meets all of these stringent constraints.

Legal Considerations

The legal aspect of this proposal focuses on the rights of minorities within society. One of the major trends over the last twenty-five years of legislation both in Europe and the USA has been that of increased rights for minorities. We intend to use this trend to our advantage. Two particular areas of discrimination that we will address are discrimination against disabled people and discrimination against people from other races.

Disability Discrimination

Most countries of the world now have legislation in place preventing discrimination against disabled people in a number of ways. This usually takes the form of "accessibility legislation". See, for example, the United Kingdom's Disability Discrimination Act 1995[4]. These laws guarantee disabled people equal access to jobs, to public buildings and, more recently, to web sites. In many of the more enlightened states of the US and countries of Europe, Government web sites must be as usable for visitors using a reading browser as they are for visitors with the latest version of Mozilla. This makes as much sense as ensuring that a public library has an entrance ramp for wheelchair access. For an overview of current legislation in this area, see the Web Accessibility Initiative's "Policies Relating to Web Accessibility" web page[5].

The next obvious step, surely, is to make the same provisions for access to programming languages. The disability that does the most damage in preventing people from writing software is dyslexia or "word-blindness". People suffering from this disease have difficulty in recognising written words and any text that they produce will usually be spelt in a non-standard manner.

It is very satisfying to imagine a world where dyslexics find it as easy to write programs as everyone else. Imagine the huge numbers of potentially talented programmers who are prevented from taking their rightful place in the computer industry simply because programming languages insist that keywords and other object names are spelt the same way each time they are used. [Footnote: As a side effect, this solution could go a long way to solving the skills shortage that we are currently seeing in the computer industry.]

We would therefore need to pass laws which demand that programming languages allow a certain degree of "fuzziness" when writing programs. In particular this would need to cover the names of programming "objects" (subroutines and variables) as well as language keywords. Any language that does not provide this level of functionality should be declared illegal as it would be discriminating against people with disabilities.

Racial Discrimination

In a very similar manner to the disabled persons accessibility legislation discussed above, a number of countries have laws in place which prevent people from different races from being discriminated against. Under these laws, for example, it is illegal to refuse to offer a job to a person on the basis of their race. A good example of this kind of law is the United Kingdom's "Race Relations Act", 1976[6]. In such countries there are also often laws which say that Government information must be available in languages other than the official language of the country. My local council, for example, makes all informational leaflets available in a number of Asian languages because of the large number of non-English speakers living in the area. This can also extend to web sites, which should provide the same information in a number of different languages.

Once again, it is easy to see how we can extend this concept to programming languages. Most programming languages are based in English. This clearly discriminates against people whose first language is not English. Many of these people would find programming much easier if they could program in their own chosen language, dialect or patois.

The solution to this problem is very similar to the solution to the dyslexia problem - the programming language must be able to accept spellings of subroutine names, variable names and keywords in various different languages. Again, any language which does not provide this service should be declared illegal.

Technical Considerations

We have seen that in order to be able to pass our rigorous new constraints, a programming language needs to be flexible enough to recognise various syntactical constructions when they are spelt in ways that differ from the norm. Our next task, therefore, must be to ensure that Perl has this flexibility.

We will investigate this in two stages: first the names of subroutines and variables, and then the language's own keywords.

Each of these areas will have different solutions.

Subroutine and Variable Names

Let us recap exactly what the requirement is here. When a Perl program accesses a program object (whether it is a subroutine or a variable), this access becomes a request for access to the symbol table where these objects are stored. In Perl, each package has its own symbol table, called a stash, and each stash contains references to a number of data structures called typeglobs. Each typeglob contains references to all of the objects with a certain name. For example, the typeglob called *foo contains references to the scalar $foo, the array @foo, the hash %foo, the subroutine &foo as well as the filehandle and format that are both called foo. It is possible to write code which walks a stash and returns a list of the typeglobs within it like this:

  use vars qw($a @b %c);

  sub d { print 'sub d' };

  foreach (keys %main::) {
    print "$_\n";
  }

If you run this code, you will see entries for the three variables and one subroutine that we have defined, interleaved between entries for the built-in Perl variables (e.g. $_, @INC and %ENV) and filehandles (e.g. STDIN and STDOUT).

It is further possible to interrogate a typeglob and find out which kinds of object it contains, using code like this:

  use vars qw($a @b %c);

  sub d { print "sub d" };

  while (my ($name, $glob) = each %main::) {
    print "$name contains a sub\n" if defined *$glob{CODE};
  }

This code snippet looks at all of the typeglobs in the %main:: stash and prints a message if a typeglob contains a subroutine. There is a module on the CPAN called Devel::Symdump[7] written by Andreas König which makes it very easy to carry out these kinds of manipulations.

In Perl 5, when a script calls a subroutine which doesn't exist in the relevant typeglob, the interpreter looks for a subroutine called AUTOLOAD within the current package. If this subroutine is defined then it is called and the special variable $AUTOLOAD is set to the name of the missing subroutine. This is often used for interesting tricks; the most common is probably to use it to "fake" attribute accessor methods in object oriented programming. We can, however, make use of this feature to get us some way towards our goal.
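The accessor trick is easy to illustrate. The following is a minimal sketch only: the Person class and its attributes are invented for this example, and a production class would probably want to be stricter about which names it accepts.

```perl
use strict;
use warnings;

package Person;   # hypothetical class, used only for illustration

our $AUTOLOAD;    # Perl sets this to the fully qualified name of the missing sub

sub new {
    my ($class, %args) = @_;
    return bless { %args }, $class;
}

# Called whenever a method that does not exist is invoked on a Person.
sub AUTOLOAD {
    my $self = shift;
    (my $name = $AUTOLOAD) =~ s/.*:://;    # strip the package name
    return if $name eq 'DESTROY';          # never fake the destructor
    die "No such attribute: $name\n"
        unless ref($self) && exists $self->{$name};
    $self->{$name} = shift if @_;          # act as a setter when given a value
    return $self->{$name};
}

package main;

my $p = Person->new(name => 'Larry', language => 'Perl');
print $p->name, " writes ", $p->language, "\n";
```

Neither name nor language is ever defined as a subroutine; both calls fall through to AUTOLOAD, which looks the attribute up in the object's hash.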

The CPAN module Symbol::Approx::Sub[8] provides a way to use "fuzzy matching" when calling subroutines in Perl programs. The article "Symbol::Approx::Sub (a Perl module for bad typists)"[9] describes the module in some detail so here we will only give a brief overview.

The module installs its own AUTOLOAD function into its caller's namespace. When an unknown subroutine is called, this AUTOLOAD function is called. The AUTOLOAD function uses Devel::Symdump to produce a list of subroutines in the current package and compares this list with the content of the $AUTOLOAD variable. If it finds a suitable match, it calls this subroutine instead.

The fuzzy matching that Symbol::Approx::Sub uses is implemented using a number of techniques "borrowed" from other existing modules. The default matching algorithm uses the standard Text::Soundex module, but the module also supports the use of String::Approx and Text::Metaphone. It is also possible for users to define their own matching algorithm and tell the module to use that. Version 2.0 of the module introduced a "plug-in" architecture which made it far easier to add new matching algorithms.
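The general shape of this mechanism can be sketched in a few lines. This is not the code of Symbol::Approx::Sub: it walks the stash by hand instead of using Devel::Symdump, it substitutes a plain Levenshtein edit distance for Soundex so that it needs no modules beyond the core, and the helper names _distance and _subs_in and the threshold of two edits are assumptions made purely for illustration.

```perl
use strict;
use warnings;

our $AUTOLOAD;

# Plain Levenshtein edit distance between two strings.
sub _distance {
    my ($s, $t) = @_;
    my @prev = (0 .. length($t));
    for my $i (1 .. length($s)) {
        my @cur = ($i);
        for my $j (1 .. length($t)) {
            my $cost = substr($s, $i - 1, 1) eq substr($t, $j - 1, 1) ? 0 : 1;
            my $best = $prev[$j - 1] + $cost;                       # substitution
            $best = $prev[$j] + 1    if $prev[$j] + 1    < $best;   # deletion
            $best = $cur[$j - 1] + 1 if $cur[$j - 1] + 1 < $best;   # insertion
            push @cur, $best;
        }
        @prev = @cur;
    }
    return $prev[-1];
}

# Names of all subroutines currently defined in the given package.
sub _subs_in {
    my ($pkg) = @_;
    no strict 'refs';
    return grep { defined &{"${pkg}::$_"} } keys %{"${pkg}::"};
}

# Called for any unknown subroutine: find the closest name and call that.
sub AUTOLOAD {
    my ($pkg, $wanted) = $AUTOLOAD =~ /^(.*)::(.*)$/;
    return if $wanted eq 'DESTROY';
    my ($best, $best_d);
    for my $name (sort grep { !/^_/ && $_ ne 'AUTOLOAD' } _subs_in($pkg)) {
        my $d = _distance($wanted, $name);
        ($best, $best_d) = ($name, $d) if !defined $best_d || $d < $best_d;
    }
    # The threshold of two edits is an arbitrary choice for this sketch.
    die "Undefined subroutine &$AUTOLOAD called\n"
        unless defined $best && $best_d <= 2;
    no strict 'refs';
    goto &{"${pkg}::$best"};    # call the match with the original @_
}

sub greeting { return "Hello, $_[0]" }
sub farewell { return "Goodbye, $_[0]" }

print greting('world'), "\n";    # misspelt on purpose
print farewel('world'), "\n";    # misspelt on purpose
```

Swapping _distance for a different comparison is the hand-rolled equivalent of choosing a different matching plug-in.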

It is easy to see, therefore, how the idea of Symbol::Approx::Sub takes us some way towards our goal of allowing Perl to pass our stringent new legal standards for programming languages. Certainly for subroutine calls, it would simply be a case of encoding the right kind of spelling mistakes into a Symbol::Approx::Sub matching algorithm. The author has yet to research exactly what these algorithms would need to be, but feels that a task of this nature will be trivial to the massed minds of the Perl community.

This solution, however, only gets us a part of the way to our goal. The AUTOLOAD method is only applicable to subroutine calls. Other objects that live in a typeglob do not have a similar mechanism to deal with bad spelling. This hampers us in attempts to write modules called Symbol::Approx::Array, Symbol::Approx::Hash or the more generic Symbol::Approx::Typeglob. This is an obvious omission in the design of Perl 5 and it is something that we hope to address in Perl 6. Some colleagues of the author have written a Perl 6 RFC on this topic, called "Extend AUTOLOAD functionality to AUTOGLOB"[10]. It is RFC 324 and there is currently no reason to believe that it won't be incorporated into Perl 6.

The absence of the AUTOGLOB subroutine from Perl 5 doesn't, however, completely prevent us from working on Symbol::Approx::Scalar and its siblings. Robin Houston has made some admirable progress on writing Symbol::Approx::Scalar. This is harder than the subroutine version not only because of the absence of an AUTOLOAD mechanism for variables, but also because in most Perl programs the majority of variables don't actually live in a typeglob as described earlier.

Perl, in fact, has two storage mechanisms for variables. The variables that are stored in typeglobs as described above are known as package variables. Another type of variable called a lexical variable is far more common in most Perl programs. These are variables that are introduced using the keyword my. Instead of living in a typeglob, they live in a different data structure called a pad.
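The difference is easy to observe from within a program. In this small sketch (the variable names are invented for the example) only the package variable has an entry in the stash, which is why stash-walking techniques alone cannot see lexical variables.

```perl
use strict;
use warnings;

our $package_var = 'stored in a typeglob';   # package variable
my  $lexical_var = 'stored in a pad';        # lexical variable

# Package variables have an entry in the stash; lexical variables do not.
for my $name (qw(package_var lexical_var)) {
    printf "%-12s %s\n", $name,
        exists($main::{$name}) ? 'is in %main::' : 'is not in %main::';
}

# A symbolic reference goes through the stash, so it can only ever
# reach the package variable.
{
    no strict 'refs';
    print ${"main::package_var"}, "\n";
}
```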



Robin has plans to ensure that Symbol::Approx::Scalar can use the same plug-ins as Symbol::Approx::Sub. You can see Robin's "work in progress" together with the slides of a talk that he gave to London.pm on the project at <http://www.kitsite.com/~robin/>.

So you can see that in Perl it is possible (if not necessarily easy) to deal with fuzzy matching on the names of subroutines and variables. This gets us a certain distance towards our goals, but we still have to deal with the question of language keywords. This is a completely different (and far more complex) issue.

Language Keywords

Whilst it is possible to address the misspelling of subroutine and variable names using various clever tricks with typeglobs and AUTOLOAD as discussed above, it is far harder to deal with things like the language's built-in functions and keywords. There is no standard mechanism to interrupt a call to substr or the setting up of a while loop. By looking on CPAN, however, we can start to form a plan of attack for this problem.

On CPAN you will find the Filter bundle of modules by Paul Marquess[11]. This is a bundle of modules which allows you to write source filters, modules which will filter your program's source code before it is seen by the Perl compiler. More recently, Damian Conway has added a Filter::Simple module[12] which provides a simplified interface to the original module.

An extreme version of the kind of module that can be written using source filters is Damian Conway's Lingua::Romana::Perligata[13] which translates source code in Latin into a Perl script before executing it. This is of particular interest to us because this filter does not simply translate Latin terms into Perl syntax. It also deals with the fact that Perl (like most programming languages) uses the position of a term within an expression to determine its role whilst Latin relies on a complex system of word endings to do the same thing. Conway's paper[14] on this explains this distinction in greater detail. This kind of translation is surely far more complex than anything that we will need to achieve?

This is probably not true. Remember that Perligata (like Latin) is a case-based language. You can determine the role of a term within an expression by using its ending. The set of allowed endings is completely regular and is a finite set. When dealing with the set of potential misspellings in our situation we don't have that luxury. It is very likely, for example, that people whose first language isn't English may change the order of the terms that make up a Perl statement. A good example is that native German speakers may want to move subroutine calls (the equivalent of natural language verbs) to the end of the statement.

This problem becomes less intractable if we address it in stages. For a first pass of the solution, we will assume that the various terms in a statement appear in the same place as they would do in the correctly spelt version. This then simplifies the problem by an order of magnitude or two. We simply have to match the terms in the "munged" statement with the parts of a correct Perl statement. Very little work has currently been done in this area as it is to be hoped that it will become a great deal simpler once Damian Conway releases his long-awaited Parse::Perl module. The first goal will be to produce a module called Parse::Perl::Approx. Hopefully this module will be able to use some of the same plugins as the Symbol::Approx::foo series of modules. At this stage we should be able to deal with statements like:

  four (1 .. 100) {
    pritn "$_\n";
  }

or

  wile () {
    phus @arr, substring($_, 72);
  }

The next stage will be to parse any malformed and misspelt statement into a valid Perl statement. It is felt, however, that the general solution to this problem is some way off.
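Until Parse::Perl appears, the first pass described above can at least be simulated. The sketch below makes several loudly stated assumptions: it rewrites a string and hands it to eval rather than installing a real source filter, the correction table %FIX and the function correct_source are invented for this example, and a hand-written table stands in for a proper fuzzy-matching plug-in. It also knows nothing about Perl syntax, so it would happily "correct" words inside string literals.

```perl
use strict;
use warnings;

# Hand-written table of misspellings, standing in for a real fuzzy matcher.
my %FIX = (
    four      => 'for',
    wile      => 'while',
    pritn     => 'print',
    phus      => 'push',
    substring => 'substr',
);

# Replace each bareword that has a known correction. Words preceded by a
# sigil ($, @, % or &) are variable or subroutine names and are left alone.
sub correct_source {
    my ($src) = @_;
    $src =~ s/(?<![\$\@%&\w])([A-Za-z_]\w*)\b/ exists($FIX{$1}) ? $FIX{$1} : $1 /ge;
    return $src;
}

my $munged = <<'END';
my @out;
four (1 .. 3) {
    phus @out, $_ * 2;
}
pritn "@out\n";
END

my $fixed = correct_source($munged);
print $fixed;     # show the corrected source
eval $fixed;      # then run it
die $@ if $@;
```

Because terms are corrected in place, this only works under the stated assumption that every term already sits where the correctly spelt version would have it.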

Other Languages

Having made great efforts to ensure that Perl is able to conform to the stringent new laws that we are proposing, it is only fair to survey other languages and see which of them have a comparable level of flexibility. In order to research this, the following message was posted to a number of Usenet newsgroups earlier this year:

  Newsgroups: comp.lang.c,comp.lang.c++,comp.lang.java.advocacy,
              comp.lang.python,comp.lang.ruby,comp.lang.tcl,
              microsoft.public.vb.general.discussion
  Subject: Weird Language Features
  
  [Please watch the replies on this message as it's heavily cross-posted]

  I'm doing some comparisons on programming language features and I'd be very 
  interested to know how you would handle the following scenarios in your 
  programming language of choice.

  1/ The programmer calls a function that doesn't actually exist within the 
  application (or libraries). Is the a feature whereby the programmer can 
  create a "catch-all" function which is called in cases like these? Can this 
  function examine the list of existing functions and call the most 
  appropriate one? Or  create a new function on the fly and install it into 
  the application?

  2/ Can you filter the input source code before compilation (or 
  interpretation) in some way so that language keywords could be changed 
  for other strings? Imagine you wanted to allow someone to program your 
  language of choice in, say, French. How would you go about translating 
  French keywords into ones that the compiler (or interpreter) could 
  understand. What if the translation wasn't one-to-one or fixed? Could 
  you put enough intelligence into the translator so that it could handle 
  certain strings differently depending on where they appeared in the source 
  code?


  If you're wondering why I'm inventing these bizarre scenarios, it's for a 
  paper I'm writing for this year's Perl Conference.  Perl does have these 
  features (see the AUTOLOAD function and source filters) and I'm interested 
  in seeing how widespread they are in other languages.

  Of course, if you'd like to tell me just why you consider it's a good thing 
  that your language of choice doesn't have these features, then I'd be only 
  too  happy to hear that too.

  I'd just like to make it clear that I'm not interested in getting into "my 
  language is better than your language" types of flamewars. I'm certainly not 
  trying to argue that Perl is better than other languages for having these 
  features.

  Thanks for your time.

  Dave...

The results from the various groups were very interesting and are summarised below.

C

Most people in the C newsgroup seemed to think that this was a very strange thing to want to do. There was some discussion about being able to use the C pre-processor to achieve some of these aims, but this obviously isn't anywhere near as powerful as Perl's source filters. The final verdict from the group was that if you really wanted to do these kinds of things then you would write a program that pre-processed the source code before compiling it.

C++

There was a very similar response from the C++ group, with perhaps a touch more disbelief that anyone would actually waste their time doing this.

Java

There was a distinct lack of interest in this subject in the Java group, but the couple of responses received were very similar to those from the C and C++ groups.

Python

The Python group seemed a little more interested in the subject. This is possibly because of the "old enemy" feeling towards Perl - "anything Perl can do, we can do better". I was told that if a missing function was called, Python would throw an exception which the programmer could handle in any appropriate manner - this could include searching the symbol table for an alternative function.

Ruby

The Ruby group gave possibly the best answers of all of the newsgroups. In fact one member of the group, Rob Partington, had already implemented a Ruby module called subapprox.rb[15] which simulates Symbol::Approx::Sub for Ruby programmers. Ruby has a method_missing feature which works very similarly to AUTOLOAD.

Tcl

The Tcl group was another one that suffers from some good-hearted rivalry with the Perl community. Tcl also implements an AUTOLOAD-like feature called unknown.

Visual Basic

There was no response from the Visual Basic group.

Language Summary

It seems that the languages surveyed split pretty well into two groups. The "scripting" languages (Python, Ruby and Tcl) all share a lot of the same features as Perl. On the other hand the "proper" programming languages (C, C++ and Java) don't have similar features and, perhaps more importantly, the programmers using those languages showed far less inclination to discuss these scenarios.

This probably says a lot about a) the language designers and b) the people attracted to the various languages, but exactly what it says is probably left to another paper.

It is likely therefore that when our new legal constraints come into force, a number of the other "scripting" languages will also be able to pass the tests. Perl will not have the computing industry all to itself, but the playing field will have certainly become a lot more even.

Implementation Plans

We will now take a look at exactly how we can implement these plans and bring about the proposed changes. It seems obvious that the legal changes will take longer to bring about than the technological ones, so we will look at those first.

Legal Timetable

If we can get the proposed legislation passed in the USA and Europe, then many other countries will follow. It is probably best, therefore, to concentrate our efforts in these areas.

USA

The USA currently has a Republican President and this has to be seen as giving us very little chance of getting this legislation passed during this presidential term. We should, therefore, be looking towards the 2004 presidential election as the launch pad for our campaign. We should try to ensure that whoever wins in 2004 is committed to our cause. We are at an advantage here, because the Democrats are most likely to be sympathetic to the "equal rights" part of our message and are also looking for good reasons for people to vote them back into power next time. Here are a couple of suggestions of actions that Perl Mongers can be taking in the meantime.

If we follow these simple suggestions then there is a good chance that in 2005 you will inaugurate a President who is aware of the issues and is willing to act on them.

Europe

It might seem a tricky task to sway the governments of the fifteen member states of the European Union, but in reality that is not necessary. France, Germany and the UK are the most powerful states within the Union and once those governments are on board, it will be simple to force the others to follow. I will concentrate here on the UK. French and German Perl Mongers may like to adapt these plans to their own circumstances.

As this paper is being written, the UK is on the verge of a general election, which will have taken place by the time the paper is presented. It seems very likely that the Labour Party will have won - albeit with a reduced majority. British Perl Mongers therefore need to address themselves to the next general election in 2004/5. Like the US, however, we can already be doing some useful groundwork.

If we stick to this plan, then in four or five years' time we will have governments in place in both the US and Europe who are sympathetic to our cause. This is, of course, only half the battle, as we then need to force them to actually enact the laws we need. We should aim to get the laws passed as early as possible in the term of the government and must be prepared to lobby strongly for this to happen. Remember that once one or two countries change their laws then others will follow.

According to this schedule, then, we hope to have the required legal framework in place by 2005 or 2006. This gives us plenty of time to look at what needs to be done from the technical point of view.

Technical Timetable

It is harder to predict the technical timetable as in a lot of cases we are dependent on the development of Perl 6. Most of the things we need to do can be achieved with current versions of Perl, but many of them become far easier with Perl 6 - as long as certain RFCs are accepted.

There is already a mailing list set up for the discussion of Symbol::Approx::Sub and associated topics[16] and this list will be a useful central co-ordination point for all of our efforts.

The first task is to write plug-ins for Symbol::Approx::Sub which support the kind of fuzzy matching that is required. Research will be required into the kinds of mistakes that the minorities in question are liable to make when writing English. These rules will then be coded up in Perl. As mentioned before, the plug-in architecture aims to ensure that the same plug-ins can be used for Symbol::Approx::Sub, the other Symbol::Approx::foo modules and even Parse::Perl::Approx.

Having proved that these plug-ins work using Symbol::Approx::Sub as a test bed we will be ready to try them in the other Symbol::Approx::foo modules. By this time we will know more about the shape of Perl 6 and it will be clear whether or not RFC 324 will be included. If it is, then writing the other Symbol::Approx::foo modules becomes almost trivial as they all inherit most of their behaviour from the original module. If not, then we will need to refine and enhance the methods that Robin Houston is already devising for Symbol::Approx::Scalar. It is difficult to see this stage of the process going on beyond the end of 2002. At this stage we will be able to handle all subroutine and variable names - this, in itself, will be a useful achievement.

Our next major task is the development of the Parse::Perl::Approx module. As discussed above, this will be a subclass of Parse::Perl. As I write, Parse::Perl is still in the planning stages. Damian Conway is fond of saying that his modules will be "ready by Christmas", but planning can be a little difficult as he is loath to tie it down to any particular Christmas.

Nevertheless, we shall assume that by the end of 2002, we will have some version of Parse::Perl which we can then pull apart and recast as Parse::Perl::Approx. Don't assume that this will be a trivial task - it could well be the end of 2003 before we have something which handles all possibilities.

This still leaves us with a year or two before the legal timetable puts our legislation in place. How will we spend this time? Testing and debugging our new modules with the help of as large a part of the Perl community as possible. It may seem anathema to the careful programmers of the Perl community, but we will be begging people to write programs with as many typos in as possible. We will want to know that our modules work under every possible circumstance. Perl 6 may well be available by this point so we'll be able to test our modules under both Perl 5 and Perl 6. One rather extreme test might be to see how well other modules cope with a Perl 6 program under Perl 5 and vice versa.

The deadline for this project is, however, determined by the legal considerations discussed above. By the time of the US and UK elections in 2004 or 2005 we need to be in a position to deal with the forthcoming new laws.

The Future - A World Without Java

It's worth stopping a while to imagine what our brave new world will be like. Many programming languages will vanish completely. Of course this change won't happen overnight. There will need to be a changeover period whilst all the existing C, COBOL and Visual Basic programs are converted to Perl. During this time, Perl programmers will be as popular as COBOL programmers were at the end of 1999. Our market value will rocket and we'll all be able to gold plate our cats. Of course converting all of these programs to Perl won't be a massive job. I'd estimate that it will take about two weeks.

Further ahead, once the changeover has been completed, the real benefits of our legislation will be seen.

Acknowledgements

A number of people have been involved in discussions which led to the ideas in this paper. I'd particularly like to thank:

References

  1. Perl Success Stories <http://perl.oreilly.com/news/success_stories.html>
  2. Perl Mongers Advocacy Resource Site <http://www.pm.org>
  3. Perl Advocacy Mailing List <http://lists.perl.org/showlist.cgi?name=advocacy>
  4. UK Disability Discrimination Act 1995 <http://www.hmso.gov.uk/acts/acts1995/1995050.htm>
  5. "Policies Relating to Web Accessibility", Web Accessibility Initiative, <http://www.w3.org/WAI/References/Policy>
  6. UK Race Relations Act 1976 <http://www.homeoffice.gov.uk/racerel1.htm>
  7. König, Andreas, Devel::Symdump <http://search.cpan.org/search?mode=module&query=Devel%3A%3ASymdump>
  8. Cross, David, Symbol::Approx::Sub <http://search.cpan.org/search?mode=module&query=Symbol%3A%3AApprox%3A%3ASub>
  9. Cross, David, Symbol::Approx::Sub (a Perl module for bad typists) <http://www.mag-sol.com/Articles/s_a_s.html>
  10. Cantrell, David & Adler, David, RFC 324 "Extend AUTOLOAD functionality to AUTOGLOB" <http://dev.perl.org/rfc/324.html>
  11. Marquess, Paul, Filter <http://search.cpan.org/search?dist=Filter>
  12. Conway, Damian, Filter::Simple <http://search.cpan.org/search?dist=Filter-Simple>
  13. Conway, Damian, Lingua::Romana::Perligata <http://search.cpan.org/search?mode=module&query=Lingua%3A%3ARomana%3A%3APerligata>
  14. Conway, Damian: "Lingua::Romana::Perligata - Perl for the XXI-imus Century", in "Proceedings of the Perl Conference 4.0", O'Reilly, 2000.
  15. Partington, Rob, subapprox.rb <http://frottage.org/rjp/ruby/subapprox.html>
  16. Symbol::Approx::Sub mailing list <http://www.astray.com/mailman/listinfo/subapprox/>