Perl’s Special Variables

In this beginner level article, Dave Cross looks at some of Perl’s internal variables.

This article was originally the lead article on perl.com in June 2004.

One of the best ways to make your Perl code look more like … well, like Perl code – and not like C or BASIC or whatever you used before you were introduced to Perl – is to get to know the internal variables that Perl uses to control various aspects of your program’s execution.

In this article we’ll take a look at a number of variables that give you finer control over your file input and output.

Counting Lines

I decided to write this article because I am constantly amazed by the number of people who don’t know about the existence of $.. I still see people producing code that looks like this:
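A minimal sketch of that pattern, assuming a hand-kept counter called $line_no (a hypothetical name) and an in-memory filehandle standing in for a real file:

```perl
use strict;
use warnings;

# Hypothetical sample data; an in-memory filehandle stands in for a real file.
my $data = "alpha\nbeta\ngamma\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my $line_no = 0;    # counter maintained by hand
my @out;
while (my $line = <$fh>) {
    $line_no++;
    chomp $line;
    push @out, "$line_no: $line";
}
close $fh;

print "$_\n" for @out;
```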

For some reason, many people seem to completely miss the existence of $., which is Perl’s internal variable that keeps track of your current record number. The code above can be rewritten as:
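A minimal sketch of the same kind of loop with $. doing the counting. The sample data and the @numbered array are assumptions made for illustration:

```perl
use strict;
use warnings;

# Hypothetical sample data; an in-memory filehandle stands in for a real file.
my $data = "alpha\nbeta\ngamma\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my @numbered;
while (my $line = <$fh>) {
    chomp $line;
    # $. holds the number of the record most recently read.
    push @numbered, sprintf '%d: %s', $., $line;
}
close $fh;

print "$_\n" for @numbered;
```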

I know that it doesn’t actually save you very much typing, but why create a new variable if you don’t have to?

One other nice way to use $. is in conjunction with Perl’s “flip-flop” operator (..). When used in list context, .. is the list construction operator. It builds a list of elements by calculating all of the items between given start and end values like this:
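For instance, a short sketch of the list-context behaviour (the variable names are only illustrative):

```perl
use strict;
use warnings;

# In list context, .. builds every value from the start to the end.
my @numbers = (1 .. 10);
my @letters = ('a' .. 'e');

print "@numbers\n";    # 1 2 3 4 5 6 7 8 9 10
print "@letters\n";    # a b c d e
```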

But when you use this operator in a scalar context (for example, as the condition of an if statement), its behaviour changes completely. The first operand (the left-hand expression) is evaluated to see if it is true or false. If it is false then the operator returns false and nothing happens. If it is true, however, the operator returns true and continues to return true on subsequent calls until the second operand (the right-hand expression) returns true. On that call it still returns true, and only afterwards does it go back to returning false.

An example will hopefully make this clearer. Suppose you have a file and you only want to process certain sections of it. The sections that you want to print are clearly marked with the string “!! START !!” at the start and “!! END !!” at the end. Using the flip-flop operator you can write code like this:
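A minimal sketch of such a loop, assuming some made-up sample data read through an in-memory filehandle and a hypothetical @kept array that collects the selected lines:

```perl
use strict;
use warnings;

# Hypothetical sample data with one marked section.
my $data = join "\n",
    'preamble',
    '!! START !!',
    'first wanted line',
    'second wanted line',
    '!! END !!',
    'trailer',
    '';
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my @kept;
while (<$fh>) {
    # Scalar context: .. acts as the flip-flop operator here.
    if (/!! START !!/ .. /!! END !!/) {
        push @kept, $_;
    }
}
close $fh;

print @kept;
```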

Each time around the loop, the current line is checked by the flip-flop operator. If the line doesn’t match /!! START !!/ then the operator returns false and the loop continues. When we reach the first line that matches /!! START !!/, the flip-flop operator returns true and the code in the if block is executed. On subsequent iterations of the while loop, the flip-flop operator checks for matches against /!! END !!/ instead, and it continues to return true until it finds one. This means that all of the lines from the “!! START !!” marker to the “!! END !!” marker, including the marker lines themselves, are processed. Once a line has matched /!! END !!/, the operator goes back to checking against the first regex, so from the next line onwards it returns false until another “!! START !!” line appears.

So what does all this have to do with $.? Well, there’s another piece of magic coded into the flip-flop operator. If either of its operands is a constant value then it is converted to an integer and compared against $.. So to print out just the first 10 lines of a file you can write code like this:
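A minimal sketch, assuming fifteen lines of made-up data in an in-memory filehandle and a hypothetical @first array that collects what would otherwise be printed:

```perl
use strict;
use warnings;

# Hypothetical sample data: fifteen numbered lines.
my $data = join '', map "line $_\n", 1 .. 15;
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my @first;
while (<$fh>) {
    # Constant operands are compared against $., so this is true
    # while the current record number is between 1 and 10.
    push @first, $_ if 1 .. 10;
}
close $fh;

print @first;
```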

One final point on $.: there is only one $. variable. If you are reading from multiple filehandles then $. contains the current record number from the most recently read filehandle. If you want anything more complex then you can use something like IO::File objects for your filehandles. These objects all have an input_line_number method.
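As a rough sketch of the difference, assuming two in-memory filehandles with the hypothetical names $first and $second (loading IO::File makes the input_line_number method available on ordinary lexical filehandles):

```perl
use strict;
use warnings;
use IO::File;    # also loads IO::Handle, which provides input_line_number

# Hypothetical sample data for two separate filehandles.
my ($text_one, $text_two) = ("a\nb\nc\n", "x\ny\n");
open my $first,  '<', \$text_one or die "Cannot open first handle: $!";
open my $second, '<', \$text_two or die "Cannot open second handle: $!";

my $line;
$line = <$first>;     # record 1 of $first
$line = <$first>;     # record 2 of $first
$line = <$second>;    # record 1 of $second

my $global_count = $.;                            # follows the last handle read
my $first_count  = $first->input_line_number;     # per-handle counter
my $second_count = $second->input_line_number;

print "\$. = $global_count, first = $first_count, second = $second_count\n";
```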

The Input and Output Record Separators

Next, we’ll look at $/ and $\ which are the input and output record separators respectively. They control what defines a “record” when you are reading or writing data.

Let me explain that in a bit more detail. Remember when you were first learning Perl and you were introduced to the file input operator? Almost certainly you were told that <FILE> reads data from the file up to and including the next newline character. Well, that’s not true. Well, it is, but it’s only a specialized case. Actually it reads data up to and including the next occurrence of whatever is currently in $/ – the file input separator. Let’s look at an example.

Imagine you have a text file which contains amusing quotes. Or lyrics from songs. Or whatever it is that you like to put in your randomly generated signature. The file might look something like this.

[text]
This is the definition of my life
%%
We are far too young and clever
%%
Stab a sorry heart
With your favourite finger
[/text]

Here we have three quotes separated by a line containing just the string %%. How would you go about reading in that file a quote at a time?

One solution would be to read the file a line at a time, checking to see if the new line is just the string %%. You’d need to keep a variable that contains the current quote that you are building up, and process a completed quote when you find the termination string. Oh, and you’d need to remember to process the last quote in the file, as that doesn’t have a termination string (although it might!).

A simpler solution would be to change Perl’s idea of what constitutes a record. We do that by changing the value of $/. The default value is a newline character – which is why <...> usually reads in a line at a time. But we can set it to any value we like. We can do something like this:
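A minimal sketch, assuming the quotes file shown earlier is available through an in-memory filehandle and that each quote is collected in a hypothetical @quotes array:

```perl
use strict;
use warnings;

# The sample quotes, held in memory in place of a real file.
my $data = <<'END_QUOTES';
This is the definition of my life
%%
We are far too young and clever
%%
Stab a sorry heart
With your favourite finger
END_QUOTES
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

$/ = "%%\n";    # a record now ends at a line containing just %%

my @quotes;
while (my $quote = <$fh>) {
    chomp $quote;    # removes the trailing "%%\n", if present
    push @quotes, $quote;
}
close $fh;

print "Read ", scalar(@quotes), " quotes\n";
print $quotes[1];
```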

Now each time we call the file input operator, Perl reads data from the filehandle until it finds %%\n (or the end of file marker). A newline is no longer seen as a special character. Notice, however, that the file input operator always returns the next record with the file input separator still attached. When $/ has its default value of a newline character, you know that you can remove the newline character by calling chomp. Well, it works exactly the same way when $/ has other values. It turns out that chomp doesn’t just remove a newline character (that’s another “simplification” that you find in beginners’ books); it actually removes whatever is the current value of $/. So in our sample code above, the call to chomp is removing the whole string %%\n.

Changing Perl’s Special Variables

Before we go on I just need to alert you to one possible repercussion of changing these variables whenever you want. The problem is that most of these variables are forced into the main package. This means that when you change one of these variables, you are altering the value everywhere in your program. This includes any modules that you use in your program. The reverse is also true. If you’re writing a module that other people will use in their programs and you change the value of $/ inside it, then you have changed the value for all of the remaining program execution. I hope you can see why changing variables like $/ in one part of your program can potentially lead to hard-to-find bugs in another part.

So we need to do what we can to avoid this. Your first approach might be to reset the value of $/ after you have finished with it. So you’d write code like this.
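A minimal sketch of that first approach, assuming some made-up records in an in-memory filehandle:

```perl
use strict;
use warnings;

# Hypothetical sample data: three records separated by %% lines.
my $data = "one\n%%\ntwo\n%%\nthree\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

$/ = "%%\n";
my @records = <$fh>;
chomp @records;
$/ = "\n";    # assumes the previous value was the default newline

close $fh;
print scalar(@records), " records read\n";
```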

The problem with this is you can’t be sure that $/ contained \n before you started fiddling with it. Someone else might have changed it before your code was reached. So the next attempt might look like this.
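A minimal sketch of that second attempt. The $old_separator name is hypothetical, and the first assignment only simulates earlier code that had already changed $/:

```perl
use strict;
use warnings;

# Simulate some earlier code having already changed $/ (an assumption
# made purely for this illustration).
$/ = "\r\n";

# Hypothetical sample data: three records separated by %% lines.
my $data = "one\n%%\ntwo\n%%\nthree\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my $old_separator = $/;    # remember whatever was there before
$/ = "%%\n";
my @records = <$fh>;
chomp @records;
$/ = $old_separator;       # put the earlier value back

close $fh;
print scalar(@records), " records read\n";
```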

This code works and doesn’t have the bug that we’re trying to avoid but there’s another way that looks cleaner. Remember the local function that you used to declare local variables until someone told you that you should use my instead? Well this is one of the few places where you can use local to great effect.

It’s generally acknowledged that local is badly named. The name doesn’t describe what the function does. In Perl 6 the function is likely to be renamed to temp as that’s a far better description of what it does – it creates a temporary variable with the same name as an existing variable and restores the original variable when the program leaves the innermost enclosing block. This means that we can write our code like this.
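A minimal sketch of the local version, again assuming made-up records in an in-memory filehandle:

```perl
use strict;
use warnings;

# Hypothetical sample data: three records separated by %% lines.
my $data = "one\n%%\ntwo\n%%\nthree\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my @records;
{
    local $/ = "%%\n";    # new value, visible only until the block ends
    @records = <$fh>;
    chomp @records;
}
# Here $/ has its previous value again, whatever that was.

close $fh;
print scalar(@records), " records read\n";
```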

We’ve enclosed all of the code in another pair of braces to create a naked block. Code blocks are usually associated with loops, conditionals or subroutines, but in Perl they don’t need to be. You can introduce a new block whenever you want. Here, we’ve introduced a block purely to delimit the area where we want $/ to have a new value. We then use local to store the old $/ variable somewhere where it can’t be disturbed and set our new version of the variable to %%\n. We can then do whatever we want in the code block and when we exit from the block, Perl automatically restores the original copy of $/ and we never needed to know what it was set to.

For all these reasons, it’s good practice to never change one of Perl’s internal variables unless it is localized in a block.

Other Values For $/

There are a few special values that you can give $/ which turn on interesting behaviours. The first of these is setting it to undef. This turns on “slurp mode” and the next time you read from a filehandle you will get all of the remaining data right up to the end of file marker. This means that you can read a whole file in using code like this.
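A minimal sketch of that idiom, assuming an in-memory filehandle in place of a real file and a hypothetical $contents variable:

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for a file on disk.
my $data = "first line\nsecond line\nthird line\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

# local leaves $/ undefined inside the do block, so one read returns
# everything up to the end of the file.
my $contents = do { local $/; <$fh> };
close $fh;

print length($contents), " characters read\n";
```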

A do block returns the value of the last expression evaluated within it, which in this case is the file input operator. And as $/ has been set to undef it returns the whole file. Notice that we don’t even need to explicitly set $/ to undef as all Perl variables are initialized to undef when they are created.

There is a big difference between setting $/ to undef and setting it to an empty string. Setting it to an empty string turns on “paragraph” mode. In this mode each record is a paragraph of text terminated by one or more empty lines. You might think that this effect can be mimicked by setting $/ to \n\n, but the subtle difference is that paragraph mode acts as though $/ had been set to \n\n+ (although you can’t actually set $/ equal to a regular expression).
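The difference shows up when paragraphs are separated by more than one empty line. A rough sketch, assuming some made-up text and a hypothetical helper called read_records:

```perl
use strict;
use warnings;

# Hypothetical sample text: two paragraphs separated by three empty lines.
my $data = "para one, line one\npara one, line two\n\n\n\npara two\n";

# Read all records from the sample text using the given separator.
sub read_records {
    my ($separator) = @_;
    open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
    local $/ = $separator;
    my @records = <$fh>;
    close $fh;
    return @records;
}

my @paragraph_mode = read_records('');        # runs of empty lines collapse
my @double_newline = read_records("\n\n");    # exactly two newlines each time

print "paragraph mode: ", scalar(@paragraph_mode), " records\n";
print "double newline: ", scalar(@double_newline), " records\n";
```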

The final special value is to set $/ to either a reference to a scalar variable that holds an integer, or to a reference to an integer constant. In these cases the next read from a filehandle will read up to that number of bytes (I say “up to” because at the end of the file there might not be enough data left to give you). So to read a file 2KB at a time you can do this.
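A minimal sketch, assuming 5000 bytes of made-up data in an in-memory filehandle and a hypothetical @sizes array that records the length of each chunk:

```perl
use strict;
use warnings;

# Hypothetical sample data: 5000 bytes, which is not a multiple of 2048.
my $data = 'x' x 5000;
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my @sizes;
{
    local $/ = \2048;    # read fixed-size records of up to 2048 bytes
    while (my $chunk = <$fh>) {
        push @sizes, length $chunk;
    }
}
close $fh;

print "@sizes\n";    # 2048 2048 904
```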

$/ and $.

Note that changing $/ alters Perl’s definition of a record and therefore it alters the behavior of $.. $. doesn’t actually contain the current line number, it contains the current record number. So in our quotes example above, $. will be incremented for each quote that you read from the filehandle.
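A short sketch of that effect, assuming three made-up quotes spread over six lines and a hypothetical @record_numbers array:

```perl
use strict;
use warnings;

# Hypothetical sample data: three quotes on six lines in total.
my $data = "first quote\n%%\nsecond quote\n%%\nthird quote,\nsecond line\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my @record_numbers;
{
    local $/ = "%%\n";
    while (my $quote = <$fh>) {
        push @record_numbers, $.;    # counts quotes, not lines
    }
}
close $fh;

print "@record_numbers\n";    # 1 2 3
```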

What About $\?

Many paragraphs back I mentioned both $/ and $\ as being the input and output record separators. But since then I’ve just gone on about $/. What happened to $\?

Well, to be honest, $\ isn’t anywhere near as useful as $/. It contains a string that is printed at the end of every call to print. Its default value is the empty string, so nothing gets added to data that you display with print. But if, for example, you longed for the days of Pascal you could write a println function like this.
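A minimal sketch of such a function (one possible way to write it, with $\ localized so the change doesn’t leak out):

```perl
use strict;
use warnings;

# Print all arguments followed by a newline, Pascal-style.
sub println {
    local $\ = "\n";    # appended to the output of every print call
    print @_;
}

println('Hello', ', ', 'world');
println('Goodbye');
```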

Then every time you called println, all of the arguments would be printed followed by a newline.

Other Print Variables

The next two variables that I want to discuss are very easily confused although they do completely different things. To illustrate them, consider the following code.
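A minimal sketch of the kind of code in question, assuming a three-element array called @arr (the extra newline prints are only there to keep the two results on separate lines):

```perl
use strict;
use warnings;

my @arr = (1, 2, 3);

print @arr;      # the array passed directly to print
print "\n";
print "@arr";    # the array interpolated into a double-quoted string
print "\n";
```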

Now, without looking it up, do you know what the difference is between the output from the two calls to print?

The answer is that the first one prints the three elements of the array with nothing separating them (like this – 123) whereas the second one prints the elements separated by spaces (like this – 1 2 3). Why is there a difference?

The key to understanding it is to look at exactly what is being passed to print in each case. In the first case print is passed an array. Perl unrolls that array into a list and print actually sees the three elements of the array as separate arguments. In the second case, the array is interpolated into a double quoted string before print sees it. That interpolation has nothing at all to do with the call to print. Exactly the same process would take place if, for example, we did something like this.
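For instance, a short sketch where the interpolated array is assigned to a hypothetical $string variable and print only appears afterwards:

```perl
use strict;
use warnings;

my @arr = (1, 2, 3);

# Interpolation happens when the string is built; print is not involved.
my $string = "@arr";

print length($string), "\n";    # 5: three digits and two spaces
print "$string\n";
```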

So in the second case, the print function only sees one argument. The fact that it is the result of interpolating an array in double quotes has no effect on how print treats the string.

We therefore have two cases. When print receives a number of arguments it prints them out with no spaces between them. And when an array is interpolated in double quotes it is expanded with spaces between the individual elements. These two cases are completely unrelated, but from our first example above it’s easy to see how people can get them confused.

Of course, Perl allows us to change these behaviors if we want to. The string that is printed between the arguments passed to print is stored in a variable called $, (because you use a comma to separate arguments). As we’ve seen, the default value for that is an empty string but it can, of course, be changed.
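A minimal sketch, assuming a hypothetical helper called print_with_commas that localizes $, around a single call to print:

```perl
use strict;
use warnings;

my @arr = (1, 2, 3);

# Print the arguments with a comma between each pair.
sub print_with_commas {
    local $, = ',';    # printed between the arguments to print
    print @_;
}

print_with_commas(@arr);
print "\n";
```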

This code prints the string 1,2,3.

The string that separates the elements of an array when expanded in a double quoted string is stored in $". Once again, it’s simple to change it to a different value.
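A minimal sketch, assuming the same three-element @arr and a hypothetical $joined variable that holds the interpolated string:

```perl
use strict;
use warnings;

my @arr = (1, 2, 3);
my $joined;
{
    local $" = '+';    # placed between elements during interpolation
    $joined = "@arr";
    print $joined;
}
print "\n";
```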

This code prints 1+2+3.

Of course, $" doesn’t necessarily have to be used in conjunction with a print statement. You can use it anywhere that you have an array in a double-quoted string. And it doesn’t just work for arrays. Array and hash slices work just as well.
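A minimal sketch using a hash slice, assuming a made-up %hash with the keys one, two and three:

```perl
use strict;
use warnings;

# Hypothetical hash used only for this illustration.
my %hash = (one => 1, two => 2, three => 3);

my $text;
{
    local $" = ' < ';
    # A hash slice is interpolated in the same way as an array.
    $text = "@hash{qw(one two three)}";
}

print "$text\n";
```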

This displays 1 < 2 < 3.

Conclusion

In this article we’ve just scratched the surface of what you can do by changing the values in Perl’s internal variables. If this makes you want to look at this subject in more detail, then you should read the perlvar manual page.
