

Some Mini-Howtos of Interest
Chapter 5 - Perl Language


This chapter has been translated into Spanish by Maria Ramos from Webhostinghub.com/support/edu.


5.1 Installing a CPAN module

There are two possible ways to install a CPAN module. We give both alternatives.


5.1.1 First alternative for installing a CPAN module

Download the CPAN module (we use the module Devel-SmallProf-2.02 as an example) and untar it

     # tar xzvf Devel-SmallProf-2.02.tar.gz
     # cd Devel-SmallProf-2.02
     Devel-SmallProf-2.02# ls
     Changes  MANIFEST  META.yml  Makefile.PL  README  TODO  lib  t

Compile and install the module

     Devel-SmallProf-2.02# perl Makefile.PL
     Checking if your kit is complete...
     Looks good
     Writing Makefile for Devel::SmallProf
     
     Devel-SmallProf-2.02# make
     cp lib/Devel/SmallProf.pm blib/lib/Devel/SmallProf.pm
     Manifying blib/man3/Devel::SmallProf.3pm
     
     Devel-SmallProf-2.02# make test
     PERL_DL_NONLAZY=1 /usr/bin/perl "-MExtUtils::Command::MM" "-e" "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t
     t/part1....ok
     t/part2....ok
     t/part3....ok
     t/part4....ok
     t/pods.....skipped
             all skipped: Only the author needs to check that POD docs are right
     All tests successful, 1 test skipped.
     Files=5, Tests=14,  1 wallclock secs ( 0.08 cusr +  0.01 csys =  0.09 CPU)
     
     Devel-SmallProf-2.02# make install
     Installing /usr/local/share/perl/5.8.8/Devel/SmallProf.pm
     Installing /usr/local/man/man3/Devel::SmallProf.3pm
     Writing /usr/local/lib/perl/5.8.8/auto/Devel/SmallProf/.packlist
     Appending installation info to /usr/local/lib/perl/5.8.8/perllocal.pod

5.1.2 Second alternative for installing a CPAN module

In this case we use an interactive shell that we launch as

     # perl -MCPAN -e shell

The first time the shell is launched the system has to be configured and upgraded. The system prompts for several programs (unzip, etc.). Install them if they are not already installed on the computer. Select a repository (in my case ftp://ftp.rediris.es/mirror/CPAN); most of the remaining questions can safely be answered with the default choice.

The first thing to do after configuration is to upgrade your CPAN:

     cpan> install Bundle::CPAN
     
     
     CPAN: Storable loaded ok
     Fetching with LWP:
       ftp://ftp.rediris.es/mirror/CPAN/authors/01mailrc.txt.gz
     Going to read /root/.cpan/sources/authors/01mailrc.txt.gz
     CPAN: Compress::Zlib loaded ok
     Fetching with LWP:
       ftp://ftp.rediris.es/mirror/CPAN/modules/02packages.details.txt.gz
     .
     .
     .

Then you reload it:

     cpan> reload cpan

And install the required module, Math::Function::Roots in this example:

     cpan> install Math::Function::Roots
     
     .
     .
     .
     
     Appending installation info to /usr/lib/perl/5.8/perllocal.pod
       /usr/bin/make install  -- OK
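
After either installation route it can be reassuring to check that Perl actually finds the new module. The sketch below is only one possible check and is not part of the original recipe: the helper name module_installed is hypothetical, and it simply tries to load the module with require inside an eval.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the original text): returns 1 if the
# named module can be loaded from @INC, 0 otherwise.
sub module_installed {
    my ($module) = @_;
    # Math::Function::Roots -> Math/Function/Roots.pm
    (my $file = "$module.pm") =~ s{::}{/}g;
    return eval { require $file; 1 } ? 1 : 0;
}

# File::Find ships with Perl; the second module is the one installed above.
for my $module (qw(File::Find Math::Function::Roots)) {
    printf "%-24s %s\n", $module,
        module_installed($module) ? "installed" : "not found";
}
```

From the shell the same check is commonly written as perl -MMath::Function::Roots -e 1, which exits silently when the module is found.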

5.2 Interesting perl oneliners

Updated on March 5th, 2017.

Updated on October 29th, 2016.

Updated on May 21st, 2015.

Updated on January 31st, 2015.

  1. Execute a program, in this case epstopdf, on every file in the current directory that shares a common extension, in this case eps.

         perl -e 'system "epstopdf $_" for (glob "*.eps");'
    
  1. Erase "phantom" files of zero size.

         perl -e 'foreach (glob "*") {
           if (-f $_ && -z $_) {print "Deleting $_\n"; unlink $_;}}'
    
  1. Check the PostScript files referred to in a LaTeX output message.

         latex filename.tex | perl -e \
         'while (<>) {foreach (split) {/<(.*?\.eps)>/ and push(@eps, $1)}}
         foreach (sort @eps) {print "$_\n"}'
    
  1. Print apostrophe character.

         perl -le 'print "'\'' is an apostrophe..."'
    
  1. Change a text file from UTF-8 encoding to ASCII. Note that it does not convert Spanish accented characters to their unaccented counterparts: non-ASCII characters are replaced by x.

         perl -ne 'for (unpack "U*", $_)
         { printf $_ > 127 ? "x" :  "%c", $_ }' fileUTF.txt > fileASCII.txt
    
  1. Add an E character to Fortran program output in which numbers with a three-digit exponent lack it.

         perl -pi'.bak' -e 's/(\d)-(\d\d\d)/$1E-$2/g' fort.output
    
  1. Delete empty lines in a file.

          perl -ni -e 'chomp; print "$_\n" if length' test.dat
    
  1. Make a substitution in several files. In this example the directory prefix figures/ is replaced by Figures/ in all files with extension tex in the current directory, keeping a backup copy of each file with extension .bak.

          perl -pi'.bak' -e 's/figures\//Figures\//g' *.tex
    
  1. Counting the number of words per line in a file. If the filename is test.out then

         perl -n -e 'my @line = split;print scalar @line,"\n"' test.out 
         10000
         202
         10000
         202
         10000
         202
         10000
         202
         10000
         202
    
  1. Join each even line to the end of the preceding odd line in a file. If the filename is test.out, we can count the number of words in each line before and after the merging.

         perl -n -e 'my @line = split;print scalar @line,"\n"' test.out 
         10000
         202
         10000
         202
         10000
         202
         perl -n -e 'chomp; $.%2 ? print "$_ ": print "$_\n";' test.out > test2.out 
         perl -n -e 'my @line = split;print scalar @line,"\n"' test2.out 
         10202
         10202
         10202
    
  1. Search a log file (/var/log/loginlog.0 in this example) for successful logins of user curro, and display the number of times the user has logged in from each machine.

          
         perl -e 'while (<>) {
         if (m|\d+:\d+:\d+\s+(.*?)\s+.*ccepted.*curro\s+from\s+(.*?)\s+.*|) 
          {$vh{"$1 from $2"}++;} } 
          foreach (keys %vh) 
           {print "$vh{$_} login(s) to $_\n";}' /var/log/loginlog.0
    
  1. Count the number of files in a directory.

    In the first example we use globbing to count the total number of files including hidden files or the number of files subject to some restriction.

          
         $ perl -e 'my @files = glob "* .*"; print 1+$#files."\n"'
         129
         $ perl -e 'my @files = glob "1*gif"; print 1+$#files."\n"'
         9
    

    The same can be accomplished using directory handles and grep

          
         $ perl -e 'opendir DH, ".";my @files = readdir DH; print 1+$#files."\n"'
         129
         $ perl -e 'opendir DH, ".";my @files = grep /^1.*\.gif$/, (readdir DH); print 1+$#files."\n"'
         9
    
  1. Get the last line of a file.

    We include a oneliner that gets the last line of a series of files

         perl -e 'foreach (@ARGV) {my $line = `tail -n 1 $_`; print $line}' output_notes_1* output_notes_2* output_notes_3* ...
    

    In the original application the line was prepended with a number appearing in the filename as follows

         perl -e 'foreach (@ARGV) {/.*(\d\d).*/;my $line = $1.`tail -n 1 $_`; print $line}' output_notes_1* output_notes_2* output_notes_3* ...
    
  1. Change Mac carriage return to UNIX new line

    Let's assume we have a bunch of csv files with Mac carriage returns, which our system interprets as a single very long line. Instead of using emacs we can easily fix this with

         perl -pi -e 's/\r/\n/g' *.csv
    
  1. Extract the figure names from a LaTeX compile output and prepare a tarball with the figure files.

    We assume that we compile a file called rdiary_2014.tex and that all figures are png files in a directory called Figs. We use two pipes: the first one connects the output of pdflatex to a perl oneliner that reads the standard input and extracts the file names, and the second one sends the file names to tar. Notice the -T - option, which makes tar read the list of files to archive from standard input.

         pdflatex rdiary_2014.tex | perl -e 'while (<>) {print "$1\n" if /<.*(F.*png).*>/g}' | tar czf figs.tgz -T -
    
  1. We have two files, each with two columns of data in X Y format: File_1 with X1 Y1 and File_2 with X2 Y2. The abscissas are common and we want to create a third file, called File_3, with three columns in the format X1 Y1 Y2.

          
         perl -e '@fhs = map { open my $fh, "<", $_ or die "$_: $!"; $fh } @ARGV;
         ($f0, $f1) = @fhs;
         while (defined($l0 = <$f0>) && defined($l1 = <$f1>)) {
           chomp $l0; @ll1 = split " ", $l1; print "$l0 $ll1[-1]\n" }' File_1 File_2 > File_3
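
The last oneliner is dense, so a script version may be easier to follow. The following is a minimal sketch under the same assumptions (two whitespace-separated files with a common abscissa); the subroutine name merge_columns is hypothetical and the merged lines are written to standard output.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: merge two "X Y" files into a list of "X Y1 Y2" lines.
# Merging stops as soon as either file runs out of lines.
sub merge_columns {
    my ($file_1, $file_2) = @_;
    open(my $fh1, '<', $file_1) or die "Could not open '$file_1': $!";
    open(my $fh2, '<', $file_2) or die "Could not open '$file_2': $!";
    my @merged;
    while (1) {
        my $l1 = <$fh1>;
        my $l2 = <$fh2>;
        last unless defined $l1 && defined $l2;
        chomp $l1;
        my @cols2 = split ' ', $l2;        # split on any whitespace
        push @merged, "$l1 $cols2[-1]";    # keep only the last column of file 2
    }
    close $fh1;
    close $fh2;
    return @merged;
}

# Usage: merge.pl File_1 File_2 > File_3
if (@ARGV == 2) {
    print "$_\n" for merge_columns(@ARGV);
}
```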
    

5.3 Environment Codification and Character Ordering

The following short script makes it possible to find out the character encoding used by a terminal.

       
     #!/usr/bin/perl
     use warnings;
     use strict;
     use Encode;
     my @charsets = qw(utf-8 latin1 iso-8859-15 utf-16);
     # some non-ASCII codepoints:
     my $test = 'Ue: ' . chr(220) .'; Euro: '. chr(8364) . "\n";
     #
     for (@charsets){print "$_: " . encode($_, $test);}

Once the script is run, several lines appear, and the terminal charset is the one named on the line that is displayed correctly. For example, if we execute the script in a terminal using the UTF-8 encoding the output is something similar to

       
     $ encodings.plx 
     utf-8: Ue: Ü; Euro: €
     latin1: Ue: �; Euro: ?
     iso-8859-15: Ue: �; Euro: �
     utf-16: ��Ue: �; Euro:  �

Note that both special characters, Ü and the euro symbol, only appear in correct form in the utf-8 charset line.

The Perl function chr takes a number as an argument and returns the character represented by that number (its code point). The function encode converts a string of characters into a sequence of bytes in the given character set. The four character sets included in the previous example are among the most common ones.
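
To make the effect of encode more tangible, the following sketch prints, in hexadecimal, the bytes that each of the four character sets produces for the two test characters. It is an illustration added here with a hypothetical helper name (hex_bytes), and it relies only on the core Encode module.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: hexadecimal dump of the bytes produced when the
# character with the given code point is encoded in the given charset.
sub hex_bytes {
    my ($charset, $codepoint) = @_;
    my $char = chr($codepoint);
    return unpack("H*", encode($charset, $char));
}

for my $charset (qw(utf-8 latin1 iso-8859-15 utf-16)) {
    printf "%-12s U+00DC -> %-9s U+20AC -> %s\n",
        $charset, hex_bytes($charset, 220), hex_bytes($charset, 8364);
}
```

In UTF-8 the Ü takes two bytes (c39c) and the euro sign three (e282ac). latin1 has no euro sign, so encode falls back to a question mark (3f), which matches the "Euro: ?" seen in the output of the previous script.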

Another problem with character sets arises when sorting alphabetically a set of words that contain characters outside the standard 7-bit ASCII set. For example, suppose that we are trying to sort alphabetically the following set of names

     Álvarez
     Mínguez  
     Pérez
     Perales
     Pilar
     Mola
     Borrero
     Díaz
     Diz
     Delgado
     Cuesta
     Castro
     Cáñamo

A standard program, comparing the names with the cmp operator, is as follows

     #!/usr/bin/perl
     use strict;
     use warnings;
     #
     my @names;
     #
     while (defined(my $line = <>)) {
         chomp($line);
         my $elem = push(@names,$line);
         print "$elem element(s) added\n";
     }
     #
     print "Reading process finished. Sorting ... ";
     #
     print "Done.\n\n";
     print "Sorted set of names:\n";
     foreach (sort by_name @names) {
         print "\t$_\n";
     }
     #
     sub by_name {$a cmp $b}

However, when we run the program we obtain the somewhat surprising output

     Sorted set of names:
     	Borrero
     	Castro
     	Cuesta
     	Cáñamo
     	Delgado
     	Diz
     	Díaz
     	Mola
     	Mínguez  
     	Perales
     	Pilar
     	Pérez
     	Álvarez

Clearly this is not the expected output if we intend to sort alphabetically (using Spanish sorting rules). The reason for this unexpected behavior is that the cmp operator compares non-ASCII characters by codepoint number[9], which might give unexpected results. In order to sort according to the conventions of a particular language we should use the locale pragma. The previous program can be rewritten as follows

     #!/usr/bin/perl
     use strict;
     use warnings;
     #
     ##########
     use locale;
     use POSIX qw(locale_h);
     setlocale(LC_COLLATE, 'es_ES@euro') or die "Locale es_ES\@euro not installed.\n";
     ##########
     #
     #
     my @names;
     #
     while (defined(my $line = <>)) {
         chomp($line);
         my $elem = push(@names,$line);
         print "$elem element(s) added\n";
     }
     #
     print "Reading process finished. Sorting ... ";
     #
     print "Done.\n\n";
     print "Sorted set of names:\n";
     foreach (sort by_name @names) {
         print "\t$_\n";
     }
     #
     sub by_name {$a cmp $b}

After this change the word order is the usual one in Spanish.

     Sorted set of names:
             Álvarez
             Borrero
             Cáñamo
             Castro
             Cuesta
             Delgado
             Díaz
             Diz
             Mínguez  
             Mola
             Perales
             Pérez
             Pilar
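
The locale approach depends on the es_ES@euro locale being installed. As an alternative technique, not the one used above, the core module Unicode::Collate sorts according to the Unicode Collation Algorithm without consulting the system locale. A minimal sketch, assuming that the names are available as decoded (character) strings and that the default collation table is acceptable for names like these; the subroutine name sort_names is hypothetical:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                       # this source file is assumed to be saved as UTF-8
use Unicode::Collate;

# Hypothetical helper: sort a list of character strings with the default
# Unicode collation table instead of the system locale.
sub sort_names {
    my @names = @_;
    return Unicode::Collate->new->sort(@names);
}

my @names = qw(Álvarez Mínguez Pérez Perales Pilar Mola Borrero
               Díaz Diz Delgado Cuesta Castro Cáñamo);

binmode(STDOUT, ':encoding(UTF-8)');   # assumes a UTF-8 terminal
print "Sorted set of names:\n";
print "\t$_\n" for sort_names(@names);
```

The default table treats ñ as an accented n rather than as a separate letter; if that distinction matters, Unicode::Collate::Locale offers language-specific tailoring.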

5.3.1 References

  1. http://perlgeek.de/en/article/encodings-and-unicode

  1. http://perldoc.perl.org/perllocale.html#USING-LOCALES


5.4 Extracting matches from a regular expression

This can be easily done using the grouping metacharacters '()'. They allow the extraction of the parts of a string that matched the imposed condition. Each grouping marked by parentheses goes into a special variable $1, $2, etc. They can be used as ordinary variables.

If we want to extract the day, month and year from a date expressed as dd/mm/yyyy we can do the following

     # extract day, month, year
     if ($date =~ m!(\d\d)/(\d\d)/(\d\d\d\d)!) { # match dd/mm/yyyy format
         $day = $1;
         $month = $2;
         $year = $3;
     }

Note the use of the pattern match operator m!! to change the standard pattern delimiters. We can rewrite in a shorter form the previous code, taking advantage of the different behavior of the binding operator in scalar and list contexts.

In scalar context the binding operator returns a true or false value.

     $answer = $date =~ m!(\d\d)/(\d\d)/(\d\d\d\d)!;

Thus $answer is 1 if the match succeeds and the empty string otherwise. In list context, however, the binding operator returns the list of matched values ($1, $2, $3, ...). Thus we can abbreviate the previous code as

     ($day,$month,$year) = ($date =~ m!(\d\d)/(\d\d)/(\d\d\d\d)!);

If the groupings in a regexp are nested, $1 gets the group with the leftmost opening parenthesis, $2 the next opening parenthesis, etc.
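
The list-context form and the numbering of nested groups can be tried with the short sketch below. The subroutine names split_date and nested_groups are hypothetical, and the pattern in split_date is anchored with ^ and $ as an extra assumption, so that longer strings are rejected.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Returns (day, month, year) for a dd/mm/yyyy string, or an empty list
# if the string does not match.
sub split_date {
    my ($date) = @_;
    my @parts = $date =~ m!^(\d\d)/(\d\d)/(\d\d\d\d)$!;
    return @parts;
}

# Nested groups are numbered by the position of their opening parenthesis:
# the first group is dd/mm, the second dd, the third mm and the fourth yyyy.
sub nested_groups {
    my ($date) = @_;
    my @groups = $date =~ m!((\d\d)/(\d\d))/(\d\d\d\d)!;
    return @groups;
}

my ($day, $month, $year) = split_date("05/03/2017");
print "day=$day month=$month year=$year\n";
print join(" | ", nested_groups("05/03/2017")), "\n";
```

The second print statement shows 05/03 | 05 | 03 | 2017, with the outer group first.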

For more information: man perlretut.


5.5 Basic use of fork to launch a program

Apart from the system function, a Perl script can launch child processes using the fork function. Let's assume that we want to launch two applications, called fort_1 and fort_2, from a script, without waiting for each application to end before the script continues. Using fork we can do the following in our script

     defined(my $pid0 = fork) or die "Cannot fork: $!";
     unless ($pid0) {
       # Child 0 process is here
       exec "fort_1";
       die "cannot exec fort_1: $!";
     }
     defined(my $pid1 = fork) or die "Cannot fork: $!";
     unless ($pid1) {
       # Child 1 process is here
       exec "fort_2";
       die "cannot exec fort_2: $!";
     }
     print "Program output: \n";
     # The parent process is here
     # script continues ...
     waitpid($pid0, 0);
     waitpid($pid1, 0);

In the parent process $pid0 and $pid1 hold non-zero values (the pids of the children), so the parent skips the two unless blocks, whereas fork returns zero in each child. The parent eventually reaches the waitpid calls. This function waits for a particular child process to terminate and returns the pid of the deceased process. It is important to do so in order to get rid of zombie processes.
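
After waitpid returns, the exit status of the reaped child is available in the predefined variable $?, with the exit code in its upper byte. The sketch below is a self-contained illustration: the children simply call exit with a given code instead of running fort_1 and fort_2, and the subroutine name run_children is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: fork one child per requested exit code, wait for
# each child in turn and return the exit codes that were collected.
sub run_children {
    my @codes = @_;
    my @pids;
    for my $code (@codes) {
        defined(my $pid = fork) or die "Cannot fork: $!";
        if ($pid == 0) {
            exit $code;            # child: a real script would exec a program here
        }
        push @pids, $pid;          # parent: remember the pid of the child
    }
    my @status;
    for my $pid (@pids) {
        waitpid($pid, 0);          # reap this particular child
        push @status, $? >> 8;     # the exit code is the upper byte of $?
    }
    return @status;
}

my @status = run_children(0, 3);
print "exit statuses: @status\n";
```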

For more information: man perlipc.


5.6 Perl predefined variables. Some examples.

Apart from the ubiquitous Perl default variable, $_, there is a large number of useful predefined variables. We give some examples in the following short code fragments.

  1. $.

    Current line number for the last filehandle accessed.

    The following code displays each line of the file and the corresponding line number.

         #
         open(INPUT,"</etc/motd") or die "/etc/motd: $!";
         #
         while (<INPUT>) {
             print "Line $.: $_";
         }
    
  1. $0

    Name of the program being executed.

    The following code strips the leading directories from the program name and stores the result in a variable called $prgname. The pattern also works when $0 contains no directory part.

         #
         (my $prgname) = $0 =~ m#([^/]+)$#;
         #
    

For more information: man perlvar.
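
As a small illustration of $0, the sketch below extracts the program name from a few sample paths with a regular expression and, for comparison, with basename from the core module File::Basename. The subroutine name program_name and the sample paths are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename qw(basename);

# Hypothetical helper: keep everything after the last slash, or the
# whole string if it contains no slash.
sub program_name {
    my ($path) = @_;
    my ($name) = $path =~ m#([^/]+)$#;
    return $name;
}

for my $path ("/usr/local/bin/myscript.pl", "./myscript.pl", "myscript.pl") {
    printf "%-28s regexp: %-14s basename: %s\n",
        $path, program_name($path), basename($path);
}
print "This program is ", basename($0), "\n";
```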


5.7 Using a named pipe for interprocess communication in Perl

A named pipe (or FIFO file) can be used for interprocess communication between a parent process and one or more child processes. Let's suppose that we have forked and launched a couple of child processes (see Basic use of fork to launch a program, Section 5.5) and we want to detect when each of them finishes, executing waitpid for a child as soon as it ends. Calling waitpid directly on a particular pid is not efficient because we do not know which of the processes will finish first.

A commented code sample that does so, launching a couple of child processes and waiting for each of them to finish, is the following

     #!/usr/bin/perl
     #
     # named pipe use for ipc example
     #
     # by Currix TM
     #
     use strict;
     use warnings;
     use POSIX qw(mkfifo);
     #
     # fifo definition
     my $FIFOname = ".prgfifo";
     unless (-p $FIFOname) { # Create the pipe if it doesn't exist
         unlink $FIFOname;
         mkfifo($FIFOname, 0700) or die "mkfifo in the current directory failed: $!";
     }
     #
     my @pid;
     #
     print "This is the parent process before forking with pid $$\n";
     #
     defined ($pid[0]=fork) or die "Cannot fork (1): $!";
     #
     # 
     unless ($pid[0]) {
         print "fork1 pid: $pid[0]\n";
         print "fork1 ps: $$\n";
         sleep 10; # Sleeeeeeeping
         system "cat /etc/motd";
     #   Child process ended. Write process number in the FIFO
         open (FIFO, ">$FIFOname") || die "can't write prgfifo: $!";
         print FIFO "$$\n";
         close(FIFO); # Flush the pid so that the parent can read it
         sleep 2;    # to avoid dup signals
         print "Exiting child 1\n";
         exit(0); # Remember to cleanly close the child process
     } else {
         print "This is the parent process after forking 1 with pid: $pid[0]\n";
     }
     #
     defined ($pid[1]=fork) or die "Cannot fork (2): $!"; 
     #
     unless ($pid[1]) {
         print "fork2 pid: $pid[1]\n";
         print "fork2 ps: $$\n";
         sleep 5; # Sleeeeeeeping
         system "cat /etc/fstab";
     #   Child process ended. Write process number in the FIFO
         open (FIFO, ">$FIFOname") || die "can't write prgfifo: $!";
         print FIFO "$$\n";
         close(FIFO); # Flush the pid so that the parent can read it
         sleep 1;    # to avoid dup signals
         print "Exiting child 2\n";
         exit(0); # Remember to cleanly close the child process
     } else {
         print "This is the parent process after forking 2 with pid: $pid[1]\n";
     }
     #
     print "These are the pids from the parent process after forking: $pid[0], $pid[1]\n";
     #
     my $iprocess = 0;
     open (FIFO, "<$FIFOname") || die "can't read prgfifo: $!";
     while (1)  {
         my $kidpid = <FIFO>;
         if (defined $kidpid) {
             chomp $kidpid; # Drop a trailing newline, if any
             print "child process $kidpid ended\n";
             sleep 2; # To avoid dup signals again
             waitpid($kidpid, 0);
             last if ((++$iprocess) == 2);
         }
     } 
     #
     print "The two child processes have finished. Closing the parent process.\n";
     # 
     unlink("$FIFOname"); # Remove the named pipe

For more information: man perlipc and references below.
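
As a point of comparison, and not as a replacement for the named pipe script above, Perl's built-in wait function blocks until any child terminates and returns its pid (or -1 when no children are left). A minimal sketch, in which the children only sleep and the subroutine name reap_in_finish_order is hypothetical:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: fork one child per label, each sleeping for the
# given number of seconds, and return the labels in the order in which
# the children were reaped.
sub reap_in_finish_order {
    my %delay = @_;                   # label => seconds to sleep
    my %label_of;
    for my $label (sort keys %delay) {
        defined(my $pid = fork) or die "Cannot fork: $!";
        if ($pid == 0) {
            sleep $delay{$label};     # child: pretend to work
            exit 0;
        }
        $label_of{$pid} = $label;     # parent: remember which child is which
    }
    my @order;
    while ((my $pid = wait) != -1) {  # wait returns -1 when no child is left
        push @order, $label_of{$pid};
    }
    return @order;
}

my @order = reap_in_finish_order(fast => 0, slow => 1);
print "children finished in this order: @order\n";
```

The named pipe has the advantage that a child can send arbitrary messages to the parent, not only the fact that it has ended.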


5.7.1 References

  1. Perldoc website


5.8 CperlMode in Emacs

CPerlMode can be set as the standard mode for editing Perl by adding the following line to the .emacs configuration file:

     (defalias 'perl-mode 'cperl-mode)

To access the documentation about the mode use the describe-mode function by typing C-h m when in CPerlMode. When not in CPerlMode use M-x describe-function RET cperl-mode or C-h f cperl-mode.


5.8.1 References

  1. Emacs wiki


5.9 Using Perl to benchmark code.

The Benchmark module, included in the base Perl distribution, provides a series of procedures to benchmark the running time of code.

Some of the available procedures are the following[10]

The procedures marked with an asterisk (*) are not included by default and should be explicitly loaded.

Two of the most useful options are timethese and cmpthese.

The timethese procedure runs several chunks of code several times. The syntax is

     timethese($count, {
     'Name1' => sub { ...code1... },
     'Name2' => sub { ...code2... },
     });

If the argument $count is a positive integer, it gives the number of times the code is run; a negative integer gives the minimum number of CPU seconds to run[11]. The minimum in this case is 0.1 sec. If $count is zero, a default value of 3 CPU seconds is assumed.

timethese returns a reference to a hash of Benchmark objects that can be used as input for cmpthese.

We apply this to the following example, comparing different ways of calculating the square of a number.

     #!/usr/bin/perl
     use strict;
     use warnings;
     use Benchmark qw( timethese cmpthese ) ;
     my $x = 3.1;
     my $CNT = -6;
     my $r = timethese( $CNT, {
     a => sub{$x*$x},
     b => sub{$x**2},
     c => sub{exp(2*log($x))}
     } );
     cmpthese $r;
     $CNT=40_000_000;
     $r = timethese( $CNT, {
     a => sub{$x*$x},
     b => sub{$x**2},
     c => sub{exp(2*log($x))}
     } );
     cmpthese $r;

The procedures are run twice, the first with $count=-6 and the second with $count=40_000_000.

In the first case the timethese output is the following

     Benchmark: running a, b, c for at least 6 CPU seconds...
        a:  8 wallclock secs ( 7.07 usr +  0.00 sys =  7.07 CPU) @ 17313412.45/s (n=122405826)
        b:  7 wallclock secs ( 6.13 usr + -0.02 sys =  6.11 CPU) @ 12221032.41/s (n=74670508)
        c:  6 wallclock secs ( 6.39 usr +  0.00 sys =  6.39 CPU) @ 3914053.68/s (n=25010803)

In this case the real (wallclock) time is given, together with the user and system times whose sum is the CPU time counted towards the goal[12]. If the program spawns one or more child processes, the cusr and csys times are also given. The number after the @ symbol is the number of iterations per second and n is the total number of iterations, so for both figures larger is better. We can conclude that the first version of the code is the most efficient. This is seen more easily in the cmpthese output, which lists the codes in increasing order of iterations per second, together with the percentage of improvement (positive value) or worsening (negative value) with respect to each of the other options.

      
            Rate    c    b    a
     c  3914054/s   -- -68% -77%
     b 12221032/s 212%   -- -29%
     a 17313412/s 342%  42%   --

In this table the codes are ordered starting with the slowest (c in this case); Rate gives the iterations per second, and the remaining columns compare each rate with the rates of the other codes under evaluation.

If the $count argument is positive, the code is executed the number of times indicated by the argument. If this number is high enough, the results should agree with those obtained previously.

     Benchmark: timing 40000000 iterations of a, b, c...
        a:  2 wallclock secs ( 1.90 usr + -0.01 sys =  1.89 CPU) @ 21164021.16/s (n=40000000)
        b:  3 wallclock secs ( 3.25 usr +  0.00 sys =  3.25 CPU) @ 12307692.31/s (n=40000000)
        c: 10 wallclock secs (10.16 usr +  0.00 sys = 10.16 CPU) @ 3937007.87/s (n=40000000)
             Rate    c    b    a
     c  3937008/s   -- -68% -81%
     b 12307692/s 213%   -- -42%
     a 21164021/s 438%  72%   --

The output will vary even on the same box, and several runs are sometimes necessary to get a final answer. The output also varies from box to box. If the same code is run on a different computer we obtain

     
     Benchmark: running a, b, c for at least 6 CPU seconds...
        a:  7 wallclock secs ( 6.18 usr +  0.00 sys =  6.18 CPU) @ 19332920.23/s (n=119477447)
        b:  7 wallclock secs ( 7.25 usr +  0.00 sys =  7.25 CPU) @ 10521698.76/s (n=76282316)
        c:  8 wallclock secs ( 6.96 usr +  0.00 sys =  6.96 CPU) @ 4018543.53/s (n=27969063)
             Rate    c    b    a
     c  4018544/s   -- -62% -79%
     b 10521699/s 162%   -- -46%
     a 19332920/s 381%  84%   --
     Benchmark: timing 40000000 iterations of a, b, c...
        a:  2 wallclock secs ( 0.77 usr +  0.00 sys =  0.77 CPU) @ 51948051.95/s (n=40000000)
        b:  2 wallclock secs ( 2.40 usr +  0.00 sys =  2.40 CPU) @ 16666666.67/s (n=40000000)
        c:  9 wallclock secs ( 9.40 usr +  0.00 sys =  9.40 CPU) @ 4255319.15/s (n=40000000)
             Rate     c     b     a
     c  4255319/s    --  -74%  -92%
     b 16666667/s  292%    --  -68%
     a 51948052/s 1121%  212%    --

5.9.1 References

  1. Perldoc Benchmark Entry

  1. Benchmarking in techrepublic

  1. Process time


5.10 Accessing recursively files and directories in Perl

Added on November 22nd, 2012.

The easiest way in Perl to access files and directories recursively is to use the File::Find module. For example, let's assume that we want to change recursively the permissions of the contents of a given directory, in such a way that files have rw-r----- permissions and directories rwxr-x---.

We can do this with the following script, which makes use of the File::Find module.

     #!/usr/bin/perl
     #
     # script to process recursively a directory.
     # by Currix TM.
     use strict;
     use warnings;
     #
     use File::Find;
     #
     sub process_files {
       my $permission_dir = 0750;
       my $permission_file = 0640;
       if (-d $_) {
         #print "processing dir $_\n";
         chmod $permission_dir, $_;
       } elsif (-f $_) {
         #print "\tprocessing file $_\n";
         chmod $permission_file, $_;
       }
     }
     @ARGV = qw(.) unless @ARGV;
     find(\&process_files, @ARGV);

Notice that the chmod function in Perl expects the permissions as a number, written here as octal literals. Note also the lack of quotes in the definition of the permission variables: a quoted string such as '0750' would be interpreted as the decimal number 750.
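
When the permissions arrive as a string, for example from the command line, the built-in oct function converts them to the number that chmod expects. A minimal sketch on a temporary file; the subroutine name set_mode is hypothetical and the example assumes a Unix-like file system.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: apply a mode given as an octal string such as
# "0640" and return the permission bits actually set on the file.
sub set_mode {
    my ($mode_string, $path) = @_;
    my $mode = oct($mode_string);      # "0640" -> 416 decimal, i.e. 0640 octal
    chmod $mode, $path or die "chmod '$path' failed: $!";
    return (stat $path)[2] & 07777;    # keep only the permission bits
}

my ($fh, $path) = tempfile(UNLINK => 1);
printf "mode is now %04o\n", set_mode("0640", $path);
```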


5.11 Correct way of opening filehandles in Perl

Added on December 17th, 2018.

Suppose we need to open for reading a file whose name is stored in the variable $filename, using a lexical variable $fh as the filehandle, and we want to stop the program if the file cannot be opened. We should use something similar to the following example:

     use strict;
     use warnings;
      
     my $filename = 'data.txt';
     open(my $fh, '<:encoding(UTF-8)', $filename)
       or die "Could not open file '$filename' $!";
      
     while (my $row = <$fh>) {
       chomp $row;
       print "$row\n";
     }

Notice that the open function takes three arguments and that the mode argument carries the encoding in addition to the symbol < for reading.

In early Perl versions (prior to 2000) the standard way of doing this would be

     use strict;
     use warnings;
      
     my $filename = 'data.txt';
     open(IN, "<$filename")
       or die "Could not open file '$filename' $!";
      
     while (my $row = <IN>) {
       chomp $row;
       print "$row\n";
     }

In this case only two arguments are given to open. The filehandle is IN, a global bareword that is not caught by the use strict pragma. The problems with this old-fashioned way are

  1. IN is global to all the script with possible clashes.

  1. It is difficult to pass the filehandle to functions and subroutines as an argument.

  1. With only two parameters, the mode and the file name travel in the same string, so the contents of the filename variable can cause unexpected side effects. For example, if the mode prefix is omitted, as in open(IN, $filename), defining $filename = ">/etc/crontab" makes it possible to overwrite this file (hopefully you do not run your scripts as root...).

The same applies when opening files for writing.
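
For writing, the three-argument form only changes the mode to >. The sketch below also shows how a lexical filehandle is passed to a subroutine; the subroutine name write_rows and the file name are hypothetical, and the file is created in a temporary directory so that nothing existing is overwritten.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: a lexical filehandle is an ordinary scalar and can
# be passed to a subroutine like any other argument.
sub write_rows {
    my ($fh, @rows) = @_;
    print {$fh} "$_\n" for @rows;
    return scalar @rows;
}

my $dir      = tempdir(CLEANUP => 1);
my $filename = "$dir/data.txt";

# Three-argument open for writing, with an explicit encoding
open(my $out, '>:encoding(UTF-8)', $filename)
  or die "Could not open file '$filename' for writing: $!";
write_rows($out, 'first row', 'second row');
close($out) or die "Could not close file '$filename': $!";

# Read the file back to check its contents
open(my $in, '<:encoding(UTF-8)', $filename)
  or die "Could not open file '$filename': $!";
chomp(my @lines = <$in>);
close($in);
print scalar(@lines), " rows read back\n";
```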


5.11.1 References

  1. PerlMaven.com



Curro Perez-Bernal mailto:francisco.perez@dfaie.uhu.es