This chapter has been translated into Spanish by Maria Ramos from Webhostinghub.com/support/edu.
There are two possible ways to install a CPAN module. We give both alternatives.
Download the module from CPAN (we use Devel-SmallProf-2.02 as an example) and untar it:
# tar xzvf Devel-SmallProf-2.02.tar.gz
# cd Devel-SmallProf-2.02
Devel-SmallProf-2.02# ls
Changes MANIFEST META.yml Makefile.PL README TODO lib t
Compile, test and install the module:
Devel-SmallProf-2.02# perl Makefile.PL
Checking if your kit is complete...
Looks good
Writing Makefile for Devel::SmallProf
Devel-SmallProf-2.02# make
cp lib/Devel/SmallProf.pm blib/lib/Devel/SmallProf.pm
Manifying blib/man3/Devel::SmallProf.3pm
Devel-SmallProf-2.02# make test
PERL_DL_NONLAZY=1 /usr/bin/perl "-MExtUtils::Command::MM" "-e" "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t
t/part1....ok
t/part2....ok
t/part3....ok
t/part4....ok
t/pods.....skipped
        all skipped: Only the author needs to check that POD docs are right
All tests successful, 1 test skipped.
Files=5, Tests=14, 1 wallclock secs ( 0.08 cusr + 0.01 csys = 0.09 CPU)
Devel-SmallProf-2.02# make install
Installing /usr/local/share/perl/5.8.8/Devel/SmallProf.pm
Installing /usr/local/man/man3/Devel::SmallProf.3pm
Writing /usr/local/lib/perl/5.8.8/auto/Devel/SmallProf/.packlist
Appending installation info to /usr/local/lib/perl/5.8.8/perllocal.pod
The second alternative uses the interactive CPAN shell, which we launch as
# perl -MCPAN -e shell
The first time the shell is launched, it has to be configured and updated. The configuration asks for several programs (unzip, etc.); install them if they are not already installed on the computer. Select a repository (in my case ftp://ftp.rediris.es/mirror/CPAN). Most of the remaining questions can safely be answered with the default choice.
The first thing to do after configuration is to upgrade the CPAN module itself:
cpan> install Bundle::CPAN
CPAN: Storable loaded ok
Fetching with LWP: ftp://ftp.rediris.es/mirror/CPAN/authors/01mailrc.txt.gz
Going to read /root/.cpan/sources/authors/01mailrc.txt.gz
CPAN: Compress::Zlib loaded ok
Fetching with LWP: ftp://ftp.rediris.es/mirror/CPAN/modules/02packages.details.txt.gz
.
.
.
Then reload it:
cpan> reload cpan
Finally, install the required module, Math::Function::Roots in this example:
cpan> install Math::Function::Roots
.
.
.
Appending installation info to /usr/lib/perl/5.8/perllocal.pod
/usr/bin/make install -- OK
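After using either method, you can quickly check that a module loads by trying to `require` it. The following is a minimal sketch. The helper name `module_version` is hypothetical, and the two default module names are simply the ones used above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the module version (or 'unknown' if the
# module defines none) when the module can be loaded, and undef otherwise.
sub module_version {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # Devel::SmallProf -> Devel/SmallProf.pm
    return undef unless eval { require $file; 1 };
    return $module->VERSION // 'unknown';
}

# Check the modules given on the command line, or the two used above.
for my $module (@ARGV ? @ARGV : qw(Devel::SmallProf Math::Function::Roots)) {
    my $version = module_version($module);
    print defined $version
        ? "$module $version is installed\n"
        : "$module is not installed\n";
}
```

The one-liner `perl -MDevel::SmallProf -e 1` does a similar check: it exits silently if the module loads and prints an error if it does not.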
Updated on March 5th, 2017.
Updated on October 29th, 2016.
Updated on May 21st, 2015.
Updated on January 31st, 2015.
Executing a program, in this case epstopdf, using as input each of the files in the current directory that share a common extension, in this case eps.
perl -e 'system "epstopdf $_" for (glob "*.eps");'
Erasing "phantom" files of size zero in the current directory.
perl -e 'foreach (glob "*") {if (-f $_ and -z _) {print "Deleting $_\n"; unlink $_ or warn "Cannot delete $_: $!\n";}}'
Listing the PostScript files referenced in the LaTeX output messages.
latex filename.tex | perl -e 'while (<>) {foreach (split) {/<(.*?\.eps)>/ and push(@eps, $1)}}; print "$_\n" foreach sort @eps'
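Printing an apostrophe character. Inside a shell single-quoted string, the sequence '\'' closes the quote, adds an escaped apostrophe and reopens the quote.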
perl -le 'print "'\'' is an apostrophe..."'
Changing a text file from UTF-8 encoding to ASCII. Note that every non-ASCII character, including Spanish accented letters, is replaced by an x.
perl -ne 'for (unpack "U*", $_) { printf $_ > 127 ? "x" : "%c", $_ }' fileUTF.txt > fileASCII.txt
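The one-liner above maps every non-ASCII character to x. If you want Spanish accented letters to stay readable, one alternative (not the method used above) is to decompose each character with the core Unicode::Normalize module and drop the combining accent marks. Below is a minimal sketch, assuming the input files are UTF-8 encoded. The helper name `to_ascii` is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD);

# Hypothetical helper: turn a decoded Unicode string into plain ASCII.
sub to_ascii {
    my ($text) = @_;
    $text = NFD($text);              # split e.g. an accented e into "e" + combining accent
    $text =~ s/\p{Mn}//g;            # remove the combining marks (accents, tilde)
    $text =~ s/[^\x00-\x7F]/x/g;     # anything still non-ASCII becomes x
    return $text;
}

# Convert the UTF-8 files given on the command line, writing to STDOUT.
for my $file (@ARGV) {
    open(my $fh, '<:encoding(UTF-8)', $file) or die "Cannot open '$file': $!";
    print to_ascii($_) while <$fh>;
    close($fh);
}
```

If the script is saved, for example, as to_ascii.pl, it can be used as `perl to_ascii.pl fileUTF.txt > fileASCII.txt`. Characters with no decomposition, such as the euro sign, still become x.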
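Adding the missing E character to the output of Fortran programs when the exponent has three digits. In that case Fortran writes, for example, 0.1234-100 instead of 0.1234E-100. A backup copy of the original file is kept with extension .bak.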
perl -pi'.bak' -e 's/(\d)-(\d\d\d)/$1E-$2/g' fort.output
Deleting empty lines in a file.
perl -ni -e 'chomp; print "$_\n" if length' test.dat
Making a substitution in several files. In this example the directory name figures/ is replaced by Figures/ in all files with extension tex in the current directory, keeping a backup copy of each file with extension .bak.
perl -pi'.bak' -e 's/figures\//Figures\//g' *.tex
Counting the number of words per line in a file. If the file name is test.out, then
perl -n -e 'my @line = split;print scalar @line,"\n"' test.out
10000
202
10000
202
10000
202
10000
202
10000
202
Joining each even line to the end of the preceding odd line in a file. If the file name is test.out, we can count the number of words in each line before and after the merge.
perl -n -e 'my @line = split;print scalar @line,"\n"' test.out
10000
202
10000
202
10000
202
perl -n -e 'chomp; $.%2 ? print "$_ ": print "$_\n";' test.out > test2.out
perl -n -e 'my @line = split;print scalar @line,"\n"' test2.out
10202
10202
10202
Searching a log file (/var/log/loginlog.0 in this example) for successful logins of user curro, showing the number of times the user has logged in from each machine.
perl -e 'while (<>) { if (m|\d+:\d+:\d+\s+(.*?)\s+.*ccepted.*curro\s+from\s+(.*?)\s+.*|) {$vh{"$1 from $2"}++;} } foreach (keys %vh) {print "$vh{$_} login(s) to $_\n";}' /var/log/loginlog.0
Counting the number of files in a directory.
In the first example we use globbing to count either the total number of files, hidden files included, or only the files that satisfy some restriction (here, names starting with 1 and ending in gif).
$ perl -e 'my @files = glob "* .*"; print 1+$#files."\n"'
129
$ perl -e 'my @files = glob "1*gif"; print 1+$#files."\n"'
9
The same can be accomplished using a directory handle and grep:
$ perl -e 'opendir DH, ".";my @files = readdir DH; print 1+$#files."\n"'
129
$ perl -e 'opendir DH, ".";my @files = grep /^1.*\.gif$/, (readdir DH); print 1+$#files."\n"'
9
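The readdir counts above include the special entries . and .. as well as any subdirectories. The following small sketch excludes the two special entries and, separately, counts only regular files. As in the one-liners, it assumes the directory of interest is the current one.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $dir = '.';
opendir(my $dh, $dir) or die "Cannot open directory '$dir': $!";
# Skip the special entries '.' and '..' returned by readdir.
my @entries = grep { $_ ne '.' && $_ ne '..' } readdir($dh);
closedir($dh);

# Keep only regular files; names are relative to $dir, so prepend it.
my @files = grep { -f "$dir/$_" } @entries;

# An array in scalar context gives its number of elements.
printf "%d entries, %d regular files\n", scalar(@entries), scalar(@files);
```

Using `scalar(@files)` gives the same value as `1+$#files` but is easier to read.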
Getting the last line of a file.
The following one-liner prints the last line of each file in a series of files:
perl -e 'foreach (@ARGV) {my $line = `tail -n 1 $_`; print $line}' output_notes_1* output_notes_2* output_notes_3* ...
In the original application each line was prefixed with a two-digit number taken from the file name, as follows:
perl -e 'foreach (@ARGV) {/.*(\d\d).*/;my $line = $1.`tail -n 1 $_`; print $line}' output_notes_1* output_notes_2* output_notes_3* ...
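Calling tail through backticks starts one external process per file. As an alternative (not the approach used above), the following sketch reads the last line in pure Perl by keeping the most recently read line. It reads each file completely, so it is best suited to small or moderate files. The helper name `last_line` is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the last line of a file (undef if empty).
sub last_line {
    my ($file) = @_;
    open(my $fh, '<', $file) or die "Cannot open '$file': $!";
    my $last;
    $last = $_ while <$fh>;    # keep overwriting; the final value is the last line
    close($fh);
    return $last;
}

for my $file (@ARGV) {
    my $line = last_line($file);
    print $line if defined $line;
}
```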
Changing Mac carriage returns to UNIX newlines.
Let's assume we have a bunch of csv files whose lines end with (classic) Mac carriage returns, so that our system interprets each file as a single very long line. Instead of fixing them with emacs, we can easily do it with
perl -pi -e 's/\r/\n/g' *.csv
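The substitution above assumes classic Mac files, where \r alone ends each line. Windows files end lines with \r\n instead, and s/\r/\n/g would then leave a blank line between records. A slightly more general substitution handles both cases. It is shown here as a small hypothetical helper; the equivalent one-liner is `perl -pi -e 's/\r\n?/\n/g' *.csv`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: convert CR (classic Mac) and CRLF (Windows)
# line endings to LF (UNIX). "\r\n?" consumes "\r\n" as one unit,
# so a Windows line ending becomes a single "\n", not two.
sub to_unix_newlines {
    my ($text) = @_;
    $text =~ s/\r\n?/\n/g;
    return $text;
}

print to_unix_newlines("a,b\rc,d\r");       # classic Mac input
print to_unix_newlines("e,f\r\ng,h\r\n");   # Windows input
```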
Extracting the figure names from the LaTeX compilation output and preparing a tarball with the figure files.
We assume that we compile a file called rdiary_2014.tex and that all figures are png files stored in a directory called Figs. We use two pipes. The first connects the output of pdflatex to a Perl one-liner that reads the standard input and extracts the file names. The second sends these file names to tar. Notice the -T - option, which makes tar read the list of files to archive from the standard input.
pdflatex rdiary_2014.tex | perl -e 'while (<>) {print "$1\n" if /<.*(F.*png).*>/g}' | tar czf figs.tgz -T -
We have two files, each with two columns of data in an X Y format: File_1 with X1 Y1 and File_2 with X2 Y2. The abscissas are common, and we want to create a third file, called File_3, with three columns in the format X1 Y1 Y2.
perl -e '@fhs=map { open my $fh, "<", $_; $fh } @ARGV; $f0 = $fhs[0]; $f1 = $fhs[1]; while (!$done) { $done=1; chomp($l0=<$f0>); $l1=<$f1>; do { @ll1 = split " ", $l1; print "$l0 $ll1[-1] \n"; $done=0 } if (defined $l0); } ' File_1 File_2 > File_3
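The same merge can also be written as a short script, which may be easier to read and adapt. Below is a minimal sketch, assuming both files have the same number of lines and whitespace-separated columns. Like the one-liner, it prints the whole line of the first file followed by the last column of the second. The helper name `merge_columns` is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read two filehandles in lockstep and print
# "line-of-first-file last-column-of-second-file" to $out.
sub merge_columns {
    my ($f1, $f2, $out) = @_;
    while (defined(my $line1 = <$f1>)) {
        my $line2 = <$f2>;
        last unless defined $line2;        # stop if the second file is shorter
        chomp $line1;
        my @cols2 = split ' ', $line2;     # split ' ' also skips leading blanks
        print {$out} "$line1 $cols2[-1]\n";   # X1 Y1 Y2
    }
}

# Usage: perl merge.pl File_1 File_2 > File_3
if (@ARGV == 2) {
    my ($name1, $name2) = @ARGV;
    open(my $f1, '<', $name1) or die "Cannot open '$name1': $!";
    open(my $f2, '<', $name2) or die "Cannot open '$name2': $!";
    merge_columns($f1, $f2, \*STDOUT);
}
```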
The following short script helps to identify the character encoding used by a terminal.
#!/usr/bin/perl
use warnings;
use strict;
use Encode;
my @charsets = qw(utf-8 latin1 iso-8859-15 utf-16);
# some non-ASCII codepoints:
my $test = 'Ue: ' . chr(220) .'; Euro: '. chr(8364) . "\n";
#
for (@charsets){print "$_: " . encode($_, $test);}
When the script is run, it prints one line per character set. The terminal encoding is the one whose line is displayed correctly. For example, if we run the script in a terminal that uses UTF-8, the output is similar to the following:
$ encodings.plx
utf-8: Ue: Ü; Euro: €
latin1: Ue: �; Euro: ?
iso-8859-15: Ue: �; Euro: �
utf-16: ��Ue: �; Euro: �
Note that both special characters, Ü and the euro sign, appear correctly only in the utf-8 line.
The Perl function chr takes a number as an argument and returns the character with that code point; for example, chr(220) is Ü and chr(8364) is the euro sign. The function encode, from the Encode module, converts a character string into its byte representation in a given character set. The example above includes four of the most common character sets.
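To see what encode actually produces, the following sketch prints the bytes of the euro sign in hexadecimal for three of the character sets above. It should print E282AC for UTF-8 (three bytes) and A4 for ISO-8859-15. For latin1, which has no euro sign, it should print 3F: the ? substitution character that also appears in the terminal output above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode);

my $euro = chr(8364);                 # code point U+20AC, the euro sign
printf "ord: %d\n", ord($euro);       # ord is the inverse of chr: 8364

# encode() returns the bytes that represent the string in each charset;
# unpack 'H*' shows those bytes as a hexadecimal string.
for my $charset (qw(utf-8 iso-8859-15 latin1)) {
    my $bytes = encode($charset, $euro);
    printf "%-12s %s\n", $charset, uc unpack('H*', $bytes);
}
```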
Another problem with character sets arises when sorting alphabetically a set of words that contain characters outside the standard 128-character ASCII set. For example, suppose that we want to sort alphabetically the following set of names:
Álvarez
Mínguez
Pérez
Perales
Pilar
Mola
Borrero
Díaz
Diz
Delgado
Cuesta
Castro
Cáñamo
A standard program, which compares the strings with the cmp operator, is the following:
#!/usr/bin/perl
use strict;
use warnings;
#
my @names;
#
while (defined(my $line = <>)) {
    chomp($line);
    my $elem = push(@names,$line);
    print "$elem element(s) added\n";
}
#
print "Reading process finished. Sorting ... ";
#
print "Done.\n\n";
print "Sorted set of names:\n";
foreach (sort by_name @names) {
    print "\t$_\n";
}
#
sub by_name {$a cmp $b}
However, when we run the program we obtain the somewhat surprising output
Sorted set of names:
	Borrero
	Castro
	Cuesta
	Cáñamo
	Delgado
	Diz
	Díaz
	Mola
	Mínguez
	Perales
	Pilar
	Pérez
	Álvarez
Clearly this is not the expected output if we intend to sort alphabetically using Spanish sorting rules. This happens because the cmp operator compares non-ASCII characters by their code point numbers[9], which may give unexpected results. To sort according to the conventions of a particular language, we should use the locale pragma. The previous program can be rewritten as follows:
#!/usr/bin/perl
use strict;
use warnings;
#
##########
use locale;
use POSIX qw(locale_h);
setlocale(LC_COLLATE, 'es_ES@euro') or die "Locale es_ES\@euro not installed.\n";
##########
#
#
my @names;
#
while (defined(my $line = <>)) {
    chomp($line);
    my $elem = push(@names,$line);
    print "$elem element(s) added\n";
}
#
print "Reading process finished. Sorting ... ";
#
print "Done.\n\n";
print "Sorted set of names:\n";
foreach (sort by_name @names) {
    print "\t$_\n";
}
#
sub by_name {$a cmp $b}
After this change the names are sorted in the usual Spanish order:
Sorted set of names:
	Álvarez
	Borrero
	Cáñamo
	Castro
	Cuesta
	Delgado
	Díaz
	Diz
	Mínguez
	Mola
	Perales
	Pérez
	Pilar
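The locale approach depends on which locales are installed on the system and on the encoding of the input. A possible alternative, not used in the original program, is the Unicode::Collate::Locale module, shipped with recent Perl distributions as part of Unicode::Collate. It implements the Unicode Collation Algorithm with language-specific tailorings. Below is a minimal sketch, assuming UTF-8 input files with one name per line.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Unicode::Collate::Locale;

binmode STDOUT, ':encoding(UTF-8)';

# Collator with the Spanish ('es') tailoring: n-tilde sorts as a
# separate letter after n, and accents are secondary differences.
my $collator = Unicode::Collate::Locale->new(locale => 'es');

# Read the names (one per line) from the files given on the command line.
my @names;
for my $file (@ARGV) {
    open(my $fh, '<:encoding(UTF-8)', $file) or die "Cannot open '$file': $!";
    while (my $line = <$fh>) {
        chomp $line;
        push @names, $line;
    }
    close($fh);
}

print "Sorted set of names:\n";
print "\t$_\n" for $collator->sort(@names);
```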
Extracting the parts of a string that match a regular expression can easily be done using the grouping metacharacters '()'. Each group marked by parentheses is stored in a special variable, $1, $2, etc., which can then be used as an ordinary variable.
If we want to extract the day, month and year from a date expressed as dd/mm/yyyy, we can do the following:
# extract day, month, year
if ($date =~ m!(\d\d)/(\d\d)/(\d\d\d\d)!) {   # match dd/mm/yyyy format
    $day   = $1;
    $month = $2;
    $year  = $3;
}
Note the use of the pattern match operator m!! to change the standard pattern delimiters, which avoids escaping the slashes in the date. The previous code can be written in a shorter form by taking advantage of the different behavior of the binding operator in scalar and list contexts.
In scalar context the binding operator returns a true or false value.
$answer = $date =~ m!(\d\d)/(\d\d)/(\d\d\d\d)!;
Thus $answer is 1 (true) if the date matches and the empty string (false) otherwise. In list context, however, the binding operator returns the list of captured values ($1, $2, $3, ...). Thus we can abbreviate the previous code as
($day,$month,$year) = ($date =~ m!(\d\d)/(\d\d)/(\d\d\d\d)!);
If the groups in a regexp are nested, $1 holds the group with the leftmost opening parenthesis, $2 the group with the next opening parenthesis, and so on.
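The following self-contained sketch puts these rules together. The date value is an arbitrary example. It shows list-context capture, the numbering of nested groups by their opening parenthesis, and the empty list returned by a failed match.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $date = '25/12/2019';   # example value

# List context: the match returns the captured groups ($1, $2, $3).
my ($day, $month, $year) = $date =~ m!(\d\d)/(\d\d)/(\d\d\d\d)!;
print "day=$day month=$month year=$year\n";   # day=25 month=12 year=2019

# Nested groups are numbered by their opening parenthesis, left to right:
# $1 is the outer group "25/12", $2 is "25", $3 is "12", $4 is "2019".
if ($date =~ m!((\d\d)/(\d\d))/(\d\d\d\d)!) {
    print "$1 | $2 | $3 | $4\n";              # 25/12 | 25 | 12 | 2019
}

# A failed match returns an empty list, so the variables stay undef.
my ($d, $m, $y) = 'not a date' =~ m!(\d\d)/(\d\d)/(\d\d\d\d)!;
print defined $d ? "matched\n" : "no match\n";   # no match
```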
For more information: man perlretut.
Basic use of fork to launch a program
Apart from the system function, a Perl script can launch child processes using fork. Let's assume that we want to launch two applications, called fort_1 and fort_2, from a script, and that the script should continue without waiting for them to finish. Using fork we can do the following in our script:
defined(my $pid0 = fork) or die "Cannot fork: $!";
unless ($pid0) {
    # Child 0 process is here
    exec "fort_1";
    die "cannot exec fort_1: $!";
}
defined(my $pid1 = fork) or die "Cannot fork: $!";
unless ($pid1) {
    # Child 1 process is here
    exec "fort_2";
    die "cannot exec fort_2: $!";
}
print "Program output: \n";
# Parent processes are here
# script continues ...
waitpid($pid0, 0);
waitpid($pid1, 0);
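fork returns 0 in the child and the pid of the new child in the parent. Therefore only the parent process has non-zero values in $pid0 and $pid1, so only the parent skips the two unless blocks. In each child, exec replaces the process with the external program and returns (reaching die) only if it fails. The parent eventually reaches the waitpid calls. This function waits for a particular child process to terminate and returns the pid of that process. Calling it is important because it prevents zombie processes.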
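The parent can also find out how each child ended. After waitpid returns, the child's exit status is stored in the special variable $?, and $? >> 8 gives the value passed to exit. In this self-contained sketch, each child exits with a different status instead of calling exec; this is a hypothetical stand-in for fort_1 and fort_2.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %status_of;    # pid => exit status, filled by the parent
my @pids;
for my $code (3, 5) {
    defined(my $pid = fork) or die "Cannot fork: $!";
    if ($pid == 0) {
        # Child: a real script would exec "fort_1" (or "fort_2") here.
        exit $code;
    }
    push @pids, $pid;    # parent: remember the child's pid
}

# Parent: reap every child (avoiding zombies) and decode its status.
for my $pid (@pids) {
    my $reaped = waitpid($pid, 0);
    $status_of{$reaped} = $? >> 8;
    print "child $reaped exited with status $status_of{$reaped}\n";
}
```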
For more information: man perlipc.
Perl predefined variables. Some examples.
Apart from the ubiquitous Perl default variable, $_, there is a large number of useful predefined variables. We give some examples of them in the following short pieces of code.
$.
Current line number for the last filehandle accessed.
The following code displays each line of the file and the corresponding line number.
#
open(INPUT,"</etc/motd") or die "/etc/motd: $!";
#
while (<INPUT>) {
    print "Line $.: $_";
}
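A self-contained variant of the same idea reads from an in-memory string instead of /etc/motd, using a lexical filehandle. Opening a reference to a scalar as if it were a file has been supported since Perl 5.8.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# In-memory "file" with three lines (a stand-in for /etc/motd).
my $text = "first line\nsecond line\nthird line\n";
open(my $fh, '<', \$text) or die "Cannot open in-memory file: $!";

my @seen;                         # line numbers, kept for checking
while (<$fh>) {
    print "Line $.: $_";          # $. is the current line number of $fh
    push @seen, $.;
}
close($fh);
```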
$0
Name of the program being executed.
The following code removes the directories preceding the program name and stores the result in a variable called $prgname:
#
(my $prgname = $0) =~ s#.*/##;
#
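Instead of a regular expression, you can use the basename function of the core File::Basename module. It also handles program names without a directory part, and dirname returns the directory part.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename qw(basename dirname);

# basename() strips the leading directories from the program path in $0;
# dirname() returns them ('.' if there are none).
my $prgname = basename($0);
my $prgdir  = dirname($0);
print "Program name: $prgname (directory: $prgdir)\n";
```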
For more information: man perlvar.
Perl
A named pipe (or FIFO file) can be used for interprocess communication between a parent process and one or more child processes. Let's suppose that we forked and launched a couple of child processes (see Basic use of fork to launch a program, Section 5.5). We want to know when each child process finishes, so that we can call waitpid for it as soon as it ends. Calling waitpid directly is not efficient, because we do not know which of the processes will finish first.
The following commented code does this: it launches two child processes and waits for each of them to finish.
#!/usr/bin/perl
#
# named pipe use for ipc example
#
# by Currix TM
#
use strict;
use warnings;
use POSIX qw(mkfifo);
#
# fifo definition
my $FIFOname = ".prgfifo";
unless (-p $FIFOname) {   # Create the pipe if it doesn't exist
    unlink $FIFOname;
    mkfifo($FIFOname, 0700) or die "mkfifo in the current directory failed: $!";
}
#
my @pid;
#
print "This is the parent process before forking with pid $$\n";
#
defined ($pid[0]=fork) or die "Cannot fork (1): $!";
#
#
unless ($pid[0]) {
    print "fork1 pid: $pid[0]\n";
    print "fork1 ps: $$\n";
    sleep 10;   # Sleeeeeeeping
    system "cat /etc/motd";
    # Child process ended. Write process number (one per line) in the FIFO
    open (FIFO, ">$FIFOname") || die "can't write prgfifo: $!";
    print FIFO "$$\n";
    sleep 2;    # to avoid dup signals
    print "Exiting child 1\n";
    exit(0);    # Remember to cleanly close the child process
} else {
    print "This is the parent process after forking 1 with pid: $pid[0]\n";
}
#
defined ($pid[1]=fork) or die "Cannot fork (2): $!";
#
unless ($pid[1]) {
    print "fork2 pid: $pid[1]\n";
    print "fork2 ps: $$\n";
    sleep 5;    # Sleeeeeeeping
    system "cat /etc/fstab";
    # Child process ended. Write process number (one per line) in the FIFO
    open (FIFO, ">$FIFOname") || die "can't write prgfifo: $!";
    print FIFO "$$\n";
    sleep 1;    # to avoid dup signals
    print "Exiting child 2\n";
    exit(0);    # Remember to cleanly close the child process
} else {
    print "This is the parent process after forking 2 with pid: $pid[1]\n";
}
#
print "These are the pids from the parent process after forking: $pid[0], $pid[1]\n";
#
my $iprocess = 0;
open (FIFO, "<$FIFOname") || die "can't read prgfifo: $!";
while (1) {
    my $kidpid = <FIFO>;
    if (defined $kidpid) {
        chomp $kidpid;
        print "child process $kidpid ended\n";
        sleep 2;    # To avoid dup signals again
        waitpid($kidpid, 0);
        last if ((++$iprocess) == 2);
    }
}
#
print "The two child processes have finished. Closing the parent process.\n";
#
unlink("$FIFOname");   # Remove the named pipe
For more information: man perlipc and references below.
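If the only goal is to reap each child as soon as it finishes, in whatever order, another option is waitpid(-1, 0). This is not the FIFO approach used above. The call blocks until any child process terminates and returns its pid, or returns -1 when no child processes are left. Below is a minimal sketch; the sleep times are hypothetical stand-ins for the real work of each child.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %name_of;                          # pid => label, for reporting
for my $seconds (2, 1) {              # the second child finishes first
    defined(my $pid = fork) or die "Cannot fork: $!";
    if ($pid == 0) {
        sleep $seconds;               # stand-in for the child's real work
        exit 0;
    }
    $name_of{$pid} = "child sleeping ${seconds}s";
}

# waitpid(-1, 0) returns whichever child ends first; -1 means none are left.
my @finished;
while ((my $pid = waitpid(-1, 0)) > 0) {
    push @finished, $name_of{$pid};
    print "$name_of{$pid} (pid $pid) ended\n";
}
```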
Emacs
CPerlMode can be set as the default mode for editing Perl files by adding the following line to the .emacs configuration file:
(defalias 'perl-mode 'cperl-mode)
To access the documentation about the mode, use the describe-mode function by typing C-h m when in CPerlMode. When not in CPerlMode, use M-x describe-function RET cperl-mode or C-h f cperl-mode.
Perl
to benchmark code.
The Benchmark module, included in the core Perl distribution, provides a series of functions to benchmark the running time of code.
Some of the available functions are the following[10]:
timethis: run a chunk of code several times.
timethese: run several chunks of code several times.
cmpthese: print results of timethese as a comparison chart (*).
timeit: run a chunk of code and see how long it goes.
countit: see how many times a chunk of code runs in a given time (*).
The functions marked with an asterisk (*) are not exported by default and must be explicitly imported.
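For instance, cmpthese and countit must be requested in the use statement. The :DEFAULT tag keeps the default exports, such as timethese and timestr. Below is a minimal sketch; the code being timed is an arbitrary example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(:DEFAULT cmpthese countit);   # default exports plus two optional ones

my $x = 3.1;

# countit(TIME, CODE): run CODE for at least TIME CPU seconds and
# return a Benchmark object; iters() gives the number of iterations.
my $t = countit(1, sub { my $y = $x * $x });
printf "%d iterations in about 1 CPU second\n", $t->iters;
print "Details: ", timestr($t), "\n";
```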
Two of the most useful functions are timethese and cmpthese.
The timethese function runs several chunks of code several times. The syntax is
timethese($count, {
    'Name1' => sub { ...code1... },
    'Name2' => sub { ...code2... },
});
If the argument $count is a positive integer, it gives the number of times the code is run; a negative integer indicates the minimum number of CPU seconds to run[11]. The minimum in this case is 0.1 sec. If $count is zero, a default value of 3 CPU seconds is assumed.
timethese returns a reference to a hash of Benchmark objects, one per chunk of code, which can be used as input for cmpthese.
We apply this to the following example, comparing different ways of calculating the square of a number.
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw( timethese cmpthese ) ;
my $x = 3.1;
my $CNT = -6;
my $r = timethese( $CNT, {
    a => sub{$x*$x},
    b => sub{$x**2},
    c => sub{exp(2*log($x))}
} );
cmpthese $r;
$CNT=40_000_000;
$r = timethese( $CNT, {
    a => sub{$x*$x},
    b => sub{$x**2},
    c => sub{exp(2*log($x))}
} );
cmpthese $r;
The benchmark is run twice, first with $count=-6 and then with $count=40_000_000.
In the first case the timethese output is the following
Benchmark: running a, b, c for at least 6 CPU seconds...
a: 8 wallclock secs ( 7.07 usr + 0.00 sys = 7.07 CPU) @ 17313412.45/s (n=122405826)
b: 7 wallclock secs ( 6.13 usr + -0.02 sys = 6.11 CPU) @ 12221032.41/s (n=74670508)
c: 6 wallclock secs ( 6.39 usr + 0.00 sys = 6.39 CPU) @ 3914053.68/s (n=25010803)
In this case the output gives the real (wallclock) time and the CPU time, split into time spent in user mode (usr) and in system mode (sys)[12]. If the program spawns one or more child processes, the cusr and csys times are also given. The number after the @ symbol is the number of iterations per second, and n is the total number of iterations; for both, larger is better. We can conclude that the first version of the code, a, is the most efficient. This is easier to see in the cmpthese output. It lists the codes in increasing order of iterations per second, together with the percentage by which each one is faster (positive value) or slower (negative value) than each of the other options.
         Rate     c     b     a
c  3914054/s    --  -68%  -77%
b 12221032/s  212%    --  -29%
a 17313412/s  342%   42%    --
The codes are ordered from the slowest (c in this case) to the fastest. The Rate column gives the iterations per second. Each remaining column gives the percentage difference between the rate of the code in that row and the rate of the code in that column.
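The percentages can be recomputed from the rates. Each entry appears to be the ratio of the row rate to the column rate, minus one, expressed as a percentage. For example, a against c gives 17313412/3914054 - 1, or about 3.42, which is 342%. The following short check uses the rates of the first run, copied from the output above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Iterations per second from the first timethese run above.
my %rate  = (a => 17313412, b => 12221032, c => 3914054);
my @order = sort { $rate{$a} <=> $rate{$b} } keys %rate;   # slowest first: c b a

my %pct;
for my $row (@order) {
    for my $col (@order) {
        next if $row eq $col;
        my $key = "$row/$col";
        # Positive: row is faster than column; negative: slower.
        $pct{$key} = sprintf '%.0f%%', 100 * ($rate{$row} / $rate{$col} - 1);
        print "$row vs $col: $pct{$key}\n";
    }
}
```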
If the $count argument is positive, the code is executed the number of times indicated by the argument. If this number is high enough, the results should be similar to those obtained previously:
Benchmark: timing 40000000 iterations of a, b, c...
a: 2 wallclock secs ( 1.90 usr + -0.01 sys = 1.89 CPU) @ 21164021.16/s (n=40000000)
b: 3 wallclock secs ( 3.25 usr + 0.00 sys = 3.25 CPU) @ 12307692.31/s (n=40000000)
c: 10 wallclock secs (10.16 usr + 0.00 sys = 10.16 CPU) @ 3937007.87/s (n=40000000)
         Rate     c     b     a
c  3937008/s    --  -68%  -81%
b 12307692/s  213%    --  -42%
a 21164021/s  438%   72%    --
The output varies between runs even on the same machine, and several runs are sometimes needed to reach a conclusion. The output also varies from machine to machine. If the same code is run on a different computer we obtain
Benchmark: running a, b, c for at least 6 CPU seconds...
a: 7 wallclock secs ( 6.18 usr + 0.00 sys = 6.18 CPU) @ 19332920.23/s (n=119477447)
b: 7 wallclock secs ( 7.25 usr + 0.00 sys = 7.25 CPU) @ 10521698.76/s (n=76282316)
c: 8 wallclock secs ( 6.96 usr + 0.00 sys = 6.96 CPU) @ 4018543.53/s (n=27969063)
         Rate     c     b     a
c  4018544/s    --  -62%  -79%
b 10521699/s  162%    --  -46%
a 19332920/s  381%   84%    --
Benchmark: timing 40000000 iterations of a, b, c...
a: 2 wallclock secs ( 0.77 usr + 0.00 sys = 0.77 CPU) @ 51948051.95/s (n=40000000)
b: 2 wallclock secs ( 2.40 usr + 0.00 sys = 2.40 CPU) @ 16666666.67/s (n=40000000)
c: 9 wallclock secs ( 9.40 usr + 0.00 sys = 9.40 CPU) @ 4255319.15/s (n=40000000)
         Rate     c     b     a
c  4255319/s    --  -74%  -92%
b 16666667/s  292%    --  -68%
a 51948052/s 1121%  212%    --
Perl
Added on November 22nd, 2012.
The easiest way in Perl to access files and directories recursively is to use the File::Find module. For example, let's assume that we want to change, recursively, the permissions of the contents of a given directory so that files get rw-r----- permissions and directories get rwxr-x---.
We can do this with the following script:
#!/usr/bin/perl
#
# script to process recursively a directory.
# by Currix TM.
use strict;
use warnings;
#
use File::Find;
#
sub process_files {
    my $permission_dir = 0750;    # rwxr-x---
    my $permission_file = 0640;   # rw-r-----
    if (-d $_) {
        #print "processing dir $_\n";
        chmod $permission_dir, $_;
    } elsif (-f $_) {
        #print "\tprocessing file $_\n";
        chmod $permission_file, $_;
    }
}
@ARGV = qw(.) unless @ARGV;
find(\&process_files, @ARGV);
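Notice that the chmod function in Perl expects the permissions as a number, which is most conveniently written as an octal literal (with a leading 0). Note also the lack of quotes in the definition of the permission variables: a string such as '0750' would be converted to the decimal number 750, not to the intended octal value.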
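The following sketch shows the difference between an octal literal and a quoted string. If the permissions come from a string, for example read from a configuration file, the oct function converts the string to the intended octal value.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $from_literal = 0750;               # octal literal: 7*64 + 5*8 = 488
my $from_string  = '0750';             # a string: numifies to decimal 750!
my $converted    = oct($from_string);  # oct() parses it as octal: 488

printf "literal=%d string=%d oct=%d\n", $from_literal, $from_string, $converted;
printf "back to octal: %o\n", $converted;   # prints 750
```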
Perl
Added on December 17th, 2018.
To open for reading a file whose name is stored in the variable $filename, using a lexical variable $fh as the filehandle and stopping the program if the file cannot be opened, we can use something similar to the following example:
use strict;
use warnings;
my $filename = 'data.txt';
open(my $fh, '<:encoding(UTF-8)', $filename)
    or die "Could not open file '$filename' $!";
while (my $row = <$fh>) {
    chomp $row;
    print "$row\n";
}
Notice that open is called with three arguments. Instead of only the < symbol for reading, the mode also specifies the encoding of the file.
In early Perl versions (prior to 2000) the standard way of doing this would be
use strict;
use warnings;
my $filename = 'data.txt';
open(IN, "<$filename") or die "Could not open file '$filename' $!";
while (my $row = <IN>) {
    chomp $row;
    print "$row\n";
}
In this case open receives only two arguments. The filehandle is IN, a global bareword that is not caught by the use strict pragma. The problems with this old-fashioned approach are:
IN is global to the whole script, with possible name clashes.
It is difficult to pass the filehandle to functions and subroutines as an argument.
With only two arguments, the mode and the file name share a single string, which can lead to unexpected side effects. For example, in a call such as open(IN, $filename), defining $filename = ">/etc/crontab" would open this file for writing and truncate it (hopefully you do not run your scripts as root...).
The same applies to opening files for writing.
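Below is a minimal sketch of the three-argument form for writing. The file name output.txt is a hypothetical example. The mode '>' creates or truncates the file, while '>>' would append to it. The file is read back with the same encoding layer to check the result.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $filename = 'output.txt';    # hypothetical output file name

# Three-argument open for writing with an explicit encoding layer.
open(my $out, '>:encoding(UTF-8)', $filename)
    or die "Could not open file '$filename' for writing: $!";
print {$out} "first line\n", "second line\n";
close($out) or die "Could not close file '$filename': $!";

# Read it back with the same layer to check the result.
open(my $in, '<:encoding(UTF-8)', $filename)
    or die "Could not open file '$filename': $!";
my @lines = <$in>;
close($in);
print "Wrote ", scalar(@lines), " lines to $filename\n";
```

Checking the return value of close is worthwhile when writing, because some write errors are only reported when the buffered data is flushed.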
Some Mini-Howtos of Interest
Curro Perez-Bernal <francisco.perez@dfaie.uhu.es>