
A Beautifier for the Perl Programming Language

Speech presented at The Perl Conference 2.0

A Beautifier for the Perl Programming Language

Tim Maher, Ph.D

Head Software Instructor, CONSULTIX POB 70563, Seattle WA 98107 t i m (AT)teachmeperl.com

NOTE: This paper was presented at The Perl Conference (TPC) 2.0, July, 1998.
A talk on an enhanced version of this program and other developments in Perl beautification was presented at TPC 5.0, in July of 2001. The slides are at http://teachmeperl.com/pb2001.html

I'm into Beauty. So much so, that when I first started learning Perl, I immediately began looking for a "Perl Program Beautifier," and was surprised when I couldn't find one.

What is a beautifier? A language-specific utility that reformats a program to conform to a standard of presentation. For example, blank lines might be inserted after procedure bodies and declarations, indentation might be adjusted to properly reflect nesting levels, excessively long lines might be split into shorter ones, and matching parentheses and braces might be vertically aligned to show program structure.

Why do I worry about program Beauty? For one thing, people like me, who use many languages, occasionally mix up their vocabularies, and end up speaking Language A to Interpreter B. For instance, /* introduces a comment in C, but generates pathnames with a UNIX shell, and // introduces an in-line comment in C++, but applies the matching operator in Perl. When you're lucky, this type of "language crosstalk" will immediately generate an error. But in less fortunate cases, the alien code might turn out to be acceptable but dysfunctional, leading to trouble down the road.

So one benefit of a good beautifier is that it can highlight, in some distinctive manner, material that might be syntactically acceptable but inappropriate. In this way, a beautifier can act as a debugging aid.

Another benefit of beautification comes from its imposition of a standard manner of depiction on programs. This makes it much easier for a programmer to read the writings of another, as well as his own programs in the future. In this way, a beautifier acts as an aid to communication and software maintenance.

Because of the importance of beautification utilities to programmers, I was shocked to learn that a language as mature and popular as Perl would be lacking one. But when I asked around, I was told by "Perl Gurus" that a beautifier would be very difficult to write because of Perl's complicated syntax, as stated in the Perl Frequently Asked Questions List (under Part 3, Programming Aids):

3.4 Is there a pretty-printer (similar to indent(1) ) for Perl?

. . . If what you mean is whether there is a program that will reformat the program such as indent(1) will do for C, then the answer is no. The complex feedback between the scanner and the parser (as in the things that confuse vgrind) make [sic] it challenging at best to write a standalone Perl parser.

But I figured it could not be as difficult as this makes it sound. For one thing, Perl knows how to parse Perl programs, and the Perl source code is freely available, so Perl could conceivably be reworked into a Perl beautifier.

Alternatively, the GNU source code for the popular C language beautifier "indent" is also available, so another approach would be to rework its 7k lines of C to handle the Perl language.

But I knew there was an easier approach, which would not require reworking anybody else's existing code, or the use of a language other than Perl. I knew this because my quest for beauty had already led me to write a rudimentary C++ beautifier in a three-command (sed | indent | sed) shell script (UNIX/World, August 1991, p. 134), and later a more robust C++ beautifier in 140 lines of C and shell code (Dr. Dobb's Journal, Dec. 1992, pp. 23s-27s).

These beautifiers certainly don't qualify as "standalone parsers" for C++, because they don't classify the program elements into meaningful units. But that doesn't prevent them from doing for C++ everything that indent and cb do for C! The trick is realizing that programs written in Language B can be successfully processed by beautifiers for Language A, if Language B bears a syntactic similarity to Language A, and if the Language B program can be temporarily disguised as Language A.

So with this C++ beautification experience under my belt, and a stubborn determination to prove that "Perl Beautification" could be accomplished if sufficient Hubris, Impatience, and Laziness could be mustered, I began writing in Perl the first fully functional "Perl Beautifier", pbeaut, in April of 1998 [1].

Beautification Strategy

As I had with its C++ predecessors, I approached the problem of writing pbeaut by capitalizing on the existence of mature beautification utilities for the C language, which has some fortunate syntactic similarities to Perl, and milking the UNIX filter model for all it's worth.

The basic approach, borrowed from my C++ beautifiers, is to use a pre-processor to disguise Perl code as C code, effect the beautification using standard C tools, and then convert the disguised Perl back to its original form using a post-processor.

The basic model is therefore:

PERL CODE -> Perl-to-C Encoder -> Standard C beautifier -> C-to-Perl Decoder -> BEAUTIFIED PERL CODE

Here is a listing of the first pbeaut program:

$ cat pbeaut
#! /bin/sh
# Tim Maher, Consultix. www.teachmeperl.com
# pbeaut, v .1
# these indent options work pretty well
pencode $* | indent -npro -bl -bli0 -nce -npcs | pdecode
$

The encoder, pencode, examines every character of the Perl program, from first to last, and rewrites certain character sequences as necessary to disguise the Perl code as C.

The C beautifier, in this case GNU indent, inserts tabs to properly represent nesting levels, aligns parentheses and braces, inserts newline characters to split long lines into shorter ones, and generally fools around with the layout of the code to make it look more orderly and to emphasize the program's structure.

The decoder, pdecode, undoes the disguises crafted by pencode to reveal the hidden Perl program elements in their newly beautified context.
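The three-stage pipeline can be sketched with sed standing in for both filters. This is only an illustration under stated assumptions: the function names encode_ops, decode_ops, and beautify_stub are hypothetical, only one disguise is implemented (the =~ operator, spelled _a_EQ_a_TD as in the encoded listing later in this paper), and a trivial blank-squeezing filter stands in for the C beautifier. The real pencode and pdecode are Perl programs that handle far more.

```shell
#!/bin/sh
# Hypothetical stand-ins for pencode and pdecode: disguise only "=~".

# Encode: rewrite "=~" as an alphabetic token a C beautifier will not split.
encode_ops() {
    sed 's/=~/_a_EQ_a_TD/g'
}

# Decode: restore the original operator after beautification.
decode_ops() {
    sed 's/_a_EQ_a_TD/=~/g'
}

# Stand-in for the C beautifier: squeeze runs of blanks to a single space.
# (GNU indent would sit here in the real pipeline.)
beautify_stub() {
    sed 's/  */ /g'
}

printf '%s\n' 'if ($.   =~   /^1$/) { print; }' |
    encode_ops | beautify_stub | decode_ops
```

The point of the sketch is that the middle stage never sees the =~ characters, so it cannot pull them apart, and the decoder restores them untouched in their new layout.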

The current "production" version of pbeaut (version .62; 412 lines of code) communicates various types of information to and from the encoder and decoder and does extensive error checking, but its basic function is the same as the simple version shown above.

De-Obfuscation Testing

Where should one look to find the ugliest Perl code on the planet? Why, the archives of past Obfuscated Perl Contests, of course, where contestants are rewarded for making their programs as inscrutable as possible (http://www.tpj.com/tpj/contest).

In this section, we'll examine the effects on Perl programs of C-style beautification using indent, as well as Perl-style beautification using pbeaut.

Here's a prize-winning entry from the 1996 contest:

$ cat caton
#F. First place: Russell Caton
# (Reduced in size to fit on one line.)
$-=100;while((($@)=(getpwent())[2])){push(@@,$@);}foreach(sort{$a<=>$b}@@){(($_<=$-)||($_==($-+++1)))?next:die"$-\n";}

After beautification with indent, it looks like this:

$ indent -npro -br -nce -npcs  < caton  # -br: brace on line with keyword

#F. First place: Russell Caton
$ -= 100;
while ((($ @) = (getpwent())[2])) {
        push(@@, $ @);
}
foreach(sort
        {
        $a <=> $b
        }
        @@) {
        (($_  <= $ -) || ($_  == ($ - +++1))) ? next : die "$-\n";
}

Perhaps surprisingly, the program layout looks pretty good, owing to the fact that Perl inherited many of its basic features from C (brace-delimited blocks, &&/|| conjunctions, semicolon line-termination, operator syntax, etc.).

On the other hand, the representations of two variables ( $- , $@ ) were altered by the insertion of a space between the symbols. Does this bother Perl?

$ perl -c caton
caton syntax OK
$ indent -npro -br -nce -npcs  < caton > caton.ind; perl -c caton.ind
caton.ind syntax OK
$ perl -w caton.ind
101
$

It doesn't bother Perl a bit! The program still produces the next available number from the /etc/passwd file (or NIS database). However, having the depictions of those variables messed up is likely to annoy most (non-obfuscatory) Perl programmers!

After beautification with pbeaut, the program looks like this:

$- = 100;
while ((($@) = (getpwent())[2])) {
    push(@@, $@);
}
foreach(sort{$a<=>$b} @@) {
    (($_ <= $-)  || ($_ == ($-++ + 1))) ? next : die "$-\n";
}

As you can see, the preservation of variable names has been achieved, along with a much more Perlish representation of the foreach loop.

Here's another unattractive winning entry from the 1996 contest:

#D. First place: Robert Klep
# (Line breaks in original exactly as shown)
$Y=-1.2;for(0..24){$X=-2;for(0..79){($r,$i)=(0,0);for(0..15){$n=$_;$r=($x=$
r)*$x-($y=$i)*$y+$X;$i=2*$x*$y+$Y;$x*$x+$y*$y>4&&last}print unpack("\@$n a"
,".,:;=+itIYVXRBM ");$X+=3/80}$Y+=2.4/25}

Let's try beautifying this one with indent:

$ indent -npro -br -nce -npcs < klep > klep.ind
Standard input:1: Warning: old style assignment ambiguity in "=-". Assuming "= -"
Standard input:1: Warning: old style assignment ambiguity in "=-". Assuming "= -"

$ indent -npro -br -nce -npcs < klep 2>/dev/null | tee klep.ind
$Y = -1.2;
for (0. .24) {
        $X = -2;
        for (0. .79) {
                ($r, $i) = (0, 0);
                for (0. .15) {
                        $n = $_;
                        $r = ($x = $
                              r) * $x - ($y = $i) * $y + $X;
                        $i = 2 * $x * $y + $Y;
                        $x *$x + $y * $y > 4 && last
                }
                print unpack("\@$n a"
                             ,".,:;=+itIYVXRBM ");
                $X += 3 / 80
        }
        $Y += 2.4 / 25
}

$ perl -c klep.ind
klep.ind syntax OK
$

However, instead of drawing the default Mandelbrot fractal on the screen in ASCII characters, here's what the program does:

$ klep.ind
$

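The culprit is indent's rewriting of range expressions such as 0..79 as 0. .79, which changes the meaning of each for loop.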

The lesson to be learned here is that a faulty beautification process can produce code that is attractive but dysfunctional!
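Damage of the kind seen in klep.ind can be looked for mechanically after beautification. The following is a heuristic sketch only: the function name find_split_ranges is hypothetical, and the pattern covers just the "digit, dot, space, dot, digit" form that appears in the indent output above, not every way a range operator might be broken.

```shell
#!/bin/sh
# Hypothetical post-beautification check: flag range operators split by indent.
# Heuristic only: it looks for "digit. .digit", the damage seen in klep.ind.

find_split_ranges() {
    # Reads beautified code on standard input and prints offending lines
    # with their line numbers; exit status is 0 if any were found.
    grep -n '[0-9]\. \.[0-9]'
}

printf '%s\n' '$Y = -1.2;' 'for (0. .24) {' '$X += 3 / 80' | find_split_ranges
```

A check like this is needed because perl -c accepts the damaged code, as the transcript above shows.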

pbeaut, on the other hand, produces the following reworked version of this program, which functions correctly:

$ pbeaut -b 2 klep | cat -n     # -b 2: opening brace on line with keyword
     1  $Y = -1.2;
     2  for (0 .. 24) {
     3          $X = -2;
     4          for (0 .. 79) {
     5                  ($r, $i) = (0, 0);
     6                  for (0 .. 15) {
     7                          $n = $_;
     8                          $r = ($x = $  # NOTE: extraneous newline here
     9                                            r) * $x - ($y = $i) * $y + $X;
    10                          $i = 2 * $x * $y + $Y;
    11                          $x * $x + $y * $y > 4 && last
    12                  }
    13                  print unpack("\@$n a" # NOTE: extraneous newline here
    14                               ,".,:;=+itIYVXRBM ");
    15                  $X += 3 / 80
    16          }
    17          $Y += 2.4 / 25
    18  }

Although pbeaut tries to preserve programmer newlines by default, on the assumption they were sensibly placed, this behavior can be disabled through use of the -n (ignore newlines) invocation option:

$ pbeaut -n -b 2 klep | cat -n  # -n: ignore newlines in original
     1  $Y = -1.2;
     2  for (0 .. 24) {
     3          $X = -2;
     4          for (0 .. 79) {
     5                  ($r, $i) = (0, 0);
     6                  for (0 .. 15) {
     7                          $n = $_;
     8                          $r = ($x = $r) * $x - ($y = $i) * $y + $X;
     9                          $i = 2 * $x * $y + $Y;
    10                          $x * $x + $y * $y > 4 && last
    11                  }
    12                  print unpack("\@$n a", ".,:;=+itIYVXRBM ");
    13                  $X += 3 / 80
    14          }
    15          $Y += 2.4 / 25
    16  }

Testing with Perl Modules

Most of the programs tested with pbeaut can be successfully beautified. The others (diagnostics.pm, getcwd.pl, perl5db.pl, CPAN.pm, English.pm) cannot, because they use syntax features that are not yet supported (for example, semicolon delimiters with the substitution operator; see Current Limitations, below).

How pbeaut Works

pencode disguises each piece of Perl code using one of four techniques:

encapsulating within C comments
enclosing in quotes
encoding symbols alphabetically; e.g., { -> LB
leaving code unaltered (for material that can be properly handled by C-oriented rules)

We'll use a small program called fix to illustrate the encoding, beautification, and decoding processes.

$ cat fix
#! /usr/bin/perl -wn
# modify shebang pathname in input script
# written strangely to involve certain syntax features

${prog}='perl';

if ($. =~ /^1$/) {$_=s|^#! /usr/bin/$prog|#! /usr/local/bin/$prog|o;print;}
else {print;} # other lines unaltered

$ pencode fix | cat -n  # show line numbers
     1  /* _a_C #! _a_RMa\/usr_a_RMa\/bin_a_RMa\/perl -wn C_a_ */
     2  /* _a_C # modify shebang pathname in input script C_a_ */
     3  /* _a_C # written strangely to involve certain syntax features C_a_ */
     4
     5  "_a_F_${prog}_F_a_"='perl';
     6
     7  if ("_a_F_$._F_a_" _a_EQ_a_TD "_a_M_/^1$/_M_a_")  # line wrapped for display
          {"_a_F_$__F_a_"="_a_S_s|^#! /usr/bin/$prog|#! /usr/local/bin/$prog|o_S_a_";print;}
     8  else {print;} /* _a_C # other lines unaltered C_a_ */


The basic encoding sequence, _a_ , and an ancillary preceding or following syntax-specific code, mark the sequences added by pencode to facilitate their later removal by pdecode. (NOTE: if _a_ appears in the input program, pbeaut will select a different sequence for use.)
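Choosing a marker that does not collide with the input can be sketched as follows. How pbeaut actually picks its alternative sequence is not described here, so the function name choose_marker, the candidate list, and the "first unused _x_" rule are all assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical sketch of marker selection: pick the first candidate of the
# form _x_ that does not occur anywhere in the program text.

choose_marker() {
    text=$(cat)     # read the whole program from standard input
    for letter in a b c d e f g h; do
        candidate="_${letter}_"
        case $text in
            *"$candidate"*) ;;      # already used by the program; try next
            *) printf '%s\n' "$candidate"; return 0 ;;
        esac
    done
    return 1        # every candidate already occurs in the input
}

# This input uses _a_ as part of a variable name, so _b_ is chosen instead.
printf '%s\n' 'my $_a_ = 1;' 'print $_a_;' | choose_marker
```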

To prevent indent from making bad decisions about formatting Perl symbol sequences, such as inserting spaces or newlines between them, certain ones are encoded with an alphabetic representation, such as EQ...TD for =~ (line 7).

For related reasons, in line 5, ${prog} gets quoted to prevent the C beautifier from splitting apart its components.
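The quoting disguise can be imitated with a pair of sed filters. This is a minimal sketch, assuming only the braced ${name} form needs protecting; the function names are hypothetical, and the _a_F_ ... _F_a_ wrapper is copied from line 5 of the encoded listing rather than from pencode's source.

```shell
#!/bin/sh
# Hypothetical sketch: hide ${name} inside a C string literal and restore it.

quote_braced_vars() {
    # ${prog} -> "_a_F_${prog}_F_a_"
    sed 's/\${[A-Za-z_][A-Za-z_0-9]*}/"_a_F_&_F_a_"/g'
}

unquote_braced_vars() {
    # "_a_F_${prog}_F_a_" -> ${prog}
    sed 's/"_a_F_\(\${[A-Za-z_][A-Za-z_0-9]*}\)_F_a_"/\1/g'
}

printf '%s\n' "\${prog}='perl';" | quote_braced_vars
```

Because a C beautifier leaves the contents of string literals alone, the braces and dollar sign travel through it intact.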

The job of pdecode (currently about 50 lines of Perl5) is to reverse the effects of pencode, by removing the C-comment "wrappers" encapsulating Perl comments, decoding the various encoded strings to their original forms, and so forth.
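The comment "wrappers" lend themselves to a short sketch. It is an assumption-laden simplification: wrap_comments and unwrap_comments are hypothetical names, only whole-line comments are handled, and the additional encoding of slashes visible in line 1 of the pencode output is not attempted.

```shell
#!/bin/sh
# Hypothetical sketch: wrap and unwrap whole-line Perl comments only.
# Trailing comments are skipped, since "#" can also occur inside strings
# and regexes, and telling those apart needs real tokenizing.

wrap_comments() {
    # "# text" -> "/* _a_C # text C_a_ */"
    sed 's|^#.*$|/* _a_C & C_a_ */|'
}

unwrap_comments() {
    # Remove the wrapper, keeping whatever sits between the markers.
    sed 's|^/\* _a_C \(.*\) C_a_ \*/$|\1|'
}

printf '%s\n' '# modify shebang pathname in input script' 'print;' |
    wrap_comments
```

Piping the result through unwrap_comments returns the original two lines, which is the property a decoder must guarantee.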

Let's look at the result of subjecting fix to the full beautification process:

$ nl -ba fix  # original, pre-beautification
    1  #! /usr/bin/perl -wn
    2  # modify shebang pathname in input script
    3  # written strangely to involve certain syntax features
    4  
    5  ${prog}='perl';
    6  
    7  if ($. =~ /^1$/) {$_=s|^#! /usr/bin/$prog|#! /usr/local/bin/$prog|o;print;}
    8  else {print;} # other lines unaltered
$ pbeaut < fix | cat -n
     1  #! /usr/bin/perl -wn
     2  # modify shebang pathname in input script
     3  # written strangely to involve certain syntax features
     4
     5  ${prog} = 'perl';
     6
     7  if ($. =~ /^1$/) {
     8          $_ = s|^#! /usr/bin/$prog|#! /usr/local/bin/$prog|o;
     9          print;
    10  }
    11  else {
    12          print;
    13  } # other lines unaltered

Current Limitations

When invoked with the -l option, pbeaut lists its current limitations:

$ pbeaut -l
     LIMITATIONS OF pbeaut VERSION 0.62:
     Delimiters for match and substitution operators known to work: <{[(#/|!?
             (others might work, but they haven't been tested)
     Whitespace before some delimiters not supported (e.g., m  /a/;
     s/// and tr/// are not allowed to change delimiters for the second part
     // sometimes formatted better if explicitly tagged as match, via m//
     q{}, qq(), etc., cannot have embedded }/) unless backslashed
     split must have white-space before first / (split //, not split//)
     Keyword "sub" and following subroutine name assumed to be on same line
     Here-Docs cannot omit Framing-Word (can't use blank line as terminator)
     TIPS: To disable beautification for a line, put "#LIT" at its end
           If you don't like the formatting of subs without parentheses,
             try adding them: sub foo() rather than sub foo
     (c) Tim Maher, CONSULTIX.  www.teachmeperl.com  (206) 781-UNIX
        Contact author for restrictions on distribution and usage

indent

pencode/pdecode

For this reason, the current pbeaut only offers two choices of indent options: "-b 1" (braces on lines following keywords) and "-b 2" (braces on same lines as keywords).
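The two brace styles correspond to the two sets of GNU indent options used in this paper's examples. The sketch below shows one way a wrapper script might select between them; the function name indent_options is hypothetical, and how pbeaut itself maps -b to indent options is an assumption.

```shell
#!/bin/sh
# Hypothetical mapping from a pbeaut-style "-b N" choice to GNU indent options.
# The option sets are the ones shown in this paper's examples; the mapping
# itself is an assumption, not taken from pbeaut's source.

indent_options() {
    case $1 in
        1) echo "-npro -bl -bli0 -nce -npcs" ;;  # braces on following lines
        2) echo "-npro -br -nce -npcs" ;;        # braces on keyword lines
        *) echo "unsupported brace style: $1" >&2; return 1 ;;
    esac
}

indent_options 2
```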

Perl Beautifier Demo Page

pbeaut

Status Report

pbeaut

Fortunately, pbeaut incorporates safeguards to prevent the accidental corruption of the original program, including:

presenting the beautified result to the standard output, rather than altering the original program
running perl -c (syntax check) on the beautified code and reporting any syntax errors to the user
reporting statistics on the input program and its beautified output, to facilitate detection of code loss
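One simple statistic of this kind is the number of non-whitespace characters, which a beautifier should leave unchanged. The sketch below is an illustration only: the function name count_nonblank is hypothetical, and the statistics pbeaut actually reports are not specified here.

```shell
#!/bin/sh
# Hypothetical code-loss check: a beautifier should change only whitespace,
# so the count of non-whitespace characters should survive unchanged.

count_nonblank() {
    # Reads code on standard input; prints how many characters remain
    # after spaces, tabs, and newlines are removed.
    tr -d ' \t\n' | wc -c | tr -d ' '
}

before=$(printf '%s\n' '$X=-2;for(0..79){print}' | count_nonblank)
after=$(printf '%s\n' '$X = -2;' 'for (0 .. 79) {' '    print' '}' | count_nonblank)
[ "$before" -eq "$after" ] && echo "no code lost"
```

A count like this detects dropped or duplicated code, but it cannot detect damage that only moves whitespace, such as 0..79 becoming 0. .79, so it complements rather than replaces running the beautified program.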

Development and Testing Platforms

pbeaut

GNU indent 1.9.1

/usr/ucb/indent

/usr/bin/cb

indent

Summary (2001 update)

pbeaut

PerlTidy

Footnotes

[1]: I am aware of the 5/20/96 pb program by P. Lutus Ashland. But unlike pbeaut, it does nothing but adjust indentation to properly reflect nesting levels, rather than offering the full power of C's indent and cb utilities.
