As you probably know, this statement:

local $|=1;

disables buffering of the currently select( )ed file handle (the default is STDOUT). Under mod_perl, the STDOUT file handle is automatically tied to the output socket. If STDOUT buffering is disabled, each print( ) call also calls ap_rflush( ) to flush Apache's output buffer.
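Because $| affects only the currently selected file handle, changing the buffering of any other handle takes either the select( ) idiom or the autoflush( ) method from IO::Handle. The following is a minimal sketch rather than mod_perl-specific code; the $fh and $sink names are illustrative, and an in-memory scalar stands in for a real output destination:

```perl
use strict;
use warnings;
use IO::Handle;

# $sink is a hypothetical in-memory target standing in for a real file or socket.
open my $fh, '>', \my $sink or die "Cannot open in-memory handle: $!";

# Classic idiom: select the handle, set $|, then restore the previous handle.
my $old = select($fh);
$| = 1;
select($old);

# The same effect through IO::Handle, which is easier to read.
$fh->autoflush(1);

print $fh "unbuffered\n";
```

Either form leaves STDOUT's own buffering untouched, which is why the benchmark code later in this section wraps each $| assignment in a pair of select( ) calls.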

With buffering disabled, generating output through many print( ) calls (bad style in any case) degrades performance, since each call triggers a flush. The severity depends on the number of print( ) calls that are made.

Many old CGI scripts were written like this:

print "<body bgcolor=\"black\" text=\"white\">";
print "<h1>Hello</h1>";
print "<a href=\"foo.html\">foo</a>";
print "</body>";

This example has multiple print( ) calls, which will cause performance degradation with $|=1. It also uses too many backslashes. This makes the code less readable, and it is more difficult to format the HTML so that it is easily readable as the script's output. The code below solves the problems:

print qq{
  <body bgcolor="black" text="white">
    <h1>Hello</h1>
    <a href="foo.html">foo</a>
  </body>
};

You can easily see the difference. Be careful, though, when printing an <html> tag. The correct way is:

print qq{<html>
  <head></head>
};

You can also try the following:

print qq{
  <html>
    <head></head>
};

but note that some older browsers expect the first characters after the headers and empty line to be <html>, with no spaces before the opening left angle bracket. If there are any other characters, they might not accept the output as HTML and might print it as plain text. Even if this approach works with your browser, it might not work with others.

Another approach is to use the here document style:

print <<EOT;
<html>
  <head></head>
EOT

Performance-wise, the qq{ } and here document styles compile down to exactly the same code, so there should not be any real difference between them.

Remember that the closing tag of the here document style (EOT in our example) must be aligned to the left side of the line, with no spaces or other characters before it and nothing but a newline after it.
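To make the equivalence concrete, here is a small sketch that builds the same string with qq{ } and with a here document and compares the two. The value assigned to $foo is an arbitrary placeholder chosen for illustration:

```perl
use strict;
use warnings;

my $foo = "bar";    # hypothetical value, used only for illustration

# qq{ } form: the trailing \n matches the newline that ends the heredoc line.
my $from_qq = qq{<a href="$foo.html">$foo</a>\n};

# Here document form: EOT starts in column 0 and is followed only by a newline.
my $from_heredoc = <<EOT;
<a href="$foo.html">$foo</a>
EOT

print $from_qq eq $from_heredoc ? "identical\n" : "different\n";
```

Both forms interpolate variables and leave double quotes unescaped, so the choice between them is mostly a matter of how the surrounding code reads.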

Yet another technique is to pass the arguments to print( ) as a list:

print "<body bgcolor=\"black\" text=\"white\">",
      "<h1>Hello</h1>",
      "<a href=\"foo.html\">foo</a>",
      "</body>";

This technique makes fewer print( ) calls but still suffers from so-called backslashitis (quotation marks used in HTML need to be prefixed with a backslash). Single quotes can be used instead:

'<a href="foo.html">foo</a>'

but then how do we insert a variable? The string will need to be split again:

'<a href="',$foo,'.html">', $foo, '</a>'

This is ugly, but it's a matter of taste. We tend to use the qq operator:

print qq{<a href="$foo.html">$foo</a>
         Some text
         <img src="bar.png" alt="bar" width="1" height="1">
};

What if you want to make fewer print( ) calls, but you don't have the output ready all at once? One approach is to buffer the output in an array and then print it all at once:

my @buffer = ( );
push @buffer, "<body bgcolor=\"black\" text=\"white\">";
push @buffer, "<h1>Hello</h1>";
push @buffer, "<a href=\"foo.html\">foo</a>";
push @buffer, "</body>";
print @buffer;

An even better technique is to pass print( ) a reference to the string. The print( ) used under Apache overrides the default CORE::print( ) and knows that it should automatically dereference any reference passed to it. Therefore, it's more efficient to pass strings by reference, as it avoids the overhead of copying:

my $buffer = "<body bgcolor=\"black\" text=\"white\">";
$buffer .= "<h1>Hello</h1>";
$buffer .= "<a href=\"foo.html\">foo</a>";
$buffer .= "</body>";
print \$buffer;

If you print references in this way, your code will not be backward compatible with mod_cgi, which uses the CORE::print( ) function.
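If the same code has to run under both environments, one option is to dereference explicitly before printing. The sketch below assumes a hypothetical helper named portable_print( ), which is not part of mod_perl or CGI; it simply does by hand what Apache's print( ) does automatically:

```perl
use strict;
use warnings;

# Hypothetical helper: dereference scalar references ourselves, so the call
# behaves the same with CORE::print( ) as with the print( ) used under Apache.
sub portable_print {
    my $fh = shift;
    for (@_) {
        print {$fh} ref($_) eq 'SCALAR' ? $$_ : $_;
    }
}

my $buffer = qq{<body bgcolor="black" text="white">};
$buffer .= "<h1>Hello</h1>";
$buffer .= qq{<a href="foo.html">foo</a>};
$buffer .= "</body>";

portable_print(\*STDOUT, \$buffer, "\n");
```

Plain strings and scalar references can be mixed freely in the argument list, which keeps call sites unchanged when the code moves between mod_cgi and mod_perl.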

Now to the benchmarks. Let's compare the printing techniques we have just discussed. The benchmark that we are going to use is shown in Example 13-6.

Example 13-6. benchmarks/

use Benchmark;
use Symbol;

my $fh = gensym;
open $fh, ">/dev/null" or die;

my @text = (
    "  <HEAD>\n",
    "    <TITLE>\n",
    "      Test page\n",
    "    </TITLE>\n",
    "  </HEAD>\n",
    "  <BODY BGCOLOR=\"black\" TEXT=\"white\">\n",
    "    <H1>\n",
    "      Test page \n",
    "    </H1>\n",
    "    <A HREF=\"foo.html\">foo</A>\n",
    "text line that emulates some real output\n" x 100,
    "    <HR>\n",
    "  </BODY>\n",
);

my $text = join "", @text;

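# one print( ) call per string
sub multi {
    my @copy = @text;
    my_print($_) for @copy;
}

# one print( ) call with a single pre-joined string
sub single {
    my $copy = $text;
    my_print($copy);
}

# one call with a list of strings
sub array {
    my @copy = @text;
    my_print(@copy);
}

# one call with a list of references to strings
sub ref_arr {
    my @refs = \(@text);
    my_print(@refs);
}

# concatenate first, then print once
sub concat {
    my $buffer;
    $buffer .= $_ for @text;
    my_print($buffer);
}

# join first, then print once
sub my_join {
    my $buffer = join '', @text;
    my_print($buffer);
}

# emulate Apache's print( ): dereference any references before printing
sub my_print {
    for (@_) {
        print $fh ref($_) ? $$_ : $_;
    }
}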

timethese(100_000, {
    join    => \&my_join,
    array   => \&array,
    ref_arr => \&ref_arr,
    multi   => \&multi,
    single  => \&single,
    concat  => \&concat,
});

timethese(100_000, {
    'array  /b' => sub {my $ofh=select($fh);$|=0;select($ofh); array( )  },
    'array  /u' => sub {my $ofh=select($fh);$|=1;select($ofh); array( )  },
    'ref_arr/b' => sub {my $ofh=select($fh);$|=0;select($ofh); ref_arr( )},
    'ref_arr/u' => sub {my $ofh=select($fh);$|=1;select($ofh); ref_arr( )},
    'multi  /b' => sub {my $ofh=select($fh);$|=0;select($ofh); multi( )  },
    'multi  /u' => sub {my $ofh=select($fh);$|=1;select($ofh); multi( )  },
    'single /b' => sub {my $ofh=select($fh);$|=0;select($ofh); single( ) },
    'single /u' => sub {my $ofh=select($fh);$|=1;select($ofh); single( ) },
    'concat /b' => sub {my $ofh=select($fh);$|=0;select($ofh); concat( ) },
    'concat /u' => sub {my $ofh=select($fh);$|=1;select($ofh); concat( ) },
    'join   /b' => sub {my $ofh=select($fh);$|=0;select($ofh); my_join( )},
    'join   /u' => sub {my $ofh=select($fh);$|=1;select($ofh); my_join( )},
});

Under Perl 5.6.0 on Linux, the first set of results, sorted by CPU clocks, is:

Benchmark: timing 100000 iterations of array, concat, multi, ref_array...
   single:  6 wallclock secs ( 5.42 usr + 0.16 sys =  5.58 CPU)
     join:  8 wallclock secs ( 8.63 usr + 0.14 sys =  8.77 CPU)
   concat: 12 wallclock secs (10.57 usr + 0.31 sys = 10.88 CPU)
  ref_arr: 14 wallclock secs (11.92 usr + 0.13 sys = 12.05 CPU)
    array: 15 wallclock secs (12.95 usr + 0.26 sys = 13.21 CPU)
    multi: 38 wallclock secs (34.94 usr + 0.25 sys = 35.19 CPU)

A single-string print is clearly the fastest; join, string concatenation, an array of references to strings, and an array of strings are fairly close to each other (the results may vary according to the length of the strings); and one print( ) call per string is the slowest.

Now let's look at the same benchmark, where the printing was either buffered or not:

Benchmark: timing 100000 iterations of ...
single /b: 10 wallclock secs ( 8.34 usr + 0.23 sys =  8.57 CPU)
single /u: 10 wallclock secs ( 8.57 usr + 0.25 sys =  8.82 CPU)
join   /b: 13 wallclock secs (11.49 usr + 0.27 sys = 11.76 CPU)
join   /u: 12 wallclock secs (11.80 usr + 0.18 sys = 11.98 CPU)
concat /b: 14 wallclock secs (13.73 usr + 0.17 sys = 13.90 CPU)
concat /u: 16 wallclock secs (13.98 usr + 0.15 sys = 14.13 CPU)
ref_arr/b: 15 wallclock secs (14.95 usr + 0.20 sys = 15.15 CPU)
array  /b: 16 wallclock secs (16.06 usr + 0.23 sys = 16.29 CPU)
ref_arr/u: 18 wallclock secs (16.85 usr + 0.98 sys = 17.83 CPU)
array  /u: 19 wallclock secs (17.65 usr + 1.06 sys = 18.71 CPU)
multi  /b: 41 wallclock secs (37.89 usr + 0.28 sys = 38.17 CPU)
multi  /u: 48 wallclock secs (43.24 usr + 1.67 sys = 44.91 CPU)

First, we see the same picture among different printing techniques. Second, we can see that the buffered print is always faster, but only in the case where print( ) is called for each short string does it have a significant speed impact.

Now let's go back to the $|=1 topic. You might still decide to disable buffering, for two reasons:

An even better solution is to keep buffering enabled and call $r->rflush( ) to flush the buffers when needed. This way you can place the first part of the page you are sending in the buffer and flush it a moment before you perform a lengthy operation such as a database query. This kills two birds with one stone: you show some of the data to the user immediately so she will see that something is actually happening, and you don't suffer from the performance hit caused by disabling buffering. Here is an example of such code:

use CGI ( );
my $r = shift;
my $q = CGI->new;
print $q->header('text/html');
print $q->start_html;
print $q->p("Searching...Please wait");
# flush what has been printed so far, before the slow part starts
$r->rflush;

# imitate a lengthy operation
for (1..5) {
    sleep 1;
}

print $q->p("Done!");

The script prints the beginning of the HTML document along with a nice request to wait by flushing the output buffer just before it starts the lengthy operation.

Now let's run the web benchmark and compare the performance of buffered versus unbuffered printing in the multi-printing code used in the last benchmark. We are going to use two identical handlers, the first handler having its STDOUT stream (tied to the socket) unbuffered. The code appears in Example 13-7.

Example 13-7. Book/

package Book::UnBuffered;
use Apache::Constants qw(:common);
local $|=1; # Switch off buffering.
sub handler {
    my $r = shift;
    print "<!DOCTYPE HTML PUBLIC \"-//IETF//DTD HTML//EN\">\n";
    print "<html>\n";
    print "  <head>\n";
    print "    <title>\n";
    print "      Test page\n";
    print "    </title>\n";
    print "  </head>\n";
    print "  <body bgcolor=\"black\" text=\"white\">\n";
    print "    <h1> \n";
    print "      Test page \n";
    print "    </h1>\n";
    print "    <a href=\"foo.html\">foo</a>\n" for 1..100;
    print "    <hr>\n";
    print "  </body>\n";
    print "</html>\n";
    return OK;
}
1;

The following httpd.conf configuration is used:

### Buffered output
<Location /buffering>
    SetHandler perl-script
    PerlHandler +Book::Buffered
</Location>

### UnBuffered output
<Location /unbuffering>
    SetHandler perl-script
    PerlHandler +Book::UnBuffered
</Location>

Now we run the benchmark, using ApacheBench, with concurrency set to 50, for a total of 5,000 requests. Here are the results:

name        |   avtime completed failed  RPS
unbuffering |     56      5000      0    855
buffering   |     55      5000      0    865

As you can see, there is not much difference when the overhead of other processing is added. The difference was more significant when we benchmarked only the Perl code. In real web requests, a few percent difference will be felt only if you unbuffer the output and print thousands of strings one at a time.