Friday, March 21, 2008

Cache it! Solve PHP Performance Problems

Ben Balbo

Ben Balbo was born in Germany, grew up in the UK, lives in Melbourne, and likes
Guinness. While he isn't drinking Guinness (which is most of the time in
Melbourne, as it just doesn't taste the same), he earns a living as a PHP
developer and trainer, security consultant, and Open Source developer. He has
been known to talk in public about web development-related topics, which comes
as part of the package of being on the committees of both the Melbourne PHP
User Group and Open Source Developers' Club. Although he wouldn't admit this,
he participates at this level only in order to go to restaurants or pubs after
the meetings.

In the good old days when building web sites was as easy as knocking
up a few HTML [1] pages, the delivery of a web page to a browser was
a simple matter of having the web server fetch a file. A site's visitors
would see its small, text-only pages almost immediately, unless they
were using particularly slow modems. Once the page was
downloaded, the browser would cache [2] it somewhere on the local
computer so that, should the page be requested again, after
performing a quick check with the server to ensure the page hadn't
been updated, the browser could display the locally cached version.
Pages were served as quickly and efficiently as possible, and
everyone was happy.
Then dynamic web pages came along and spoiled the party by introducing two
problems:
1. When a request for a dynamic web page is received by the server, some
intermediate processing must be completed, such as the execution of
scripts by the PHP [3] engine. This processing introduces a delay before
the web server begins to deliver the output to the browser. This may not be
a significant delay where simple PHP scripts are concerned, but for a more
complex application, the PHP engine may have a lot of work to do before
the page is finally ready for delivery. This extra work results in a
noticeable time lag between the user's requests and the actual display of
pages in the browser.
2. A typical web server, such as Apache [4], uses the time of file modification
to inform a web browser of a requested page's age, allowing the browser to
take appropriate caching [5] action. With dynamic web pages, the actual
PHP script may change only occasionally; meanwhile, the content it
displays, which is often fetched from a database, will change frequently.
The web server has no way of discerning updates to the database, so it
doesn't send a last modified date. If the client (that is, the user's browser)
has no indication of how long the data will remain valid, it will take a
guess. This is problematic either way: the browser may use a locally cached
version of the page that is now out of date, or it may request from the
server a fresh copy of a page that actually has no new content, making the
request redundant. The web server will always respond with a freshly
constructed version of the page, regardless of whether or not the data in
the database has actually changed.
To avoid the possibility of a web site visitor viewing out-of-date content, most
web developers use a meta tag or HTTP headers to tell the browser never to use a
cached version of the page. However, this negates the web browser's natural
ability to cache web pages, and entails some serious disadvantages. For example,
the content delivered by a dynamic page may only change once a day, so there's
certainly a benefit to be gained by having the browser cache a page--even if only
for 24 hours.
If you're working with a small PHP application, it's usually possible to live with
both issues. But as your site increases in complexity--and attracts more
traffic--you'll begin to run into performance problems. Both these issues can be
solved, however: the first with server-side caching; the second, by taking control
of client-side [6] caching from within your application. The exact approach you
use to solve these problems will depend on your application, but in this chapter,
we'll consider both PHP and a number of class libraries from PEAR [7] as
possible panaceas for your web page woes.
Note that in this chapter's discussions of caching, we'll look at only those
solutions that can be implemented in PHP. For a more general introduction, the
definitive discussion of web caching is represented by Mark Nottingham's
tutorial [8].
Furthermore, the solutions in this chapter should not be confused with some of
the script caching solutions that work on the basis of optimizing and caching
compiled PHP scripts, such as Zend Accelerator [9] and ionCube PHP
Accelerator [10].
This chapter is excerpted from The PHP Anthology: 101 Essential Tips, Tricks & Hacks, 2nd Edition [11]. Download
this chapter plus two others, covering PDO and Databases, and Access Control [12], in PDF format to read offline.
How do I prevent web browsers from caching a page?
If timely information is crucial to your web site and you wish to prevent out-of-date content from ever being visible,
you need to understand how to prevent web browsers--and proxy servers--from caching pages in the first place.
Solutions
There are two possible approaches we could take to solving this problem: using HTML meta tags, and using HTTP
headers.
Using HTML Meta Tags
The most basic approach to the prevention of page caching is one that utilizes HTML meta tags:

<meta http-equiv="Expires" content="Mon, 26 Jul 1997 05:00:00 GMT" />
<meta http-equiv="Pragma" content="no-cache" />

The insertion of a date that's already passed into the Expires meta tag tells the browser that the cached copy of the
page is always out of date. Upon encountering this tag, the browser usually won't cache the page. Although the
Pragma: no-cache meta tag isn't guaranteed, it's a fairly well-supported convention that most web browsers
follow. However, the two issues associated with this approach, which we'll discuss below, may prompt you to look at
the alternative solution.
Using HTTP Headers
A better approach is to use the HTTP protocol itself, with the help of PHP's header function, to produce the
equivalent of the two HTML meta tags above:
<?php
header('Expires: Mon, 26 Jul 1997 05:00:00 GMT');
header('Pragma: no-cache');
?>
We can go one step further than this, using the Cache-Control header that's supported by HTTP 1.1-capable
browsers:
<?php
header('Expires: Mon, 26 Jul 1997 05:00:00 GMT');
header('Cache-Control: no-store, no-cache, must-revalidate');
header('Cache-Control: post-check=0, pre-check=0', FALSE);
header('Pragma: no-cache');
?>
For a precise description of HTTP 1.1 Cache-Control headers, have a look at the W3C's HTTP 1.1 RFC [13]. Another
great source of information about HTTP headers, which can be applied readily to PHP, is mod_perl's documentation
on issuing correct headers [14].
Discussion
Using the Expires meta tag sounds like a good approach, but two problems are associated with it:
1. The browser first has to download the page in order to read the meta tags. If a tag wasn't present when the
page was first requested by a browser, the browser will remain blissfully ignorant and keep its cached copy of
the original.
2. Proxy servers that cache web pages, such as those common to ISPs, generally won't read the HTML documents
themselves. A web browser might know that it shouldn't cache the page, but the proxy server between the
browser and the web server probably doesn't--it will continue to deliver the same out-of-date page to the client.
On the other hand, using the HTTP protocol to prevent page caching essentially guarantees that no web browser or
intervening proxy server will cache the page, so visitors will always receive the latest content. In fact, the first header
should accomplish this on its own; this is the best way to ensure a page is not cached. The Cache-Control and
Pragma headers are added for some degree of insurance. Although they don't work on all browsers or proxies, the
Cache-Control and Pragma headers will catch some cases in which the Expires header doesn't work as
intended--if the client computer's date is set incorrectly, for example.
Of course, to disallow caching entirely introduces the problems we discussed at the start of this chapter: it negates
the web browser's natural ability to cache pages, and can create unnecessary overhead, as new versions of pages are
always requested, even though those pages may not have been updated since the browser's last request. We'll look at
the solution to these issues in just a moment.
How do I control client-side caching?
We addressed the task of disabling client-side caching in "How do I prevent web browsers from caching a page?", but
disabling the cache is rarely the only (or best) option.
Here we'll look at a mechanism that allows us to take advantage of client-side caches in a way that can be controlled
from within a PHP script.
Apache Required!
This approach will only work if you're running PHP as an Apache web server module, because it requires use of the
function getallheaders--which only works with Apache--to fetch the HTTP headers sent by a web browser.
Solutions
In controlling client-side caching you have two alternatives. You can set a date on which the page will expire, or
respond to the browser's request headers. Let's see how each of these tactics is executed.
Setting a Page Expiry Header
The header that's easiest to implement is the Expires header--we use it to set a date on which the page will expire,
and until that time, web browsers are allowed to use a cached version of the page. Here's an example of this header at
work:
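expires.php (excerpt)
<?php
// Send an Expires header dated $expires seconds in the future.
function setExpires($expires) {
  header(
    'Expires: '.gmdate('D, d M Y H:i:s', time()+$expires).' GMT');
}
setExpires(10);
echo ( 'This page will self destruct in 10 seconds<br />' );
echo ( 'The GMT is now '.gmdate('H:i:s').'<br />' );
// Link back to this same script so the page can be requested again.
echo ( '<a href="'.htmlspecialchars($_SERVER['PHP_SELF']).'">View Again</a><br />' );
?>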
In this example, we created a custom function called setExpires that sets the HTTP Expires header to a point
in the future, defined in seconds. The output of the above example shows the current time in GMT, and provides a
link that allows us to view the page again. If we follow this link, we'll notice the time updates only once every ten
seconds. If you like, you can also experiment by using your browser's Refresh button to tell the browser to refresh the
cache, and watching what happens to the displayed date.
Acting on the Browser's Request Headers
A more useful approach to client-side cache control is to make use of the Last-Modified and
If-Modified-Since headers, both of which are available in HTTP 1.0. This action is known technically as
performing a conditional GET request; whether your script returns any content depends on the value of the incoming
If-Modified-Since request header.
If you use PHP version 4.3.0 and above on Apache, the HTTP headers are accessible [15] with the functions
apache_request_headers and apache_response_headers. Note that the function
getallheaders has become an alias for the new apache_request_headers function.
This approach requires that you send a Last-Modified header every time your PHP script is accessed. The next
time the browser requests the page, it sends an If-Modified-Since header containing a time; your script can
then identify whether the page has been updated since that time. If it hasn't, your script sends an HTTP 304 status
code to indicate that the page hasn't been modified, and exits before sending the body of the page.
Let's see these headers in action. The example below uses the modification date of a text file. To simulate updates, we
first need to create a way to randomly write to the file:
ifmodified.php (excerpt)
<?php
$file = 'ifmodified.txt';
$random = array(0,1,1);
shuffle($random);
// One time in three, rewrite the file to simulate a content update.
if ( $random[0] == 0 ) {
  $fp = fopen($file, 'w');
  fwrite($fp, 'x');
  fclose($fp);
}
$lastModified = filemtime($file);
Our simple randomizer provides a one-in-three chance that the file will be updated each time the page is requested.
We also use the filemtime function to obtain the last modified time of the file.
Next, we send a Last-Modified header that uses the modification time of the text file. We need to send this
header for every page we render, to cause visiting browsers to send us the If-Modified-Since header upon
every request:
ifmodified.php (excerpt)
header('Last-Modified: ' .
gmdate('D, d M Y H:i:s', $lastModified) . ' GMT');
Our use of the getallheaders function ensures that PHP gives us all the incoming request headers as an array.
We then need to check that the If-Modified-Since header actually exists; if it does, we have to deal with a special case
caused by older Mozilla [17] browsers (earlier than version 6), which appended an illegal extra field to their
If-Modified-Since headers. We use PHP's strtotime function to generate a timestamp from the date the
browser sent us. If there's no such header, we set this timestamp to zero, which forces PHP to give the visitor an
up-to-date copy of the page:
ifmodified.php (excerpt)
$request = getallheaders();
if (isset($request['If-Modified-Since']))
{
$modifiedSince = explode(';', $request['If-Modified-Since']);
$modifiedSince = strtotime($modifiedSince[0]);
}
else
{
$modifiedSince = 0;
}
Finally, we check to see whether or not the cache has been modified since the last time the visitor received this page.
If it hasn't, we simply send a 304 Not Modified response header and exit the script, saving bandwidth [18] and
processing time by prompting the browser to display its cached copy of the page:
ifmodified.php (excerpt)
if ($lastModified <= $modifiedSince)
{
header('HTTP/1.1 304 Not Modified');
exit();
}
echo ( 'The GMT is now '.gmdate('H:i:s').'<br />' );
echo ( '<a href="'.htmlspecialchars($_SERVER['PHP_SELF']).'">View Again</a><br />' );
?>
Remember to use the "View Again" link when you run this example (clicking the Refresh button usually clears your
browser's cache). If you click on the link repeatedly, the cache will eventually be updated; your browser will throw out
its cached version and fetch a new page from the server.
If you combine the Last-Modified header approach with time values that are already available in your
application--for example, the time of the most recent news article--you should be able to take advantage of web
browser caches, saving bandwidth and improving your application's perceived performance in the process.
Be very careful to test any caching performed in this manner, though; if you get it wrong, you may cause your visitors
to consistently see out-of-date copies of your site.
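The checks walked through above can be gathered into a pair of small functions that take a timestamp from your application rather than from a file. This is a minimal sketch, not the chapter's own code: the names isNotModified and lastModifiedHeader are hypothetical, and the commented usage reads the request header from $_SERVER['HTTP_IF_MODIFIED_SINCE'], which PHP generally populates regardless of web server, instead of calling getallheaders.

```php
<?php
/**
 * Decide whether a 304 response is appropriate.
 * $lastModified is a Unix timestamp supplied by the application
 * (for example, the time of the most recent news article).
 * $ifModifiedSince is the raw request header value, or null if absent.
 */
function isNotModified($lastModified, $ifModifiedSince)
{
    if ($ifModifiedSince === null || $ifModifiedSince === '') {
        return false; // the browser has no cached copy to validate
    }
    // Drop the non-standard "; length=..." suffix some old browsers append.
    $parts = explode(';', $ifModifiedSince);
    $since = strtotime(trim($parts[0]));
    if ($since === false) {
        return false; // unparseable date: send the full page
    }
    return $lastModified <= $since;
}

/** Format a timestamp as a Last-Modified header line. */
function lastModifiedHeader($lastModified)
{
    return 'Last-Modified: ' . gmdate('D, d M Y H:i:s', $lastModified) . ' GMT';
}

// Hypothetical use at the top of a page script:
// header(lastModifiedHeader($lastModified));
// $since = isset($_SERVER['HTTP_IF_MODIFIED_SINCE'])
//     ? $_SERVER['HTTP_IF_MODIFIED_SINCE'] : null;
// if (isNotModified($lastModified, $since)) {
//     header('HTTP/1.1 304 Not Modified');
//     exit();
// }
```

Keeping the decision in a function that returns true or false, rather than sending headers itself, makes it easy to test with known dates before it is wired into a live page.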
Discussion
HTTP dates are always calculated relative to Greenwich Mean Time (GMT). The PHP function gmdate is exactly the
same as the date function, except that it automatically offsets the time to GMT based on your server's system clock
and regional settings.
When a browser encounters an Expires header, it caches the page. All further requests for the page that are made
before the specified expiry time use the cached version of the page--no request is sent to the web server. Of course,
client-side caching is only truly effective if the system time on the computer is accurate. If the computer's time is out
of sync with that of the web server, you run the risk of pages either being cached improperly, or never being updated.
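One way to reduce this dependence on the client's clock, assuming your visitors use HTTP 1.1-capable browsers, is to send a Cache-Control: max-age directive alongside Expires. max-age is a lifetime in seconds rather than an absolute date, and HTTP 1.1 caches give it precedence over Expires. The sketch below is illustrative only; the function name expiryHeaders and its optional $now parameter are hypothetical.

```php
<?php
/**
 * Build caching headers for a page that may be reused for $seconds.
 * Returns the header lines instead of sending them, so the caller
 * decides when to pass each one to header().
 */
function expiryHeaders($seconds, $now = null)
{
    $now = ($now === null) ? time() : $now;
    return array(
        // Absolute date, understood by HTTP 1.0 clients.
        'Expires: ' . gmdate('D, d M Y H:i:s', $now + $seconds) . ' GMT',
        // Relative lifetime, so the client's own clock setting matters less.
        'Cache-Control: max-age=' . (int) $seconds,
    );
}

// Hypothetical use near the top of a script:
// foreach (expiryHeaders(10) as $line) {
//     header($line);
// }
```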
The Expires header has the advantage that it's easy to implement; in most cases, however, unless you're a highly
organized person, you won't know exactly when a given page on your site will be updated. Since the browser will only
contact the server after the page has expired, there's no way to tell browsers that the page they've cached is out of
date. In addition, you also lose some knowledge of the traffic visiting your web site, since the browser will not make
contact with the server when it requests a page that's been cached.
How do I examine HTTP headers in my browser?
How can you actually check that your application is running as expected, or debug your code, if you can't actually see
the HTTP headers? It's worth knowing exactly which headers your script is sending, particularly when you're dealing
with HTTP cache headers.
Solution
Several worthy tools are available to help you get a closer look at your HTTP headers:
LiveHTTPHeaders [19]
This add-on to the Firefox [20] browser is a simple but very handy tool for examining request and response headers
while you're browsing.
Firebug [21]
Another useful Firefox add-on, Firebug is a tool whose interface offers a dedicated tab for examining HTTP request
information.
HTTPWatch [22]
This add-on to Internet Explorer for HTTP viewing and debugging is similar to LiveHTTPHeaders above.
Charles Web Debugging Proxy [23]
Available for Windows, Mac OS X, and Linux [24] or Unix, the Charles Web Debugging Proxy is a proxy server that
allows developers to see all the HTTP traffic between their browsers and the web servers to which they connect.
Any of these tools will allow you to inspect the communication between the server and browser.
How do I cache file downloads with Internet Explorer?
If you're developing file download scripts for Internet Explorer users, you might notice a few issues with the
download process. In particular, when you're serving a file download through a PHP script that uses headers such as
Content-Disposition: attachment; filename=myFile.pdf or Content-Disposition:
inline; filename=myFile.pdf, and that tells the browser not to cache pages, Internet Explorer won't
deliver that file to the user.
Solutions
Internet Explorer handles downloads in a rather unusual manner: it makes two requests to the web site. The first
request downloads the file and stores it in the cache before making a second request, the response to which is not
stored. The second request invokes the process of delivering the file to the end user in accordance with the file's
type--for instance, it starts Acrobat Reader if the file is a PDF document. Therefore, if you send the cache headers
that instruct the browser not to cache the page, Internet Explorer will delete the file between the first and second
requests, with the unfortunate result that the end user receives nothing!
If the file you're serving through the PHP script won't change, one solution to this problem is simply to disable the
"don't cache" headers, pragma and cache-control, which we discussed in "How do I prevent web browsers
from caching a page?", for the download script.
If the file download will change regularly, and you want the browser to download an up-to-date version of it, you'll
need to use the Last-Modified header that we met in "How do I control client-side caching?", and ensure that
the time of modification remains the same across the two consecutive requests. You should be able to achieve this
goal without affecting users of browsers that handle downloads correctly.
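As a rough illustration of that second approach, the sketch below builds download headers that omit the "don't cache" directives and take Last-Modified from the file's own modification time, so that two requests made in quick succession see the same value. The function name downloadHeaders, the default content type, and the path in the commented usage are assumptions made for this example only.

```php
<?php
/**
 * Build response headers for a file download without "don't cache"
 * directives. $path must point to an existing, readable file.
 */
function downloadHeaders($path, $downloadName, $contentType = 'application/octet-stream')
{
    // The file's modification time stays the same across both of
    // Internet Explorer's requests unless the file is rewritten.
    $modified = filemtime($path);
    // Remove characters that would break the quoted filename parameter.
    $safeName = str_replace(array('"', "\r", "\n"), '', basename($downloadName));
    return array(
        'Content-Type: ' . $contentType,
        'Content-Disposition: attachment; filename="' . $safeName . '"',
        'Content-Length: ' . filesize($path),
        'Last-Modified: ' . gmdate('D, d M Y H:i:s', $modified) . ' GMT',
    );
}

// Hypothetical use in a download script:
// foreach (downloadHeaders('/path/to/myFile.pdf', 'myFile.pdf', 'application/pdf') as $line) {
//     header($line);
// }
// readfile('/path/to/myFile.pdf');
```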
One final solution is to write the file to the file system of your web server and simply provide a link to it, leaving it to
the web server to report the cache headers for you. Of course, this may not be a viable option if the file is supposed to
be secured.
How do I use output buffering for server-side caching?
Server-side processing delay is one of the biggest bugbears of dynamic web pages. We can reduce server-side delay by
caching output. The page is generated normally, performing database queries and so on with PHP; however, before
sending it to the browser, we capture and store the finished page somewhere--in a file, for instance. The next time the
page is requested, the PHP script first checks to see whether a cached version of the page exists. If it does, the script
sends the cached version straight to the browser, avoiding the delay involved in rebuilding the page.
Solution
Here, we'll look at PHP's in-built caching mechanism, the output buffer, which can be used with whatever page
rendering system you prefer (templates or no templates). Consider situations in which your script displays results
using, for example, echo or print, rather than sending the data directly to the browser. In such cases, you can use
PHP's output control functions to store the data in an in-memory buffer, which your PHP script has both access to
and control over.
Here's a simple example that demonstrates how the output buffer works:
buffer.php (excerpt)
<?php
ob_start();
echo '1. Place this in the buffer<br />';
$buffer = ob_get_contents();
ob_end_clean();
echo '2. A normal echo<br />';
echo $buffer;
?>
The buffer itself stores the output as a string. So, in the above script, we commence buffering with the
ob_start function, and use echo to display a piece of text which is stored in the output buffer automatically.
We then use the ob_get_contents function to fetch the data the echo statement placed in the buffer, and store
it in the $buffer variable. The ob_end_clean function stops the output buffer and empties the contents; the
alternative approach is to use the ob_end_flush function, which displays the contents of the buffer.
The above script displays the following output:
2. A normal echo
1. Place this in the buffer
In other words, we captured the output of the first echo, then sent it to the browser after the second echo. As this
simple example suggests, output buffering can be a very powerful tool when it comes to building your site; it provides
a solution for caching, as we'll see in a moment, and is also an excellent way to hide errors from your site's visitors, as
is discussed in Chapter 9. Output buffering even provides a possible alternative to browser redirection in situations
such as user authentication.
In order to improve the performance of our site, we can store the output buffer contents in a file. We can then call on
this file for the next request, rather than having to rebuild the output from scratch again. Let's look at a quick
example of this technique. First, our example script checks for the presence of a cache file:
sscache.php (excerpt)
<?php
if (file_exists('./cache/page.cache'))
{
  readfile('./cache/page.cache');
  exit();
}
If the script finds the cache file, we simply output its contents and we're done! If the cache file is not found, we
proceed to output the page using the output buffer:
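sscache.php (excerpt)
ob_start();
?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Cached Page</title>
</head>
<body>
This page was cached with PHP's
Output Control Functions
</body>
</html>
<?php
$buffer = ob_get_contents();
ob_end_flush();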
Before we flush the output buffer to display our page, we make sure to store the buffer contents in the $buffer
variable.
The final step is to store the saved buffer contents in a text file:
sscache.php (excerpt)
$fp = fopen('./cache/page.cache','w');
fwrite($fp,$buffer);
fclose($fp);
?>
The page.cache file contents are exactly the same as the HTML that was rendered by the script:
cache/page.cache (excerpt)
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Cached Page</title>
</head>
<body>
This page was cached with PHP's
Output Control Functions
</body>
</html>
Discussion
For an example that shows how to use PHP's output buffering capabilities to handle errors more elegantly, have a
look at the PHP Freaks article Introduction to Output Buffering, by Derek Ford [26].
What About Template Caching?
Template engines often include template caching features--Smarty [27] is a case in point. Usually, these engines offer
a built-in mechanism for storing a compiled version of a template (that is, the native PHP generated from the
template), which prevents us developers from having to recompile the template every time a page is requested.
This process should not be confused with output--or content--caching, which refers to the caching of the rendered
HTML (or other output) that PHP sends to the browser. In addition to the content cache mechanisms discussed in
this chapter, Smarty can cache the contents of the HTML page. Whether you use Smarty's content cache or one of the
alternatives discussed in this chapter, you can successfully use both template and content caching together on the
same site.
HTTP Headers and Output Buffering
Output buffering can help solve the most common problem associated with the header function, not to mention
the issues surrounding session_start and setcookie. Normally, if you call any of these functions after
page output has begun, you'll get a nasty error message. When output buffering's turned on, the only output types
that can escape the buffer are HTTP headers. If you use ob_start at the very beginning of your application's
execution, you can send headers at whichever point you like, without encountering the usual errors. You can then
write out the buffered page content all at once, when you're sure that no more HTTP headers are required.
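The behavior described above can be sketched as follows. The helper name renderBuffered, the X-Example header, and the page callback are hypothetical and exist only to illustrate the ordering; the headers_sent guard is there so that the sketch also behaves predictably when run from the command line.

```php
<?php
/**
 * Run a page callback inside an output buffer and return its output.
 * While the buffer is active nothing reaches the browser, so the
 * callback may call header() even after it has echoed content.
 */
function renderBuffered($page)
{
    ob_start();
    call_user_func($page);
    return ob_get_clean(); // stop buffering and hand back the captured page
}

// Hypothetical page: output first, header afterwards.
$html = renderBuffered(function () {
    echo 'Page content';
    if (!headers_sent()) {
        header('X-Example: set after output began');
    }
});

echo $html; // the body is written only now, once all headers are queued
```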
Use Output Buffering Responsibly
While output buffering can helpfully solve all our header problems, it should not be used solely for that reason. By
ensuring that all output is generated after all the headers are sent, you'll save the time and resource overheads
involved in using output buffers.
How do I cache just the parts of a page that change infrequently?
Caching an entire page is a simplistic approach to output buffering. While it's easy to implement, that approach
negates the real benefits presented by PHP's output control functions to improve your site's performance in a manner
that's relevant to the varying lifetimes of your content.
No doubt, some parts of the page that you send to visitors will change very rarely, such as the page's header, menus,
and footer. But other parts--for example, the list of comments on your blog posts--may change quite often.
Fortunately, PHP allows you to cache sections of the page separately.
Solution
Output buffering can be used to cache sections of a page in separate files. The page can then be rebuilt for output
from these files.
This technique eliminates the need to repeat database queries, while loops, and so on. You might consider assigning
each block of the page an expiry date after which the cache file is recreated; alternatively, you may build into your
application a mechanism that deletes the cache file every time the content it stores is changed.
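The second option, deleting the cache file whenever its content changes, might look something like the sketch below. The function name invalidateCache, the comments.cache file name, and the saveComment call in the commented usage are hypothetical; the ./cache/ directory matches the one used in the examples that follow.

```php
<?php
/**
 * Delete a cache file so that the next request rebuilds it.
 * Returns true if the file is gone afterwards.
 */
function invalidateCache($filename, $cacheDir = './cache/')
{
    // basename() strips any directory part, keeping us inside $cacheDir.
    $path = $cacheDir . basename($filename);
    if (file_exists($path)) {
        return unlink($path);
    }
    return true; // nothing cached, nothing to do
}

// Hypothetical use after saving a new blog comment:
// saveComment($comment);              // application code, not shown
// invalidateCache('comments.cache');
```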
Let's work through an example that demonstrates the principle. Firstly, we'll create two helper functions,
writeCache and readCache. Here's the writeCache function:
smartcache.php (excerpt)
function writeCache($content, $filename)
{
$fp = fopen('./cache/' . $filename, 'w');
fwrite($fp, $content);
fclose($fp);
}
The writeCache function is quite simple; it just writes the content of the first argument to a file with the name
specified in the second argument, and saves that file to a location in the cache directory. We'll use this function to
write our HTML to the cache files.
The readCache function will return the contents of the cache file specified in the first argument if it has not
expired--that is, the file's last modified time is not older than the current time minus the number of seconds specified
in the second argument. If it has expired or the file does not exist, the function returns false:
smartcache.php (excerpt)
function readCache($filename, $expiry)
{
if (file_exists('./cache/' . $filename))
{
if ((time() - $expiry) > filemtime('./cache/' . $filename))
{
return false;
}
$cache = file('./cache/' . $filename);
return implode('', $cache);
}
return false;
}
For the purposes of demonstrating this concept, I've used a procedural approach. However, I wouldn't recommend
doing this in practice, as it will result in very messy code and is likely to cause issues with file locking. For example,
what happens when someone accesses the cache at the exact moment it's being updated? Better solutions will be
explained later on in the chapter.
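To make the concern concrete, here is one common way to avoid serving a half-written cache file: write to a temporary file in the same directory, then rename it over the target. This is a minimal sketch and not the approach the chapter goes on to describe. It assumes the cache directory exists, is writable, and lives on a file system where a rename within one directory replaces the target in a single step (true of POSIX systems; behavior elsewhere may differ). The function name writeCacheAtomic is hypothetical.

```php
<?php
/**
 * Write a cache file so that readers never see a partial file:
 * the content goes to a temporary file in the same directory, which
 * is then renamed over the target.
 */
function writeCacheAtomic($content, $filename, $cacheDir = './cache/')
{
    $target = $cacheDir . $filename;
    $temp = tempnam($cacheDir, 'tmp');
    if ($temp === false) {
        return false;
    }
    if (file_put_contents($temp, $content) === false) {
        unlink($temp); // don't leave a stray temporary file behind
        return false;
    }
    // A reader sees either the old file or the new one, never a mixture.
    return rename($temp, $target);
}
```

It takes the same first two arguments as writeCache, so it could stand in for that function in the examples that follow.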
Let's continue this example. After the output buffer is started, processing begins. First, the script calls readCache
to see whether the file header.cache exists; this contains the top of the page--the HTML <head> section and the
opening <body> tag. We've used PHP's date function to display the time at which the page was actually rendered, so
you'll be able to see the different cache files at work when the page is displayed:
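smartcache.php (excerpt)
ob_start();
if (!$header = readCache('header.cache', 604800))
{
?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Chunked Cached Page</title>
<meta http-equiv="Content-Type"
content="text/html; charset=iso-8859-1"/>
</head>
<body>
The header time is now: <?php echo date('H:i:s'); ?><br />
<?php
  $header = ob_get_contents();
  ob_clean();
  writeCache($header,'header.cache');
}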
Note what happens when a cache file isn't found: the header content is output and assigned to a variable, $header,
with ob_get_contents, after which the ob_clean function is called to empty the buffer. This allows us to
capture the output in "chunks" and assign them to individual cache files with the writeCache function. The
header of the page is now stored as a file, which can be reused without our needing to rerender the page. Look back to
the start of the if condition for a moment. When we called readCache, we gave it an expiry time of 604800
seconds (one week); readCache uses the file modification time of the cache file to determine whether the cache is
still valid.
For the body of the page, we'll use the same process as before. However, this time, when we call readCache, we'll
use an expiry time of five seconds; the cache file will be updated whenever it's more than five seconds old:
smartcache.php (excerpt)
if (!$body = readCache('body.cache', 5))
{
echo 'The body time is now: ' . date('H:i:s') . '<br />';
$body = ob_get_contents();
ob_clean();
writeCache($body, 'body.cache');
}
The page footer is effectively the same as the header. After the footer, the output buffering is stopped and the
contents of the three variables that hold the page data are displayed:
smartcache.php (excerpt)
if (!$footer = readCache('footer.cache', 604800)) {
?>
The footer time is now: <?php echo date('H:i:s'); ?><br />
</body>
</html>
<?php
  $footer = ob_get_contents();
  ob_clean();
  writeCache($footer, 'footer.cache');
}
ob_end_clean();
echo $header . $body . $footer;
?>
The end result looks like this:
The header time is now: 17:10:42
The body time is now: 18:07:40
The footer time is now: 17:10:42
The header and footer are updated on a weekly basis, while the body is updated whenever it is more than five seconds
old. If you keep refreshing the page, you'll see the body time updating.
Discussion
Note that if you have a page that builds content dynamically, based on a number of variables, you'll need to make
adjustments to the way you handle your cache files. For example, you might have an online shopping catalog whose
listing pages are defined by a URL such as:
http://example.com/catalogue/view.php?category=1&page=2
This URL should show page two of all items in category one; let's say this is the category for socks. But if we were to
use the caching code above, the results of the first page of the first category we looked at would be cached, and shown
for any request for any other page or category, until the cache expiry time elapsed. This would certainly confuse the
next visitor who wanted to browse the category for shoes--that person would see the cached content for socks!
To avoid this issue, you'll need to incorporate the category ID and page number into the cache file name like so:
$cache_filename = 'catalogue_' . $category_id . '_' . $page . '.cache';
if (!$catalogue = readCache($cache_filename, 604800))
{
  // ...display the category HTML...
}
This way, the correct cached content can be retrieved for every request.
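As a minimal sketch of that idea, the helper below builds the cache file name from the request parameters. The function name catalogueCacheFile is hypothetical, and casting both values to integers assumes that category IDs and page numbers are numeric; the cast also keeps characters such as slashes and dots out of the file name.

```php
<?php
// Hypothetical helper: builds one cache file name per category/page pair.
// Assumes both parameters are numeric; the (int) casts discard anything else,
// so a value such as "../x" cannot end up in the file name.
function catalogueCacheFile($params)
{
    $category_id = isset($params['category']) ? (int) $params['category'] : 0;
    $page        = isset($params['page']) ? (int) $params['page'] : 1;

    return 'catalogue_' . $category_id . '_' . $page . '.cache';
}

// For view.php?category=1&page=2 this prints catalogue_1_2.cache
echo catalogueCacheFile(array('category' => '1', 'page' => '2')), "\n";
```

In a real page you would pass $_GET to the helper and hand the result to readCache, as in the excerpt above.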
Nesting Buffers
You can nest one buffer within another practically ad infinitum simply by calling ob_start more than once. This can
be useful if you have multiple operations that use the output buffer, such as one that catches the PHP error
messages, and another that deals with caching. Care needs to be taken to make sure that ob_end_flush or
ob_end_clean is called every time ob_start is used.
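The short sketch below shows two nested buffers, using only PHP's standard output control functions. The strings and variable names are placeholders chosen for illustration; the point is that each ob_start is matched by its own ob_end_clean, and that cleaning the inner buffer leaves the outer one untouched.

```php
<?php
// Outer buffer: collects everything the page produces.
ob_start();
echo 'outer start | ';

// Inner buffer: captures one fragment on its own.
ob_start();
echo 'inner fragment';
$inner = ob_get_contents();
ob_end_clean();            // closes the inner buffer only

echo 'outer end';
$outer = ob_get_contents();
ob_end_clean();            // closes the outer buffer

// Every ob_start has now been matched by an ob_end_clean.
echo $outer, "\n";         // the inner fragment is not part of $outer
echo $inner, "\n";
```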
How do I use PEAR::Cache_Lite for server-side caching?
The previous solution explored the ideas behind output buffering using the PHP ob_* functions. As we
mentioned at the time, though, that approach probably isn't the best way to meet the dual goals of keeping your code
maintainable and having a reliable caching mechanism. It's time to see how we can put a caching system into action
in a manner that will be reliable and easy to maintain.
Solution
In the interests of keeping your code maintainable and having a reliable caching mechanism, it's a good idea to
delegate the responsibility of caching logic to classes you trust. In this case, we'll use a little help from
PEAR::Cache_Lite (version 1.7.2 is used in the examples here [28]). Cache_Lite provides a solid yet
easy-to-use library for caching, and handles issues such as: file locking; creating, checking for, and deleting cache
files; controlling the output buffer; and directly caching the results from function and class method calls. More to the
point, Cache_Lite should be relatively easy to apply to an existing application, requiring only minor code
modifications.
Cache_Lite has four main classes. First is the base class, Cache_Lite, which deals purely with creating and
fetching cache files, but makes no use of output buffering. This class can be used alone for caching operations in
which you have no need for output buffering, such as storing the contents of a template you've parsed with PHP.
The examples here will not use Cache_Lite directly, but will instead focus on the three subclasses.
Cache_Lite_Function can be used to call a function or class method and cache the result, which might prove
useful for storing a MySQL [29] query result set, for example. The Cache_Lite_Output class uses PHP's output
control functions to catch the output generated by your script and store it in cache files; it allows you to perform tasks
such as those we completed in "How do I cache just the parts of a page that change infrequently?". The
Cache_Lite_File class bases cache expiry on the timestamp of a master file, with any cache file being deemed
to have expired if it is older than the timestamp.
Let's work through an example that shows how you might use Cache_Lite to create a simple caching solution.
When we're instantiating any child classes of Cache_Lite, we must first provide an array of options that
determine the behavior of Cache_Lite itself. We'll look at these options in detail in a moment. Note that the
cacheDir directory we specify must be one to which the script has read and write access:
cachelite.php (excerpt)
require_once 'Cache/Lite/Output.php';
$options = array(
  'cacheDir' => './cache/',
  'writeControl' => true,
  'readControl' => true,
  'fileNameProtection' => false,
  'readControlType' => 'md5'
);
$cache = new Cache_Lite_Output($options);
For each chunk of content that we want to cache, we need to set a lifetime (in seconds) for which the cache should
live before it's refreshed. Next, we use the start method, available only in the Cache_Lite_Output class, to turn
on output buffering. The two arguments passed to the start method are an identifying value for this particular cache
file, and a cache group. The group is an identifier that allows a collection of cache files to be acted upon; it's possible
to delete all cache files in a given group, for example (more on this in a moment). The start method will check to see if
a valid cache file is available and, if so, it will begin outputting the cache contents. If a cache file is not available, start
will return false and begin caching the following output.
Once the output for this chunk has finished, we use the end method to stop buffering and store the content as a file:
cachelite.php (excerpt)
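$cache->setLifeTime(604800);
if (!$cache->start('header', 'Static')) {
?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>PEAR::Cache_Lite example</title>
<meta http-equiv="Content-Type"
    content="text/html; charset=iso-8859-1"/>
</head>
<body>
<h2>PEAR::Cache_Lite example</h2>
<p>The header time is now: <?php echo date('H:i:s'); ?><br />
</p>
<?php
  $cache->end();
}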
To cache the body and footer, we follow the same procedure we used for the header. Note that, again, we specify a
five-second lifetime when caching the body:
cachelite.php (excerpt)
$cache->setLifeTime(5);
if (!$cache->start('body', 'Dynamic')) {
  echo 'The body time is now: ' . date('H:i:s') . '<br />';
  $cache->end();
}
$cache->setLifeTime(604800);
if (!$cache->start('footer', 'Static')) {
?>
<p>The footer time is now: <?php echo date('H:i:s'); ?><br />
</p>
</body>
</html>
<?php
  $cache->end();
}
?>
On viewing the page, Cache_Lite creates cache files in the cache directory. Because we've set the
fileNameProtection option to false, Cache_Lite creates the files with these names:
- ./cache/cache_Static_header
- ./cache/cache_Dynamic_body
- ./cache/cache_Static_footer
You can read about the fileNameProtection option--and many more--in "What configuration options does
Cache_Lite support?". When the same page is requested later, the code above will use the cached file if it is valid
and has not expired.
Protect your Cache Files
Make sure that the directory in which you place the cache files is not publicly available, or you may be offering
your site's visitors access to more than you realize.
What configuration options does Cache_Lite support?
When instantiating Cache_Lite (or any of its subclasses, such as Cache_Lite_Output), you can set any of a
number of options to control its behavior. These options should be placed in an array and passed to the
constructor as shown below (and in the previous section):
$options = array(
  'cacheDir' => './cache/',
  'writeControl' => true,
  'readControl' => true,
  'fileNameProtection' => false,
  'readControlType' => 'md5'
);
$cache = new Cache_Lite_Output($options);
Solution
The options available in the current version of Cache_Lite (1.7.2) are:
cacheDir
This is the directory in which the cache files will be placed. It defaults to /tmp/.
caching
This option switches on and off the caching behavior of Cache_Lite. If you have numerous Cache_Lite calls
in your code and want to disable the cache for debugging, for example, this option will be important. The default
value is true (caching enabled).
lifeTime
This option represents the default lifetime (in seconds) of cache files. It can be changed using the setLifeTime
method. The default value is 3600 (one hour), and if it's set to null, the cache files will never expire.
fileNameProtection
With this option activated, Cache_Lite uses an MD5 hash to generate the filename for the cache file.
This option protects you from error when you try to use IDs or group names containing characters that aren't valid
for filenames; fileNameProtection must be turned on when you use Cache_Lite_Function. The
default is true (enabled).
fileLocking
This option is used to switch the file locking mechanisms on and off. The default is true (enabled).
writeControl
This option checks that a cache file has been written correctly immediately after it has been created, and throws a
PEAR::Error if it finds a problem. Obviously, this facility would allow your code to attempt to rewrite a cache file that
was created incorrectly, but it comes at a cost in terms of performance. The default value is true (enabled).
readControl
This option checks any cache files that are being read to ensure they're not corrupt. Cache_Lite is able to place inside
the file a value, such as the string length of the file, which can be used to confirm that the cache file isn't corrupt.
There are three alternative mechanisms for checking that a file is valid, and they're specified using the
readControlType option. These mechanisms come at the cost of performance, but should help to guarantee
that your visitors aren't seeing scrambled pages. The default value is true (enabled).
readControlType
This option lets you specify the type of read control mechanism you want to use. The available mechanisms are a
cyclic redundancy check (crc32, the default value) using PHP's crc32 function, an MD5 hash using PHP's md5
function (md5), or a simple and fast string length check (strlen). Note that this mechanism is not intended to
provide security from people tampering with your cache files; it's just a way to spot corrupt files.
pearErrorMode
This option tells Cache_Lite how it should return PEAR errors to the calling script. The default is
CACHE_LITE_ERROR_RETURN, which means Cache_Lite will return a PEAR::Error object.
memoryCaching
With memory caching enabled, every time a file is written to the cache, it is stored in an array in Cache_Lite. The
saveMemoryCachingState and getMemoryCachingState methods can be used to store and access the
memory cache data between requests. The advantage of this facility is that the complete set of cache files can be
stored in a single file, reducing the number of disk read/write operations by reconstructing the cache files straight
into an array to which your code has access. The memoryCaching option may be worth further investigation if
you run a large site. The default value is false (disabled).
onlyMemoryCaching
If this option is enabled, only the memory caching mechanism will be used. The default value is false (disabled).
memoryCachingLimit
This option places a limit on the number of cache files that will be stored in the memory caching array. The more
cache files you have, the more memory will be used up by memory caching, so it may be a good idea to enforce a limit
that prevents your server from having to work too hard. Of course, this option places no restriction on the size of each
cache file, so just one or two massive files may cause a problem. The default value is 1000.
automaticSerialization
If enabled, this option will automatically serialize all data types. While this approach will slow down the caching
system, it is useful for caching nonscalar data types such as objects and arrays [30]. For higher performance, you
might consider serializing nonscalar data types yourself. The default value is false (disabled).
automaticCleaningFactor
This option will automatically clean old cache entries--on average, one in x cache writes, where x is the value set for
this option. Therefore, setting this value to 0 will indicate no automatic cleaning, and a value of 1 will cause cache
clearing on every cache write. A value of 20 to 200 is the recommended starting point if you wish to enable this
facility; it causes cache cleaning to happen, on average, 0.5% to 5% of the time. The default value is 0 (disabled).
hashedDirectoryLevel
When set to a nonzero value, this option will enable a hashed directory structure. A hashed directory structure will
improve the performance of sites that have thousands of cache files. If you choose to use hashed directories, start by
setting this value to 1, and increasing it as you test for performance improvements. The default value is 0 (disabled).
errorHandlingAPIBreak
This option was added to enable backwards compatibility with code that uses the old API. When the old API was run
in CACHE_LITE_ERROR_RETURN mode (see the pearErrorMode option earlier in this list), some functions
would return a Boolean value to indicate success, rather than returning a PEAR_Error object. By setting this value
to true, the PEAR_Error object will be returned instead. The default value is false (disabled).
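The readControl and readControlType options described in the list above can be pictured with a short sketch. This is not Cache_Lite's own code or file format; the function names controlValue, packWithControl, and unpackWithControl are hypothetical, and storing the control value on the first line of the file is an assumption made to keep the example small. It only shows the general idea: a value derived from the data is stored with it, then recomputed and compared when the file is read.

```php
<?php
// Computes the control value for a chunk of cache data.
// $type mirrors the three readControlType values: 'crc32', 'md5', 'strlen'.
function controlValue($data, $type)
{
    switch ($type) {
        case 'md5':
            return md5($data);
        case 'strlen':
            return (string) strlen($data);
        default:
            return sprintf('%u', crc32($data));
    }
}

// Stores the control value on the first line, followed by the data.
function packWithControl($data, $type)
{
    return controlValue($data, $type) . "\n" . $data;
}

// Returns the data if the control value still matches, or false if not.
function unpackWithControl($stored, $type)
{
    $parts = explode("\n", $stored, 2);
    if (count($parts) !== 2) {
        return false;
    }
    list($control, $data) = $parts;
    return ($control === controlValue($data, $type)) ? $data : false;
}

$stored = packWithControl('<p>Hello</p>', 'md5');
var_dump(unpackWithControl($stored, 'md5'));                 // intact data
var_dump(unpackWithControl(substr($stored, 0, -3), 'md5'));  // truncated copy
```

The string length check is the cheapest of the three but only notices a change in size; the CRC and MD5 checks also notice changed bytes, at the cost of reading through the whole cache entry on every request.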
How do I purge the Cache_Lite cache?
The built-in lifetime mechanism for Cache_Lite cache files provides a good foundation for keeping your cache
files up to date, but there will be some circumstances in which you need the files to be updated immediately.
Solution
In cases in which you need immediate updates, the methods remove and clean come in handy. The remove method is
designed to delete a specific cache file; it takes as arguments the cache ID and group name of the file. To delete the
page body cache file we created in "How do I use PEAR::Cache_Lite for server-side caching?", we'd use this code:
$cache->remove('body', 'Dynamic');
If we use the clean method, we can delete all the files in our cache directory simply by calling the method with no
arguments; alternatively, we can specify a group of cache files to delete. If we wanted to delete both the header and
footer cache files we created in "How do I use PEAR::Cache_Lite for server-side caching?", we could do so like this:
$cache->clean('Static');
Discussion
The remove and clean methods should obviously be called in response to events that arise within an application. For
example, if you have a discussion forum application, you probably want to remove the relevant cache files when a
visitor posts a new message.
Although it may seem like this solution entails a lot of code modifications, with some care it can be applied to your
application in a global manner. If you have a central script that's included in every page, your script can simply watch
for incoming events--for example, a variable like $_GET['newPost']--and respond by deleting the required
cache files. This keeps the cache file removal mechanism central and easier to maintain. You might also consider
using the php.ini setting auto_prepend_file to include this code in every PHP script.
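One way to picture such a central script is sketched below. The $purgeRules map, the newSection variable, and the groupsToPurge function are hypothetical names introduced for illustration; only the clean method and the Static and Dynamic groups come from the examples above. The sketch decides which cache groups an incoming request should invalidate and leaves the actual deletion to Cache_Lite.

```php
<?php
// Hypothetical map of request variables to the cache groups they invalidate.
$purgeRules = array(
    'newPost'    => 'Dynamic',
    'newSection' => 'Static',
);

// Returns the list of cache groups that the incoming request should purge.
function groupsToPurge($request, $rules)
{
    $groups = array();
    foreach ($rules as $variable => $group) {
        if (isset($request[$variable])) {
            $groups[] = $group;
        }
    }
    return array_values(array_unique($groups));
}

// In the central script this would be groupsToPurge($_GET, $purgeRules),
// followed by $cache->clean($group) for each group returned.
print_r(groupsToPurge(array('newPost' => '1'), $purgeRules));
```

Keeping the rules in one array means a new event only needs a new entry in the map, rather than another block of deletion code in each page.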
How do I cache function calls?
Many web sites provide access to their data via web services such as SOAP and XML-RPC [31]. (You can read all
about web services in Chapter 12.) As web services are accessed over a network, it's often a very good idea to cache
results so that they can be fetched locally, rather than repeating the same slow request to the server multiple times. A
simple approach might be to use PHP sessions, but as that solution operates on a per-visitor basis, the opening
requests for each visitor will still be slow.
Solution
Let's assume you wish to create a web page that lists all the SitePoint books available on Amazon. The actual list is
not likely to change from moment to moment, so why would we make the request to the Amazon web service every
time the web page is displayed? We won't! Instead, we can take advantage of Cache_Lite by caching the results of
the SOAP request.
Requires PEAR::SOAP Version 0.11.0
The following solution uses the PEAR::SOAP library version 0.11.0 to access the Amazon web service. You can find
this package on the PEAR web site [32].
Here's some hypothetical code that fetches the data from the remote Amazon server:
$results = $amazonClient->ManufacturerSearchRequest($params);
Using Cache_Lite_Function, we can cache the results so the data returned from the service can be reused;
this will avoid unnecessary network calls and significantly improve performance.
The following example code focuses on the caching aspect to prevent us from getting bogged down in the details of
using the Amazon web service. You can see the complete script if you download this book's code archive from the
SitePoint web site.
The Cache_Lite_Function class requires the inclusion of the following file:
cachefunction.php (excerpt)
require_once 'Cache/Lite/Function.php';
We instantiate the Cache_Lite_Function class with some options:
cachefunction.php (excerpt)
$options = array(
  'cacheDir' => './cache/',
  'fileNameProtection' => true,
  'writeControl' => true,
  'readControl' => true,
  'readControlType' => 'strlen',
  'defaultGroup' => 'SOAP'
);
$cache = new Cache_Lite_Function($options);
It's important that the fileNameProtection option is set to true (this is in fact the default value, but in this
case I've set it manually to emphasize the point). If it were set to false, the filename would be invalid, so the data
would not be cached.
Here's how we make the calls to our SOAP client class:
cachefunction.php (excerpt)
$results = $cache->call('amazonClient->ManufacturerSearchRequest',
$params);
If the request is being made for the first time, Cache_Lite_Function will store the results as a serialized array
or object in a cache file (not that you need to worry about this), and this file will be used for future requests until it
expires. The setLifeTime method can again be used to specify how long the cache files should survive before
they're refreshed; currently, the default value of 3600 seconds (one hour) is being used. You can then use the
$results variable exactly as if you were calling the web service method directly. The output of our example script
can be seen in Figure 11.1.
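To make the mechanism concrete, here is a minimal sketch of function-call caching in plain PHP. It is not how Cache_Lite_Function is implemented; the cachedCall helper, the slowLookup stand-in, and the dummy data it returns are all hypothetical. It assumes the function is given by name and that its arguments and result can be serialized.

```php
<?php
// Hypothetical helper, not part of Cache_Lite: calls the named function with
// $args and caches the serialized result in $cacheDir for $lifeTime seconds.
function cachedCall($function, $args, $cacheDir, $lifeTime = 3600)
{
    // One cache file per function/argument combination; md5 keeps the
    // file name valid whatever the arguments contain.
    $file = rtrim($cacheDir, '/') . '/call_' . md5($function . serialize($args));

    // Reuse the stored result while the file is younger than $lifeTime.
    if (is_file($file) && (time() - filemtime($file)) < $lifeTime) {
        return unserialize(file_get_contents($file));
    }

    $result = call_user_func_array($function, $args);
    file_put_contents($file, serialize($result), LOCK_EX);
    return $result;
}

// Stand-in for a slow web service request; counts how often it really runs.
function slowLookup($manufacturer)
{
    global $lookups;
    $lookups++;
    return array('manufacturer' => $manufacturer); // dummy data
}

$lookups  = 0;
$cacheDir = sys_get_temp_dir() . '/calls_' . uniqid();
mkdir($cacheDir);

$first  = cachedCall('slowLookup', array('SitePoint'), $cacheDir);
$second = cachedCall('slowLookup', array('SitePoint'), $cacheDir);

// The second call is answered from the cache file, so slowLookup ran once.
echo 'Lookups performed: ', $lookups, "\n";
```

Cache_Lite_Function adds the pieces this sketch leaves out, such as cache groups and the read and write controls described earlier, which is why it's the better choice in a real application.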
Summary
Caching is an important and often overlooked aspect of web site development. Many factors that affect the
performance of today's web sites weren't a problem for their predecessors--from complex, dynamic page generation,
to a reliance on third-party data over the network. In this chapter, we've examined HTML meta tags, HTTP headers,
PHP output buffering and PEAR::Cache_Lite, and we've seen how you can use them to control the caching of
your web site content and improve the site's reliability and performance.
Implementing a caching system for your site might be simple, but ultimately, it depends on your requirements. If you
have a busy and predominantly static web site--such as a blog--that's managed through a content management
system, it will likely require little alteration, yet may benefit from huge performance improvements resulting from a
small investment of your time. Setting up caching for a more complex site that generates content on a per-user basis,
such as a portal or shopping cart system, will prove a little more tricky and time consuming, but the benefits are still
clear.
Regardless, I hope the information in this chapter has given you a good grasp of the options available, and will help
you determine which techniques are most suitable for your application. Don't forget to download this chapter, plus
two others [33] -- PDO and Databases, and Access Control -- to enjoy offline. For information on the contents of the
book's other chapters, check out the full Table of Contents [34].
[1] /glossary.php?q=H#term_75
[2] /glossary.php?q=C#term_21
[3] /glossary.php?q=P#term_1
[4] /glossary.php?q=A#term_19
[5] /glossary.php?q=C#term_21
[6] /glossary.php?q=C#term_15
[7] /glossary.php?q=P#term_50
[8] http://www.mnot.net/cache_docs/
[9] http://www.zend.com/
[10] http://www.php-accelerator.co.uk/
[11] http://www.sitepoint.com/books/phpant2/
[12] www.sitepoint.com/launch/108ef2/2/120
[13] http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9
[14] http://perl.apache.org/docs/general/correct_headers/correct_headers.html
[15] /glossary.php?q=A#term_61
[16] /glossary.php?q=%23#term_72
[17] /glossary.php?q=M#term_31
[18] /glossary.php?q=B#term_56
[19] http://livehttpheaders.mozdev.org/
[20] /glossary.php?q=F#term_45
[21] http://getfirebug.org/
[22] http://www.httpwatch.com/
[23] http://getcharles.com/
[24] /glossary.php?q=L#term_18
[25] /glossary.php?q=X#term_63
[26] http://www.phpfreaks.com/tutorials/59/0.php
[27] http://smarty.php.net/
[28] http://pear.php.net/package/Cache_Lite/
[29] /glossary.php?q=M#term_12
[30] /glossary.php?q=%23#term_72
[31] /glossary.php?q=X#term_3
[32] http://pear.php.net/package/soap/
[33] http://www.sitepoint.com/launch/108ef2/2/120
[34] http://www.sitepoint.com/books/phpant2/toc.php
