Drupal Performance Mantra: Crawl, Boost, Expire
Avoiding Drupal
The Boost module is a surprisingly effective static file caching tool to dramatically speed up pretty much any Drupal website. The trick it employs is in that it actually tries to avoid running Drupal as often as possible.
It provides caching on the level of whole web pages. If a particular version of a page happens not to be cached, Drupal will of course be asked to run through its convoluted logic of permissions, menus, database calls, and all that. Heavy pages often take several seconds to complete. Getting under the magic limit of 1 second is a prized UX and SEO art — but that's another story.
Thanks to Boost, even if your page takes several seconds to load, it will do that just once. The next time it will be fired blazing fast as a little compressed HTML file directly by the Apache web server, completely avoiding not just Drupal but any MySQL or PHP processing, easily delivering a page in a matter of milliseconds.
There indeed are even more performant caching layers. On dedicated servers, we surely may want to consider Varnish instead. In my experience, however, the various Varnish caching recipes out there are actually much less customizable than Boost, and they lack the sweet sense of completeness provided by Boost and its companion modules described further in this article.
Expiring Intelligently
Boost is good at making sure that a page request does not ask Drupal for the page content more than one single time. The problem is that without talking to Drupal it never learns when a page is updated. It will tirelessly continue its rapid delivery of pages — but their content will become stale.
Not forever, of course — Boost has settings for cache expiration time. We can set it to get a fresh version of the pages after a given time period, e.g. after 6 or 12 hours after the initial caching.
However, that is rarely good enough for any larger or busier website. Imagine a site with 300,000 pages that has 3,000 page views every hour, and with 30 page updates every hour. Some of the pages are more popular than others, so let's say that after 6 hours there are some 10,000 pages cached. In the 6 hours, 6 x 30 = 180 pages have been updated. Say 100 of them had been initially cached, meaning Boost is now serving stale content for those pages. Then the configured expiration time starts elapsing for the cached pages and Drupal is again tasked to slowly compute each of them. It happens gradually, since each page is cached at a bit different time and so they do not expire simultaneously either. But in the end, 10,000 pages will be expired and re-computed even though 9,900 of them remained unchanged. What a waste of resources and more, what a waste of time our users have to spend while Drupal re-computes the identical pages!
Enter the Cache Expiration module, a nifty little module that allows us to configure what pages get expired when something happens. We can set whether node pages get expired on node insert, update or delete, and whether the front page should also be expired at that moment (which is often needed), and possibly also the taxonomy term pages related to nodes, or any number of other custom URLs. Even better, we can do this per content type — making it possible that e.g. adding an image node, which anyway does not have a page display of its own, does not expire the cache of all nodes! And we can configure similar behaviour for comments, files, and for user pages. After a page is expired, it will be re-generated by Drupal the very next time it is requested.
Let's consider our above example benefitting from the Cache Expiration module. For simplicity, say that all of the 100 cached pages happen to be nodes with content imported or updated from an RSS feed. In such case, only the content type for feed items will get expired but all other content — e.g. articles, news, products, links, etc. will remain cached. Only 100 instead of 10,000 pages will have to be re-generated.
With Cache Expiration module we can set the Boost's blanket expiration time much higher, e.g. to a few days or even weeks, knowing that the pages that are changed will have their cache expired automatically.
Crawling in the Background
The world of caching scenarios is complex because of the number of variables we need to take into consideration. Our performance improvement above may look impressive but we can still do much better. Let's do another thought experiment on our example site from above.
On our site of 300,000 pages there are several content types. One of them is called "Article", with 50,000 nodes. Our Boost expiration is set to 1 week because we rely on the more selective Cache Expiration functionality. So after 3 days our website visitors (including search engine spiders) happened to click on all 50,000 articles. Then somebody notices a small typo in one of the articles, we fix it and save the page, but by doing that we expire all 50,000 articles. That means our users will collectively have to endure 50,000 slow page loads again in the coming days before all articles are cached again. They won't be happy. And anyway, why do users have to spend their time to generate Boost cache on our website?!
Enter Boost Crawler, which is in fact a new sub-module of the Boost module. First of all, its name is incorrect — it does not crawl the way search engine spiders do, discovering and following links. What it does is pre-caching pages that happen to expire from Boost cache. And it does that in the backround, using another excellent helper module called HTTP Parallel Request & Threading Library. If our 50,000 articles get expired, at each cron run the HTTPRL module spawns several background processes and invisibly caches a number of the pages without bothering the user. Obviously, shortly after the mass expiration, it will still happen that our users will request a page that is not yet re-cached by HTTPRL, but over time the chances of that happening will continue to diminish.
The Recipe
Here's a complete set of steps to configure the magic bullet for your Drupal 7 performance.
Get Boost, Cache Expiration, and HTTP Parallel Request & Threading Library. On the Modules page (/admin/modules), enable "Boost", "Boost Crawler" and "Cache Expiration". You may immediately enable also "HTTP Parallel Request & Threading Library" — or it will be enabled automatically since it is required by Boost Crawler.
On the Boost Settings tab (/admin/config/system/boost), adjust the Maximum Cache Lifetime to a long period, e.g. 1 week, or even longer. Make sure the Minimum Cache Lifetime remains zero. (Leave XML and Ajax/JSON file caching off for the time being — you can activate those later if needed.)
Boost will not function properly while Drupal core cache is enabled. Go to the Performance page (/admin/config/development/performance) and make sure that "Cache pages for anonymous users" is NOT checked.
Review the .htaccess Settings subtab on the .htaccess tab (/admin/config/system/boost/htaccess). For most Apache servers, you should not need to change any of them.
Time to adjust your .htaccess file on the server. Go to the .htaccess Generation subtab on the .htaccess tab (/admin/config/system/boost/htaccess/generator), copy the generated rules, and paste them in the required location in your .htaccess file (see instructions at the end of the page).
Go to the Status report page (/admin/reports/status) and check for any problems related to Boost. For example, you may see something like "Boost Cache path cache/normal: does not exist". The problem is that Boost expects to have a folder called "cache" (to hold all the cached files) not in the usual folder (under /files) but in the root of your Drupal installation. In such case, create folder "cache" manually in the root (where your index.php file is) and give it such permissions that Apache can read and write to it (tip: it needs the same permissions as the generated folders in your files directory have). (Better still, create a cache folder out of your web path and symlink to it.)
Let's test whether your anonymous pages are getting cached. Go to the Blocks page (/admin/structure/block) and show block "Boost: Pages cache status" in one of the regions — I often put it below the footer. Save the blocks page. Make sure you permit this block to show only for trusted roles (/admin/structure/block/manage/boost/status/configure).
List of Ingredients
Not all of the following modules are necessary, but all of them contribute to the smoothest possible intelligent file caching experience.
Magic potion ingredients
Recommended seasoning
See also the Related Modules tab at /admin/config/system/boost/listmodules
Friday 16. August, 2013, Brussel, Belgium |
Drupal Drupal 7 cache performance speed Boost Crawler Cache Expiration expire file user experience search engine optimization Varnish