How To: Optimize Your Apache Site with Mod Deflate

The title of this post might be a little cryptic to those not familiar with the Apache web server, but this post is a follow-up to Paul Buchheit's recent post "Make your site faster and cheaper to operate in one easy step" as well as a response to a recent Skribit suggestion. The step he's referring to is getting your web server to use gzip encoding.

PaulStamatiou.com is gzipped!
Check to see if your site is gzipped with gzipcheck.

Paul Buchheit goes over the reasons why you should use gzip encoding, from 4-to-1 compression of HTML files to the reduced costs associated with serving smaller files. However, he doesn't cover how to actually get it running on your site or blog.

Using these numbers, we can estimate that it would cost $1.88 to gzip 1TB of data on Amazon EC2, and $174 to transfer 1TB of data. If you instead compress your data (and get 4-to-1 compression, which is not unusual for html), the bandwidth will only cost $43.52.
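To make the arithmetic in that quote concrete, here is a minimal Python sketch. It uses only the figures quoted above (about $174 to transfer 1TB, $1.88 to gzip it, 4-to-1 compression). The variable and function names are my own, and because the inputs are rounded the result lands a couple of cents away from the quoted $43.52.

```python
# Figures taken from the quoted estimate; treat them as rough, dated prices.
TRANSFER_COST_PER_TB = 174.00   # dollars to transfer 1 TB uncompressed
GZIP_COST_PER_TB = 1.88         # dollars of EC2 time to gzip 1 TB
COMPRESSION_RATIO = 4.0         # 4-to-1, as quoted for HTML


def cost_with_gzip(transfer_cost, gzip_cost, ratio):
    """Return (bandwidth cost after compression, total cost including CPU)."""
    bandwidth = transfer_cost / ratio
    return bandwidth, bandwidth + gzip_cost


bandwidth, total = cost_with_gzip(
    TRANSFER_COST_PER_TB, GZIP_COST_PER_TB, COMPRESSION_RATIO
)
savings = TRANSFER_COST_PER_TB - total

print(f"bandwidth after gzip: ${bandwidth:.2f}")
print(f"total with CPU cost:  ${total:.2f}")
print(f"saved per TB:         ${savings:.2f}")
```

Even after paying for the CPU time, the compressed transfer comes out at roughly a quarter of the uncompressed cost under these assumptions.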

There are a myriad of server software setups but I'll address one of the most popular HTTP web servers: Apache.

Here's a quick run-through of why you should consider enabling mod_deflate:

  • Enabling gzip compression reduces file sizes at the expense of slightly increased CPU utilization (I find that to be negligible).
  • Smaller files mean less bandwidth used and faster transfers, so the client gets the page sooner and your server can move on to serving the next client.
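To get a feel for the size reduction, here is a small Python sketch that gzips a block of HTML with the standard-library gzip module. The sample markup is invented for illustration, so the exact ratio on a real page will differ.

```python
import gzip

# Invented sample markup; repetitive HTML like this compresses very well.
rows = "".join(
    f'<li class="post"><a href="/post/{i}">Post number {i}</a></li>\n'
    for i in range(200)
)
html = f"<html><body><ul>\n{rows}</ul></body></html>".encode("utf-8")

compressed = gzip.compress(html)
ratio = len(html) / len(compressed)

print(f"original:   {len(html)} bytes")
print(f"compressed: {len(compressed)} bytes")
print(f"ratio:      {ratio:.1f}-to-1")
```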

Notice

This article might not work for you without some tinkering. Apache install locations can vary by your setup or that of your webhost. For the purposes of this article, I am using a Media Temple (dv) server which has a CentOS and Plesk setup with Apache installed in /etc/httpd.

Enter mod_deflate

As defined by Apache documentation, the deflate module "provides the DEFLATE output filter that allows output from your server to be compressed before being sent to the client over the network." In other words, it compresses files on the fly without you having to compress individual files yourself. However, this becomes wasteful if Apache ends up compressing files you have already compressed, or the images in your blog posts. Gzip is lossless, so the images won't look any worse, but formats like JPEG and PNG are already compressed and the extra pass burns CPU for little or no gain. For that reason, it's important that mod_deflate is configured properly.
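The reason to skip already-compressed files can be sketched in Python. Seeded random bytes stand in here for an already-compressed file such as a JPEG or a .zip (an assumption made to keep the example self-contained). Gzip cannot shrink them, and the output typically grows slightly because of the gzip header and trailer.

```python
import gzip
import random

# Seeded random bytes stand in for data that is already compressed
# (for example a JPEG or a .zip); this is an illustrative assumption.
rng = random.Random(0)
already_compressed = bytes(rng.getrandbits(8) for _ in range(50_000))

# Plain, repetitive text for comparison.
text = b"<p>Hello, this is a paragraph of blog text.</p>\n" * 1_000

for label, payload in (("text", text), ("already compressed", already_compressed)):
    out = gzip.compress(payload)
    print(f"{label}: {len(payload)} -> {len(out)} bytes")
```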

Configuring

First we need to load up the actual mod_deflate.so module. If you are using Apache 2, then you likely already have mod_deflate installed. Just to be sure though, go to your Apache httpd.conf file (/etc/httpd/conf/httpd.conf for me) and place the following line if it's not already there:

LoadModule deflate_module modules/mod_deflate.so

Look for the section of httpd.conf where the other LoadModule lines are and place the line above anywhere within it.

The next step is telling mod_deflate how to work. Instead of working with httpd.conf, we will want to place the upcoming lines in the appropriate vhost.conf file, if your server uses a vhost configuration. For example, I created my vhost file in /var/www/vhosts/paulstamatiou.com/conf/vhost.conf.

If you're not sure, you can put it in httpd.conf, but the custom mod_deflate logging I created won't work due to this issue outlined by Apache documentation:

If CustomLog or ErrorLog directives are placed inside a <VirtualHost> section, all requests or errors for that virtual host will be logged only to the specified file. Any virtual host which does not have logging directives will still have its requests sent to the main server logs.

We'll start by placing these lines in the appropriate vhost.conf file to configure mod_deflate:

<IfModule mod_deflate.c>
SetOutputFilter DEFLATE

# Example of how to compress ONLY html, plain text and xml
AddOutputFilterByType DEFLATE text/plain text/html text/xml

# Don't compress binaries
SetEnvIfNoCase Request_URI \.(?:exe|t?gz|zip|iso|tar|bz2|sit|rar)$ no-gzip dont-vary

# Don't compress images
SetEnvIfNoCase Request_URI \.(?:gif|jpe?g|ico|png)$ no-gzip dont-vary

# Don't compress PDFs
SetEnvIfNoCase Request_URI \.pdf$ no-gzip dont-vary

# Don't compress flash files (only relevant if you host your own videos)
SetEnvIfNoCase Request_URI \.flv$ no-gzip dont-vary

# Netscape 4.X has some problems
BrowserMatch ^Mozilla/4 gzip-only-text/html

# Netscape 4.06-4.08 have some more problems
BrowserMatch ^Mozilla/4\.0[678] no-gzip

# MSIE masquerades as Netscape, but it is fine
BrowserMatch \bMSIE !no-gzip !gzip-only-text/html

# Make sure proxies don't deliver the wrong content
Header append Vary User-Agent env=!dont-vary

# Set up custom deflate log
DeflateFilterNote Input instr
DeflateFilterNote Output outstr
DeflateFilterNote Ratio ratio
LogFormat '"%r" %{outstr}n/%{instr}n %{ratio}n%%' DEFLATE
CustomLog logs/deflate_log DEFLATE
</IfModule>

Before you save the file, I'll explain what these lines do. There are two ways of setting up deflate filtering:

  • allowing ONLY certain types of files (AddOutputFilterByType), or
  • allowing ALL except certain file extensions (SetEnvIfNoCase)

If you aren't too sure what types of files you're serving, it's a safe bet to use AddOutputFilterByType and only compress a few known file types. Otherwise, keep those lines as is and change which file types you don't want compressed with the SetEnvIfNoCase lines. I have my server set up to exclude common image file types, PDFs, FLVs as well as common binaries, so this will likely be fine for your uses as well.
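The exclude-by-extension approach can be sketched in Python to see which request paths would be flagged no-gzip. This is only an approximation of SetEnvIfNoCase. It assumes the patterns are written with an escaped dot and a trailing $ anchor, as in the Apache documentation's example, and the function name and sample paths are hypothetical.

```python
import re

# Rough Python equivalents of the SetEnvIfNoCase patterns; re.IGNORECASE
# mirrors the "NoCase" part. The escaped dot and "$" anchor are assumptions.
NO_GZIP_PATTERNS = [
    re.compile(r"\.(?:exe|t?gz|zip|iso|tar|bz2|sit|rar)$", re.IGNORECASE),
    re.compile(r"\.(?:gif|jpe?g|ico|png)$", re.IGNORECASE),
    re.compile(r"\.pdf$", re.IGNORECASE),
    re.compile(r"\.flv$", re.IGNORECASE),
]


def is_excluded(request_uri: str) -> bool:
    """Return True if the path would be marked no-gzip by the patterns above."""
    return any(p.search(request_uri) for p in NO_GZIP_PATTERNS)


for path in ("/index.html", "/images/Logo.PNG", "/files/backup.tgz", "/gift-guide/"):
    print(path, "->", "skip" if is_excluded(path) else "compress")
```

The escaping matters: with an unescaped dot and no anchor, a path like /gift-guide/ would also match the image pattern, because a bare . matches any character (here the leading slash before "gif").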

As for those BrowserMatch lines, they are recommended client compression exclusions outlined by Apache documentation, but if you ask me I doubt you really have to worry about breaking Netscape 4 users' experiences.

The last bit of those lines deals with a custom log for mod_deflate. While not necessary, I find it to be one of the more interesting things you can do with mod_deflate. The log shows all HTTP requests and displays the file sizes before and after compression, as well as the ratio between them. If you're so inclined, you can do cool things like run through your logs with a perl script and find out how much bandwidth you've been saving each month by using mod_deflate.

Example deflate log snippet from PaulStamatiou.com (sudo tail -f /etc/httpd/logs/deflate_log)
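As a rough idea of what such a log-crunching script could look like, here is a Python sketch (rather than perl) that totals the bytes saved. It assumes lines follow the LogFormat above and that requests which were not compressed log "-" placeholders. The sample lines are hypothetical, not real log data, and the function name is my own.

```python
import re

# Matches lines written with the LogFormat above:
#   "%r" %{outstr}n/%{instr}n %{ratio}n%%
# Lines with "-" placeholders (assumed for uncompressed requests) are skipped.
LINE_RE = re.compile(r'^"(?P<request>[^"]*)" (?P<out>\d+)/(?P<inp>\d+) (?P<ratio>\d+)%$')

# Hypothetical sample lines, not real log data.
SAMPLE_LOG = '''\
"GET /index.html HTTP/1.1" 5000/20000 25%
"GET /images/logo.png HTTP/1.1" -/- -%
"GET /style.css HTTP/1.1" 1500/6000 25%
'''


def bytes_saved(log_text: str) -> int:
    """Sum (input bytes - output bytes) over all compressed requests."""
    saved = 0
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            saved += int(match["inp"]) - int(match["out"])
    return saved


print(f"bytes saved: {bytes_saved(SAMPLE_LOG)}")
```

To run it against a real log you would read the file's contents and pass them to bytes_saved in place of SAMPLE_LOG.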

Save that file when you're done tinkering and restart Apache for the changes to take effect.

/etc/init.d/httpd restart

Visit gzipcheck to make sure Apache accepted the changes and is serving up compressed files! From there, you can tinker with some more interesting mod_deflate configurations. For example, if you have a beefy server you can set a higher compression level with DeflateCompressionLevel and save even more bandwidth.
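To see what a higher compression level buys you, here is a Python sketch using the zlib module. mod_deflate is built on zlib, so zlib's levels 1 through 9 serve here as a stand-in for DeflateCompressionLevel. The sample page is invented, and on real content the gap between levels may be larger or smaller; higher levels usually, though not always, give smaller output, and they always cost more CPU.

```python
import zlib

# Invented sample page; zlib levels 1-9 are used as a stand-in for
# DeflateCompressionLevel, which is an assumption of this sketch.
page = "".join(
    f"<div class='entry'><h2>Entry {i}</h2><p>Some text for entry {i}.</p></div>\n"
    for i in range(500)
).encode("utf-8")

sizes = {level: len(zlib.compress(page, level)) for level in (1, 6, 9)}

for level, size in sizes.items():
    print(f"level {level}: {size} bytes ({size / len(page):.1%} of original)")
```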

Tip of the Iceberg

This post was meant to highlight an easy way to speed up your site and save money, but mod_deflate is not the be-all and end-all site optimization trick. There are tons of ways to speed up your site from both the server side of things and by optimizing the actual website itself. If you want to read up on some Apache tuning tips, O'Reilly has some good books worth checking out.

Do you use any compression tool like mod_deflate with your current server setup? What else do you do to ensure your server and site run efficiently?

Update: Grzegorz Daniluk suggests an alternate BrowserMatch setup to avoid compressing files for Internet Explorer 6 as it sometimes has an issue with compressed files. However, I did some testing on my own and wasn't able to reproduce any issues where a gzipped site loaded a blank page in IE6.