Wednesday, February 17, 2010

Configure or Remove ETags in Apache/HTTP

There are various steps you can perform to optimize your site.

One of them is to put some sort of cache/expiry mechanism so that if some client visits your site the second time then they need not to download the whole data, instead of that they just need to download the data which is changed respective of their last visit.

If you want to put on some sort of cache/expiry mechanism on the content served from your web server then you can use following methods:
  1. Using mod_expires module
  2. Use ETAGs (Entity Tags)
Here i will be explaining you how ETags works and how to configure/remove then according to your needs.

What are ETags?
Entity tags (ETags) are a mechanism that web servers and browsers use to determine whether the component in the browser’s cache matches the one on the origin server.

ETag is a validator which can be used instead of, or in addition to, the Last-Modified header.

By sending a ETag, the server promises that the content is not changed until the ETag changes for a specific resource.

How ETags works:
The origin server specifies the component’s ETag using the ETag response header.
  1. The client requests a static resource (for ex: /a/test.gif) from server. Following is the header of a file client will receive from the server:
  2. HTTP/1.1 200 OK Last-Modified: Tue, 12 Dec 2006 03:03:59 GMT ETag: "10c24bc-4ab-457e1c1f" Content-Length: 12195
  3. Later, when the client requests same file from the server and browser has to validate a component, it uses the If-None-Match header to pass the ETag back to the origin server. If the ETag match, a 304 status code is returned reducing the response by 12195 bytes for this example.
    Following is the header of the file requested again from the same server:
    GET /a/test.gif HTTP/1.1
    Host: www.geekride.com
    If-Modified-Since: Tue, 12 Dec 2006 03:03:59 GMT
    If-None-Match: "10c24bc-4ab-457e1c1f"
    HTTP/1.1 304 Not Modified
    *** 304 status code means that the file you are requesting for, is the same which is present in the cache of the browser. (For more details about apache status codes please refer to: http-apache-status-error-codes)

Where not to use:

The problem with the ETags is that they are generated with attributes that make them unique to a server.

By default, Apache will generate an Etag based on the file’s inode number, last-modified date, and size.

So, if you have one file on multiple servers with same file size, permissions, timestamp, etc., even after that their ETag won’t be same as they can’t have the same inode number.

So, This creates the problem in the scenarios where you are having a cluster of web servers to serve the same content.

When a file is served from one server and later on validated from another server then the ETags for that file won’t match and hence complete file will be fetched again.

That means if you are having a cluster serving as a web server, then you shouldn’t use ETags.

Configure ETags:
ETags are configured by default, so you don’t need to do anything to configure them.

Remove ETags:
You need to follow following steps to remove the ETags from respective web servers.
  1. Apache: In apache you need to add this entry to the apache config file and restart the apache.
    Header unset
    ETagFileETag none
  2. IIS: This article explains how to remove ETags in IIS.
  3. Lighttpd: For lighttpd server you can disable ETags by putting this in the config file.
    static-file.etags = 'disable'

References:

  1. http://httpd.apache.org/docs/2.0/mod/core.html
  2. http://developer.yahoo.net/blog/archives/2007/07/high_performanc_11.html
  3. http://www.webscalingblog.com/performance/caching-http-headers-last-modified-and-etag.html

No comments:

Post a Comment