Showing posts with label etags. Show all posts

Tuesday, August 26, 2008

Broken ETags in the Wild

I've had a good teacher. Just noting some of the things I've come across.

Default Apache httpd configuration in a cluster with multiple filesystems

By default, Apache httpd generates an ETag of the format inode-filesize-timestamp. When each node in the cluster serves the same files from its own filesystem, the inode differs from node to node. A symptom of this sort of physical architecture is seeing ETags for the same resource and representation that differ only in the first part; e.g. "518854-3504d-ce290380" and "c8578-3504d-ce290380", and probably the same Last-Modified value too. This can be fixed by changing the Apache configuration, or by ignoring the ETag and just using Last-Modified. I would normally recommend retaining the ETag and using only the MTime and Size parts to calculate its value (the FileETag MTime Size directive).
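To illustrate why dropping the inode helps, here is a minimal Java sketch of an ETag built from size and modification time only. The class and method names (FileEntityTag, fromSizeAndMtime, forFile) are made up for this example, and the hex encoding is an assumption that only loosely mirrors what httpd emits. It also assumes the files are deployed to every node with identical size and mtime.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: an ETag that does not depend on the inode. */
public final class FileEntityTag {
    private FileEntityTag() {}

    /** Builds a quoted ETag of the form "size-mtime", both parts in hex. */
    public static String fromSizeAndMtime(long sizeBytes, long mtime) {
        return "\"" + Long.toHexString(sizeBytes) + "-" + Long.toHexString(mtime) + "\"";
    }

    /** Reads size and mtime (in milliseconds) from the file itself. */
    public static String forFile(Path file) throws IOException {
        return fromSizeAndMtime(Files.size(file),
                Files.getLastModifiedTime(file).toMillis());
    }
}
```

Two nodes holding a copy of the same file with the same size and mtime then produce the same value, whatever inode each copy happens to have.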

EScenic CMS

I'm not sure whether this is an application developer issue, or a problem with the EScenic server itself, but the ETag values that I've seen from this server aren't quoted.

Etag: 20080523124147BST-39-6

The ETag value should be a quoted string, i.e. ETag: "20080523124147BST-39-6"
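A check for this is easy to sketch. The Java below is a simplified reading of the entity-tag grammar (an optional W/ prefix followed by a quoted string) and ignores backslash escapes inside the quotes. EntityTagSyntax and its methods are names invented for this example, not part of any library.

```java
import java.util.Objects;
import java.util.regex.Pattern;

/** Hypothetical helper for spotting unquoted ETag values. */
public final class EntityTagSyntax {
    // Simplified grammar: optional weak prefix, then a double-quoted
    // string that contains no further double quotes.
    private static final Pattern ENTITY_TAG = Pattern.compile("(W/)?\"[^\"]*\"");

    private EntityTagSyntax() {}

    /** True if the value looks like "xyz" or W/"xyz". */
    public static boolean isWellFormed(String value) {
        return value != null && ENTITY_TAG.matcher(value).matches();
    }

    /** Wraps a bare value in quotes; assumes it contains no quotes itself. */
    public static String quoteIfNeeded(String value) {
        Objects.requireNonNull(value, "value");
        return isWellFormed(value) ? value : "\"" + value + "\"";
    }
}
```

With this, isWellFormed("20080523124147BST-39-6") is false, and quoteIfNeeded turns the same value into the quoted form.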


Django on Google App Engine 1.0

Django 0.96.1, the version that shipped with Google App Engine, had a similar problem to the EScenic server, so I'd recommend bundling a newer Django with your app until Google update the bundled version. I went to fix this in Django trunk, only to find someone beat me to it!

Friday, July 20, 2007

Good ETag support requires thinking about it up-front

I posted a comment on this but I thought it worthwhile going a little deeper.

Blogger doesn't support Trackback so I'll just post and link.

My point was not to argue about how little code is required to send an ETag, and check an incoming one, based on the MD5 hash of your content (that's pretty much a library issue, which should level out over time), but to go a little deeper into ETags.

I've been reading Sam Ruby long enough to have had the benefit of ETags drummed into me. The posts that Bill links to are focused on the network savings aspect of conditional GET. But you can also save server processing power, if you put a little more thought into your application model.

So we come back to the requirements for Java frameworks to support ETags in such a way that it is possible to avoid doing the bulk of the server-side processing. Caveat: this could well be premature optimization, and is merely me thinking out loud. Struts is the framework I'm most familiar with, and I think this approach could be used with the struts-chain RequestProcessor, but anything that works as a chain would do (so pure Filters would also work).



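// Needs javax.servlet.* and javax.servlet.http.*. ETag, calculateETag and
// extractETag are application-specific and not shown here.
public void doFilter(ServletRequest request, ServletResponse response,
        FilterChain filterChain) throws IOException, ServletException {

    HttpServletRequest httpRequest = (HttpServletRequest) request;
    HttpServletResponse httpResponse = (HttpServletResponse) response;

    /*
     * Do something that works out what is required to render a response
     * for this request and generate an ETag based on that. So here we
     * have moved away from the approach of generating ETags from MD5
     * hashes of the response body.
     */
    ETag currentResourceETag = calculateETag(httpRequest);
    // Presumably parsed from the If-None-Match header; null if absent.
    ETag incomingETag = extractETag(httpRequest);

    // Set the header before the chain runs: once the response has been
    // committed it is too late. A 304 should carry the ETag as well.
    httpResponse.setHeader("ETag", currentResourceETag.stringValue());

    if (currentResourceETag.equals(incomingETag)) {
        // Nothing has changed, so skip the expensive processing entirely.
        httpResponse.setStatus(HttpServletResponse.SC_NOT_MODIFIED);
    } else {
        filterChain.doFilter(request, response);
    }
}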

You would need to be able to obtain the items responsible for determining the ETag value reasonably early in the request processing, before any really expensive operations. Not sure what implications that has for the layers in your application, or if you were using strict MVC how disruptive / worthwhile it would be to try this approach...
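As a rough idea of what such an early calculation might look like, here is a sketch that derives the ETag from a few cheap inputs instead of from the rendered body. Everything in it is an assumption for illustration: the class name ModelEntityTag, and the idea that your model can cheaply supply a resource id, a version counter (or last-modified stamp) and the representation variant being requested.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper: an ETag from model metadata, not the response body. */
public final class ModelEntityTag {
    private ModelEntityTag() {}

    public static String of(String resourceId, long version, String variant) {
        // Hash a small key rather than the full response, so no rendering
        // is needed to know whether the client's copy is still current.
        String key = resourceId + "|" + version + "|" + variant;
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(key.getBytes(StandardCharsets.UTF_8));
            StringBuilder etag = new StringBuilder("\"");
            for (byte b : digest) {
                etag.append(String.format("%02x", b));
            }
            return etag.append('"').toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 must be available on every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }
}
```

The value changes whenever the version or the variant changes, which is all a conditional GET needs. Whether a single version number really captures everything that influences the rendered page is the hard part, and depends on your application.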