I’ve spent I don’t know how long trying to unravel the (very, very cool) mystery of .htaccess and mod_rewrite on Apache. I thought I had a rule that would take old-style URLs on my site and rewrite them to a new style, and I thought I had it working. I was wrong, which led me to spend about 3 hours going through some trace logs fixing 404 errors. Don’t ask me why I didn’t see it sooner, but most of the URLs I was messing with were internal.
After enough of that, I decided I had messed up a rule, so I started looking some more. Finally, I found this page that was very helpful. Specifically, I learned the difference between $N and %N and how they work. In a .htaccess file, %N gives you the text captured by a parenthesized group in the regex of a RewriteCond line. $N gives you the text captured by a parenthesized group in the regex in the first part of the RewriteRule line.
What do I mean? Here is an example for us to look at:
RewriteCond %{QUERY_STRING} &foo=([0-9]+) [NC]
RewriteCond %{QUERY_STRING} !foo= [NC]
RewriteCond %{QUERY_STRING} !bar=([0-9]+) [NC]
RewriteRule .* http://XYZedDomain.com/%1? [R=permanent]
This will look in the query string, and the first test will match &foo= followed by a number. The [0-9] means match a digit and the + means match one or more instances, so &foo= must be followed by one or more digits. The [NC] means case doesn’t matter, so FOO and Foo and fOO all match.
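The second test is meant to make sure we are not matching things like barfoo=. The ! means NOT: the test passes only when whatever comes after it does not match. Be careful with this one, though. RewriteCond lines are ANDed together by default, and as written !foo= also fails on any query string containing &foo=, so it cannot pass at the same time as the first test. The pattern needs to be more specific than plain foo= for the rule to fire.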
The third test checks that the string does not contain bar= followed by digits. I had some cases where my query string contained both foo=1234 and bar=0987. I had to handle the bar=0987 part separately (and it overruled the foo part of the string).
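But what about the %1? Well, the first test has a capture group in its regex (the part within the () ). Whatever that group matches gets put into the % variables. In my case, the first test has a capture group and the third test has one too, but only the first one counts. So %1 contains whatever was matched after &foo= (in my example above it would be 1234). The third test does not fill in %2 with 0987: it is negated with !, and a negated test has nothing to capture when it passes. %N refers to the groups of the last RewriteCond that actually matched, numbered from %1 within that one line. There is no % variable produced on the second test either, because it is negated and has no () group.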
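To make the numbering concrete, here is a minimal sketch, assuming Apache 2.x and a made-up request like /page.php?x=1&foo=1234. The (^|&) group is my own addition for illustration, not part of the rule above. It is only there to show that groups are counted inside a single RewriteCond line.

```apache
# Sketch only; the request /page.php?x=1&foo=1234 is hypothetical.
RewriteEngine On

# Group 1 is (^|&): the start of the query string or a literal &.
# Group 2 is ([0-9]+): the digits after foo=.
RewriteCond %{QUERY_STRING} (^|&)foo=([0-9]+) [NC]

# %2 is the second group of the condition above.
# The trailing ? drops the original query string from the target.
RewriteRule .* http://XYZedDomain.com/%2? [R=permanent,L]
```

With that request, %1 is & and %2 is 1234, so the redirect goes to http://XYZedDomain.com/1234.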
Unfortunately, I still don’t understand too much about plain old RewriteRule. You can do the same sort of regex matching there and use the matched part when you actually do the rewrite. However, instead of using %N you use $N. You still have to capture the part you want with () in the regex though, as I understand it.
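Here is a rough sketch of $N and %N working together in one rule. The URL /news.php?id=42 and the target layout are hypothetical, and I am assuming the .htaccess file sits in the site root, where the RewriteRule pattern is matched against the path without its leading slash.

```apache
# Sketch only; /news.php?id=42 is a hypothetical old-style URL.
RewriteEngine On

# %1 will hold the digits captured from the query string.
RewriteCond %{QUERY_STRING} ^id=([0-9]+)$

# $1 holds the script name captured by the RewriteRule pattern.
# The trailing ? drops the original query string from the target.
RewriteRule ^([a-z]+)\.php$ http://XYZedDomain.com/$1/%1? [R=permanent,L]
```

For that request, $1 is news and %1 is 42, so the redirect goes to http://XYZedDomain.com/news/42.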
Why did I go through all this work? Well, have a look at the following graph:
Yep, that shows server errors from my site. At the start of this process, I had none. It then spiked. And the reason it went back down was me using .htaccess and mod_rewrite to take any old URLs that were erroring and redirect them to where they were supposed to go. I expect a similar drop in page errors; however, I’ll have to wait a few days to see that graph as Google doesn’t crawl all of my pages every day.
Here’s the graph of page errors. The long gentle slope up is from these errors here that were a Google problem. But, have a look at the page errors. I’ve got almost all of them eliminated simply by analyzing what was going on and putting a bit of work into the problem to help you guys out.
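Don’t forget that N can only be 1-9 in each case, for both $N and %N, so you get at most nine captured groups to work with. ($0 and %0 give you the whole matched string.)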
[Update 2012-12-27 07:17:13] I thought I would add an example…just in case.
[Update 2012-12-27 15:45:15] Added a graph and showed how my errors were eliminated.
[Update 2012-12-30 19:11:55] Added graph of page errors
Image from david anderson via flickr


