Migrating Drupal To Blogofile

I have a Drupal site with nearly a thousand nodes, several of which have over 100,000 hits. I wanted to migrate to Blogofile but absolutely did not want to start over or make this a major hassle. Instead, I used some Apache RewriteRules to gradually and seamlessly switch from Drupal to Blogofile one post at a time. Here’s how I did it, using my site’s real name, http://honeypot.net/, for concrete examples:

  1. Migrated my comments to Disqus. Drupal’s own disqus module has built-in export functionality to make this a snap.
  2. Set up internal DNS to add a new A record, drupal.honeypot.net, for my Drupal site. Since this is a site that no one ever needs to visit directly, an entry in /etc/hosts should work just as well.
  3. Renamed Apache’s honeypot.net.conf to drupal.honeypot.net.conf and changed its ServerName to “drupal.honeypot.net”.
  4. Created a new honeypot.net.conf to serve only static Blogofile files while passing all other requests through to drupal.honeypot.net. Here’s the shortened version of it:

    <VirtualHost *:80>
        ServerName honeypot.net
        ServerAlias www.honeypot.net
        CustomLog /var/log/httpd/honeypot.net-access.log combined
        DirectoryIndex index.html
        DocumentRoot /usr/local/www/honeypot.net/honeypot/_site
        <Directory /usr/local/www/honeypot.net/honeypot/_site>
            Order Deny,Allow
            Allow from All
        </Directory>

        # Serve some stuff locally but pass everything else through to Drupal
        RewriteEngine On
        Include /usr/local/www/honeypot.net/honeypot/_generatedfiles/rewriterules.txt
        RewriteRule ^/?(.*)$ http://drupal.honeypot.net/$1 [P,L]
    </VirtualHost>

The last lines are where the magic happens. I created a new Blogofile controller: drupalmigrate.py. It creates the file mentioned above (rewriterules.txt) that tells Apache not to tamper with any file generated by Blogofile. That file has entries like:

    RewriteRule ^/$ - [L]
    RewriteRule ^/2011(/|$) - [L]
    RewriteRule ^/archive(/|$) - [L]
    RewriteRule ^/category(/|$) - [L]
    RewriteRule ^/favicon.ico$ - [L]
    RewriteRule ^/feed(/|$) - [L]
    RewriteRule ^/filtering-spam-postfix(/|$) - [L]
    RewriteRule ^/index.html$ - [L]
    RewriteRule ^/my-ecco-shoes-are-junk(/|$) - [L]
    RewriteRule ^/page(/|$) - [L]
    RewriteRule ^/robots.txt$ - [L]
    RewriteRule ^/scam-calls-card-services(/|$) - [L]
    RewriteRule ^/theme(/|$) - [L]

for every file in Blogofile’s _site directory. When an incoming web request matches one of those patterns, Apache stops processing any further RewriteRules and serves the file directly from the DocumentRoot directory. If none of those patterns match, the final RewriteRule in my honeypot.net.conf file makes a proxy request for the same path from http://drupal.honeypot.net/.
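
I won’t reproduce drupalmigrate.py in full here, but the rule-generating half of it boils down to something like the sketch below. Treat the function name and paths as illustrative rather than Blogofile’s actual controller API; the idea is just to walk the top level of _site and emit one rule per entry:

    import os

    def write_rewrite_rules(site_dir="_site",
                            out_file="_generatedfiles/rewriterules.txt"):
        # One rule for the front page, then one per top-level file or
        # directory that Blogofile generated. Assumes the output
        # directory already exists.
        rules = ["RewriteRule ^/$ - [L]"]
        for entry in sorted(os.listdir(site_dir)):
            if os.path.isdir(os.path.join(site_dir, entry)):
                # Directories should match with or without a trailing slash.
                rules.append("RewriteRule ^/%s(/|$) - [L]" % entry)
            else:
                rules.append("RewriteRule ^/%s$ - [L]" % entry)
        with open(out_file, "w") as out:
            out.write("\n".join(rules) + "\n")

Since Apache only reads Included files when it loads its configuration, a graceful reload after each site regeneration is all it takes to pick up the new rules.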

That is, if a visitor requests http://honeypot.net/robots.txt, the RewriteRule in rewriterules.txt will cause Apache to serve the file from /usr/local/www/honeypot.net/honeypot/_site/robots.txt. If a visitor requests http://honeypot.net/something-else, Apache will request the file from http://drupal.honeypot.net/something-else and return the results. None of this is visible to the user. They don’t receive any redirects, links to drupal.honeypot.net, or any other indication that they’re seeing content from two different systems.

The controller also creates a list of every node in my Drupal database so I can <%include /> it into my site’s index.html.mako file like this:

    <hr />
    <p>These posts haven't been converted to the new system but are still available:</p>
    <%include file="_templates/olddrupalindex.mako" />

When drupalmigrate.py generates that index, it skips any Drupal node that has the same permalink as a Blogofile post. As I take my time converting my Drupal content, more and more of it will be served by Blogofile until eventually none is left.
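
The index-building half is similar. Here’s a sketch with illustrative names: drupal_nodes stands in for whatever query pulls each node’s permalink and title out of the Drupal database, and the skip test is simply whether Blogofile has already generated something at that path:

    import os

    def write_drupal_index(drupal_nodes, site_dir="_site",
                           out_file="_templates/olddrupalindex.mako"):
        # drupal_nodes: iterable of (permalink, title) pairs from Drupal.
        lines = ["<ul>"]
        for permalink, title in drupal_nodes:
            path = permalink.strip("/")
            # Skip nodes that Blogofile already serves as static pages.
            if os.path.exists(os.path.join(site_dir, path)):
                continue
            lines.append('  <li><a href="/%s">%s</a></li>' % (path, title))
        lines.append("</ul>")
        with open(out_file, "w") as out:
            out.write("\n".join(lines) + "\n")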

By starting with the most popular content and working my way down, I can make sure that all my heavy-traffic pages are served as lightning-fast static pages. This could even work as a form of website accelerator where popular pages are “compiled” by Blogofile for fast access. By fast, I mean that my untuned home server can sustain about 9,100 hits and 240MB of traffic per second. Until Google decides to use me for their home page, I think that’ll be sufficient for my little site.