More On Migrating From Drupal To Blogofile

I wrote an earlier post on converting a Drupal site to Blogofile. I couldn’t be happier with how that turned out as it allowed me to immediately start using Blogofile while still keeping my old Drupal content online. Sweet! But who wants to continue running the two systems in parallel forever? I certainly don’t, and after a code sprint I came up with a way to completely transition off Drupal.

First, I manually converted the most popular nodes on my Drupal site to Blogofile. I moved all of my Project content to GitHub, and wrote a Blogofile photo album that replaced 99% of the Drupal photo album functionality that I actually used. At the end of this process, there were only Blog, Story, and Page nodes left in Drupal.

Next, I changed drupalmigrate.py to generate a bunch of Blogofile post files for each of those remaining Drupal nodes. I added these settings to my _config.py:

controllers.drupalmigrate.makeposts = False
controllers.drupalmigrate.mainusername = 'kirk'
controllers.drupalmigrate.startpostnum = 1000

then ran “../bin/blogofile build -v” to create about 70 new post files. When that finished successfully, I removed those settings.

Finally, I added new code to drupalmigrate.py to generate a different set of RewriteRules that redirect the old Drupal node permalinks to their new Blogofile locations. It makes rules like:

RewriteRule ^/yet-another-python-map(/|$) /2008/04/16/yet-another-python-map/ [R=301,L]

where /yet-another-python-map is the node’s old location in Drupal and /2008/04/16/yet-another-python-map/ is the new Blogofile URL. To activate that, I added this to my _config.py:

controllers.drupalmigrate.makepermalinkredirs = True
controllers.drupalmigrate.redirectrulefile = '_generatedfiles/redirectrewriterules.txt'

With this in place, there was nothing left in Drupal so I decommissioned it and changed my Apache configuration to something like:

    <VirtualHost *:80>
        ServerName honeypot.net
        ServerAlias www.honeypot.net
        CustomLog /var/log/httpd/honeypot.net-access.log combined
        DirectoryIndex index.html
        DocumentRoot /usr/local/www/honeypot.net/honeypot/_site
        <Directory /usr/local/www/honeypot.net/honeypot/_site>
            Order Deny,Allow
            Allow from All
        </Directory>

        RewriteEngine On
        Include /usr/local/www/honeypot.net/honeypot/_generatedfiles/redirectrewriterules.txt
    </VirtualHost>

I’ll leave the Drupal node redirects in there permanently so that all the links to my site will continue to work.

Migrating Drupal To Blogofile

I have a Drupal site with nearly a thousand nodes, several having over 100,000 hits. I wanted to migrate to Blogofile but absolutely did not want to start over or make this a major hassle. Instead, I used some Apache RewriteRules to gradually and seamlessly switch from Drupal to Blogofile one post at a time. Here’s how I did it, using my site’s real name of http://honeypot.net/ to give concrete examples:

  1. Migrated my comments to Disqus. Drupal’s own disqus module has built-in export functionality to make this a snap.
  2. Set up internal DNS to add a new A record, drupal.honeypot.net, for my Drupal site. Since this is a site that no one ever needs to visit directly, an entry in /etc/hosts should work just as well.
  3. Renamed my Apache’s honeypot.net.conf to drupal.honeypot.net.conf and changed the ServerName to “drupal.honeypot.net”.
  4. Created a new honeypot.net.conf to serve only static Blogofile files while passing all other requests through to drupal.honeypot.net. Here’s the shortened version of it:

    <VirtualHost *:80>
        ServerName honeypot.net
        ServerAlias www.honeypot.net
        CustomLog /var/log/httpd/honeypot.net-access.log combined
        DirectoryIndex index.html
        DocumentRoot /usr/local/www/honeypot.net/honeypot/_site
        <Directory /usr/local/www/honeypot.net/honeypot/_site>
            Order Deny,Allow
            Allow from All
        </Directory>
    
    
    
    # Serve some stuff locally but pass everything else through to Drupal
    RewriteEngine On
    Include /usr/local/www/honeypot.net/honeypot/_generatedfiles/rewriterules.txt
    RewriteRule ^/?(.*)$ http://drupal.honeypot.net/$1 [P,L]
    

    </VirtualHost>

The last lines are where the magic happens. I created a new Blogofile controller: drupalmigrate.py. It creates the file mentioned above (rewriterules.txt) that tells Apache not to tamper with any file generated by Blogofile. That file has entries like:

RewriteRule ^/$ - [L]
RewriteRule ^/2011(/|$) - [L]
RewriteRule ^/archive(/|$) - [L]
RewriteRule ^/category(/|$) - [L]
RewriteRule ^/favicon.ico$ - [L]
RewriteRule ^/feed(/|$) - [L]
RewriteRule ^/filtering-spam-postfix(/|$) - [L]
RewriteRule ^/index.html$ - [L]
RewriteRule ^/my-ecco-shoes-are-junk(/|$) - [L]
RewriteRule ^/page(/|$) - [L]
RewriteRule ^/robots.txt$ - [L]
RewriteRule ^/scam-calls-card-services(/|$) - [L]
RewriteRule ^/theme(/|$) - [L]

for every file in Blogofile’s _site directory. When an incoming web request matches one of those patterns, Apache stops processing any further RewriteRules and serves the file directly from the DocumentRoot directory. If none of those pattern matches, the final RewriteRule in my honeypot.net.conf file makes a proxy request for the same path from http://drupal.honeypot.net/ .

That is, if a visitor requests http://honeypot.net/robots.txt, the RewriteRule in rewriterules.txt will cause Apache to serve the file from /usr/local/www/honeypot.net/honeypot/_site/robots.txt. If a visitor requests http://honeypot.net/something-else, Apache will request the file from http://drupal.honeypot.net/something-else and return the results. None of this is visible to the user. They don’t receive any redirects, links to drupal.honeypot.net, or any other indication that they’re seeing content from two different systems.

The controller also creates a list of every node in my Drupal database so I can <%include /> it into my site’s index.html.mako file like this:

<hr />
<p>These posts haven't been converted to the new system but are still available:</p>
<%include file="_templates/olddrupalindex.mako" />

When drupalmigrate.py is generating the site index, it skips any Drupal nodes that have the same permalink as a Blogofile post. As I take my time converting my Drupal content, more and more will be served by Blogofile until eventually none is left.

By starting with the most popular content and working my way down, I can make sure that all my heavy-traffic pages are served as lightning-fast static pages. This could even work as a form of website accelerator where popular pages are “compiled” by Blogofile for fast access. By fast, I mean that my untuned home server can sustain about 9,100 hits and 240MB of traffic per second. Until Google decides to use me for their home page, I think that’ll be sufficient for my little site.