More On Migrating From Drupal To Blogofile

I wrote an earlier post on converting a Drupal site to Blogofile. I couldn’t be happier with how that turned out as it allowed me to immediately start using Blogofile while still keeping my old Drupal content online. Sweet! But who wants to continue running the two systems in parallel forever? I certainly don’t, and after a code sprint I came up with a way to completely transition off Drupal.

First, I manually converted the most popular nodes on my Drupal site to Blogofile. I moved all of my Project content to GitHub, and wrote a Blogofile photo album that replaced 99% of the Drupal photo album functionality that I actually used. At the end of this process, there were only Blog, Story, and Page nodes left in Drupal.

Next, I changed drupalmigrate.py to generate a Blogofile post file for each of those remaining Drupal nodes. I added these settings to my _config.py:

controllers.drupalmigrate.makeposts = True
controllers.drupalmigrate.mainusername = 'kirk'
controllers.drupalmigrate.startpostnum = 1000

then ran “../bin/blogofile build -v” to create about 70 new post files. When that finished successfully, I removed those settings.

Finally, I added new code to drupalmigrate.py to generate a different set of RewriteRules that redirect the old Drupal node permalinks to their new Blogofile locations. It makes rules like:

RewriteRule ^/yet-another-python-map(/|$) /2008/04/16/yet-another-python-map/ [R=301,L]

where /yet-another-python-map is the node’s old location in Drupal and /2008/04/16/yet-another-python-map/ is the new Blogofile URL. To activate that, I added this to my _config.py:

controllers.drupalmigrate.makepermalinkredirs = True
controllers.drupalmigrate.redirectrulefile = '_generatedfiles/redirectrewriterules.txt'
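
For illustration, here is a minimal Python sketch of how redirect rules of that shape could be built. It is not the actual drupalmigrate.py code; the function names and the `redirects` mapping are hypothetical, and it assumes each old Drupal path maps to exactly one new Blogofile URL.

```python
# Characters that have special meaning in an Apache (PCRE) pattern.
REGEX_SPECIALS = set(".^$*+?()[]{}|\\")


def escape_path(path):
    """Backslash-escape regex metacharacters so the path matches literally."""
    return "".join("\\" + ch if ch in REGEX_SPECIALS else ch for ch in path)


def make_redirect_rule(old_path, new_url):
    """Build one RewriteRule that 301-redirects old_path to new_url."""
    pattern = "^" + escape_path(old_path.rstrip("/")) + "(/|$)"
    return "RewriteRule {} {} [R=301,L]".format(pattern, new_url)


def make_redirect_rules(redirects):
    """Return the text of a rule file, one rule per line, sorted by old path."""
    lines = [make_redirect_rule(old, new) for old, new in sorted(redirects.items())]
    return "\n".join(lines) + "\n"


# Hypothetical mapping of old Drupal paths to new Blogofile URLs.
redirects = {
    "/yet-another-python-map": "/2008/04/16/yet-another-python-map/",
}
print(make_redirect_rules(redirects), end="")
```

The R=301 flag marks the redirect as permanent, so browsers and search engines can update their links to the new location.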

With this in place, there was nothing left in Drupal so I decommissioned it and changed my Apache configuration to something like:

    <VirtualHost *:80>
        ServerName honeypot.net
        ServerAlias www.honeypot.net
        CustomLog /var/log/httpd/honeypot.net-access.log combined
        DirectoryIndex index.html
        DocumentRoot /usr/local/www/honeypot.net/honeypot/_site
        <Directory /usr/local/www/honeypot.net/honeypot/_site>
            Order Deny,Allow
            Allow from All
        </Directory>

        RewriteEngine On
        Include /usr/local/www/honeypot.net/honeypot/_generatedfiles/redirectrewriterules.txt
    </VirtualHost>

I’ll leave the Drupal node redirects in there permanently so that all the links to my site will continue to work.

On Generated Versus Random Passwords

I was reading a story about a hacked password database and saw this comment where the poster wanted to make a little program to generate non-random passwords for every site he visits:

I was thinking of something simpler such as “echo MyPassword69! slashdot.org|md5sum” and then
“aaa53a64cbb02f01d79e6aa05f0027ba” using that as my password since many sites will take 32-character long passwords or they
will truncate for you. More generalized than PasswordMaker and easier to access but no alpha-num+symbol translation and only
(32) 0-9af characters but that should be random enough, or you can do sha1sum instead for a little longer hash string.

I posted a reply but I wanted to repeat it here for the sake of my friends who don’t read Slashdot. If you’ve ever cooked up your own scheme for coming up with passwords or if you’ve used the PasswordMaker system (or ones like it), you need to read this:


DO NOT DO THIS. I don’t mean this disrespectfully, but you don’t know what you’re doing. That’s OK! People not named “Bruce” generally suck at secure algorithms. Crypto is hard and has unexpected implications until you’re much more knowledgeable on the subject than you (or I) currently are. For example, suppose that hypothetical site helpfully truncates your password to 8 chars. By storing only 8 hex digits, you’ve reduced your password’s keyspace to just 32 bits. If you used an algorithm with base64 encoding instead, you’d get the same complexity in only 5.3 chars.
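
To make that arithmetic concrete, here is a small Python sketch. It assumes every character is chosen uniformly from its alphabet, which is the best case for a truncated hash; the function name is made up for this example.

```python
import math


def keyspace_bits(alphabet_size, length):
    """Bits of keyspace for `length` characters drawn from `alphabet_size` symbols."""
    return length * math.log2(alphabet_size)


# 8 hex digits: 16 symbols at 4 bits each.
hex_bits = keyspace_bits(16, 8)

# Number of base64 characters (6 bits each) that carry the same keyspace.
base64_chars = hex_bits / math.log2(64)

print(hex_bits)                # 32.0
print(round(base64_chars, 1))  # 5.3
```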

Despite what you claim, you’re really much better off using a secure storage app that creates truly random passwords for you and stores them in a securely encrypted file. In another post here I mention that I use 1Password, but really any reputable app will get you the same protections. Your algorithm is a “security by obscurity” system; if someone knows your algorithm, gaining your master password gives them full access to every account you have. Contrast that with a password locker, where you can change your master password before the attacker gets access to the secret store (which they may never be able to do if you’ve kept it secure!), and which in the worst case scenario provides you with a list of accounts whose passwords you need to change.

I haven’t used PasswordMaker but I’d apply the same criticisms to them. If an attacker knows that you use PasswordMaker, they can narrow down the search space based on the very few things you can vary:

  • URL (the attacker will have this)
  • character set (dropdown gives you 6 choices)
  • which of nine hash algorithms was used (actually 13 – the FAQ is outdated)
  • modifier (algorithmically, part of your password)
  • username (attacker will have this or can likely guess it easily)
  • password length (let’s say, likely to be between 8 and 20 chars, so 13 options)
  • password prefix (stupid idea that reduces your password’s complexity)
  • password suffix (stupid idea that reduces your password’s complexity)
  • which of nine l33t-speak levels was used
  • when l33t-speak was applied (total of 28 options: 9 levels each at three different “Use l33t” times, plus “not at all”)

My comments about the modifier being part of your password? Basically you’re concatenating those strings together to create a longer password in some manner. There’s not really a difference, and that’s assuming you actually use the modifier.

So, back to our attack scenario where a hacker has your master password, username, and a URL they want to visit: disregarding the prefix and suffix options, they have 6 * 13 * 13 * 28 = 28,392 possible output passwords to test. That should keep them busy for at least a minute or two. And once they’ve guessed your combination, they can probably use the same settings on every other website you visit. Oh, and when you’ve found out that your password is compromised? Hope you remember every website you’ve ever used PasswordMaker on!
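
As a sanity check on that number, here is a short Python sketch that enumerates the combinations rather than multiplying them. The option counts come from the list above; the labels "before", "after", and "both" are placeholders for the three “Use l33t” times, not PasswordMaker’s exact wording.

```python
from itertools import product

charsets = range(6)           # 6 character set choices
hash_algorithms = range(13)   # 13 hash algorithms
lengths = range(8, 21)        # password lengths 8 through 20 inclusive: 13 options

# "Not at all", plus 9 l33t levels at each of three application times.
leet_options = ["off"] + [
    (when, level)
    for when in ("before", "after", "both")
    for level in range(1, 10)
]

# Count every combination an attacker would have to try.
candidates = sum(
    1 for _ in product(charsets, hash_algorithms, lengths, leet_options)
)
print(len(leet_options))  # 28
print(candidates)         # 28392
```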

Finally, if you’ve ever used the online version of PasswordMaker, even once, then you have to assume that your password is compromised. If their site has ever been compromised – and it’s hosted on a content delivery network with a lot of other websites – the attacker could easily have placed a script on the page to submit everything you type into the password generation form to a server in a distant country. Security demands that you have to assume this has happened.

Seriously, please don’t do this stuff. I’d much rather see you using pwgen to create truly random passwords and then using something like GnuPG to store them all in a strongly-encrypted file.
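
For a rough idea of what “truly random” means in practice, here is a minimal Python sketch using the standard library’s secrets module, which draws from the operating system’s cryptographic random source. It is only an illustration of what tools like pwgen or a password manager do for you, not a replacement for them; the alphabet and default length are arbitrary choices for this example.

```python
import secrets
import string

# Letters, digits, and punctuation: 94 printable ASCII symbols.
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def random_password(length=20):
    """Pick each character independently and uniformly from ALPHABET."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


print(random_password())
```

Unlike a hash of a master password and a site name, nothing about one generated password tells an attacker anything about the next one.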


The summary version is this: use a password manager like 1Password so that you have a different hard-to-guess password on every website you visit. Don’t invent your own system for coming up with passwords, because there’s a very poor chance that we mere mortals will get it right.

Migrating Drupal To Blogofile

I have a Drupal site with nearly a thousand nodes, several having over 100,000 hits. I wanted to migrate to Blogofile but absolutely did not want to start over or make this a major hassle. Instead, I used some Apache RewriteRules to gradually and seamlessly switch from Drupal to Blogofile one post at a time. Here’s how I did it, using my site’s real name of http://honeypot.net/ to give concrete examples:

  1. Migrated my comments to Disqus. Drupal’s own Disqus module has built-in export functionality to make this a snap.
  2. Set up internal DNS to add a new A record, drupal.honeypot.net, for my Drupal site. Since this is a site that no one ever needs to visit directly, an entry in /etc/hosts should work just as well.
  3. Renamed my Apache’s honeypot.net.conf to drupal.honeypot.net.conf and changed the ServerName to “drupal.honeypot.net”.
  4. Created a new honeypot.net.conf to serve only static Blogofile files while passing all other requests through to drupal.honeypot.net. Here’s the shortened version of it:

    <VirtualHost *:80>
        ServerName honeypot.net
        ServerAlias www.honeypot.net
        CustomLog /var/log/httpd/honeypot.net-access.log combined
        DirectoryIndex index.html
        DocumentRoot /usr/local/www/honeypot.net/honeypot/_site
        <Directory /usr/local/www/honeypot.net/honeypot/_site>
            Order Deny,Allow
            Allow from All
        </Directory>
    
    
    
        # Serve some stuff locally but pass everything else through to Drupal
        RewriteEngine On
        Include /usr/local/www/honeypot.net/honeypot/_generatedfiles/rewriterules.txt
        RewriteRule ^/?(.*)$ http://drupal.honeypot.net/$1 [P,L]
    </VirtualHost>

The last lines are where the magic happens. I created a new Blogofile controller: drupalmigrate.py. It creates the file mentioned above (rewriterules.txt) that tells Apache not to tamper with any file generated by Blogofile. That file has entries like:

RewriteRule ^/$ - [L]
RewriteRule ^/2011(/|$) - [L]
RewriteRule ^/archive(/|$) - [L]
RewriteRule ^/category(/|$) - [L]
RewriteRule ^/favicon.ico$ - [L]
RewriteRule ^/feed(/|$) - [L]
RewriteRule ^/filtering-spam-postfix(/|$) - [L]
RewriteRule ^/index.html$ - [L]
RewriteRule ^/my-ecco-shoes-are-junk(/|$) - [L]
RewriteRule ^/page(/|$) - [L]
RewriteRule ^/robots.txt$ - [L]
RewriteRule ^/scam-calls-card-services(/|$) - [L]
RewriteRule ^/theme(/|$) - [L]

for every file in Blogofile’s _site directory. When an incoming web request matches one of those patterns, Apache stops processing any further RewriteRules and serves the file directly from the DocumentRoot directory. If none of those patterns match, the final RewriteRule in my honeypot.net.conf file makes a proxy request for the same path from http://drupal.honeypot.net/.

That is, if a visitor requests http://honeypot.net/robots.txt, the RewriteRule in rewriterules.txt will cause Apache to serve the file from /usr/local/www/honeypot.net/honeypot/_site/robots.txt. If a visitor requests http://honeypot.net/something-else, Apache will request the file from http://drupal.honeypot.net/something-else and return the results. None of this is visible to the user. They don’t receive any redirects, links to drupal.honeypot.net, or any other indication that they’re seeing content from two different systems.
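
Here is a rough Python sketch of how a rule file like that could be generated from the top-level entries of the _site directory. It is not the real drupalmigrate.py; the function names are hypothetical, and unlike the sample entries above it escapes dots in file names so that they only match literally.

```python
import os
import tempfile

# Characters that have special meaning in an Apache (PCRE) pattern.
REGEX_SPECIALS = set(".^$*+?()[]{}|\\")


def escape_name(name):
    """Backslash-escape regex metacharacters so the name matches literally."""
    return "".join("\\" + ch if ch in REGEX_SPECIALS else ch for ch in name)


def make_passthrough_rules(site_dir):
    """Return one 'serve this locally' rule per top-level entry in site_dir."""
    rules = ["RewriteRule ^/$ - [L]"]  # the site's front page
    for name in sorted(os.listdir(site_dir)):
        if os.path.isdir(os.path.join(site_dir, name)):
            # Directories match with or without anything after them.
            rules.append("RewriteRule ^/{}(/|$) - [L]".format(escape_name(name)))
        else:
            # Plain files must match exactly.
            rules.append("RewriteRule ^/{}$ - [L]".format(escape_name(name)))
    return rules


# Demonstrate against a throwaway directory standing in for _site.
with tempfile.TemporaryDirectory() as site_dir:
    os.mkdir(os.path.join(site_dir, "archive"))
    open(os.path.join(site_dir, "robots.txt"), "w").close()
    rules = make_passthrough_rules(site_dir)

print("\n".join(rules))
```

The "-" substitution tells Apache to leave the URL unchanged, and [L] stops rule processing, so the catch-all proxy rule is never reached for these paths.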

The controller also creates a list of every node in my Drupal database so I can <%include /> it into my site’s index.html.mako file like this:

<hr />
<p>These posts haven't been converted to the new system but are still available:</p>
<%include file="_templates/olddrupalindex.mako" />

When drupalmigrate.py is generating the site index, it skips any Drupal nodes that have the same permalink as a Blogofile post. As I take my time converting my Drupal content, more and more will be served by Blogofile until eventually none is left.
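
The skipping step might look something like this Python sketch. The data shapes are assumptions for the example (a list of dicts for Drupal nodes and a list of permalink strings for Blogofile posts), and the titles and paths are placeholders rather than real records.

```python
def unconverted_nodes(drupal_nodes, blogofile_permalinks):
    """Return the Drupal nodes that have no Blogofile post at the same permalink."""
    # Ignore trailing slashes so "/foo" and "/foo/" count as the same page.
    converted = {link.rstrip("/") for link in blogofile_permalinks}
    return [
        node for node in drupal_nodes
        if node["permalink"].rstrip("/") not in converted
    ]


# Placeholder data: one node already converted, one not.
drupal_nodes = [
    {"title": "Example A", "permalink": "/example-a"},
    {"title": "Example B", "permalink": "/example-b"},
]
blogofile_permalinks = ["/example-a/"]

remaining = unconverted_nodes(drupal_nodes, blogofile_permalinks)
print([node["title"] for node in remaining])  # ['Example B']
```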

By starting with the most popular content and working my way down, I can make sure that all my heavy-traffic pages are served as lightning-fast static pages. This could even work as a form of website accelerator where popular pages are “compiled” by Blogofile for fast access. By fast, I mean that my untuned home server can sustain about 9,100 hits and 240MB of traffic per second. Until Google decides to use me for their home page, I think that’ll be sufficient for my little site.