XBaseToPg

REPLACEMENT NOTICE!

I'm dropping my work on this project in favor of PgDBF, which is functionally identical, faster, and easier to maintain. If this is your first time here, use that project instead. If you're already using XBaseToPg in production, you should probably switch anyway.

The code is still online at https://github.com/kstrauser/pgdbf/tree/master/ancient for anyone who really wants it.

Introduction

So, you want to convert some FoxPro tables to PostgreSQL, huh? You're in the right place. Although this project works well for one-time data migrations, it's designed and highly optimized for regularly scheduled jobs.

Installation

  1. Download the "xbase64-3.1.2.tar.gz" archive from the Xbase project at Sourceforge.
  2. Extract it and apply the XbaseToPg patch below.
  3. Do the usual configure/make/install shuffle. If you can guarantee that your datafiles will not change while the conversion program is running (eg you've copied them from a fileserver to a local file), consider using the --without-xbase-locking to get drastic performance improvements.

Usage

Use dumprecspg to convert your FoxPro tables into a format suitable for piping into the psql client. dumprecspg will generate the commands necessary to create near-exact replicas of your tables. Its output looks something like:

begin;
drop table invoice;
create table invoice (invoiceid integer, note char(50), billdate date, submitted timestamp, memofield text, approved boolean);
\copy invoice from stdin
...
...
...
\.
commit;

Note that the entire process is wrapped inside a transaction so that other clients will have access to the old data until the transaction is completed.

Indices are automatically created if you specify the columns (or expressions!) you want indexed on the command line. For example,

dumprecspg foo.dbf rowid "substr(textfield,1,4)" price

will create three indices on the new foo table: one each for the rowid and price columns, and one for the substr() expression. It tries to give each index a reasonable name.

Notes

We use this code in production to process several gigabytes of data daily. While I obviously can't guarantee that it will work perfectly for you, my boss would choke me if I couldn't keep it running here.

Some of the code is ugly. When choosing between performance and elegance, speed won every time. I still tried to make it as clean and maintainable as possible, but what you're getting is the result of many profiling runs. I no longer have that data laying around, but I do remember shaving about 90% off its runtime so I thought it was an acceptable compromise.

Bugs

It doesn't check that a table exists before dropping it. This causes an error in PostgreSQL if the table doesn't exist. Workaround: create a trivial table with the same name, ala:

CREATE TABLE invoice (foo integer);

If a table has memo fields, they'll be stored in tablename.fpt. Be sure that the case matches between that table and your main .dbf table! For example, if your data table is named invoice.dbf, be sure your memo table is named invoice.fpt, not Invoice.FPT or iNvOiCe.FpT! Windows doesn't care about case. Unix isn't so accepting.

blog comments powered by Disqus