Sanitary migrations with XMLRPC

From Drupal to Drupal over XMLRPC proxies
Code & Development

45 minutes (+15 minutes Q&A)

tags

XML | migration | database | unicode | data | fidelity

This is a case study of the work involved in migrating a Drupal 5 site to Drupal 6.

If your D5 site has only core modules enabled, migration is pretty straightforward. Start adding contributed modules to the mix and, as long as you don't use unknown or unsupported code, you can still migrate with a minimum of pain. But what about when:

  • The migration will seriously break the theme
  • The client decides that migration is a good time to completely re-theme
  • ... and add new functionality
  • ... and change to new, funky, exciting modules only available on D6 or only recently available
  • Image management changes
  • ... or there's no existing image management at all
  • ... or document management
  • The site history is lost in the mists of time
  • ... with occasional zombie configuration that springs to life, from before your time
  • ... where you don't know which configuration you can safely drop, or which might cause trouble later
  • A staged migration is required, keeping the two sites in sync until the switch-over

Sometimes there's nothing else to do but start with a fresh Drupal 6 site and begin to migrate the content. The obvious choice for doing this is a framework like Cyrve's Migrate module, which sits on top of Schema, Views and Table Wizard and provides a semi-graphical route for planning a migration from the raw database. We'll explain the limitations of this approach given Drupal's internal table structure, in which a single piece of content is distributed across many tables. We'll also explain why stepping outside Migrate and converting from database to database doesn't help.

Instead, we'll present a case study of how we used an XMLRPC bridge (D5 providing, D6 consuming) to map our data onto D6 internal objects and ensure high fidelity, and to provide a staged, synced migration with Batch API. This will include:

  • Vagaries of the Drupal Services module
  • Consuming webservices in PHP
  • Importing nodes, vocabularies and users
  • Writing new webservices
  • Batch API for asynchronous running (and the problems you might still get)
  • Tracking what's already been imported
  • Reporting errors, including Unicode problems
  • Introducing or migrating image management to ImageField/ImageCache
  • Mapping URLs when images and other resources have moved
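As an illustration of the consuming side and of tracking what's already been imported, here is a minimal sketch in Python rather than the PHP a D6 module would use. It assumes the D5 side exposes a Services method (something like `node.get`) that returns a node as an XML-RPC struct carrying at least `nid` and `changed`; the `import_map` table and the `save` callback are hypothetical names, not the session's actual code. To keep it self-contained, the server's reply is simulated with `xmlrpc.client.dumps`; in a live run the payload would come through `xmlrpc.client.ServerProxy` instead.

```python
import sqlite3
import xmlrpc.client
from typing import Callable, Optional

# Hypothetical D6-side save step (e.g. a wrapper around node_save()): takes the
# decoded D5 node and the existing D6 nid (or None) and returns the D6 nid.
SaveFn = Callable[[dict, Optional[int]], int]


def fake_d5_response(node: dict) -> str:
    """Simulate the XML-RPC methodResponse a D5 Services endpoint might return."""
    return xmlrpc.client.dumps((node,), methodresponse=True)


def decode_response(payload: str) -> dict:
    """Parse a methodResponse body; its single param is the node struct."""
    params, _method_name = xmlrpc.client.loads(payload)
    return params[0]


def init_map(conn: sqlite3.Connection) -> None:
    """Create the (hypothetical) table recording which D5 nodes are imported."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS import_map ("
        " d5_nid INTEGER PRIMARY KEY,"
        " d6_nid INTEGER NOT NULL,"
        " d5_changed INTEGER NOT NULL)"
    )


def sync_node(conn: sqlite3.Connection, node: dict, save: SaveFn) -> str:
    """Insert, update or skip one D5 node; returns which action was taken."""
    row = conn.execute(
        "SELECT d6_nid, d5_changed FROM import_map WHERE d5_nid = ?",
        (node["nid"],),
    ).fetchone()
    if row is not None and row[1] >= node["changed"]:
        return "skip"  # not edited on D5 since the last sync run
    d6_nid = save(node, row[0] if row else None)
    conn.execute(
        "INSERT OR REPLACE INTO import_map (d5_nid, d6_nid, d5_changed)"
        " VALUES (?, ?, ?)",
        (node["nid"], d6_nid, node["changed"]),
    )
    conn.commit()
    return "update" if row else "insert"
```

On a re-run, nodes whose D5 `changed` timestamp hasn't moved are skipped and edited ones are re-saved onto the same D6 nid, which is one way to keep the two sites in step until the switch-over. Wrapping each `sync_node` call in a Batch API operation would let it run incrementally. Deleted D5 nodes are not handled in this sketch.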

While our approach hasn't been perfect, the eventual fidelity of the import has been excellent, and we hope that presenting a webservice approach will lead to healthy discussion about the problems and pitfalls of migration.

Thanks for mentioning migrate

7. July 2010 - 20:07

Thanks for mentioning the Migrate module. Note that version 2 no longer depends on Table Wizard, Schema or Views, so some of the limitations you mention may no longer apply.

The primary limitation with

8. July 2010 - 16:23

The primary limitation with wiring D6 Migrate into a D5 database isn't really anything to do with Migrate!

D5 CCK will already have spread its content over many, many tables (the necessary evil of ORM, I guess). That means trying to reconstruct your content from collections of tables is very much a guessing game, as only the D5 site knows for sure how your D5 content types are "really" meant to be reassembled. You could perhaps migrate the content type configuration first (I'm not sure whether Migrate 2 supports taking a D6 content type and looking for tables that resemble it), but typically the new content will need to be substantially different (e.g. moving to ImageCache), or an old D5 CCK submodule won't exist for D6, so reconstructing the content types in D6 won't help much.
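To make the table-spreading concrete, here is a simplified, hypothetical sketch using SQLite. CCK keeps single-value, unshared fields in a per-type `content_type_<type>` table and gives multi-value or shared fields their own `content_field_<name>` table with one row per delta, alongside core's `node` and `node_revisions`. The schema below is pared down for illustration and is not a real D5 dump; the `story` type and its fields are invented.

```python
import sqlite3

# Simplified, hypothetical D5-style storage for a "story" type: core node
# tables plus the two kinds of CCK table (per-type and per-field).
SCHEMA = """
CREATE TABLE node (nid INTEGER PRIMARY KEY, vid INTEGER, type TEXT, title TEXT);
CREATE TABLE node_revisions (vid INTEGER PRIMARY KEY, nid INTEGER, body TEXT);
CREATE TABLE content_type_story (vid INTEGER PRIMARY KEY, nid INTEGER,
                                 field_subtitle_value TEXT);
CREATE TABLE content_field_author (vid INTEGER, nid INTEGER, delta INTEGER,
                                   field_author_value TEXT);
CREATE TABLE content_field_photo (vid INTEGER, nid INTEGER, delta INTEGER,
                                  field_photo_fid INTEGER);
"""

# The content_* tables this hand-written importer knows how to read.
MAPPED_TABLES = {"content_type_story", "content_field_author"}


def sample_db() -> sqlite3.Connection:
    """Build an in-memory database holding one story node."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO node VALUES (1, 10, 'story', 'Hello')")
    conn.execute("INSERT INTO node_revisions VALUES (10, 1, 'Body text')")
    conn.execute("INSERT INTO content_type_story VALUES (10, 1, 'A subtitle')")
    conn.executemany(
        "INSERT INTO content_field_author VALUES (10, 1, ?, ?)",
        [(1, "Bob"), (0, "Alice")],
    )
    conn.execute("INSERT INTO content_field_photo VALUES (10, 1, 0, 42)")
    return conn


def load_story(conn: sqlite3.Connection, nid: int) -> dict:
    """Reassemble one node; every table and join must be known in advance."""
    nid, vid, title, body, subtitle = conn.execute(
        "SELECT n.nid, n.vid, n.title, r.body, ct.field_subtitle_value"
        " FROM node n JOIN node_revisions r ON r.vid = n.vid"
        " LEFT JOIN content_type_story ct ON ct.vid = n.vid"
        " WHERE n.nid = ?",
        (nid,),
    ).fetchone()
    authors = [
        value
        for (value,) in conn.execute(
            "SELECT field_author_value FROM content_field_author"
            " WHERE vid = ? ORDER BY delta",
            (vid,),
        )
    ]
    return {"nid": nid, "title": title, "body": body,
            "field_subtitle": subtitle, "field_author": authors}


def unmapped_content_tables(conn: sqlite3.Connection) -> list:
    """List content_* tables the importer never reads: data lost silently."""
    names = {name for (name,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")}
    return sorted(n for n in names if n.startswith("content_") and n not in MAPPED_TABLES)
```

Here `load_story` returns the node without complaint even though its photo never made it across; only a separate audit like `unmapped_content_tables` reveals the gap. A database-level importer also has to mirror CCK's own rules, since a field moves from the per-type table to its own table once it becomes multi-value or shared.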

If you've got a handful of content types then you can attempt an import, check the fidelity, tweak your configuration, then re-attempt. But if you've got, say, 20 or 30 content types with multiple CCK fields on each, you can easily end up with a hundred or more content_* tables, all of which need to be wired up. And there's no guarantee you haven't missed a table, whereas if you fetch a whole node over XMLRPC you can handle every single field_* property explicitly and, if need be, raise an error when one is missed.
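The "handle every field_* and raise an error when one is missed" idea can be sketched as a mapping step that refuses to drop fields silently. This is a Python illustration under assumptions: that a node fetched over XML-RPC arrives as a dict whose CCK fields are keyed `field_*` and hold lists of per-delta structs, and that each known field has a handler. The handler names and the `UnmappedFieldError` class are hypothetical.

```python
from typing import Any, Callable, Dict

# A handler converts one D5 field value (assumed to arrive over XML-RPC as a
# list of per-delta structs) into whatever the D6 field expects.
FieldHandler = Callable[[Any], Any]


class UnmappedFieldError(Exception):
    """Raised when a fetched node carries a field_* nobody has mapped."""


def map_fields(node: Dict[str, Any], handlers: Dict[str, FieldHandler]) -> Dict[str, Any]:
    """Convert every field_* on a D5 node, refusing to drop any silently."""
    fields = [key for key in node if key.startswith("field_")]
    missing = sorted(key for key in fields if key not in handlers)
    if missing:
        raise UnmappedFieldError(
            f"node {node.get('nid')}: no handler for {', '.join(missing)}"
        )
    return {key: handlers[key](node[key]) for key in fields}


# Hypothetical handlers for two fields on a hypothetical content type.
HANDLERS: Dict[str, FieldHandler] = {
    "field_subtitle": lambda items: items,  # copied across unchanged
    # e.g. reshape an old image reference before it goes into a D6 ImageField
    "field_photo": lambda items: [{"fid": item["fid"]} for item in items],
}
```

Because the check is driven by the node's own keys rather than a list of tables, a field added on the D5 site after the importer was written shows up as a loud error (for example `node 7: no handler for field_legacy`) instead of quietly vanishing.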