GAP School Module 10 — Production Hardening Lesson 10.2

Most WordPress operations that feel risky are actually safe if you do them in the right order. The naive approach — rename a meta key in place, update all callers in one deploy — produces reads that return empty for every record created before the rename. The safe approach is a multi-phase migration that leaves the site running correctly throughout.


The situation

The Anchor build needed a schema migration at month 4: the lead source tracking field had to move from a text meta value to a taxonomy. The naive approach was to rename the meta key and update all callers in one deploy. The safe approach was a 3-phase migration. It took 2 deploys and a background cron job: 4 days elapsed, 0 customer-facing impact.


What I did

The expand-then-contract pattern

For any rename or structural migration, the pattern is: add new alongside old → background-migrate → switch reads → remove old. Never combine the first and third steps:

PHP — Phase 1: write to both
function [client]_save_lead_source( int $lead_id, string $source ): void {
    // Legacy field: keep writing during migration
    update_post_meta( $lead_id, '[client]_lead_source', $source );

    // New field: start populating
    wp_set_object_terms( $lead_id, $source, '[client]_lead_source_tax' );
}
PHP — Phase 2: background migration
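function [client]_migrate_lead_sources(): void {
    $leads = get_posts( [
        'post_type'      => '[client]_lead',
        'posts_per_page' => 100,
        'meta_key'       => '[client]_lead_source',
        'meta_compare'   => 'EXISTS',
        // Select only leads with no term yet, so each cron run moves on to
        // the next 100 instead of re-fetching the same ones.
        'tax_query'      => [
            [
                'taxonomy' => '[client]_lead_source_tax',
                'operator' => 'NOT EXISTS',
            ],
        ],
        'fields'         => 'ids',
    ] );

    foreach ( $leads as $lead_id ) {
        $source = get_post_meta( $lead_id, '[client]_lead_source', true );

        // Skip empty values and leads that already have a term
        if ( $source && ! wp_get_object_terms( $lead_id, '[client]_lead_source_tax' ) ) {
            wp_set_object_terms( $lead_id, $source, '[client]_lead_source_tax' );
        }
    }
}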
PHP — Phase 3: read from new with fallback
function [client]_get_lead_source( int $lead_id ): string {
    $terms = wp_get_object_terms( $lead_id, '[client]_lead_source_tax' );
    if ( ! empty( $terms ) && ! is_wp_error( $terms ) ) {
        return $terms[0]->slug;
    }

    // Fallback for records not yet migrated
    return get_post_meta( $lead_id, '[client]_lead_source', true ) ?: '';
}
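
The three phases, plus a check before the final "remove old" deploy, can be sketched without WordPress at all. This is a minimal simulation under stated assumptions: plain arrays stand in for post meta and taxonomy terms, and every `demo_` function is a hypothetical name, not part of the Anchor codebase or the WordPress API.

```php
<?php
// In-memory stand-ins (hypothetical, not WordPress APIs).
$meta  = [ 1 => 'referral', 2 => 'google-ads', 3 => 'walk-in' ]; // old structure: lead ID => source
$terms = [];                                                     // new structure: lead ID => source

// Phase 1: every write goes to both structures.
function demo_save_lead_source( array &$meta, array &$terms, int $lead_id, string $source ): void {
    $meta[ $lead_id ]  = $source;
    $terms[ $lead_id ] = $source;
}

// Phase 2: copy up to $limit legacy values that have no counterpart in the new structure yet.
function demo_migrate_batch( array $meta, array &$terms, int $limit ): int {
    $pending = array_slice( array_diff_key( $meta, $terms ), 0, $limit, true );
    foreach ( $pending as $lead_id => $source ) {
        $terms[ $lead_id ] = $source;
    }
    return count( $pending );
}

// Phase 3: read from the new structure, fall back to the old one.
function demo_get_lead_source( array $meta, array $terms, int $lead_id ): string {
    return $terms[ $lead_id ] ?? $meta[ $lead_id ] ?? '';
}

// Gate for the "remove old" deploy: no record may exist only in the old structure.
function demo_safe_to_contract( array $meta, array $terms ): bool {
    return count( array_diff_key( $meta, $terms ) ) === 0;
}

demo_save_lead_source( $meta, $terms, 4, 'newsletter' ); // new lead written mid-migration
demo_migrate_batch( $meta, $terms, 2 );                  // first pass moves leads 1 and 2

// Lead 3 is not migrated yet, but the fallback still answers.
$lead3_before = demo_get_lead_source( $meta, $terms, 3 );
$ready_before = demo_safe_to_contract( $meta, $terms );

demo_migrate_batch( $meta, $terms, 2 );                  // second pass moves lead 3
$ready_after = demo_safe_to_contract( $meta, $terms );
```

The gate is the useful part: the old structure is removed only once a count of unmigrated records comes back zero, which keeps the last deploy from depending on a guess about whether the cron job has finished.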

Batch operations with sleep intervals

Bulk operations on large datasets run in batches with a pause between passes to avoid locking the database under live traffic:

PHP
function [client]_backfill_in_batches( callable $process, int $batch_size = 50 ): int {
    $offset    = 0;
    $processed = 0;

    do {
        $ids = [client]_get_next_batch( $offset, $batch_size );
        if ( empty( $ids ) ) {
            break;
        }

        foreach ( $ids as $id ) {
            $process( $id );
            $processed++;
        }

        $offset += $batch_size;
        usleep( 100000 ); // 100ms pause between batches
    } while ( count( $ids ) === $batch_size );

    return $processed;
}
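
The batch helper the loop calls is not shown in the lesson. A runnable sketch of the same loop, assuming a hypothetical in-memory helper (`demo_get_next_batch`) and a shortened pause so the demo finishes quickly:

```php
<?php
// Hypothetical stand-in for the batch helper: returns up to $limit IDs starting at $offset.
function demo_get_next_batch( array $all_ids, int $offset, int $limit ): array {
    return array_slice( $all_ids, $offset, $limit );
}

// Same loop shape as the lesson's function, with the pause length as a parameter.
function demo_backfill_in_batches( array $all_ids, callable $process, int $batch_size = 50, int $pause_us = 100000 ): array {
    $offset    = 0;
    $processed = 0;
    $batches   = 0;

    do {
        $ids = demo_get_next_batch( $all_ids, $offset, $batch_size );
        if ( empty( $ids ) ) {
            break;
        }

        foreach ( $ids as $id ) {
            $process( $id );
            $processed++;
        }

        $batches++;
        $offset += $batch_size;
        usleep( $pause_us ); // pause between batches
    } while ( count( $ids ) === $batch_size ); // a short batch means the data is exhausted

    return [ 'processed' => $processed, 'batches' => $batches ];
}

// 120 IDs at 50 per batch: two full batches, then a final batch of 20.
$seen   = [];
$result = demo_backfill_in_batches(
    range( 1, 120 ),
    function ( int $id ) use ( &$seen ): void {
        $seen[] = $id;
    },
    50,
    1000 // 1ms pause for the demo; the lesson uses 100ms
);
```

One caveat with offset paging: it assumes the helper keeps returning the same ordered set while the loop runs. If processing a record removes it from the helper's results (for example, a query for "leads without a term"), advancing the offset would skip records, and re-querying from offset 0 each pass is the safer shape.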

Feature flags for risky changes

Changes that can’t be cleanly rolled back go behind a WP option flag. Flip the flag, watch metrics, flip it back if something breaks — no deploy required:

PHP
function [client]_feature_enabled( string $flag ): bool {
    static $flags = null;

    if ( $flags === null ) {
        $flags = get_option( '[client]_feature_flags', [] );
    }

    return ! empty( $flags[ $flag ] );
}
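
A sketch of how a call site might sit behind such a flag. The `demo_` names, the `read_from_taxonomy` flag, and the option key are all hypothetical. The `get_option` and `update_option` stubs exist only so the example runs outside WordPress; inside WordPress the real functions are used. This version skips the static cache so a flip is visible within one process, which is an assumption made for the demo and not the lesson's implementation.

```php
<?php
// Standalone stubs; skipped automatically when WordPress is loaded.
if ( ! function_exists( 'get_option' ) ) {
    $GLOBALS['demo_options'] = [];

    function get_option( string $name, $default = false ) {
        return $GLOBALS['demo_options'][ $name ] ?? $default;
    }

    function update_option( string $name, $value ): bool {
        $GLOBALS['demo_options'][ $name ] = $value;
        return true;
    }
}

// Reads the flag array on every call (no static cache in this sketch).
function demo_feature_enabled( string $flag ): bool {
    $flags = get_option( 'demo_feature_flags', [] );
    return is_array( $flags ) && ! empty( $flags[ $flag ] );
}

// Flips one flag without touching the others.
function demo_set_feature( string $flag, bool $enabled ): void {
    $flags = get_option( 'demo_feature_flags', [] );
    if ( ! is_array( $flags ) ) {
        $flags = [];
    }
    $flags[ $flag ] = $enabled;
    update_option( 'demo_feature_flags', $flags );
}

// The risky path (the new structure) runs only while the flag is on.
function demo_pick_lead_source( string $legacy_value, string $taxonomy_value ): string {
    return demo_feature_enabled( 'read_from_taxonomy' ) ? $taxonomy_value : $legacy_value;
}

$before = demo_pick_lead_source( 'meta:referral', 'tax:referral' ); // flag unset: legacy path
demo_set_feature( 'read_from_taxonomy', true );
$during = demo_pick_lead_source( 'meta:referral', 'tax:referral' ); // flag on: new path
demo_set_feature( 'read_from_taxonomy', false );
$after  = demo_pick_lead_source( 'meta:referral', 'tax:referral' ); // flag off again: legacy path
```

Keeping both code paths in place is what makes the flip reversible: turning the flag off returns to the old behaviour without a deploy.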

Why it matters

The expand-then-contract pattern’s cost is one extra deploy cycle. The benefit is that the site continues functioning correctly throughout the migration. If the migration fails partway through phase 2, the site still runs on the old structure — no data is lost, no readers are broken.

The batch sleep pattern matters under real load. A loop that updates 1,000 rows without breathing can hold locks on wp_postmeta long enough to cause 504 timeouts on inventory queries running concurrently.


The Anchor build

3 schema migrations using the expand-then-contract pattern. 0 downtime windows required for any of them. The lead source migration took 2 deploys and a background cron job — total elapsed time 4 days, 0 customer-facing impact. The batch backfill ran at 50-row batches with 100ms sleeps: 4,200 lead records processed in 14 minutes with no lock-induced errors in the error log.
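
The backfill figures above also show how little the pauses cost. A quick calculation using only the numbers quoted (variable names are illustrative):

```php
<?php
$records    = 4200;     // lead records processed
$batch_size = 50;       // rows per batch
$pause_ms   = 100;      // sleep between batches
$elapsed_s  = 14 * 60;  // total run time: 14 minutes

$batches           = (int) ceil( $records / $batch_size ); // 84 batches
$total_pause_s     = $batches * $pause_ms / 1000;          // 8.4 seconds spent sleeping
$pause_share       = $total_pause_s / $elapsed_s;          // about 0.01, roughly 1% of the run
$seconds_per_batch = $elapsed_s / $batches;                // 10 seconds per batch on average
```

Roughly 8.4 seconds of the 14 minutes went to sleeping. The rest was the per-record work itself, so on these numbers the pause is close to free.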


Do this, not that

  • Never rename a meta key in place. Add the new key alongside the old, migrate, then remove. Rename-in-place produces reads that return empty for every record created before the rename.
  • Run bulk operations in batches with sleep between passes. A loop with no pause is a table lock waiting to happen. 50–100 rows per batch with 100ms sleep is the pattern.
  • Use feature flags for changes you can’t test on real traffic. The flag costs one get_option call. The ability to flip a bad change off without a deploy is worth more than that.
  • Don’t combine the “add new structure” deploy with the “remove old structure” deploy. They are two deploys. The migration sits between them. Time pressure is the only reason to collapse them and it’s never worth it.