Wednesday, June 11, 2025

GCP: PostgreSQL Replication Slot

 

What is a PostgreSQL Replication Slot?

A replication slot in PostgreSQL is a mechanism that ensures the WAL (Write-Ahead Log) files needed by a replica (subscriber or standby server) are retained on the primary server until they are no longer needed. It essentially tells PostgreSQL.Do not delete WAL data until all connected replicas have processed it.

Why Do We Use Replication Slots?
  1. Reliable Streaming Replication: They prevent WAL segments from being deleted before a replica has received them. This ensures data     consistency and avoids data loss during replication.
  2. Support for Logical Replication: In logical replication (e.g., streaming data to a different schema or database), slots track the replication   progress.
  3. Resilient Replica Recovery: If a replica is temporarily down, WAL files are preserved until the replica reconnects and catches up.
Types of Replication Slots:

1. Physical Replication Slot:
   * Used in streaming replication.
   * Tied to the physical binary structure of the database.
2. Logical Replication Slot:
   * Used in logical replication (e.g., pub/sub model).
   * Works at the level of SQL changes (INSERT/UPDATE/DELETE).

Privileges Needed for Replication Slots

To create or drop replication slots, a user needs:

* Superuser privileges OR
* The REPLICATION role attribute.

Required privileges for:
* Creating a replication slot: pg_create_physical_replication_slot() or pg_create_logical_replication_slot()
* Dropping a slot: pg_drop_replication_slot()

Common Errors and Resolutions:

ERROR: replication slot <name> is already active                                  
Cause: The Slot is already used by another process                    
Resolution: Disconnect the existing replication client or use a different slot
FATAL: could not receive data from WAL stream

ERROR: requested WAL segment has already been removed                   
Cause: WAL files were deleted before the replica could read them            Resolution: Recreate the replica from a fresh base backup or use WAL archiving

ERROR: could not write to file "pg_wal/...": No space left on device
          Cause: WALs are retained because the slot is not being consumed                
Resolution: Monitor replication lag, drop unused slots, or fix replica

ERROR: replication slot "<slot>" does not exist                                      Cause: Trying to access a non-existent slot                  
Resolution: Verify the slot name or recreate the replication slot
 
ERROR: Logical replication slot <name> inactive for too long                  Cause: The subscriber hasn't connected in a while                     
Resolution: Investigate the subscriber or consider dropping the slot


Managing Replication Slots:

Create Physical Slot:

SELECT * FROM pg_create_physical_replication_slot('my_physical_slot');

Create Logical Slot:

SELECT * FROM pg_create_logical_replication_slot('my_logical_slot', 'pgoutput');

List All Slots:

SELECT * FROM pg_replication_slots;

Drop a Slot:

SELECT pg_drop_replication_slot('my_slot');


Best Practices:

  1. Monitor Slot Activity: Use pg_replication_slots to ensure slots are being consumed.
  2. Avoid Orphan Slots: Unused slots can fill the disk by retaining WALs. Drop them if not needed.
  3. Configure Monitoring/Alerts: Watch for lagging replicas or inactive slots.
  4. Use Logical Decoding Carefully: Logical slots can grow WALs fast if the subscriber is slow.

A diagram showing how replication slots work in a streaming or logical setup: