I have been having a problem with my redundant PostgreSQL setup over the last couple of weeks. I run two Postgres servers in a WAL-shipping warm standby configuration, using pg_standby from 8.3. When the master server is under heavy load and generating a lot of WAL traffic, the secondary server has been tripping and going active.
After checking all the logs and putting pg_standby in debug mode there was still no clue as to why this was happening.
The archive_command on my master server was using
cp -i "%p" /var/lib/pgsql/slave/pg_logrestore/"%f"
I have read somewhere (I can’t remember where or when) that when you use cp to copy a WAL file, the file appears at the destination under its final name before the copy is complete. That got me wondering whether the standby was picking up partially copied files and whether this was causing my problem. Knowing that rsync writes to a temporary file and renames it once the copy has completed, I thought I would give that a go.
My new archive_command is
rsync -q "%p" /var/lib/pgsql/slave/pg_logrestore/"%f"
Since making this change the standby server hasn’t triggered without reason and the problem appears to be fixed.
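For anyone who would rather not depend on rsync, the same copy-to-a-temporary-name-then-rename idea can be sketched in plain shell. This is only a minimal sketch and not something I have run in production: the function name archive_wal, its argument order, and the temporary file naming are all made up for illustration, and it assumes the temporary file and the final file live on the same filesystem so that the rename is atomic.

```shell
#!/bin/sh
# Hypothetical helper (not part of PostgreSQL or pg_standby): copy a WAL
# segment so that it only appears under its final name once the copy is done.
archive_wal() {
    src="$1"        # path of the segment to archive (what %p expands to)
    dest_dir="$2"   # directory that pg_standby reads from
    name="$3"       # file name of the segment (what %f expands to)
    tmp="$dest_dir/.$name.tmp.$$"   # assumed temp name, same filesystem as dest

    # Refuse to overwrite a segment that has already been archived.
    [ ! -e "$dest_dir/$name" ] || return 1

    # Copy under the temporary name, then rename into place. If either step
    # fails, remove the partial file and report failure with a non-zero
    # status so the segment can be retried.
    cp "$src" "$tmp" && mv "$tmp" "$dest_dir/$name" || {
        rm -f "$tmp"
        return 1
    }
}
```

To use something like this, the function would need to be wrapped in a small script that passes its arguments through, and archive_command would call that script with "%p", the destination directory, and "%f". The non-zero return on failure matters because PostgreSQL only treats a segment as archived when the command exits with status zero.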