====Slony Trouble====
If updates aren't going through, check that external hosts can connect through the ssh tunnel:
%%telnet 127.0.0.1 20001
psql -h 127.0.0.1 -p 20001 -U pgsql pvsadmin%%
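If neither command connects, it can also help to confirm that anything is listening on the forwarded port at all (an extra check, not part of the original runbook):
%%ss -tln | grep 20001%%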
If the connection hangs, the tunnel probably needs to be restarted from poprocks:
%%systemctl restart tunnel-out%%
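Before or after restarting, the unit's status and recent journal entries show whether the tunnel actually came up (assuming tunnel-out is an ordinary systemd unit, as the restart command above implies):
%%systemctl status tunnel-out
journalctl -u tunnel-out -n 50%%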
Monitor slon progress by tailing the PostgreSQL logs:
%%tail -F /var/lib/pgsql/9.2/data/pg_log/postgresql-* | grep slon%%
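To cut the output down to problems only, the same tail can be filtered for slon errors (a variation on the command above, not a separate procedure):
%%tail -F /var/lib/pgsql/9.2/data/pg_log/postgresql-* | grep -Ei 'slon.*(error|fatal)'%%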
Check current slon progress. Each query needs to run on its respective master DB:
%%SELECT * FROM _admin_cluster.sl_status;
SELECT * FROM _whitefront_cluster.sl_status;%%
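To see how far behind a subscriber is, the lag columns are the most useful part of that output (column names assumed from the standard Slony-I sl_status view):
%%SELECT st_origin, st_received, st_lag_num_events, st_lag_time
  FROM _admin_cluster.sl_status;%%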
When at a total loss, try restarting slon; it can't hurt...
%%systemctl restart slony_admin
systemctl restart slony_whitefront%%
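After a restart, check that both daemons came back and are processing SYNC events (standard systemd/journal commands; unit names taken from the restart commands above):
%%systemctl status slony_admin slony_whitefront
journalctl -u slony_admin -n 50%%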
===== Unverified =====
Use the 'SYNC # processing' messages in the slon log to gauge where the slave is, and compare that number to _admin_cluster.sl_event.ev_seqno on the master. It is a simple counter.
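For the comparison, the latest event number on the master can be pulled directly (a sketch; sl_event is the standard Slony-I event table). The slave has caught up when the SYNC number it is processing reaches this value:
%%SELECT max(ev_seqno) FROM _admin_cluster.sl_event;%%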
If Slony failed while running a slonik command involving ALTER/DROP TABLE, check for deadlocks in /var/log/messages. If deadlocks are present, restart postgresql on the deadlocked host. Because of how Slony messes with internals, running DDL-type commands while postgresql is handling heavy write traffic is a bad idea; save such commands for after hours or for when admin is down.
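A quick way to spot the deadlocks and bounce the server (the postgresql unit name is an assumption based on the 9.2 data directory used above; adjust to the actual service name on that host):
%%grep -i deadlock /var/log/messages
systemctl restart postgresql-9.2%%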
Continued log entries like "Slony-I: log switch to sl_log_2 still in progress - sl_log_1 not truncated" indicate that something is holding up the log switch; logs will not switch unless the nodes are in sync. If this happened right after a Slony upgrade, make sure the Slony functions on all nodes have been updated. Otherwise, check the items above (tunnel, logs, etc.).
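Updating the functions after an upgrade is done with slonik's UPDATE FUNCTIONS command, roughly as below. This is a sketch: the cluster and database names are inferred from the _admin_cluster schema and the psql command above, the conninfo strings are placeholders, and the node ids must match the actual cluster.
%%slonik <<EOF
cluster name = admin_cluster;
node 1 admin conninfo = 'dbname=pvsadmin host=master.example user=pgsql';
node 2 admin conninfo = 'dbname=pvsadmin host=slave.example user=pgsql';
update functions (id = 1);
update functions (id = 2);
EOF%%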
----
CategoryITMisc