I have a table in my PostgreSQL database (actually it's multiple tables, but for the sake of simplicity let's assume it's just one) and multiple clients that periodically need to query the table to find changed items. These are updated or inserted items (deleted items are handled by first marking them for deletion and then actually deleting them after a grace period).
Now the obvious solution would be to just keep a "modified" timestamp column for each row, remember it for each client, and then simply fetch the changed ones:
SELECT * FROM the_table WHERE modified > saved_modified_timestamp;
The modified column would then be kept up to date using triggers like:
CREATE FUNCTION update_timestamp() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    NEW.modified = NOW();
    RETURN NEW;
END;
$$;

CREATE TRIGGER update_timestamp_update BEFORE UPDATE ON the_table
    FOR EACH ROW EXECUTE PROCEDURE update_timestamp();
CREATE TRIGGER update_timestamp_insert BEFORE INSERT ON the_table
    FOR EACH ROW EXECUTE PROCEDURE update_timestamp();
The obvious problem here is that NOW() returns the time the transaction started. So it might happen that a transaction is not yet committed while fetching the updated rows, and when it's committed, the timestamp is lower than saved_modified_timestamp, so the update is never registered.
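To see the issue directly, here is a quick demonstration (nothing assumed beyond a psql session) that now() stays frozen at the transaction start while clock_timestamp() keeps advancing:

BEGIN;
SELECT now(), clock_timestamp();  -- roughly equal right after BEGIN
SELECT pg_sleep(2);
SELECT now(), clock_timestamp();  -- now() unchanged, clock_timestamp() ~2s later
COMMIT;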
I think I found a solution that would work and I wanted to see if you can find any flaws with this approach.
The basic idea is to use xmin (or rather txid_current()) instead of the timestamp, and then, when fetching the changes, wrap the fetch in an explicit REPEATABLE READ transaction and read txid_current_snapshot() (or rather the three values it contains, via txid_snapshot_xmin(), txid_snapshot_xmax() and txid_snapshot_xip()) from within that transaction.
If I read the Postgres documentation correctly, then all changes made by transactions that are < txid_snapshot_xmax() and not in txid_snapshot_xip() should be visible in that fetch transaction. This information should then be all that is required to get all the updated rows on the next fetch. The select would then look like this, with xmin_version replacing the modified column:
SELECT * FROM the_table WHERE xmin_version >= last_fetch_txid_snapshot_xmax OR xmin_version = ANY (last_fetch_txid_snapshot_xip);
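For what it's worth, a sketch of one full fetch cycle under this scheme; $1 and $2 stand for the xmax and in-progress list saved from the previous fetch, which is my naming, not part of the question:

BEGIN ISOLATION LEVEL REPEATABLE READ;

-- Capture the bounds of the current snapshot; the client stores these
-- and passes them in as $1/$2 on the next fetch.
SELECT txid_snapshot_xmax(txid_current_snapshot())              AS next_xmax,
       ARRAY(SELECT txid_snapshot_xip(txid_current_snapshot())) AS next_xip;

-- $1 = last fetch's xmax (bigint), $2 = last fetch's in-progress txids (bigint[])
SELECT *
FROM the_table
WHERE xmin_version >= $1
   OR xmin_version = ANY ($2);

COMMIT;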
The triggers would then simply look like this:
CREATE FUNCTION update_xmin_version() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    NEW.xmin_version = txid_current();
    RETURN NEW;
END;
$$;

CREATE TRIGGER update_xmin_version_update BEFORE UPDATE ON the_table
    FOR EACH ROW EXECUTE PROCEDURE update_xmin_version();
CREATE TRIGGER update_xmin_version_insert BEFORE INSERT ON the_table
    FOR EACH ROW EXECUTE PROCEDURE update_xmin_version();
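For completeness, the question never shows the column itself; presumably something like this (the index is my assumption, added because every fetch filters on xmin_version):

ALTER TABLE the_table ADD COLUMN xmin_version bigint;  -- txid_current() returns bigint
CREATE INDEX the_table_xmin_version_idx ON the_table (xmin_version);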
Would this work? Or am I missing something?
Thank you for the clarification about the 64-bit return value of txid_current() and how the epoch rolls over. I am sorry I confused that epoch counter with the time epoch.
I cannot see any flaw in your approach, but I would verify through experimentation that having multiple client sessions concurrently in REPEATABLE READ transactions taking the txid_snapshot_xip() snapshot does not cause any problems.
I would not use this method in practice, because I assume the client code will need to handle duplicate reads of the same change (insert/update/delete) as well as periodic reconciliations between the database contents and the client's working set, to handle drift due to communication failures or client crashes. Once that code is written, using now() in the client tracking table, clock_timestamp() in the triggers, and a grace-interval overlap when the client pulls changesets would work for the use cases I have encountered.
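A minimal sketch of that alternative, with a hypothetical one-minute grace interval (the trigger function name and the $1 placeholder are mine, not from the answer above):

-- Trigger side: clock_timestamp() advances during a transaction,
-- so even rows touched late in a long transaction get a late timestamp.
CREATE FUNCTION touch_modified() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    NEW.modified = clock_timestamp();
    RETURN NEW;
END;
$$;

-- Client side: $1 = the now() value recorded at the previous pull.
-- The overlap re-reads rows committed around that time, so duplicates
-- are expected and must be handled by the client.
SELECT *
FROM the_table
WHERE modified > $1 - interval '1 minute';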
If requirements called for stronger real-time integrity than that, then I would recommend a distributed commit strategy.
OK, so I've tested it in depth now and haven't found any flaws so far. I had around 30 clients writing to and reading from the database at the same time, and all of them got consistent updates. So I guess this approach works.