On TokuMX Oplog, Tailable Cursors, and Concurrency

In a post last week, I described the difference in concurrency behavior between MongoDB’s oplog and TokuMX’s oplog. In short, here are the key differences:

MongoDB protects access to the oplog with a database level reader/writer lock, whereas TokuMX does not.
TokuMX can write data to the oplog concurrently, whereas MongoDB cannot.
As a result, because a cursor holds the read lock while reading from the MongoDB oplog, it can safely read everything available at that moment without risk of missing any entries.
TokuMX, on the other hand, needs to be aware of possible “gaps” in the oplog. That is, suppose transactions A, B, and C want to write to the oplog concurrently in that order. However, A and C commit before B writes anything. If a cursor tries to read the oplog at that point in time, it will see transactions A and C, but miss B. At this point, there is a “gap” between A and C in the oplog.
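
To make the gap concrete, here is a minimal sketch in Python. It is a toy model, not TokuMX code: the oplog is a sorted list of integers, the ids 1, 2, and 3 stand in for transactions A, B, and C, and `naive_read` is a hypothetical helper that reads everything past the last position it saw.

```python
import bisect

def naive_read(oplog, last_seen):
    """Return every committed entry whose id is greater than last_seen."""
    start = bisect.bisect_right(oplog, last_seen)
    return oplog[start:]

# Committed oplog entries, kept sorted by id.
# Hypothetical ids: A = 1, B = 2, C = 3.
oplog = [1, 3]                      # A and C have committed; B has not yet

first_batch = naive_read(oplog, last_seen=0)
last_seen = first_batch[-1]         # the cursor now believes it is caught up to C

bisect.insort(oplog, 2)             # B commits late and lands between A and C

# The cursor only looks past C, so B is never returned.
second_batch = naive_read(oplog, last_seen)
```

In this model the first read returns A and C, the cursor advances past C, and the second read comes back empty even though B is now in the oplog. That missed entry is the gap problem in miniature.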

In MongoDB and TokuMX, secondaries use tailable cursors to pull data from the primary to run replication. With MongoDB, the tailable cursor algorithm is simple:

Grab the read lock on the oplog.
Read any new data that exists in the oplog.
If data exists, return it; otherwise, sleep until new data is available.

This algorithm works because MongoDB’s cursors do not need to be aware of gaps. This algorithm does not work for TokuMX.
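
The three-step MongoDB loop above can be sketched roughly as follows. This is an illustration under stated assumptions, not MongoDB's implementation: `LockedOplog` is a hypothetical class, and a single `threading.Condition` stands in for the database level reader/writer lock.

```python
import threading

class LockedOplog:
    """Toy oplog in which every read and write takes the same lock."""

    def __init__(self):
        self._cond = threading.Condition()
        self._entries = []

    def append(self, entry):
        # Writers are serialized by the lock, so entries land strictly in order.
        with self._cond:
            self._entries.append(entry)
            self._cond.notify_all()

    def read_new(self, position, timeout=None):
        # Step 1: grab the lock.
        with self._cond:
            # Step 3: if nothing is new, sleep until a writer notifies us
            # (the lock is released while waiting) or the timeout expires.
            self._cond.wait_for(lambda: len(self._entries) > position,
                                timeout=timeout)
            # Step 2: read whatever is new; empty if the wait timed out.
            return self._entries[position:]

oplog = LockedOplog()
oplog.append("op-1")
oplog.append("op-2")
batch = oplog.read_new(position=0)               # both entries
idle = oplog.read_new(position=2, timeout=0.01)  # nothing new yet
```

Because no writer can be partway through an append while the reader holds the lock, everything at or past `position` is complete, which is why this loop needs no notion of gaps.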

So, how do tailable cursors on the oplog work in TokuMX? That is, how are secondaries able to pull data from the primary without missing entries that later fill a gap? Here is how.

As explained here, each oplog entry has a GTID, which is the _id field in the oplog entry. We have an object called the GTIDManager, whose job on the primary is to do the following:

Provide GTIDs to transactions requesting them.
Keep track of what GTIDs have been handed out.

Transactions and the GTIDManager have the following protocol:

When a transaction is ready to commit and wants to write its operations to the oplog (a process described here), it requests a GTID from the GTIDManager.
The GTIDManager provides the GTID, call it GTID ‘A’, and stores ‘A’ in a map of live GTIDs. This map keeps track of which GTIDs have been handed out to transactions that have yet to commit. This step, done very quickly, is protected by a “GTID mutex” in the GTIDManager.
The transaction proceeds to write its data to the oplog and commit.
The transaction then notifies the GTIDManager that GTID ‘A’ has committed (or, if something went wrong, aborted).
The GTIDManager reacquires the GTID mutex and removes ‘A’ from its map of live GTIDs.
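
As a rough sketch of this protocol, assuming GTIDs are plain increasing integers and using a set where the text describes a map, the bookkeeping might look like the Python below. The method names `get_gtid`, `note_done`, and `live_gtids` are made up for illustration and are not taken from the TokuMX source.

```python
import threading

class GTIDManager:
    """Toy model: hands out GTIDs and tracks which ones are still live."""

    def __init__(self):
        self._mutex = threading.Lock()   # the "GTID mutex"
        self._next_gtid = 1              # assumption: GTIDs are increasing ints
        self._live = set()               # handed out, not yet committed/aborted

    def get_gtid(self):
        # Held only long enough to assign a GTID and record it as live.
        with self._mutex:
            gtid = self._next_gtid
            self._next_gtid += 1
            self._live.add(gtid)
            return gtid

    def note_done(self, gtid):
        # Called after the transaction has committed or aborted.
        with self._mutex:
            self._live.discard(gtid)

    def live_gtids(self):
        with self._mutex:
            return sorted(self._live)

manager = GTIDManager()
a = manager.get_gtid()        # transaction A asks for a GTID
b = manager.get_gtid()        # transaction B asks for a GTID
# ... A writes its oplog entry and commits, outside the mutex ...
manager.note_done(a)
still_live = manager.live_gtids()
```

The oplog write and the commit happen between `get_gtid` and `note_done`, with the mutex released, which is what lets several transactions write to the oplog at the same time.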

A key point is this: at all times, the GTIDManager knows which GTIDs are in the process of committing, and therefore knows where each possible gap in the oplog may exist.

Tailable cursors that read from the oplog have the following behavior:

When the cursor goes to read data, it first asks the GTIDManager, “what is the minimum live GTID?”
The GTIDManager acquires the GTID mutex, and returns the minimum GTID in its live GTIDs map. If no GTID is live, it returns the next GTID it would hand out. The GTID mutex is then released.
The cursor then proceeds to read all oplog entries whose GTID is less than this GTID.

The important invariant about the value that the GTIDManager gives the tailable cursor is that the cursor knows no gaps exist before this value. Therefore, the cursor can safely read up to this value. Also notice in each of these steps that the GTID mutex is held for a very short time. The mutex is held only to assign GTIDs, update a map, and read from the map. All of these operations are very fast.
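
The cursor behavior described above can be sketched in Python as follows. This is a simplified model under assumptions, not TokuMX's implementation: GTIDs are plain integers, the oplog is a sorted list of committed GTIDs, and `min_live_gtid` and `read_safe` are hypothetical names.

```python
import bisect
import threading

class GTIDManager:
    """Toy model of the live-GTID bookkeeping (GTIDs assumed to be ints)."""

    def __init__(self):
        self._mutex = threading.Lock()
        self._next_gtid = 1
        self._live = set()

    def get_gtid(self):
        with self._mutex:
            gtid = self._next_gtid
            self._next_gtid += 1
            self._live.add(gtid)
            return gtid

    def note_done(self, gtid):
        with self._mutex:
            self._live.discard(gtid)

    def min_live_gtid(self):
        # Minimum live GTID, or the next GTID to be handed out if none is live.
        with self._mutex:
            return min(self._live) if self._live else self._next_gtid

def read_safe(oplog, manager, last_seen):
    """Return committed GTIDs after last_seen and below the minimum live GTID."""
    bound = manager.min_live_gtid()
    lo = bisect.bisect_right(oplog, last_seen)
    hi = bisect.bisect_left(oplog, bound)
    return oplog[lo:hi]

manager = GTIDManager()
oplog = []
a, b, c = manager.get_gtid(), manager.get_gtid(), manager.get_gtid()

# A and C commit first; B is still live, so there is a gap at B.
for gtid in (a, c):
    bisect.insort(oplog, gtid)
    manager.note_done(gtid)
before_b = read_safe(oplog, manager, last_seen=0)

# B commits; the gap closes and the cursor can move past it.
bisect.insort(oplog, b)
manager.note_done(b)
after_b = read_safe(oplog, manager, last_seen=before_b[-1])
```

While B is live, the cursor stops short of it and returns only A, even though C is already in the oplog. Once B commits, the next read returns B and C in order, so nothing is skipped.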

And that is how tailable cursors over the oplog were changed to support a more concurrent oplog.

The post On TokuMX Oplog, Tailable Cursors, and Concurrency appeared first on Tokutek.
