Multithreaded replication WIP #1454
base: master
Multiple `TableMap` events can happen inside of a single transaction, so we can't really use them to filter out events early or anything like that. Instead, we'll just skip them. Also adds some basic test cases for the transaction streaming.
Co-authored-by: Daniel Joos <[email protected]>
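The commit above describes how the transaction streamer treats `TableMap` events. As a minimal sketch, assuming simplified, hypothetical event types (`binlogEvent`, `eventKind`, and `groupTransactions` are illustrative names, not gh-ost's code or its binlog library's API), grouping a flat binlog event stream into transactions while simply skipping table map events could look like this. A real MySQL stream also contains a `BEGIN` query event after the GTID event, which the sketch omits.

```go
package main

import "fmt"

// eventKind is a hypothetical, simplified stand-in for binlog event types.
type eventKind int

const (
	gtidEvent     eventKind = iota // opens a transaction; carries last_committed and sequence_number
	tableMapEvent                  // table metadata; may repeat inside one transaction
	rowsEvent                      // one row change
	xidEvent                       // commit marker that closes the transaction
)

// binlogEvent is a hypothetical event with only the fields this sketch needs.
type binlogEvent struct {
	kind           eventKind
	lastCommitted  int64
	sequenceNumber int64
	change         string // placeholder payload for rowsEvent
}

// transaction collects the row changes between a GTID event and its XID event.
type transaction struct {
	lastCommitted  int64
	sequenceNumber int64
	changes        []string
}

// groupTransactions turns a flat event stream into transactions, skipping
// table map events instead of trying to filter on them.
func groupTransactions(events []binlogEvent) []transaction {
	var out []transaction
	var cur *transaction
	for _, ev := range events {
		switch ev.kind {
		case gtidEvent:
			cur = &transaction{lastCommitted: ev.lastCommitted, sequenceNumber: ev.sequenceNumber}
		case tableMapEvent:
			// Skipped: several can occur per transaction, so they are not a
			// reliable signal for filtering events early.
		case rowsEvent:
			if cur != nil {
				cur.changes = append(cur.changes, ev.change)
			}
		case xidEvent:
			if cur != nil {
				out = append(out, *cur)
				cur = nil
			}
		}
	}
	return out
}

func main() {
	events := []binlogEvent{
		{kind: gtidEvent, lastCommitted: 1, sequenceNumber: 2},
		{kind: tableMapEvent},
		{kind: rowsEvent, change: "delete"},
		{kind: tableMapEvent},
		{kind: rowsEvent, change: "insert"},
		{kind: xidEvent},
	}
	for _, tx := range groupTransactions(events) {
		fmt.Println(tx.sequenceNumber, tx.lastCommitted, tx.changes) // prints: 2 1 [delete insert]
	}
}
```

The two table map events produce no output of their own; only the row changes end up in the grouped transaction, together with the `last_committed`/`sequence_number` pair that a scheduler needs.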
This reverts commit 2e78f6f.
* Remove error return value since we don't use it.
* Lock the mutex whenever we plan to update the low watermark to avoid a race condition.
* Check for data races in our unit tests.
* Still return an error from ProcessEventsUntilDrained but actually check it in our code.
* Make coordinator_test.go check the err from ProcessEventsUntilDrained again.
* Remove unreachable return in ProcessEventsUntilDrained.
…ark (#1531)

* Notify waiting channels on completed transaction, not just the watermark.
* Add checksum validation to coordinator test.
* Use errgroup to perform transactions concurrently in coordinator_test.go.
* Configure concurrency separately from the total number of transactions.
* Run a similar number of txs to the previous test and ignore context.
* Have at least 1 child in a transaction.
* Notify waiting channels for the current sequence number.
Despite promising performance results in testing, we stopped developing this branch in Nov 2024 after running into intermittent data inconsistency problems in internal replica tests. I believe I've tracked down the source of this issue. Below is the investigation for anyone interested.

Investigation

The data inconsistency appeared intermittently on several different ghost testing replicas running the MTR version. The test table looks like this:

```sql
CREATE TABLE `sbtest1` (
  `id` int NOT NULL AUTO_INCREMENT,
  `k` int NOT NULL DEFAULT '0',
  `c` char(120) NOT NULL DEFAULT '',
  `pad` char(60) NOT NULL DEFAULT '',
  PRIMARY KEY (`id`),
  KEY `k_1` (`k`)
);
```

The test result is a checksum mismatch (usually on only one row). In this case, the mismatched row had `id=5025`, written by these statements:

```sql
-- 2025-04-12T03:28:44.209273Z 25 Execute
BEGIN;
DELETE FROM sbtest1 WHERE id=5025;
INSERT INTO sbtest1 (id, k, c, pad) VALUES (5025, 5046, '55585975399-51936995975-90609908571-88981758242-41639509045-49015163211-63909390173-09873895014-17528416149-59787710722', '90699347551-90936038435-69760642136-45340328341-67205199431');
COMMIT;
-- 2025-04-12T03:28:44.209695Z 28 Execute
UPDATE sbtest1 SET k=k+1 WHERE id=5025;
```

And the corresponding binlog events on the replica:
So, the correct value is `k=5047` (the inserted `k=5046`, incremented once by the later `UPDATE`). We can look at the dependency (sub)graph for the original transactions:

```mermaid
graph LR;
    89058 --> 89053;
    89053 --> 89050;
    89065 --> 89062;
    89062 --> 89053;
```
This means once transaction …

Comparing this with what the MySQL replication applier coordinator does (`sql/rpl_rli_pdb.cc`), I realized that a transaction should be scheduled if and only if …

Fix

In our Coordinator, the culprit is lines 482 to 484 of `gh-ost/go/logic/coordinator.go` at commit ffef446.
In the example, it allowed 89065 to be applied after 89062 completed but before 89058 completed.
After removing these lines, the sysbench localtest consistently passes.
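To make the scheduling problem concrete, here is a minimal Go sketch of a low-watermark check modeled on MySQL's LOGICAL_CLOCK rule: a transaction may start only once every transaction with a sequence number up to its `last_committed` has completed. The `watermark` type and its methods are hypothetical and are not gh-ost's Coordinator; `dependencyDone` only illustrates the premature scheduling described above, not the lines that were removed. The numbers assume that the edge `89065 --> 89062` in the graph is 89065's `last_committed` and that 89059 to 89061 have already completed.

```go
package main

import (
	"fmt"
	"sync"
)

// watermark is a hypothetical, simplified tracker (not gh-ost's Coordinator).
// All reads and updates happen under the mutex.
type watermark struct {
	mu        sync.Mutex
	low       int64          // every sequence number <= low has completed
	completed map[int64]bool // completed sequence numbers above low
}

func newWatermark(low int64) *watermark {
	return &watermark{low: low, completed: make(map[int64]bool)}
}

// markCompleted records a finished transaction and advances the low
// watermark across any contiguous run of completed sequence numbers.
func (w *watermark) markCompleted(seq int64) {
	w.mu.Lock()
	defer w.mu.Unlock()
	if seq <= w.low {
		return
	}
	w.completed[seq] = true
	for w.completed[w.low+1] {
		delete(w.completed, w.low+1)
		w.low++
	}
}

// canSchedule applies the LOGICAL_CLOCK rule: wait until every transaction
// with sequence_number <= lastCommitted has completed.
func (w *watermark) canSchedule(lastCommitted int64) bool {
	w.mu.Lock()
	defer w.mu.Unlock()
	return lastCommitted <= w.low
}

// dependencyDone is a hypothetical, too-permissive check that only looks at
// the single transaction whose sequence number equals lastCommitted.
func (w *watermark) dependencyDone(lastCommitted int64) bool {
	w.mu.Lock()
	defer w.mu.Unlock()
	return lastCommitted <= w.low || w.completed[lastCommitted]
}

func main() {
	// Assumption: everything up to 89057 is done and 89058 is still running.
	w := newWatermark(89057)
	// Assumption: 89059 through 89062 have completed.
	for seq := int64(89059); seq <= 89062; seq++ {
		w.markCompleted(seq)
	}
	fmt.Println(w.dependencyDone(89062)) // true: 89065 would start too early
	fmt.Println(w.canSchedule(89062))    // false: 89058 has not completed
	w.markCompleted(89058)
	fmt.Println(w.canSchedule(89062)) // true: the watermark has reached 89062
}
```

The design point is that `last_committed` is a lower bound on what a transaction may conflict with, not a pointer to one parent, so the safe check compares it against a contiguous low watermark rather than against a single completed transaction.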
Description
This PR introduces multi-threaded replication for applying DML queries to the ghost table. The goal is to be able to migrate tables with a high rate of DML queries (e.g. >5k rows/s). Currently, gh-ost lags behind in these situations, taking a very long time to complete or never completing at all.
Similar to MySQL replication threads, gh-ost will stream binlog events from the source and group them into transactions. It then submits the transactions to a pool of workers that apply them concurrently on the ghost table. We ensure that dependent transactions are applied in a consistent order (equivalent to MySQL multi-threaded replication with `replica_parallel_type=LOGICAL_CLOCK` and `replica_preserve_commit_order=0`). With `WRITESET` dependency tracking enabled on the source, this allows a high degree of parallelism in the transaction applier.
Changes
TODO
Performance tests
TODO
`script/cibuild` returns with no formatting errors, build errors or unit test errors.