From 115702716146c07e824c99d5d77360a02c558d11 Mon Sep 17 00:00:00 2001
From: Shlomi Noach
Date: Sun, 8 Jan 2017 09:46:12 +0200
Subject: [PATCH] documenting --dml-batch-size

---
 RELEASE_VERSION           |  2 +-
 doc/command-line-flags.md | 11 +++++++++++
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/RELEASE_VERSION b/RELEASE_VERSION
index 15245f3..ffcbe71 100644
--- a/RELEASE_VERSION
+++ b/RELEASE_VERSION
@@ -1 +1 @@
-1.0.32
+1.0.34
diff --git a/doc/command-line-flags.md b/doc/command-line-flags.md
index d7bc97b..83bfe05 100644
--- a/doc/command-line-flags.md
+++ b/doc/command-line-flags.md
@@ -65,6 +65,17 @@ At this time (10-2016) `gh-ost` does not support foreign keys on migrated tables
 
 See also: [`skip-foreign-key-checks`](#skip-foreign-key-checks)
 
+
+### dml-batch-size
+
+`gh-ost` reads events from the binary log and applies them onto the _ghost_ table. It does so in batched writes: grouping multiple events to apply in a single transaction. This gives better write throughput as we don't need to sync the transaction log to disk for each event.
+
+The `--dml-batch-size` flag controls the size of the batched write. Allowed values are `1 - 100`, where `1` means no batching (every event from the binary log is applied onto the _ghost_ table in its own transaction). The default value is `10`.
+
+Why is this behavior configurable? Different workloads have different characteristics. Some workloads have very large writes, such that aggregating even `50` writes into a transaction makes for a significant transaction size. On other workloads the write rate is so high that one just can't afford a hundred more syncs to disk per second. The default value of `10` is a modest compromise that should work well for most workloads. Your mileage may vary.
+
+Noteworthy is that setting `--dml-batch-size` to a higher value _does not_ mean `gh-ost` blocks or waits on writes. The batch size is an upper limit on transaction size, not a minimal one. If `gh-ost` doesn't have "enough" events in the pipe, it does not wait on the binary log; it just writes what it already has. This conveniently suggests that if write load is light enough for `gh-ost` to only see a few events in the binary log at a given time, then it is also light enough for `gh-ost` to apply a fraction of the batch size.
+
 ### exact-rowcount
 
 A `gh-ost` execution need to copy whatever rows you have in your existing table onto the ghost table. This can, and often be, a large number. Exactly what that number is?