# Sub-second replication lag throttling

`gh-ost` can use sub-second replication lag measurements.

At GitHub, small replication lag is crucial, and we like to keep it below `1s` at all times.

`gh-ost` will do sub-second throttling when `--max-lag-millis` is smaller than `1000`, i.e. smaller than `1sec`.
Replication lag is measured on:
- The "inspected" server (the server `gh-ost` connects to; replica is desired but not mandatory)
- The `--throttle-control-replicas` list

In both cases, `gh-ost` uses an internal heartbeat mechanism. It injects heartbeat events into the utility changelog table, then reads those entries on the replicas and compares times. This measurement is on by default and by definition supports sub-second resolution.
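
The comparison described above can be sketched roughly as follows. This is a minimal illustration, not `gh-ost`'s actual code: the helper names `heartbeatLagMillis` and `shouldThrottle` are hypothetical, and the sketch assumes that lag is the difference between the time a heartbeat was written and the time it was read back on a replica, and that throttling kicks in when that lag exceeds `--max-lag-millis`.

```go
package main

import "fmt"

// heartbeatLagMillis is a hypothetical helper: it returns the replication lag
// in milliseconds, given the time the heartbeat was written (as recorded in
// the changelog table) and the time it was read back on the replica.
// Both arguments are Unix timestamps in milliseconds.
func heartbeatLagMillis(writtenUnixMillis, readUnixMillis int64) int64 {
	lag := readUnixMillis - writtenUnixMillis
	if lag < 0 {
		// A negative value can only come from clock skew; treat it as no lag.
		return 0
	}
	return lag
}

// shouldThrottle is a hypothetical helper: it reports whether the measured
// lag exceeds the configured threshold (assumed semantics of --max-lag-millis).
func shouldThrottle(lagMillis, maxLagMillis int64) bool {
	return lagMillis > maxLagMillis
}

func main() {
	// Heartbeat written at t=1000ms, seen on the replica at t=1350ms.
	lag := heartbeatLagMillis(1000, 1350)
	fmt.Println(lag, shouldThrottle(lag, 300)) // prints: 350 true
}
```

Because the timestamps carry millisecond precision, the same comparison works unchanged for thresholds below one second.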

You can explicitly define how frequently `gh-ost` injects heartbeat events via `--heartbeat-interval-millis`. You should set `heartbeat-interval-millis <= max-lag-millis`. It still works if not, but loses granularity and effect.
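
One way to see why the interval matters, assuming lag is read as the age of the newest heartbeat visible on a replica: even a fully caught-up replica can then report a lag of up to one heartbeat interval, so an interval larger than the threshold makes the reading too coarse to act on. The sketch below only illustrates that reasoning; `worstCaseIdleLagMillis` and `intervalFitsMaxLag` are hypothetical names, not part of `gh-ost`.

```go
package main

import "fmt"

// worstCaseIdleLagMillis is a hypothetical helper. Under the stated
// assumption (lag = age of the newest heartbeat seen on the replica), a
// replica with no real lag can still report up to one full interval.
func worstCaseIdleLagMillis(heartbeatIntervalMillis int64) int64 {
	return heartbeatIntervalMillis
}

// intervalFitsMaxLag is a hypothetical helper that checks the recommended
// relation heartbeat-interval-millis <= max-lag-millis.
func intervalFitsMaxLag(heartbeatIntervalMillis, maxLagMillis int64) bool {
	return worstCaseIdleLagMillis(heartbeatIntervalMillis) <= maxLagMillis
}

func main() {
	// Example values only: a 100ms interval against a 500ms threshold,
	// then a 1000ms interval against the same threshold.
	fmt.Println(intervalFitsMaxLag(100, 500))
	fmt.Println(intervalFitsMaxLag(1000, 500))
}
```

The first call reports that the interval satisfies the recommendation; the second does not, since a heartbeat every second cannot resolve a half-second threshold.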

In earlier versions, the `--throttle-control-replicas` list was subject to `1` second resolution or to third-party heartbeat injections such as `pt-heartbeat`. This is no longer the case. The argument `--replication-lag-query` has been deprecated and is no longer needed.

Our production migrations use sub-second lag throttling and are able to keep our entire fleet of replicas well below `1sec` lag. We use `--heartbeat-interval-millis=100` on our production migrations, with a `--max-lag-millis` value between `300` and `500`.