Merge branch 'master' into reduce-minimum-max-lag

This commit is contained in:
Shlomi Noach 2016-08-30 09:47:33 +02:00
commit 75b2542f26
21 changed files with 368 additions and 10 deletions

View File

@ -29,6 +29,7 @@ In addition, it offers many [operational perks](doc/perks.md) that make it safer
- Dynamic control: you can [interactively](doc/interactive-commands.md) reconfigure `gh-ost`, even as migration still runs. You may forcibly initiate throttling.
- Auditing: you may query `gh-ost` for status. `gh-ost` listens on unix socket or TCP.
- Control over cut-over phase: `gh-ost` can be instructed to postpone what is probably the most critical step: the swap of tables, until such time that you're comfortably available. No need to worry about ETA being outside office hours.
- External [hooks](doc/hooks.md) can couple `gh-ost` with your particular environment.
Please refer to the [docs](doc) for more information. No, really, read the [docs](doc).

View File

@ -2,7 +2,7 @@
#
#
RELEASE_VERSION="1.0.14"
RELEASE_VERSION="1.0.15"
function build {
osname=$1

76
doc/hooks.md Normal file
View File

@ -0,0 +1,76 @@
# Hooks
`gh-ost` supports _hooks_: external processes which `gh-ost` executes at particular points of interest.
Use cases include:
- You wish to be notified by mail when a migration completes/fails
- You wish to be notified when `gh-ost` postpones cut-over (at your demand), thus ready to complete (at your leisure)
- RDS users who wish to `--test-on-replica`, but who cannot have `gh-ost` issue a `STOP SLAVE`, would use a hook to command RDS to stop replication
- Send a status message to your chatops every hour
- Perform cleanup on the _ghost_ table (drop/rename/nibble) once migration completes
- etc.
`gh-ost` defines certain points of interest (event types), and executes hooks at those points.
Notes:
- You may have more than one hook per event type.
- `gh-ost` will invoke relevant hooks _sequentially_ and _synchronously_
- thus, you would generally like the hooks to execute as fast as possible, or otherwise issue tasks in the background
- A hook returning with error code will propagate the error in `gh-ost`. Thus, you are able to force `gh-ost` to fail migration on your conditions.
- Make sure to only return an error code when you do indeed wish to fail the rest of the migration
### Creating hooks
All hooks are expected to reside in a single directory. This directory is indicated by `--hooks-path`. When not provided, no hooks are executed.
`gh-ost` will dynamically search for hooks in said directory. You may add and remove hooks to/from this directory as `gh-ost` makes progress (though likely you don't want to). Hook files are expected to be executable processes.
In an effort to simplify code and to standardize usage, `gh-ost` expects hooks in explicit naming conventions. As an example, the `onStartup` hook expects processes named `gh-ost-on-startup*`. It will match and accept files named:
- `gh-ost-on-startup`
- `gh-ost-on-startup--send-notification-mail`
- `gh-ost-on-startup12345`
- etc.
The full list of supported hooks is best found in code: [hooks.go](https://github.com/github/gh-ost/blob/master/go/logic/hooks.go). Documentation will always be a bit behind. At this time, though, the following are recognized:
- `gh-ost-on-startup`
- `gh-ost-on-validated`
- `gh-ost-on-rowcount-complete`
- `gh-ost-on-before-row-copy`
- `gh-ost-on-status`
- `gh-ost-on-interactive-command`
- `gh-ost-on-row-copy-complete`
- `gh-ost-on-stop-replication`
- `gh-ost-on-begin-postponed`
- `gh-ost-on-before-cut-over`
- `gh-ost-on-success`
- `gh-ost-on-failure`
### Context
`gh-ost` will set environment variables per hook invocation. Hooks are then able to read those variables, indicating schema name, table name, `alter` statement, migrated host name etc. Some variables are available on all hooks, and some are available on relevant hooks.
The following variables are available on all hooks:
- `GH_OST_DATABASE_NAME`
- `GH_OST_TABLE_NAME`
- `GH_OST_GHOST_TABLE_NAME`
- `GH_OST_OLD_TABLE_NAME`
- `GH_OST_DDL`
- `GH_OST_ELAPSED_SECONDS`
- `GH_OST_MIGRATED_HOST`
- `GH_OST_INSPECTED_HOST`
- `GH_OST_EXECUTING_HOST`
- `GH_OST_HOOKS_HINT`
The following variable are available on particular hooks:
- `GH_OST_COMMAND` is only available in `gh-ost-on-interactive-command`
- `GH_OST_STATUS` is only available in `gh-ost-on-status`
### Examples
See [sample hooks](https://github.com/github/gh-ost/tree/master/resources/hooks-sample), as `bash` implementation samples.

View File

@ -80,6 +80,8 @@ type MigrationContext struct {
PostponeCutOverFlagFile string
CutOverLockTimeoutSeconds int64
PanicFlagFile string
HooksPath string
HooksHintMessage string
DropServeSocket bool
ServeSocketFile string
@ -94,6 +96,7 @@ type MigrationContext struct {
InitiallyDropGhostTable bool
CutOverType CutOver
Hostname string
TableEngine string
RowsEstimate int64
RowsDeltaEstimate int64

View File

@ -91,6 +91,9 @@ func main() {
flag.StringVar(&migrationContext.ServeSocketFile, "serve-socket-file", "", "Unix socket file to serve on. Default: auto-determined and advertised upon startup")
flag.Int64Var(&migrationContext.ServeTCPPort, "serve-tcp-port", 0, "TCP port to serve on. Default: disabled")
flag.StringVar(&migrationContext.HooksPath, "hooks-path", "", "directory where hook files are found (default: empty, ie. hooks disabled). Hook files found on this path, and conforming to hook naming conventions will be executed")
flag.StringVar(&migrationContext.HooksHintMessage, "hooks-hint", "", "arbitrary message to be injected to hooks via GH_OST_HOOKS_HINT, for your convenience")
maxLoad := flag.String("max-load", "", "Comma delimited status-name=threshold. e.g: 'Threads_running=100,Threads_connected=500'. When status exceeds threshold, app throttles writes")
criticalLoad := flag.String("critical-load", "", "Comma delimited status-name=threshold, same format as `--max-load`. When status exceeds threshold, app panics and quits")
quiet := flag.Bool("quiet", false, "quiet")
@ -200,6 +203,7 @@ func main() {
migrator := logic.NewMigrator()
err := migrator.Migrate()
if err != nil {
migrator.ExecOnFailureHook()
log.Fatale(err)
}
log.Info("Done")

148
go/logic/hooks.go Normal file
View File

@ -0,0 +1,148 @@
/*
/*
Copyright 2016 GitHub Inc.
See https://github.com/github/gh-ost/blob/master/LICENSE
*/
package logic
import (
"fmt"
"os"
"os/exec"
"path/filepath"
"github.com/github/gh-ost/go/base"
"github.com/outbrain/golib/log"
)
const (
onStartup = "gh-ost-on-startup"
onValidated = "gh-ost-on-validated"
onRowCountComplete = "gh-ost-on-rowcount-complete"
onBeforeRowCopy = "gh-ost-on-before-row-copy"
onRowCopyComplete = "gh-ost-on-row-copy-complete"
onBeginPostponed = "gh-ost-on-begin-postponed"
onBeforeCutOver = "gh-ost-on-before-cut-over"
onInteractiveCommand = "gh-ost-on-interactive-command"
onSuccess = "gh-ost-on-success"
onFailure = "gh-ost-on-failure"
onStatus = "gh-ost-on-status"
onStopReplication = "gh-ost-on-stop-replication"
)
type HooksExecutor struct {
migrationContext *base.MigrationContext
}
func NewHooksExecutor() *HooksExecutor {
return &HooksExecutor{
migrationContext: base.GetMigrationContext(),
}
}
func (this *HooksExecutor) initHooks() error {
return nil
}
func (this *HooksExecutor) applyEnvironmentVairables(extraVariables ...string) []string {
env := os.Environ()
env = append(env, fmt.Sprintf("GH_OST_DATABASE_NAME=%s", this.migrationContext.DatabaseName))
env = append(env, fmt.Sprintf("GH_OST_TABLE_NAME=%s", this.migrationContext.OriginalTableName))
env = append(env, fmt.Sprintf("GH_OST_GHOST_TABLE_NAME=%s", this.migrationContext.GetGhostTableName()))
env = append(env, fmt.Sprintf("GH_OST_OLD_TABLE_NAME=%s", this.migrationContext.GetOldTableName()))
env = append(env, fmt.Sprintf("GH_OST_DDL=%s", this.migrationContext.AlterStatement))
env = append(env, fmt.Sprintf("GH_OST_ELAPSED_SECONDS=%f", this.migrationContext.ElapsedTime().Seconds()))
env = append(env, fmt.Sprintf("GH_OST_MIGRATED_HOST=%s", this.migrationContext.ApplierConnectionConfig.ImpliedKey.Hostname))
env = append(env, fmt.Sprintf("GH_OST_INSPECTED_HOST=%s", this.migrationContext.InspectorConnectionConfig.ImpliedKey.Hostname))
env = append(env, fmt.Sprintf("GH_OST_EXECUTING_HOST=%s", this.migrationContext.Hostname))
env = append(env, fmt.Sprintf("GH_OST_HOOKS_HINT=%s", this.migrationContext.HooksHintMessage))
for _, variable := range extraVariables {
env = append(env, variable)
}
return env
}
// executeHook executes a command, and sets relevant environment variables
// combined output & error are printed to gh-ost's standard error.
func (this *HooksExecutor) executeHook(hook string, extraVariables ...string) error {
cmd := exec.Command(hook)
cmd.Env = this.applyEnvironmentVairables(extraVariables...)
combinedOutput, err := cmd.CombinedOutput()
fmt.Fprintln(os.Stderr, string(combinedOutput))
return log.Errore(err)
}
func (this *HooksExecutor) detectHooks(baseName string) (hooks []string, err error) {
if this.migrationContext.HooksPath == "" {
return hooks, err
}
pattern := fmt.Sprintf("%s/%s*", this.migrationContext.HooksPath, baseName)
hooks, err = filepath.Glob(pattern)
return hooks, err
}
func (this *HooksExecutor) executeHooks(baseName string, extraVariables ...string) error {
hooks, err := this.detectHooks(baseName)
if err != nil {
return err
}
for _, hook := range hooks {
log.Infof("executing %+v hook: %+v", baseName, hook)
if err := this.executeHook(hook, extraVariables...); err != nil {
return err
}
}
return nil
}
func (this *HooksExecutor) onStartup() error {
return this.executeHooks(onStartup)
}
func (this *HooksExecutor) onValidated() error {
return this.executeHooks(onValidated)
}
func (this *HooksExecutor) onRowCountComplete() error {
return this.executeHooks(onRowCountComplete)
}
func (this *HooksExecutor) onBeforeRowCopy() error {
return this.executeHooks(onBeforeRowCopy)
}
func (this *HooksExecutor) onRowCopyComplete() error {
return this.executeHooks(onRowCopyComplete)
}
func (this *HooksExecutor) onBeginPostponed() error {
return this.executeHooks(onBeginPostponed)
}
func (this *HooksExecutor) onBeforeCutOver() error {
return this.executeHooks(onBeforeCutOver)
}
func (this *HooksExecutor) onInteractiveCommand(command string) error {
v := fmt.Sprintf("GH_OST_COMMAND='%s'", command)
return this.executeHooks(onInteractiveCommand, v)
}
func (this *HooksExecutor) onSuccess() error {
return this.executeHooks(onSuccess)
}
func (this *HooksExecutor) onFailure() error {
return this.executeHooks(onFailure)
}
func (this *HooksExecutor) onStatus(statusMessage string) error {
v := fmt.Sprintf("GH_OST_STATUS='%s'", statusMessage)
return this.executeHooks(onStatus, v)
}
func (this *HooksExecutor) onStopReplication() error {
return this.executeHooks(onStopReplication)
}

View File

@ -210,7 +210,7 @@ func (this *Inspector) validateGrants() error {
return nil
}
log.Debugf("Privileges: Super: %t, REPLICATION CLIENT: %t, REPLICATION SLAVE: %t, ALL on *.*: %t, ALL on %s.*: %t", foundSuper, foundReplicationClient, foundReplicationSlave, foundAll, sql.EscapeName(this.migrationContext.DatabaseName), foundDBAll)
return log.Errorf("User has insufficient privileges for migration. Needed: SUPER, REPLICATION SLAVE and ALL on %s.*", sql.EscapeName(this.migrationContext.DatabaseName))
return log.Errorf("User has insufficient privileges for migration. Needed: SUPER|REPLICATION CLIENT, REPLICATION SLAVE and ALL on %s.*", sql.EscapeName(this.migrationContext.DatabaseName))
}
// restartReplication is required so that we are _certain_ the binlog format and

View File

@ -55,8 +55,8 @@ type Migrator struct {
applier *Applier
eventsStreamer *EventsStreamer
server *Server
hooksExecutor *HooksExecutor
migrationContext *base.MigrationContext
hostname string
tablesInPlace chan bool
rowCopyComplete chan bool
@ -109,6 +109,15 @@ func (this *Migrator) acceptSignals() {
}()
}
// initiateHooksExecutor
func (this *Migrator) initiateHooksExecutor() (err error) {
this.hooksExecutor = NewHooksExecutor()
if err := this.hooksExecutor.initHooks(); err != nil {
return err
}
return nil
}
// shouldThrottle performs checks to see whether we should currently be throttling.
// It also checks for critical-load and panic aborts.
func (this *Migrator) shouldThrottle() (result bool, reason string) {
@ -276,7 +285,7 @@ func (this *Migrator) executeAndThrottleOnError(operation func() error) (err err
}
// consumeRowCopyComplete blocks on the rowCopyComplete channel once, and then
// consumers and drops any further incoming events that may be left hanging.
// consumes and drops any further incoming events that may be left hanging.
func (this *Migrator) consumeRowCopyComplete() {
<-this.rowCopyComplete
atomic.StoreInt64(&this.rowCopyCompleteFlag, 1)
@ -369,25 +378,42 @@ func (this *Migrator) countTableRows() (err error) {
log.Debugf("Noop operation; not really counting table rows")
return nil
}
countRowsFunc := func() error {
if err := this.inspector.CountTableRows(); err != nil {
return err
}
if err := this.hooksExecutor.onRowCountComplete(); err != nil {
return err
}
return nil
}
if this.migrationContext.ConcurrentCountTableRows {
go this.inspector.CountTableRows()
log.Infof("As instructed, counting rows in the background; meanwhile I will use an estimated count, and will update it later on")
go countRowsFunc()
// and we ignore errors, because this turns to be a background job
return nil
}
return this.inspector.CountTableRows()
return countRowsFunc()
}
// Migrate executes the complete migration logic. This is *the* major gh-ost function.
func (this *Migrator) Migrate() (err error) {
log.Infof("Migrating %s.%s", sql.EscapeName(this.migrationContext.DatabaseName), sql.EscapeName(this.migrationContext.OriginalTableName))
this.migrationContext.StartTime = time.Now()
if this.hostname, err = os.Hostname(); err != nil {
if this.migrationContext.Hostname, err = os.Hostname(); err != nil {
return err
}
go this.listenOnPanicAbort()
if err := this.initiateHooksExecutor(); err != nil {
return err
}
if err := this.hooksExecutor.onStartup(); err != nil {
return err
}
if err := this.parser.ParseAlterStatement(this.migrationContext.AlterStatement); err != nil {
return err
}
@ -414,6 +440,11 @@ func (this *Migrator) Migrate() (err error) {
if err := this.inspector.InspectOriginalAndGhostTables(); err != nil {
return err
}
// Validation complete! We're good to execute this migration
if err := this.hooksExecutor.onValidated(); err != nil {
return err
}
if err := this.initiateServer(); err != nil {
return err
}
@ -433,6 +464,9 @@ func (this *Migrator) Migrate() (err error) {
return err
}
go this.initiateThrottler()
if err := this.hooksExecutor.onBeforeRowCopy(); err != nil {
return err
}
go this.executeWriteFuncs()
go this.iterateChunks()
this.migrationContext.MarkRowCopyStartTime()
@ -441,8 +475,14 @@ func (this *Migrator) Migrate() (err error) {
log.Debugf("Operating until row copy is complete")
this.consumeRowCopyComplete()
log.Infof("Row copy complete")
if err := this.hooksExecutor.onRowCopyComplete(); err != nil {
return err
}
this.printStatus(ForcePrintStatusRule)
if err := this.hooksExecutor.onBeforeCutOver(); err != nil {
return err
}
if err := this.cutOver(); err != nil {
return err
}
@ -450,10 +490,19 @@ func (this *Migrator) Migrate() (err error) {
if err := this.finalCleanup(); err != nil {
return nil
}
if err := this.hooksExecutor.onSuccess(); err != nil {
return err
}
log.Infof("Done migrating %s.%s", sql.EscapeName(this.migrationContext.DatabaseName), sql.EscapeName(this.migrationContext.OriginalTableName))
return nil
}
// ExecOnFailureHook executes the onFailure hook, and this method is provided as the only external
// hook access point
func (this *Migrator) ExecOnFailureHook() (err error) {
return this.hooksExecutor.onFailure()
}
// cutOver performs the final step of migration, based on migration
// type (on replica? bumpy? safe?)
func (this *Migrator) cutOver() (err error) {
@ -477,8 +526,12 @@ func (this *Migrator) cutOver() (err error) {
}
if base.FileExists(this.migrationContext.PostponeCutOverFlagFile) {
// Throttle file defined and exists!
if atomic.LoadInt64(&this.migrationContext.IsPostponingCutOver) == 0 {
if err := this.hooksExecutor.onBeginPostponed(); err != nil {
return true, err
}
}
atomic.StoreInt64(&this.migrationContext.IsPostponingCutOver, 1)
//log.Debugf("Postponing final table swap as flag file exists: %+v", this.migrationContext.PostponeCutOverFlagFile)
return true, nil
}
return false, nil
@ -492,7 +545,7 @@ func (this *Migrator) cutOver() (err error) {
// the same cut-over phase as the master would use. That means we take locks
// and swap the tables.
// The difference is that we will later swap the tables back.
this.hooksExecutor.onStopReplication()
if this.migrationContext.TestOnReplicaSkipReplicaStop {
log.Warningf("--test-on-replica-skip-replica-stop enabled, we are not stopping replication.")
} else {
@ -678,6 +731,10 @@ func (this *Migrator) onServerCommand(command string, writer *bufio.Writer) (err
throttleHint := "# Note: you may only throttle for as long as your binary logs are not purged\n"
if err := this.hooksExecutor.onInteractiveCommand(command); err != nil {
return err
}
switch command {
case "help":
{
@ -887,7 +944,7 @@ func (this *Migrator) printMigrationStatusHint(writers ...io.Writer) {
fmt.Fprintln(w, fmt.Sprintf("# Migrating %+v; inspecting %+v; executing on %+v",
*this.applier.connectionConfig.ImpliedKey,
*this.inspector.connectionConfig.ImpliedKey,
this.hostname,
this.migrationContext.Hostname,
))
fmt.Fprintln(w, fmt.Sprintf("# Migration started at %+v",
this.migrationContext.StartTime.Format(time.RubyDate),
@ -1043,6 +1100,10 @@ func (this *Migrator) printStatus(rule PrintStatusRule, writers ...io.Writer) {
)
w := io.MultiWriter(writers...)
fmt.Fprintln(w, status)
if elapsedSeconds%60 == 0 {
this.hooksExecutor.onStatus(status)
}
}
// initiateHeartbeatReader listens for heartbeat events. gh-ost implements its own

View File

@ -0,0 +1,5 @@
#!/bin/bash
# Sample hook file for gh-ost-on-before-cut-over
echo "$(date) gh-ost-on-before-cut-over $GH_OST_DATABASE_NAME.$GH_OST_TABLE_NAME" >> /tmp/gh-ost.log

View File

@ -0,0 +1,5 @@
#!/bin/bash
# Sample hook file for gh-ost-on-before-row-copy
echo "$(date) gh-ost-on-before-row-copy $GH_OST_DATABASE_NAME.$GH_OST_TABLE_NAME" >> /tmp/gh-ost.log

View File

@ -0,0 +1,5 @@
#!/bin/bash
# Sample hook file for gh-ost-on-begin-postponed
echo "$(date) gh-ost-on-begin-postponed $GH_OST_DATABASE_NAME.$GH_OST_TABLE_NAME" >> /tmp/gh-ost.log

View File

@ -0,0 +1,5 @@
#!/bin/bash
# Sample hook file for gh-ost-on-failure
echo "$(date) gh-ost-on-failure $GH_OST_DATABASE_NAME.$GH_OST_TABLE_NAME; ghost: $GH_OST_OLD_TABLE_NAME" >> /tmp/gh-ost.log

View File

@ -0,0 +1,5 @@
#!/bin/bash
# Sample hook file for gh-ost-on-interactive-command
echo "$(date) gh-ost-on-interactive-command $GH_OST_COMMAND" >> /tmp/gh-ost.log

View File

@ -0,0 +1,5 @@
#!/bin/bash
# Sample hook file for gh-ost-on-row-copy-complete
echo "$(date) gh-ost-on-row-copy-complete $GH_OST_DATABASE_NAME.$GH_OST_TABLE_NAME" >> /tmp/gh-ost.log

View File

@ -0,0 +1,5 @@
#!/bin/bash
# Sample hook file for gh-ost-on-rowcount-complete
echo "$(date) gh-ost-on-rowcount-complete $GH_OST_DATABASE_NAME.$GH_OST_TABLE_NAME" >> /tmp/gh-ost.log

View File

@ -0,0 +1,5 @@
#!/bin/bash
# Sample hook file for gh-ost-on-startup
echo "$(date) gh-ost-on-startup $GH_OST_DATABASE_NAME.$GH_OST_TABLE_NAME" >> /tmp/gh-ost.log

View File

@ -0,0 +1,5 @@
#!/bin/bash
# Sample hook file for gh-ost-on-status
echo "$(date) gh-ost-on-status; elapsed: ${GH_OST_ELAPSED_SECONDS}; msg: ${GH_OST_STATUS}" >> /tmp/gh-ost.log

View File

@ -0,0 +1,5 @@
#!/bin/bash
# Sample hook file for gh-ost-on-stop-replication
echo "$(date) gh-ost-on-stop-replication $GH_OST_DATABASE_NAME.$GH_OST_TABLE_NAME $GH_OST_MIGRATED_HOST" >> /tmp/gh-ost.log

View File

@ -0,0 +1,5 @@
#!/bin/bash
# Sample hook file for gh-ost-on-success
echo "$(date) gh-ost-on-success $GH_OST_DATABASE_NAME.$GH_OST_TABLE_NAME" >> /tmp/gh-ost.log

View File

@ -0,0 +1,5 @@
#!/bin/bash
# Sample hook file for gh-ost-on-success
echo "$(date) gh-ost-on-success $GH_OST_DATABASE_NAME.$GH_OST_TABLE_NAME -- this message should show on the gh-ost log"

View File

@ -0,0 +1,5 @@
#!/bin/bash
# Sample hook file for gh-ost-on-validated
echo "$(date) gh-ost-on-validated $GH_OST_DATABASE_NAME.$GH_OST_TABLE_NAME" >> /tmp/gh-ost.log