We always need both values, except in a test, so we don't need to lock
twice and risk scheduling in between.
Also, removed the resetting in Done. This copied a mutex, which isn't
allowed. Static analyzers tend to trip over that.
The counter value needs to be aligned to 64 bit in memory for the
atomic functions to work on some platform (such as 32 bit ARM).
The atomic package says in its documentation:
> These functions require great care to be used correctly. Except for
> special, low-level applications, synchronization is better done with
> channels or the facilities of the sync package.
This commit replaces the atomic functions with a simple sync.Mutex, so
we don't have to care about alignment.