killasgroup will ensure that all child processes die even if gunicorn
fails to do it. This isn't truly required because stopwaitsecs and
gunicorn graceful timeout work fine. In any case, if supervisor has to
send SIGKILL, it better send it to entire process group.
Dev dependencies are not installed if developer_mode is not enabled.
When creating a new bench it's not possible to modify this config
upfront, so offer a flag to do it instead.
Note:
- Disabled by default.
- Very few people need this, those who write and run tests locally primarily.
- You mostly should not do this in CI, do `bench setup requirements --dev` explicitly instead.
Problem:
- Prerequisite: long running reqeust is going on.
- `bench restart` is requested.
- supervisord sends sigterm to gunicorn master process.
- gunicorn passes it to all child table and waits for graceful shutdown (30 seconds)
- supervisord has no chill, and sends sigkill to master process in 10 seconds
- gunicorn master proesss is dead, so now child workers will keep running until they complete request which can take a really long time.
- This entire time the sites will be down.
Fix:
- Explicitly encode default graceful_timeout in config - 30 seconds.
- Make supervisor wait 10 more seconds for gunicorn to do its thing, then only send sigkill.
WARNING: Just a POC. You're free to modify your setup to test this out.
Summary:
- After this change bench wont generate config for redis_socketio
instance.
- Instead we set the same port as redis_cache service for use in
socketio. i.e. both will share same instance.
Code changes:
- Config will have this difference:
```diff
- "redis_socketio": "redis://localhost:12000",
- "redis_cache": "redis://localhost:13000",
+ "redis_socketio": "redis://localhost:13000",
+ "redis_cache": "redis://localhost:13000",
```
- supervisord/systemd wont start redis_socketio service
- for development setups: redis_socketio service is removed from procfile.
Why?
- Currently this redis instance is used only to ship messages from python
to nodejs server... which is largely stateless (only pub/sub channels)
- This change reduces 1 running process. Less resources for this ~= more
resources for other running things.
TODO:
- [ ] decicisions - should we even do this? this should be optinal?
- [ ] refactor progressbar in frappe
- [ ] Tests - manual, FC, docker, automated
- [ ] "migration" plan
All dependent deployment projects will likely have to refactor their code to drop this service, shouldn't be too much work.
Example: Frappe cloud, Frappe Docker.
* fix: Give more meaningful context in subproc failures
* fix: Handle supervisor escalation if no exc is raised
* fix: only apply sudo if not already running as sudo