These changes make to possible to run:
tutor mounts add /path/to/my-xblock
The xblock directory with then be auto-magically bind-mounted in the
"openedx" image at build time, and the lms*/cms* containers at run time.
This makes it effectively possible to work as a developer on
edx-platform requirements.
We take the opportunity to move some openedx-specific code to a
dedicated module.
Close https://github.com/openedx/wg-developer-experience/issues/177
MySQL 8 defaults to a binlog expiry period of 2592000 seconds
(30 days), which for Tutor/Open edX purposes can be considered
excessive.
On the one hand, it is unlikely that a MySQL server configured for
Tutor uses MySQL replication at all (considering that up until Tutor
15 and MySQL 5.7, the binlog was disabled by default, rendering
replication impossible). Even if it does, a replica lagging more than
two days behind the primary server would be unacceptable.
Likewise, it is unlikely that an Open edX database is backed up less
than once a day, thus is is unlikely that Open edX admins would
benefit from the ability to do point-in-time restore over a 30-day
period.
On the other hand, having a 30-day binlog expiry period can
considerably increase the storage space requirements for the MySQL
container, particularly on busy Open edX platforms. When left
unchecked, this can even cause the MySQL container to run into "No
space left on device" situations, disabling the MySQL database
altogether. Thus, the MySQL default settings are likely to be a net
disadvantage for Open edX admins.
Finally, all of the above considerations apply only if the Open edX
administrator has chosen to run their own MySQL and not opted for a
DBaaS solution like AWS RDS.
Thus, it should be acceptable to run with a reduced binlog expiry
period of 3 days (rather than 30) by default.
Therefore, inject the --binlog-expire-logs-seconds=259200 argument
into the Tutor-generated command to start mysqld.
Reference:
https://dev.mysql.com/doc/refman/8.0/en/replication-options-binary-log.html#sysvar_binlog_expire_logs_seconds
This fix is for a rather serious issue that affects users who upgrade
from Olive to Palm. The client mysql charset and collation was
incorrectly set to utf8mb4, while the server stil runs utf8mb3. Only
users who run the mysql container are affected.
To resolve this issue, we explicitely configure the client to use the
utf8mb3 charset/collation.
Important note: users who have somehow managed to upgrade from olive to
Palm before may find themselves in an undefined state. They might have
to fix their mysql data manually. Same thing for users who launched Palm
from scratch; although, according to my preliinary tests, they should be
able to downgrade their connection from utf8mb4 to utf8mb3 without
issue.
In addition, we upgrade to mysql 8.1.0. Among many other fixes, this
avoids a server restart after the upgrade:
> An in-place upgrade from MySQL 5.7 to MySQL 8.0, without a server
> restart, could result in unexpected errors when executing queries on
> tables. This fix eliminates the need to restart the server between the
> upgrade and queries. (Bug #35410528)
https://dev.mysql.com/doc/relnotes/mysql/8.0/en/news-8-0-34.html
See also the 8.1.0 release notes:
https://dev.mysql.com/doc/relnotes/mysql/8.1/en/news-8-1-0.html
Close #887.
This is an important change, where we get remove the previous `--mount`
option, and instead opt for persistent bind-mounts.
Persistent bind mounts have several advantages:
- They make it easier to remember which folders need to be bind-mounted.
- Code is *much* less clunky, as we no longer need to generate temporary
docker-compose files.
- They allow us to bind-mount host directories *at build time* using the
buildx `--build-context` option.
- The transition from development to production becomes much easier, as
images will automatically be built using the host repo.
The only drawback is that persistent bind-mounts are slightly less
portable: when a config.yml file is moved to a different folder, many
things will break if the repo is not checked out in the same path.
For instance, this is how to start working on a local fork of
edx-platform:
tutor config save --append MOUNTS=/path/to/edx-platform
And that's all there is to it. No, this fork will be used whenever we
run:
tutor images build openedx
tutor local start
tutor dev start
This change is made possible by huge improvements in the build time
performance. These improvements make it convenient to re-build Docker
images often.
Related issues:
https://github.com/openedx/wg-developer-experience/issues/71https://github.com/openedx/wg-developer-experience/issues/66https://github.com/openedx/wg-developer-experience/issues/166
It was useless to create a *-permissions job for every application.
Instead, we create a single "permissions" service. It can be extended
via the "docker-compose-permissions-command" patch.
- 💥 [Feature] Upgrade to Nutmeg: (by @regisb)
- 💥 [Feature] Persistent grades are now enabled by default.
- [Bugfix] Remove edX references from bulk emails ([issue](https://github.com/openedx/build-test-release-wg/issues/100)).
- [Improvement] For Tutor Nightly (and only Nightly), official plugins are now installed from their nightly branches on GitHub instead of a version range on PyPI. This will allow Nightly users to install all official plugins by running ``pip install -e ".[full]"``.
- [Bugfix] Start MongoDB when running migrations, because a new data migration fails if MongoDB is not running
The `--mount` option is available both with `tutor local`
and `tutor dev` commands. It allows users to easily bind-mount containers from
the host to containers. Yes, I know, we already provide that possibility with
the `bindmount` command and the `--volume=/path/` option. But these suffer from
the following drawbacks:
- They are difficult to understand.
- The "bindmount" command name does not make much sense.
- It's not convenient to mount an arbitrary folder from the host to multiple
containers, such as the many lms/cms containers (web apps, celery workers and
job runners).
To address this situation, we now recommend to make use of --mount:
1. `--mount=service1[,service2,...]:/host/path:/container/path`: manually mount
`/host/path` to `/container/path` in container "service1" (and "service2").
2. `--mount=/host/path`: use the new v1 plugin API to discover plugins that
will detect this option and select the right containers in which to bind-mount
volumes. This is really nifty...
Close https://github.com/overhangio/2u-tutor-adoption/issues/43
The entrypoint in the "openedx" Docker image was used only to define the
DJANGO_SETTINGS_MODULE environment variable, based on SERVICE_VARIANT and
SETTINGS. We ditch SETTINGS in favour of defining explicitely
DJANGO_SETTINGS_MODULE.
The problem with the Docker entrypoint is that it was bypassed whenever we ran
`tutor local exec` or `tutor k8s exec`. By removing it we make it simpler for
end-users to run manage.py commands in kubernetes.
Previously, it was possible to override settings by defining the
TUTOR_EDX_PLATFORM_SETTINGS environment variable. But let's face it:
- It was not very well supported.
- It was poorly explained.
- It was not very useful.
- It causes unnecessary code complexity.
For these reasons, we drop that feature.
In theory, we can assign ownership of mysql data to just any user. But in
Lilac, mysql was running with user 999. When upgrading to Maple, on Kubernetes,
the fsGroupChangePolicy was causing a change of the data *group* (to 1000) but
not of the user. This was causing a crash with the following error:
[ERROR] InnoDB: The error means mysqld does not have the access rights to the directory.
Forum is an optional feature, and as such it deserves its own plugin. Starting
from Maple, users will be able to install the forum from
https://github.com/overhangio/tutor-forum/
Close #450.
With this change, containers are no longer run as "root" but as unprivileged
users. This is necessary in some environments, notably some Kubernetes
clusters.
To make this possible, we need to manually fix bind-mounted volumes in
docker-compose. This is pretty much equivalent to the behaviour in Kubernetes,
where permissions are fixed at runtime if the volume owner is incorrect. Thus,
we have a consistent behaviour between docker-compose and Kubernetes.
We achieve this by bind-mounting some repos inside "*-permissions" services.
These services run as root user on docker-compose and will fix the required
permissions, as per build/permissions/setowner.sh These services simply do not
run on Kubernetes, where we don't rely on bind-mounted volumes. There, we make
use of Kubernete's built-in volume ownership feature.
With this change, we get rid of the "openedx-dev" Docker image, in the sense
that it no longer has its own Dockerfile. Instead, the dev image is now simply
a different target in the multi-layer openedx Docker image. This makes it much
faster to build the openedx-dev image.
Because we declare the APP_USER_ID in the dev/docker-compose.yml file, we need
to pass the user ID from the host there. The only way to achieve that is with a
tutor config variable. The downside of this approach is that the
dev/docker-compose.yml file is no longer portable from one machine to the next.
We consider that this is not such a big issue, as it affects the development
environment only.
We take this opportunity to replace the base image of the "forum" image. There
is now no need to re-install ruby inside the image. The total image size is
only decreased by 10%, but re-building the image is faster.
In order to run the smtp service as non-root, we switch from namshi/smtp to
devture/exim-relay. This change should be backward-compatible.
Note that the nginx container remains privileged. We could switch to
nginxinc/nginx-unprivileged, but it's probably not worth the effort, as we are
considering to get rid of the nginx container altogether.
Close #323.
In config.yml the new value FORUM_MONGO_DB_DATABASE was added with `cs_comments_service` as default value.
In docker-entrypoint.sh of forum I changed the hardcoded `cs_commecnts_service` with the new config value.
Multiple .yml files changed to handle the new config value.
lms-worker was configured to run CMS tasks instead of LMS tasks. I'm not
sure what tasks were being dismissed, and what is the actual production
impact.
Redis data was not actually persisted, because the redis configuration file was
not mounted from the right location. In order to mount redis data in a
host-mounted directory, the working directory has to be properly set.
The problem was occurring both with docker-compose and Kubernetes.
Close #404.
First, allow using custom Django settings on a development
environment (as documented but not implemented), setting it to the
correct value of `tutor.development`. Prior to this, `tutor dev
runserver lms` would default to `tutor.production` when on a custom edX
branch.
Second, fix the documentation so the correct environment variable is
described, at the same time removing an option that doesn't seem to work.
See discussion: https://discuss.overhang.io/t/koa-dev-lms-doesnt-find-static-content/1250
- 💥[Improvement] Upgrade Open edX to Koa
- 💥 Setting changes:
- The ``ACTIVATE_HTTPS`` setting was renamed to ``ENABLE_HTTPS``.
- Other ``ACTIVATE_*`` variables were all renamed to ``RUN_*``.
- The ``WEB_PROXY`` setting was removed and ``RUN_CADDY`` was added.
- The ``NGINX_HTTPS_PORT`` setting is deprecated.
- Architectural changes:
- Use Caddy as a web proxy for automated SSL/TLS certificate generation:
- Nginx no longer listens to port 443 for https traffic
- The Caddy configuration file comes with a new ``caddyfile`` patch for much simpler SSL/TLS management.
- Configuration files for web proxies are no longer provided.
- Kubernetes deployment no longer requires setting up a custom Ingress resource or custom manager.
- Gunicorn and Whitenoise are replaced by uwsgi: this increases boostrap performance and makes it no longer necessary to mount media folders in the Nginx container.
- Replace memcached and rabbitmq by redis.
- Additional features:
- Make it possible to disable all plugins at once with ``plugins disable all``.
- Add ``tutor k8s wait`` command to wait for a pod to become ready
- Faster, more reliable static assets with local memory caching
- Deprecation: proxy files for Apache and Nginx are no longer provided out of the box.
- Removed plugin `{{ patch (...) }}` statements:
- "https-create", "k8s-ingress-rules", "k8s-ingress-tls-hosts": these are no longer necessary. Instead, declare your app in the "caddyfile" patch.
- "local-docker-compose-nginx-volumes": this patch was primarily used to serve media assets. The recommended is now to serve assets with uwsgi.
Previously, it was not possible to override the docker registry for just
one or a few services. Setting the DOCKER_REGISTRY configuration
parameter would apply to all images. This was inconvenient. To resolve
this, we include the docker registry value in the DOCKER_IMAGE_*
configuration parameters. This allows users to override the docker
registry individually by defining the DOCKER_IMAGE_SERVICENAME
configuration parameter.
See https://discuss.overhang.io/t/kubernetes-ci-cd-pipeline/765/3
In production, the ALTERNATE_WORKER_QUEUES setting is overridden by ""
(empty string). This might prevent LMS from sending tasks to the CMS. We
have not seen this issue emerge yet, but better be safe than sorry.
We must be careful not to process the tasks from the CMS, just like for
the CMS worker which does not process the tasks from the LMS.
Half of the tasks from edx.lms.core.default celery queue were being
processed by the CMS worker. Unfortunately, this CMS worker crashes on
some of those tasks. For instance, activation emails complain of a
missing "django_markup" template tag library because "xss_utils" is not
part of the installed app in the CMS.
The problem is that we need this edx.lms.core.default queue to be part
of the CELERY_QUEUES in the cms in order to send tasks from the CMS to
the LMS. The trick to resolve this situation is to ask the CMS celery
worker to not process the tasks from this queue.
To debug this issue, run in the LMS:
from student.tasks import send_activation_email
send_activation_email("{}")
Then watch the logs of the lms and cms workers. If the CMS workers picks
up this task (50% of the time prior to this change) then we have an
issue.
See:
https://discuss.overhang.io/t/reset-password-email-sent-but-activation-email-dont/690
Here, we upgrade the Open edX platform from Ironwood to Juniper. This
upgrade does not come with many feature changes, but there are many
technical improvements under the hood:
- Upgrade from Python 2.7 to 3.5
- Upgrade from Mongodb v3.2 to v3.6
- Upgrade Ruby to 2.5.7
We took the opportunity to completely rething the way locally running
platforms should be accessed for testing purposes. It is no longer
possible to access a running platform from http://localhost and
http://studio.localhost. Instead, users should access
http://local.overhang.io and https://studio.local.overhang.io. This
drastically simplifies internal communication between Docker containers.
To upgrade, users should simply run:
tutor local quickstart
For Kubernetes platform, the upgrade process is outlined when running:
tutor k8s upgrade --from=ironwood
Running jobs was previously done with "exec". This was because it
allowed us to avoid copying too much container specification information
from the docker-compose/deployments files to the jobs files. However,
this was limiting:
- In order to run a job, the corresponding container had to be running.
This was particularly painful in Kubernetes, where containers are
crashing as long as migrations are not correctly run.
- Containers in which we need to run jobs needed to be present in the
docker-compose/deployments files. This is unnecessary, for example when
mysql is disabled, or in the case of the certbot container.
Now, we create dedicated jobs files, both for local and k8s deployment.
This introduces a little redundancy, but not too much. Note that
dependent containers are not listed in the docker-compose.jobs.yml file,
so an actual platform is still supposed to be running when we launch the
jobs.
This also introduces a subtle change: now, jobs go through the container
entrypoint prior to running. This is probably a good thing, as it will
avoid forgetting about incorrect environment variables.
In k8s, we find ourselves interacting way too much with the kubectl
utility. Parsing output from the CLI is a pain. So we need to switch to
the native kubernetes client library.
Because we are running a version of elasticsearch older than Methusalem,
the docker environment variables were not properly taken into account.
For instance, the cluster name and "mlockall" settings were incorrect,
as we could see by running:
$ tutor local run lms curl elasticsearch:9200 | grep cluster_name
...
"cluster_name" : "elasticsearch",
$ tutor local run lms curl elasticsearch:9200/_nodes/process?pretty | grep mlock
...
"mlockall" : false
See
https://discuss.overhang.io/t/elastic-container-is-not-being-removed/312/3
for discussion.
This fix also introduces a new tutor configuration setting to adjust the
elasticsearch heap size.
This has an impact on plugin hooks. Plugin hooks that needed to run
inside mysql-client now need to run inside mysql container. This
simplifies the deployment, as we no longer have an empty mysql-client
container sitting around.
When mysql is not enabled (ACTIVATE_MYSQL=False) the mysql container is
simply a mysql client.
Video transcripts uploaded in the CMS were not visible in the LMS. This
was a symptom caused by the fact that the LMS and the CMS do not share
the same MEDIA_ROOT. We initially thought that data uploaded in the CMS
(such as transcripts) was stored in a shared data service, such as
mongodb. It is, in fact, not. This makes it even more important to run
an object storage service like minio for distributed services.
Close #229
This commit introduces many changes:
- a fully functional minio plugin for local installation
- an almost-functional native k8s deployment
- a new way to process configuration, better suited to plugins
There are still many things to do:
- get rid of all the TODOs
- get a fully functional minio plugin for k8s
- add documentation for pluginso
- ...
This is especially useful on Kubernetes. With this change, the forum
container no longer crashes whenever mongodb or elasticsearch are
unavailable. Instead, it just waits for thoses services to be up.
Previously, we could not run forum migrations in Kubernetes because they
relied on exec-ing the migration command in the running container -- and
there was no such container, because the services where not already up.