This introduces quite a few changes to make it easier to run Caddy as a load
balancer in Kubernetes:
- Make it possible to start/stop a selection of resources with ``tutor k8s
start/stop [names...]``.
- Make it easy to deploy an independent LoadBalancer by converting the caddy
service to a NodePort when ``ENABLE_WEB_PROXY=false``.
- Add a ``app.kubernetes.io/component: loadbalancer`` label to the LoadBalancer
service.
- Add ``app.kubernetes.io/name`` labels to all services.
- Preserve the LoadBalancer service in ``tutor k8s stop`` commands.
- Wait for the caddy deployment to be ready before running initialisation jobs.
Close #532.
Forum is an optional feature, and as such it deserves its own plugin. Starting
from Maple, users will be able to install the forum from
https://github.com/overhangio/tutor-forum/
Close #450.
With this change, containers are no longer run as "root" but as unprivileged
users. This is necessary in some environments, notably some Kubernetes
clusters.
To make this possible, we need to manually fix bind-mounted volumes in
docker-compose. This is pretty much equivalent to the behaviour in Kubernetes,
where permissions are fixed at runtime if the volume owner is incorrect. Thus,
we have a consistent behaviour between docker-compose and Kubernetes.
We achieve this by bind-mounting some repos inside "*-permissions" services.
These services run as root user on docker-compose and will fix the required
permissions, as per build/permissions/setowner.sh These services simply do not
run on Kubernetes, where we don't rely on bind-mounted volumes. There, we make
use of Kubernete's built-in volume ownership feature.
With this change, we get rid of the "openedx-dev" Docker image, in the sense
that it no longer has its own Dockerfile. Instead, the dev image is now simply
a different target in the multi-layer openedx Docker image. This makes it much
faster to build the openedx-dev image.
Because we declare the APP_USER_ID in the dev/docker-compose.yml file, we need
to pass the user ID from the host there. The only way to achieve that is with a
tutor config variable. The downside of this approach is that the
dev/docker-compose.yml file is no longer portable from one machine to the next.
We consider that this is not such a big issue, as it affects the development
environment only.
We take this opportunity to replace the base image of the "forum" image. There
is now no need to re-install ruby inside the image. The total image size is
only decreased by 10%, but re-building the image is faster.
In order to run the smtp service as non-root, we switch from namshi/smtp to
devture/exim-relay. This change should be backward-compatible.
Note that the nginx container remains privileged. We could switch to
nginxinc/nginx-unprivileged, but it's probably not worth the effort, as we are
considering to get rid of the nginx container altogether.
Close #323.
Added OPENEDX_EXTRA_PIP_REQUIREMENTS setting, which allows to specify
extra pip packages that should be installed.
Moved "openedx-scorm-xblock" package from Dockerfile to the new setting
in the config.yml.
Limits the memory chek to the 'local quickstart' command, makes error
handling more accurate and adds warning messages for some conditions.
Also adds a mention of this in troubleshooting.rst.
In conversations with edX, we learned that the name "edge" had negative
undertones for historical reasons. Thus, we switch to "nightly", which means
pretty much the same thing.
Here, we make it possible to automatically append a suffix to the version and app
name (in the sense of appdirs). This guarantees that a tutor edge project will
not accidentally override another community release.
In addition, we take the opportunity to document the tutor versioning format.
(I've been meaning to do that for a long time)
This ensures that any warning generated from compiling the docs is treated as
an error. Also, building the docs is now one of the steps performed in CI.
<rant>I attempted to actually run Tutor with Podman and I was sorely disappointed.
The only reliable source of docs that I found concerning the integration with
docker-compose is this blog post:
https://www.redhat.com/sysadmin/podman-docker-compose
There are no other official docs 😓
1. The instructions given in the blog post don't work out of the box. Launching
the podman service failed altogether on Ubuntu 20.04 and 20.10. It worked on
CentOS 8, but some parameters need to changed, such as the docker socket path.
2. After I got the podman service working, I managed to get an Open edX
platform running with tutor, but with the root user. Then, containers
complained that they could not write data to the bind-mounted volumes. I
attempted to run as a non-root user, and discovered that the podman socket is
only readable by root. This should explain why all commands from that blog post
are prefixed by sudo.
Long story short, I was hoping to update the tutorial. Instead, I'm just moving
it for the sake of better organisation. For the life of me, I do not understand
why some people would want to run Podman instead of Docker. Bad documentation
is an immediate turn-off for me. From my perspective, podman is mostly an
overblown marketina stunt.</rant>
There is too much information in each of the local/k8s/dev docs pages. The
"guides" that are listed in each one of those pages are moved either to "common
tasks" or to a dedicated "tutorials" section. This paves the way for more
comprehensive tutorials, where we describe how to run the latest master
branches of Open edX.
I am well aware that, as they stand, the tutorials are of poor quality and
should be rewritten. This is a task for another day/commit. For now, we only
move the contents to a separate part of the docs.
Also, we should add a "reference" section to the docs, where we add the result
of `tutor <subcommand> --help`.
Previously, the list of domain names to which a theme was assigned had to be
specified manually. Now, the themes are automatically assigned to the LMS and
the CMS, both in development and production modes.
It should be unnecessary to build a custom openedx-dev Docker image. All tests
can run from within the dev Docker image, with a couple additional environment
variables.
The package maintainer of the "tutor" package was kind enough to
transfer ownership of the project to us. This is great, because we no
longer have to use the "openedx" suffix, which is trademarked.
For the time being, we keep maintaining the "tutor-openedx" package
which has a 1-to-1 dependency on the "tutor" package. In the future, we
expect that we will no longer push upgrades to tutor-openedx.
Here we add to the docs a few shameless plugs about Cairn -- because
it's really awesome!
We also add a few improvements to the wording, here and there.
We remove security patches and custom fixes which are now part of koa.3.
We take the opportunity to make it possible to build the openedx Docker image
without relying on a corresponding openedx-i18n repo tag: often, we want to
test whether the image simply builds successfully, and we don't need up-to-date
translations. For those cases, it's now possible to pass the `-a
OPENEDX_I18N_VERSION=oldertag` build argument.
First, allow using custom Django settings on a development
environment (as documented but not implemented), setting it to the
correct value of `tutor.development`. Prior to this, `tutor dev
runserver lms` would default to `tutor.production` when on a custom edX
branch.
Second, fix the documentation so the correct environment variable is
described, at the same time removing an option that doesn't seem to work.
See discussion: https://discuss.overhang.io/t/koa-dev-lms-doesnt-find-static-content/1250
We manage to get unit tests to run in a dedicated openedx-test container. Only
35 tests are failing (out of 17k). I suspect these tests are also failing in
the devstack.
This introduces a new dev/local command:
tutor dev bindmount CONTAINER PATH
And a new volume syntax:
tutor dev run --volume=PATH CONTAINER
This syntax automatically bind-mounts folders from the tutorroot/volumes
directory, which is pretty nifty.
- 💥[Improvement] Upgrade Open edX to Koa
- 💥 Setting changes:
- The ``ACTIVATE_HTTPS`` setting was renamed to ``ENABLE_HTTPS``.
- Other ``ACTIVATE_*`` variables were all renamed to ``RUN_*``.
- The ``WEB_PROXY`` setting was removed and ``RUN_CADDY`` was added.
- The ``NGINX_HTTPS_PORT`` setting is deprecated.
- Architectural changes:
- Use Caddy as a web proxy for automated SSL/TLS certificate generation:
- Nginx no longer listens to port 443 for https traffic
- The Caddy configuration file comes with a new ``caddyfile`` patch for much simpler SSL/TLS management.
- Configuration files for web proxies are no longer provided.
- Kubernetes deployment no longer requires setting up a custom Ingress resource or custom manager.
- Gunicorn and Whitenoise are replaced by uwsgi: this increases boostrap performance and makes it no longer necessary to mount media folders in the Nginx container.
- Replace memcached and rabbitmq by redis.
- Additional features:
- Make it possible to disable all plugins at once with ``plugins disable all``.
- Add ``tutor k8s wait`` command to wait for a pod to become ready
- Faster, more reliable static assets with local memory caching
- Deprecation: proxy files for Apache and Nginx are no longer provided out of the box.
- Removed plugin `{{ patch (...) }}` statements:
- "https-create", "k8s-ingress-rules", "k8s-ingress-tls-hosts": these are no longer necessary. Instead, declare your app in the "caddyfile" patch.
- "local-docker-compose-nginx-volumes": this patch was primarily used to serve media assets. The recommended is now to serve assets with uwsgi.
When I tried running `openedx-assets build` on my `tutor dev lms` machine, I got an error:
```
openedx@1dfe0ece7805:~/edx-platform$ openedx-assets build --env=dev
mkdir_p path('common/static/common/js/vendor')
mkdir_p path('common/static/common/css')
mkdir_p path('common/static/common/css/vendor')
Copying vendor files into static directory
Traceback (most recent call last):
File "/openedx/bin/openedx-assets", line 218, in <module>
main()
File "/openedx/bin/openedx-assets", line 89, in main
args.func(args)
File "/openedx/bin/openedx-assets", line 94, in run_build
run_npm(args)
File "/openedx/bin/openedx-assets", line 117, in run_npm
assets.process_npm_assets()
File "/openedx/edx-platform/pavelib/assets.py", line 643, in process_npm_assets
copy_vendor_library(library)
File "/openedx/edx-platform/pavelib/assets.py", line 614, in copy_vendor_library
raise Exception(u'Missing vendor file {library_path}'.format(library_path=library_path))
Exception: Missing vendor file node_modules/backbone.paginator/lib/backbone.paginator.js
```
As suggested in [this topic](https://discuss.overhang.io/t/issue-with-paver-update-assets/641) I had to run `npm install` to get the packages it tries to copy from. That makes sense, so I think it should be part of the instructions here.
Users appear to run into compatibility issues where podman-compose
does not behave exactly as docker-compose. Add a warning that those
issues exist, and that reports about them are welcome on the
Tutor Discourse.
Reference:
https://discuss.overhang.io/t/tutor-with-podman-not-working/905/2
Previously, it was not possible to override the docker registry for just
one or a few services. Setting the DOCKER_REGISTRY configuration
parameter would apply to all images. This was inconvenient. To resolve
this, we include the docker registry value in the DOCKER_IMAGE_*
configuration parameters. This allows users to override the docker
registry individually by defining the DOCKER_IMAGE_SERVICENAME
configuration parameter.
See https://discuss.overhang.io/t/kubernetes-ci-cd-pipeline/765/3
Here, we upgrade the Open edX platform from Ironwood to Juniper. This
upgrade does not come with many feature changes, but there are many
technical improvements under the hood:
- Upgrade from Python 2.7 to 3.5
- Upgrade from Mongodb v3.2 to v3.6
- Upgrade Ruby to 2.5.7
We took the opportunity to completely rething the way locally running
platforms should be accessed for testing purposes. It is no longer
possible to access a running platform from http://localhost and
http://studio.localhost. Instead, users should access
http://local.overhang.io and https://studio.local.overhang.io. This
drastically simplifies internal communication between Docker containers.
To upgrade, users should simply run:
tutor local quickstart
For Kubernetes platform, the upgrade process is outlined when running:
tutor k8s upgrade --from=ironwood
There are too many different ways to deploy an Ingress resource and to
generate SSL/TLS certificates: it's too much responsibility to make that
decision for the end user.
Running jobs was previously done with "exec". This was because it
allowed us to avoid copying too much container specification information
from the docker-compose/deployments files to the jobs files. However,
this was limiting:
- In order to run a job, the corresponding container had to be running.
This was particularly painful in Kubernetes, where containers are
crashing as long as migrations are not correctly run.
- Containers in which we need to run jobs needed to be present in the
docker-compose/deployments files. This is unnecessary, for example when
mysql is disabled, or in the case of the certbot container.
Now, we create dedicated jobs files, both for local and k8s deployment.
This introduces a little redundancy, but not too much. Note that
dependent containers are not listed in the docker-compose.jobs.yml file,
so an actual platform is still supposed to be running when we launch the
jobs.
This also introduces a subtle change: now, jobs go through the container
entrypoint prior to running. This is probably a good thing, as it will
avoid forgetting about incorrect environment variables.
In k8s, we find ourselves interacting way too much with the kubectl
utility. Parsing output from the CLI is a pain. So we need to switch to
the native kubernetes client library.
The "Certificate" objects are no longer required. As a consequence, the
"k8s-ingress-certificates" has become useless and should be removed from
plugins.
Users can now add custom translation strings to a locale folder at build
time, very much in the same way as custom themes or requirements. This
is quite convenient, although is does require quite a bit of time to
rebuild the docker images.
During an incident at npmjs.org it was extremely difficult to pull
nodejs packages -- so we made it possible to pull from a custom
registry, deployed for instance with Verdaccio (https://verdaccio.org/).
When we were changing unit titles in the CMS, the changes were taking a
long time to be reflected in the LMS. That's because the cache key that
corresponds to the course structure was not being updated. It was the
responsibility of an asynchronous LMS celery worker to update this cache
entry. However, this was impossible in most cases because tasks
triggered in the CMS were only processed by CMS workers. That is, unless
we are using a custom celery router:
https://celery.readthedocs.io/en/latest/userguide/routing.html#routers
This is what edx-platform does in the devstack: certain CMS tasks are
forwarded both to CMS and to LMS workers. This is achieved by defining
the ALTERNATE_WORKER_QUEUES="lms" django setting in the CMS.
Adding this setting to Tutor solves the problem in production. However,
in development mode Open edX runs without workers
(`CELERY_ALWAYS_EAGER=True`). This means that the course structure will
not be automatically updated when running `tutor dev` commands, which is
a shame. The alternative is to define the
"block_structure.invalidate_cache_on_publish" waffle switch. This can be
done from the UI (in /admin/waffle/switch/add/) or by running:
tutor dev run lms ./manage.py lms waffle_switch block_structure.invalidate_cache_on_publish on --create
However, this flag seems to slow down access to the LMS for the first
user who tries to access the course after it has been updated.
Close #302
There are too many patches on top of ironwood.2, and it's not practical
to pull them all one by one. We still want to build on top of a specific
version, and not a branch, so we use a dirty hack to guarantee that the
docker image is properly rebuilt by CI when we change it.
Because we are running a version of elasticsearch older than Methusalem,
the docker environment variables were not properly taken into account.
For instance, the cluster name and "mlockall" settings were incorrect,
as we could see by running:
$ tutor local run lms curl elasticsearch:9200 | grep cluster_name
...
"cluster_name" : "elasticsearch",
$ tutor local run lms curl elasticsearch:9200/_nodes/process?pretty | grep mlock
...
"mlockall" : false
See
https://discuss.overhang.io/t/elastic-container-is-not-being-removed/312/3
for discussion.
This fix also introduces a new tutor configuration setting to adjust the
elasticsearch heap size.
A prior change used the ironwood.1 tag to build the Android app in an
attempt to solve #289. Turns out that this change was unnecessary. So
here we revert to a more recent release of the Android app. Instead of
building from the master branch (which might create suprises) we build
from a fixed release tag.
The source repo and version are customisable via build arguments.
https://podman.io/ is meant to be a drop-in replacement for Docker.
Thus, with some tweaking to the installation environment, it appears
to be perfectly feasible to run Tutor in a Docker-less environment
that only has Podman and podman-compose installed.
Add installation instructions for doing just that.
By de-duplicating the code between dev.py and local.py, we are able to
support more docker-compose run/up/stop options passed from tutor. To do
so, we had to disable some features, such as automatically mounting the
edx-platform repo when the TUTOR_EDX_PLATFORM_PATH environment variable
was defined.
It makes more sense to document this command instead of adding it to the
`local` commands. If need be, in the future we should be able to re-add
it as a plugin.
This command adds a burden on the `local` and `k8s` command. It does not
make sense to provide this command out of the box, and not other
administration commands. Instead, we should better document how to run
regular `manage.py` commands from tutor.
Close #269.