
feat: k8s scale command + tutorial

- Add a `tutor k8s scale lms 11` command
- Create a "Running Open edX at scale" tutorial
Régis Behmo 2021-11-30 18:02:14 +01:00 committed by Régis Behmo
parent d8d0560b9e
commit 62ddc01cdc
4 changed files with 125 additions and 11 deletions


@@ -74,6 +74,8 @@ You may want to pull/push images from/to a custom docker registry. For instance,
(the trailing ``/`` is important)
.. _openedx_configuration:
Open edX customisation
~~~~~~~~~~~~~~~~~~~~~~


@@ -18,6 +18,7 @@ System administration
.. toctree::
   :maxdepth: 2

   tutorials/scale
   tutorials/portainer
   tutorials/podman
   tutorials/proxy

docs/tutorials/scale.rst (new file, +85 lines)

@@ -0,0 +1,85 @@
.. _scale:
Running Open edX at scale
=========================
Does Open edX scale? This is the $10⁶ question when it comes to Tutor and Open edX deployments. The short answer is "yes". The longer answer is also "yes", but the details will very much depend on what we mean by "scaling".
Depending on the context, "scaling" can imply different things:
1. `Vertical scaling <https://en.wikipedia.org/wiki/Scalability#VERTICAL-SCALING>`__: increasing platform capacity by allocating more resources to a single server.
2. `Horizontal scaling <https://en.wikipedia.org/wiki/Scalability#HORIZONTAL-SCALING>`__: the ability to serve an infinitely increasing number of users with consistent performance and linear costs.
3. `High availability (HA) <https://en.wikipedia.org/wiki/High_availability>`__: the ability of the platform to remain fully functional despite one or more components being unavailable.
All of these can be achieved with Tutor and Open edX, but the methods to attain each of them differ greatly. First of all, the range of available solutions depends on the deployment target. Tutor supports installing Open edX on a single server with the :ref:`"local" <local>` deployment target, where Docker containers are orchestrated by docker-compose. On a single server, the server is, by definition, a single point of failure (`SPOF <https://en.wikipedia.org/wiki/Single_point_of_failure>`__), so high availability is out of the question. To achieve high availability, it is necessary to deploy to a cluster of multiple servers. But while docker-compose is a great tool for managing single-server deployments, it is simply inappropriate for deploying to a cluster. Tutor also supports deploying to a Kubernetes cluster (see :ref:`k8s`): this is the recommended solution for deploying Open edX "at scale".
Scaling with a single server
----------------------------
Options are limited when it comes to scaling an Open edX platform deployed on a single server: high availability is out of the question, and the number of users that your platform can serve simultaneously will be limited by the server capacity.
Fortunately, Open edX was designed to run at scale -- most notably at `edX.org <https://edx.org>`__, but also on large national education platforms. Thus, performance will not be limited by the backend software, but only by the hardware.
Increasing web server capacity
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
As the server CPU and memory are increased, the request throughput can be raised by adjusting the number of uWSGI workers (see :ref:`configuration docs <openedx_configuration>`). By default, the "lms" and "cms" containers each spawn 2 uWSGI workers. The number of workers should be increased if you observe rising request latency while CPU usage remains below 100%. To increase the number of workers for the LMS and the CMS, run for example::
    tutor config save \
        --set OPENEDX_LMS_UWSGI_WORKERS=8 \
        --set OPENEDX_CMS_UWSGI_WORKERS=4
    tutor local restart lms cms
The right values will very much depend on your server's available memory and CPU performance, as well as on the maximum number of simultaneous users on your platform. As an example data point, it was reported that a large Open edX platform could serve up to 500k unique users per week on a virtual server with 8 vCPUs and 16 GB of memory.
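If in doubt, the current value of a setting can be inspected with the ``tutor config printvalue`` command; for instance::

    tutor config printvalue OPENEDX_LMS_UWSGI_WORKERS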
Offloading data storage
~~~~~~~~~~~~~~~~~~~~~~~
Aside from web workers, the most resource-intensive services are in the data persistence layer. They are, by decreasing resource usage:
- `Elasticsearch <https://www.elastic.co/elasticsearch/>`__: indexing of course contents and forum topics, mostly for search. Elasticsearch is never a source of truth in Open edX, and the data can thus be trashed and re-created safely.
- `MySQL <https://www.mysql.com>`__: structured, consistent data storage which is the default destination of all data.
- `MongoDB <https://www.mongodb.com>`__: structured storage of course data.
- `Redis <https://redis.io/>`__: caching and asynchronous task management.
- `MinIO <https://min.io>`__: S3-like object storage for user-uploaded files, which is enabled by the `tutor-minio <https://github.com/overhangio/tutor-minio>`__ plugin. It is possible to replace MinIO with direct filesystem storage (the default), but scaling will then become much more difficult down the road.
When attempting to scale a single-server deployment, we recommend starting by offloading some of these stateful data storage components, in that same order of priority. There are multiple benefits:
1. It will free up resources both for the web workers and the data storage components.
2. It is a first step towards horizontal scaling of the web workers.
3. It becomes possible to either install every component as a separate service or rely on 3rd-party SaaS with high availability.
Moving each of the data storage components is a fairly straightforward process, although details vary for every component. For instance, for the MySQL database, start by disabling the locally running MySQL instance::
    tutor config save --set RUN_MYSQL=false
Then, migrate the data located at ``$(tutor config printroot)/data/mysql`` to the new MySQL instance. Configure the Open edX platform to point at the new database::
    tutor config save \
        --set MYSQL_HOST=yourdb.com \
        --set MYSQL_PORT=3306 \
        --set MYSQL_ROOT_USERNAME=root \
        --set MYSQL_ROOT_PASSWORD=p4ssw0rd
The changes will be taken into account the next time the platform is restarted.
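The data migration itself is not handled by Tutor, and the right procedure will depend on your hosting environment. The following is a minimal sketch, assuming that the local MySQL container is still running and that it exposes the standard ``MYSQL_ROOT_PASSWORD`` environment variable::

    # Dump all databases from the local MySQL container (a sketch; adapt to your setup)
    tutor local exec mysql bash -c 'mysqldump --all-databases --user=root --password="$MYSQL_ROOT_PASSWORD"' > dump.sql
    # Load the dump into the new, external MySQL instance
    mysql --host=yourdb.com --port=3306 --user=root --password < dump.sql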
Beware that moving the data components to dedicated servers can create new single points of failure (`SPOF <https://en.wikipedia.org/wiki/Single_point_of_failure>`__). To avoid this situation, each component should be installed as a highly available service (or as highly available SaaS).
Scaling with multiple servers
-----------------------------
Horizontally scaling web services
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
As the number of users of a web platform increases, they put increased pressure on the web workers that respond to their requests. Thus, in most cases, web worker performance is the first bottleneck that system administrators have to face when their service becomes more popular. Initially, any given Kubernetes-based Tutor platform ships with one replica for each deployment. To increase (or reduce) the number of replicas for any given service, run ``tutor k8s scale <name> <number of replicas>``. Behind the scenes, this command will trigger a ``kubectl scale --replicas=...`` command that will seamlessly increase the number of pods for that deployment.
In Open edX, multiple web services are exposed to the outside world. The ones that usually receive the most traffic are, in decreasing order, the LMS, the CMS and the forum (assuming the `tutor-forum <https://github.com/overhangio/tutor-forum>`__ plugin is enabled). As an example, the replicas of all three deployments can be scaled by running::

    tutor k8s scale lms 8
    tutor k8s scale cms 4
    tutor k8s scale forum 2
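Behind the scenes, each of these commands boils down to a plain ``kubectl scale`` call scoped to the platform namespace. Assuming the default ``K8S_NAMESPACE`` value of "openedx", the first command above is roughly equivalent to::

    kubectl scale --namespace openedx --replicas=8 deployment/lms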
Highly-available architecture, autoscaling, ...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
There is only so much that Tutor can do for you, and scaling some components falls beyond its scope. For instance, it is your responsibility to make sure that your Kubernetes cluster has a `highly available control plane <https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/high-availability/>`__ and `topology <https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/ha-topology/>`__. Likewise, it is possible to achieve `autoscaling <https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/>`__, but it is your responsibility to set up latency metrics collection and to configure the scaling policies.
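For instance, basic CPU-based autoscaling can be sketched with the stock horizontal pod autoscaler, assuming that CPU resource requests are set on the "lms" deployment and that a metrics server runs in the cluster::

    kubectl autoscale deployment/lms --namespace openedx --min=4 --max=16 --cpu-percent=80

Scaling on request latency, as mentioned above, additionally requires a custom metrics pipeline.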


@@ -296,17 +296,6 @@ def reboot(context: click.Context) -> None:
    context.invoke(start)
def resource_selector(config: Config, *selectors: str) -> List[str]:
    """
    Convenient utility for filtering only the resources that belong to this project.
    """
    selector = ",".join(
        ["app.kubernetes.io/instance=openedx-" + get_typed(config, "ID", str)]
        + list(selectors)
    )
    return ["--namespace", k8s_namespace(config), "--selector=" + selector]
@click.command(help="Completely delete an existing platform")
@click.option("-y", "--yes", is_flag=True, help="Do not ask for confirmation")
@click.pass_obj
@@ -337,6 +326,24 @@ def init(context: Context, limit: Optional[str]) -> None:
    jobs.initialise(runner, limit_to=limit)
@click.command(help="Scale the number of replicas of a given deployment")
@click.argument("deployment")
@click.argument("replicas", type=int)
@click.pass_obj
def scale(context: Context, deployment: str, replicas: int) -> None:
    config = tutor_config.load(context.root)
    utils.kubectl(
        "scale",
        # Note that we don't use the full resource selector because selectors
        # are not compatible with the deployment/<name> argument.
        *resource_namespace_selector(config),
        "--replicas={}".format(replicas),
        "deployment/{}".format(deployment),
    )
@click.command(help="Create an Open edX user and interactively set their password")
@click.option("--superuser", is_flag=True, help="Make superuser")
@click.option("--staff", is_flag=True, help="Make staff user")
@@ -562,6 +569,24 @@ def wait_for_pod_ready(config: Config, service: str) -> None:
    )
def resource_selector(config: Config, *selectors: str) -> List[str]:
    """
    Convenient utility to filter the resources that belong to this project.
    """
    selector = ",".join(
        ["app.kubernetes.io/instance=openedx-" + get_typed(config, "ID", str)]
        + list(selectors)
    )
    return resource_namespace_selector(config) + ["--selector=" + selector]
def resource_namespace_selector(config: Config) -> List[str]:
    """
    Convenient utility to filter the resources that belong to this project namespace.
    """
    return ["--namespace", k8s_namespace(config)]
def k8s_namespace(config: Config) -> str:
    return get_typed(config, "K8S_NAMESPACE", str)
@@ -572,6 +597,7 @@ k8s.add_command(stop)
k8s.add_command(reboot)
k8s.add_command(delete)
k8s.add_command(init)
k8s.add_command(scale)
k8s.add_command(createuser)
k8s.add_command(importdemocourse)
k8s.add_command(settheme)