We have a Flask application that is served via gunicorn, using the eventlet worker. We're deploying the application in a kubernetes pod, with the idea of scaling the number of pods depending on workload.
The recommended settings for the number of workers in gunicorn is
2 - 4 x $NUM_CPUS. See docs. I've previously deployed services on dedicated physical hardware where such calculations made sense. On a 4 core machine, having 16 workers sounds OK and we eventually bumped it to 32 workers.
Does this calculation still apply in a kubernetes pod using an async worker particularly as:
How should I set the number of gunicorn workers?
-w 1and let kubernetes handle the scaling via pods?
2-4 x $NUM_CPUon the kubernetes nodes. On one pod or multiple?
We decided to go with the 1st option, which is our current approach. Set the number of gunicorn works to 1, and scale horizontally by increasing the number of pods. Otherwise there will be too many moving parts plus we won't be leveraging Kubernetes to its full potential.
For better visibility of the final solution chosen by original author of this question as of 2019 year
Set the number of gunicorn works to 1 (-w 1), and scale horizontally by increasing the number of pods (using Kubernetes HPA).
and the fact it might be not applicable in the close future, taking into account fast growth of workload related features in Kubernetes platform, e.g. some distributions of Kubernetes propose beside HPA, Vertical Pod Autoscaling (VPA) and Multidimensional Pod autoscaling (MPA) too, so I propose to continue this thread in form of community wiki post.