Idle/Orphaned Jupyter processes that block memory resources
This happens on the Nesh login nodes (as reported, e.g., in March by the Nesh admins) and has started to appear on the scalc* machines as well (though the idle Jupyter processes that were especially prevalent on scalc01 were finally shut down by the reboot during yesterday's patch day).
After some reading and experimenting, I think a robust automatic (forced) shutdown of forgotten Jupyter processes could be achieved by combining the Linux `timeout` command with Jupyter's built-in "culling" features.
In a scripted approach it might look like this:

```sh
# in seconds
JUPYTER_HARD_TIMEOUT=300
JUPYTER_SOFT_TIMEOUT=60
JUPYTER_CULL_INTERVAL=10
JUPYTER_CULL_TIMEOUT=30

timeout "$JUPYTER_HARD_TIMEOUT" \
    jupyter lab \
    --LabApp.shutdown_no_activity_timeout=$JUPYTER_SOFT_TIMEOUT \
    --MappingKernelManager.cull_connected=True \
    --MappingKernelManager.cull_interval=$JUPYTER_CULL_INTERVAL \
    --MappingKernelManager.cull_idle_timeout=$JUPYTER_CULL_TIMEOUT \
    --TerminalManager.cull_interval=$JUPYTER_CULL_INTERVAL \
    --TerminalManager.cull_inactive_timeout=$JUPYTER_CULL_TIMEOUT \
    --ip="$(hostname)" --no-browser
```
which implements both a soft and a hard time limit. The built-in shutdown mechanism for idle Jupyter kernels and terminals seems very robust and works as I expect it to (from reading the docs). The hard limit, on the other hand, I found necessary because the `shutdown_no_activity_timeout` option behaved unpredictably (at least for me): sometimes an inactive JupyterLab that was still open in the browser shut down, and sometimes it was still there after several minutes (with no kernels or terminals running and no manual activity during that period). It does seem to always work once the browser tab is closed, though.
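To illustrate what the hard limit buys us (a sketch assuming GNU coreutils `timeout`, not the actual launcher script): `timeout` sends SIGTERM when the time budget expires and reports this with exit status 124, and its `-k` option can escalate to SIGKILL in case the process ignores the SIGTERM.

```sh
# Sketch: `timeout` terminates the command after the given duration and
# exits with status 124 to signal that the time limit was hit.
status=0
timeout 1 sleep 10 || status=$?
echo "exit status: $status"   # 124 -> the hard limit kicked in

# `-k` escalates to SIGKILL if the process ignores SIGTERM, which would
# guard against a JupyterLab that hangs during shutdown, e.g.:
#   timeout -k 30 "$JUPYTER_HARD_TIMEOUT" jupyter lab ...
```

The exit status 124 could also be used in a wrapper script to log that a session was force-terminated rather than shut down cleanly.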
I think it would be helpful to implement this (or a similar mechanism?) in the scripts we maintain here, i.e. in `nesh-linux-cluster-jupyterlab.sh`, `hlrn-goettingen-jupyterlab.sh`, and in the `remote_jupyter_manager.sh` script. It would also be good to mention the problem of idling Jupyter processes that block memory resources in the README.md.
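For the README, it might also be worth showing how to spot the offending processes in the first place. A hypothetical helper (not part of the existing scripts) could filter `ps` output by process age:

```sh
# Hypothetical helper: print Jupyter processes older than a threshold.
# Reads `ps -eo pid=,etimes=,rss=,comm=` style lines on stdin, i.e.
# pid, age in seconds, resident memory in kB, and command name.
filter_old_jupyter() {
  awk -v limit="$1" '$4 ~ /jupyter/ && $2 > limit {
    printf "pid %s: up %ss, %s kB (%s)\n", $1, $2, $3, $4
  }'
}

# Usage on a login node, e.g. with an 18-hour threshold:
#   ps -eo pid=,etimes=,rss=,comm= | filter_old_jupyter 64800
```

Separating the filter from the `ps` call keeps the age threshold and the match pattern in one place, should we ever want to reuse it in the maintained scripts.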
As I have never worked with a JupyterLab session across several work days, I personally would be happy (i.e. not disrupted) with default time limits such as:
```sh
# in seconds
JUPYTER_HARD_TIMEOUT=64800   # 18 hours
JUPYTER_SOFT_TIMEOUT=43200   # 12 hours
JUPYTER_CULL_INTERVAL=300    # 5 minutes (the Jupyter default; could/should this be increased?)
JUPYTER_CULL_TIMEOUT=21600   # 6 hours
```
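If these defaults go into the maintained scripts, shell parameter expansion would let users override them per session without editing the scripts (a sketch; only the variable names above are assumed):

```sh
# Sketch: take each limit from the environment if set, otherwise fall back
# to the proposed defaults (all in seconds). A user could then run e.g.
#   JUPYTER_HARD_TIMEOUT=7200 ./nesh-linux-cluster-jupyterlab.sh
JUPYTER_HARD_TIMEOUT="${JUPYTER_HARD_TIMEOUT:-64800}"   # 18 hours
JUPYTER_SOFT_TIMEOUT="${JUPYTER_SOFT_TIMEOUT:-43200}"   # 12 hours
JUPYTER_CULL_INTERVAL="${JUPYTER_CULL_INTERVAL:-300}"   #  5 minutes
JUPYTER_CULL_TIMEOUT="${JUPYTER_CULL_TIMEOUT:-21600}"   #  6 hours
echo "hard/soft limits: ${JUPYTER_HARD_TIMEOUT}s / ${JUPYTER_SOFT_TIMEOUT}s"
```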
Any thoughts on this? Should we implement this, and if so, how? What would be (more) useful default time limits?
/cc @willi-rath @sebastian-wahl @martin-claus @klaus-getzlaff