# jupyter_on_HPC_setup_guide issues
https://git.geomar.de/python/jupyter_on_HPC_setup_guide/-/issues

## MATLAB kernel support
https://git.geomar.de/python/jupyter_on_HPC_setup_guide/-/issues/7 · Jaard Hauschildt · 2019-11-10

I am using the ROMS/CROCO model, which unfortunately still requires running a lot of MATLAB code for pre- and postprocessing. I would therefore be interested in installing a MATLAB kernel for Jupyter on e.g. Taurus, but I have no idea where to start. If anybody can point me in the right direction, it would be greatly appreciated.

## Should we restructure this repository for easier user access to co-existing/complementing solutions?
https://git.geomar.de/python/jupyter_on_HPC_setup_guide/-/issues/30 · Katharina Höflich · 2019-12-11

For easier navigation of the repository it could help to do a functional grouping of the scripts, and I would therefore propose to re-organize the scripts in a folder structure like this:
```
.
└── jupyter_on_HPC_setup_guide/
    ├── job_scripts/
    │   └── ...
    ├── remote_Jupyter_manager/
    │   └── ...
    └── SSH_tunneling/
        └── ...
```
Another potential issue might be the readme. While "proofreading" for merge request !14, I came to think that it has become rather extensive by now, and I suspect that, especially for newcomers, it might be rather confusing/difficult to quickly access the desired information. Should we do something about this?
@willi-rath @sebastian-wahl

## Add hints on how to debug failing JupyterLab instances on compute nodes
https://git.geomar.de/python/jupyter_on_HPC_setup_guide/-/issues/36 · Katharina Höflich · 2020-06-02

I think this would be helpful. Should we prepare a readme section on this?

## Adapt HLRN Jobscript for Berlin?
https://git.geomar.de/python/jupyter_on_HPC_setup_guide/-/issues/33 · Willi Rath · 2020-06-03

From <https://git.geomar.de/python/jupyter_on_HPC_setup_guide/issues/32#note_18872>:
> The [job script](https://git.geomar.de/python/jupyter_on_HPC_setup_guide/blob/master/job-scripts/hlrn-goettingen-jupyterlab.sh) for Goettingen also works in Berlin. No changes needed, except for queue names in the header.
Should we provide a separate job script for Berlin even if it only involves small adaptations?

## Linux connection problems
https://git.geomar.de/python/jupyter_on_HPC_setup_guide/-/issues/41 · Willi Rath · 2020-07-20

From #39:
> While trying to set up jupyter on scalc01, I stumbled across a similar issue.
>
> I am using a linux machine `OD-NB010LX`. Starting jupyter-lab from the base env on scalc01 works but when I try to set up the tunnel with chromium, I also get
> > This site can’t be reached
> > 127.0.0.1 refused to connect.
> > Try:
> > Checking the connection
> > Checking the proxy and the firewall
> > ERR_CONNECTION_REFUSED
@gabriel-ditzinger: Can you give details on how exactly you start the tunnel?

## [DOC] Make sure users don't run into round-robin intermittent problems
https://git.geomar.de/python/jupyter_on_HPC_setup_guide/-/issues/42 · Willi Rath · 2020-09-08

I've seen people use round-robin host names (like `nesh-fe....` or `glogin.hlrn.de`, etc.) and fail intermittently when trying to connect to services listening on `localhost`, because they end up with the tunnel going to a different host than the one running the service.
We should add this to the documentation / troubleshooting section.

## Idle/Orphaned Jupyter processes that block memory resources
https://git.geomar.de/python/jupyter_on_HPC_setup_guide/-/issues/43 · Katharina Höflich · 2020-09-17

This happens on the Nesh login nodes (as reported e.g. in March by the Nesh admins) and has started to appear also on the `scalc*` machines (the idle Jupyter processes that were especially present on `scalc01` were finally shut down with the reboot during yesterday's patch day, though).
I did a bit of reading/experimenting and think that a robust automatic (forced) shutdown of (forgotten) Jupyter processes could be achieved with a combination of the Linux `timeout` command and the built-in "culling features" of Jupyter.
In a scripted approach, it might look like this:
```shell
# in seconds
JUPYTER_HARD_TIMEOUT=300
JUPYTER_SOFT_TIMEOUT=60
JUPYTER_CULL_INTERVAL=10
JUPYTER_CULL_TIMEOUT=30

timeout $JUPYTER_HARD_TIMEOUT \
    jupyter lab \
        --LabApp.shutdown_no_activity_timeout=$JUPYTER_SOFT_TIMEOUT \
        --MappingKernelManager.cull_connected=True \
        --MappingKernelManager.cull_interval=$JUPYTER_CULL_INTERVAL \
        --MappingKernelManager.cull_idle_timeout=$JUPYTER_CULL_TIMEOUT \
        --TerminalManager.cull_interval=$JUPYTER_CULL_INTERVAL \
        --TerminalManager.cull_inactive_timeout=$JUPYTER_CULL_TIMEOUT \
        --ip=$(hostname) --no-browser
```
which implements both a soft and a hard time limit. The built-in shutdown mechanism for idle Jupyter kernels and terminals seems to be very robust and works as I expect it to (from reading the docs). The hard limit, on the other hand, I found necessary because I experienced unpredictable behaviour (at least for me) with the `shutdown_no_activity_timeout` option: sometimes an inactive JupyterLab open in the browser was shutting down, and sometimes it was still there after several minutes (without any kernels or terminals running, or any manual activity during this period). It seems to always work if the browser tab is closed, though.
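The hard limit provided by `timeout` can be sanity-checked in isolation, independently of Jupyter:

```shell
# GNU timeout kills the wrapped command (SIGTERM) once the duration
# expires and then itself exits with status 124.
timeout 1 sleep 10
echo "exit status: $?"   # expected: "exit status: 124"
```

This is why the wrapper reliably ends the session even when Jupyter's own culling never triggers.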
I think it would be helpful to implement this (or a similar mechanism?) in the scripts we maintain here, i.e. in `nesh-linux-cluster-jupyterlab.sh`, `hlrn-goettingen-jupyterlab.sh`, and in the `remote_jupyter_manager.sh` script. It would also be good to mention the problem of idling Jupyter processes that block memory resources in the README.md.
As I have never worked with a JupyterLab session across several work days, I personally would be happy (i.e. not disrupted) with default time limits such as:
```shell
# in seconds
JUPYTER_HARD_TIMEOUT=64800  # 18 hours
JUPYTER_SOFT_TIMEOUT=43200  # 12 hours
JUPYTER_CULL_INTERVAL=300   # 5 minutes (the Jupyter default; could/should be increased?)
JUPYTER_CULL_TIMEOUT=21600  # 6 hours
```
Any thoughts on this? Should we implement this, and if so, how? What are (more) useful default time limits?
/cc @willi-rath @sebastian-wahl @martin-claus @klaus-getzlaff

## No connection to nesh with Mac OS X Catalina
https://git.geomar.de/python/jupyter_on_HPC_setup_guide/-/issues/44 · Patricia Handmann · 2021-01-28

Even though I am running `./run_chromium_through_ssh_tunnel.sh myname6@nesh-fe1.rz.uni-kiel.de`,
nesh apparently refuses to connect (Chrome error, translated from German):

> This site can’t be reached
> 127.0.0.1 refused to connect.
> Try:
> Checking the connection
> Checking the proxy and the firewall
> ERR_CONNECTION_REFUSED
I hope you can help me.
Cheers,
Patricia

## HLRN: How to use all cpus on the single-tenant queues?
https://git.geomar.de/python/jupyter_on_HPC_setup_guide/-/issues/45 · Willi Rath · 2021-02-16

```shell
salloc --ntasks=1 -p standard96 -A $USER srun --pty bash -i
```
resulted in Dask only seeing 2 cpus but all ~350 GB of memory available on the `standard96` nodes. Do we really need to specify `--cpus-per-task=96` to use them all, or is this a configuration issue at HLRN?

## Make sure to mention that conda commands do not work if .bashrc is not sourced
https://git.geomar.de/python/jupyter_on_HPC_setup_guide/-/issues/46 · Katharina Höflich · 2021-02-19

If `.bashrc` is not sourced, either because there is no corresponding entry in `.bash_profile` or because that file is missing entirely, then `conda` commands do not work as expected, even though `conda` was initialized during installation. Let's explicitly point this out in the troubleshooting section.

## SSH tunnel from Windows
https://git.geomar.de/python/jupyter_on_HPC_setup_guide/-/issues/39 · Annika Reintges · 2022-07-21
My SSH tunnel from Windows is not working. To connect I have now tried MobaXterm, Anaconda, and Windows PowerShell.
I tried the following things:
1) Without the script on MobaXterm:
`ssh -f -D localhost:54321 smomw247@nesh-fe.rz.uni-kiel.de sleep 15 `
...some waiting ... back to prompt ...
`chrome-browser --proxy-server="socks5://localhost:54321"`
no matter whether using "chromium" or "chrome" --> "command not found"
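For what it's worth, neither `chromium` nor `chrome` is a typical command name on Windows; from Git Bash, Chrome usually has to be invoked via its full installation path. A sketch (the path below is the common default install location and is an assumption, not taken from the script):

```shell
# Common default install location of Chrome on Windows, in Git Bash
# path syntax -- an assumption; adjust if Chrome is installed elsewhere.
CHROME="/c/Program Files/Google/Chrome/Application/chrome.exe"

# Usage (sketch):
#   "$CHROME" --proxy-server="socks5://localhost:54321"
```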
2) With the [script](https://git.geomar.de/python/jupyter_on_HPC_setup_guide/-/tree/master#wrapped-in-a-script-on-windows) for Windows on MobaXterm (Git bash is installed)
`./run_chromium_through_ssh_tunnel_WINDOWS.sh smomw247@nesh-fe.rz.uni-kiel.de http://127.0.0.1:8889/?token=29ed9b500bfb49264316558faa963416586781409b70ce92`
Note: I had to set the variable 'browser' manually in this script as MobaXterm adds '/drives' in front of all paths.
A chrome window is opening, but with the message "Die Website ist nicht erreichbar / 127.0.0.1 hat die Verbindung abgelehnt" ("This site can’t be reached / 127.0.0.1 refused to connect").
And in the MobaXterm prompt I get:
> [5384:8604:0626/191730.302:ERROR:cache_util_win.cc(21)] Unable to move the cache: Zugriff verweigert (0x5)
> [5384:8604:0626/191730.303:ERROR:cache_util.cc(139)] Unable to move cache folder C:\Users\areintges\AppData\Local\Google\Chrome\User Data\ShaderCache\GPUCache to C:\Users\areintges\AppData\Local\Google\Chrome\User Data\ShaderCache\old_GPUCache_000
> [5384:8604:0626/191730.303:ERROR:disk_cache.cc(184)] Unable to create cache
> [5384:8604:0626/191730.303:ERROR:shader_disk_cache.cc(606)] Shader Cache Creation failed: -2
> Wird in einer aktuellen Browsersitzung geöffnet. ("Opening in an existing browser session.")
3) Without script in Anaconda
`ssh -f -D localhost:54321 smomw247@nesh-fe.rz.uni-kiel.de sleep 15`
I enter my password, then the session hangs.
4) With the script for Windows on Anaconda and Windows Power Shell:
`bash run_chromium_through_ssh_tunnel_WINDOWS.sh smomw247@nesh-fe.rz.uni-kiel.de http://127.0.0.1:8889/?token=29ed9b500bfb49264316558faa963416586781409b70ce92`
I get this:
> will route traffic through smomw247@nesh-fe.rz.uni-kiel.de
> using port 54321
> run_chromium_through_ssh_tunnel_WINDOWS.sh: ssh: command not found
> run_chromium_through_ssh_tunnel_WINDOWS.sh: tr: command not found
> run_chromium_through_ssh_tunnel_WINDOWS.sh: paste: command not found
> Won't use proxy for any of:
> run_chromium_through_ssh_tunnel_WINDOWS.sh: : command not found
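The `ssh: command not found` / `tr: command not found` lines in attempt 4 suggest the Unix tools the script relies on are not on PATH in that shell. A quick diagnostic sketch (not part of the original script):

```shell
# List which of the tools the tunnel script needs are missing from PATH.
for tool in ssh tr paste; do
    command -v "$tool" > /dev/null || echo "$tool: not found on PATH"
done
```

If any of these are reported missing, the shell being used (e.g. the Anaconda prompt) simply lacks the Unix userland the script assumes.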
Can anybody help?
My approach no. 2 seemed most promising to me.

## chrome doesn't find jupyter notebook http address on the HPC
https://git.geomar.de/python/jupyter_on_HPC_setup_guide/-/issues/49 · Lavinia Patara · 2022-11-22

I am trying to run a jupyter notebook on HLRN with local tunneling on a mac. The jupyter notebook runs without problem on HLRN, and the tunneling script opens chrome. The problem is that when I put in the http address corresponding to the jupyter notebook, an error occurs ("Address not found"). It occurred to me that when I launch the jupyter script on a compute node (using sbatch), I don't know how to check in which partition it is running. Would anyone know what the problem is? Thanks.

## Document how to deal with round-robin host names
https://git.geomar.de/python/jupyter_on_HPC_setup_guide/-/issues/40 · Willi Rath · 2023-05-24

In #39 (and a few times before), we see potential problems with JupyterLab running on a host that has been selected from a round-robin IP like:
```shell
$ host nesh-fe.rz.uni-kiel.de
nesh-fe.rz.uni-kiel.de has address 134.245.3.14
nesh-fe.rz.uni-kiel.de has address 134.245.3.13
nesh-fe.rz.uni-kiel.de has address 134.245.3.15
$ host nesh-fe.rz.uni-kiel.de | awk '{print $4}' | xargs -n1 host
13.3.245.134.in-addr.arpa domain name pointer nesh-fe1.rz.uni-kiel.de.
14.3.245.134.in-addr.arpa domain name pointer nesh-fe2.rz.uni-kiel.de.
15.3.245.134.in-addr.arpa domain name pointer nesh-fe3.rz.uni-kiel.de.
```
As this can lead to difficult-to-debug intermittent problems, we should document and give a recommendation for how to deal with this.
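One possible recommendation, sketched below: resolve the round-robin name once, pin one concrete address, and use that same address both for starting JupyterLab and for the SSH tunnel, so both always end up on the same machine (`pin_host` is a hypothetical helper; host names follow the example above):

```shell
# Resolve a round-robin name and keep only the first A record,
# so that all subsequent connections hit the same frontend.
pin_host () {
    host "$1" | awk '/has address/ {print $4; exit}'
}

# Usage (sketch):
#   FRONTEND=$(pin_host nesh-fe.rz.uni-kiel.de)   # e.g. 134.245.3.14
#   ssh -L 8888:localhost:8888 "$USER@$FRONTEND"
```

Alternatively, users can simply log in to one explicit frontend (e.g. `nesh-fe1.rz.uni-kiel.de` instead of `nesh-fe.rz.uni-kiel.de`) for both steps.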