Climate-Data Analysis on Large Distributed Systems: issues
https://git.geomar.de/willi-rath/climate-data-analysis-on-large-distributed-systems/-/issues

Issue 1: Outline (Willi Rath, 2018-03-21T23:15:09Z)
https://git.geomar.de/willi-rath/climate-data-analysis-on-large-distributed-systems/-/issues/1

### 1. Current status
Splitting computation into independent tasks:
```bash
for year in {1958..2017}; do
    ./calculate_for_year.sh "${year}"
done
```
Or (naively) parallelizing over these independent tasks:
```bash
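msub <<'EOF'
#!/bin/bash
#PBS -l feature=smp2
#PBS -l nodes=1:ppn=40
#PBS -l walltime=12:00:00
# The quoted 'EOF' keeps ${PBS_NP} unexpanded until the job itself runs.
# xargs appends one year per call and runs up to ${PBS_NP} calls at a time.
echo {1958..2017} | tr ' ' '\n' | xargs -n1 -P"${PBS_NP}" ./calculate_for_year.sh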
EOF
```
### 2. Problems with current status
- The user has to take care of parallelization explicitly:
  - choose the "axis" along which to parallelize
  - set up the logic and logistics of the parallelization
- This does not scale well beyond the limits of a CPU socket / node / data center / ...
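
To make the "logistics" point concrete: going beyond a single node with the approach from section 1 would mean splitting the years into chunks by hand and submitting one job per chunk. The following is only a minimal sketch of that bookkeeping. The helper name `split_years` and the choice of four jobs are assumptions for illustration and are not part of the original workflow.

```bash
#!/bin/bash
# Hypothetical helper (not part of the original scripts): print one
# "first last" year range per job, splitting [first_year, last_year]
# into n_jobs contiguous chunks of nearly equal size.
split_years() {
    local first_year=$1 last_year=$2 n_jobs=$3
    local total=$(( last_year - first_year + 1 ))
    local base=$(( total / n_jobs ))
    local extra=$(( total % n_jobs ))
    local start=$first_year size job
    for (( job = 0; job < n_jobs; job++ )); do
        # The first `extra` jobs take one additional year each.
        size=$(( base + (job < extra ? 1 : 0) ))
        # Skip empty chunks when there are more jobs than years.
        (( size == 0 )) && continue
        echo "${start} $(( start + size - 1 ))"
        start=$(( start + size ))
    done
}

# Example: split 1958..2017 into 4 chunks (assumed job count).
split_years 1958 2017 4
```

Each printed range would then need its own job submission, and any change to the number of nodes or to the "axis" of parallelization means redoing this bookkeeping by hand.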
### 3. Towards distributed systems
![image](https://pangeo-data.github.io/img/pangeo_cartoon.png)
### 4. Demo
...