Climate-Data Analysis on Large Distributed Systems

Created Mar 20, 2018 by Willi Rath (@willi-rath, Maintainer)

Outline

1. Current status

Splitting computation into independent tasks:

# Run the years one after another
for year in {1958..2017}; do
    ./calculate_for_year.sh "${year}"
done

Or (naively) parallelizing over these independent tasks:

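msub <<'EOF'
#!/bin/bash
#PBS -l feature=smp2
#PBS -l nodes=1:ppn=40
#PBS -l walltime=12:00:00

# One year per line; xargs runs up to ${PBS_NP} instances of the script at once.
# Quoting 'EOF' defers expansion of ${PBS_NP} until the job itself runs.
echo {1958..2017} | tr ' ' '\n' | xargs -n1 -P"${PBS_NP}" ./calculate_for_year.sh

EOF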
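
The xargs dispatch pattern above can be tried without a cluster. The following is a minimal dry-run sketch, not the actual job: `dispatch_years` is a hypothetical helper name, and `echo` stands in for the real `./calculate_for_year.sh`, so the command lines are only printed.

```shell
#!/bin/bash
# Dry run of the xargs dispatch pattern. 'echo' stands in for the real
# ./calculate_for_year.sh, so nothing is computed.

# dispatch_years FIRST LAST NPROCS (hypothetical helper)
# Print one command line per year, with up to NPROCS stand-ins running at once.
dispatch_years() {
    local first=$1 last=$2 nprocs=$3
    local year=${first}
    while [ "${year}" -le "${last}" ]; do
        echo "${year}"
        year=$(( year + 1 ))
    done | xargs -n1 -P"${nprocs}" echo ./calculate_for_year.sh
}

# Completion order is not guaranteed with -P, hence the sort.
dispatch_years 1958 1961 2 | sort
```

Each year becomes exactly one invocation, and `-P` only caps how many run concurrently on the node where xargs itself runs.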

2. Problems with current status

  • The user has to take care of parallelization explicitly
    • choose the "axis" along which to parallelize
    • set up the logic and logistics of the parallelization
  • Does not scale well beyond the limits of a CPU socket / node / data center / ...
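
To illustrate the bookkeeping this implies, here is a hedged sketch of what going beyond one node by hand might involve: splitting the years into contiguous chunks, one per job. The helper `chunk_range` and the choice of four jobs are assumptions made for illustration and are not part of the original setup.

```shell
#!/bin/bash
# chunk_range FIRST LAST NCHUNKS INDEX (hypothetical helper)
# Print "START END" for chunk INDEX (0-based) when the inclusive range
# FIRST..LAST is split into NCHUNKS contiguous, near-equal chunks.
chunk_range() {
    local first=$1 last=$2 nchunks=$3 index=$4
    local total=$(( last - first + 1 ))
    local base=$(( total / nchunks ))
    local extra=$(( total % nchunks ))   # the first 'extra' chunks get one more year
    local start=$(( first + index * base + (index < extra ? index : extra) ))
    local size=$(( base + (index < extra ? 1 : 0) ))
    echo "${start} $(( start + size - 1 ))"
}

# Example: 60 years spread over 4 hypothetical jobs.
for index in 0 1 2 3; do
    chunk_range 1958 2017 4 "${index}"
done
```

Every job would then need its own submission with its own year range, and changing the axis (say, from years to regions) means rewriting this logic.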

3. Towards distributed systems

image

4. Demo

...

Edited Mar 21, 2018 by Willi Rath