Commit 551976a7 authored by Willi Rath

Modify title and improve slope example

parent d0db36cc
class: middle, left
count: false
# Towards <br>(Climate) Data Analysis on <br>Large _Distributed Systems_
Willi Rath | <wrath@geomar.de>
<p>&nbsp;</p>
.smaller[Thanks: _ Martin&nbsp;Claus, Claus&nbsp;Böning, Torge&nbsp;Martin,
Markus&nbsp;Scheinert, Klaus&nbsp;Getzlaff, Franziska&nbsp;Schwarzkopf,
Christina&nbsp;Roth, Rafael&nbsp;Abel, Arne&nbsp;Biastoch,
Kristin&nbsp;Burmeister, Julia&nbsp;Getzlaff, Carsten&nbsp;Schirnick,
Claas&nbsp;Faber, Kai&nbsp;Grunau, Stefan&nbsp;Jöhnke, Lutz&nbsp;Griesbach,
Thomas&nbsp;Grunert, Knut&nbsp;Günther, Friedrich&nbsp;Althausen, GEOMAR
Data-Management&nbsp;Team, GEOMAR IT&nbsp;Department, … _ ]
<p>&nbsp;</p>
.smaller[.right[
Git repo —
<https://git.geomar.de/willi-rath/climate-data-analysis-on-large-distributed-systems/>]]
---
class: middle
layout: false
## Sum of the First 100 Natural Numbers
> .center[ `1 + 2 + ... + 100 = 5050 = (100 / 2) * (100 + 1)` ]
... what if Gauß was not as smart but had a bunch of friends?
> .center[ `(1 + 2 + ... + 10) + ... + (91 + 92 + ... + 100)` ]
> .center[ `55 + ... + 955` ]
.right[ → Use a _graph-based_ approach. ]
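The split sum on this slide can be sketched in plain Python, with a thread pool standing in for Gauß's friends. This is a minimal illustration that assumes ten blocks of ten consecutive numbers each. The names `blocks` and `partial_sums` are made up for this sketch.

```python
from concurrent.futures import ThreadPoolExecutor

n = 100
closed_form = (n // 2) * (n + 1)  # Gauß: 50 * 101

# Ten "friends", each summing one block of ten consecutive numbers
blocks = [range(start, start + 10) for start in range(1, n + 1, 10)]
with ThreadPoolExecutor(max_workers=10) as friends:
    partial_sums = list(friends.map(sum, blocks))  # [55, 155, ..., 955]

total = sum(partial_sums)  # combine the partial results
print(partial_sums, total)
```

Each friend only needs its own block, and only the ten partial sums have to be collected at the end.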
---
class: top
layout: false
## A Graph-Based Approach
```python
import dask.array as da
# 100 numbers from 1.0 to 100.0, split into chunks of (at most) 13 elements
first_100_natural_numbers = da.linspace(1, 100, 100, chunks=(13, ))
# lazy: builds a task graph of per-chunk sums plus a combining step
sum_of_first_100_natural_numbers = first_100_natural_numbers.sum()
```
<img src="images/dask_gauss_trick_collaborative.svg" width="99%">
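Behind `.sum()`, dask builds a task graph. The toy version below spells such a graph out by hand, loosely following dask's documented graph specification (a dict whose keys map to `(func, *args)` tuples). It is evaluated with a naive sequential `get`. The helpers `chunk_sum`, `combine` and `get` are hypothetical, and the graph dask actually builds for this expression may be shaped differently, for example as a tree reduction.

```python
# Toy task graph in the spirit of dask's graph specification (not dask itself):
# keys map to tasks, a task is a tuple (func, *args), and an argument that is
# itself a key stands for the result of that task.


def chunk_sum(lo, hi):
    """Sum of the integers lo..hi: the work done for one chunk."""
    return sum(range(lo, hi + 1))


def combine(partials):
    """Combine the partial sums of all chunks into the final result."""
    return sum(partials)


graph = {}
chunk_starts = range(1, 101, 13)  # chunks of 13 numbers, as in chunks=(13, )
for i, lo in enumerate(chunk_starts):
    graph[("chunk-sum", i)] = (chunk_sum, lo, min(lo + 12, 100))
n_chunks = len(chunk_starts)
graph["total"] = (combine, [("chunk-sum", i) for i in range(n_chunks)])


def get(graph, key):
    """Evaluate `key` by recursively evaluating the tasks it depends on."""

    def resolve(arg):
        if isinstance(arg, list):  # lists of arguments are resolved element-wise
            return [resolve(a) for a in arg]
        if isinstance(arg, tuple) and arg and callable(arg[0]):  # a task
            func, *args = arg
            return func(*(resolve(a) for a in args))
        if isinstance(arg, (str, tuple)) and arg in graph:  # a key
            return resolve(graph[arg])
        return arg  # a literal value

    return resolve(graph[key])


print(n_chunks, get(graph, "total"))  # 8 5050
```

The chunk tasks do not depend on each other, so a real scheduler is free to run them in parallel or on different machines. Only `"total"` has to wait for all of them.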
---
class: top
layout: false
## Example — _ Trend = Cov[x, y] / Var[x] _
```python
import dask.array as da
x = da.linspace(0, 1, 1000, chunks=(50, ))        # x ∊ [0.0, 1.0]
r = da.random.random(1000, chunks=(50, )) - 0.5   # r ∊ [-0.5, 0.5)
y = 5 * x + r
# Trend = Cov[x, y] / Var[x]; the 1/N normalisations cancel.
# Lazy: nothing is computed until slope.compute() is called.
slope = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
```
<img src="images/dask_slope_of_independent_data_thicker.png" width="99%">
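Cov[x, y] and Var[x] are both built from sums, so the slope can also be assembled from a few partial sums per chunk. The sketch below is a pure-Python analogue of this slide under stated assumptions. It uses chunks of 50 as on the slide, a fixed random seed, and a one-pass formula based on (N, Σx, Σy, Σxy, Σx²). That formula is chosen for illustration and is not necessarily what dask does internally.

```python
import random

random.seed(42)  # assumption: fixed seed so the sketch is reproducible
n = 1000
x = [i / (n - 1) for i in range(n)]                 # x in [0.0, 1.0]
y = [5 * xi + (random.random() - 0.5) for xi in x]  # noise in [-0.5, 0.5)


def chunk_stats(xs, ys):
    """Partial sums of one chunk needed for Cov[x, y] and Var[x]."""
    return (len(xs), sum(xs), sum(ys),
            sum(a * b for a, b in zip(xs, ys)), sum(a * a for a in xs))


# Reduce every chunk of 50 independently, then add up the partial sums
chunks = [chunk_stats(x[i:i + 50], y[i:i + 50]) for i in range(0, n, 50)]
N, sx, sy, sxy, sxx = (sum(col) for col in zip(*chunks))

# slope = Cov[x, y] / Var[x]; the 1/N normalisations cancel
slope = (sxy - sx * sy / N) / (sxx - sx * sx / N)
print(round(slope, 2))
```

One-pass sums like these are cheap to combine, but they can lose precision when the mean of x is large compared to its spread. Subtracting the means first, as in the dask code on the slide, avoids that.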