# data repo renderer

Render data repos from a single YAML file.

- master:
  [![pipeline status](https://git.geomar.de/data/tools/data_repo_renderer/badges/master/pipeline.svg)](https://git.geomar.de/data/tools/data_repo_renderer/commits/master)
  [![coverage report](https://git.geomar.de/data/tools/data_repo_renderer/badges/master/coverage.svg)](https://git.geomar.de/data/tools/data_repo_renderer/commits/master)

- develop:
  [![pipeline status](https://git.geomar.de/data/tools/data_repo_renderer/badges/develop/pipeline.svg)](https://git.geomar.de/data/tools/data_repo_renderer/commits/develop)
  [![coverage report](https://git.geomar.de/data/tools/data_repo_renderer/badges/develop/coverage.svg)](https://git.geomar.de/data/tools/data_repo_renderer/commits/develop)


## What is this?

This is a Python package which takes a relatively simple YAML file (see the
examples in [the inventory/](https://git.geomar.de/data/tools/inventory/)) and
creates (renders) a full data repository with scripts to download, update,
pre- and post-process, and version-control data.  The idea is that adding data
sets to a central database will be easy for a normal user, who only has to fill
in a template YAML file and then either take care of the repository themselves
or submit it via an issue in the [data/docs
project](https://git.geomar.de/data/docs/).

## What to read?

- If you just want to ask for the addition of a new data set, have a look at
  the examples in [the
  inventory/](https://git.geomar.de/data/tools/inventory/).  In particular,
  look at [the HadISST example](input_data/HadISST/) and the corresponding
  [rendered repository](https://git.geomar.de/data/HadISST/), and try to
  provide the relevant information.

- If you want to fully maintain your own repository or help develop this
  project, read on.

## Installation

To install the renderer, make sure you have a recent Python 3 (the tests
currently run successfully with `3.5`).

```bash
cd ~/src/
git clone https://git.geomar.de/data/data_repo_renderer.git
cd data_repo_renderer
pip install -e .
```

See also [setup.py](setup.py).
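Before running `pip install -e .`, it can help to check the interpreter. The sketch below is minimal and assumes that the `python3` on your `PATH` is the interpreter you install with. The threshold 3.5 is only the version the tests are stated to pass with, not a documented minimum, and `python3_is_recent` is a hypothetical name:

```bash
#!/bin/bash
# Hypothetical pre-installation check: exit status 0 if the python3 on the
# PATH is at least 3.5 (the version the tests are stated to pass with).
python3_is_recent() {
    python3 -c 'import sys; sys.exit(0 if sys.version_info >= (3, 5) else 1)'
}

if python3_is_recent; then
    python3 --version
else
    echo "Please install a more recent Python 3 first." >&2
fi
```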


## Usage

After installation, help can be found with:
```bash
data_repo_renderer -h
```

Typically, you will want to use:
```bash
data_repo_renderer \
    --prefix <destination_path> \
    --util <additional_scripts> YAML_FILE
```

- `<destination_path>` is the path where the repository will be rendered.  If
  `--prefix` is omitted, the repository will be rendered in `./rendered/`.

- `<additional_scripts>` is a path to a directory with additional scripts that
  will be copied to `<destination_path>/util/`.  This directory is meant to hold
  scripts to be called for pre- or post-processing.
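If you maintain several data sets, you can repeat the invocation above for each configuration file. The minimal sketch below assumes an inventory checkout with one sub-directory containing a `meta.yaml` per data set, as for `HadISST/meta.yaml`. `render_all` is a hypothetical name, and `--util` is left out:

```bash
#!/bin/bash
# Hypothetical helper: render every <name>/meta.yaml below an inventory
# checkout into <destination>/<name>.  The one-directory-per-data-set layout
# is an assumption based on the HadISST example.
render_all() {
    local inventory="$1" destination="$2" meta name
    for meta in "${inventory}"/*/meta.yaml; do
        [ -e "${meta}" ] || continue   # the glob matched nothing
        name="$(basename "$(dirname "${meta}")")"
        data_repo_renderer --prefix "${destination}/${name}" "${meta}" || return 1
    done
}

# Usage (paths are placeholders):
# render_all ~/src/inventory <path_with_enough_space>
```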

## A walkthrough

**Note** *that this walkthrough may be slightly out of sync with the actual
latest contents of the `meta.yaml` discussed here and hence describes a
slightly different state of <https://git.geomar.de/data/HadISST/>.*

This walkthrough explains all steps needed to create
<https://git.geomar.de/data/HadISST/> from
[../inventory/HadISST/meta.yaml](https://git.geomar.de/data/tools/inventory/HadISST/meta.yaml).

### Configuration file

The configuration file
[HadISST/meta.yaml](https://git.geomar.de/data/tools/inventory/HadISST/meta.yaml)
defines the repository name, the people involved, the desired paths to the
repository on Geomar's Git server, a description, and the URLs of the data
files and the documentation:

```yaml
repo_name: HadISST

people:  Willi Rath (<wrath@geomar.de>)

http_path_remote: https://git.geomar.de/data/HadISST

git_path_remote: git@git.geomar.de:data/HadISST.git

repo_description: |
    Met Office Hadley Centre observations datasets
    <http://www.metoffice.gov.uk/hadobs/hadisst/data/download.html>.

prefixes: data doc

data:

    - url: http://www.metoffice.gov.uk/hadobs/hadisst/data/HadISST_sst.nc.gz
      prefix: data
      file_name: HadISST_sst.nc
      method: !!python/name:data_repo_renderer.CurlSingleFile

    - url: http://www.metoffice.gov.uk/hadobs/hadisst/data/HadISST_ice.nc.gz
      prefix: data
      file_name: HadISST_ice.nc
      method: !!python/name:data_repo_renderer.CurlSingleFile

    - url: http://www.metoffice.gov.uk/hadobs/hadisst/data/HadISST1_SST_update.nc.gz
      prefix: data
      file_name: HadISST1_SST_update.nc
      method: !!python/name:data_repo_renderer.CurlSingleFile

    - url: http://www.metoffice.gov.uk/hadobs/hadisst/data/HadISST1_ICE_update.nc.gz
      prefix: data
      file_name: HadISST1_ICE_update.nc
      method: !!python/name:data_repo_renderer.CurlSingleFile

doc:

    - url: http://www.metoffice.gov.uk/hadobs/hadisst/data/download.html
      file_name: download.html
      prefix: doc
      method: !!python/name:data_repo_renderer.CurlSingleFile
```
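Before rendering, a quick check can catch a `meta.yaml` that lacks one of the top-level entries used above. `check_meta_yaml` is a hypothetical helper, not part of the renderer. Its key list is simply copied from the HadISST example, where `data` and `doc` are the sections named in `prefixes`, and the renderer's actual requirements may differ. It only looks for unindented `key:` lines and does not parse YAML:

```bash
#!/bin/bash
# Hypothetical pre-flight check: report top-level keys of the HadISST example
# that are missing from a meta.yaml.  Returns 1 if any key is missing.
check_meta_yaml() {
    local file="$1" key missing=0
    for key in repo_name people http_path_remote git_path_remote \
               repo_description prefixes data doc; do
        if ! grep -q "^${key}:" "${file}"; then
            echo "missing key: ${key}" >&2
            missing=1
        fi
    done
    return "${missing}"
}

# Usage: only render if all keys are present.
# check_meta_yaml HadISST/meta.yaml && \
#     data_repo_renderer --prefix <path_with_enough_space>/HadISST HadISST/meta.yaml
```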

### Running the renderer

With the above configuration file in `HadISST/meta.yaml`, run:
```bash
data_repo_renderer --prefix <path_with_enough_space>/HadISST HadISST/meta.yaml
```

### Resulting structure

Rendering will result in:
```
<path_with_enough_space>/HadISST
├── init.sh
├── meta.yaml
├── README.md
└── update.sh
```
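To confirm that rendering succeeded before continuing, you can check for these four files. This is a minimal sketch: `check_rendered` is a hypothetical helper, and its file list is taken from the listing above, which may change with newer renderer versions:

```bash
#!/bin/bash
# Hypothetical helper: return 0 if all files from the listing above exist
# in the rendered repository given as the first argument, 1 otherwise.
check_rendered() {
    local repo="$1" f status=0
    for f in init.sh meta.yaml README.md update.sh; do
        if [ ! -f "${repo}/${f}" ]; then
            echo "not rendered: ${repo}/${f}" >&2
            status=1
        fi
    done
    return "${status}"
}

# Usage:
# check_rendered <path_with_enough_space>/HadISST
```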

### Creating the remote

The rendered repository uses the SSH version of
<https://git.geomar.de/data/HadISST/> (`git_path_remote` in `meta.yaml`) as its
remote, so this project has to exist on the server first.  We created it there
and left it empty.
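You can check whether the project exists and is still empty with `git ls-remote`, which lists a remote's refs and prints nothing for an empty repository. The sketch below uses a hypothetical helper, `remote_is_empty`:

```bash
#!/bin/bash
# Hypothetical helper: return 0 if the remote is reachable and has no refs
# yet (project created and left empty), 1 if it already has refs, and 2 if
# it cannot be reached.
remote_is_empty() {
    local refs
    refs="$(git ls-remote "$1" 2>/dev/null)" || return 2
    [ -z "${refs}" ]
}

# Usage with the SSH remote from meta.yaml:
# remote_is_empty git@git.geomar.de:data/HadISST.git && echo "ready for init.sh"
```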

### Initialization of the repo

The rendered `init.sh`, which needs to be run exactly once (after the empty
repository has been created on the server), is:

```bash
#!/bin/bash

# Rendered with data_repo_renderer 0.1.1.dev40+g797d29f.d20170719

git init || exit 1
git remote add origin git@git.geomar.de:data/HadISST.git || exit 1
git config --add lfs.activitytimeout 30

```

Running it with
```bash
cd <path_with_enough_space>/HadISST
./init.sh
```
will initialize a local Git repository, add
`git@git.geomar.de:data/HadISST.git` as the remote `origin`, and set the Git LFS
option `lfs.activitytimeout` to 30.
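To verify that `init.sh` did its job, you can read back the settings it makes. In the sketch below, `check_init` is a hypothetical helper and not part of the rendered repository. The expected values are the ones from the script above:

```bash
#!/bin/bash
# Hypothetical post-init check, run inside the rendered repository: the
# origin URL must equal the given git_path_remote, and lfs.activitytimeout
# must be the value 30 that init.sh sets.
check_init() {
    local expected_origin="$1"
    [ -d .git ] || { echo "no .git directory" >&2; return 1; }
    [ "$(git remote get-url origin)" = "${expected_origin}" ] \
        || { echo "origin is not ${expected_origin}" >&2; return 1; }
    [ "$(git config --get lfs.activitytimeout)" = "30" ] \
        || { echo "lfs.activitytimeout is not 30" >&2; return 1; }
}

# Usage:
# cd <path_with_enough_space>/HadISST
# check_init git@git.geomar.de:data/HadISST.git
```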

### Updating the repo

To download the data and update the repository, the renderer creates
`update.sh`:

```bash
#!/bin/bash

# Rendered with data_repo_renderer 0.1.1.dev40+g797d29f.d20170719

mkdir -p log
exec &> >(tee -a "log/update.log")
date -I'ns'

mkdir -p data doc

git pull
git lfs pull
git lfs track "data/**"

curl -o "data/HadISST_sst.nc" "http://www.metoffice.gov.uk/hadobs/hadisst/data/HadISST_sst.nc.gz"
curl -o "data/HadISST_ice.nc" "http://www.metoffice.gov.uk/hadobs/hadisst/data/HadISST_ice.nc.gz"
curl -o "data/HadISST1_SST_update.nc" "http://www.metoffice.gov.uk/hadobs/hadisst/data/HadISST1_SST_update.nc.gz"
curl -o "data/HadISST1_ICE_update.nc" "http://www.metoffice.gov.uk/hadobs/hadisst/data/HadISST1_ICE_update.nc.gz"
curl -o "doc/download.html" "http://www.metoffice.gov.uk/hadobs/hadisst/data/download.html"

target_branch=`git describe 2> /dev/null`_update_`date +%s%N`
git checkout -b ${target_branch}
git add .
git commit -m "Auto-update data"
git push -u origin ${target_branch}

```
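The branch name in `update.sh` combines the output of `git describe` with a nanosecond timestamp. By default, `git describe` only considers annotated tags, and it fails in a repository without one. Because its error output is discarded, the branch name then starts with `_update_`. The hypothetical function below evaluates the same expression, using `$(...)` instead of backticks, so you can inspect the name before running the script:

```bash
#!/bin/bash
# Hypothetical helper: evaluate the branch-name expression from update.sh
# without creating a branch.  Run it inside the repository.
update_branch_name() {
    echo "$(git describe 2> /dev/null)_update_$(date +%s%N)"
}

# Usage: prints e.g. _update_<nanoseconds> in a repository without annotated
# tags, or v1_update_<nanoseconds> if HEAD carries the annotated tag v1.
# update_branch_name
```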

Run it with:
```bash
./update.sh
```

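This will first pull the latest state of the current branch from
<https://git.geomar.de/data/HadISST/> (including Git LFS files), then download
the latest versions of the data and documentation files, commit them (with
everything below `data/` tracked by Git LFS) on a new branch, and push that
branch to the server.  All output is appended to `log/update.log`.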
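In the configuration above, the URLs end in `.nc.gz` while the `file_name`s end in `.nc`. Because `curl -o` saves the response body as received, the files in `data/` are most likely still gzip-compressed after running `update.sh` as shown. A post-processing script passed with `--util` could handle this. How the renderer calls such scripts is not described here, so the following is a stand-alone sketch, and `gunzip_in_place` is a hypothetical name:

```bash
#!/bin/bash
# Hypothetical util/ script: decompress gzip-compressed files in place,
# keeping their names.  Files that are not gzip-compressed are left alone.
gunzip_in_place() {
    local f
    for f in "$@"; do
        if gzip -t "${f}" 2>/dev/null; then
            gzip -dc "${f}" > "${f}.tmp" || return 1
            mv "${f}.tmp" "${f}" || return 1
        fi
    done
}

# Usage inside the rendered repository:
# gunzip_in_place data/*.nc
```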

### Structure after the update

```
<path_with_enough_space>/HadISST
├── data
│   ├── HadISST1_ICE_update.nc
│   ├── HadISST1_SST_update.nc
│   ├── HadISST_ice.nc
│   └── HadISST_sst.nc
├── doc
│   └── download.html
├── init.sh
├── log
│   └── update.log
├── meta.yaml
├── README.md
└── update.sh
```
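The listing does not show hidden files. `git lfs track "data/**"` in `update.sh` records its pattern in `.gitattributes`, typically as `data/** filter=lfs diff=lfs merge=lfs -text`. As a result, only files below `data/` go through Git LFS, while files such as `doc/download.html` are stored as regular Git objects. You can query which rule applies to a path with `git check-attr`. The wrapper below, `show_lfs_tracking`, is hypothetical:

```bash
#!/bin/bash
# Hypothetical helper, run inside the repository: for each path, print
# whether .gitattributes assigns it the LFS filter.  git check-attr prints
# "<path>: filter: <value>", so the third ': '-separated field is the value.
show_lfs_tracking() {
    local f
    for f in "$@"; do
        if [ "$(git check-attr filter -- "${f}" | awk -F': ' '{print $3}')" = "lfs" ]; then
            echo "${f}: LFS"
        else
            echo "${f}: plain git"
        fi
    done
}

# Usage:
# show_lfs_tracking data/HadISST_sst.nc doc/download.html
```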

### The resulting README

The resulting `README.md` will be:

> # HadISST
>
> People: Willi Rath (<wrath@geomar.de>)
>
> Met Office Hadley Centre observations datasets
> <http://www.metoffice.gov.uk/hadobs/hadisst/data/download.html>.
>
>
> ## Known problems
>
> - Open and closed issues are here:
>   <https://git.geomar.de/data/HadISST/issues?scope=all&state=all>
>
> - Found a problem?  Report it here:
>   <https://git.geomar.de/data/HadISST/issues/new>
>
>
> ## History
>
> - Download logs are in [log/update.log](log/update.log).
>
> - Also have a look at the
>   [activity log](https://git.geomar.de/data/HadISST/activity).
>
>
> ## Original Documentation
>
> See [doc/](doc/) for any of the original documentation.
>
>
> ## Maintenance
>
> Update with
> ```bash
> update.sh
> ```
>
>
> For details on the configuration, look at [update.sh](update.sh) and
> [meta.yaml](meta.yaml).
>
> *Rendered with
> [data_repo_renderer](https://git.geomar.de/data/data_repo_renderer/)
> <version>*
>