Commit feb3538f authored by Claas Faber's avatar Claas Faber

added *.m files to repository

parent 274e102f
####################################################################
# #
# ------------------------------------------------------------ #
# A COMPARISON OF THE ATLANTIC AND PACIFIC BJERKNES FEEDBACKS: #
# SEASONALITY, SYMMETRY, AND STATIONARITY #
# ------------------------------------------------------------ #
# #
# T. DIPPE, J. F. LÜBBECKE, R. J. GREATBATCH #
# #
# in #
# #
# JOURNAL OF GEOPHYSICAL RESEARCH: OCEANS #
# #
# #
# Readme for using the Matlab scripts #
# #
####################################################################
# --------- #
# TECHNICAL #
# --------- #
Matlab version used for all analyses: Matlab 2018
# ------------- #
# USED DATASETS #
# ------------- #
- AVISO-SSH: https://www.aviso.altimetry.fr/en/home.html
- ERSST: https://www.ncdc.noaa.gov/data-access/marineocean-data/extended-reconstructed-sea-surface-temperature-ersst-v5
- ERA-Interim USTR: https://www.ecmwf.int/en/forecasts/datasets/archive-datasets/reanalysis-datasets/era-interim
- ERA-40 USTR: https://apps.ecmwf.int/datasets/data/era40-moda/
- ORAS4 data: http://icdc.cen.uni-hamburg.de/projekte/easy-init/easy-init-ocean.html
# ------- #
# GENERAL #
# ------- #
# ANALYSIS USING A BUNDLE OF SCRIPTS
# ++++++++++++++++++++++++++++++++++
Many analyses are made up of a bundle of three to four scripts. These are:
- <name>.m: The main script that does the principal calculations
- <name>_settings.m: Prescribing all kinds of settings, including paths and
various plotting options
- <name>_plots.m or similar: Plots
- <name>_wrapper.m: A wrapper around the main script that allows looping over,
for example, feedback elements or basins
For a given script <name>, I will refer to the bundle as <name>*.m
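A minimal sketch of how such a bundle is typically wired together is shown below.
The loop variable and the basin names are illustrative, and <name> is the usual
placeholder; actual wrappers may differ:
  % <name>_wrapper.m
  basin_names = {'Atlantic','Pacific'};
  for c_basin = 1:length(basin_names)
      basin_name = basin_names{c_basin};
      <name>_settings;   % paths, regions, plotting options
      <name>;            % principal calculations
      <name>_plots;      % figures
  end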
# ZONAL ANALYSIS VERSUS INDIVIDUAL INDICES
# ++++++++++++++++++++++++++++++++++++++++
All analyses are based on indices. However, I distinguish between "zonal
analyses" and "individual indices". Datasets for the zonal analyses contain a
matrix that holds time series of indices that correspond to a sliding index
along the equator. The zonal analysis is set up in
zonal_analysis_region_definition.m (for the published paper: 4 deg lon x
4 deg lat boxes that slide along the equator from a fixed western boundary
towards the eastern boundary), and the collection of sliding indices is
calculated with calculate_equatorial_indices*.m. These datasets should be stored
in the <path where data for zonal analysis is stored>.
In contrast, individual indices refer to just a single index. Single indices are
calculated with the function calcIndexAveraged() and should be stored in the
<path where individual indices are stored>.
Additionally, raw SST data might be required, such as ERSST. These datasets
should be stored in the <path where SST data is stored>.
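A minimal sketch of how a single index (here Atl3 SST) can be calculated from
gridded anomaly data: only calcIndexAveraged() and box_grid_area are taken from
this repository; the variable names, the grid structure fields latGrid/lonGrid,
and the input sst_anomalies are illustrative assumptions:
  load('modelGrids.mat');                         % grid structures, one per dataset
  grid = <grid structure matching the SST dataset>;
  atl3 = calcIndexAveraged(sst_anomalies, ...     % lon x lat x time anomalies
      grid.latGrid, grid.lonGrid, grid.box_grid_area, ...
      [-3 3], [-20 0]);                           % Atl3: 3S-3N, 20W-0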
# --------------- #
# LIST OF SCRIPTS #
# --------------- #
# FUNCTIONS
# +++++++++
- bootstrapFeedbackStrength() and bootstrapFeedbackStrengthDiff(): Significance
test for feedback strengths and their differences.
- calcAnomalies(): Calculates the anomalies, and allows for variable detrending
- calcIndexAveraged(): Calculates an index, requires a file that describes the
grids that the variables are given on, including the
spatial extent of each grid cell to apply the correct
spatial averaging. (The grid files are stored in my case
in the file modelGrids.mat, which contains a structure
for each grid, which in turn stores information such as
box_grid_area.)
- calculateFeedback(): Perform a robust regression to estimate the strength of a
given feedback element.
- diagnoseLag(): Diagnose the lag at which the feedback strength between two
variables is maximum.
- eventCharacteristics(): Identifies events in a time series and returns various
statistical measures about them. This function is the
backbone of the SST event analysis in the Discussion.
- testValue(): Given a distribution, a significance level and a test value,
discern whether the test value sits on the "significant" tails of
the distribution.
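A minimal sketch of how these functions might be combined for the significance
test of a single feedback strength: the variable names and the sample/bootstrap
sizes are illustrative, and the argument order of testValue() is inferred from
its description above:
  observed  = calculateFeedback(ustr_anom,sst_anom,'rreg');
  strengths = bootstrapFeedbackStrength(ustr_anom,sst_anom,100,1000,'rreg');
  is_signif = testValue(strengths,0.05,observed);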
# (More or less) PROJECT-WIDE SETTINGS
# ++++++++++++++++++++++++++++++++++++
- atlantic_asymmetric_bjf_plot_settings.m contains general plot settings that
are used by a number of analysis scripts. Customize the colorbar handles in
this script. Note, however, that the script is not used uniformly across all
analysis scripts. Sorry for the inconvenience.
- zonal_analysis_region_definition.m defines the properties of the zonal
analysis, i.e. the total zonal extent of the analysis region, as well as the
width of zonal bins that will be used to calculate the equatorial indices.
# PREPARING THE DATA: Binning data into sliding indices along the equator
# ++++++++++++++++++
- calculate_equatorial_indices*.m: Calculate indices for each zonal bin defined
in zonal_analysis_region_definition.m. These datasets are the basis for most
of the analysis. They should be stored in the <path where data for zonal
analysis is stored>.
Date vectors are expected to be Matlab date vectors (see
https://de.mathworks.com/help/matlab/ref/datevec.html).
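A monthly Matlab date vector can be built, for example, like this (the years are
illustrative):
  time_nums   = datenum(1982,1:12*29,15);   % mid-month datenums, Jan 1982 - Dec 2010
  date_vector = datevec(time_nums);         % n x 6 Matlab date vector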
# ANALYSIS
# ++++++++
Fig. 1: compare_oras4_ersst*.m
Fig. 2: diagnose_zonal_feedback_lags*.m
<Fig. 3: Created in MacOS's Keynote>
Fig. 4-6: lagged_feedback_strength*.m to diagnose the strengths of the feedback
elements, and diagnose_feedback_strength_difference_significance*.m to
test the significance
Fig. 7: total_bjerknes_feedback_from_regression*.m
Fig. 8: compare_aviso_oras4_strengths_Atl3_Nino34*.m
Figs. 9-11: decadal_feedback_variations*.m
Fig. 12: zonal_enso_properties*.m
Fig. 13: decadal_sst_event_variations*.m
Figs. A1-A2: calculate_instantaneous_feedback*.m, and
compare_lagged_instantaneous_feedback_strengths*.m
Figs. A3-A5: total_bjerknes_feedback_from_regression*.m
# ------- #
# CONTACT #
# ------- #
Contact me via tina.dippe@gmail.com if you have questions.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% %
% COMMON SETTINGS FOR PLOTS OF THE ATLANTIC ASYMMETRY ANALYSIS %
% %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% AXES POSITION
% +++++++++++++
% scaling of the figure?
scale_figure_height = 0.8;
scale_figure_width = 1;
% position of the axes
x_axes_offset = 0.02;
y_axes_offset = 0.05;
width_axes_offset = -0.1;
height_axes_offset = -0.06;
% hard-wire axes position
axes_position = [0.13 0.11 0.725 0.7476];
% COLOURS AND COLORBAR
% ++++++++++++++++++++
my_white = [254 254 254]./255;
my_left = [0 76 153]./255;
my_middle = [0 102 204]./255;
my_right = [0 255 255]./255;
my_teal = [0 149 186]./255;
my_light_teal = [153 255 255]./255;
my_dark_teal = [0 47 84]./255;
my_blue = [0 102 204]./255;
my_dark_blue = [0 51 102]./255;
my_red = [255 51 51]./255;
my_yellow = [255 255 51]./255;
my_cyan = [0 255 255]./255;
my_purple = [153 0 153]./255;
% colorbar handles
sunset = <custom colorbar handle for absolute values>;
plasma = <custom colorbar handle for absolute values>;
centered = <custom colorbar handle for differences>;
% colours to mark negative lags
lag_colour = [0.35 0.35 0.35];
lag_border_colour = my_white;
lag_stretch_width = 6;
lag_stretch_border_width = 9;
% HIGHLIGHT REGIONS
% +++++++++++++++++
% which background regions to plot? Define longitudinal extent here
% i.e., mark WAtl, Atl3 in the Atlantic, and Nino4, Nino34, and Nino12
% in the Pacific
if strcmp(basin_name,'Pacific')
% longitudinal boundaries of standard regions
highlight_regions = [...
160 210; ...
190 240; ...
270 280];
% and the names of the regions
region_names = {'Nino4','Nino34','Nino12'};
else
highlight_regions = [...
-40 -20; ...
-20 0];
region_names = {'WAtl','Atl3'};
end
% y values for the patches; width: 0.25 months, gap: 0.2 months, maximum y value: 12.5
if strcmp(basin_name,'Pacific')
highlight_y_values = [11.8 12.05; 12.25 12.5; 11.8 12.05];
else
highlight_y_values = [12.25 12.5; 12.25 12.5];
end
region_colours = [my_teal; my_light_teal; my_dark_teal];
% MISCELLANEOUS
% +++++++++++++
% font size
scale_font_size = 2;
% size of the significance crosses
size_significance_marker = 8;
% line widths for the significance markers
line_width_significance_markers = 0.5;
line_width_significance_outline = 4.5;
% month shorts
month_ids = {'J','F','M','A','M','J','J','A','S','O','N','D'};
% mute warning from robust regression
% query the last warning, including its ID, with
% w = warning('query','last').
warn_id = 'stats:statrobustfit:IterationLimit';
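% A minimal sketch of how warn_id can be used (assumption: the warning should be
% muted only around the robust-regression calls, not globally):
%   warning('off',warn_id);   % mute before looping over robustfit calls
%   ...                       % regressions
%   warning('on',warn_id);    % restore the default behaviour afterwards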
function strength_values = bootstrapFeedbackStrength(data1,data2,sample_size,n_bootstraps,analysis_type)
% Bootstrap two time series and produce a histogram of
% feedback strength values (ACC, or the slope of a linear or robust regression).
%
% USAGE: strength_values = bootstrapFeedbackStrength(data1,data2,sample_size,n_bootstraps,analysis_type)
%
% OUTPUT
% strength_values: bootstrapped slope values
%
% INPUT
% data1: Time series 1, "x-values" or predictors for the regression model
% data2: Time series 2, "y-values" or predictands of the regression model
% sample_size: How many samples should be drawn from the
% full data?
% n_bootstraps: How often should the bootstrapping be repeated?
% analysis_type: Which analysis method should be used to diagnose the
% feedback strength? acc (ACC), reg (regular linear regression,
% that regresses >data2< on >data1<), or rreg (robust
% regression that regresses >data2< on >data1<)
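%
% EXAMPLE (a sketch with synthetic data, not taken from the paper's datasets):
%   x = randn(500,1);
%   y = 0.8*x + 0.3*randn(500,1);
%   strength_values = bootstrapFeedbackStrength(x,y,100,1000,'rreg');
%   hist(strength_values,50);   % distribution of bootstrapped slopes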
% ----------- %
% PREPARATION %
% ----------- %
% assign an ID to the analysis type
% for use with switch
if strcmp(analysis_type,'acc')
analysis_id = 1;
elseif strcmp(analysis_type,'reg')
analysis_id = 2;
elseif strcmp(analysis_type,'rreg')
analysis_id = 3;
else
error_string = sprintf('Analysis type >%s< not known.\nChoose either of acc, reg, or rreg.',analysis_type);
error(error_string);
end
% --------- %
% BOOTSTRAP %
% --------- %
% total pool of data
n_data = length(data1);
all_indices = 1:n_data;
% storage
strength_values = zeros(1,n_bootstraps);
% actual bootstrapping
for c_strap = 1:n_bootstraps
% randomly draw samples
random_sample = datasample(all_indices,sample_size,'Replace',false);
sample_data1 = data1(random_sample);
sample_data2 = data2(random_sample);
% calculate feedback strengths
switch analysis_id
% correlation
case 1
% acc
current_feedback = corrcoef(sample_data1,sample_data2);
current_feedback = current_feedback(1,2);
% normal linear regression
case 2
regression_parameters = polyfit(sample_data1,sample_data2,1);
current_feedback = regression_parameters(1);
% robust regression
case 3
rregression_parameters = robustfit(sample_data1,sample_data2);
current_feedback = rregression_parameters(2); % yes, the order is different from polyfit ...
% switch: calculate feedback strength
end
% assign regression value
strength_values(c_strap) = current_feedback;
% for: repeat bootstrapping
end
function strength_diffs = bootstrapFeedbackStrengthDiff(data1,data2,sample_size,n_bootstraps,analysis_type)
% Bootstrap tuples of two time series and produce a histogram of differences of
% the feedback strength diagnosed from either ACC, linear regression, or robust
% regression. The differences are calculated as follows: Randomly sample m
% tuples from the time series and calculate the slope parameter; then use the
% remaining n-m tuples, again calculate the slope parameter, and save the
% difference between the two. n is the total length of the time series, and m
% is the parameter sample_size.
%
% USAGE: strength_diffs = bootstrapFeedbackStrengthDiff(data1,data2,sample_size,n_bootstraps,analysis_type)
%
% OUTPUT
% strength_diffs: bootstrapped slope difference values
%
% INPUT
% data1: Time series 1, "x-values" or predictors for the regression model
% data2: Time series 2, "y-values" or predictands of the regression model
% sample_size: How many samples should be drawn from the
% full data?
% n_bootstraps: How often should the bootstrapping be repeated?
% analysis_type: Which analysis method should be used to diagnose the
% feedback strength? acc (ACC), reg (regular linear regression,
% that regresses >data2< on >data1<), or rreg (robust
% regression that regresses >data2< on >data1<)
% ----------- %
% PREPARATION %
% ----------- %
% assign an ID to the analysis type
% for use with switch
if strcmp(analysis_type,'acc')
analysis_id = 1;
elseif strcmp(analysis_type,'reg')
analysis_id = 2;
elseif strcmp(analysis_type,'rreg')
analysis_id = 3;
else
error_string = sprintf('Analysis type >%s< not known.\nChoose either of acc, reg, or rreg.',analysis_type);
error(error_string);
end
% --------- %
% BOOTSTRAP %
% --------- %
% total pool of data
n_data = length(data1);
all_indices = 1:n_data;
% storage
strength_diffs = zeros(1,n_bootstraps);
% actual bootstrapping
for c_strap = 1:n_bootstraps
% randomly draw samples, no replacement
random_sample = datasample(all_indices,sample_size,'Replace',false);
% opposite sample
opposite_sample = setdiff(all_indices,random_sample);
% calculate feedback strength differences
switch analysis_id
% correlation
case 1
% acc
random_feedback = corrcoef(data1(random_sample), data2(random_sample));
opposite_feedback = corrcoef(data1(opposite_sample),data2(opposite_sample));
feedback_difference = random_feedback(1,2)-opposite_feedback(1,2);
% normal linear regression
case 2
random_pp = polyfit(data1(random_sample),data2(random_sample),1);
opposite_pp = polyfit(data1(opposite_sample),data2(opposite_sample),1);
feedback_difference = random_pp(1)-opposite_pp(1);
% robust regression
case 3
random_rreg = robustfit(data1(random_sample),data2(random_sample));
opposite_rreg = robustfit(data1(opposite_sample),data2(opposite_sample));
feedback_difference = random_rreg(2)-opposite_rreg(2);
% switch: calculate feedback strength
end
% assign regression value
strength_diffs(c_strap) = feedback_difference;
% for: repeat bootstrapping
end
function [anomalies,cropped_dates] = calcAnomalies(data,date_vector,varargin)
% Calculates anomalies relative to a user-specified trend
% and to the climatological seasonal cycle.
%
% USAGE: [anomalies,cropped_dates] = calcAnomalies(data,date_vector)
% [anomalies,cropped_dates] = calcAnomalies(__,trend)
% [anomalies,cropped_dates] = calcAnomalies(__,start_year,stop_year);
% [anomalies,cropped_dates] = calcAnomalies(__,trend,start_year,stop_year);
%
% data and anomalies are vectors. date_vector is a Matlab date vector.
% cropped_dates is the cropped date vector.
%
% Optional parameters are
%
% trend: trend that is to be subtracted from the data
% prior to the removal of the seasonal cycle.
% Vector that has the same length as data.
% Default: When no trend is provided, the linear
% trend is calculated and subtracted from
% the data.
%
% start_year: When should anomaly calculation begin?
%
% stop_year: When should it stop?
%
% IF YOU DON'T WISH TO REMOVE A TREND,
% specify a "dummy trend" that consists of zeros and
% has the same size as the data.
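%
% EXAMPLE (a sketch; sst and dates are assumed to be a monthly time series and
% its matching Matlab date vector):
%   [ssta,dates_used] = calcAnomalies(sst,dates,1982,2010);          % linear detrending
%   [ssta,dates_used] = calcAnomalies(sst,dates,zeros(size(sst)));   % no detrending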
% ----------- %
% PARSE INPUT %
% ----------- %
n_additional = length(varargin);
% define defaults:
trend_data = [];
start_year = min(date_vector(:,1));
stop_year = max(date_vector(:,1));
switch n_additional
case 0
case 1
trend_data = varargin{1};
case 2
start_year = varargin{1};
stop_year = varargin{2};
case 3
trend_data = varargin{1};
start_year = varargin{2};
stop_year = varargin{3};
otherwise
error('Wrong number of input arguments specified.');
end
% -------------- %
% CONSTRAIN TIME %
% -------------- %
is_time = ( date_vector(:,1) >= start_year ) & ( date_vector(:,1) <= stop_year );
% cropping
date_vector = date_vector(is_time,:);
data = data(is_time);
% need to crop the trend data as well if it is not empty
if ~isempty(trend_data)
trend_data = trend_data(is_time);
end
% for returning
cropped_dates = date_vector;
% use datenums as x-axis
x_values = datenum(date_vector);
% ------------------ %
% DATA QUALITY CHECK %
% ------------------ %
is_okay = ~(isnan(data));
% anything else but NaNs?
if ( isempty(data(is_okay)) )
anomalies = nan(size(data));
return
end
% ------------ %
% DETREND DATA %
% ------------ %
if isempty(trend_data)
% disp('calculating linear trend');
% need to have NaN-free data for regression
valid_data = data(is_okay);
valid_times = x_values(is_okay);
% fitting: use only good values
vals_poly = polyfit(valid_times,valid_data,1);
% evaluation: use ALL values
trend_data = polyval(vals_poly,x_values);
% trend data compatible with data?
else
if ( length(data) ~= length(trend_data) )
error('Data and trend data have different sizes.');
end
end
% assign anomalies relative to the trend to
% the correct piece of monthlyAnomalies
detrended = data - trend_data;
% ------------------------------------ %
% REMOVE CLIMATOLOGICAL SEASONAL CYCLE %
% ------------------------------------ %
anomalies = zeros(size(data));
seasonal_cycle = zeros(1,12);
for month = 1:12
% timing
is_month = ( date_vector(:,2) == month );
% climatology
climatology = nanmean(detrended(is_month));
% assign anomalies
anomalies(is_month) = detrended(is_month)-climatology;
seasonal_cycle(month) = nanmean(data(is_month));
end
function index = calcIndexAveraged(monthlyAnomalies,latGrid,lonGrid,gridArea,latBoundaries,lonBoundaries)
% Calculates a climate index as area average over an area with given boundaries.
%
% Usage: index = calcIndexAveraged(monthlyAnomalies,latGrid,lonGrid,gridArea,latBoundaries,lonBoundaries)
% index: Time series of area averages (the "climate index").
% Same length as time dimension of monthlyAnomalies.
% monthlyAnomalies: Raw data. Monthly anomalies.
% Dimensions: lon x lat x time
% latGrid: Latitude specification for monthlyAnomalies.
% Dimensions: lon x lat
% lonGrid: Same as latGrid, but for longitude.
% gridArea: Indicates the area of the individual grid boxes.
% Dimensions: lon x lat
% latBoundaries: Vector of length 2 that specifies the latitude boundaries of the area.
% latBoundaries(1): Minimum latitude
% latBoundaries(2): Maximum latitude
% lonBoundaries: Same as latBoundaries but for longitude.
% -------- %
% CHECKING %
% -------- %
% Grid Sizes
if ( ( size(monthlyAnomalies,1) ~= size(latGrid,1) ) | ...
( size(monthlyAnomalies,1) ~= size(lonGrid,1) ) | ...
( size(monthlyAnomalies,2) ~= size(latGrid,2) ) | ...
( size(monthlyAnomalies,2) ~= size(lonGrid,2) ) )
error('Dimension mismatch between latGrid, lonGrid, and monthlyAnomalies!');
end
% Longitude format: grid
if ( max(max(lonGrid)) > 180 )
% disp('Reformatting: Maximum grid longitude > 180');
lonGrid(lonGrid > 180) = lonGrid(lonGrid > 180) - 360;
end
% Longitude format: boundaries
if ( max(lonBoundaries) > 180 )
% disp('Reformatting: Maximum boundary longitude > 180');
lonBoundaries(lonBoundaries > 180) = lonBoundaries(lonBoundaries > 180) - 360;
end
% ------------------------------------------------ %
% FIND GRID POINTS CORRESPONDING TO THE GIVEN AREA %
% ------------------------------------------------ %
% work with logical arrays the same size as the data
isLat = ( ( latGrid >= latBoundaries(1) ) & ( latGrid <= latBoundaries(2) ) );
% does the longitudinal specification cross the date line?
if ( ( lonBoundaries(1) > lonBoundaries(2) ) & ...
( sign(lonBoundaries(1)) == 1 ) & ( sign(lonBoundaries(2)) == -1 ) )
% warning('Specified longitude boundaries cross the date line.');
isLon_west = ( ( lonGrid >= lonBoundaries(1) ) & ( lonGrid <= 180 ) );
isLon_east = ( ( lonGrid >= -180 ) & ( lonGrid <= lonBoundaries(2) ) );
isLon = ( isLon_west | isLon_east );
% no crossing: first entry is smaller than the second one
else
isLon = ( ( lonGrid >= lonBoundaries(1) ) & ( lonGrid <= lonBoundaries(2) ) );
end
% --------------- %
% CALCULATE INDEX %
% --------------- %
% AREA MASK
% initialize the mask:
mask = ones(size(monthlyAnomalies));
% multiply the (3D) mask with the logical array for the area:
mask = bsxfun(@times,mask,(isLat & isLon));
% set zeros to NaN:
mask(mask == 0) = NaN;
% mask anomalies:
maskedAnomalies = monthlyAnomalies.*mask;
% THE INDEX
% total area of the region: Not a scalar but already of the same length as the
% resulting time series --> can calculate the index time series
% in one calculation step
totalArea = ones(size(monthlyAnomalies,3),1).*sum(gridArea(isLat & isLon));
% multiply anomalies with grid box area:
weighted = bsxfun(@times,maskedAnomalies,gridArea);
% THE INDEX:
% area average: sum of the weighted anomalies divided by the total area
index = squeeze(nansum(nansum(weighted,1),2))./totalArea;
function feedback_strength = calculateFeedback(forcing,response,analysis_type)
% Calculate the "feedback strength" (co-variability) between two variables
%
% SYNTAX:
%
% feedback_strength = calculateFeedback(forcing,response,analysis_type)
%
% OUTPUT:
% feedback_strength: Scalar, "feedback strength" between the >forcing<
% and >response< datasets, based on the chosen >analysis_type<.
%
%
% INPUT:
% forcing: Vector, forcing dataset ("forcing" of the "feedback").
% response: Vector of the same size as >forcing<, "response" of the "feedback"
% analysis_type: Char, specifies how the "feedback" strength is diagnosed.
% Either of
% "acc": Anomaly correlation, not suitable for very short datasets
% "reg": reguar linear regression, where >response< is regressed on >forcing<
% "rreg": robust lineat regression, where >response< is regressed on >forcing<
% --------------- %
% CHECK ARGUMENTS %
% --------------- %
% do the datasets have the same size?
size_forcing = size(forcing);
size_response = size(response);
same_size = sum(~( size_forcing == size_response ));
if same_size ~= 0