Commit feb3538f authored by Claas Faber's avatar Claas Faber
Browse files

added *.m files to repository

parent 274e102f
Loading
Loading
Loading
Loading

README.txt

0 → 100644
+177 −0
Original line number Diff line number Diff line
####################################################################
#                                                                  #
#   ------------------------------------------------------------   #
#   A COMPARISON OF THE ATLANTIC AND PACIFIC BJERKNES FEEDBACKS:   #
#   SEASONALITY, SYMMETRY, AND STATIONARITY                        #
#   ------------------------------------------------------------   #
#                                                                  #
#   T. DIPPE, J. F. LÜBBECKE, R. J. GREATBATCH                     #
#                                                                  #
#   in                                                             #
#                                                                  #
#   JOURNAL OF GEOPHYSICAL RESEARCH: OCEANS                        #
#                                                                  #
#                                                                  #
#   Readme for using the Matlab scripts                            #
#                                                                  #
####################################################################

# --------- #
# TECHNICAL #
# --------- #

Matlab version used for all analysis: Matlab 2018


# ------------- #
# USED DATASETS #
# ------------- #

- AVISO-SSH:        https://www.aviso.altimetry.fr/en/home.html

- ERSST:            https://www.ncdc.noaa.gov/data-access/marineocean-data/extended-reconstructed-sea-surface-temperature-ersst-v5

- ERA-Interim USTR: https://www.ecmwf.int/en/forecasts/datasets/archive-datasets/reanalysis-datasets/era-interim

- ERA-40 USTR:      https://apps.ecmwf.int/datasets/data/era40-moda/

- ORAS4 data:       http://icdc.cen.uni-hamburg.de/projekte/easy-init/easy-init-ocean.html


# ------- #
# GENERAL #
# ------- #

# ANALYSIS USING A BUNDLE OF SCRIPTS
# ++++++++++++++++++++++++++++++++++

Many scripts will be made up of a group of three to four scripts. These are:

- <name>.m: The main script that does the principal calculations

- <name>_settings.m: Prescribing all kinds of settings, including paths and
  various plotting options

- <name>_plots.m or similar: Plots

- <name>_wrapper.m: A wrapper around the main script that allows looping over,
  for example, feedback elements or basins

For a given script <name>, I will refer to the bundle as <name>*.m


# ZONAL ANALYSIS VERSUS INDIVIDUAL INDICES
# ++++++++++++++++++++++++++++++++++++++++

All analyses are based on indices. However, I distinguish between "zonal
analyses" and "individual indices". Datasets for the zonal analyses contain a
matrix that holds time series of indices that correspond to a sliding index
along the equator. The zonal analysis is set up in
zonal_analysis_region_definition.m (for the published paper: 4 deg lon x
4 deg lat boxes that slide along the equator from a fixed western boundary
towards the eastern boundary), an the collection of sliding indices is
calculated with calculate_equatorial_indices*.m. These datasets should be stored
in the <path where data for zonal analysis is stored>.

In contrast, individual indices refer to just a single index. Single indices are
calculated with the function calcIndexAveraged() and should be stored in the
<path were individual indices are stored>.

Additionally, raw SST data might be required, such as ERSST. These datasets
should be stored in the <path where SST data is stored>.


# --------------- #
# LIST OF SCRIPTS #
# --------------- #

# FUNCTIONS
# +++++++++

- bootstrapFeedbackStrength() and bootstrapFeedbackStrengthDiff(): Significance
               test for feedback strengths and their differences.

- calcAnomalies(): Calculates the anomalies, and allows for variable detrending

- calcIndexAveraged(): Calculates an index, requires a file that describes the
                       grids that the variables are given on, including the
                       spatial extent of each grid cell to apply the correct
                       spatial averaging. (The grid files are stored in my case
                       in the file modelGrids.mat, which contains a structure
                       for each grid, which in turn stores information such as
                       box_grid_area.)

- calculateFeedback(): Perform a robust regression to estimate the strength of a
                       given feedback element.

- diagnoseLag(): Diagnose the lag at which the feedback strength between two
                 variables is maximum.

- eventCharacteristics(): Identifies events in a time series and returns various
                          statistical measures about them. This function is the
                          backbone of the SST event analysis in the Discussion.

- testValue(): Given a distribution, a significance level and a test value,
               discern whether the test value sits on the "significant" tails of
               the distribution.


# (More or less) PROJECT-WIDE SETTINGS
# ++++++++++++++++++++++++++++++++++++

- atlantic_asymmetric_bjf_plot_settings.m contains general settings
  for plots that will be called for a number of analysis scripts. Customize the
  colorbar handles in this script. However, the script is not used uniformly
  across all analysis scripts. Sorry for the inconvenience.

- zonal_analysis_region_definition.m defines the properties of the zonal
  analysis, i.e. the total zonal extent of the analysis region, as well as the
  width of zonal bins that will be used to calculate the equatorial indices.


# PREPARING THE DATA: Binning data into sliding indices along the equator
# ++++++++++++++++++

- calculate_equatorial_indices*.m: Calculate indices for each zonal bin defined
  in zonal_analysis_region_definition.m. These datasets are the basis for most
  of the analysis. They should be stored in the <path where data for zonal
  analysis is stored>.

Date vectors are expected to be Matlab date vectors (see
https://de.mathworks.com/help/matlab/ref/datevec.html).


# ANALYSIS
# ++++++++

Fig. 1: compare_oras4_ersst*.m

Fig. 2: diagnose_zonal_feedback_lags.*

<Fig. 3: Created in MacOS's Keynote>

Fig. 4-6: lagged_feedback_strength*.m to diagnose the strengths of the feedback
          elements, and diagnose_feedback_strength_difference_significance*.m to
          test the significance

Fig. 7: total_bjerknes_feedback_from_regression*.m

Fig. 8: compare_aviso_oras4_strengths_Atl3_Nino34*.m

Figs. 9-11: decadal_feedback_variations*.m

Fig. 12: zonal_enso_properties*.m

Fig. 13: decadal_sst_event_variations*.m

Figs. A1-A2: calculate_instantaneous_feedback*.m, and
             compare_lagged_instantaneous_feedback_strengths*.m

Figs. A3-A5: total_bjerknes_feedback_from_regression*.m


# ------- #
# CONTACT #
# ------- #

Contact me via tina.dippe@gmail.com if you have questions.
+105 −0
Original line number Diff line number Diff line
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%                                                                    %
%    COMMON SETTINGS FOR PLOTS OF THE ATLANTIC ASYMMETRY ANALYSIS    %
%                                                                    %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

% AXES POSITION
% +++++++++++++

    % scaling of the figure?
    scale_figure_height = 0.8;
    scale_figure_width  = 1;

    % position of the axes
    x_axes_offset      = 0.02;
    y_axes_offset      = 0.05;
    width_axes_offset  = -0.1;
    height_axes_offset = -0.06;

    % hard-wire axes position
    axes_position = [0.13 0.11 0.725 0.7476];

% COLOURS AND COLORBAR
% ++++++++++++++++++++

    my_white  = [254 254 254]./255;
    my_left   = [0 76 153]./255;
    my_middle = [0 102 204]./255;
    my_right  = [0 255 255]./255;

    my_teal       = [0 149 186]./255;
    my_light_teal = [153 255 255]./255;
    my_dark_teal  = [0 47 84]./255;

    my_blue      = [0 102 204]./255;
    my_dark_blue = [0 51 102]./255;
    my_red       = [255 51 51]./255;
    my_yellow    = [255 255 51]./255;
    my_cyan      = [0 255 255]./255;
    my_purple    = [153 0 153]./255;

    % colorbar handles
    sunset   = <custom colorbar handle for absolute values>;
    plasma   = <custom colorbar handle for absolute values>;
    centered = <custom colorbar handle for differences>;

    % colours to mark negative lags
    lag_colour        = [0.35 0.35 0.35];
    lag_border_colour = my_white;
    lag_stretch_width        = 6;
    lag_stretch_border_width = 9;

% HIGHLIGHT REGIONS
% +++++++++++++++++

    % which background regions to plot? Define longitudinal extent here
    % i.e., mark WAtl, Atl3 in the Atlantic, and Nino4, Nino34, and Nino12
    % in the Pacific
    if strcmp(basin_name,'Pacific')

        % longitudinal boundaries of standard regions
        highlight_regions = [...
                160 210; ...
                190 240; ...
                270 280];

        % and the names of the regions
        region_names = {'Nino4','Nino34','Nino12'};

    else
        highlight_regions = [...
                -40 -20; ...
                -20 0];
        region_names = {'WAtl','Atl3'};
    end

    % y values for the patches; width: 0.25 months, gap: 0.2 months, maximum y value: 12.5
    if strcmp(basin_name,'Pacific')
        highlight_y_values = [11.8 12.05; 12.25 12.5; 11.8 12.05];
    else
        highlight_y_values = [12.25 12.5; 12.25 12.5];
    end

    region_colours = [my_teal; my_light_teal; my_dark_teal];

% MISCELLANEOUS
% +++++++++++++

    % font size
    scale_font_size = 2;

    % size of the significance crosses
    size_significance_marker = 8;

    % line widths for the significance markers
    line_width_significance_markers = 0.5;
    line_width_significance_outline = 4.5;

    % month shorts
    month_ids = {'J','F','M','A','M','J','J','A','S','O','N','D'};

    % mute warning from robust regression
    % query the last warning, including it's ID with
    % w = warning('query','last').
    warn_id = 'stats:statrobustfit:IterationLimit';
+88 −0
Original line number Diff line number Diff line
function strength_values = bootstrapFeedbackStrength(data1,data2,sample_size,n_bootstraps,analysis_type)

% Bootstrap two time series and produce a historgram of
% slope parameters of a linear regression.
%
% USAGE: strength_values = bootstrapRegression(data1,data2,sample_size,n_bootstraps)
%
%     OUTPUT
%         strength_values: bootstrapped slope values
%
%     INPUT
%         data1:         Time series 1, "x-values" or predictors for the regression model
%         data2:         Time series 2, "y-values" or predictands of the regression model
%         sample_size:   How many samples should be drawn from the
%                        full data?
%         n_bootstraps:  How often should the bootstrapping be repeated?
%         analysis_type: Which analysis method should be used to diagnose the
%                        feedback strength? acc (ACC), reg (regular linear regression,
%                        that regresses >data2< on >data1<), or rreg (robust
%                        regression that regresses >data2< on >data1<)

% ----------- %
% PREPARATION %
% ----------- %

    % assign an ID to the analysis type
    % for use with switch
    if strcmp(analysis_type,'acc')
        analysis_id  = 1;
    elseif strcmp(analysis_type,'reg')
        analysis_id  = 2;
    elseif strcmp(analysis_type,'rreg')
        analysis_id  = 3;
    else
        error_string = sprintf('Analysis type >%s< not known.\nChoose either of acc, reg, or rreg.',analysis_type);
        error(error_string);
    end

% --------- %
% BOOTSTRAP %
% --------- %

    % total pool of data
    n_data       = length(data1);
    all_indices  = 1:n_data;

    % storage
    strength_values = zeros(1,n_bootstraps);

    % actual bootstrapping
    for c_strap = 1:n_bootstraps

        % randomly draw samples
        random_sample = datasample(all_indices,sample_size,'Replace',false);

        sample_data1 = data1(random_sample);
        sample_data2 = data2(random_sample);

        % calculate feedback strengths
        switch analysis_id

            % correlation
            case 1

                % acc
                current_feedback = corrcoef(sample_data1,sample_data2);
                current_feedback = current_feedback(1,2);

            % normal linear regression
            case 2

                regression_parameters = polyfit(sample_data1,sample_data2,1);
                current_feedback = regression_parameters(1);

            % robust regression
            case 3

                rregression_parameters = robustfit(sample_data1,sample_data2);
                current_feedback = rregression_parameters(2);  % yes, the order is different from polyfit ...

        % switch: calculate feedback strength
        end

        % assign regression value
        strength_values(c_strap) = current_feedback;

    % for: repeat bootstrapping
    end 
+99 −0
Original line number Diff line number Diff line
function strength_diffs = bootstrapFeedbackStrengthDiff(data1,data2,sample_size,n_bootstraps,analysis_type)

% Bootstrap tuples of two time series two time series and produce a historgram of
% of differences of the feedback strength diagnosed from either ACC, linear regression
% or robust regression. The differences are calculated
% as follows: Randly sample m tuples from the time series and calculate the slope
% parameter; then use the remaining n-m tuples, again calculate the slope parameter,
% and save the difference between the two. n is total length of the time dataset,
% and m is equivalent with the parameter sample_size.
%
% USAGE: strength_diffs = bootstrapRegression(data1,data2,sample_size,n_bootstraps)
%
%     OUTPUT
%         strength_diffs: bootstrapped slope difference values
%
%     INPUT
%         data1:         Time series 1, "x-values" or predictors for the regression model
%         data2:         Time series 2, "y-values" or predictands of the regression model
%         sample_size:   How many samples should be drawn from the
%                        full data?
%         n_bootstraps:  How often should the bootstrapping be repeated?
%         analysis_type: Which analysis method should be used to diagnose the
%                        feedback strength? acc (ACC), reg (regular linear regression,
%                        that regresses >data2< on >data1<), or rreg (robust
%                        regression that regresses >data2< on >data1<)

% ----------- %
% PREPARATION %
% ----------- %

    % assign an ID to the analysis type
    % for use with switch
    if strcmp(analysis_type,'acc')
        analysis_id  = 1;
    elseif strcmp(analysis_type,'reg')
        analysis_id  = 2;
    elseif strcmp(analysis_type,'rreg')
        analysis_id  = 3;
    else
        error_string = sprintf('Analysis type >%s< not known.\nChoose either of acc, reg, or rreg.',analysis_type);
        error(error_string);
    end

% --------- %
% BOOTSTRAP %
% --------- %

    % total pool of data
    n_data       = length(data1);
    all_indices  = 1:n_data;

    % storage
    strength_diffs = zeros(1,n_bootstraps);

    % actual bootstrapping
    for c_strap = 1:n_bootstraps

        % randomly draw samples, no replacement
        random_sample = datasample(all_indices,sample_size,'Replace',false);

        % opposite sample
        opposite_sample = setdiff(all_indices,random_sample);

        % calculate feedback strength differences
        switch analysis_id

            % correlation
            case 1

                % acc
                random_feedback   = corrcoef(data1(random_sample),    data2(random_sample));
                opposite_feedback = corrcoef(data1(opposite_feedback),data2(opposite_feedback));

                feedback_difference = random_feedback(1,2)-opposite_feedback(1,2);

            % normal linear regression
            case 2

                random_pp   = polyfit(data1(random_sample),data2(random_sample),1);
                opposite_pp = polyfit(data1(opposite_sample),data2(opposite_sample),1);

                feedback_difference = random_pp(1)-opposite_pp(1);

            % robust regression
            case 3

                random_rreg   = robustfit(data1(random_sample),data2(random_sample));
                opposite_rreg = robustfit(data1(opposite_sample),data2(opposite_sample));

                feedback_difference = random_rreg(2)-opposite_rreg(2);

        % switch: calculate feedback strength
        end

        % assign regression value
        strength_diffs(c_strap) = feedback_difference;

    % for: repeat bootstrapping
    end

calcAnomalies.m

0 → 100644
+139 −0
Original line number Diff line number Diff line
function [anomalies,cropped_dates] = calcAnomalies(data,date_vector,varargin)

% Calculates anomalies relative to a user-specified trend
% and to the climatological seasonal cycle.
%
% USAGE: [anomalies,cropped_dates] = calcAnomalies(data,date_vector)
%        [anomalies,cropped_dates] = calcAnomalies(__,trend)
%        [anomalies,cropped_dates] = calcAnomalies(__,start_year,stop_year);
%        [anomalies,cropped_dates] = calcAnomalies(__,trend,start_year,stop_year);
%
%        data and anomalies are vectors. date_vector is a Matlab date vector.
%        cropped_dates is the cropped date vector.
%
% Optional parameter are
%
%        trend:      trend that is to be subtracted from the data
%                    prior to the removal of the seasonal cycle.
%                    Vector that has the same length as data.
%                    Default: When no trend is provided, the linear
%                    trend is calculated and subtracted from
%                    the data.
%
%        start_year: When should anomaly calculation begin?
%
%        stop_year:  When should it stop?
%
% IF YOU DON'T WHISH TO CALCULATE A TREND,
% specify a "dummy trend" that consists of zeros and
% has the same size as the data.

% ----------- %
% PARSE INPUT %
% ----------- %

    n_additional = length(varargin);

    % define defaults:
        trend_data = [];
        start_year = min(date_vector(:,1));
        stop_year  = max(date_vector(:,1));

    switch n_additional
        case 0
        case 1
            trend_data = varargin{1};
        case 2
            start_year = varargin{1};
            stop_year  = varargin{2};
        case 3
            trend_data = varargin{1};
            start_year = varargin{2};
            stop_year  = varargin{3};
        otherwise
            error('Wrong amount of input variables specified.');
    end

% -------------- %
% CONSTRAIN TIME %
% -------------- %

    is_time =  ( date_vector(:,1) >= start_year ) & ( date_vector(:,1) <= stop_year );

    % cropping
    date_vector = date_vector(is_time,:);
    data        = data(is_time);

    % need to crop the trend data as well if it is not empty
    if ~isempty(trend_data)
        trend_data = trend_data(is_time);
    end

    % for returning
    cropped_dates = date_vector;

    % use datenums as x-axis
    x_values = datenum(date_vector);

% ------------------ %
% DATA QUALITY CHECK %
% ------------------ %

    is_okay  = ~(isnan(data));

    % anything else but NaNs?
    if ( isempty(data(is_okay)) )
        anomalies = nan(size(data));
        return
    end

% ------------ %
% DETREND DATA %
% ------------ %

    if isempty(trend_data)

%          disp('calculating linear trend');

        % need to have NaN-free data for regression
        valid_data  = data(is_okay);
        valid_times = x_values(is_okay);

        % fitting: use only good values
        vals_poly = polyfit(valid_times,valid_data,1);

        % evaluation: use ALL values
        trend_data = polyval(vals_poly,x_values);

    % trend data compatible with data?
    else
        if ( length(data) ~= length(trend_data) )
            error('Data and trend data have different sizes.');
        end
    end

    % assign anomalies relative to the trend to
    % the correct piece of monthlyAnomalies
    detrended = data - trend_data;

% ------------------------------------ %
% REMOVE CLIMATOLOGICAL SEASONAL CYCLE %
% ------------------------------------ %

    anomalies = zeros(size(data));
    seasonal_cycle = zeros(1,12);

    for month = 1:12

        % timing
        is_month = ( date_vector(:,2) == month );


        % climatology
        climatology = nanmean(detrended(is_month));

        % assign anomalies
        anomalies(is_month)   = detrended(is_month)-climatology;
        seasonal_cycle(month) = nanmean(data(is_month));

    end
Loading