# Leave this cell as we need the analysis function to be copied for the run to work later
from pathlib import Path
from importlib import resources
with resources.path("ilamb3") as path:
    func = "my_analysis_function.py"
    dest_path = path / "analysis" / func
    dest_path.unlink(missing_ok=True)
    src_path = Path("assets") / func
    try:
        src_path.copy(dest_path)
    except FileExistsError:
        pass

Analyses¶

An analysis module in ilamb3 is a function that takes in a benchmarking and model dataset, performs mathematical operations between the two, and outputs a dataframe and datasets with results that can be used to build tables and produce plot visualizations.

An analysis can be general, or it can be specialized to use particular variables, regions, product types (gridded and/or point), or dimensions (time, space, etc.). ILAMB employs both. For example, the bias and rmse analysis functions are general and can be applied to a wide range datasets, while the runoff sensitivity and nbp analysis functions are specialized and require specific variables. Further, functions can be simple or robust. A simple function is prone to breakage, while a robust function contains logic that handles various input data and edge cases. The process of writing an analysis function follows a natural progression:

Write a simple analysis function that performs the mathematical operation you want to implement
Test the function on a variety of datasets to ensure it works before making it more robust
Enhance the analysis function to be more robust to handle differences in dataset structure
Integrate the analysis function into ilamb3
Demonstrate how to use the analysis function in an ilamb3 benchmarking study

Before you start¶

Before getting started, you should (1) clone the ilamb3 software from the GitHub repository and set up the environment in development mode, and (2) have in mind an analysis objective that does not already exist in ilamb3. For this tutorial, our objective is to write an analysis function that returns the absolute relative error between a variable in the reference and comparison datasets.

\varepsilon = \frac{|v_{\mathrm{com}} - v_{\mathrm{ref}}|}{|v_{\mathrm{ref}}|}.

(1)

We will use ref and com respectively as abbreviations and variable names throughout this tutorial.

Write a simple analysis function¶

Analysis functions most often follow a general prototype:

def my_analysis_func(ref: xr.Dataset, com: xr.Dataset) -> tuple[pd.DataFrame, xr.Dataset, xr.Dataset]:

This says that the analysis function should:

Take two xr.Dataset objects as inputs, one as a reference (ref) and another as a comparison (com).
Return a pd.DataFrame that will contain any scalar or score values generated by the analysis. The dataframe can have as many rows/scalars as you like, but the ilamb3 system will expect specific columns, which we will explain below.
Return two xr.Dataset objects that will contain any intermediate datasets with information that you wish to plot. If your analysis does not have such datasets, you can return empty xr.Dataset objects.

Below is a simple implementation of our absolute relative error analysis function that follows this prototype. We will later learn how to generalize this function to be more flexible and robust.

import numpy as np
import pandas as pd
import xarray as xr

def absrelerr(ref: xr.Dataset, com: xr.Dataset, varname: str) -> tuple[pd.DataFrame, xr.Dataset, xr.Dataset]:

    # calculate mean absolute relative error
    epsilon = (np.abs(ref[varname] - com[varname]) / np.abs(ref[varname])).mean()

    # build the output dataframe
    df = pd.DataFrame(
        [
            {
                "source": "Comparison",
                "region": "None",
                "analysis": "Relative Error",
                "name": "Mean Absolute Relative Error",
                "type": "scalar",
                "units": "1",
                "value": float(epsilon),
            }
        ]
    )

    return df, xr.Dataset(), xr.Dataset()

We added a keyword argument, varname, to specify the variable on which to perform the analysis. If the user doesn’t provide a variable name, we would have to hard-code a variable in the function. Next, we calculated the absolute relative error on that variable. Then, we built a dataframe that contains information that ILAMB requires. The dataframe should have the following columns:

source: The data source that the scalar is referencing: either the Reference or Comparison dataset. If your scalar is representative of both datasets, you should choose Comparison as the source because inherently the scalar is meant to evaluate the performance of the comparison dataset against the reference dataset.
region: The region label over which the scalar applies. In general, ilamb3 analyses can be performed over multiple regions. For now, we will set this as None to indicate no region is specified.
analysis: The name of the analysis. There can be multiple scalars per analysis, but this column should be the same for all scalars that belong to the same analysis. In this case, we call it Relative Error.
name: The name of the scalar generated for this analysis. The name should be descriptive, such as Mean Absolute Relative Error, because it will appear in the scalars table displayed in the ilamb3 dataset pages.
type: The type of scalar being reported, either scalar or score.
- A scalar is any statistic you compute; they could represent errors, deviations, means, etc. It can be any value that you think is important to report, and it can be in any units. They will be displayed in the scalars table on the dataset pages.
- A score is a synthesis of scalars to make it easier to compare performance across models and datasets. They are dimensionaless and values range from 0 to 1 where 1 is considered perfect. If you provide scores, ILAMB will use them to produce an additional “Overall Score.”
units: The units of the scalar, which in this case is dimensionless.
value: The value of the scalar.

Test the simple analysis function¶

Before we continue, it is a good idea to test that the function returns something sensible. You may have data available on your machine that you would like to use for testing, but here we will generate synthetic reference and comparison datasets using a function from our testing suite. This is a good practice as these tests can be later integrated into our suite of test functions to protect your new functionality.

from ilamb3.tests.test_run import generate_test_dset

ref = generate_test_dset(nyear=2, nlat=10, nlon=20, name="tas", unit="degC")
com = generate_test_dset(nyear=2, nlat=10, nlon=20, name="tas", unit="degC", seed=2)

df, _, _ = absrelerr(ref, com, "tas")
df

The returned dataframe contains a single row with the relative error between the reference and comparison datasets. The value may seem high, but our test data is coarse and random, so this is to be expected. We will look at more comprehensive validation later.

Enhance the analysis function to be more robust¶

You may have already noticed some potential points of failure in this function. For example, by subtracting com[varname] from ref[varname], we assume that the input datasets are indexed identically. In practice, the reference and comparison datasets are not structured the same. Here are differences we often see:

Both datasets contain a time dimension but use different calendars.
The temporal coverage of each dataset is different. For example, it is not meaningful to report relative error between observations that start in 1980’s and model data that start in the 1850’s.
The datasets are in different units.
The datasets may both be gridded but call their latitude and longitude dimensions different names (i.e. lon versus longitude or Longitude).
Both datasets are global, but their grids (i.e., latitude and longitude values) do not align with each other. Even if they are off by a tiny margin, xarray will treat them as different and yield nan results.

You may anticipate many other potential weak points in the code. However, ilamb3 provides some functions that we can use to handle such issues. Primarily, you can find these functions inside two ilamb3 package modules:

ilamb3.dataset contains functions that operate on a single dataset.
ilamb3.compare contains functions which embed logic for comparing two or more datasets.

To use these functions, import them from the ilamb3 package:

import ilamb3.compare as cmp
import ilamb3.dataset as dset

Below, we incorporate 5 one-liner functions from these modules to make the analysis function more robust to errors or unexpected behavior. The functions we incorporated tackle the 5 issues we listed above, plus some additional pitfalls from our simple function.

# drop non-overlapping time and ensure units/coords are aligned
ref, com = cmp.make_comparable(ref, com, varname)

# ensure a time dimension exists
if dset.is_temporal(ref[varname]):
    # handle possible calendar mismatch by calculating integrated time means
    ref[varname] = dset.integrate_time(ref, varname, mean=True)
if dset.is_temporal(com[varname]):
    com[varname] = dset.integrate_time(com, varname, mean=True)

# nest the spatial grids as explained in Collier, et al., JAMES, 2018
ref, com = cmp.nest_spatial_grids(ref, com)

# compute epsilon per gridcell
eps_gridded = np.abs(ref[varname] - com[varname]) / np.abs(ref[varname]).clip(min=1e-10)

# compute an area-weighted mean of the per-gridcell epsilon values
epsilon = dset.integrate_space(eps_gridded, varname, mean=True)

make_comparable is a catch-all comparison function that we use in many of our analysis functions. It removes non-overlapping time periods from the datasets, ensures matching units, ensures matching longitude intervals (either [-180, 180] or [0, 360]), as well as other things.
is_temporal returns True if the time dimension is present while handling variations in the name of the time dimension (i.e. time vs Time) between datasets.
integrate_time integrates away the time dimension by taking a mean. This circumvents the need to handle calendar differences. By taking this mean, we imply that we are less interested in any temporal differences, which should be noted in the docstring of our new analysis routine.
nest_spatial_grids spatially aligns the datasets using the methodology outlines in Collier, et al. 2018 to handle potential resolution differences. You could also opt to interpolate the grids to a common resolution.
integrate_space with mean=True returns the area-weighted mean of per-gridcell epsilon values. Now we have both a spatial version called eps_gridded that we can use to plot spatial maps, as well as a scalar version called epsilon that we can report in the dataframe. This is an example of how you can return intermediate datasets to be plotted, as well as scalars to be reported in the dataframe.
We add a call to clip() to the eps_gridded computation and specify a small minimum value to avoid division by zero errors.

Our additions have introduced one more potential bug: we integrate over space, but one or both of the datasets could be a site collections rather than gridded (e.g., the Fluxnet collection). While we could add logic to the function that handles point data, let’s rather restrict this function to only work on gridded datasets. The robust analysis routine should detect and flag when a method isn’t appropriate for the input data and raise an error called ilamb3.exceptions.AnalysisNotAppropriate. Let’s add the following lines to the top of the function:

if dset.is_site(reference[varname]) or dset.is_site(comparison[varname]):
    raise AnalysisNotAppropriate()

Expand the cell below to see all the changes we have made thus far. The difficult part of writing an analysis function is now complete. We have a function that performs the mathematical operation we want, and it is robust to various input data issues. The next step is to integrate this function into ilamb3 so that it can be used in benchmarking studies.

import numpy as np
import pandas as pd
import xarray as xr

import ilamb3.compare as cmp
import ilamb3.dataset as dset
from ilamb3.exceptions import AnalysisNotAppropriate

def absrelerr(
    ref: xr.Dataset, com: xr.Dataset, varname: str
) -> tuple[pd.DataFrame, xr.Dataset, xr.Dataset]:

    # limit analysis to gridded data only
    if dset.is_site(ref[varname]) or dset.is_site(com[varname]):
        raise AnalysisNotAppropriate()

    # drop non-overlapping time and ensure units/coords are aligned
    ref, com = cmp.make_comparable(ref, com, varname)

    # ensure a time dimension exists
    if dset.is_temporal(ref[varname]):
        # handle possible calendar mismatch by calculating integrated time means
        ref[varname] = dset.integrate_time(ref, varname, mean=True)
    if dset.is_temporal(com[varname]):
        com[varname] = dset.integrate_time(com, varname, mean=True)

    # nest the spatial grids as explained in Collier, et al., JAMES, 2018
    ref, com = cmp.nest_spatial_grids(ref, com)

    # compute epsilon per gridcell
    eps_gridded = np.abs(ref[varname] - com[varname]) / np.abs(ref[varname]).clip(min=1e-10)

    # compute an area-weighted mean of the per-gridcell epsilon values
    epsilon = dset.integrate_space(eps_gridded, varname, mean=True)

    # build up the output dataframe
    df = pd.DataFrame(
        [
            {
                "source": "Comparison",
                "region": "None",
                "analysis": "Relative Error",
                "name": "Mean Absolute Relative Error",
                "type": "scalar",
                "units": "1",
                "value": float(epsilon),
            }
        ]
    )

    return df, xr.Dataset(), xr.Dataset()

Integrating into `ilamb3`¶

To use this function in ilamb3, it must be incorporated into a Python class that follows a specific template. This is because of two main reasons:

ilamb3 needs to know beforehand what data will be used for the analysis. Our particular analysis needs to communicate that varname must be provided in order to execute.
To call analyses consistently in ilamb3, they should always have the same signature, i.e., requiring a reference and comparison dataset. But this does not leave room for extra parameters that may be important, like varname, so we need to move it elsewhere.

To this end, analyses are not merely functions, but rather an abstract base class (ABC), which we call an ILAMBAnalysis. Lets change the function we wrote above into an ILAMBAnalysis and see how it works. If you are unfamiliar with classes in python, review this introductory tutorial.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
from pathlib import Path

import numpy as np
import pandas as pd
import xarray as xr

import ilamb3.compare as cmp
import ilamb3.dataset as dset
from ilamb3.analysis.base import ILAMBAnalysis  # import the ABC from ilamb
from ilamb3.exceptions import AnalysisNotAppropriate

class absrelerror(ILAMBAnalysis):  # define a class that inherits from the ABC
    def __init__(self, required_variable: str, **kwargs: Any):
        self.varname = required_variable

    def __call__(
        self,
        ref: xr.Dataset,
        com: xr.Dataset,
    ) -> tuple[pd.DataFrame, xr.Dataset, xr.Dataset]:
        varname = self.varname

        # limit analysis to gridded data only
        if dset.is_site(ref[varname]) or dset.is_site(com[varname]):
            raise AnalysisNotAppropriate()

        # drop non-overlapping time and ensure units/coords are aligned
        ref, com = cmp.make_comparable(ref, com, varname)

        # ensure a time dimension exists
        if dset.is_temporal(ref[varname]):
            # handle possible calendar mismatch by calculating integrated time means
            ref[varname] = dset.integrate_time(ref, varname, mean=True)
        if dset.is_temporal(com[varname]):
            com[varname] = dset.integrate_time(com, varname, mean=True)

        # nest the spatial grids as explained in Collier, et al., JAMES, 2018
        ref, com = cmp.nest_spatial_grids(ref, com)

        # compute epsilon per gridcell
        eps_gridded = np.abs(ref[varname] - com[varname]) / np.abs(ref[varname]).clip(min=1e-10)

        # compute an area-weighted mean of the per-gridcell epsilon values
        epsilon = dset.integrate_space(eps_gridded, varname, mean=True)

        # build up the output dataframe
        df = pd.DataFrame(
            [
                {
                    "source": "Comparison",
                    "region": "None",
                    "analysis": "Relative Error",
                    "name": "Mean Absolute Relative Error",
                    "type": "scalar",
                    "units": "1",
                    "value": float(epsilon),
                }
            ]
        )

        return df, xr.Dataset(), xr.Dataset()

Lines that were changed have been highlighted for emphasis. In words, the above code snippet makes the following changes:

We have imported the abstract base class ILAMBAnalysis and made absrelerror a class that inherits from it.
We created a member function named __init__ and moved our varname argument to it.Python has a number of double under (a.k.a dunder) methods that have special behavior. The __init__ method is called when an instance of the class is created, and it allows us to initialize the instance’s attributes. Also note that we have renamed varname to required_variable as this is what ilamb3 will pass into the constructor.
Then, we created a member function named __call__, and we put the content from our function inside. The reference and comparison datasets are passed as arguments to this method---all analysis __call__ methods follow this pattern. The __call__ method will allow us to call an instance of the absrelerror class as if it were a function.

At the moment, it does not yet seem like we have accomplished much by this change. However, if you try to create an instance of this class we have built,

my_analysis = absrelerror("tas")

you should see that Python throws an error:

TypeError: Can't instantiate abstract class absrelerror without an implementation for abstract methods 'name', 'plots', 'required_variables'

The ILAMBAnalysis base class defines a template for all analysis functions. If you don’t implement all of the methods we need, you can’t even create an instance. You will see that the error message is complaining about not having an implementation for two functions. The first of these is name which should return the name of the analysis. This is a human friendly name that will appear in the output data dashboard HTML pages.

    def name(self) -> str:
        return "Relative Error"

The second of these functions is plots, a function that generates plots for the analysis. For the moment, add the following implementation that returns an empty dataframe with the proper columns, and we will return to this later.

    def plots(
        self, df: pd.DataFrame, ref: xr.Dataset, com: dict[str, xr.Dataset], path: Path
    ) -> pd.DataFrame:
        return pd.DataFrame(columns=["name","source","region","analysis","path"])

The last of these is required_variables, a function that returns a list of variables that the analysis needs in the input datasets. So we add another member function that returns varname in a list. ilamb3 will use this internally to query model data and prepare the input datasets for the analysis.

    def required_variables(self):
        return [self.varname]

At this point your analysis function is ready to be integrated into ilamb3. If you would like to see the full code with highlighted changes, expand the following cell.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
from pathlib import Path

import numpy as np
import pandas as pd
import xarray as xr

import ilamb3.compare as cmp
import ilamb3.dataset as dset
from ilamb3.analysis.base import ILAMBAnalysis  # import the ABC from ilamb
from ilamb3.exceptions import AnalysisNotAppropriate

class absrelerror(ILAMBAnalysis):  # define a class that inherits from the ABC
    def __init__(self, required_variable: str, **kwargs: Any):
        self.varname = required_variable

    def __call__(
        self,
        ref: xr.Dataset,
        com: xr.Dataset,
    ) -> tuple[pd.DataFrame, xr.Dataset, xr.Dataset]:
        varname = self.varname

        # limit analysis to gridded data only
        if dset.is_site(ref[varname]) or dset.is_site(com[varname]):
            raise AnalysisNotAppropriate()

        # drop non-overlapping time and ensure units/coords are aligned
        ref, com = cmp.make_comparable(ref, com, varname)

        # ensure a time dimension exists
        if dset.is_temporal(ref[varname]):
            # handle possible calendar mismatch by calculating integrated time means
            ref[varname] = dset.integrate_time(ref, varname, mean=True)
        if dset.is_temporal(com[varname]):
            com[varname] = dset.integrate_time(com, varname, mean=True)

        # nest the spatial grids as explained in Collier, et al., JAMES, 2018
        ref, com = cmp.nest_spatial_grids(ref, com)

        # compute epsilon per gridcell
        eps_gridded = np.abs(ref[varname] - com[varname]) / np.abs(ref[varname]).clip(min=1e-10)

        # compute an area-weighted mean of the per-gridcell epsilon values
        epsilon = dset.integrate_space(eps_gridded, varname, mean=True)

        # build up the output dataframe
        df = pd.DataFrame(
            [
                {
                    "source": "Comparison",
                    "region": "None",
                    "analysis": self.name(),
                    "name": "Mean Absolute Relative Error",
                    "type": "scalar",
                    "units": "1",
                    "value": float(epsilon),
                }
            ]
        )

        return df, xr.Dataset(), xr.Dataset()

    def name(self) -> str:
        return "Relative Error"

    def required_variables(self) -> list[str]:
        return [self.varname]

    def plots(
        self, df: pd.DataFrame, ref: xr.Dataset, com: dict[str, xr.Dataset], path: Path
    ) -> pd.DataFrame:
        return pd.DataFrame(columns=["name","source","region","analysis","path"])

Making `ilamb3` aware of your analysis function¶

When adding a analysis function, you can work in your own directory until you are ready to register it with the ilamb3 system. To do this you will first need to clone the repository:

git clone git@github.com:rubisco-sfa/ilamb3.git
cd ilamb3

and then move the python file with your analysis function in it into the ilamb3/analysis directory. If you have been working in a file named my_function_file.py, then:

mv my_function_file.py ilamb3/analysis/

This will cause ilamb3 to automatically detect the new analysis function and use it when running. We can test this by creating a configure file that instead of running the default analyses, runs just the new function we added. You will need to create a configure file that looks like this:

my_analysis_test.yaml

WECANN-1-0:
  analyses:
  - absrelerror
  sources:
    gpp: WECANN-1-0/obs4MIPs_ColumbiaU_WECANN-1-0_mon_gpp_gn_v20260302.nc
  variable_cmap: Greens

You will need some model data and a CSV file database as explained in the quickstart guide. Here we are reusing the CanESM5.csv database we describe there. Then we can run the analysis using the ilamb run command:

ilamb run my_analysis_test.yaml --model-db CanESM5.csv --output-dir my_analysis_test

2026-05-14 23:55:22.403 | INFO     | ilamb3.run:run_single_block:324 - Start of WECANN-1-0 | CanESM5

You should see the previous informational output lines when you run your command. If an error occurs, you will see the stack trace explaining the error. If the code ran successfully, you should be able to see the output files in the my_analysis_test directory:

ls -1 ./my_analysis_test/WECANN-1-0/

CanESM5.csv
CanESM5.log
CanESM5.nc
Reference.nc
WECANN-1-0.html
post.log

At the moment, our analysis function does not do very much. To verify that your function is correct, you could open the HTML file in that directory in a web browser. Alternatively, you could use the cat command to view the contents of the CSV file:

cat ./my_analysis_test/WECANN-1-0/CanESM5.csv

source,region,analysis,name,type,units,value
CanESM5,None,Relative Error,Mean Absolute Relative Error [1],scalar,1,0.6108188406836633

References¶

Collier, N., Hoffman, F. M., Lawrence, D. M., Keppel‐Aleks, G., Koven, C. D., Riley, W. J., Mu, M., & Randerson, J. T. (2018). The International Land Model Benchmarking (ILAMB) System: Design, Theory, and Implementation. Journal of Advances in Modeling Earth Systems, 10(11), 2731–2754. 10.1029/2018ms001354

Analyses