Analyses¶
An analysis module in ilamb3 is a function that takes in a benchmarking and model dataset, performs mathematical operations between the two, and outputs a dataframe and datasets with results that can be used to build tables and produce plot visualizations.
An analysis can be general, or it can be specialized to use particular variables, regions, product types (gridded and/or point), or dimensions (time, space, etc.). ILAMB employs both. For example, the bias and rmse analysis functions are general and can be applied to a wide range datasets, while the runoff sensitivity and nbp analysis functions are specialized and require specific variables. Further, functions can be simple or robust. A simple function is prone to breakage, while a robust function contains logic that handles various input data and edge cases. The process of writing an analysis function follows a natural progression:
Write a simple analysis function that performs the mathematical operation you want to implement
Test the function on a variety of datasets to ensure it works before making it more robust
Enhance the analysis function to be more robust to handle differences in dataset structure
Integrate the analysis function into
ilamb3Demonstrate how to use the analysis function in an
ilamb3benchmarking study
Before you start¶
Before getting started, you should (1) clone the ilamb3 software from the GitHub repository and set up the environment in development mode, and (2) have in mind an analysis objective that does not already exist in ilamb3. For this tutorial, our objective is to write an analysis function that returns the absolute relative error between a variable in the reference and comparison datasets.
We will use ref and com respectively as abbreviations and variable names throughout this tutorial.
Write a simple analysis function¶
Analysis functions most often follow a general prototype:
def my_analysis_func(ref: xr.Dataset, com: xr.Dataset) -> tuple[pd.DataFrame, xr.Dataset, xr.Dataset]:This says that the analysis function should:
Take two
xr.Datasetobjects as inputs, one as a reference (ref) and another as a comparison (com).Return a
pd.DataFramethat will contain any scalar or score values generated by the analysis. The dataframe can have as many rows/scalars as you like, but theilamb3system will expect specific columns, which we will explain below.Return two
xr.Datasetobjects that will contain any intermediate datasets with information that you wish to plot. If your analysis does not have such datasets, you can return emptyxr.Datasetobjects.
Below is a simple implementation of our absolute relative error analysis function that follows this prototype. We will later learn how to generalize this function to be more flexible and robust.
import numpy as np
import pandas as pd
import xarray as xr
def absrelerr(ref: xr.Dataset, com: xr.Dataset, varname: str) -> tuple[pd.DataFrame, xr.Dataset, xr.Dataset]:
# calculate mean absolute relative error
epsilon = (np.abs(ref[varname] - com[varname]) / np.abs(ref[varname])).mean()
# build the output dataframe
df = pd.DataFrame(
[
{
"source": "Comparison",
"region": "None",
"analysis": "Relative Error",
"name": "Mean Absolute Relative Error",
"type": "scalar",
"units": "1",
"value": float(epsilon),
}
]
)
return df, xr.Dataset(), xr.Dataset()We added a keyword argument, varname, to specify the variable on which to perform the analysis. If the user doesn’t provide a variable name, we would have to hard-code a variable in the function. Next, we calculated the absolute relative error on that variable. Then, we built a dataframe that contains information that ILAMB requires. The dataframe should have the following columns:
source: The data source that the scalar is referencing: either theReferenceorComparisondataset. If your scalar is representative of both datasets, you should chooseComparisonas thesourcebecause inherently the scalar is meant to evaluate the performance of the comparison dataset against the reference dataset.region: The region label over which the scalar applies. In general,ilamb3analyses can be performed over multiple regions. For now, we will set this asNoneto indicate no region is specified.analysis: The name of the analysis. There can be multiple scalars per analysis, but this column should be the same for all scalars that belong to the same analysis. In this case, we call itRelative Error.name: The name of the scalar generated for this analysis. The name should be descriptive, such asMean Absolute Relative Error, because it will appear in the scalars table displayed in theilamb3dataset pages.type: The type of scalar being reported, eitherscalarorscore.A scalar is any statistic you compute; they could represent errors, deviations, means, etc. It can be any value that you think is important to report, and it can be in any units. They will be displayed in the scalars table on the dataset pages.
A score is a synthesis of scalars to make it easier to compare performance across models and datasets. They are dimensionaless and values range from 0 to 1 where 1 is considered perfect. If you provide scores, ILAMB will use them to produce an additional “Overall Score.”
units: The units of the scalar, which in this case is dimensionless.value: The value of the scalar.
Test the simple analysis function¶
Before we continue, it is a good idea to test that the function returns something sensible. You may have data available on your machine that you would like to use for testing, but here we will generate synthetic reference and comparison datasets using a function from our testing suite. This is a good practice as these tests can be later integrated into our suite of test functions to protect your new functionality.
from ilamb3.tests.test_run import generate_test_dset
ref = generate_test_dset(nyear=2, nlat=10, nlon=20, name="tas", unit="degC")
com = generate_test_dset(nyear=2, nlat=10, nlon=20, name="tas", unit="degC", seed=2)
df, _, _ = absrelerr(ref, com, "tas")
dfThe returned dataframe contains a single row with the relative error between the reference and comparison datasets. The value may seem high, but our test data is coarse and random, so this is to be expected. We will look at more comprehensive validation later.
Enhance the analysis function to be more robust¶
You may have already noticed some potential points of failure in this function. For example, by subtracting com[varname] from ref[varname], we assume that the input datasets are indexed identically. In practice, the reference and comparison datasets are not structured the same. Here are differences we often see:
Both datasets contain a
timedimension but use different calendars.The temporal coverage of each dataset is different. For example, it is not meaningful to report relative error between observations that start in 1980’s and model data that start in the 1850’s.
The datasets are in different units.
The datasets may both be gridded but call their latitude and longitude dimensions different names (i.e.
lonversuslongitudeorLongitude).Both datasets are global, but their grids (i.e., latitude and longitude values) do not align with each other. Even if they are off by a tiny margin,
xarraywill treat them as different and yieldnanresults.
You may anticipate many other potential weak points in the code. However, ilamb3 provides some functions that we can use to handle such issues. Primarily, you can find these functions inside two ilamb3 package modules:
ilamb3.datasetcontains functions that operate on a single dataset.ilamb3.comparecontains functions which embed logic for comparing two or more datasets.
To use these functions, import them from the ilamb3 package:
import ilamb3.compare as cmp
import ilamb3.dataset as dsetBelow, we incorporate 5 one-liner functions from these modules to make the analysis function more robust to errors or unexpected behavior. The functions we incorporated tackle the 5 issues we listed above, plus some additional pitfalls from our simple function.
# drop non-overlapping time and ensure units/coords are aligned
ref, com = cmp.make_comparable(ref, com, varname)
# ensure a time dimension exists
if dset.is_temporal(ref[varname]):
# handle possible calendar mismatch by calculating integrated time means
ref[varname] = dset.integrate_time(ref, varname, mean=True)
if dset.is_temporal(com[varname]):
com[varname] = dset.integrate_time(com, varname, mean=True)
# nest the spatial grids as explained in Collier, et al., JAMES, 2018
ref, com = cmp.nest_spatial_grids(ref, com)
# compute epsilon per gridcell
eps_gridded = np.abs(ref[varname] - com[varname]) / np.abs(ref[varname]).clip(min=1e-10)
# compute an area-weighted mean of the per-gridcell epsilon values
epsilon = dset.integrate_space(eps_gridded, varname, mean=True)make_comparableis a catch-all comparison function that we use in many of our analysis functions. It removes non-overlapping time periods from the datasets, ensures matching units, ensures matching longitude intervals (either [-180, 180] or [0, 360]), as well as other things.is_temporalreturns True if the time dimension is present while handling variations in the name of the time dimension (i.e.timevsTime) between datasets.integrate_timeintegrates away the time dimension by taking a mean. This circumvents the need to handle calendar differences. By taking this mean, we imply that we are less interested in any temporal differences, which should be noted in the docstring of our new analysis routine.nest_spatial_gridsspatially aligns the datasets using the methodology outlines in Collier, et al. 2018 to handle potential resolution differences. You could also opt to interpolate the grids to a common resolution.integrate_spacewithmean=Truereturns the area-weighted mean of per-gridcell epsilon values. Now we have both a spatial version calledeps_griddedthat we can use to plot spatial maps, as well as a scalar version calledepsilonthat we can report in the dataframe. This is an example of how you can return intermediate datasets to be plotted, as well as scalars to be reported in the dataframe.We add a call to
clip()to theeps_griddedcomputation and specify a small minimum value to avoid division by zero errors.
Our additions have introduced one more potential bug: we integrate over space, but one or both of the datasets could be a site collections rather than gridded (e.g., the Fluxnet collection). While we could add logic to the function that handles point data, let’s rather restrict this function to only work on gridded datasets. The robust analysis routine should detect and flag when a method isn’t appropriate for the input data and raise an error called ilamb3.exceptions.AnalysisNotAppropriate. Let’s add the following lines to the top of the function:
if dset.is_site(reference[varname]) or dset.is_site(comparison[varname]):
raise AnalysisNotAppropriate()Expand the cell below to see all the changes we have made thus far. The difficult part of writing an analysis function is now complete. We have a function that performs the mathematical operation we want, and it is robust to various input data issues. The next step is to integrate this function into ilamb3 so that it can be used in benchmarking studies.
Notebook Cell
import numpy as np
import pandas as pd
import xarray as xr
import ilamb3.compare as cmp
import ilamb3.dataset as dset
from ilamb3.exceptions import AnalysisNotAppropriate
def absrelerr(
ref: xr.Dataset, com: xr.Dataset, varname: str
) -> tuple[pd.DataFrame, xr.Dataset, xr.Dataset]:
# limit analysis to gridded data only
if dset.is_site(ref[varname]) or dset.is_site(com[varname]):
raise AnalysisNotAppropriate()
# drop non-overlapping time and ensure units/coords are aligned
ref, com = cmp.make_comparable(ref, com, varname)
# ensure a time dimension exists
if dset.is_temporal(ref[varname]):
# handle possible calendar mismatch by calculating integrated time means
ref[varname] = dset.integrate_time(ref, varname, mean=True)
if dset.is_temporal(com[varname]):
com[varname] = dset.integrate_time(com, varname, mean=True)
# nest the spatial grids as explained in Collier, et al., JAMES, 2018
ref, com = cmp.nest_spatial_grids(ref, com)
# compute epsilon per gridcell
eps_gridded = np.abs(ref[varname] - com[varname]) / np.abs(ref[varname]).clip(min=1e-10)
# compute an area-weighted mean of the per-gridcell epsilon values
epsilon = dset.integrate_space(eps_gridded, varname, mean=True)
# build up the output dataframe
df = pd.DataFrame(
[
{
"source": "Comparison",
"region": "None",
"analysis": "Relative Error",
"name": "Mean Absolute Relative Error",
"type": "scalar",
"units": "1",
"value": float(epsilon),
}
]
)
return df, xr.Dataset(), xr.Dataset()Integrating into ilamb3¶
To use this function in ilamb3, it must be incorporated into a Python class that follows a specific template. This is because of two main reasons:
ilamb3needs to know beforehand what data will be used for the analysis. Our particular analysis needs to communicate thatvarnamemust be provided in order to execute.To call analyses consistently in
ilamb3, they should always have the same signature, i.e., requiring a reference and comparison dataset. But this does not leave room for extra parameters that may be important, likevarname, so we need to move it elsewhere.
To this end, analyses are not merely functions, but rather an abstract base class (ABC), which we call an ILAMBAnalysis. Lets change the function we wrote above into an ILAMBAnalysis and see how it works. If you are unfamiliar with classes in python, review this introductory tutorial.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61from pathlib import Path import numpy as np import pandas as pd import xarray as xr import ilamb3.compare as cmp import ilamb3.dataset as dset from ilamb3.analysis.base import ILAMBAnalysis # import the ABC from ilamb from ilamb3.exceptions import AnalysisNotAppropriate class absrelerror(ILAMBAnalysis): # define a class that inherits from the ABC def __init__(self, required_variable: str, **kwargs: Any): self.varname = required_variable def __call__( self, ref: xr.Dataset, com: xr.Dataset, ) -> tuple[pd.DataFrame, xr.Dataset, xr.Dataset]: varname = self.varname # limit analysis to gridded data only if dset.is_site(ref[varname]) or dset.is_site(com[varname]): raise AnalysisNotAppropriate() # drop non-overlapping time and ensure units/coords are aligned ref, com = cmp.make_comparable(ref, com, varname) # ensure a time dimension exists if dset.is_temporal(ref[varname]): # handle possible calendar mismatch by calculating integrated time means ref[varname] = dset.integrate_time(ref, varname, mean=True) if dset.is_temporal(com[varname]): com[varname] = dset.integrate_time(com, varname, mean=True) # nest the spatial grids as explained in Collier, et al., JAMES, 2018 ref, com = cmp.nest_spatial_grids(ref, com) # compute epsilon per gridcell eps_gridded = np.abs(ref[varname] - com[varname]) / np.abs(ref[varname]).clip(min=1e-10) # compute an area-weighted mean of the per-gridcell epsilon values epsilon = dset.integrate_space(eps_gridded, varname, mean=True) # build up the output dataframe df = pd.DataFrame( [ { "source": "Comparison", "region": "None", "analysis": "Relative Error", "name": "Mean Absolute Relative Error", "type": "scalar", "units": "1", "value": float(epsilon), } ] ) return df, xr.Dataset(), xr.Dataset()
Lines that were changed have been highlighted for emphasis. In words, the above code snippet makes the following changes:
We have imported the abstract base class
ILAMBAnalysisand madeabsrelerrora class that inherits from it.We created a member function named
__init__and moved ourvarnameargument to it.Python has a number of double under (a.k.a dunder) methods that have special behavior. The__init__method is called when an instance of the class is created, and it allows us to initialize the instance’s attributes. Also note that we have renamedvarnametorequired_variableas this is whatilamb3will pass into the constructor.Then, we created a member function named
__call__, and we put the content from our function inside. The reference and comparison datasets are passed as arguments to this method---all analysis__call__methods follow this pattern. The__call__method will allow us to call an instance of theabsrelerrorclass as if it were a function.
At the moment, it does not yet seem like we have accomplished much by this change. However, if you try to create an instance of this class we have built,
my_analysis = absrelerror("tas")you should see that Python throws an error:
TypeError: Can't instantiate abstract class absrelerror without an implementation for abstract methods 'name', 'plots', 'required_variables'The ILAMBAnalysis base class defines a template for all analysis functions. If you don’t implement all of the methods we need, you can’t even create an instance. You will see that the error message is complaining about not having an implementation for two functions. The first of these is name which should return the name of the analysis. This is a human friendly name that will appear in the output data dashboard HTML pages.
def name(self) -> str:
return "Relative Error"The second of these functions is plots, a function that generates plots for the analysis. For the moment, add the following implementation that returns an empty dataframe with the proper columns, and we will return to this later.
def plots(
self, df: pd.DataFrame, ref: xr.Dataset, com: dict[str, xr.Dataset], path: Path
) -> pd.DataFrame:
return pd.DataFrame(columns=["name","source","region","analysis","path"])The last of these is required_variables, a function that returns a list of variables that the analysis needs in the input datasets. So we add another member function that returns varname in a list. ilamb3 will use this internally to query model data and prepare the input datasets for the analysis.
def required_variables(self):
return [self.varname]At this point your analysis function is ready to be integrated into ilamb3. If you would like to see the full code with highlighted changes, expand the following cell.
Notebook Cell
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72from pathlib import Path import numpy as np import pandas as pd import xarray as xr import ilamb3.compare as cmp import ilamb3.dataset as dset from ilamb3.analysis.base import ILAMBAnalysis # import the ABC from ilamb from ilamb3.exceptions import AnalysisNotAppropriate class absrelerror(ILAMBAnalysis): # define a class that inherits from the ABC def __init__(self, required_variable: str, **kwargs: Any): self.varname = required_variable def __call__( self, ref: xr.Dataset, com: xr.Dataset, ) -> tuple[pd.DataFrame, xr.Dataset, xr.Dataset]: varname = self.varname # limit analysis to gridded data only if dset.is_site(ref[varname]) or dset.is_site(com[varname]): raise AnalysisNotAppropriate() # drop non-overlapping time and ensure units/coords are aligned ref, com = cmp.make_comparable(ref, com, varname) # ensure a time dimension exists if dset.is_temporal(ref[varname]): # handle possible calendar mismatch by calculating integrated time means ref[varname] = dset.integrate_time(ref, varname, mean=True) if dset.is_temporal(com[varname]): com[varname] = dset.integrate_time(com, varname, mean=True) # nest the spatial grids as explained in Collier, et al., JAMES, 2018 ref, com = cmp.nest_spatial_grids(ref, com) # compute epsilon per gridcell eps_gridded = np.abs(ref[varname] - com[varname]) / np.abs(ref[varname]).clip(min=1e-10) # compute an area-weighted mean of the per-gridcell epsilon values epsilon = dset.integrate_space(eps_gridded, varname, mean=True) # build up the output dataframe df = pd.DataFrame( [ { "source": "Comparison", "region": "None", "analysis": self.name(), "name": "Mean Absolute Relative Error", "type": "scalar", "units": "1", "value": float(epsilon), } ] ) return df, xr.Dataset(), xr.Dataset() def name(self) -> str: return "Relative Error" def required_variables(self) -> list[str]: return [self.varname] def plots( self, df: pd.DataFrame, ref: xr.Dataset, com: dict[str, xr.Dataset], path: Path ) -> pd.DataFrame: return pd.DataFrame(columns=["name","source","region","analysis","path"])
Making ilamb3 aware of your analysis function¶
When adding a analysis function, you can work in your own directory until you are ready to register it with the ilamb3 system. To do this you will first need to clone the repository:
git clone git@github.com:rubisco-sfa/ilamb3.git
cd ilamb3and then move the python file with your analysis function in it into the ilamb3/analysis directory. If you have been working in a file named my_function_file.py, then:
mv my_function_file.py ilamb3/analysis/This will cause ilamb3 to automatically detect the new analysis function and use it when running. We can test this by creating a configure file that instead of running the default analyses, runs just the new function we added. You will need to create a configure file that looks like this:
WECANN-1-0:
analyses:
- absrelerror
sources:
gpp: WECANN-1-0/obs4MIPs_ColumbiaU_WECANN-1-0_mon_gpp_gn_v20260302.nc
variable_cmap: Greens
You will need some model data and a CSV file database as explained in the quickstart guide. Here we are reusing the CanESM5.csv database we describe there. Then we can run the analysis using the ilamb run command:
ilamb run my_analysis_test.yaml --model-db CanESM5.csv --output-dir my_analysis_test2026-05-14 23:55:22.403 | INFO | ilamb3.run:run_single_block:324 - Start of WECANN-1-0 | CanESM5
You should see the previous informational output lines when you run your command. If an error occurs, you will see the stack trace explaining the error. If the code ran successfully, you should be able to see the output files in the my_analysis_test directory:
ls -1 ./my_analysis_test/WECANN-1-0/CanESM5.csv
CanESM5.log
CanESM5.nc
Reference.nc
WECANN-1-0.html
post.log
At the moment, our analysis function does not do very much. To verify that your function is correct, you could open the HTML file in that directory in a web browser. Alternatively, you could use the cat command to view the contents of the CSV file:
cat ./my_analysis_test/WECANN-1-0/CanESM5.csvsource,region,analysis,name,type,units,value
CanESM5,None,Relative Error,Mean Absolute Relative Error [1],scalar,1,0.6108188406836633
- Collier, N., Hoffman, F. M., Lawrence, D. M., Keppel‐Aleks, G., Koven, C. D., Riley, W. J., Mu, M., & Randerson, J. T. (2018). The International Land Model Benchmarking (ILAMB) System: Design, Theory, and Implementation. Journal of Advances in Modeling Earth Systems, 10(11), 2731–2754. 10.1029/2018ms001354