Preprocessing ERA5

Please first download the data outlined in the README file.

We begin the preprocessing by starting a local Dask cluster:

from dask.distributed import Client
client = Client(n_workers=4, threads_per_worker=2)
client

Now we can use the site extraction function to extract the ERA5 data for each site and each variable.

This preprocessed data is stored in the specified output folder as netCDF files.

If you want to add an extra variable or site, you do not need to reprocess all the data: output files that already exist are skipped. This makes exploring the workflow more efficient.
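The skip-if-present behaviour described above can be illustrated with a minimal sketch. This is not the implementation in `generate_training_data`; the helper name `pending_outputs` and the `{site}_{variable}.nc` file naming are assumptions made purely for illustration.

```python
from pathlib import Path


def pending_outputs(sites, variables, output_folder):
    """Return the (site, variable, path) combinations that still need processing.

    Hypothetical helper: the real extraction function may organise and
    name its output files differently.
    """
    output_folder = Path(output_folder)
    pending = []
    for site in sites:
        for variable in variables:
            # Assumed naming scheme, for illustration only.
            target = output_folder / f"{site}_{variable}.nc"
            if target.exists():
                # Already written by an earlier run, so skip it.
                continue
            pending.append((site, variable, target))
    return pending
```

With a check like this, adding one new site or variable only produces work for the combinations whose output file is missing; everything extracted in earlier runs is left alone.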

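from pathlib import Path

from src import generate_training_data

# Adjust these paths to match where the data is stored on your own system.
generate_training_data.extract_per_site_era5_data(
    # Preprocessed AmeriFlux file
    preprocessed_ameriflux_data=Path("/home/bart/Data/EXCITED/NEE_ameriflux_transcom2.nc"),
    # Folder containing the downloaded hourly ERA5 data
    era5_data_folder=Path("/media/bart/OS/Data/hourly_era5"),
    # Folder where the per-site netCDF files are written
    output_folder=Path("/home/bart/Data/EXCITED/prep_era5"),
)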

After preprocessing all the data, you can continue with training the model!