Hourly dataset production
Using the model(s) trained in train_fluxnet_models.ipynb, we can now produce the hourly datasets for GPP, NEE, and respiration.
To do so, select the folder containing the trained model and specify the output directory to which the results should be written.
In [ ]:
from dask.distributed import Client
from excited_workflow.produce_fluxnet_dataset import produce_dataset
from pathlib import Path
client = Client(n_workers=2, threads_per_worker=2)
output_dir = Path("/data/volume_2/produced_models/test")
produce_dataset(  # takes ~40 minutes for 5 years of data
    model_dir="/data/volume_2/trained_models/fluxnet_gpp-lightgbm-2024-02-23_09_55",
    output_dir=output_dir,
)
To demonstrate the results, we can open the resulting (multi-file) dataset and plot the GPP for one timestep:
In [4]:
import xarray as xr
ds = xr.open_mfdataset(sorted(output_dir.glob("*.nc")))  # pass an explicit, ordered list of files
ds["GPP_NT_VUT_REF"].isel(time=3000).plot()
Out[4]:
<matplotlib.collections.QuadMesh at 0x7f61fafe5180>
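The positional index used above (`isel(time=3000)`) does not say which moment in time is plotted. The following is a minimal sketch, on synthetic data, of how to look up the timestamp behind an index and select the same slice by label. The start date, the small grid, and the random values are assumptions made for illustration; only the variable and coordinate names are taken from the cells above.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the produced dataset: hourly time axis on a tiny grid.
# Start date, grid, and values are illustrative assumptions, not real output.
time = pd.date_range("2015-01-01", periods=5000, freq="h")
rng = np.random.default_rng(0)
gpp = xr.DataArray(
    rng.random((time.size, 3, 4)),
    coords={
        "time": time,
        "latitude": [39.0, 40.0, 41.0],
        "longitude": [-101.0, -100.0, -99.0, -98.0],
    },
    dims=("time", "latitude", "longitude"),
    name="GPP_NT_VUT_REF",
)

# Look up which timestamp the positional index 3000 corresponds to.
stamp = pd.Timestamp(gpp["time"].values[3000])

# Select the same 2-D slice by position and by label.
by_index = gpp.isel(time=3000)
by_label = gpp.sel(time=stamp)

print(stamp, by_index.equals(by_label))
```

Selecting by label tends to be more robust than a fixed index when the period covered by the dataset changes.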
The time series (daily average) for a single location looks like the following:
In [4]:
one_loc = ds["GPP_NT_VUT_REF"].sel(latitude=40, longitude=-100, method="nearest")
one_loc.resample(time="1D").mean().plot()
Out[4]:
[<matplotlib.lines.Line2D at 0x7ff5257c97e0>]
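To make the daily averaging step concrete, here is a small sketch on a synthetic hourly series (three days of made-up values, not model output). It shows that `resample(time="1D").mean()` collapses each block of 24 hourly values into one daily value.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Three days of synthetic hourly values: 0, 1, 2, ..., 71 (illustrative only).
time = pd.date_range("2020-06-01", periods=72, freq="h")
series = xr.DataArray(np.arange(72, dtype=float), coords={"time": time}, dims="time")

# Average all hourly values that fall within the same calendar day.
daily = series.resample(time="1D").mean()

print(daily.sizes["time"])  # number of daily values
print(daily.values)         # mean of each 24-hour block
```

Each daily value is the mean of the 24 hourly values of that day, so the resampled series has one point per calendar day.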
We can also study the average diurnal cycle at the same location:
In [5]:
one_loc.groupby(one_loc["time"].dt.hour).mean().plot()
Out[5]:
[<matplotlib.lines.Line2D at 0x7ff50adc77c0>]
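The diurnal-cycle computation groups all samples by their hour of day and averages within each group. The sketch below illustrates this on a synthetic series (an hour-of-day signal plus a made-up per-day offset), so the result can be checked by hand. Note that the hour is taken from the dataset's time coordinate; whether that is UTC or local time is not stated here, so treat the hour axis accordingly.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Three days of synthetic hourly data (illustrative only):
# value = hour of day + a per-day offset of -1, 0, or +1.
time = pd.date_range("2020-06-01", periods=72, freq="h")
day_offset = np.repeat([-1.0, 0.0, 1.0], 24)
series = xr.DataArray(
    time.hour.values + day_offset, coords={"time": time}, dims="time"
)

# Group samples by hour of day and average across days.
diurnal = series.groupby(series["time"].dt.hour).mean()

print(diurnal.sizes["hour"])  # one value per hour of day
print(diurnal.values)         # per-day offsets cancel, leaving the hour itself
```

The result has a new `hour` dimension with 24 entries, which is what gets plotted on the x-axis of the diurnal cycle figure.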