Forecast in a Box
Model Selection
This is a demo of using qubed to select from a set of forecast models that each produce a set of output variables.
First let’s construct some models represented as qubes:
from qubed import Qube
model_1 = Qube.from_datacube({
    "levtype": "pl",
    "param": ["q", "t", "u", "v", "w", "z"],
    "level": [100, 200, 300, 400, 50, 850, 500, 150, 600, 250, 700, 925, 1000],
}) | Qube.from_datacube({
    "levtype": "sfc",
    "param": ["10u", "10v", "2d", "2t", "cp", "msl", "skt", "sp", "tcw", "tp"],
})
model_1 = "model=1" / ("frequency=6h" / model_1)
model_1
root, model=1, frequency=6h
├── levtype=pl, param=q/t/u/v/w/z, level=50/100/150/200/250/300/400/500/600/700/850/9...
└── levtype=sfc, param=10u/10v/2d/2t/cp/msl/skt/sp/tcw/tp
This is the most complete model. Now let’s define two more with fewer variables on the same levels. model_2 and model_3 share the same parameters and differ only in frequency:
model_2 = Qube.from_datacube({
    "levtype": "pl",
    "param": ["q", "t"],
    "level": [100, 200, 300, 400, 50, 850, 500, 150, 600, 250, 700, 925, 1000],
}) | Qube.from_datacube({
    "levtype": "sfc",
    "param": ["2t", "cp", "msl"],
})
model_2 = "model=2" / ("frequency=continuous" / model_2)
model_3 = Qube.from_datacube({
    "levtype": "pl",
    "param": ["q", "t"],
    "level": [100, 200, 300, 400, 50, 850, 500, 150, 600, 250, 700, 925, 1000],
}) | Qube.from_datacube({
    "levtype": "sfc",
    "param": ["2t", "cp", "msl"],
})
model_3 = "model=3" / ("frequency=6h" / model_3)
model_3
root, model=3, frequency=6h
├── levtype=pl, param=q/t, level=50/100/150/200/250/300/400/500/600/700/850/9...
└── levtype=sfc, param=2t/cp/msl
Now we can combine the three models into a single qube:
all_models = model_1 | model_2 | model_3
all_models
root
├── model=1, frequency=6h
│   ├── levtype=pl, param=q/t/u/v/w/z, level=50/100/150/200/250/300/400/500/600/700/850/9...
│   └── levtype=sfc, param=10u/10v/2d/2t/cp/msl/skt/sp/tcw/tp
├── model=2, frequency=continuous
│   ├── levtype=pl, param=q/t, level=50/100/150/200/250/300/400/500/600/700/850/9...
│   └── levtype=sfc, param=2t/cp/msl
└── model=3, frequency=6h
    ├── levtype=pl, param=q/t, level=50/100/150/200/250/300/400/500/600/700/850/9...
    └── levtype=sfc, param=2t/cp/msl
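To make concrete what these compressed trees stand for, the following plain-Python sketch expands the same datacube definitions into individual identifiers with `itertools.product`. It does not use qubed. The helper `expand` and the `MODELS` table are names made up for this illustration, and the sketch only assumes that a datacube denotes the Cartesian product of its key values.

```python
from itertools import product

LEVELS = [50, 100, 150, 200, 250, 300, 400, 500, 600, 700, 850, 925, 1000]

def expand(datacube):
    """Yield one identifier dict per point in the Cartesian product of the values."""
    keys = list(datacube)
    # Wrap scalar values so every key maps to a list of candidates.
    value_lists = [v if isinstance(v, list) else [v] for v in datacube.values()]
    for combo in product(*value_lists):
        yield dict(zip(keys, combo))

# Hypothetical plain-dict mirror of the three models defined above:
# model -> (frequency, pressure-level params, surface params)
MODELS = {
    "1": ("6h", ["q", "t", "u", "v", "w", "z"],
          ["10u", "10v", "2d", "2t", "cp", "msl", "skt", "sp", "tcw", "tp"]),
    "2": ("continuous", ["q", "t"], ["2t", "cp", "msl"]),
    "3": ("6h", ["q", "t"], ["2t", "cp", "msl"]),
}

identifiers = []
for model, (frequency, pl_params, sfc_params) in MODELS.items():
    base = {"model": model, "frequency": frequency}
    identifiers += expand({**base, "levtype": "pl", "param": pl_params, "level": LEVELS})
    identifiers += expand({**base, "levtype": "sfc", "param": sfc_params})

counts = {m: sum(1 for i in identifiers if i["model"] == m) for m in MODELS}
print(counts)            # {'1': 88, '2': 29, '3': 29}
print(len(identifiers))  # 146
```

Under this reading, the nine-line tree above stands for 146 distinct identifiers.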
Now we can perform queries over the models. We can get all models that produce 2m temperature:
all_models.select({
    "param": "2t",
})
root
├── model=1, frequency=6h, levtype=sfc, param=2t
├── model=2, frequency=continuous, levtype=sfc, param=2t
└── model=3, frequency=6h, levtype=sfc, param=2t
Filter on both parameter and frequency:
all_models.select({
    "param": "2t",
    "frequency": "continuous",
})
root, model=2, frequency=continuous, levtype=sfc, param=2t
Find all models that have some overlap with this set of parameters:
all_models.select({
    "param": ["q", "t", "u", "v"],
})
root
├── model=1, frequency=6h, levtype=pl, param=q/t/u/v, level=50/100/150/200/250/300/400/500/600/700/850/9...
├── model=2, frequency=continuous, levtype=pl, param=q/t, level=50/100/150/200/250/300/400/500/600/700/850/9...
└── model=3, frequency=6h, levtype=pl, param=q/t, level=50/100/150/200/250/300/400/500/600/700/850/9...
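The three queries above behave like a filter over identifiers: an identifier is kept when, for every key in the selection, its value is one of the requested values. The sketch below restates that reading in plain Python over a small hand-written list of identifiers. It illustrates the semantics as they appear in the outputs above and is not qubed's implementation. `select_identifiers` and `SAMPLE` are hypothetical names, and treating a missing key as a mismatch is a choice made for this sketch only.

```python
def select_identifiers(identifiers, selection):
    """Keep identifiers whose value for every selected key is among the requested values."""
    # Normalise scalars to one-element sets so "2t" and ["2t"] behave the same.
    wanted = {
        key: set(value) if isinstance(value, (list, tuple, set)) else {value}
        for key, value in selection.items()
    }
    return [
        ident for ident in identifiers
        # Sketch-only choice: a key missing from the identifier never matches.
        if all(ident.get(key) in values for key, values in wanted.items())
    ]

SAMPLE = [
    {"model": "1", "frequency": "6h", "levtype": "sfc", "param": "2t"},
    {"model": "1", "frequency": "6h", "levtype": "sfc", "param": "tp"},
    {"model": "2", "frequency": "continuous", "levtype": "sfc", "param": "2t"},
    {"model": "3", "frequency": "6h", "levtype": "sfc", "param": "2t"},
    {"model": "3", "frequency": "6h", "levtype": "sfc", "param": "msl"},
]

by_param = select_identifiers(SAMPLE, {"param": "2t"})
by_both = select_identifiers(SAMPLE, {"param": "2t", "frequency": "continuous"})
by_list = select_identifiers(SAMPLE, {"param": ["2t", "msl"]})

print([i["model"] for i in by_param])  # ['1', '2', '3']
print([i["model"] for i in by_both])   # ['2']
print(len(by_list))                    # 4
```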
Choosing a set of models based on the requested parameter set
all_models.select({
    "param": ["q", "t", "u", "v"],
    "frequency": "6h",
})
root
├── model=1, frequency=6h, levtype=pl, param=q/t/u/v, level=50/100/150/200/250/300/400/500/600/700/850/9...
└── model=3, frequency=6h, levtype=pl, param=q/t, level=50/100/150/200/250/300/400/500/600/700/850/9...
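Note that the selection returns every model that overlaps the request, so model 3 appears above even though it offers only `q` and `t`. If the goal is to pick models that can supply all requested parameters, one option is to compare parameter sets afterwards. The sketch below does this in plain Python on a hand-copied table of the pressure-level parameters from the model definitions above. `PL_PARAMS` and `models_covering` are hypothetical names, not part of qubed.

```python
# Pressure-level parameters per (model, frequency), copied from the model definitions.
PL_PARAMS = {
    ("1", "6h"): {"q", "t", "u", "v", "w", "z"},
    ("2", "continuous"): {"q", "t"},
    ("3", "6h"): {"q", "t"},
}

def models_covering(requested, frequency=None):
    """Return models whose parameter set contains every requested parameter."""
    requested = set(requested)
    return sorted(
        model
        for (model, freq), params in PL_PARAMS.items()
        # Subset test: every requested parameter must be offered by the model.
        if requested <= params and (frequency is None or freq == frequency)
    )

print(models_covering(["q", "t", "u", "v"], frequency="6h"))  # ['1']
print(models_covering(["q", "t"], frequency="6h"))            # ['1', '3']
print(models_covering(["q", "t"]))                            # ['1', '2', '3']
```

The subset test is stricter than the overlap semantics shown by the query: it drops model 3 for the four-parameter request because `u` and `v` are missing.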
Using Wildcards
daily_surface_means = Qube.from_datacube({
    "model": "*",
    "frequency": "*",
    "levtype": "sfc",
    "param": "*",
})
all_models & daily_surface_means
root
├── model=1, frequency=6h, levtype=sfc, param=10u/10v/2d/2t/cp/msl/skt/sp/tcw/tp
├── model=2, frequency=continuous, levtype=sfc, param=2t/cp/msl
└── model=3, frequency=6h, levtype=sfc, param=2t/cp/msl
daily_level_means = Qube.from_datacube({
    "model": "*",
    "frequency": "*",
    "levtype": "pl",
    "param": "*",
    "level": "*",
})
all_models & daily_level_means
root
├── model=1, frequency=6h, levtype=pl, param=q/t/u/v/w/z, level=50/100/150/200/250/300/400/500/600/700/850/9...
├── model=2, frequency=continuous, levtype=pl, param=q/t, level=50/100/150/200/250/300/400/500/600/700/850/9...
└── model=3, frequency=6h, levtype=pl, param=q/t, level=50/100/150/200/250/300/400/500/600/700/850/9...
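In both intersections the `"*"` entries act as "match any value", so only the `levtype` constraint narrows the result. A rough plain-Python reading of that behaviour is sketched below. `matches` and `SAMPLE` are hypothetical names, and the rule that a pattern key missing from an identifier counts as a mismatch is an assumption of this sketch rather than documented qubed behaviour.

```python
WILDCARD = "*"

def matches(identifier, pattern):
    """True if the identifier has every pattern key and each non-wildcard value agrees."""
    for key, expected in pattern.items():
        if key not in identifier:
            return False  # assumption: the pattern's keys must all be present
        if expected != WILDCARD and identifier[key] != expected:
            return False
    return True

SAMPLE = [
    {"model": "1", "frequency": "6h", "levtype": "sfc", "param": "2t"},
    {"model": "1", "frequency": "6h", "levtype": "pl", "param": "q", "level": "500"},
    {"model": "2", "frequency": "continuous", "levtype": "sfc", "param": "cp"},
]

surface = {"model": "*", "frequency": "*", "levtype": "sfc", "param": "*"}
levels = {"model": "*", "frequency": "*", "levtype": "pl", "param": "*", "level": "*"}

print([i["param"] for i in SAMPLE if matches(i, surface)])  # ['2t', 'cp']
print([i["param"] for i in SAMPLE if matches(i, levels)])   # ['q']
```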
daily_surface_mean_products = all_models & daily_surface_means
# Print the first 12 identifiers, then stop.
for i, identifier in enumerate(daily_surface_mean_products.leaves()):
    print(identifier)
    if i > 10:
        print("...")
        break
{'model': '1', 'frequency': '6h', 'levtype': 'sfc', 'param': '10u'}
{'model': '1', 'frequency': '6h', 'levtype': 'sfc', 'param': '10v'}
{'model': '1', 'frequency': '6h', 'levtype': 'sfc', 'param': '2d'}
{'model': '1', 'frequency': '6h', 'levtype': 'sfc', 'param': '2t'}
{'model': '1', 'frequency': '6h', 'levtype': 'sfc', 'param': 'cp'}
{'model': '1', 'frequency': '6h', 'levtype': 'sfc', 'param': 'msl'}
{'model': '1', 'frequency': '6h', 'levtype': 'sfc', 'param': 'skt'}
{'model': '1', 'frequency': '6h', 'levtype': 'sfc', 'param': 'sp'}
{'model': '1', 'frequency': '6h', 'levtype': 'sfc', 'param': 'tcw'}
{'model': '1', 'frequency': '6h', 'levtype': 'sfc', 'param': 'tp'}
{'model': '2', 'frequency': 'continuous', 'levtype': 'sfc', 'param': '2t'}
{'model': '2', 'frequency': 'continuous', 'levtype': 'sfc', 'param': 'cp'}
...
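The `enumerate`/`break` pattern above is one way to preview a long iterator. An equivalent option from the standard library is `itertools.islice`, sketched here on a stand-in generator so the snippet runs without qubed. `fake_leaves` is a hypothetical placeholder for any iterator of identifier dicts.

```python
from itertools import islice

def fake_leaves():
    """Stand-in for an iterator of identifiers (hypothetical, not qubed)."""
    for param in ["10u", "10v", "2d", "2t", "cp", "msl", "skt", "sp", "tcw", "tp"]:
        yield {"model": "1", "frequency": "6h", "levtype": "sfc", "param": param}

# Take at most the first three identifiers; islice stops pulling from the generator after that.
preview = list(islice(fake_leaves(), 3))
for identifier in preview:
    print(identifier)
print("...")
```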