Metadata
Qubed can store metadata that varies from one leaf node to another. This is achieved by ‘hanging’ arrays at various points in the tree, all the way down to the leaf nodes.
from qubed import Qube
example = Qube.load("../tests/example_qubes/on-demand-extremes-dt_with_metadata.json")
example.html(depth=1)
root, class=d1, dataset=on-demand-extremes-dt, stream=oper
├── expver=7777, type=fc, date=2025-04-04, time=0000, levtype=sfc, step=0/1/2/3/4/5/6, param=167
├── expver=0099, type=fc, date=2025-06-16, time=0000
│   ├── levtype=ml, levelist=90, step=0/1/10/11/12/13/14/15/16/17/18/19/2/20/21/22/..., param=75/76/130/131/132/133/246/247/260028/260155
│   ├── levtype=pl, levelist=100/1000/150/200/250/300/400/50/500/600/7..., step=0/1/10/11/12/13/14/15/16/17/18/19/2/20/21/22/..., param=75/76/129/130/131/132/133/157/246/247/248/26...
│   ├── levtype=hl
│   │   ├── levelist=100
│   │   │   ├── step=0, param=75/76/130/131/132/133/157/246/247/248/260028...
│   │   │   └── step=1/10/11/12/13/14/15/16/17/18/19/2/20/21/22/23..., param=10/75/76/130/131/132/133/157/246/247/248/260...
│   │   └── levelist=1000/130/15/150/1500/200/2000/250/30/300/..., step=0/1/10/11/12/13/14/15/16/17/18/19/2/20/21/22/..., param=75/76/130/131/132/133/157/246/247/248/260028...
│   └── levtype=sfc
│       ├── step=0, param=130/134/151/159/165/166/167/3073/3074/3075/1...
│       ├── step=0-1, param=49/146/169/175/176/177/178/179/201/202/22822...
│       ├── step=0-10/0-11/0-12/0-13/0-14/0-15/0-16/0-17/0-18/..., param=146/169/175/176/177/178/179/228228/231040/23...
│       ├── step=1/10/11/12/13/14/15/16/17/18/19/2/20/21/22/23..., param=130/134/151/159/165/166/167/207/3073/3074/30...
│       └── step=1-2/10-11/11-12/12-13/13-14/14-15/15-16/16-17..., param=49/201/202/237120/260646/260647
├── expver=aab0, type=fc, date=2024-08-11, time=0000
│   ├── levtype=ml, levelist=1/10/11/12/13/14/15/16/17/18/19/2/20/21/2..., step=0/1/10/11/12/13/14/15/15m/16/17/18/19/1h15m/1..., param=75/76/129/130/131/132/133/246/247/248/260028...
│   ├── levtype=pl, levelist=100/1000/150/200/250/300/400/50/500/600/7..., step=0/1/10/11/12/13/14/15/16/17/18/19/2/20/21/22/..., param=75/76/129/130/131/132/133/157/246/247/248/26...
│   ├── levtype=hl
│   │   ├── levelist=100/130/15/150/200/250/30/300/50/75
│   │   │   ├── step=0/1/10/11/12/13/14/15/16/17/18/19/2/20/21/22/..., param=75/76/130/131/132/133/157/246/247/248/260028...
│   │   │   └── step=15m/1h15m/1h30m/1h45m/2h15m/2h30m/2h45m/30m/4..., param=130/131/132/133/157/246/247/260155
│   │   └── levelist=1000/1500/2000/3000/400/4000/500, step=0/1/10/11/12/13/14/15/16/17/18/19/2/20/21/22/..., param=75/76/130/131/132/133/157/246/247/248/260028...
│   └── levtype=sfc
│       ├── step=0, param=129/130/134/137/151/159/165/166/167/172/3008...
│       ├── step=0-1, param=146/169/175/176/177/178/179/201/202/231040/2...
│       ├── step=0-10/0-11/0-12/0-13/0-14/0-15/0-16/0-17/0-18/..., param=146/169/175/176/177/178/179/231040/231041/23...
│       ├── step=0-15m/0-30m/0-45m, param=146/169/175/176/177/178/179/231040/231041/23...
│       ├── step=1/10/11/12/13/14/15/16/17/18/19/2/20/21/22/23..., param=130/134/151/159/165/166/167/3008/3073/3074/3...
│       ├── step=1-1h15m/1-1h30m/1-1h45m/2-2h15m/2-2h30m/2-2h4..., param=235017/235018/237120
│       ├── step=1-2/10-11/11-12/12-13/13-14/14-15/15-16/16-17..., param=201/202/235017/235018/237120/238105/238382/2...
│       └── step=15m/1h15m/1h30m/1h45m/2h15m/2h30m/2h45m/30m/4..., param=130/134/151/159/165/166/167/3073/3074/3075/1...
└── expver=ulfs, type=fc, date=2025-05-30, time=0000, levtype=sfc, step=0/1/2/3/4/5/6, param=167
Hovering over a node shows some debug information about it, including any metadata attached to it. To iterate over the leaf nodes together with their metadata, use Qube.leaves(metadata=True):
next(example.leaves(metadata=True))
({'class': 'd1',
  'dataset': 'on-demand-extremes-dt',
  'stream': 'oper',
  'expver': '0099',
  'type': 'fc',
  'date': datetime.date(2025, 6, 16),
  'time': '0000',
  'levtype': 'hl',
  'levelist': '100',
  'step': '0',
  'param': 75},
 {'length': 237,
  'offset': 16580770295,
  'host': 'databridge-prod-store4-ope.ewctest.link',
  'path': '/data/prod_4/fdb/d1:on-demand-extremes-dt:0099:oper:20250616:0000/fc:hl:u4usq2.20250624.053437.databridge-prod-store4.novalocal.1854313475342336.data',
  'scheme': 'fdb',
  'port': 10000})
In this case we see that each individual field of this Qube stores a path to a file and an offset and length into that file. The path string is actually stored one level up the tree because it is common to many individual leaves.
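To make the idea of a value stored once on a parent and shared by the leaves below it more concrete, here is a minimal toy model in plain Python. It is not Qubed's internal representation: ToyNode and resolve are hypothetical names used only for illustration, plain lists stand in for the arrays, and the path is made up.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNode:
    """Hypothetical stand-in for a tree node; not part of Qubed."""
    name: str
    metadata: dict = field(default_factory=dict)  # key -> list of values
    children: list = field(default_factory=list)


def resolve(node, inherited=None):
    """Yield (leaf name, metadata) with values stored on ancestors merged in."""
    merged = dict(inherited or {})
    merged.update(node.metadata)  # values stored here apply to everything below
    if node.children:
        for child in node.children:
            yield from resolve(child, merged)
        return
    # A leaf node may cover several fields; repeat shared single values for each.
    n = max((len(v) for v in merged.values()), default=0)
    yield node.name, {k: v * n if len(v) == 1 else v for k, v in merged.items()}


# One shared path on the parent, per-field offsets on the leaf.
leaf = ToyNode("param=75/76/130", {"offset": [0, 237, 474]})
parent = ToyNode("levtype=hl", {"path": ["/data/example.data"]}, [leaf])
resolved = dict(resolve(parent))
print(resolved["param=75/76/130"]["path"])
```

In this sketch the path is held once on the parent but is reported three times, once for each field under the leaf, which mirrors why a shared string is cheaper to keep higher up the tree.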
Recipes
Extracting the set of metadata values
For metadata that sits at levels above the leaf nodes, it would be inefficient to use Qube.leaves, which visits every individual leaf. Instead, one can use Qube.walk like this:
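def get_metadata_key(qube, key):
    """Collect every value stored under `key` anywhere in the tree."""
    m = []

    def getter(qube):
        # Each node maps metadata keys to arrays of values.
        for k, v in qube.metadata.items():
            if k == key:
                m.extend(v.flatten())

    qube.walk(getter)
    return m
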
m = get_metadata_key(example, "path")
m[:5]
[np.str_('/data/prod_4/fdb/d1:on-demand-extremes-dt:0099:oper:20250616:0000/fc:hl:u4usq2.20250624.053437.databridge-prod-store4.novalocal.1854313475342336.data'),
np.str_('/data/prod_2/fdb/d1:on-demand-extremes-dt:0099:oper:20250616:0000/fc:ml:u4usq2.20250624.053520.databridge-prod-store2.novalocal.1897323277844480.data'),
np.str_('/data/prod_1/fdb/d1:on-demand-extremes-dt:0099:oper:20250616:0000/fc:pl:u4usq2.20250626.070046.databridge-prod-store1.novalocal.2364589949845504.data'),
np.str_('/data/prod_1/fdb/d1:on-demand-extremes-dt:0099:oper:20250616:0000/fc:sfc:u4usq2.20250619.121642.databridge-prod-store1.novalocal.696012335218688.data'),
np.str_('/data/prod_3/fdb/d1:on-demand-extremes-dt:7777:oper:20250404:0000/fc:sfc:u15rxs.20250623.133832.databridge-prod-store3.novalocal.1706132808663040.data')]
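The function above returns a flat list, so a value may appear more than once if it is stored at several nodes. A minimal sketch for reducing such a list to its distinct values and their counts, using only the standard library. Here unique_with_counts is a hypothetical helper, not part of Qubed, and the sample paths are made up for illustration.

```python
from collections import Counter


def unique_with_counts(values):
    """Return {value: count}, sorted by value, for string-valued metadata."""
    counts = Counter(str(v) for v in values)  # str() also accepts np.str_
    return dict(sorted(counts.items()))


sample = ["/data/b.data", "/data/a.data", "/data/b.data"]
print(unique_with_counts(sample))
```

Applied to the list m from the recipe above, the keys of the returned dictionary would be the set of distinct paths.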
Getting the total size in bytes used by metadata
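from collections import defaultdict

def count_metadata_bytes(q: Qube):
    """Sum the bytes used by the metadata arrays, grouped by key."""
    totals = defaultdict(int)

    def measure(q: Qube):
        # values is a NumPy array, so nbytes is its in-memory size.
        for key, values in q.metadata.items():
            totals[key] += values.nbytes

    q.walk(measure)
    return dict(totals)
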
count_metadata_bytes(example)
{'scheme': 12,
'port': 8,
'host': 119496,
'path': 463200,
'length': 637672,
'offset': 637672}
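The per-key totals are easier to compare as shares of the overall size. A small sketch that takes a dictionary like the one above and reports the grand total and each key's percentage. Here summarise_bytes is a hypothetical helper, not part of Qubed, and the input is copied from the output shown above.

```python
def summarise_bytes(totals):
    """Return (grand total, {key: percent of total}), largest share first."""
    grand_total = sum(totals.values())
    if grand_total == 0:
        return 0, {}
    ordered = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    shares = {key: round(100 * nbytes / grand_total, 1) for key, nbytes in ordered}
    return grand_total, shares


totals = {"scheme": 12, "port": 8, "host": 119496,
          "path": 463200, "length": 637672, "offset": 637672}
grand_total, shares = summarise_bytes(totals)
print(grand_total)
print(shares)
```

For this example the total is 1858060 bytes, most of it in the per-leaf length and offset arrays, while keys shared high up the tree such as scheme and port take almost nothing.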