Back all `AllFires` and `Fire` objects with 2 dataframes
This branch uses two dataframes: an `allpixels` dataframe with one row per pixel and an `allfires` geodataframe with one row per fire per `t`. The core concept is that if a dataframe backs the `AllFires` and `Fire` objects, there are well-defined ways to serialize that state to disk whenever you like (aka no more pickles!).
Here's a bit of an overview of the lifecycle of each of these dataframes:
`allpixels`:

- At the start of `Fire_Forward` all of the preprocessed pixel data is loaded and concatenated into one long dataframe.
- Each row represents a fire pixel and there is a unique id per row.
- As `Fire_Forward` iterates through the timesteps of interest the `allpixels` dataframe is updated in place.
- Each `Fire` object refers to the `allpixels` object as the source of truth and does not hold pixel data but instead refers to subsets of the `allpixels` dataframe to return `n_pixels` or `newpixels`.
- Merging fires at a particular `t` can update the `allpixels` at a former timestep.
- When `Fire_Forward` is complete, the `allpixels` object can be serialized to csv (or any tabular format) optionally partitioned into files by `t`.
- This dataframe can be used:
  - together with `allfires_gdf` to rehydrate the `allfires` object at the latest `t` in order to run `Fire_Forward` on one new ingest file.
  - independently to write the `nplist` output file for largefires

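
To make the `allpixels` lifecycle above concrete, here is a minimal sketch using plain pandas. It is not the code in this branch: the column names (`lon`, `lat`, `fid`), the `uuid` index name, the `pixels` property, and the `allpixels_t{t}.csv` file naming are all assumptions made for illustration. It shows a `Fire` that holds only a reference to the shared dataframe, a merge that relabels rows from an earlier timestep in place, and a csv round trip partitioned by `t`.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical schema: one row per fire pixel, unique id as the index.
allpixels = pd.DataFrame(
    {
        "lon": [10.0, 10.1, 10.2, 20.0, 20.1],
        "lat": [50.0, 50.1, 50.2, 60.0, 60.1],
        "t": [1, 1, 2, 1, 2],
        "fid": [1, 1, 1, 2, 2],
    },
    index=pd.Index([0, 1, 2, 3, 4], name="uuid"),
)


class Fire:
    """Holds no pixel data; every pixel property is a query on allpixels."""

    def __init__(self, fid, t, allpixels):
        self.fid = fid
        self.t = t
        self.allpixels = allpixels  # a reference to the shared frame, not a copy

    @property
    def pixels(self):
        df = self.allpixels
        return df[(df["fid"] == self.fid) & (df["t"] <= self.t)]

    @property
    def newpixels(self):
        df = self.allpixels
        return df[(df["fid"] == self.fid) & (df["t"] == self.t)]

    @property
    def n_pixels(self):
        return len(self.pixels)


fire1 = Fire(fid=1, t=2, allpixels=allpixels)
n_before_merge = fire1.n_pixels

# Merging fire 2 into fire 1 at t=2 relabels every row of fire 2 in place,
# including the row that was recorded at t=1.
allpixels.loc[allpixels["fid"] == 2, "fid"] = 1
n_after_merge = fire1.n_pixels

# Serialize one csv per t, then read the parts back and reassemble.
with tempfile.TemporaryDirectory() as tmp:
    for t, group in allpixels.groupby("t"):
        group.to_csv(Path(tmp) / f"allpixels_t{t}.csv")
    parts = [
        pd.read_csv(path, index_col="uuid")
        for path in sorted(Path(tmp).glob("*.csv"))
    ]
    restored = pd.concat(parts).sort_index()
```

Because `fire1` only queries the shared frame, it sees the merged pixels without being told about the merge.
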
`allfires_gdf`:

- At the start of `Fire_Forward` a new geodataframe object is initialized. It has a column for each of the `Fire` attributes that take a non-trivial amount of time to compute (`ftype`, `hull`, `fline`...).
- As `Fire_Forward` iterates through the timesteps of interest it writes a row for every fire that is burning (aka has new pixels) at the `t`.
- So each row contains the information about one fire at one `t`. The index is a MultiIndex of `(fid, t)`.
- Merging fires at a particular `t` updates the `mergeid` on the existing rows (this part I am not totally confident is correct).
- When `Fire_Forward` is complete, the `allfires_gdf` object can be serialized to geoparquet (this is the best choice since it contains multiple geometry columns) optionally partitioned into files by `t`.
- This geodataframe can be used:
  - together with `allpixels` to rehydrate the `allfires` object at the latest `t` in order to run `Fire_Forward` on one new ingest file.
  - independently to write all the snapshot and largefires output files.

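
The `(fid, t)` indexing and the `mergeid` update can be sketched in the same way. This is a simplified stand-in rather than the branch's implementation: it uses a plain pandas `DataFrame` so it runs without geopandas, it leaves out the geometry columns (`hull`, `fline`), and the `ftype` values and the "`mergeid` starts equal to `fid`" convention are assumptions. In the real geodataframe the final write would be `GeoDataFrame.to_parquet`.

```python
import pandas as pd

# Hypothetical record of which fires have new pixels at each timestep.
burning = {1: [1, 2], 2: [1]}

rows = []
for t, fids in burning.items():
    for fid in fids:
        # One row per fire per t; geometry columns are omitted in this sketch.
        rows.append({"fid": fid, "t": t, "mergeid": fid, "ftype": 2})

allfires = pd.DataFrame(rows).set_index(["fid", "t"]).sort_index()

# Fire 2 merges into fire 1 at t=2: update mergeid on the rows that
# already exist for fire 2, whatever t they were written at.
is_fire2 = allfires.index.get_level_values("fid") == 2
allfires.loc[is_fire2, "mergeid"] = 1

# Rehydrating starts from the slice at the latest t.
latest_t = allfires.index.get_level_values("t").max()
snapshot = allfires.xs(latest_t, level="t")
```

Selecting with `xs` on the `t` level gives one row per fire for that timestep, which is the shape the snapshot outputs need.
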
Side note: I like that in this branch the `allpixels` dataframe is referenced by all the `Fire` objects but it isn't copied around. This is different from how it works in preprocess, where each `Fire` object (at each `t`) has its own dataframe. It is also different from the original version of this algorithm, where each `Fire` object (at each `t`) holds a bunch of lists.