This branch uses two dataframes: an allpixels dataframe with one row per pixel and an allfires geodataframe with one row per fire per t. The core concept is that if you use a dataframe to back the allfires and fire objects, there are well-defined ways to serialize that to disk whenever you like (aka no more pickles!).
Here's a bit of an overview of the lifecycle of each of these dataframes:
allpixels:
- At the start of `Fire_Forward` all of the preprocessed pixel data is loaded and concatenated into one long dataframe.
- Each row represents a fire pixel and there is a unique id per row.
- As `Fire_Forward` iterates through the timesteps of interest the `allpixels` dataframe is updated in place.
- Each `Fire` object refers to the `allpixels` object as the source of truth and does not hold pixel data but instead refers to subsets of the `allpixels` dataframe to return `n_pixels` or `newpixels`.
- Merging fires at a particular `t` can update the `allpixels` at a former timestep.
- When `Fire_Forward` is complete, the `allpixels` object can be serialized to csv (or any tabular format) optionally partitioned into files by `t`.
- This dataframe can be used:
  - together with `allfires_gdf` to rehydrate the `allfires` object at the latest `t` in order to run `Fire_Forward` on one new ingest file.
  - independently to write the `nplist` output file for largefires
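
To make the "source of truth" idea concrete, here is a minimal sketch in plain pandas. It is not the code in this branch: the `Fire` class is a stripped-down stand-in, and the integer `t`, the `uuid` index name, the `frp` column, the `merge_fires` helper and the output file names are all assumptions made up for illustration.

```python
import pathlib
import tempfile

import pandas as pd

# Hypothetical stand-in for the concatenated preprocessed pixel data:
# one row per pixel, unique id as the index, integer timestep `t`.
allpixels = pd.DataFrame(
    {"t": [1, 1, 2, 2], "fid": [1, 2, 1, 2], "frp": [10.0, 12.5, 8.0, 20.0]},
    index=pd.Index(["p0", "p1", "p2", "p3"], name="uuid"),
)


class Fire:
    """Holds no pixel data; every pixel property is a query on the shared frame."""

    def __init__(self, fid, t, allpixels):
        self.fid = fid
        self.t = t
        self.allpixels = allpixels  # a reference to the shared frame, not a copy

    @property
    def pixels(self):
        df = self.allpixels
        return df[(df["fid"] == self.fid) & (df["t"] <= self.t)]

    @property
    def newpixels(self):
        df = self.allpixels
        return df[(df["fid"] == self.fid) & (df["t"] == self.t)]

    @property
    def n_pixels(self):
        return len(self.pixels)


def merge_fires(allpixels, source_fid, target_fid):
    """Reassign every pixel of source_fid, including rows from earlier timesteps."""
    allpixels.loc[allpixels["fid"] == source_fid, "fid"] = target_fid


fire1 = Fire(fid=1, t=2, allpixels=allpixels)
fire2 = Fire(fid=2, t=2, allpixels=allpixels)

n_before = fire1.n_pixels  # only the pixels originally assigned to fire 1
merge_fires(allpixels, source_fid=2, target_fid=1)
n_after = fire1.n_pixels  # now also counts fire 2's pixels from t=1 and t=2

# Serialize to csv, one file per timestep (file naming is an assumption).
outdir = pathlib.Path(tempfile.mkdtemp())
for t, chunk in allpixels.groupby("t"):
    chunk.to_csv(outdir / f"allpixels_t{t}.csv")
written = sorted(p.name for p in outdir.iterdir())
```

Because the `Fire` objects only filter the shared frame, the merge is a single in-place update and both objects see it immediately; nothing has to be pushed into per-fire copies.
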

allfires_gdf:
- At the start of `Fire_Forward` a new geodataframe object is initialized. It has a column for each of the `Fire` attributes that take a non-trivial amount of time to compute (`ftype`, `hull`, `fline`...).
- As `Fire_Forward` iterates through the timesteps of interest it writes a row for every fire that is burning (aka has new pixels) at the `t`.
- So each row contains the information about one fire at one `t`. The index is a MultiIndex of `(fid, t)`
- Merging fires at a particular `t` updates the `mergeid` on the existing rows (this part I am not totally confident is correct).
- When `Fire_Forward` is complete, the `allfires_gdf` object can be serialized to geoparquet (this is the best choice since it contains multiple geometry columns) optionally partitioned into files by `t`.
- This geodataframe can be used:
  - together with `allpixels` to rehydrate the `allfires` object at the latest `t` in order to run `Fire_Forward` on one new ingest file.
  - independently to write all the snapshot and largefires output files.
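
The `(fid, t)` bookkeeping can be sketched roughly like this. To keep the example dependency-free it uses a plain pandas DataFrame as a stand-in for the geodataframe and leaves out the geometry columns (`hull`, `fline`). The helpers `record_timestep` and `merge_fires`, the `n_newpixels` column, the integer `t` and the sample values are assumptions for illustration, and the `mergeid` update shown is only one possible reading of the merge step, not necessarily what the branch does.

```python
import pandas as pd


def record_timestep(allfires, t, burning):
    """Append one row per fire that has new pixels at `t`.

    `burning` maps fid -> dict of the attributes computed for that fire at `t`.
    """
    rows = pd.DataFrame.from_dict(burning, orient="index")
    rows.index = pd.MultiIndex.from_product([rows.index, [t]], names=["fid", "t"])
    # Until a fire is merged, its mergeid is just its own fid.
    rows["mergeid"] = rows.index.get_level_values("fid").to_numpy()
    return rows if allfires is None else pd.concat([allfires, rows])


def merge_fires(allfires, source_fid, target_fid):
    """Point every existing row that resolved to source_fid at target_fid."""
    allfires.loc[allfires["mergeid"] == source_fid, "mergeid"] = target_fid


allfires = None
allfires = record_timestep(
    allfires, 1, {1: {"n_newpixels": 3, "ftype": 2}, 2: {"n_newpixels": 1, "ftype": 2}}
)
allfires = record_timestep(
    allfires, 2, {1: {"n_newpixels": 2, "ftype": 2}, 2: {"n_newpixels": 4, "ftype": 2}}
)
# At t=3 only fire 1 has new pixels, and fire 2 is merged into it.
allfires = record_timestep(allfires, 3, {1: {"n_newpixels": 5, "ftype": 2}})
merge_fires(allfires, source_fid=2, target_fid=1)

# All rows recorded for fire 2, at every t it was burning.
fire2_history = allfires[allfires.index.get_level_values("fid") == 2]

# The most recent row per fire: a starting point for rehydrating at the latest t.
latest = allfires.groupby(level="fid").tail(1)
```

Keeping one row per fire per `t` means the snapshot at any timestep is just a filter on the `t` level, and the merge history stays queryable through `mergeid` instead of being baked into object state.
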

Side note: I like that in this branch the `allpixels` dataframe is referenced by all the `Fire` objects but it isn't copied around. This is different from how it works in `preprocess` where each `Fire` object (at each `t`) has its own dataframe. It is also different than the original version of this algorithm where each `Fire` object (at each `t`) holds a bunch of lists.