Right now, the API migration script has a fair amount of column renaming (@greg could you drop a link to the renaming code?). That renaming would be better handled in the algorithm itself, because the algorithm could just write out the correct column names. @tmccabe mentioned that some column names may be appropriate for the API but not for the raw data. There could also be places where the data should just have a different column name.
@tmccabe will look at save_gpkgsfs to see places where the column name should just be changed.
Sorry, the reason this took so long is because, of course, it's more complex than I realized. This is how I see things changing. It doesn't mean I see everything accurately or that it's the way we should do it, but I thought I'd outline it with some context so we don't have a lot of confusion. Let me know if you want to chat about it after reviewing the call sites below.
Context:
Currently we have a single function (copy_from_maap_to_veda_s3) in the CONUS and CombineLF workflows that does the file renaming and s3 cp from MAAP to VEDA:
In the CONUS workflows, all file renaming happens whenever this function is called. Here is the actual rename happening. Note that the original file here was written out under a different name, so the .fgb essentially carries a header with that original name in it; the s3 cp doesn't change that header info. That's important for later.
The CombineLF workflows do the same thing in the same way here, but they don't have to rename the file before the s3 cp, so the headers in the .fgb already match the file name.
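For reference, a minimal sketch of what that copy step effectively does today (file names, paths, and the bucket are placeholders; the real function is the one linked above). The key point is that `aws s3 cp` copies bytes as-is, so the layer name baked into the .fgb header still reflects the original output name even after the local rename:

```python
import subprocess
from pathlib import Path

def copy_from_maap_to_veda_s3_sketch(local_path: str, new_name: str, veda_bucket: str) -> None:
    """Illustrative sketch only, not the real implementation.

    Rename a local FlatGeobuf and copy it to the VEDA bucket with `aws s3 cp`.
    Because the copy is byte-for-byte, the layer name inside the .fgb header
    still reflects the ORIGINAL file name, not `new_name`.
    """
    src = Path(local_path)
    dst = src.with_name(new_name)  # e.g. the final snapshot/largefire name (placeholder)
    src.rename(dst)                # the CONUS flow renames; CombineLF doesn't need to
    subprocess.run(
        ["aws", "s3", "cp", str(dst), f"s3://{veda_bucket}/{dst.name}"],
        check=True,
    )
```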
What We Want:
Overall we want to take the copy_from_maap_to_veda_s3 function and make a new function that will do more than just a file copy (and, in the CONUS flow, a file rename). Ordered by priority, these new items will be:
build composite primary keys so we don't have to have a bunch of collections with different regional names, and then use -upsert mode during ingestion (see the sketch below)
ADDITION: we want to add a column and explore "flagging" geometries that we think can be filtered out in the API (those in Kansas, for example)
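A rough sketch of what those two priority items could look like together. The column names (`region_name`, `primarykey`, `exclude_flag`), the `fireID`/`t` inputs, and the Kansas bounding box are illustrative assumptions, not decided names:

```python
import geopandas as gpd
from shapely.geometry import box

# Approximate Kansas bounding box (lon/lat), used only to illustrate flagging.
KANSAS_BBOX = box(-102.05, 36.99, -94.59, 40.00)

def add_primary_key_and_flag(gdf: gpd.GeoDataFrame, region_name: str) -> gpd.GeoDataFrame:
    """Sketch: build a composite primary key and a boolean filter flag.

    Assumes the frame already has `fireID` and `t` columns (see the SQL
    rename list at the bottom of this comment).
    """
    gdf = gdf.copy()
    gdf["region_name"] = region_name
    # A composite key lets a single collection hold every region and lets
    # ingestion run in -upsert mode without key collisions across regions.
    gdf["primarykey"] = (
        gdf["region_name"] + "|" + gdf["fireID"].astype(str) + "|" + gdf["t"].astype(str)
    )
    # Flag geometries the API may want to filter out (e.g. false detections in Kansas).
    gdf["exclude_flag"] = gdf.geometry.intersects(KANSAS_BBOX)
    return gdf
```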
Run the algorithm manually for time intervals in the ADE to build up the new S3 archive. While that's happening we can do the rest of the changes below.
Greg to backport FireConst and os environ variables into Eli's branch.
SNAPSHOT RUN: read in the MAAP output from the last step and choose which columns to keep and rename, using the SQL list below.
SNAPSHOT RUN: follow the same file renaming pattern that already exists for the CONUS workflow.
SNAPSHOT RUN: instead of an s3 cp we should actually be writing out to VEDA S3 with geopandas via gdf.to_file, because then the file names match the header info (see the sketch below).
Add inputs to DPS jobs that will take any FireRun name so we can reuse the same image for fire tracking algo runs by region: regionName*, bbox*, startTime*, EndTime.
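A sketch of how the snapshot write could combine the column selection/renames from the SQL list at the bottom with a direct geopandas write under the final name, so the embedded header matches the file name. The rename mapping shown is the snapshot_perimeter_nrt entry; the output path and upload mechanism (local write + upload vs. GDAL's /vsis3/) are assumptions:

```python
import geopandas as gpd

# Columns to keep (keys) and their output names (values), taken from the
# snapshot_perimeter_nrt entry in the SQL list below ("t_ed as t").
SNAPSHOT_PERIMETER_COLUMNS = {
    "n_pixels": "n_pixels", "n_newpixels": "n_newpixels", "farea": "farea",
    "fperim": "fperim", "flinelen": "flinelen", "duration": "duration",
    "pixden": "pixden", "meanFRP": "meanFRP", "isactive": "isactive",
    "t_ed": "t", "fireID": "fireID",
}

def write_snapshot_perimeter(maap_output_path: str, out_path: str) -> None:
    """Sketch: read MAAP output, keep/rename columns, rewrite under the final name.

    `out_path` could be a local file that is then uploaded to VEDA S3, or a
    GDAL /vsis3/ path if direct writes are enabled; either way geopandas
    rewrites the file, so the header matches the final file name.
    """
    gdf = gpd.read_file(maap_output_path)
    gdf = gdf[list(SNAPSHOT_PERIMETER_COLUMNS) + ["geometry"]].rename(
        columns=SNAPSHOT_PERIMETER_COLUMNS
    )
    gdf.to_file(out_path)  # driver inferred from the extension (.fgb or .gpkg)
```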
Jamison and Greg to finish porting 5, 6, 7, 8 to the combined_largerfire.py runs
Jamison and Greg to finish ingest logic by adding indices to the correct columns (especially primarykey and region_name)
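The index additions might look something like this, assuming the ingest target is a Postgres/PostGIS database; the table and index names below are placeholders:

```python
import psycopg2  # assumption: ingestion lands in Postgres/PostGIS

# Placeholder table/index names; the real targets depend on the ingest schema.
INDEX_STATEMENTS = [
    "CREATE INDEX IF NOT EXISTS lf_perimeter_primarykey_idx ON lf_perimeter (primarykey);",
    "CREATE INDEX IF NOT EXISTS lf_perimeter_region_name_idx ON lf_perimeter (region_name);",
]

def add_ingest_indexes(dsn: str) -> None:
    """Sketch: add the lookup indexes that -upsert ingestion would rely on."""
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        for stmt in INDEX_STATEMENTS:
            cur.execute(stmt)
```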
Jamison and Greg to change copy_from_maap_to_veda_s3 to write GeoPackage files
Jamison and Greg to change Event Bridge in MCP to look for GeoPackage files
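I'm assuming the EventBridge change is mostly swapping whatever suffix filter the MCP rule uses from .fgb to .gpkg. A hedged sketch of what such a rule can look like via boto3 (rule and bucket names are placeholders, and the real rule may well live in MCP's infrastructure-as-code instead):

```python
import json
import boto3

RULE_NAME = "veda-fire-gpkg-object-created"   # placeholder
VEDA_BUCKET = "veda-fire-output-bucket"       # placeholder

# Match S3 "Object Created" events for GeoPackage keys instead of FlatGeobuf.
event_pattern = {
    "source": ["aws.s3"],
    "detail-type": ["Object Created"],
    "detail": {
        "bucket": {"name": [VEDA_BUCKET]},
        "object": {"key": [{"suffix": ".gpkg"}]},
    },
}

events = boto3.client("events")
events.put_rule(Name=RULE_NAME, EventPattern=json.dumps(event_pattern), State="ENABLED")
```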
Test algorithms in ADE and DPS and make sure everything looks good:
tag primarykeyv2 branch as 0.99.0
register algorithms and wait
manually kick off jobs and see if they fail/succeed
Revert changes made to primarykeyv2 for testing
Figure out the merge order for Airflow and primarykeyv2
SQL Column Rename List:
Sorry, but these are from the perspective of the final file names that copy_from_maap_to_veda_s3 produces. They show which columns are important and which ones (via as) are renamed:
snapshot_newfirepix_nrt: "SELECT fireID, mergeid, t_ed as t from <file>"
snapshot_fireline_nrt: "SELECT fireID, mergeid, t_ed as t from <file>"
snapshot_perimeter_nrt: "SELECT n_pixels, n_newpixels, farea, fperim, flinelen, duration, pixden, meanFRP, isactive, t_ed as t, fireID from <file>"
lf_nfplist_nrt: "SELECT x, y, frp, DS, DT, ampm, datetime as t, sat, id as fireID from <file>"
lf_nfplist_archive: "SELECT x, y, frp, DS, DT, ampm, datetime as t, sat, id as fireID from <file>"
lf_newfirepix_nrt: "SELECT id as fireID, t from <file>"
lf_newfirepix_archive: "SELECT id as fireID, t from <file>"
lf_fireline_nrt: "SELECT id as fireID, t from <file>"
lf_fireline_archive: "SELECT id as fireID, t from <file>"
lf_perimeter_nrt: "SELECT n_pixels, n_newpixels, farea, fperim, flinelen, duration, pixden, meanFRP, t, id as fireID from <file>"
lf_perimeter_archive: "SELECT n_pixels, n_newpixels, farea, fperim, flinelen, duration, pixden, meanFRP, t, id as fireID from <file>"
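If the new function wants to apply these SELECTs verbatim instead of translating them into pandas renames, GDAL can run OGR SQL directly against each file. A sketch using ogr2ogr (file and layer names are placeholders; note that the FROM target is the layer name inside the source file, i.e. the original name baked into the .fgb header discussed above):

```python
import subprocess

# One entry from the list above, applied verbatim with OGR SQL.
query = (
    "SELECT n_pixels, n_newpixels, farea, fperim, flinelen, duration, "
    "pixden, meanFRP, isactive, t_ed as t, fireID FROM snapshot_perimeter"  # layer name is a placeholder
)
subprocess.run(
    [
        "ogr2ogr",
        "-f", "GPKG",                   # write a GeoPackage output
        "-sql", query,
        "snapshot_perimeter_nrt.gpkg",  # placeholder destination
        "maap_output.fgb",              # placeholder MAAP source file
    ],
    check=True,
)
```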
By Gregory on 2023-08-02T17:27:29 (imported from GitLab)