Right now, the API migration script has a fair amount of column renaming (@greg could you drop a link to the renaming code?). That renaming would be better handled in the algorithm itself, because the algorithm could just write out the correct column names. @tmccabe mentioned that some column names may be appropriate for the API but not for the raw data. There could also be places where the data should just have a different column name.
@tmccabe will look at save_gpkgsfs to see places where the column name should just be changed.
Sorry, the reason this took so long is because, of course, it's more complex than I realized. This is how I see things changing. It doesn't mean I see everything accurately or that it's the way we should do it, but I thought I'd outline it with some context so we don't have a lot of confusion. Let me know if you want to chat about it after reviewing the call sites below.
Context:
Currently we have a single function (copy_from_maap_to_veda_s3) in the CONUS and CombineLF workflows that does the file renaming and s3 cp from MAAP to VEDA:
In the CONUS workflows, all file renaming happens whenever this function is called. Here is the actual rename happening. Note that the original file here was written out under a different name, so the .fgb essentially carries a header with that original name in it; the s3 cp doesn't change that header info. That's important for later.
The CombineLF workflows do the same thing in the same way here, but they don't have to rename the file before the s3 cp, so the headers in the .fgb already match the file name.
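For reference, a minimal sketch of what that copy step effectively does today (file names, paths, and the bucket are placeholders; the real function is the one linked above). The key point is that `aws s3 cp` copies bytes as-is, so the layer name baked into the .fgb header still reflects the original output name even after the local rename:

```python
import subprocess
from pathlib import Path

def copy_from_maap_to_veda_s3_sketch(local_path: str, new_name: str, veda_bucket: str) -> None:
    """Illustrative sketch only, not the real implementation.

    Rename a local FlatGeobuf and copy it to the VEDA bucket with `aws s3 cp`.
    Because the copy is byte-for-byte, the layer name inside the .fgb header
    still reflects the ORIGINAL file name, not `new_name`.
    """
    src = Path(local_path)
    dst = src.with_name(new_name)  # e.g. the final snapshot/largefire name (placeholder)
    src.rename(dst)                # the CONUS flow renames; CombineLF doesn't need to
    subprocess.run(
        ["aws", "s3", "cp", str(dst), f"s3://{veda_bucket}/{dst.name}"],
        check=True,
    )
```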
What We Want:
Overall we want to take the copy_from_maap_to_veda_s3 function and make a new function that will do more than just a file copy (and, in the CONUS flow, a file rename). Ordered by priority, these new items will be:
build composite primary keys so we don't have to have a bunch of collections with different regional names, and then use -upsert mode during ingestion (see the sketch below)
ADDITION: we want to add a column and explore "flagging" geometries that we think can be filtered out in the API (those in Kansas, for example)
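A rough sketch of what those two priority items could look like together. The column names (`region_name`, `primarykey`, `exclude_flag`), the `fireID`/`t` inputs, and the Kansas bounding box are illustrative assumptions, not decided names:

```python
import geopandas as gpd
from shapely.geometry import box

# Approximate Kansas bounding box (lon/lat), used only to illustrate flagging.
KANSAS_BBOX = box(-102.05, 36.99, -94.59, 40.00)

def add_primary_key_and_flag(gdf: gpd.GeoDataFrame, region_name: str) -> gpd.GeoDataFrame:
    """Sketch: build a composite primary key and a boolean filter flag.

    Assumes the frame already has `fireID` and `t` columns (see the SQL
    rename list at the bottom of this comment).
    """
    gdf = gdf.copy()
    gdf["region_name"] = region_name
    # A composite key lets a single collection hold every region and lets
    # ingestion run in -upsert mode without key collisions across regions.
    gdf["primarykey"] = (
        gdf["region_name"] + "|" + gdf["fireID"].astype(str) + "|" + gdf["t"].astype(str)
    )
    # Flag geometries the API may want to filter out (e.g. false detections in Kansas).
    gdf["exclude_flag"] = gdf.geometry.intersects(KANSAS_BBOX)
    return gdf
```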
Run the algorithm manually for time intervals in the ADE to build up the new S3 archive. While that's happening we can do the rest of the changes below.
Greg to backport FireConst and os environ variables into Eli's branch.
SNAPSHOT RUN: read in the MAAP output from the last step and choose which columns to keep and rename, using the SQL list below.
SNAPSHOT RUN: follow the same file renaming pattern that already exists for the CONUS workflow.
SNAPSHOT RUN: instead of an s3 cp we should actually be writing out to VEDA S3 with geopandas via gdf.to_file, because then the file names match the header info (see the sketch below).
Add inputs to DPS jobs that will take any FireRun name so we can reuse the same image for fire tracking algo runs by region: regionName*, bbox*, startTime*, EndTime.
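A sketch of how the snapshot write could combine the column selection/renames from the SQL list at the bottom with a direct geopandas write under the final name, so the embedded header matches the file name. The rename mapping shown is the snapshot_perimeter_nrt entry; the output path and upload mechanism (local write + upload vs. GDAL's /vsis3/) are assumptions:

```python
import geopandas as gpd

# Columns to keep (keys) and their output names (values), taken from the
# snapshot_perimeter_nrt entry in the SQL list below ("t_ed as t").
SNAPSHOT_PERIMETER_COLUMNS = {
    "n_pixels": "n_pixels", "n_newpixels": "n_newpixels", "farea": "farea",
    "fperim": "fperim", "flinelen": "flinelen", "duration": "duration",
    "pixden": "pixden", "meanFRP": "meanFRP", "isactive": "isactive",
    "t_ed": "t", "fireID": "fireID",
}

def write_snapshot_perimeter(maap_output_path: str, out_path: str) -> None:
    """Sketch: read MAAP output, keep/rename columns, rewrite under the final name.

    `out_path` could be a local file that is then uploaded to VEDA S3, or a
    GDAL /vsis3/ path if direct writes are enabled; either way geopandas
    rewrites the file, so the header matches the final file name.
    """
    gdf = gpd.read_file(maap_output_path)
    gdf = gdf[list(SNAPSHOT_PERIMETER_COLUMNS) + ["geometry"]].rename(
        columns=SNAPSHOT_PERIMETER_COLUMNS
    )
    gdf.to_file(out_path)  # driver inferred from the extension (.fgb or .gpkg)
```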
Jamison and Greg to finish porting 5, 6, 7, 8 to the combined_largerfire.py runs
Jamison and Greg to finish ingest logic by adding indices to the correct columns (especially primarykey and region_name)
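The index additions might look something like this, assuming the ingest target is a Postgres/PostGIS database; the table and index names below are placeholders:

```python
import psycopg2  # assumption: ingestion lands in Postgres/PostGIS

# Placeholder table/index names; the real targets depend on the ingest schema.
INDEX_STATEMENTS = [
    "CREATE INDEX IF NOT EXISTS lf_perimeter_primarykey_idx ON lf_perimeter (primarykey);",
    "CREATE INDEX IF NOT EXISTS lf_perimeter_region_name_idx ON lf_perimeter (region_name);",
]

def add_ingest_indexes(dsn: str) -> None:
    """Sketch: add the lookup indexes that -upsert ingestion would rely on."""
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        for stmt in INDEX_STATEMENTS:
            cur.execute(stmt)
```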
Jamison and Greg to change copy_from_maap_to_veda_s3 to write GeoPackage files
Jamison and Greg to change Event Bridge in MCP to look for GeoPackage files
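I'm assuming the EventBridge change is mostly swapping whatever suffix filter the MCP rule uses from .fgb to .gpkg. A hedged sketch of what such a rule can look like via boto3 (rule and bucket names are placeholders, and the real rule may well live in MCP's infrastructure-as-code instead):

```python
import json
import boto3

RULE_NAME = "veda-fire-gpkg-object-created"   # placeholder
VEDA_BUCKET = "veda-fire-output-bucket"       # placeholder

# Match S3 "Object Created" events for GeoPackage keys instead of FlatGeobuf.
event_pattern = {
    "source": ["aws.s3"],
    "detail-type": ["Object Created"],
    "detail": {
        "bucket": {"name": [VEDA_BUCKET]},
        "object": {"key": [{"suffix": ".gpkg"}]},
    },
}

events = boto3.client("events")
events.put_rule(Name=RULE_NAME, EventPattern=json.dumps(event_pattern), State="ENABLED")
```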
Test algorithms in ADE and DPS and make sure everything looks good:
tag primarykeyv2 branch as 0.99.0
register algorithms and wait
manually kick off jobs and see if they fail/succeed
Revert changes made to primarykeyv2 for testing
Figure out the merge order for Airflow and primarykeyv2
SQL Column Rename List:
Sorry, but these are from the perspective of the final file names that copy_from_maap_to_veda_s3 produces. They show which columns are important and which ones (via as) are renamed:
snapshot_newfirepix_nrt: "SELECT fireID, mergeid, t_ed as t from <file>"
snapshot_fireline_nrt: "SELECT fireID, mergeid, t_ed as t from <file>"
snapshot_perimeter_nrt: "SELECT n_pixels, n_newpixels, farea, fperim, flinelen, duration, pixden, meanFRP, isactive, t_ed as t, fireID from <file>"
lf_nfplist_nrt: "SELECT x, y, frp, DS, DT, ampm, datetime as t, sat, id as fireID from <file>"
lf_nfplist_archive: "SELECT x, y, frp, DS, DT, ampm, datetime as t, sat, id as fireID from <file>"
lf_newfirepix_nrt: "SELECT id as fireID, t from <file>"
lf_newfirepix_archive: "SELECT id as fireID, t from <file>"
lf_fireline_nrt: "SELECT id as fireID, t from <file>"
lf_fireline_archive: "SELECT id as fireID, t from <file>"
lf_perimeter_nrt: "SELECT n_pixels, n_newpixels, farea, fperim, flinelen, duration, pixden, meanFRP, t, id as fireID from <file>"
lf_perimeter_archive: "SELECT n_pixels, n_newpixels, farea, fperim, flinelen, duration, pixden, meanFRP, t, id as fireID from <file>"
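If the new function wants to apply these SELECTs verbatim instead of translating them into pandas renames, GDAL can run OGR SQL directly against each file. A sketch using ogr2ogr (file and layer names are placeholders; note that the FROM target is the layer name inside the source file, i.e. the original name baked into the .fgb header discussed above):

```python
import subprocess

# One entry from the list above, applied verbatim with OGR SQL.
query = (
    "SELECT n_pixels, n_newpixels, farea, fperim, flinelen, duration, "
    "pixden, meanFRP, isactive, t_ed as t, fireID FROM snapshot_perimeter"  # layer name is a placeholder
)
subprocess.run(
    [
        "ogr2ogr",
        "-f", "GPKG",                   # write a GeoPackage output
        "-sql", query,
        "snapshot_perimeter_nrt.gpkg",  # placeholder destination
        "maap_output.fgb",              # placeholder MAAP source file
    ],
    check=True,
)
```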
By Gregory on 2023-08-02T17:27:29 (imported from GitLab)