This merge request reorganizes the code to explore how much work can be done on the initial files without knowing anything about what happened at other timesteps: start from the pixel and see how far we can get.
So far what we've got is:
| step | description | precedes | per-region | per-ingest-file | per-t | timing (WesternUS, Aug 2022) |
|---|---|---|---|---|---|---|
| preprocess_region | takes the outline of the region and the static_sources file and creates the "swiss cheese" shape | preprocess_region_t | x | | | 30 seconds |
| preprocess_monthly_file | takes the monthly file, normalizes columns, and splits it into half-day files | preprocess_region_t | | x | | 30 seconds |
| preprocess_region_t | takes a half-day file and the region and does filtering and initial clustering | | | | x | 5 minutes |
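The three steps in the table might be sketched as standalone functions, each depending only on its own inputs. This is a hypothetical sketch: the real steps read and write files, and the argument names, column layout, and data shapes here are assumptions, not the actual interfaces.

```python
import numpy as np

def preprocess_region(region_outline, static_sources):
    # Boolean raster masks: keep region pixels not covered by a static
    # source, producing the "swiss cheese" shape. Runs once per region.
    return region_outline & ~static_sources

def preprocess_monthly_file(hours, values):
    # Split a month of (hour, value) samples into half-day (12 h) chunks.
    # Runs once per ingest file. The column layout is a placeholder.
    bucket = hours // 12
    return {int(b): values[bucket == b] for b in np.unique(bucket)}

def preprocess_region_t(chunk, swiss_cheese, rows, cols):
    # Keep only the samples whose pixel falls inside the swiss-cheese
    # mask (the "filtering" part); initial clustering would follow.
    # Runs once per t.
    keep = swiss_cheese[rows, cols]
    return chunk[keep]
```

The point of the split is that preprocess_region and preprocess_monthly_file never see each other's outputs; only preprocess_region_t consumes both.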
Guiding principles:
- No pickles
- Organize code around pipeline steps
- Write a new file at the end of each pipeline step
- Prefer numpy arrays over lists
- If a function is only used once, consider inlining it into its caller
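Combining "no pickles" with "write a new file at the end of each pipeline step" could look like saving plain numpy archives between steps. This is a sketch under assumptions: the helper names and the `.npz` choice are illustrative, not what the code currently does.

```python
import os
import tempfile
import numpy as np

def finish_step(out_path, **arrays):
    # Each pipeline step ends by writing its output arrays to a plain
    # .npz archive (numpy's own format, no pickled Python objects).
    np.savez(out_path, **arrays)

def load_step(path):
    # allow_pickle defaults to False, so unpickling is refused even if
    # an object array somehow ended up in the file.
    with np.load(path) as f:
        return {k: f[k] for k in f.files}

# Usage: one step writes its file, the next step reads only that file.
with tempfile.TemporaryDirectory() as d:
    p = os.path.join(d, "preprocess_region.npz")
    finish_step(p, mask=np.ones((2, 2), dtype=bool))
    out = load_step(p)
```

Writing a file at each boundary also makes every step independently rerunnable and testable from its input file alone.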