
Draft: [DO NOT MERGE] Setting up preprocessing pipelines

Status: Closed. Julia Signell requested to merge preprocess into primarykeyv2.

This merge request reorganizes the code to find out how much work can be done on the initial files without knowing anything about what happened at other timesteps. The idea is to start from the individual pixel and see how far we can get.

So far, the pipeline has three steps:

| Step | Description | Precedes | Per-region | Per-ingest-file | Per-t | Timing (WesternUS, Aug 2022) |
|---|---|---|---|---|---|---|
| preprocess_region | Takes the outline of the region and the static_sources file and creates the "swiss cheese" shape | preprocess_region_t | x | | | 30 seconds |
| preprocess_monthly_file | Takes the monthly file, normalizes the columns, and splits it into half-day files | preprocess_region_t | | x | | 30 seconds |
| preprocess_region_t | Takes a half-day file and the region and does filtering and initial clustering | | x | | x | 5 minutes |
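The half-day split performed by preprocess_monthly_file could look roughly like the NumPy sketch below. It is not the code on this branch: it assumes the monthly file has already been loaded into a timestamp array plus a row-aligned data array, that there are no missing (NaT) timestamps, and that half-days start at 00:00 and 12:00 UTC. The function name `split_into_half_days` is hypothetical.

```python
import numpy as np


def split_into_half_days(times, values):
    """Group row-aligned data into half-day (12-hour) chunks.

    Hypothetical helper. Assumes `times` holds one timestamp per row with no
    NaT values, and that half-days start at 00:00 and 12:00 UTC.
    Returns a dict mapping each chunk's start (datetime64[h]) to its rows,
    in chronological order.
    """
    times = np.asarray(times, dtype="datetime64[s]")
    values = np.asarray(values)
    # Hours since the Unix epoch (which falls at midnight), so flooring to a
    # multiple of 12 lands on either 00:00 or 12:00.
    hours = times.astype("datetime64[h]").astype(np.int64)
    chunk_start = (hours // 12 * 12).astype("datetime64[h]")
    # np.unique returns the sorted chunk starts plus each row's chunk index.
    starts, chunk_index = np.unique(chunk_start, return_inverse=True)
    # Boolean indexing on the first axis also works for 2-D (rows x columns) data.
    return {start: values[chunk_index == i] for i, start in enumerate(starts)}
```

Because each chunk depends only on its own rows, every half-day piece can be written to its own file and handed to preprocess_region_t independently, which matches the goal of not needing to know about other timesteps.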

Guiding principles:

  • No pickles
  • Organize code around pipeline steps
  • Write a new file at the end of each pipeline step
  • Prefer numpy arrays over lists
  • If a function is only used once, consider inlining it into its parent
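Taken together, "no pickles", "write a new file at the end of each pipeline step", and "prefer numpy arrays" suggest something like the sketch below, where each step saves its outputs as named NumPy arrays in an `.npz` file. The helper names `write_step_output` and `read_step_output` and the directory layout are hypothetical; only `np.savez` and `np.load` are real NumPy calls.

```python
import tempfile
from pathlib import Path

import numpy as np


def write_step_output(path, **arrays):
    """Save one pipeline step's results as named NumPy arrays in a .npz file.

    Hypothetical helper: refuses object arrays, since np.savez would pickle them.
    """
    path = Path(path)
    if path.suffix != ".npz":
        # np.savez appends ".npz" itself; do it here so the returned path is accurate.
        path = path.with_name(path.name + ".npz")
    for name, arr in arrays.items():
        if np.asarray(arr).dtype == object:
            raise TypeError(f"{name!r} is an object array and would need pickling")
    path.parent.mkdir(parents=True, exist_ok=True)
    np.savez(path, **arrays)
    return path


def read_step_output(path):
    """Load every array written by write_step_output into a dict."""
    # allow_pickle=False makes np.load raise instead of unpickling object arrays.
    with np.load(path, allow_pickle=False) as data:
        return {name: data[name] for name in data.files}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        out = write_step_output(
            Path(tmp) / "preprocess_region" / "westernus",
            lat=np.array([40.0, 41.5]),
            lon=np.array([-120.0, -119.0]),
        )
        print(out.name, sorted(read_step_output(out)))
```

Rejecting object arrays up front is the design choice that enforces the no-pickle rule: anything that cannot be stored as a plain typed array fails at write time rather than silently becoming a pickle inside the `.npz`.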
Edited by Julia Signell
