Commit 43635c87 authored by Ian, committed by Ian

docs: update run_on_maap.py guide

docs: update download instructions to use download_from_workspace.py script
parent f054ad44
@@ -15,7 +15,7 @@ Here are minimal instructions to run the NMBIM algorithm on MAAP for a given spa
- Must only include areas with valid HSE and k_allom values
- Multiple polygons supported but must not overlap
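A quick way to verify these requirements before uploading is to inspect the files with GDAL's command-line tools (a suggested check, not part of the original workflow; assumes GDAL is installed locally):

```bash
# Summarize each layer in the boundary file: geometry type,
# feature count, extent, and CRS
ogrinfo -so -al boundary.gpkg

# Print the corner coordinates of the HSE and k_allom rasters
# so their extents can be compared against the boundary
gdalinfo hse.tif | grep -E 'Upper Left|Lower Right'
gdalinfo k_allom.tif | grep -E 'Upper Left|Lower Right'
```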
3. Get or create a configuration file
- Option A: Use an existing config file (see config/config.yaml in the ni-meister-gedi-biomass repository for a default)
- Option B: Create a new config file:
  - Name it `config.yaml` or `config.yml`
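For Option A, once the repository has been cloned in step 4, the default file can simply be copied (a minimal sketch):

```bash
# Copy the default configuration out of the cloned repository
cp ni-meister-gedi-biomass/config/config.yaml ./config.yaml
```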
@@ -33,16 +33,17 @@ p3. Get or create a configuration file
```bash
git clone https://github.com/i-c-grant/ni-meister-gedi-biomass.git
```
5. Upload files to your MAAP workspace bucket
- Navigate to `my-private-bucket` in the MAAP ADE graphical file browser
- Upload `hse.tif`, `k_allom.tif`, `boundary.gpkg`, and `config.yaml` using the interface
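As an alternative to the graphical upload, the workspace bucket is typically also mounted in the ADE home directory, so a plain copy from the terminal should work as well (the `~/my-private-bucket` mount point is an assumption about the standard MAAP ADE layout):

```bash
# Copy inputs into the mounted private bucket
# (~/my-private-bucket is assumed to be the ADE mount point)
cp hse.tif k_allom.tif boundary.gpkg config.yaml ~/my-private-bucket/
```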
6. Run the processing script
- Creates a local output directory (`run_output_<YYYYMMDD_HHMMSS>`) containing `run.log` and a copy of your config file
```bash
# Navigate to the cloned repository
cd ni-meister-gedi-biomass

# Run the script with additional options
python run_on_maap.py \
--username {your_username} \
--tag {unique_processing_id} \
--config s3://maap-ops-workspace/{username}/my-private-bucket/config.yaml \
--hse s3://maap-ops-workspace/{username}/my-private-bucket/hse.tif \
--k_allom s3://maap-ops-workspace/{username}/my-private-bucket/k_allom.tif \
--algo_id nmbim_biomass_index \
--algo_version main \
--boundary s3://maap-ops-workspace/{username}/my-private-bucket/boundary.gpkg \
--date_range "2019-01-01,2020-12-31" \
--job_limit 100 \
--redo-of previous_tag \
--no-redo
```
This script determines which GEDI files are needed to cover the query and submits the corresponding jobs to the MAAP DPS.
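The example above exercises the optional flags; for a first run, the required arguments alone are enough. A minimal sketch, assuming the same input locations as above:

```bash
# Minimal invocation: required arguments only
python run_on_maap.py \
--username {your_username} \
--tag {unique_processing_id} \
--config s3://maap-ops-workspace/{username}/my-private-bucket/config.yaml \
--hse s3://maap-ops-workspace/{username}/my-private-bucket/hse.tif \
--k_allom s3://maap-ops-workspace/{username}/my-private-bucket/k_allom.tif \
--algo_id nmbim_biomass_index \
--algo_version main
```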
@@ -61,7 +67,7 @@ The script will display a progress bar showing:
- Current status counts (Succeeded, Failed, Running)
- Time of last status update (updates occur infrequently for very large job batches)
Wait until all or most jobs are complete. Press Ctrl-C once to suspend monitoring (you can then choose to resume, resubmit failed jobs, or exit). Press Ctrl-C again at the prompt or select 'x' to cancel the run; any completed jobs will remain available for download.
8. Get temporary MAAP credentials

To download the outputs from the MAAP S3 bucket, you must first obtain temporary credentials that grant access to the bucket.
@@ -74,18 +80,40 @@ In order to download the outputs from the MAAP s3 bucket, it is necessary to obt
9. Download and process results

You can use the provided `download_from_workspace.py` script along with your temporary AWS credentials JSON file to download all compressed GeoPackages for your run:
```bash
# Save temporary credentials to creds.json
python - > creds.json <<'EOF'
import json
from maap.maap import MAAP

maap = MAAP(maap_host='api.maap-project.org')
creds = maap.aws.workspace_bucket_credentials()
print(json.dumps(creds, indent=2))
EOF
# Download results for your run
python download_from_workspace.py \
--credentials creds.json \
--output-dir ./run_results \
--algorithm nmbim_biomass_index \
--version main \
--tag {unique_processing_id}
# Decompress downloaded GeoPackages
bunzip2 run_results/*.gpkg.bz2
```
Alternatively, you can still use the AWS CLI directly:
```bash
# List output files
aws s3 ls s3://maap-ops-workspace/{username}/dps_output/nmbim_biomass_index/main/{unique_processing_id}/ --recursive | grep '\.gpkg\.bz2$'

# Download compressed GeoPackages
aws s3 cp s3://maap-ops-workspace/{username}/dps_output/nmbim_biomass_index/main/{unique_processing_id}/ ./run_results \
--recursive --exclude "*" --include "*.gpkg.bz2"

# Decompress files
bunzip2 run_results/*.gpkg.bz2
```
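If a single file is more convenient for analysis, the decompressed GeoPackages can be merged with GDAL's `ogrmerge.py` (a suggested post-processing step, not part of the original instructions; assumes GDAL is installed):

```bash
# Merge all downloaded GeoPackages into a single layer
ogrmerge.py -single -o merged_results.gpkg run_results/*.gpkg
```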
## Deployment Overview
@@ -154,12 +182,15 @@ The script requires several mandatory arguments:
- algo_version: The algorithm version (typically "main")
Optional arguments allow you to:
- Restrict processing to a specific geographic boundary (`--boundary`)
- Filter by date range (`--date_range`)
- Limit the number of jobs submitted (`--job_limit`)
- Adjust the job status checking interval (`--check_interval`)
- Reprocess a previous run (`--redo-of <tag>`)
- Force reuse of the same tag for a redo (`--force-redo`)
- Disable automatic resubmission of failed jobs (`--no-redo`)
Note: all file arguments (config, hse, k_allom, and boundary) should be passed as `s3://` paths, not paths within the locally mounted MAAP filesystem. Passing local paths puts a high load on the MAAP ADE cluster, because each transfer of a file argument to a worker is routed through the ADE to resolve the local path into an S3 path. Accordingly, the configuration file and rasters should be stored in your MAAP workspace (e.g., `my-private-bucket` or `my-public-bucket`).
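Before submitting a large batch, you can confirm that each `s3://` argument points at a real object (a quick pre-flight check, using the same temporary credentials as in step 8):

```bash
# Each command should list exactly one object; an empty result
# means the path is wrong or the file was not uploaded
aws s3 ls s3://maap-ops-workspace/{username}/my-private-bucket/config.yaml
aws s3 ls s3://maap-ops-workspace/{username}/my-private-bucket/hse.tif
aws s3 ls s3://maap-ops-workspace/{username}/my-private-bucket/k_allom.tif
aws s3 ls s3://maap-ops-workspace/{username}/my-private-bucket/boundary.gpkg
```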
### Job Management