I want to run the following experiment over the next day and see whether it causes any problems:
- take out one of our `python FireRun.py` calls from `run-dps.sh` (but obviously leave the other one)
- change the cadence of the job scheduler to every 2 hours
- over the next 24 hours, check each job ID against the results to see whether the jobs are completing or whether we're running into the issue we were worried about
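For the 24-hour check, here's a rough sketch of what I have in mind, in Python since FireRun.py already is. It assumes we can get the list of scheduled job IDs and a mapping of job ID to status string from wherever results land. `check_jobs`, the `"done"` status value, and the sample data are all hypothetical placeholders, not anything that exists in our code yet. Any job that never shows up in the results lands in `missing`, which is where a dropped or stuck run would surface.

```python
from collections.abc import Iterable, Mapping


def check_jobs(job_ids: Iterable[str], results: Mapping[str, str]) -> dict[str, list[str]]:
    """Sort each scheduled job ID into completed / other / missing.

    Hypothetical: assumes `results` maps job ID -> status string,
    with "done" meaning the job finished successfully.
    """
    report: dict[str, list[str]] = {"completed": [], "other": [], "missing": []}
    for job_id in job_ids:
        status = results.get(job_id)
        if status is None:
            # Never appeared in results: possibly dropped or stuck.
            report["missing"].append(job_id)
        elif status == "done":
            report["completed"].append(job_id)
        else:
            # Present but not finished successfully (failed, running, etc.).
            report["other"].append(job_id)
    return report


if __name__ == "__main__":
    # Hypothetical sample data; the real IDs/statuses would come from the scheduler and results.
    job_ids = ["job-01", "job-02", "job-03"]
    results = {"job-01": "done", "job-02": "failed"}
    for bucket, ids in check_jobs(job_ids, results).items():
        print(f"{bucket}: {', '.join(ids) or '-'}")
```

Running it every couple of hours (right after each scheduled batch) should be enough to spot a pattern within the day.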