default run_uid behavior not conducive to HPC #64

Open
danielsf opened this issue Oct 29, 2021 · 0 comments

Comments

@danielsf

I just tried to submit ~300 jobs to our HPC cluster without specifying run_uid. The default, run_uid="YYYY_MM_DD_HH_mm", only has minute resolution. Dozens of jobs started within the same minute, so they got identical run_uids and collided while writing to the same {run_uid}_generator.json and {run_uid}_inference.json files (for my own convenience, I was writing all of my inferred data products to the same directory). I'm not sure whether anything should be done about this. The options I see are:
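A minimal sketch of why the collisions happen. It assumes the default run_uid is produced by formatting the job's start time to the minute. `default_run_uid` is a hypothetical stand-in, not the project's actual function:

```python
from datetime import datetime

def default_run_uid(start: datetime) -> str:
    # Hypothetical stand-in for the "YYYY_MM_DD_HH_mm" default: minute resolution only.
    return start.strftime("%Y_%m_%d_%H_%M")

# Two jobs launched 40 seconds apart, within the same wall-clock minute.
uid_a = default_run_uid(datetime(2021, 10, 29, 14, 5, 3))
uid_b = default_run_uid(datetime(2021, 10, 29, 14, 5, 43))

for uid in (uid_a, uid_b):
    print(f"{uid}_generator.json")  # both print 2021_10_29_14_05_generator.json
print(uid_a == uid_b)  # True: both jobs target the same files
```

Any two jobs that start in the same minute get the same run_uid, and therefore the same {run_uid}_generator.json and {run_uid}_inference.json paths.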

  1. Add some random salt to the default run_uid. This will make it hard to associate {run_uid}_generator.json files with finished data products after the fact (I'm not sure whether that is a concern).

  2. Make run_uid a required parameter.

  3. Test for the existence of {run_uid}_generator.json and emit a warning such as "trying to write file {run_uid}_generator.json, but that file already exists", so that users can more quickly figure out why their jobs crashed.
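Options 1 and 3 could look roughly like the sketch below. It assumes outputs are named {run_uid}_{suffix}.json inside a single output directory. `make_run_uid` and `check_output_path` are hypothetical helpers, not part of the project. The timestamp is kept as a prefix so files still sort by start time, and the salt is a few random hex characters from `secrets.token_hex`:

```python
import os
import secrets
import warnings
from datetime import datetime
from typing import Optional

def make_run_uid(now: Optional[datetime] = None, salt_bytes: int = 3) -> str:
    """Option 1 (hypothetical): minute-resolution timestamp plus a random hex salt."""
    if now is None:
        now = datetime.now()
    # token_hex(n) returns 2*n hex characters, e.g. 6 characters for n=3.
    return f"{now.strftime('%Y_%m_%d_%H_%M')}_{secrets.token_hex(salt_bytes)}"

def check_output_path(output_dir: str, run_uid: str, suffix: str) -> str:
    """Option 3 (hypothetical): warn if the file we are about to write already exists."""
    path = os.path.join(output_dir, f"{run_uid}_{suffix}.json")
    if os.path.exists(path):
        warnings.warn(f"trying to write file {path}, but that file already exists")
    return path
```

Note that the existence check in option 3 is itself racy when jobs start simultaneously: two jobs can both see that the file is missing and then both write it. Opening the file with mode "x" (exclusive creation) combines the check and the create into one step and raises FileExistsError instead, which turns a silent overwrite into an explicit error.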

Maybe this is just user error on my part, but it took a bit of digging for me to figure out what was going wrong.
