regular expression matching on files #11

Open
wants to merge 15 commits into
base: master

deanmalmgren
Owner

From the design specification, the command line interface should support something like

workflow data/per_file --filename-root=a_specific_filename

to limit the data/per_file task so that it depends only on the file data/first/a_specific_filename.dat and creates only the file data/per_file/a_specific_filename.dat, as defined here:

---
alias: data/per_file
creates: data/per_file/{{filename_root}}.dat
depends:
  - src/process_each_file.py
  - data/first/(?P<filename_root>\w+).dat
  - data/first/
command: python {{depends[0]}} {{depends[1]}} > {{creates}}
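
To make the intended behavior concrete, here is a minimal sketch of how a regex dependency with a named group could be matched against candidate paths and optionally restricted by --filename-root. The helper names `match_regex_dependency` and `render` are hypothetical and are not part of workflow; the sketch also assumes the whole path must match the pattern. Note that the `.` before `dat` is unescaped in the pattern above, so it matches any single character, not only a literal dot.

```python
import re


def match_regex_dependency(pattern, paths, filename_root=None):
    """Return {path: named groups} for every path that fully matches pattern.

    Hypothetical helper: if filename_root is given, keep only the paths whose
    `filename_root` group equals it (the --filename-root restriction).
    """
    compiled = re.compile(pattern)
    matches = {}
    for path in paths:
        m = compiled.fullmatch(path)
        if m is None:
            continue
        if filename_root is not None and m.group("filename_root") != filename_root:
            continue
        matches[path] = m.groupdict()
    return matches


def render(template, context):
    """Fill simple {{name}} placeholders from context (assumed template syntax)."""
    return re.sub(
        r"\{\{\s*(\w+)\s*\}\}",
        lambda m: context[m.group(1)],
        template,
    )


if __name__ == "__main__":
    pattern = r"data/first/(?P<filename_root>\w+).dat"
    paths = [
        "data/first/a_specific_filename.dat",
        "data/first/other.dat",
        "data/first/notes.txt",
    ]
    # Without a restriction, every matching file produces one output.
    for path, groups in match_regex_dependency(pattern, paths).items():
        print(path, "->", render("data/per_file/{{filename_root}}.dat", groups))
```

Passing `filename_root="a_specific_filename"` to `match_regex_dependency` narrows the result to the single file named on the command line, which is the restriction the task definition is meant to express.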

@deanmalmgren
Owner Author

I started implementing the source code in examples/image-colors, but I haven't yet been able to start enabling this feature within workflow itself.

@deanmalmgren
Owner Author

After spending a decent amount of time coding this up and playing with it in practice, I'm not convinced that this is the best way to approach this issue. Among other reasons, regular expressions are inherently difficult for users to understand, and this feature adds significant complexity to the source code. The same type of functionality, namely being able to iterate on specific cases during development, might be more easily addressed by some combination of #52 and #44.

For the time being, I'm going to abandon this feature, but I'll leave the issue open for the foreseeable future in case others find it compelling. I just want to make sure that this will actually be useful before we unnecessarily clutter the functionality with difficult-to-maintain features. In the meantime, here's a quick list of what I think would be required before this pull request could be merged.

  • Confirm that downstream tasks are handled properly as well. Right now we dynamically add tasks to the workflow when regex tasks are encountered inline; it would be good to confirm that this approach still works properly when there are other downstream tasks specified in the image-colors example.
  • Decide how --dry-run should behave (it should probably clone regex Tasks, too).
  • Edit the source code, then run workflow run --subdirectory testImages_abstract --rootname abstract_0151, and then run workflow run. The second command currently does not run the analysis on all of the other images, which is what I'd expect in this case.
  • Address any other REGEX TODOs that are noted in the source code.
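
As a rough illustration of the first two items, the sketch below expands one regex "template" task into concrete per-file tasks by cloning it, leaving the template untouched so that a dry run could work on copies. The `Task` dataclass, `expand_regex_task`, and `render` are hypothetical stand-ins and are not workflow's actual classes or functions; the trailing data/first/ dependency from the example definition is omitted for brevity.

```python
import re
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Task:
    """Hypothetical stand-in for workflow's task object."""
    creates: str
    depends: List[str]
    command: str


# Matches {{name}} and {{name[0]}} placeholders (assumed template syntax).
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)(?:\[(\d+)\])?\s*\}\}")


def render(template, context):
    """Substitute placeholders, indexing into list values when [i] is given."""
    def substitute(match):
        value = context[match.group(1)]
        if match.group(2) is not None:
            value = value[int(match.group(2))]
        return value
    return PLACEHOLDER.sub(substitute, template)


def expand_regex_task(task, regex_index, existing_paths):
    """Clone `task` once per path matching its regex dependency."""
    pattern = re.compile(task.depends[regex_index])
    concrete = []
    for path in sorted(existing_paths):
        m = pattern.fullmatch(path)
        if m is None:
            continue
        depends = list(task.depends)  # copy, so the template is not mutated
        depends[regex_index] = path
        creates = render(task.creates, m.groupdict())
        command = render(task.command, {"depends": depends, "creates": creates})
        concrete.append(
            replace(task, creates=creates, depends=depends, command=command)
        )
    return concrete


if __name__ == "__main__":
    template = Task(
        creates="data/per_file/{{filename_root}}.dat",
        depends=["src/process_each_file.py", r"data/first/(?P<filename_root>\w+).dat"],
        command="python {{depends[0]}} {{depends[1]}} > {{creates}}",
    )
    for task in expand_regex_task(template, 1, ["data/first/a.dat", "data/first/b.dat"]):
        print(task.command)
```

Because each concrete task carries a resolved `creates` path, a downstream task that depends on data/per_file/a.dat can be linked to it in the usual way; whether that holds for the image-colors example is exactly what the first item asks to confirm.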


1 participant