Provide a way to pass in the custom parser during add_resource so it can be used in stream_remote_resources #84

Open
strets123 opened this issue Sep 27, 2017 · 3 comments

Comments

@strets123

In order to submit an issue, please ensure you can check the following. Thanks!

  • [3.6] Declare which version of Python you are using (python --version)
  • [Ubuntu] Declare which operating system you are using

Currently I have to create a separate task in order to use a custom parser, like this:

https://github.com/strets123/frictionless-pres/blob/master/smdataproject/stream_remote_resources_custom.py

This breaks PEP 8, because the dpp module has to be imported at the bottom of the file.

@akariv
Member

akariv commented Dec 31, 2017

Hey @strets123 - can you explain your use case here?

@strets123
Author

I would like to be able to build datapackage pipelines that connect to many disparate JSON, XML and HTML data sources. Often this requires changes to tabulator's custom parsers, but I cannot then easily reuse the stream_remote_resources code. I do not want to copy and paste a whole module just to change one line, so I resorted to the above hack of importing the dpp module at the bottom of the file.

In the above case I had a JSON parser for the JSON API spec that also did pagination. This follows a similar pattern to the SQL data parser. I also have a similar one for SPARQL endpoints.
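To illustrate the pagination pattern described above, here is a minimal sketch rather than the actual parser from the linked file. The class name `PaginatedJsonRows`, the injected `fetch_page` callable and the page layout (`data` plus `links.next`) are all assumptions made for the example. The yielded `(row_number, headers, row)` tuples are meant to resemble the extended rows a tabulator parser produces, but the sketch does not import tabulator.

```python
class PaginatedJsonRows:
    """Yield rows from every page of a paginated JSON endpoint (hypothetical)."""

    def __init__(self, fetch_page):
        # fetch_page(url) -> dict is injected so the sketch runs without a network.
        self._fetch_page = fetch_page

    def extended_rows(self, url):
        row_number = 0
        next_url = url
        while next_url:
            page = self._fetch_page(next_url)
            for record in page["data"]:
                row_number += 1
                headers = sorted(record)
                yield row_number, headers, [record[key] for key in headers]
            # Follow the "next" link until the API stops providing one.
            next_url = page.get("links", {}).get("next")


# Fake two-page API used only to exercise the sketch.
FAKE_PAGES = {
    "page1": {
        "data": [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}],
        "links": {"next": "page2"},
    },
    "page2": {"data": [{"id": 3, "name": "c"}], "links": {"next": None}},
}

rows = list(PaginatedJsonRows(FAKE_PAGES.__getitem__).extended_rows("page1"))
```

Row numbering continues across pages, so downstream code sees one continuous table.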

To frame the issue another way: the problem when reusing code from the dpp project is that the CSV dump code is class-based and can easily be overridden, but the stream_remote_resources module has import-time logic, which makes it difficult to reuse.
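A rough sketch of why the class-based style is easier to reuse. `CsvDumper` and `TsvDumper` are hypothetical names invented for this example and are not the dpp classes.

```python
class CsvDumper:
    """Hypothetical class-based processor: each step is an overridable member."""

    delimiter = ","

    def format_row(self, row):
        return self.delimiter.join(str(value) for value in row)

    def dump(self, rows):
        return "\n".join(self.format_row(row) for row in rows)


class TsvDumper(CsvDumper):
    """Changing one detail needs only a small subclass, not a copied module."""

    delimiter = "\t"


rows = [[1, "a"], [2, "b"]]
csv_text = CsvDumper().dump(rows)
tsv_text = TsvDumper().dump(rows)
```

A module that does its work at import time offers no such seam: importing it to borrow one helper also runs the whole processor.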

@strets123
Author

strets123 commented Jan 1, 2018

If there is a desire to retain the simplicity of the functional approach for dpp modules, might it be possible to have a magic "run" function? This would retain backwards compatibility, but users who wanted to could put their import-time logic in the run function instead.

This would allow users to import specific functions and override others without a full class-based approach.
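A minimal sketch of what that could look like, assuming a convention that does not exist in dpp today. Every name here (`run`, `get_parsers`, `stream_resource`, `parse_pipe`, `my_parsers`) is hypothetical, and the two halves would normally live in separate modules.

```python
import csv
import io


def parse_csv(text):
    """Default parser: turn CSV text into a list of rows."""
    return list(csv.reader(io.StringIO(text)))


def get_parsers():
    """Return the parser registry; callers may supply a replacement."""
    return {"csv": parse_csv}


def stream_resource(resource, parsers):
    """Parse one resource with the parser registered for its format."""
    return parsers[resource["format"]](resource["body"])


def run(resources, get_parsers=get_parsers):
    """Entry point: work that used to happen at import time lives here."""
    parsers = get_parsers()
    return [stream_resource(resource, parsers) for resource in resources]


# --- what a user's own module could look like ---
def parse_pipe(text):
    """Custom parser for pipe-delimited text."""
    return [line.split("|") for line in text.splitlines()]


def my_parsers():
    """Reuse the default registry and add one custom format."""
    parsers = get_parsers()
    parsers["pipe"] = parse_pipe
    return parsers


if __name__ == "__main__":
    print(run([{"format": "pipe", "body": "a|b\n1|2"}], get_parsers=my_parsers))
```

Because nothing runs on import, the user module reuses `run` and `stream_resource` unchanged and swaps only the registry.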
