
CLI Testing Tool for Parsing Results to Standard Output #363

Open
wants to merge 12 commits into base: main
Conversation

@adreichert (Contributor) commented Aug 24, 2024

Summary

This PR adds a simple CLI tool that wraps the LlamaParse constructor.

It is intended to make testing easier: quickly parsing files to see the results, visually comparing different models, or inspecting the JSON output to see the available fields. Hopefully, it will also help people get started with the tool more quickly. It is not intended to be complete; I've included the options I vary most often.

  • The output JSON is typically passed to jq -r
  • I've been using it for the past several months.
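The jq step can be exercised on a stand-in payload. Note the field path `.[0].text` here is hypothetical; the actual JSON schema comes from the parser's output:

```shell
# Stand-in for the tool's JSON output; '.[0].text' is a hypothetical path.
echo '[{"text": "first page"}]' | jq -r '.[0].text'   # prints: first page
```

In practice the same filter is applied to the tool's real output, e.g. `python -m llama_parse.tool parse foo.pdf --result-type='json' | jq -r '<path>'`.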

I think others would benefit as well. I have a more complicated version that can fetch the results or status of past jobs by job ID; I'll add that functionality to this file if this PR is approved.
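For reference, the option surface shown in the help message below can be sketched with the standard library's `argparse`. This is a dependency-free sketch, not the PR's implementation: the actual tool is built with click, and the parsing body here is stubbed out.

```python
import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    # Mirrors the options in the tool's --help output; the real tool is
    # built with click, but argparse keeps this sketch dependency-free.
    parser = argparse.ArgumentParser(prog="python -m llama_parse.tool parse")
    parser.add_argument("file", help="File to parse")
    parser.add_argument(
        "--api-key",
        default=os.environ.get("LLAMA_CLOUD_API_KEY"),
        help="Defaults to $LLAMA_CLOUD_API_KEY",
    )
    parser.add_argument("--vendor-multimodal-model-name")
    parser.add_argument("--vendor-multimodal-api-key")
    parser.add_argument("--invalidate-cache", action="store_true")
    parser.add_argument(
        "--result-type",
        default="markdown",
        choices=["markdown", "text", "json"],
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    # The real tool forwards these values to the LlamaParse constructor
    # and writes the parsed result to stdout.
    print(args)
```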

Example Usage

python -m llama_parse.tool parse foo.pdf

Testing

Help Message

python -m llama_parse.tool parse --help

Usage: python -m llama_parse.tool parse [OPTIONS] FILE

  Parse the given file and output the result to STDOUT

  All supported arguments match those of the LlamaParse constructor. Please
  refer to the official documentation for more information.

Options:
  --api-key <api-key>             Defaults to $LLAMA_CLOUD_API_KEY
  --vendor-multimodal-model-name <model>
  --vendor-multimodal-api-key <vendor-api-key>
  --invalidate-cache
  --result-type <result-type>
  --help                          Show this message and exit.

Example Script

Everything parsed

python -m llama_parse.tool parse \
    $FILE \
    --invalidate-cache > ~/Desktop/a.md

python -m llama_parse.tool parse \
    $FILE \
    --result-type='text' \
    --invalidate-cache > ~/Desktop/a.txt

python -m llama_parse.tool parse \
    $FILE \
    --result-type='json' \
    --invalidate-cache > ~/Desktop/a.json

python -m llama_parse.tool parse \
    $FILE \
    --vendor-multimodal-model-name='openai-gpt4o' \
    --vendor-multimodal-api-key=$VENDOR_KEY \
    --invalidate-cache > ~/Desktop/b.md


python -m llama_parse.tool parse \
    $FILE \
    --result-type='json' \
    --api-key=$LLAMA_CLOUD_API_KEY \
    --vendor-multimodal-model-name='openai-gpt4o' \
    --vendor-multimodal-api-key=$VENDOR_KEY \
    --invalidate-cache > ~/Desktop/b.json


@adreichert adreichert changed the title CLI Tool for Parsing Results to Standard Output CLI Testing Tool for Parsing Results to Standard Output Aug 24, 2024
@adreichert adreichert marked this pull request as ready for review August 24, 2024 04:48
@logan-markewich (Contributor)

This looks great! Thanks for the contribution

A few notes:

  1. It looks like click wasn't added to the toml dependencies, so anyone using this will run into an import error if it isn't installed
  2. It seems like not every option is included (which is totally fine, there are tons). Maybe @hexapode can comment on whether any should be added
  3. We can actually make this usable on the command line without needing python -m -- check out this example: https://stackoverflow.com/questions/59286983/how-to-run-a-script-using-pyproject-toml-settings-and-poetry
  4. Let's add a section in the README for CLI usage
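The Stack Overflow approach in point 3 boils down to a Poetry scripts entry in pyproject.toml. A sketch, assuming the module exposes a `main` callable at `llama_parse.tool` (the entry-point path is an assumption about the module layout; the command name matches the one mentioned later in the thread):

```toml
# Hypothetical scripts entry; the "llama_parse.tool:main" path assumes
# the CLI module defines a main() callable.
[tool.poetry.scripts]
lp-tool = "llama_parse.tool:main"
```

After `poetry install`, the tool would then be invocable as `lp-tool parse foo.pdf`, without the `python -m` prefix.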

@adreichert (Contributor, Author) commented Sep 10, 2024

  • Updated README
  • Added click as a dependency
  • Added CLI command lp-tool

@adreichert (Contributor, Author)

@logan-markewich Any feedback?
