ROB: fix extract_text() issues on damaged PDFs #99

Open. Wants to merge 17 commits into base: main

art049 (Member) commented Jul 18, 2024

This pull request was created automatically by CodSpeed to track performance changes of the pull request py-pdf/pypdf#2760.

The original branch is fork-2760-pubpub-zz/iss2702

codspeed-hq bot commented Jul 18, 2024

CodSpeed Performance Report

Merging #99 will not alter performance

Comparing fork-2760-pubpub-zz/iss2702 (10bc22b) with main (1a18f7f)

Summary

✅ 6 untouched benchmarks

art049 force-pushed the main branch 4 times, most recently from 79e6939 to 85297ae, on July 22, 2024 09:40
stefan6419846 and others added 13 commits July 28, 2024 17:16
* DEV: Test against Python 3.13

* fix typo

* add missing setup-python

* fix another typo

* update Pillow version

* attempt to update coverage package

* update number of expected coverage files
…ayout mode (py-pdf#2788)

* Handle Sequence as an IndirectObject

The spec allows an int or float to be an IndirectObject as well, but this commit does not address that theoretical possibility.

* Update pypdf/_text_extraction/_layout_mode/_font.py

Co-authored-by: Stefan <[email protected]>

* Address PR comments

-Rename w_1 to w_next_entry
-Utilize ParseError instead of PdfReadError
-Write a test (both positive and negative)

* Handle unlikely case of IndirectObjects for float/int width elements

Also adds a comment to clarify that we don't explicitly handle the IndexError exception. Rather, we let it be raised as an IndexError.

* Yoda condition I removed

* Last commit was a bad patch, confused by non-committed changes

* Use test files from URL rather than resources

* Update tests/test_text_extraction.py

Co-authored-by: pubpub-zz <[email protected]>

* Fix code style warnings in range() call

---------

Co-authored-by: Stefan <[email protected]>
Co-authored-by: pubpub-zz <[email protected]>
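
For readers unfamiliar with the problem the layout-mode commit notes describe, here is a minimal sketch of the idea. It is not the actual pypdf code. It assumes a CID font width array in the two forms the PDF specification defines (`c [w1 w2 ...]` and `c_first c_last w`), where any element may be an indirect reference. `FakeIndirect`, `resolve`, `parse_widths`, and the local `ParseError` class are hypothetical stand-ins introduced only for illustration; only the name `w_next_entry` comes from the commit notes.

```python
from typing import Any, Dict, Sequence


class ParseError(Exception):
    """Local stand-in for a parse failure (hypothetical, not pypdf's class)."""


class FakeIndirect:
    """Hypothetical stand-in for an indirect reference to another object."""

    def __init__(self, target: Any) -> None:
        self._target = target

    def get_object(self) -> Any:
        return self._target


def resolve(obj: Any) -> Any:
    """Follow indirect references until a direct object is reached."""
    while isinstance(obj, FakeIndirect):
        obj = obj.get_object()
    return obj


def parse_widths(w: Sequence[Any]) -> Dict[int, float]:
    """Map character codes to widths, resolving indirect entries first."""
    widths: Dict[int, float] = {}
    i = 0
    while i < len(w):
        start = resolve(w[i])
        # A truncated array is not handled here: the IndexError is
        # simply allowed to propagate, as the commit notes describe.
        w_next_entry = resolve(w[i + 1])
        if isinstance(w_next_entry, (list, tuple)):
            # Form 1: c [w1 w2 ... wn], consecutive codes starting at c.
            for offset, width in enumerate(w_next_entry):
                widths[start + offset] = float(resolve(width))
            i += 2
        elif isinstance(w_next_entry, (int, float)):
            # Form 2: c_first c_last w, one width for the whole range.
            width = float(resolve(w[i + 2]))
            for code in range(start, int(w_next_entry) + 1):
                widths[code] = width
            i += 3
        else:
            raise ParseError(f"Invalid width entry: {w_next_entry!r}")
    return widths
```

Resolving each entry before the type check is what lets an indirect sequence, or an indirect number, take the same branch as its direct equivalent.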
7 participants