Reading parquet files with timestamp column containing 9999-12-23 23:59:59 yields 1816-03-22 05:56:07.066277376 #44112

Open
matthiasgomolka opened this issue Sep 13, 2024 · 0 comments

Describe the bug, including details regarding any error messages, version, and platform.

I've stumbled upon a weird problem whose underlying cause I don't understand.

I read a parquet file that contains a timestamp column, and this column contains the value 9999-12-23 23:59:59. When I read the file using pyarrow (or with pandas using the pyarrow engine and dtype_backend), the rows with 9999-12-23 23:59:59 show the value 1816-03-22 05:56:07.066277376 instead.

I'm pretty certain that 9999-12-23 23:59:59 is the correct value, because this is much more plausible (and that's what duckdb and Impala say as well).
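
The wrong value is at least consistent with a signed 64-bit overflow of a nanosecond counter. The sketch below uses only the Python standard library and assumes (this is not confirmed for the file in question) that the timestamp is converted to nanoseconds since the Unix epoch and stored in an int64. The helper names `wrap_int64` and `ns_to_text` are made up for illustration.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def wrap_int64(value: int) -> int:
    """Reinterpret an arbitrary integer as a two's-complement int64."""
    return (value + 2**63) % 2**64 - 2**63


def ns_to_text(ns: int) -> str:
    """Format nanoseconds since the Unix epoch as 'YYYY-MM-DD HH:MM:SS.nnnnnnnnn'."""
    # Floor division keeps the fractional part non-negative for dates before 1970.
    seconds, frac_ns = divmod(ns, 1_000_000_000)
    moment = EPOCH + timedelta(seconds=seconds)
    return f"{moment:%Y-%m-%d %H:%M:%S}.{frac_ns:09d}"


# The value that duckdb and Impala report.
expected = datetime(9999, 12, 23, 23, 59, 59)
delta = expected - EPOCH

# Exact nanoseconds since the epoch (Python integers do not overflow).
true_ns = (delta.days * 86_400 + delta.seconds) * 1_000_000_000

# Assumption: the reader squeezes this number into a signed 64-bit integer.
wrapped_ns = wrap_int64(true_ns)
wrapped_text = ns_to_text(wrapped_ns)

print(true_ns > 2**63 - 1)       # True: the value does not fit into int64
print(wrapped_text)              # 1816-03-22 05:56:07.066277376
print(ns_to_text(2**63 - 1))     # 2262-04-11 23:47:16.854775807 (largest int64 nanosecond timestamp)
```

The wrapped result matches the value shown by pyarrow digit for digit, which suggests that the stored value is fine and that the conversion to nanosecond resolution is where it goes wrong.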

When I write the respective row to parquet using duckdb and read this file using pyarrow, I get the correct value of 9999-12-23 23:59:59.

I've already checked if this is a problem with the parquet version, but both files are version 1.0. What else might cause this?
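
One thing that could be checked next is the physical type of the timestamp column. The following is a minimal sketch, assuming the column is stored as a legacy INT96 timestamp (an assumption, not something verified for this file) that pyarrow converts to nanosecond resolution by default. The function names and the `path` argument are hypothetical; `ParquetFile.schema`, `physical_type` and the `coerce_int96_timestamp_unit` option of `pyarrow.parquet.read_table` are existing pyarrow APIs.

```python
def describe_parquet_columns(path):
    """Return (name, physical_type, logical_type) for every column in the file."""
    # Imported lazily so the sketch can be loaded without pyarrow installed.
    import pyarrow.parquet as pq

    schema = pq.ParquetFile(path).schema
    columns = []
    for i in range(len(schema.names)):
        col = schema.column(i)
        columns.append((col.name, col.physical_type, str(col.logical_type)))
    return columns


def read_with_coarser_int96(path, unit="ms"):
    """Read the file, asking pyarrow to decode INT96 timestamps at a coarser unit.

    With millisecond resolution, year 9999 still fits into an int64.
    """
    import pyarrow.parquet as pq

    return pq.read_table(path, coerce_int96_timestamp_unit=unit)
```

If `describe_parquet_columns` reports `INT96` for the affected column in the original file but `INT64` in the file written by duckdb, that would explain why only one of the two files shows the wrong value.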

Unfortunately, I can't share the parquet file in question because it contains confidential data.

Component(s)

Python
