Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add bit64 support #590

Open
wants to merge 5 commits into
base: main
Choose a base branch
from
Open

Add bit64 support #590

wants to merge 5 commits into from

Conversation

kdkavanagh
Copy link

Allow for 64bit integers to be passed between Py <-> R using the bit64 package, inspired by https://gallery.rcpp.org/articles/creating-integer64-and-nanotime-vectors/

src/python.cpp Outdated
vec[i] = PyInt_AsLong(PyList_GetItem(x, i));
for (Py_ssize_t i = 0; i<len; i++) {
long num = PyInt_AsLong(PyList_GetItem(x, i));
if(num > std::numeric_limits<int>::max()) {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, could loop thru array once first to check max value, then create the right vector type the first time

@kevinushey
Copy link
Collaborator

I don't think it's a good idea to have the output vector type depend on the runtime value -- I think we need to ensure type stability, and the user has to opt-in to requesting bit64 vectors in some way. This is especially true since there are operations that can be performed on 'native' R integers that will not work with a bit64 vector.

I don't yet know what this interface should look like, though. My instinct says that there should be some way for users to provide custom converters, that accept R objects and return Python objets, and vice versa.

Also worth saying: the underlying SEXP type of an integer64 vector is REALSXP; we'd likely need to audit a lot of reticulate code to ensure that we're doing the right thing when we receive a REALSXP that happens to be an integer64 vector.

@kdkavanagh
Copy link
Author

I don't disagree w.r.t opting in with some option. I've never built an enterprise-grade R package before, so not sure how to be incorporate that, but could imagine that being configured by the user after loading the reticulate package

@eddelbuettel
Copy link
Contributor

It's the right call. I am using bit64 aka integer64 aka nanotime very carefully at work to transfer full resolution timestamo, and some int64_t index values. One has to be really careful to 'cast corectly' and treat it correctly on the receiving end or else one mistakes it for a standard numeric filled with ... garbage bits. This works, but only with some care. It is probably not the right tool for a general purpose converter.

@kdkavanagh
Copy link
Author

I will take a stab at implementing an option tonight

@OssiLehtinen
Copy link

Any news on this one? Looking forward to a solution, as this would solve a bunch of problems!

@kdkavanagh
Copy link
Author

Added two options which default to FALSE

  options(reticulate.long_as_bit64=TRUE)
  options(reticulate.ulong_as_bit64=TRUE)

@kevinushey
Copy link
Collaborator

Thanks -- I'll try to re-review this soon.

Copy link
Collaborator

@kevinushey kevinushey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good, but I really wonder if we should either always or never convert to integer64. I worry that there will be surprises if this only happens sometimes.

If we really want to allow this to happen automatically, we could allow for the option to be FALSE, TRUE, or "auto" or something like that. I would advocate for just allowing the conversion to always happen or never happen though.

T getConfig(std::string config, T defValue) {
Environment base( "package:base" ) ;
Function getOption = base["getOption"];
SEXP s = getOption(config, defValue);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we could use Rf_GetOption directly here.


else if (scalarType == INTSXP) {
long val = PyLong_AsLong(x);
if((val > std::numeric_limits<int>::max() || val < std::numeric_limits<int>::min()) && convertLongToBit64()) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I worry a bit that this could be confusing if values were converted to integer64 only "sometimes". That is, I'm not sure if the runtime type should depend on the runtime value. Would it make sense to always make this conversion if the user has opted in?

vec[i] = PyInt_AsLong(PyList_GetItem(x, i));
for (Py_ssize_t i = 0; i<len; i++) {
long num = PyLong_AsLong(PyList_GetItem(x, i));
if((num > std::numeric_limits<int>::max() || num < std::numeric_limits<int>::min()) && convertLongToBit64()) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same concern here as above.

if (LENGTH(sexp) == 1) {
double value = REAL(sexp)[0];
return PyFloat_FromDouble(value);
if(isInt64) {
return PyInt_FromLong(reinterpret_cast<long&>(value));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it really safe to reinterpret a value as a reference? I've usually seen type punning done as e.g.

*(long*)(&value)

This is still technically undefined behavior, though. My understanding is that the only standards-compliant way of performing type punning is with memcpy(). See: https://stackoverflow.com/a/17790026/1342082

(In C++20, we will have bitcast: https://en.cppreference.com/w/cpp/numeric/bit_cast)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See https://godbolt.org/z/VDCcVF for an example of a helper function that could be used for casting here.

@@ -58,6 +58,27 @@ std::wstring s_python_v3;
std::string s_pythonhome;
std::wstring s_pythonhome_v3;

const std::string CONFIG_LONG_AS_BIT64="reticulate.long_as_bit64";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should these be static const? (since they should have internal linkage)

}

int narrow_array_typenum(PyArray_Descr* descr) {
return narrow_array_typenum(descr->type_num);
return narrow_array_typenum(typenum(descr->type_num));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return narrow_array_typenum(typenum(descr->type_num));
return narrow_array_typenum(typenum(descr));

if((num > std::numeric_limits<int>::max() || num < std::numeric_limits<int>::min()) && convertLongToBit64()) {
//We need to start over an interpret as 64 bit int
Rcpp::NumericVector nVec(len);
long long* res_ptr = (long long*) dataptr(nVec);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that using long long* explicitly requires us to use C++11. This is okay but I think we need to update SystemRequirements to indicate that.

@kevinushey
Copy link
Collaborator

Thanks for the pull request! We require a Contributor License Agreement before we can accept pull requests on RStudio repositories. Would you be willing to fill one out, and send it to the e-mail indicated in the form? (Please also reference this PR in your e-mail.)

@colearendt
Copy link
Member

colearendt commented Apr 20, 2020

This is really exciting stuff! Unfortunately, I gave it a try and ran into a few painful points:

  • something weird in my environment is causing certain operations to crash the rsession (import pickle, create a big integer, etc.), so I haven't been able to create a reprex. (NOTE: this is happening for the CRAN released version of reticulate as well, so I suspect this problem is elsewhere).
  • the print method if bit64 has not been loaded/referenced is pretty confusing:
sync_res$items[[1]]$id
[1] 1.196682e-314
attr(,"class")
[1] "integer64"
  • However, just calling a bit64 function directly fixes:
> bit64::as.integer64(1234)
integer64
[1] 1234
> sync_res$items[[1]]$id
integer64
[1] 2422110596

There are definitely some nice things here, though! Thanks!

EDIT: The crashing was due to #723, I believe and reinstalling Rcpp fixed! 😄 Reprex below:

library(reticulate)
options(reticulate.long_as_bit64=TRUE)
options(reticulate.ulong_as_bit64=TRUE)

reticulate::py_run_string('myvar = 2422110596;')
py$myvar
#> [1] 1.196682e-314
#> attr(,"class")
#> [1] "integer64"
bit64::as.integer64(1234)
#> integer64
#> [1] 1234
py$myvar
#> integer64
#> [1] 2422110596

Created on 2020-04-19 by the reprex package (v0.3.0)

@eddelbuettel
Copy link
Contributor

However, just calling a bit64 function directly fixes

Our nanotime package also wraps around bit64, and I have seen this too. Now, it is a little tricky as we generally want an optional component to be truly optional, but generally speaking an if (requireNamespace(...))) test (and load of the package) before code blocks with 64-bit ints seems to help (and I think that is what data.table does too). Making it a globally imported package helps too, but that may be too blunt.

@GreenGrassBlueOcean
Copy link

Dear @kevinushey @kdkavanagh

Are there any plans to accept this pull request in to reticulate?
This would be very useful for my own r-package as this api often returns large integers which then turn out to be -1.

@spittssj
Copy link

Is there any update on this issue?

I am starting to use rgee which wraps the Python Google Earth Engine API through reticulate and I ran into this issue with the time field that comes with the ImageCollections of time series satellite data.

@ssh352

This comment was marked as off-topic.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants