use Rcpp to convert R Dates to Python #643
LGTM but I'd like some more assurance (if possible) that the layout of structs in datetime.h will not change across different versions of Python (or at least hasn't in the 3.x series)
    items[[1]]
  else
    r_to_py_impl(items, convert)
  r_convert_date(x, convert)
This may need to handle length-one lists still (to ensure they're returned as scalars)
Not sure exactly what you mean? I was trying to find out what this check (line 172 in 64f685a)
  if (length(items) == 1)
was for, but didn't manage to get into the else branch...
E.g.
> library(reticulate)
> r_to_py(Sys.Date())
2019-11-15
> r_to_py(c(Sys.Date(), Sys.Date()))
[datetime.date(2019, 11, 15), datetime.date(2019, 11, 15)]
That is, we make a Python list for R vectors of length > 1, and a 'scalar' for R vectors of length 1. Is this behavior still preserved in this PR?
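The length rule described above can be sketched in a few lines of C++. This is only an illustration: `SimpleDate`, `format_scalar`, `format_element` and `describe_conversion` are hypothetical names, not reticulate or Rcpp code, and the strings merely mimic the printed output shown in the example. How an empty vector should be treated is an assumption here (it falls through to the list branch).

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical stand-in for a calendar date; not a reticulate or Rcpp type.
struct SimpleDate {
  int year;
  int month;
  int day;
};

// Format one date the way the scalar result prints, e.g. 2019-11-15.
std::string format_scalar(const SimpleDate& d) {
  char buf[64];
  std::snprintf(buf, sizeof(buf), "%04d-%02d-%02d", d.year, d.month, d.day);
  return std::string(buf);
}

// Format one date the way a Python list element prints.
std::string format_element(const SimpleDate& d) {
  char buf[64];
  std::snprintf(buf, sizeof(buf), "datetime.date(%d, %d, %d)",
                d.year, d.month, d.day);
  return std::string(buf);
}

// Length-one input yields a "scalar"; any other length yields a list.
// Assumption: empty input is rendered as an empty list.
std::string describe_conversion(const std::vector<SimpleDate>& dates) {
  if (dates.size() == 1)
    return format_scalar(dates[0]);
  std::string out = "[";
  for (std::size_t i = 0; i < dates.size(); ++i) {
    if (i > 0) out += ", ";
    out += format_element(dates[i]);
  }
  out += "]";
  return out;
}
```

Calling `describe_conversion` with one date gives the scalar form, and with two dates gives the bracketed list form, which is the behavior the question asks the PR to preserve.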
Oh, got it, thanks! Does this look better now?
src/datetime.h
Outdated
  /* Define structure for C API. */
  typedef struct {
Are we able to verify that the size / layout of this struct is stable across different versions of Python? (It seems to not be for Python 2.7, if I understand your comment on the PR correctly)
Hi Kevin, thanks for reviewing! If I see correctly, the most recent changes to any Now I'm not really sure what to do - does this kind of volatility basically rule it out? Or should I try to test with different versions of Python 3 (would >= 3.5 suffice in that case?) Regarding 2.7, I was wondering if one could just branch,
The fact that this struct is not stable across versions does scare me, especially since it seems there's no attempt to maintain the struct layout as new versions of Python are released. I think rather than attempting to import and use I think we would still see similar performance gains, although perhaps not quite at the same level as if we could use the C API directly.
Oh, yes, I think I see what you mean! Honestly I was assuming this (https://docs.python.org/3/c-api/datetime.html) would be the only way to do it using cpython ... but the alternative would be using
Yep, exactly -- I think we should be able to use
Thanks! Then I'll change it like that tomorrow :-)
Done! Looks like it even got a bit faster:
src/python.cpp
Outdated
  PyObject* py_date = PyDate_FromDate(date.getYear(), date.getMonth(), date.getDay());
  PyObject* datetime = PyImport_ImportModule("datetime");
  PyObject* py_date = PyObject_CallMethod(
    datetime, "date", "iii", date.getYear(), date.getMonth(), date.getDay());
Very pedantic nitpick: you may want to explicitly cast date.getYear() and friends to int, e.g.
static_cast<int>(date.getYear())
While the getYear() method does produce an int, in general, when using variadic C functions, it's good practice to cast arguments to ensure they match the type expected / declared in the format string.
The code is correct as is, but I just want to call it out as a kind of best practice.
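To illustrate why the cast matters in general, here is a minimal sketch using std::snprintf, which is variadic in the same way as PyObject_CallMethod, so it compiles without the Python headers. `WideDate` is a hypothetical type whose accessors return long rather than int; that is an assumption made purely to show the failure mode, since the accessors in the PR do return int.

```cpp
#include <cstdio>
#include <string>

// Hypothetical date type whose accessors return a wider type than int.
// (Assumption for illustration; the accessors in the PR return int.)
struct WideDate {
  long year;
  long month;
  long day;
  long getYear() const { return year; }
  long getMonth() const { return month; }
  long getDay() const { return day; }
};

std::string format_date(const WideDate& date) {
  char buf[64];
  // "%d" tells the callee to read an int. Arguments passed through "..." are
  // not converted to match the format string, so passing a long here without
  // a cast would be undefined behavior. The explicit casts make each argument
  // match what the format string declares.
  std::snprintf(buf, sizeof(buf), "%d-%d-%d",
                static_cast<int>(date.getYear()),
                static_cast<int>(date.getMonth()),
                static_cast<int>(date.getDay()));
  return std::string(buf);
}
```

The same reasoning applies to the "iii" format in the PyObject_CallMethod call: each "i" declares an int, so casting at the call site keeps the code correct even if an accessor's return type changes later.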
done!
  if (py_date == NULL) {
    stop(py_fetch_error());
  }
  return py_ref(py_date, convert);
Does this need py_date.detach()?
changed this one to PyObject* now too
  if (py_date == NULL) {
    stop(py_fetch_error());
  }
  int res = PyList_SetItem(list, i, py_date);
PyList_SetItem "steals" a reference, and so the associated py_date item should either not be wrapped in PyObjectPtr, or explicitly detached. See for example:
Lines 1309 to 1313 in 64f685a
  PyObject* item = r_to_py(RObject(VECTOR_ELT(sexp, i)), convert);
  // NOTE: reference to added value is "stolen" by the list
  int res = PyList_SetItem(list, i, item);
  if (res != 0)
    stop(py_fetch_error());
(This is one of the frustrating bits of the Python API: most functions return new references, but some 'borrow' or 'steal' a reference and so have implications on how reference counting is performed)
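For readers unfamiliar with "stolen" references, the ownership rule can be shown with a toy model that needs no Python headers. Everything below (`ToyObject`, `ToyList`, `ToyPtr`, `toy_list_set_item_steal`, `fill_and_count`) is hypothetical and only loosely mirrors PyObject, PyList_SetItem and PyObjectPtr; it is a sketch of the pattern, not the CPython or reticulate implementation.

```cpp
#include <cstddef>
#include <vector>

// Toy refcounted object; a stand-in for PyObject, not the real type.
struct ToyObject {
  int refcount;
};

// Returns a new reference (refcount starts at 1), like most API calls.
ToyObject* toy_new() { return new ToyObject{1}; }

void toy_decref(ToyObject* o) {
  if (--o->refcount == 0) delete o;
}

// Toy list that owns exactly one reference per filled slot.
struct ToyList {
  std::vector<ToyObject*> items;
  explicit ToyList(std::size_t n) : items(n, nullptr) {}
  ~ToyList() {
    for (ToyObject* o : items)
      if (o) toy_decref(o);
  }
};

// Modeled on PyList_SetItem: takes over ("steals") the caller's reference.
// No incref happens here, so the caller must not decref the item afterwards.
void toy_list_set_item_steal(ToyList& list, std::size_t i, ToyObject* item) {
  if (list.items[i]) toy_decref(list.items[i]);
  list.items[i] = item;
}

// RAII holder, loosely analogous to PyObjectPtr: decrefs on scope exit
// unless the pointer has been detached first.
class ToyPtr {
 public:
  explicit ToyPtr(ToyObject* o) : obj_(o) {}
  ~ToyPtr() {
    if (obj_) toy_decref(obj_);
  }
  ToyPtr(const ToyPtr&) = delete;
  ToyPtr& operator=(const ToyPtr&) = delete;
  ToyObject* detach() {
    ToyObject* o = obj_;
    obj_ = nullptr;
    return o;
  }

 private:
  ToyObject* obj_;
};

// Correct pattern: detach before handing the reference to a stealing call.
// Returns the sum of refcounts held by the list, which equals n when every
// item is owned exactly once.
int fill_and_count(std::size_t n) {
  ToyList list(n);
  for (std::size_t i = 0; i < n; ++i) {
    ToyPtr item(toy_new());
    toy_list_set_item_steal(list, i, item.detach());
  }
  int total = 0;
  for (ToyObject* o : list.items) total += o->refcount;
  return total;
}
```

If `item.detach()` were replaced by passing the raw pointer while leaving the holder attached, the holder's destructor would decref an object the list already owns, and the list would later touch freed memory. That is the bug the review comment is guarding against.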
Oh sorry, I should have connected the dots ... (had read about e.g. PyList_SetItem() and also seen PyObjectPtr* in the piece of code I was taking as an example...)
LGTM!
Hi Kevin,
is this acceptable (it adds datetime.h)?
Execution time is about 1/4 for this:
dates <- replicate(100, Sys.Date(), simplify = FALSE)
Old:
New:
The code doesn't run on Python 2, but before investigating I wanted to ask if we can live with adding datetime.h ... if yes, there may be more places we can use it ...
Thanks for reviewing! As an aside, how long are we going to continue to support Python 2?