Skip to content

Conversation

@skeydan
Copy link

@skeydan skeydan commented Nov 13, 2019

Hi Kevin,

is this acceptable (it adds datetime.h).

Execution time is about 1/4 for this: dates <- replicate(100, Sys.Date(), simplify = FALSE)

Old:

Unit: milliseconds
           expr      min       lq     mean   median       uq
 r_to_py(dates) 47.23566 51.19988 65.46687 53.35797 57.47948
      max neval
 1155.049   100

New:

Unit: milliseconds
           expr      min      lq     mean   median       uq      max
 r_to_py(dates) 11.89173 12.7865 14.36604 13.35968 16.22332 22.91708
 neval
   100

The code doesn't run on Python 2, but before investigating I wanted to ask if we can live with adding datetime.h... if yes, there may be more places we can use it ...

Thanks for reviewing! As an aside, how long are we going to continue to support Python 2?

Copy link
Collaborator

@kevinushey kevinushey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM but I'd like some more assurance (if possible) that the layout of structs in datetime.h will not change across different versions of Python (or at least hasn't in the 3.x series)

items[[1]]
else
r_to_py_impl(items, convert)
r_convert_date(x, convert)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may need to handle length-one lists still (to ensure they're returned as scalars)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure exactly what you mean? I was trying to find what this

if (length(items) == 1)

was for but didn't manage to get into the else branch...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

E.g.

> library(reticulate)
> r_to_py(Sys.Date())
2019-11-15
> r_to_py(c(Sys.Date(), Sys.Date()))
[datetime.date(2019, 11, 15), datetime.date(2019, 11, 15)]

That is, we make a Python list for R vectors of length > 1, and a 'scalar' for R vectors of length 1. Is this behavior still preserved in this PR?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, got it, thanks! Does this look better now?

src/datetime.h Outdated


/* Define structure for C API. */
typedef struct {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we able to verify that the size / layout of this struct is stable across different versions of Python? (It seems to not be for Python 2.7, if I understand your comment on the PR correctly)

@skeydan
Copy link
Author

skeydan commented Nov 13, 2019

Hi Kevin,

thanks for reviewing! If I see correctly, the most recent changes to any structs in datetime.h took place about 2 years ago: python/cpython#5032...

Now I'm not really sure what to do - does this kind of volatility basically rule it out? Or should I try to test with different versions of Python 3 (would >= 3.5 suffice in that case?)

Reg. 2.7, I was wondering if one could just branch, r_to_py.Date, on major version (or else I'd investigate) - but the Python 2 question is really only relevant if that other point doesn't rule it out?

@kevinushey
Copy link
Collaborator

kevinushey commented Nov 13, 2019

Now I'm not really sure what to do - does this kind of volatility basically rule it out? Or should I try to test with different versions of Python 3 (would >= 3.5 suffice in that case?)

The fact that this struct is not stable across versions does scare me, especially since it seems there's no attempt to maintain the struct layout as new versions of Python are released.

I think rather than attempting to import and use datetime.h, a safer approach would be to simply iterate and evaluate Python code from the C++ side. That is, we would still call back into "regular" Python evaluation, but just do so from the C++ side rather than through R (where there is more overhead). Does that sound reasonable to you?

I think we would still see similar performance gains, although perhaps not quite as the same level as if we could use the C API directly.

@skeydan
Copy link
Author

skeydan commented Nov 13, 2019

Oh, yes, I think I see what you mean! Honestly I was assuming this (https://docs.python.org/3/c-api/datetime.html) would be the only way to do it using cpython ... but the alternative would be using PyImport_ImportModule to import datetime, and then go from there as usual?

@kevinushey
Copy link
Collaborator

Yep, exactly -- I think we should be able to use PyObject_CallMethod(), after we get a reference to the datetime.time function (which would require PyImport_ImportModule and, I think, PyObject_GetAttrString)

@skeydan
Copy link
Author

skeydan commented Nov 13, 2019

Thanks! Then I'll change it like that tomorrow :-)

@skeydan
Copy link
Author

skeydan commented Nov 15, 2019

Done! Looks like it even got a bit faster:

> microbenchmark(r_to_py(dates))
Unit: milliseconds
           expr      min      lq     mean   median       uq
 r_to_py(dates) 10.85329 11.3757 12.61398 11.81957 13.18172
      max neval
 19.31118   100

src/python.cpp Outdated
PyObject* py_date = PyDate_FromDate(date.getYear(), date.getMonth(), date.getDay());
PyObject* datetime = PyImport_ImportModule("datetime");
PyObject* py_date = PyObject_CallMethod(
datetime, "date", "iii", date.getYear(), date.getMonth(), date.getDay());
Copy link
Collaborator

@kevinushey kevinushey Nov 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very pedantic nitpick: you may want to explicit cast date.getYear() and friends to int, e.g.

static_cast<int>(date.getYear())

While the getYear() method does produce an int, in general, when using variadic C functions, it's good practice to cast arguments to ensure they match the type expected / declared in the format string.

The code is correct as is, but I just want to call it out as a kind of best practice.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done!

if (py_date == NULL) {
stop(py_fetch_error());
}
return py_ref(py_date, convert);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need py_date.detach()?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed this one to PyObject* now too

if (py_date == NULL) {
stop(py_fetch_error());
}
int res = PyList_SetItem(list, i, py_date);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PyList_SetItem "steals" a reference, and so the associated py_date item should either not be wrapped in PyObjectPtr, or explicitly detached. See for example:

reticulate/src/python.cpp

Lines 1309 to 1313 in 64f685a

PyObject* item = r_to_py(RObject(VECTOR_ELT(sexp, i)), convert);
// NOTE: reference to added value is "stolen" by the list
int res = PyList_SetItem(list, i, item);
if (res != 0)
stop(py_fetch_error());

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(This is one of the frustrating bits of the Python API: most functions return new references, but some 'borrow' or 'steal' a reference and so have implications on how reference counting is performed)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh sorry, I should have connected the dots ... (had read about e.g. PyList_SetItem() and also seen PyObjectPtr* in the piece of code I was taking as an example...)

Copy link
Collaborator

@kevinushey kevinushey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@kevinushey kevinushey merged commit 4f95aa2 into rstudio:master Nov 19, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants