Recommendation to always setDT() an object from disk is not well-documented #6888
I’d like to propose improving the documentation for #6886 by: … Any suggestions from your side, or may I proceed with drafting these changes?
I'm fairly certain this is already documented somewhere, so please thoroughly review the Rd/vignette documentation and the top Stack Overflow answers in the data.table tag to see if I missed something. There is already a pretty extensive warning produced by …
After checking through the official documentation, I'm still unable to find anything that recommends calling setDT() on an object loaded from disk. However, the Stack Overflow answer below aligns with the truelength behavior described in the documentation: it shows a consequence of not handling truelength == 0 and suggests using alloc.col() or setDT() before performing by-reference modifications.
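A minimal sketch of the pitfall and the suggested fix, assuming a recent data.table release; `add_flag()` is a hypothetical helper for illustration, not part of data.table. A table read back with readRDS() has lost its column over-allocation, so a `:=` inside a function may end up modifying a local copy rather than the caller's object. Calling setDT() (or setalloccol()) right after loading restores the over-allocation, so the column is added by reference as expected:

```r
library(data.table)

# Hypothetical helper: adds a column by reference inside a function
add_flag <- function(d) {
  d[, flag := TRUE]
  invisible(NULL)
}

# Round-trip a small table through disk; over-allocation is not serialized
path <- tempfile(fileext = ".rds")
saveRDS(data.table(a = 1:3), path)
dt <- readRDS(path)

# Restore over-allocation before any by-reference modification
setDT(dt)  # setalloccol(dt) should serve the same purpose here

add_flag(dt)
print(names(dt))  # the caller's dt should now include "flag"
```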
data.table/vignettes/datatable-faq.Rmd Line 623 in 2cb0316
Hi @MichaelChirico, I think there is a somewhat related section in datatable-faq.Rmd that discusses how *.RDS and *.RData files lose column over-allocation when storing data.tables. It recommends using setalloccol() after loading the data, but it does not mention setDT(). Since this issue is about explicitly recommending setDT() for objects loaded from disk, should we: …
@MichaelChirico I don't think you can document it anywhere that users will reliably find it at the moment they need to know that they need …
I think the FAQ is where I'd go. Generally, one of these will hit you:
Line 122 in 476de7e
Line 520 in 476de7e
Masking seems workable, but I'm not personally a fan of masking {base} functions; I'm not sure how the other maintainers feel.
Nice find! Yes, let's extend that section to make sure the part about …
Thanks for the clarification, @MichaelChirico. Updating the FAQ to explicitly mention setDT() alongside setalloccol() should help ensure users are aware of this recommendation when loading data.table objects from disk. I will proceed with updating that section to state this clearly and make sure both ?setDT and ?truelength link to the FAQ for better visibility.
For #6886, I was trying to find where we document this recommendation and came up short. I checked in several places:
?setDT
?truelength
This last one is the only place where one could reasonably infer the recommendation:
data.table/man/truelength.Rd
Line 32 in 77b4394
But even there it basically requires foreknowledge of the issue to have distilled it correctly.
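For reference, the state ?truelength describes can be inspected directly. A minimal sketch, assuming data.table is installed; the exact numbers printed depend on the R and data.table versions and on the datatable.alloccol option, so only the relationship between them is the point here:

```r
library(data.table)

path <- tempfile(fileext = ".rds")
saveRDS(data.table(a = 1:3, b = letters[1:3]), path)
dt <- readRDS(path)

# ?truelength describes this as 0 for a table freshly loaded from disk
print(truelength(dt))

# Over-allocate the column-pointer vector; setalloccol() assigns in the calling scope
setalloccol(dt)

# Now there are spare column-pointer slots beyond the current columns
print(c(columns = length(dt), truelength = truelength(dt)))
```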
I am sure it's documented somewhere, but anyway, it should also be linked from at least one vignette and one Rd page.