Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Various minor comments related to the text I noticed #3

Closed
bkamins opened this issue Oct 7, 2023 · 13 comments
Closed

Various minor comments related to the text I noticed #3

bkamins opened this issue Oct 7, 2023 · 13 comments

Comments

@bkamins
Copy link
Collaborator

bkamins commented Oct 7, 2023

In https://github.com/JuliaEarth/geospatial-data-science-with-julia/blob/main/01-geodata.qmd#L175:

We can check that the representation is a valid representation of a table using the `Tables.istable` function:

is not 100% accurate. Tables.istable can return false and still the passed object can be a valid table. I would just skip it. Tables.istable is meant mostly for package developers that know the internals of Tables.jl well.

This is a soft comment


In https://github.com/JuliaEarth/geospatial-data-science-with-julia/blob/main/01-geodata.qmd#L168C2-L168C2 I would explain why you use Symdol (and above you used strings). Also maybe consider using CategoricalArrays.jl if you feel that GENDER is categorical?

This is a soft comment


https://github.com/JuliaEarth/geospatial-data-science-with-julia/blob/main/01-geodata.qmd#L199

This row-major representation can be useful to process data that

Is not 100% accurate. You use "This" word. And the one you use does not have this property. The point is that in general row-wise representation is for larger than RAM data, but not this specific one you present. Also, often "larger than RAM" data will return Tables.istable as false, and only at run-time it is checked that data is indeed tabular.

@juliohm
Copy link
Member

juliohm commented Oct 7, 2023

Thank you for pointing out these issues. If you prefer to submit a PR with suggestions, I can review them there as well.

@juliohm
Copy link
Member

juliohm commented Oct 8, 2023

Tables.istable can return false and still the passed object can be a valid table.

That is a bit surprising. So this function is not mandatory for table types? I will remove the sentence from the text as you suggested.

In https://github.com/JuliaEarth/geospatial-data-science-with-julia/blob/main/01-geodata.qmd#L168C2-L168C2 I would explain why you use Symdol (and above you used strings).

I will try to add a sentence explaining the difference.

Also, often "larger than RAM" data will return Tables.istable as false, and only at run-time it is checked that data is indeed tabular.

That is surprising too. I thought that this trait function should always return true for a table type regardless of the implementation details.

@juliohm
Copy link
Member

juliohm commented Oct 9, 2023

@bkamins do we have concrete examples of infinite tables in Julia currently? Maybe we can add these examples as references in the sentence, and replace "this" by "the" as suggested.

@juliohm
Copy link
Member

juliohm commented Oct 9, 2023

Updated the examples and removed Tables.jl from the dependencies.

@bkamins
Copy link
Collaborator Author

bkamins commented Oct 9, 2023

So this function is not mandatory for table types?

No. See its docstring:

Check if an object has specifically defined that it is a table.
Note that not all valid tables will return true, since it's possible to satisfy the Tables.jl interface at "run-time"

do we have concrete examples of infinite tables in Julia currently?

You can have a table that is an iterator of rows that reads from a stream and returns e.g. a NamedTuple as each row. Generators could often work like this. And generators typically do not define their return type (it is typically Any).

@bkamins bkamins closed this as completed Oct 9, 2023
@ronisbr
Copy link
Collaborator

ronisbr commented Oct 9, 2023

No. See its docstring:

I think this is another example of a problem in the documentation that we have a little time ago (I forgot where). The documentation in https://tables.juliadata.org/stable/#Implementing-the-Interface-(i.e.-becoming-a-Tables.jl-source) states that to become a Tables.jl source, you must have Tables.istable returning true.

IMHO, it will be somewhat problematic to treat those cases in which the object can implement the interface but does not provide information that it indeed implements the interface. Probably we can define something that will return wrong information instead of an error by assuming that an object implements Tables.jl API whereas it does not.

My opinion is that we always treat objects that returns Tables.istable == false as something that does not comply with Tables.jl (it can be a table, but must not be accessed using Tables.jl API).

@juliohm
Copy link
Member

juliohm commented Oct 9, 2023

I agree with @ronisbr, and I feel that something could be done differently.

IMHO, it all boils down to this attempt to represent "infinite" and "finite" tables with the same traits.

@ronisbr
Copy link
Collaborator

ronisbr commented Oct 9, 2023

I found the discussion here:

ronisbr/PrettyTables.jl#220

@bkamins
Copy link
Collaborator Author

bkamins commented Oct 9, 2023

it all boils down to this attempt to represent "infinite" and "finite" tables with the same traits.

Not only. This is a general issue that a vector of structs is a table and since this could be any struct it is hard to cover this. See:

julia> DataFrame([Dict(1=>2), Dict(3=>4)])
2×8 DataFrame
 Row │ slots                              keys                               vals                               ndel   count  age     idxfloor  maxprobe
     │ Array…                             Array…                             Array…                             Int64  Int64  UInt64  Int64     Int64
─────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ UInt8[0x00, 0x00, 0x00, 0x00, 0x…  [2, 5, 11, 12, 14, 18, 19, 24, 2…  [2, 5, 11, 12, 14, 18, 19, 24, 2…      0      1       1        15         0
   2 │ UInt8[0x00, 0x00, 0x00, 0x00, 0x…  [2, 5, 9, 10, 12, 16, 17, 21, 24…  [2, 5, 9, 10, 12, 16, 17, 21, 24…      0      1       1        14         0

(i.e. vector of dicts is a table)

@bkamins
Copy link
Collaborator Author

bkamins commented Oct 9, 2023

I opened JuliaData/Tables.jl#348 to discuss the issue.

@ronisbr
Copy link
Collaborator

ronisbr commented Oct 9, 2023

Hum, isn't this example a good case for my point? For example, since this is a vector of dictionaries, which reports Tables.istable as false, I think the way PrettyTables prints it is not wrong:

julia> pretty_table([Dict(1=>2), Dict(3=>4)])
┌────────────┐
│     Col. 1 │
├────────────┤
│ Dict(1=>2) │
│ Dict(3=>4) │
└────────────┘

@ronisbr
Copy link
Collaborator

ronisbr commented Oct 9, 2023

Oops, sorry, did not see your comment before posting :) Let's discuss there.

@bkamins
Copy link
Collaborator Author

bkamins commented Oct 9, 2023

That behavior is what I called "stricter rule" followed by a package in my post.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants