Additional metadata for datasets #629

@dangotbanned

Note

Originally posted by @dangotbanned in vega/altair#3631 (comment)
I thought this would be easier to discuss in a new issue over here

Let me know if there is anything you need from the vega-datasets repo side. Would it make sense for some logic to live there so that it gets updated if we add a new dataset, for example?

@domoritz right now the additional metadata I'm adding is described above, and a preview of metadata.parquet is in vega/altair#3631 (comment)

I'm not too concerned about the logic for this living in altair.
The only static part is ext_supported - which is somewhat altair/python-specific:

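 from typing import Any, Literal

 from typing_extensions import TypeAlias, TypeIs

 Extension: TypeAlias = Literal[".csv", ".json", ".tsv", ".arrow", ".parquet"]
 EXTENSION_SUFFIXES = (".csv", ".json", ".tsv", ".arrow", ".parquet")


 def is_ext_read(suffix: Any) -> TypeIs[Extension]:
     # Narrow an arbitrary value to one of the supported suffixes
     return suffix in {".csv", ".json", ".tsv", ".arrow", ".parquet"}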

For the current datasets, this is pretty much just to avoid .png.
Maybe that could just be a tabular flag?
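
To make the tabular flag idea concrete, here is a rough sketch of how it could be derived from a file name. The name `is_tabular` and the example file names are hypothetical, and the suffix set is simply copied from the snippet above; note that a `.json` suffix alone cannot tell a tabular file from a TopoJSON one.

```python
from pathlib import PurePath

# Assumption: the same suffixes as EXTENSION_SUFFIXES in the snippet above.
TABULAR_SUFFIXES = frozenset({".csv", ".json", ".tsv", ".arrow", ".parquet"})


def is_tabular(file_name: str) -> bool:
    """Return True when the file's suffix is in the assumed tabular set."""
    # Lower-case the suffix so "DATA.CSV" is treated like "data.csv".
    return PurePath(file_name).suffix.lower() in TABULAR_SUFFIXES


# Hypothetical file names, used only for illustration.
print(is_tabular("cars.json"))  # suffix ".json" is in the set
print(is_tabular("logo.png"))   # suffix ".png" is not
```

A precomputed flag like this would let consumers filter out images without hard-coding a suffix list of their own.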

I think we could benefit from other kinds of metadata that may be less trivial to work out on our end:

  • Is spatial (comment)
    • Also if it factors into reading the file - (Geo|Topo)JSON
  • Schema/datatypes/columns
    • Depending on outcome of (comment) - format string(s)
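
As an illustration of the "is spatial" item, one possible check looks at the top-level `"type"` member of an already-parsed JSON document. This is only a sketch: the function name `spatial_kind` and its return values are made up here. The type names are the ones GeoJSON (RFC 7946) and TopoJSON define.

```python
from typing import Any, Optional

# Top-level "type" values defined by GeoJSON (RFC 7946).
GEOJSON_TYPES = frozenset({
    "Point", "MultiPoint", "LineString", "MultiLineString",
    "Polygon", "MultiPolygon", "GeometryCollection",
    "Feature", "FeatureCollection",
})


def spatial_kind(doc: Any) -> Optional[str]:
    """Classify parsed JSON as "geojson", "topojson", or None (hypothetical helper)."""
    if not isinstance(doc, dict):
        # Plain tabular JSON is usually a list of records, not an object.
        return None
    kind = doc.get("type")
    if not isinstance(kind, str):
        return None
    if kind == "Topology":
        # TopoJSON documents have a top-level object of type "Topology".
        return "topojson"
    if kind in GEOJSON_TYPES:
        return "geojson"
    return None
```

Storing the result in the metadata would mean readers do not need to open each `.json` file just to decide how to load it.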
