Frictionless: another JSON-based tabular metadata system #206
-
Yes! In fact, Recap uses Frictionless for CSV/TSV schema inference. I even wrote a Frictionless <> Recap schema converter. :) Indeed, I've been surveying a bunch of different systems. There is some good discussion here: https://github.com/recap-cloud/recap-schema-spec/issues/7
There is certainly overlap with Frictionless' table spec, but I think our use cases differ in a way that affects the choices we'd make, particularly in the type system. Frictionless borrows heavily from the JSON Schema spec. I'm borrowing much more heavily from IDL systems and Arrow, which have a much more robust type system. Recap's constraints, however, have a lot of overlap with Frictionless, which is very validating. (pun intended)
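To make the type-system difference concrete, here is a rough sketch of what converting a Table Schema descriptor into an Arrow/IDL-flavored type might look like. This is not Recap's actual converter or spec: `ColumnType`, its fields, and the default widths are all hypothetical, chosen only to show that a JSON Schema-style "integer" carries no width or signedness, so a converter has to assume them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ColumnType:
    """Hypothetical Arrow/IDL-style type: width and signedness are explicit."""
    kind: str                      # "int", "float", "string" or "bool"
    bits: Optional[int] = None     # storage width, where it applies
    signed: Optional[bool] = None  # only meaningful for integers
    nullable: bool = True


# Table Schema type name -> (kind, bits, signed).
# The widths are assumed defaults: Table Schema's "integer" and "number"
# say nothing about storage size, so the converter has to pick one.
_DEFAULTS = {
    "integer": ("int", 64, True),
    "number": ("float", 64, None),
    "string": ("string", None, None),
    "boolean": ("bool", None, None),
}


def convert_field(field: dict) -> ColumnType:
    """Convert one Table Schema field descriptor to a ColumnType."""
    ts_type = field["type"]
    if ts_type not in _DEFAULTS:
        raise ValueError(f"unsupported Table Schema type: {ts_type!r}")
    kind, bits, signed = _DEFAULTS[ts_type]
    # The "required" constraint maps naturally onto nullability.
    required = field.get("constraints", {}).get("required", False)
    return ColumnType(kind=kind, bits=bits, signed=signed, nullable=not required)


def convert_schema(schema: dict) -> dict:
    """Convert a Table Schema descriptor to a {column name: ColumnType} dict."""
    return {f["name"]: convert_field(f) for f in schema["fields"]}
```

The constraint side maps over cleanly (`required` becomes nullability), while the type side needs guesses, which is roughly where the two designs diverge.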
-
Hi, it would be interesting if we could support Recap Schema as a part of the Data Package, since Data Package doesn't require using only Table Schema for describing tables. Is there a place where one can read the Recap spec?
-
Just wondering if y'all had looked at using or extending existing systems for defining metadata for tabular data.
One we've used a bit is the Frictionless Framework and their tabular data package standard, which looks like it covers everything that currently exists in your recap.metadata.Schema and more.
In the last couple of years they've branched out to support annotating many more tabular data storage formats. It used to be focused just on CSV, but now works with Parquet, generic SQL, pandas dataframes, spreadsheets, etc. Maybe you could help extend it to work with data warehouse systems too?
It also tries to infer schemas automatically based on the data, and allows constraints to be specified and checked. Last time I looked it was much more row-oriented, but I imagine that extending it to also work with vectorized columnar data would be useful to a lot of people! Deeper integration with all the great validation tooling provided by @pydantic seems like it could be a great contribution too.
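For anyone unfamiliar with the idea, here is a minimal standard-library sketch of row-oriented schema inference and constraint checking. It is not the Frictionless API: the function names are made up for illustration, and only the output shape (a `fields` list with `name`, `type` and `constraints`, using the `required` and `unique` constraints) follows Table Schema.

```python
import csv
import io


def infer_type(values):
    """Pick the narrowest type that fits every non-empty value."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "string"
    for name, cast in (("integer", int), ("number", float)):
        try:
            for v in non_empty:
                cast(v)
            return name
        except ValueError:
            continue
    return "string"


def infer_schema(text):
    """Read CSV text and return (Table Schema-like descriptor, rows)."""
    reader = csv.DictReader(io.StringIO(text))
    rows = [{k: (v or "") for k, v in row.items()} for row in reader]
    fields = [
        {"name": name, "type": infer_type([r[name] for r in rows])}
        for name in reader.fieldnames
    ]
    return {"fields": fields}, rows


def check_constraints(schema, rows):
    """Return (line number, field name, constraint) for each violation."""
    errors = []
    for field in schema["fields"]:
        constraints = field.get("constraints", {})
        seen = set()
        # Data starts on line 2 because line 1 is the header.
        for line_no, row in enumerate(rows, start=2):
            value = row[field["name"]]
            if constraints.get("required") and value == "":
                errors.append((line_no, field["name"], "required"))
            if constraints.get("unique") and value != "":
                if value in seen:
                    errors.append((line_no, field["name"], "unique"))
                seen.add(value)
    return errors
```

Note that both the inference and the checks walk the data one row at a time, which is the row-oriented design mentioned above; a columnar version would operate on whole columns instead.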
I think @roll is the lead maintainer. It's a long-running project of @frictionlessdata and @okfn.