docs: new dataset source, from vega_dataset to altair.dataset #3859

Merged
mattijn merged 18 commits into main from new-dataset-source
Oct 26, 2025

@mattijn
Contributor

@mattijn mattijn commented Jul 16, 2025

Updated test expectations and example files to work with the new vega-datasets source. Fixed field name references from underscores to spaces (e.g., IMDB_Rating → IMDB Rating) and updated row count expectations in transformed data tests.
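For readers updating their own specs to the renamed columns, the following is a minimal sketch of the underscore-to-space field rename described above. The helper `rename_fields` and the `FIELD_RENAMES` table are hypothetical and not part of Altair or this PR; only the `IMDB_Rating` → `IMDB Rating` pair comes from the description. It rewrites `"field"` entries in a Vega-Lite spec dictionary and does not touch expression strings such as `datum.IMDB_Rating`.

```python
from typing import Any

# Hypothetical rename table; only this one pair is taken from the PR text.
FIELD_RENAMES = {"IMDB_Rating": "IMDB Rating"}


def rename_fields(node: Any, renames: dict[str, str]) -> Any:
    """Return a copy of a spec in which every "field" value is renamed."""
    if isinstance(node, dict):
        return {
            key: (
                renames.get(value, value)
                if key == "field" and isinstance(value, str)
                else rename_fields(value, renames)
            )
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [rename_fields(item, renames) for item in node]
    # Scalars (strings, numbers, booleans, None) are returned unchanged.
    return node


# Illustrative spec written against the legacy column name.
legacy_spec = {
    "mark": "bar",
    "encoding": {
        "x": {"field": "IMDB_Rating", "type": "quantitative", "bin": True},
        "y": {"aggregate": "count", "type": "quantitative"},
    },
}
updated_spec = rename_fields(legacy_spec, FIELD_RENAMES)
```

The function builds a new dictionary rather than mutating its input, so the legacy spec stays available for comparison.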

@jonmmease
Contributor

ValueError: DataFusion error: Object Store error: Generic HTTP error: Header

Is that the end of the error? I was expecting the nature of the header error to follow.

@mattijn
Contributor Author

mattijn commented Jul 17, 2025

I was able to make a minimal working example reproducing the error with only Vega and VegaFusion:

Open the Chart in the Vega Editor

import json
import vegafusion as vf

vega_spec = json.loads("""
{
  "$schema": "https://vega.github.io/schema/vega/v6.json",
  "data": [
    {
      "name": "source_0",
      "url": "https://cdn.jsdelivr.net/npm/vega-datasets@v3.2.0/data/co2-concentration.csv",
      "format": {
        "type": "csv",
        "parse": {
          "Date": "date"
        }
      }
    }
  ]
}
""")

# This will raise the HTTP error
datasets, warnings = vf.runtime.pre_transform_datasets(vega_spec, ["source_0"])
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[14], line 24
      4 vega_spec = json.loads("""
      5 {
      6   "$schema": "https://vega.github.io/schema/vega/v6.json",
   (...)
     20 
     21 """)
     23 # This will raise the HTTP error
---> 24 datasets, warnings = vf.runtime.pre_transform_datasets(vega_spec, ["source_0"])

File ~/miniconda3/envs/stable/lib/python3.12/site-packages/vegafusion/runtime.py:535, in VegaFusionRuntime.pre_transform_datasets(self, spec, datasets, local_tz, default_input_tz, row_limit, inline_datasets, trim_unused_columns, dataset_format)
    527 # Serialize inline datasets
    528 inline_arrow_dataset = self._import_inline_datasets(
    529     inline_datasets,
    530     inline_dataset_usage=get_inline_column_usage(spec)
    531     if trim_unused_columns
    532     else None,
    533 )
--> 535 values, warnings = self.runtime.pre_transform_datasets(
    536     spec,
    537     pre_tx_vars,
    538     local_tz=local_tz,
    539     default_input_tz=default_input_tz,
    540     row_limit=row_limit,
    541     inline_datasets=inline_arrow_dataset,
    542 )
    544 def normalize_timezones(
    545     dfs: list[nw.DataFrame[IntoFrameT] | nw.LazyFrame[IntoFrameT]],
    546 ) -> list[DataFrameLike]:
    547     # Convert to `local_tz` (or, set to UTC and then convert if starting
    548     # from time-zone-naive data), then extract the native DataFrame to return.
    549     processed_datasets = []

ValueError: DataFusion error: Object Store error: Generic HTTP error: Header
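One way to check whether the message really ends at "Header" is to walk the Python exception chain and print every linked cause. The sketch below is generic: `exception_chain` and `failing_call` are hypothetical names, and the inner `RuntimeError` is a made-up stand-in. Whether VegaFusion attaches any chained cause to this `ValueError` is not established in this thread, so the chain may well contain a single entry.

```python
def exception_chain(exc: BaseException) -> list[str]:
    """Collect the message of an exception and of every chained cause."""
    messages = []
    seen = set()
    current = exc
    while current is not None and id(current) not in seen:
        seen.add(id(current))  # guard against cyclic chains
        messages.append(f"{type(current).__name__}: {current}")
        # Prefer the explicit cause (raise ... from ...), else the implicit context.
        current = current.__cause__ or current.__context__
    return messages


def failing_call() -> None:
    # Stand-in for the vf.runtime.pre_transform_datasets(...) call above.
    try:
        raise RuntimeError("lower-level detail")  # invented inner error
    except RuntimeError as inner:
        raise ValueError("Generic HTTP error: Header") from inner


try:
    failing_call()
except ValueError as err:
    chain = exception_chain(err)
    for line in chain:
        print(line)
```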

@mattijn mattijn changed the title doc: new dataset source, from vega_dataset to altair.dataset docs: new dataset source, from vega_dataset to altair.dataset Sep 24, 2025
@mattijn
Contributor Author

mattijn commented Oct 11, 2025

Finally all tests are passing! @joelostblom would you be willing to do a review?

@dsmedia
Contributor

dsmedia commented Oct 12, 2025

@mattijn Looks like in 577ed87 you had to remove iris.json because it's in the legacy altair-viz/vega_datasets but not in vega/vega-datasets. Do I understand that right?

If so, would it help you if I added iris.json to vega/vega-datasets? For future development plans, I'd like to have a clean match between the datasets pulled in by example-gallery examples from all the vega visualization libraries and the vega-datasets repo. So this would be beneficial even if you decide not to undo 577ed87.

@mattijn
Contributor Author

mattijn commented Oct 12, 2025

That dataset was removed from vega/vega-datasets a few years ago, in this PR vega/vega-datasets#187.

@joelostblom
Contributor

Thanks for all the work you have done on getting the sample data integrated into Altair, @mattijn! I will aim to have a look at this in more detail next week. One question I already have is how we are releasing and announcing the changes to the sample data import convention.

My understanding is that we are not breaking anything from vega-datasets and that old code will still work as before #3848. So this is probably released with a minor version bump? Still, the fact that the column names have changed in the examples might be confusing without an explicit note somewhere. Maybe it is enough to clearly spell this out in the release notes and we don't need to also note it in the documentation? Although when we added the new method-based syntax for channel options, we included a note in the docs like so:

image

Do you think we should include a similar callout somewhere in the docs (maybe in the "Specifying Data" page) to emphasize the move to altair.datasets?

@mattijn
Contributor Author

mattijn commented Oct 12, 2025

On the main branch, we already moved to Vega-Lite version 6, so I think we can ship this as part of a major release. I like the suggestion to have a note in the "Specifying Data" section, "With the release of Altair 6 etc."

@mattijn
Contributor Author

mattijn commented Oct 12, 2025

Added a note in commit b1c648f at the end of the first paragraph in this section: https://altair-viz.github.io/gallery/index.html, where the source of the datasets in the examples is discussed.

Contributor

@joelostblom joelostblom left a comment

Looks great overall! Thanks for all your work getting this updated! I think it is really convenient that the same import convention is kept while having the data in the altair package directly.

I clicked through the entire user guides and the examples, and noticed a few places where the charts were not rendering, which I have commented on inline. I noticed a couple of things not related to this PR too which I will fix separately.

Added a note in commit b1c648f at the end of the first paragraph in this section: https://altair-viz.github.io/gallery/index.html, where the source of the datasets in the examples is discussed.

Looks good!

@mattijn mattijn dismissed joelostblom’s stale review October 26, 2025 14:52

Addressed all feedback from @joelostblom in the following commit: aa5d353

@mattijn
Contributor Author

mattijn commented Oct 26, 2025

Thanks @joelostblom for the review! Addressed all of them. Had a thorough check as well, and all charts are rendering now, as far as I can see. Merging this PR since all tests are green on CI 🥳

@mattijn mattijn merged commit 3f87ca1 into main Oct 26, 2025
25 checks passed
dsmedia added a commit to dsmedia/vega-datasets that referenced this pull request Oct 27, 2025
Altair PR #3859 (merged 2025-10-26) migrated from vega_datasets package
to altair.datasets module with canonical vega-datasets naming. This
updates the gallery examples collection to track Altair v6+ main branch.

Changes:
- Empty [altair.name_mapping] section (was: londonBoroughs → london_boroughs)
- Comments now document legacy v5.x support instead of temporary workaround
- Add pattern for fully qualified altair.datasets.data.X.url syntax
- Refactor extract_altair_api_datasets() with explicit name_mapping parameter
- Regenerate gallery_examples.json (470 examples, all with canonical names)

Type safety improvements:
- extract_altair_api_datasets() now accepts name_mapping as parameter
  instead of accessing global _config directly
- Explicit None default for Altair v6+ (no mapping needed)
- Better testability and separation of concerns

Backward compatibility:
- Mapping section preserved (empty) with documentation for v5.x users
- Historical camelCase examples commented out for reference
- Function signature supports both v5 (with mapping) and v6 (without)

Configuration notes:
- Currently tracks Altair main branch (v6+ development)
- Git ref hardcoded in Python script (line 1135) - documented in TOML
- Stability note added: consider pinning to release tag when v6.0.0 available
- Testing procedure documented for v5.x regression testing

All three galleries (Vega, Vega-Lite, Altair) now use consistent
canonical dataset naming from datapackage.json.

Related: vega/altair#3859

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
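
The commit message above describes passing `name_mapping` as an explicit parameter with a `None` default instead of reading a global config. The following is a rough sketch of that pattern only, not the actual `extract_altair_api_datasets()` implementation: the function name `extract_dataset_names`, its signature, and the regular expression are all assumptions made for illustration. The one mapping entry used in the usage example is the `londonBoroughs` → `london_boroughs` pair quoted in the commit message.

```python
import re
from typing import Optional

# Assumed pattern: matches data.X( and data.X.url, optionally written in the
# fully qualified form altair.datasets.data.X.url.
_DATASET_PATTERN = re.compile(
    r"\b(?:altair\.datasets\.)?data\.([A-Za-z_][A-Za-z0-9_]*)\s*(?:\(|\.url\b)"
)


def extract_dataset_names(
    source: str, name_mapping: Optional[dict[str, str]] = None
) -> list[str]:
    """Return dataset names referenced in example source, in first-seen order.

    name_mapping=None means no renaming; a dict renames legacy names.
    """
    mapping = name_mapping or {}
    names: list[str] = []
    for match in _DATASET_PATTERN.finditer(source):
        name = mapping.get(match.group(1), match.group(1))
        if name not in names:  # keep each dataset once
            names.append(name)
    return names


# Illustrative example source, not taken from the Altair gallery.
sample = (
    "source = data.cars()\n"
    "url = altair.datasets.data.movies.url\n"
    "boroughs = data.londonBoroughs.url\n"
    "again = data.cars.url\n"
)
without_mapping = extract_dataset_names(sample)
with_mapping = extract_dataset_names(
    sample, name_mapping={"londonBoroughs": "london_boroughs"}
)
```

Because the mapping is an argument, the function can be tested with and without a mapping and never depends on module-level state.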
4 participants