[8.19] Load huggingface content datasets (#224543) #226219

Merged
dgieselaar merged 5 commits into elastic:8.19 from dgieselaar:backport/8.19/pr-224543 on Jul 3, 2025

@dgieselaar
Contributor

Backport

This will backport the following commits from main to 8.19:

- Load huggingface content datasets (#224543)

Questions?

Please refer to the Backport tool documentation.

Implements a HuggingFace dataset loader for RAG evals; see
[x-pack/platform/packages/shared/kbn-ai-tools-cli/src/hf_dataset_loader/README.md](https://github.com/dgieselaar/kibana/blob/hf-dataset-loader/x-pack/platform/packages/shared/kbn-ai-tools-cli/src/hf_dataset_loader/README.md).
Additionally, adds a `@kbn/cache-cli` tool that lets tooling
authors cache to disk (possibly remote storage later).
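
To illustrate the "cache to disk" idea, here is a minimal sketch that uses only Node.js built-ins. It is not the `@kbn/cache-cli` API: `createDiskCache`, `getOrLoad`, and the key `dataset:example` are hypothetical names chosen for this example, and the sketch assumes cached values are JSON-serializable.

```typescript
import { createHash } from 'node:crypto';
import { existsSync, mkdirSync, mkdtempSync, readFileSync, writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

// Hypothetical helper, NOT the @kbn/cache-cli API: a minimal disk-backed cache.
function createDiskCache(dir: string) {
  mkdirSync(dir, { recursive: true });

  // Hash the key so that arbitrary strings map to safe file names.
  const fileFor = (key: string): string =>
    join(dir, createHash('sha256').update(key).digest('hex') + '.json');

  return {
    // Return the cached value for `key`, or compute it with `load` and persist it.
    getOrLoad<T>(key: string, load: () => T): T {
      const file = fileFor(key);
      if (existsSync(file)) {
        return JSON.parse(readFileSync(file, 'utf8')) as T;
      }
      const value = load();
      writeFileSync(file, JSON.stringify(value), 'utf8');
      return value;
    },
  };
}

// Usage: the second call is served from disk, so the loader runs only once.
const cache = createDiskCache(mkdtempSync(join(tmpdir(), 'cache-sketch-')));
let loads = 0;
const loadRows = () => {
  loads += 1;
  return [{ id: 1, text: 'hello' }];
};
const first = cache.getOrLoad('dataset:example', loadRows);
const second = cache.getOrLoad('dataset:example', loadRows);
```

A real tool would likely also need expiry and invalidation, which this sketch leaves out.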

Used o3 to find datasets on HuggingFace and to do an initial pass on
a line-by-line dataset processor ([see
conversation](https://chatgpt.com/share/6853e49a-e870-8000-9c65-f7a5a3a72af0)).
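
As a rough idea of what a line-by-line processor can look like, the sketch below parses JSON Lines text one line at a time. It is not the code from this PR: the `Doc` shape, the `processJsonLines` name, and the assumption that the dataset is JSON Lines with `id` and `text` fields are all hypothetical.

```typescript
// Hypothetical row shape; real HuggingFace datasets each define their own columns.
interface Doc {
  id: string;
  text: string;
}

// Parse JSON Lines input one line at a time. Blank lines are skipped and rows
// missing a string `id` or `text` are dropped. A malformed JSON line throws.
function processJsonLines(input: string): Doc[] {
  const docs: Doc[] = [];
  for (const line of input.split(/\r?\n/)) {
    const trimmed = line.trim();
    if (trimmed === '') continue;
    const row: unknown = JSON.parse(trimmed);
    if (typeof row !== 'object' || row === null) continue;
    const fields = row as Record<string, unknown>;
    if (typeof fields.id === 'string' && typeof fields.text === 'string') {
      docs.push({ id: fields.id, text: fields.text });
    }
  }
  return docs;
}

// Usage: the blank line and the row without `text` are ignored.
const sample = '{"id":"a","text":"first"}\n\n{"id":"b"}\n{"id":"c","text":"third"}\n';
const docs = processJsonLines(sample);
```

For large files, the same per-line logic could be fed from a stream instead of an in-memory string.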

Libraries added:

- `cache-manager`, `cache-manager-fs-hash`, `keyv`,
`@types/cache-manager-fs-hash`: caching libraries and plugins. Could not
find any existing caching libraries in the repo.
- `@huggingface/hub`: API client for HF.

---------

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
(cherry picked from commit 7d20301)

# Conflicts:
#	.github/CODEOWNERS
#	tsconfig.base.json
#	yarn.lock
@dgieselaar dgieselaar requested a review from kibanamachine as a code owner July 2, 2025 13:51
@dgieselaar dgieselaar added the backport This PR is a backport of another PR label Jul 2, 2025
@dgieselaar dgieselaar enabled auto-merge (squash) July 2, 2025 13:51
…t --include-path /api/status --include-path /api/alerting/rule/ --include-path /api/alerting/rules --include-path /api/actions --include-path /api/security/role --include-path /api/spaces --include-path /api/dashboards --include-path /api/maintenance_window --update --no-serverless'
@elasticmachine
Contributor

elasticmachine commented Jul 2, 2025

💔 Build Failed

Failed CI Steps

Metrics [docs]

Public APIs missing comments

Total count of every public API that lacks a comment. Target amount is 0. Run `node scripts/build_api_docs --plugin [yourplugin] --stats comments` for more detailed information.

| id | before | after | diff |
| --- | --- | --- | --- |
| @kbn/ai-tools-cli | - | 11 | +11 |
| @kbn/cache-cli | - | 9 | +9 |
| total | | | +20 |

Any counts in public APIs

Total count of every `any` typed public API. Target amount is 0. Run `node scripts/build_api_docs --plugin [yourplugin] --stats any` for more detailed information.

| id | before | after | diff |
| --- | --- | --- | --- |
| @kbn/ai-tools-cli | - | 1 | +1 |

Public APIs missing exports

Total count of every type that is part of your API that should be exported but is not. This will cause broken links in the API documentation system. Target amount is 0. Run `node scripts/build_api_docs --plugin [yourplugin] --stats exports` for more detailed information.

| id | before | after | diff |
| --- | --- | --- | --- |
| @kbn/cache-cli | - | 2 | +2 |

Unknown metric groups

API count

| id | before | after | diff |
| --- | --- | --- | --- |
| @kbn/ai-tools-cli | - | 18 | +18 |
| @kbn/cache-cli | - | 9 | +9 |
| total | | | +27 |

@dgieselaar
Contributor Author

@elasticmachine merge upstream

@dgieselaar dgieselaar merged commit e10a8dc into elastic:8.19 Jul 3, 2025
8 checks passed
@dgieselaar dgieselaar deleted the backport/8.19/pr-224543 branch July 3, 2025 13:31
Labels

backport This PR is a backport of another PR

4 participants