Load huggingface content datasets #224543
Changes from all commits
69f5893
e8fb7a2
1abf55a
f7cfc25
4a32350
99dc7dc
b0d6ba8
35cb56f
ca667ff
dd99ec4
d9691a9
4a479b4
025cd81
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -426,6 +426,7 @@ src/platform/packages/shared/kbn-apm-utils @elastic/obs-ux-infra_services-team | |
| src/platform/packages/shared/kbn-avc-banner @elastic/security-defend-workflows | ||
| src/platform/packages/shared/kbn-axe-config @elastic/appex-qa | ||
| src/platform/packages/shared/kbn-babel-register @elastic/kibana-operations | ||
| src/platform/packages/shared/kbn-cache-cli @elastic/kibana-operations | ||
| src/platform/packages/shared/kbn-calculate-auto @elastic/obs-ux-management-team | ||
| src/platform/packages/shared/kbn-calculate-width-from-char-count @elastic/kibana-visualizations | ||
| src/platform/packages/shared/kbn-cases-components @elastic/response-ops | ||
|
|
@@ -839,6 +840,7 @@ x-pack/platform/packages/shared/file-upload-common @elastic/ml-ui | |
| x-pack/platform/packages/shared/index-lifecycle-management/index_lifecycle_management_common_shared @elastic/kibana-management | ||
| x-pack/platform/packages/shared/index-management/index_management_shared_types @elastic/kibana-management | ||
| x-pack/platform/packages/shared/kbn-ai-assistant @elastic/search-kibana @elastic/obs-ai-assistant | ||
| x-pack/platform/packages/shared/kbn-ai-tools-cli @elastic/appex-ai-infra | ||
|
Contributor
Could we please add it to CodeQL ignore paths?
Contributor
Author
what does that do and why should this be excluded? is it a blanket policy for CLI tools?
Contributor
I've briefly covered the reason in your other PR - #218694 (review). Ideally, we wouldn't exclude or ignore anything at all, and hopefully, we'll get there eventually. However, due to our current constraints (CodeQL is slow, resource-hungry, and not super friendly to incremental tests/changes), we're trying to be pragmatic and only cover non-dev/non-test code. At the same time, I'm not yet confident it'll become a permanent policy (at least I hope so), so we haven't documented this anywhere and are handling it on a case-by-case basis.
Contributor
Author
@azasypkin thanks Oleg and apologies for missing that. Is this something we can automate? The package has the
Contributor
Author
In the interest of time I've added them to the CodeQL ignore paths – it'd be great if we can automate this but I also understand it might not be worth it if we are not sure what the long term plan is.
Contributor
Yeah, we definitely want and need to add more automation here, and we're moving (albeit slowly) in that direction. |
||
| x-pack/platform/packages/shared/kbn-alerting-comparators @elastic/response-ops | ||
| x-pack/platform/packages/shared/kbn-apm-types @elastic/obs-ux-infra_services-team | ||
| x-pack/platform/packages/shared/kbn-cloud-security-posture/common @elastic/kibana-cloud-security-posture | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,49 @@ | ||
| # @kbn/cache-cli | ||
|
|
||
| Centralised caching helpers for scripts and CLIs in the Kibana repo. | ||
|
|
||
| The goal is to make it easy for engineers to cache computationally or I/O-expensive operations on disk or, in the future, possibly in a remote store. | ||
|
|
||
| --- | ||
|
|
||
| ## Quick start | ||
|
|
||
| ```ts | ||
| import { fromCache, createLocalDirDiskCacheStore } from '@kbn/cache-cli'; | ||
| import { createCache } from 'cache-manager'; | ||
|
|
||
| const DOC_CACHE = createCache({ | ||
| stores: [createLocalDirDiskCacheStore({ dir: 'my_docs', ttl: 60 * 60 /* 1h */ })], | ||
| }); | ||
|
|
||
| const docs = await fromCache('docs', DOC_CACHE, async () => fetchDocs()); | ||
| ``` | ||
|
|
||
| `fromCache(key, cache, cb, validator?)` semantics: | ||
|
|
||
| 1. Tries `cache.get(key)` (skipped when `process.env.DISABLE_KBN_CLI_CACHE` is truthy). | ||
| 2. Runs the optional `validator(cached)` – return `false` to force a refresh. | ||
| 3. Calls `cb()` on a cache miss or when the cached value is invalid. | ||
| 4. Persists the resulting value via `cache.set(key, value)` and returns it. In the current implementation the write also happens on a cache hit. | ||
|
|
||
| --- | ||
|
|
||
| ## Available cache stores | ||
|
|
||
| `@kbn/cache-cli` wraps [`cache-manager`](https://github.com/node-cache-manager/node-cache-manager) so any **Keyv compatible** store works. The helpers below ship out-of-the-box: | ||
|
|
||
| | Helper | Backing store | Typical use-case | | ||
| | --------------------------------------------- | --------------------------------------------------- | ---------------------------------------------------------- | | ||
| | `createLocalDirDiskCacheStore({ dir, ttl? })` | `cache-manager-fs-hash` on `<REPO_ROOT>/data/{dir}` | Persist in `./data` with an unknown ttl | | ||
| | `createTmpDirDiskCacheStore({ dir, ttl? })` | `cache-manager-fs-hash` on `<OS_TMP_DIR>/{dir}` | Persist in the OS temp dir, which may be cleared on reboot | | ||
|
|
||
| --- | ||
|
|
||
| ## Cache invalidation strategies | ||
|
|
||
| 1. **Manual bypass** – set `DISABLE_KBN_CLI_CACHE=true` to force fresh data (useful in CI workflows). | ||
| 2. **Time-to-live (TTL)** – pass `ttl` when creating a store to let the backend expire entries automatically. | ||
| 3. **Programmatic validation** – supply the `cacheValidator` callback to `fromCache()`; it receives the cached value and should return `true` when it is still valid. | ||
| 4. **Clear on disk** – delete the relevant directory under `data/` if you need a hard reset. | ||
|
|
||
| Choose whichever fits your script. They can be combined (e.g. a TTL plus a validator). |
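
The "programmatic validation" strategy in the README above could look roughly like the sketch below. It assumes the cached payload carries a version stamp; `CachedDocs`, `SCHEMA_VERSION`, and `isCurrentDocs` are hypothetical names introduced only for illustration and are not part of `@kbn/cache-cli`.

```typescript
// Hypothetical payload shape: the cached value records the schema version
// it was written with (an assumption for this sketch).
interface CachedDocs {
  schemaVersion: number;
  docs: string[];
}

// Assumed constant; bump it whenever the cached document format changes.
const SCHEMA_VERSION = 2;

// Validator in the shape fromCache() accepts: (val: T) => boolean.
// Returning false tells fromCache() to discard the entry and call cb() again.
function isCurrentDocs(cached: CachedDocs): boolean {
  return cached.schemaVersion === SCHEMA_VERSION && cached.docs.length > 0;
}
```

A validator like this would be passed as the fourth argument to `fromCache()`, and it can sit alongside a store-level `ttl`, so that entries are refreshed either when they expire or when the format changes.
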
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,13 @@ | ||
| /* | ||
| * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one | ||
| * or more contributor license agreements. Licensed under the "Elastic License | ||
| * 2.0", the "GNU Affero General Public License v3.0 only", and the "Server Side | ||
| * Public License v 1"; you may not use this file except in compliance with, at | ||
| * your election, the "Elastic License 2.0", the "GNU Affero General Public | ||
| * License v3.0 only", or the "Server Side Public License, v 1". | ||
| */ | ||
|
|
||
| export { createLocalDirDiskCacheStore } from './src/stores/create_local_disk_cache_store'; | ||
| export { createTmpDirDiskCacheStore } from './src/stores/create_tmp_dir_disk_cache_store'; | ||
|
|
||
| export { fromCache } from './src/from_cache'; |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,14 @@ | ||
| /* | ||
| * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one | ||
| * or more contributor license agreements. Licensed under the "Elastic License | ||
| * 2.0", the "GNU Affero General Public License v3.0 only", and the "Server Side | ||
| * Public License v 1"; you may not use this file except in compliance with, at | ||
| * your election, the "Elastic License 2.0", the "GNU Affero General Public | ||
| * License v3.0 only", or the "Server Side Public License, v 1". | ||
| */ | ||
|
|
||
| module.exports = { | ||
| preset: '@kbn/test/jest_node', | ||
| rootDir: '../../../../..', | ||
| roots: ['<rootDir>/src/platform/packages/shared/kbn-cache-cli'], | ||
| }; |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,8 @@ | ||
| { | ||
| "type": "shared-common", | ||
| "id": "@kbn/cache-cli", | ||
| "owner": "@elastic/kibana-operations", | ||
| "group": "platform", | ||
| "visibility": "shared", | ||
| "devOnly": true | ||
| } |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,6 @@ | ||
| { | ||
| "name": "@kbn/cache-cli", | ||
| "private": true, | ||
| "version": "1.0.0", | ||
| "license": "Elastic License 2.0 OR AGPL-3.0-only OR SSPL-1.0" | ||
| } |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,87 @@ | ||
| /* | ||
| * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one | ||
| * or more contributor license agreements. Licensed under the "Elastic License | ||
| * 2.0", the "GNU Affero General Public License v3.0 only", and the "Server Side | ||
| * Public License v 1"; you may not use this file except in compliance with, at | ||
| * your election, the "Elastic License 2.0", the "GNU Affero General Public | ||
| * License v3.0 only", or the "Server Side Public License, v 1". | ||
| */ | ||
|
|
||
| import { fromCache } from './from_cache'; | ||
| import type { Cache } from 'cache-manager'; | ||
|
|
||
| function createMockCache(): { store: Map<string, unknown>; cache: Cache } { | ||
| const backing = new Map<string, unknown>(); | ||
| const cache = { | ||
| get: jest.fn(async (key: string) => backing.get(key)), | ||
| set: jest.fn(async (key: string, value: unknown) => { | ||
| backing.set(key, value); | ||
| }), | ||
| } as Partial<Cache>; | ||
| return { store: backing, cache: cache as Cache }; | ||
| } | ||
|
|
||
| describe('fromCache', () => { | ||
| const KEY = 'test-key'; | ||
| const NEW_VAL = 'fresh-value'; | ||
|
|
||
| afterEach(() => { | ||
| jest.clearAllMocks(); | ||
| delete process.env.DISABLE_KBN_CLI_CACHE; | ||
| }); | ||
|
|
||
| it('returns the cached value when present', async () => { | ||
| const { cache, store } = createMockCache(); | ||
| store.set(KEY, 'cached-value'); | ||
|
|
||
| const cb = jest.fn().mockResolvedValue(NEW_VAL); | ||
| const result = await fromCache(KEY, cache, cb); | ||
|
|
||
| expect(result).toBe('cached-value'); | ||
| expect(cb).not.toHaveBeenCalled(); | ||
| expect(cache.get).toHaveBeenCalledWith(KEY); | ||
| // fromCache always calls set, so the cached value is written back unchanged | ||
| expect(cache.set).toHaveBeenCalledWith(KEY, 'cached-value'); | ||
| }); | ||
|
|
||
| it('bypasses cache when DISABLE_KBN_CLI_CACHE env var is set', async () => { | ||
| process.env.DISABLE_KBN_CLI_CACHE = 'true'; | ||
| const { cache } = createMockCache(); | ||
| const cb = jest.fn().mockResolvedValue(NEW_VAL); | ||
|
|
||
| const result = await fromCache(KEY, cache, cb); | ||
|
|
||
| expect(cb).toHaveBeenCalledTimes(1); | ||
| expect(result).toBe(NEW_VAL); | ||
|
|
||
| // still updates the cache with the new value | ||
| expect(cache.set).toHaveBeenCalledWith(KEY, NEW_VAL); | ||
| }); | ||
|
|
||
| it('validates cached value with cacheValidator and recomputes when invalid', async () => { | ||
| const { cache, store } = createMockCache(); | ||
| store.set(KEY, 'stale'); | ||
|
|
||
| const cb = jest.fn().mockResolvedValue(NEW_VAL); | ||
|
|
||
| const validator = jest.fn((val: string) => val === 'fresh-value'); | ||
|
|
||
| const result = await fromCache(KEY, cache, cb, validator); | ||
|
|
||
| expect(validator).toHaveBeenCalledWith('stale'); | ||
| expect(cb).toHaveBeenCalledTimes(1); | ||
| expect(result).toBe(NEW_VAL); | ||
| expect(cache.set).toHaveBeenCalledWith(KEY, NEW_VAL); | ||
| }); | ||
|
|
||
| it('stores newly computed value in cache when no cached value exists', async () => { | ||
| const { cache } = createMockCache(); | ||
| const cb = jest.fn().mockResolvedValue(NEW_VAL); | ||
|
|
||
| const result = await fromCache(KEY, cache, cb); | ||
|
|
||
| expect(result).toBe(NEW_VAL); | ||
| expect(cb).toHaveBeenCalledTimes(1); | ||
| expect(cache.set).toHaveBeenCalledWith(KEY, NEW_VAL); | ||
| }); | ||
| }); |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,29 @@ | ||
| /* | ||
| * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one | ||
| * or more contributor license agreements. Licensed under the "Elastic License | ||
| * 2.0", the "GNU Affero General Public License v3.0 only", and the "Server Side | ||
| * Public License v 1"; you may not use this file except in compliance with, at | ||
| * your election, the "Elastic License 2.0", the "GNU Affero General Public | ||
| * License v3.0 only", or the "Server Side Public License, v 1". | ||
| */ | ||
| import { Cache } from 'cache-manager'; | ||
|
|
||
| export async function fromCache<T>( | ||
| key: string, | ||
| store: Cache, | ||
| cb: () => Promise<T>, | ||
| cacheValidator?: (val: T) => boolean | ||
| ): Promise<T> { | ||
| let val = process.env.DISABLE_KBN_CLI_CACHE ? undefined : await store.get<T>(key); | ||
|
|
||
| if (val !== undefined && cacheValidator) { | ||
| val = cacheValidator(val) ? val : undefined; | ||
| } | ||
|
|
||
| if (val === undefined) { | ||
| val = await cb(); | ||
| } | ||
|
|
||
| await store.set(key, val); | ||
| return val; | ||
| } |
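
To make the control flow of `fromCache` easier to follow in isolation, here is a minimal, self-contained sketch that mirrors it against an in-memory cache. `MinimalCache`, `createMemoryCache`, `fromCacheSketch`, and `demo` are hypothetical stand-ins, not part of the package. The sketch takes a `disabled` flag where the real function reads `process.env.DISABLE_KBN_CLI_CACHE`, and it awaits the write, which is a choice made for this sketch.

```typescript
// Minimal stand-in for the two cache methods fromCache() relies on
// (an assumption for this sketch, not the cache-manager interface).
interface MinimalCache {
  get<T>(key: string): Promise<T | undefined>;
  set<T>(key: string, value: T): Promise<void>;
}

function createMemoryCache(): MinimalCache {
  const backing = new Map<string, unknown>();
  return {
    async get<T>(key: string): Promise<T | undefined> {
      return backing.get(key) as T | undefined;
    },
    async set<T>(key: string, value: T): Promise<void> {
      backing.set(key, value);
    },
  };
}

async function fromCacheSketch<T>(
  key: string,
  cache: MinimalCache,
  cb: () => Promise<T>,
  validator?: (val: T) => boolean,
  disabled: boolean = false // stands in for the env var check
): Promise<T> {
  // 1. Read from the cache unless the bypass is active.
  let val = disabled ? undefined : await cache.get<T>(key);
  // 2. Drop the cached value if the validator rejects it.
  if (val !== undefined && validator && !validator(val)) {
    val = undefined;
  }
  // 3. Recompute on a miss or an invalid entry.
  if (val === undefined) {
    val = await cb();
  }
  // 4. Write the value back and return it.
  await cache.set(key, val);
  return val;
}

// Counts how often the expensive callback runs across a miss, a hit, and a bypass.
async function demo(): Promise<number> {
  const cache = createMemoryCache();
  let calls = 0;
  const compute = async () => {
    calls += 1;
    return 'value-' + calls;
  };
  await fromCacheSketch('k', cache, compute); // miss: compute runs
  await fromCacheSketch('k', cache, compute); // hit: compute is skipped
  await fromCacheSketch('k', cache, compute, undefined, true); // bypass: compute runs again
  return calls;
}
```

In this sketch the callback runs on the initial miss and again on the bypassed call, while the plain second call is served from the cache.
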
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,30 @@ | ||
| /* | ||
| * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one | ||
| * or more contributor license agreements. Licensed under the "Elastic License | ||
| * 2.0", the "GNU Affero General Public License v3.0 only", and the "Server Side | ||
| * Public License v 1"; you may not use this file except in compliance with, at | ||
| * your election, the "Elastic License 2.0", the "GNU Affero General Public | ||
| * License v3.0 only", or the "Server Side Public License, v 1". | ||
| */ | ||
|
|
||
| import DiskStore from 'cache-manager-fs-hash'; | ||
| import { KeyvAdapter } from 'cache-manager'; | ||
| import Path from 'path'; | ||
| import { REPO_ROOT } from '@kbn/repo-info'; | ||
| import { Keyv } from 'keyv'; | ||
|
|
||
| export interface LocalDiskCacheOptions { | ||
| dir: string; | ||
| ttl?: number; | ||
| } | ||
|
|
||
| export function createLocalDirDiskCacheStore(opts: LocalDiskCacheOptions): Keyv { | ||
| const adapter = new KeyvAdapter( | ||
| DiskStore.create({ | ||
| store: DiskStore, | ||
| options: { path: Path.join(REPO_ROOT, 'data', opts.dir), ttl: opts.ttl }, | ||
| }) | ||
| ); | ||
|
|
||
| return new Keyv({ store: adapter }); | ||
| } |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,30 @@ | ||
| /* | ||
| * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one | ||
| * or more contributor license agreements. Licensed under the "Elastic License | ||
| * 2.0", the "GNU Affero General Public License v3.0 only", and the "Server Side | ||
| * Public License v 1"; you may not use this file except in compliance with, at | ||
| * your election, the "Elastic License 2.0", the "GNU Affero General Public | ||
| * License v3.0 only", or the "Server Side Public License, v 1". | ||
| */ | ||
|
|
||
| import DiskStore from 'cache-manager-fs-hash'; | ||
| import { KeyvAdapter } from 'cache-manager'; | ||
| import Os from 'os'; | ||
| import Path from 'path'; | ||
| import { Keyv } from 'keyv'; | ||
|
|
||
| export interface TmpDirDiskCacheOptions { | ||
| dir: string; | ||
| ttl?: number; | ||
| } | ||
|
|
||
| export function createTmpDirDiskCacheStore(opts: TmpDirDiskCacheOptions): Keyv { | ||
| const adapter = new KeyvAdapter( | ||
| DiskStore.create({ | ||
| store: DiskStore, | ||
| options: { path: Path.join(Os.tmpdir(), opts.dir), ttl: opts.ttl }, | ||
| }) | ||
| ); | ||
|
|
||
| return new Keyv(adapter); | ||
| } |
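
The two store helpers differ only in where they place their directory. The sketch below isolates that path logic; `resolveCacheDir` and `CacheLocation` are hypothetical names, and `repoRoot` is passed in explicitly here, whereas the real helper imports `REPO_ROOT` from `@kbn/repo-info`.

```typescript
import * as os from 'node:os';
import * as path from 'node:path';

// Hypothetical discriminator for the two helpers shown in the diff.
type CacheLocation = 'repo' | 'tmp';

// Mirrors the path expressions used by the two store factories:
//   repo -> <repoRoot>/data/<dir>
//   tmp  -> <os.tmpdir()>/<dir>
function resolveCacheDir(location: CacheLocation, dir: string, repoRoot: string): string {
  return location === 'repo'
    ? path.join(repoRoot, 'data', dir)
    : path.join(os.tmpdir(), dir);
}
```

Given this layout, the "clear on disk" invalidation strategy amounts to deleting the directory this function would return for the store in question.
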
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,19 @@ | ||
| { | ||
| "extends": "../../../../../tsconfig.base.json", | ||
| "compilerOptions": { | ||
| "outDir": "target/types", | ||
| "types": [ | ||
| "jest", | ||
| "node" | ||
| ] | ||
| }, | ||
| "include": [ | ||
| "**/*.ts", | ||
| ], | ||
| "exclude": [ | ||
| "target/**/*" | ||
| ], | ||
| "kbn_references": [ | ||
| "@kbn/repo-info", | ||
| ] | ||
| } |