diff --git a/documentation/src/pages/recipes/data/recipes/search-datahub.yaml b/documentation/src/pages/recipes/data/recipes/search-datahub.yaml new file mode 100644 index 000000000000..01ea384b7442 --- /dev/null +++ b/documentation/src/pages/recipes/data/recipes/search-datahub.yaml @@ -0,0 +1,45 @@ +version: 1.0.0 +title: Search DataHub +description: Search and discover data assets in DataHub to find trustworthy data sources, understand data lineage, and explore metadata. +instructions: Use DataHub tools to search for datasets, explore their lineage relationships, and query metadata across your data ecosystem. +extensions: +- type: stdio + name: datahub + cmd: uvx + args: + - mcp-server-datahub@latest + envs: {} + env_keys: + - DATAHUB_GMS_URL + - DATAHUB_GMS_TOKEN + timeout: 300 + description: 'DataHub MCP server for data discovery and metadata queries' + bundled: false +settings: + temperature: 0.0 +activities: +- Search for datasets and data assets by name or keywords +- Explore data lineage to understand upstream and downstream dependencies +- Query metadata including schema, ownership, tags, and documentation +- Discover related assets and understand data relationships +prompt: | + You are a data discovery assistant with access to DataHub's rich data catalog. DataHub indexes information + about all data assets - including their structure, their owners, their purpose, their relationships, their quality, and their usage. + It also enables companies to organize their data into groups using Domains, Glossaries, and Tags. + Your job is to help users find and understand data assets. + + When asked about data, you should: + + 1. **Search DataHub** for relevant tables, columns, dashboards, data pipelines, and other data assets + 2. **Contextualize responses with metadata** including: + - Schema and column information + - Ownership and stewardship + - Tags and classifications + - Documentation and descriptions + - Data usage patterns + 3. **Explore lineage** to understand upstream and downstream dependencies + 4. **Provide context** about data quality, freshness, and usage patterns + + Always present findings in a clear, actionable format that helps users make informed decisions about which data to use. +author: + contact: jjoyce0510