PR Summary: Refactor Embedding Handling and Environment Configuration #7
Merged
HillviewCap merged 5 commits into DEV/LLMS-TXT-INTEGRATION on Apr 1, 2025
- Install langchain-ollama package
- Update embedding_manager.py to use langchain_ollama.embeddings
- Correct EMBEDDING_BASE_URL in env_vars.json (remove /v1)
- Add detailed error logging in embedding_manager.py
- Comment out verbose node insertion failure message in run_processing.py
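The /v1 correction is easy to get wrong by hand. As far as I understand, langchain-ollama talks to Ollama's native API, while the /v1 prefix belongs to the OpenAI-compatible routes, so a base URL carried over from an OpenAI-style configuration needs the suffix removed. A minimal sketch of that normalization follows; normalize_ollama_base_url is a hypothetical helper and is not part of this PR, and http://localhost:11434 is assumed only because it is Ollama's default address.

```python
def normalize_ollama_base_url(url: str) -> str:
    """Drop a trailing /v1 (and any trailing slashes) from a base URL.

    Hypothetical helper: illustrates the manual env_vars.json fix, nothing more.
    """
    trimmed = url.rstrip("/")
    if trimmed.endswith("/v1"):
        trimmed = trimmed[: -len("/v1")]
    return trimmed.rstrip("/")


if __name__ == "__main__":
    # Assumed example value; the real one lives in env_vars.json.
    print(normalize_ollama_base_url("http://localhost:11434/v1"))
```

A URL that already lacks the suffix passes through unchanged, so the helper is safe to apply unconditionally.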
Purpose
This PR introduces a more flexible embedding generation system, improves environment variable handling for background processing, and updates configurations to support different embedding models (specifically nomic-embed-text via Ollama).
Key Changes
Embedding Abstraction (EmbeddingManager)
- Introduced EmbeddingManager to handle embedding generation logic, allowing for different providers and models
- Removed the previous direct OpenAI embedding function (get_embedding) from agent_tools.py
- Refactored relevant components to utilize the new EmbeddingManager
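For reviewers who want the shape of the abstraction without opening the diff, here is a rough sketch of how such a manager could be structured. It is not the code in embedding_manager.py: the class layout, EmbeddingBackend, make_ollama_manager, and FakeBackend are all assumptions made for illustration. The only real API used is OllamaEmbeddings from langchain_ollama.embeddings, imported lazily so the sketch loads without that package installed.

```python
from dataclasses import dataclass
from typing import List, Protocol


class EmbeddingBackend(Protocol):
    """Any object exposing the LangChain-style embedding methods."""

    def embed_query(self, text: str) -> List[float]: ...

    def embed_documents(self, texts: List[str]) -> List[List[float]]: ...


@dataclass
class EmbeddingManager:
    """Illustrative stand-in: callers depend on this, not on a provider."""

    backend: EmbeddingBackend

    def embed(self, text: str) -> List[float]:
        return self.backend.embed_query(text)

    def embed_many(self, texts: List[str]) -> List[List[float]]:
        return self.backend.embed_documents(texts)


def make_ollama_manager(base_url: str, model: str = "nomic-embed-text") -> EmbeddingManager:
    """Hypothetical factory wiring the manager to Ollama."""
    # Lazy import: only needed when an Ollama backend is actually requested.
    from langchain_ollama.embeddings import OllamaEmbeddings

    return EmbeddingManager(OllamaEmbeddings(model=model, base_url=base_url))


class FakeBackend:
    """Deterministic backend for trying the manager without a server."""

    def __init__(self, dimension: int) -> None:
        self.dimension = dimension

    def embed_query(self, text: str) -> List[float]:
        # Not a real embedding: every component is just the text length.
        return [float(len(text))] * self.dimension

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        return [self.embed_query(t) for t in texts]
```

Swapping providers then means passing a different backend object; call sites that previously used get_embedding would only ever see EmbeddingManager.embed.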
Environment Variable Propagation
- Modified documentation.py to explicitly pass environment variables to subprocess calls
- Ensures processing scripts use the correct settings configured in the Streamlit UI
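The propagation pattern can be sketched with the standard library alone. This is an assumed shape, not the code in documentation.py: run_with_settings is a hypothetical helper, and the EMBEDDING_BASE_URL value is an example. The point is that the child process receives the parent's environment merged with the explicitly supplied settings, instead of relying on whatever the parent process happened to inherit at startup.

```python
import os
import subprocess
import sys
from typing import Dict, List


def run_with_settings(command: List[str], settings: Dict[str, str]) -> str:
    """Run a command with the parent's environment plus explicit overrides."""
    # Later keys win, so values from the UI override inherited ones.
    env = {**os.environ, **settings}
    result = subprocess.run(
        command, env=env, capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # The child simply echoes the variable it was given.
    child = [sys.executable, "-c", "import os; print(os.environ['EMBEDDING_BASE_URL'])"]
    print(run_with_settings(child, {"EMBEDDING_BASE_URL": "http://localhost:11434"}))
```

Starting from os.environ rather than an empty dict keeps PATH and similar variables available to the child, which most scripts need in order to start at all.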
Configuration Updates
- Updated SQL schema for vector dimensions (768 for nomic-embed-text)
- Added VECTOR_DIMENSION input field to the environment configuration page
- Added langchain-community to requirements.txt
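Because the schema dimension and the model's output size now have to agree, a mismatch between VECTOR_DIMENSION and the active model is a likely source of insert failures. The sketch below shows one way to catch that before touching the database. Both functions are hypothetical and not part of this PR, and the default of 768 is taken from the nomic-embed-text figure above.

```python
from typing import List, Mapping


def load_vector_dimension(env: Mapping[str, str], default: int = 768) -> int:
    """Read VECTOR_DIMENSION from a settings mapping, falling back to a default."""
    raw = env.get("VECTOR_DIMENSION", "").strip()
    if not raw:
        return default
    value = int(raw)  # raises ValueError on non-numeric input
    if value <= 0:
        raise ValueError(f"VECTOR_DIMENSION must be positive, got {value}")
    return value


def check_dimension(vector: List[float], expected: int) -> List[float]:
    """Return the vector unchanged, or raise if its length is wrong."""
    if len(vector) != expected:
        raise ValueError(
            f"embedding has {len(vector)} dimensions, schema expects {expected}"
        )
    return vector
```

Passing os.environ (or the parsed contents of env_vars.json) as the mapping keeps the helper independent of where the setting is stored.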
Debugging
- Added debug logging statements for better insight into embedding generation and database queries
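As a rough illustration of the kind of logging meant here, the sketch below wraps an embedding call with debug messages and a full traceback on failure. The logger name, the message wording, and the choice to return None on error are assumptions; the actual statements in embedding_manager.py may differ.

```python
import logging
from typing import Callable, List, Optional

# Assumed logger name; the real module may use a different one.
logger = logging.getLogger("embedding_manager")


def embed_with_logging(
    embed_fn: Callable[[str], List[float]], text: str
) -> Optional[List[float]]:
    """Call embed_fn, logging input size, output size, and any failure."""
    logger.debug("Embedding %d characters", len(text))
    try:
        vector = embed_fn(text)
    except Exception:
        # logger.exception records the traceback at ERROR level.
        logger.exception("Embedding request failed")
        return None
    logger.debug("Received vector with %d dimensions", len(vector))
    return vector


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    embed_with_logging(lambda t: [0.0] * 768, "hello world")
```

Logging the vector length rather than the vector itself keeps the output readable and makes dimension mismatches visible at a glance.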