Feature Request: Hybrid Caching Strategy for Similar Questions
Overview
This feature proposal introduces a hybrid caching strategy for Maeser that leverages both fuzzy string matching and embedding-based semantic similarity. The goal is to efficiently cache answers for questions asked by many students—even when the wording varies slightly—without requiring an excessively large cache.
Key Objectives:
Reduce API Costs: Avoid unnecessary API calls by caching similar questions.
Improve Response Time: Quickly return answers for similar or near-identical queries.
Maintain Accuracy: Use semantic analysis when fuzzy matching indicates a moderate similarity.
Ease of Management: Implement an eviction and update policy to keep the cache current and within size limits.
Proposed Approach
When a new question is asked, a two-stage lookup is used, falling back to a fresh API call on a miss:
Fuzzy Matching (Lexical Check):
High Threshold (≥95%):
If a cached question scores 95% or higher using fuzzy matching (e.g., with the fuzzywuzzy library), we assume it is effectively the same question and return the cached answer immediately (see the scoring snippet after this list for intuition).
Moderate Threshold (80% to <95%):
If the fuzzy match score is at least 80% but below 95%, use embedding-based semantic comparison for further verification.
Low Threshold (<80%):
If the fuzzy match score is less than 80%, the question is considered too different, and a fresh API call will be made.
Embedding-Based Semantic Comparison:
For questions with moderate fuzzy scores, generate embeddings (using, for example, OpenAI's text-embedding-ada-002).
Compare the new question's embedding with that of the cached question using cosine similarity.
If the cosine similarity is above a set threshold (e.g., around 0.9), treat it as a cache hit.
Fresh API Call:
If no cached answer is found through the above checks, call the API, cache the new question and answer, and return the result.
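For intuition about the lexical check, the standalone snippet below (an illustration, not part of the proposed cache; the question strings are made up) prints fuzzywuzzy's token_set_ratio for a few paraphrase pairs:

```python
from fuzzywuzzy import fuzz

# token_set_ratio tokenizes both strings and ignores word order and
# duplicate tokens, so reworded questions can still score highly.
pairs = [
    ("How do I reverse a linked list?", "How can I reverse a linked list?"),
    ("How do I reverse a linked list?", "Reverse a linked list: how do I do it?"),
    ("How do I reverse a linked list?", "What is a binary search tree?"),
]

for a, b in pairs:
    print(f"{fuzz.token_set_ratio(a.lower(), b.lower()):>3}  {a!r} vs. {b!r}")
```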
Caching Process
1. Adding to the Cache
When a new answer is generated, cache the following:
The question text
The generated answer
The embedding for the question
A timestamp for tracking purposes (for TTL or update policies)
Below is the Python code to generate embeddings and cache an answer:
```python
import time

import numpy as np
import openai

# Set your OpenAI API key
openai.api_key = "YOUR_OPENAI_API_KEY"

def get_embedding(text: str, model: str = "text-embedding-ada-002") -> np.ndarray:
    """Generate an embedding for the provided text using the OpenAI API."""
    response = openai.Embedding.create(input=[text], model=model)
    embedding = response['data'][0]['embedding']
    return np.array(embedding)

# In-memory cache: {question: {"embedding": np.ndarray, "answer": str, "timestamp": float}}
embedding_cache = {}

def cache_answer(question: str, answer: str):
    """Cache the answer along with the generated embedding and a timestamp."""
    embedding = get_embedding(question)
    embedding_cache[question] = {
        "embedding": embedding,
        "answer": answer,
        "timestamp": time.time()
    }
```
2. Retrieving from the Cache
A. Fuzzy Matching Code (Using fuzzywuzzy)
Below, the fuzzy matching code compares the new question to each cached question and returns a hit if the score meets the threshold:
```python
from fuzzywuzzy import fuzz

def get_cached_answer_with_fuzzy(new_question: str, threshold: int = 95):
    """Return a cached answer if any cached question has a fuzzy match
    score at or above the given threshold."""
    for cached_question, data in embedding_cache.items():
        score = fuzz.token_set_ratio(new_question.lower(), cached_question.lower())
        if score >= threshold:
            print(f"Fuzzy match hit: '{cached_question}' with score {score}")
            return data["answer"]
    return None
```
B. Combined Retrieval Using Fuzzy Matching and Embedding-Based Validation
For cases where the fuzzy match score is moderate (80%–95%), we use embeddings to verify semantic similarity:
```python
def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Calculate the cosine similarity between two vectors."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def get_cached_answer(new_question: str, fuzzy_high: int = 95, fuzzy_low: int = 80,
                      cosine_threshold: float = 0.9):
    """Retrieve a cached answer by first using fuzzy matching to find the best
    candidate. If the best fuzzy match score is:
    - >= fuzzy_high: return the cached answer immediately.
    - Between fuzzy_low and fuzzy_high: validate with embedding cosine similarity.
    - < fuzzy_low: consider it too different and return None.
    """
    best_score = 0
    best_candidate = None

    # Step 1: Fuzzy matching to find the best candidate.
    for cached_question, data in embedding_cache.items():
        score = fuzz.token_set_ratio(new_question.lower(), cached_question.lower())
        if score > best_score:
            best_score = score
            best_candidate = (cached_question, data)

    if best_candidate is None:
        return None

    # If the fuzzy score is very high, return immediately.
    if best_score >= fuzzy_high:
        print(f"Immediate fuzzy match: '{best_candidate[0]}' with score {best_score}")
        return best_candidate[1]["answer"]

    # For moderate fuzzy scores, verify with embeddings.
    if fuzzy_low <= best_score < fuzzy_high:
        new_embedding = get_embedding(new_question)
        candidate_embedding = best_candidate[1]["embedding"]
        cosine_sim = cosine_similarity(new_embedding, candidate_embedding)
        if cosine_sim >= cosine_threshold:
            print(f"Embedding match: '{best_candidate[0]}' with cosine similarity {cosine_sim:.2f}")
            return best_candidate[1]["answer"]

    # If no candidate is sufficiently similar, return None.
    return None
```
C. Usage Example: Combining Both Methods
Below is an example function that demonstrates how to combine fuzzy matching and embedding-based verification. It first checks for a strong fuzzy match, then falls back on embedding-based comparison if needed:
```python
def retrieve_answer(new_question: str):
    """Retrieve an answer from the cache using a hybrid approach:
    1. Try a direct fuzzy match (threshold >= 95%).
    2. If not, try combined fuzzy and embedding-based verification.
    3. If still no match, return None (to trigger an API call).
    """
    # First, try a direct fuzzy match.
    answer = get_cached_answer_with_fuzzy(new_question, threshold=95)
    if answer:
        return answer
    # Next, try the combined approach.
    answer = get_cached_answer(new_question, fuzzy_high=95, fuzzy_low=80, cosine_threshold=0.9)
    return answer

# Example usage:
question = "How do I implement caching for similar questions?"
cached_answer = retrieve_answer(question)
if cached_answer:
    print("Cached answer found:", cached_answer)
else:
    print("No cached answer found. Proceeding with API call.")
```
3. Removing and Updating Cache Entries
A. Cache Eviction (TTL-Based)
Remove entries older than a specified Time-To-Live (TTL):
```python
CACHE_TTL = 3600  # 1 hour in seconds

def remove_expired_entries():
    """Remove cache entries that have exceeded their TTL."""
    current_time = time.time()
    keys_to_remove = [key for key, data in embedding_cache.items()
                      if current_time - data["timestamp"] > CACHE_TTL]
    for key in keys_to_remove:
        del embedding_cache[key]
```
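TTL eviction alone doesn't bound how large the cache can grow, even though staying within size limits is one of the stated objectives. As a complement, here is a minimal sketch of a size cap that evicts the oldest entries first; MAX_CACHE_SIZE is a hypothetical setting, not something defined elsewhere in this proposal:

```python
MAX_CACHE_SIZE = 1000  # hypothetical cap on the number of cached questions

def enforce_size_limit():
    """Evict the oldest entries until the cache fits within MAX_CACHE_SIZE."""
    while len(embedding_cache) > MAX_CACHE_SIZE:
        oldest_key = min(embedding_cache, key=lambda k: embedding_cache[k]["timestamp"])
        del embedding_cache[oldest_key]
```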
B. Updating an Existing Cache Entry
If the answer for a cached question changes or needs refreshing:
```python
def update_cache(question: str, new_answer: str):
    """Update an existing cache entry with a new answer and a refreshed embedding."""
    embedding = get_embedding(question)
    embedding_cache[question] = {
        "embedding": embedding,
        "answer": new_answer,
        "timestamp": time.time()
    }
```
Overall Process Flow
User Query:
Attempt to retrieve an answer from the cache using retrieve_answer(new_question).
If a cached answer is found (either via direct fuzzy match or combined verification), return it.
API Call & Caching:
If no cached answer is found, perform an API call to obtain the answer.
Cache the new question and answer using cache_answer(question, answer) (see the end-to-end sketch after this list).
Maintenance:
Regularly run remove_expired_entries() to keep the cache manageable.
Monitor cache hit rates and adjust thresholds (for both fuzzy matching and cosine similarity) as needed.
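To make this flow concrete, here is a minimal sketch tying the pieces above together. Note that generate_answer_via_api is a hypothetical stand-in for Maeser's actual answer-generation call, and the eviction interval is an arbitrary choice:

```python
import threading

def generate_answer_via_api(question: str) -> str:
    """Hypothetical stand-in for the real answer-generation call."""
    raise NotImplementedError("Replace with the actual API call.")

def answer_question(question: str) -> str:
    """Full hybrid flow: check the cache, fall back to the API, then cache the result."""
    cached = retrieve_answer(question)
    if cached is not None:
        return cached
    answer = generate_answer_via_api(question)
    cache_answer(question, answer)
    return answer

def start_eviction_loop(interval_seconds: float = 600.0):
    """Periodically run remove_expired_entries() on a background daemon timer."""
    def _tick():
        remove_expired_entries()
        start_eviction_loop(interval_seconds)
    timer = threading.Timer(interval_seconds, _tick)
    timer.daemon = True
    timer.start()
```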
Effectiveness and Tuning
Fuzzy Matching Thresholds:
≥95%: Treat as an exact or nearly identical match.
80%–95%: Trigger embedding-based semantic checks.
<80%: Consider queries different, leading to a fresh API call.
Embedding Similarity:
A cosine similarity threshold of around 0.9 is a good starting point but may require tuning with actual data.
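Since both thresholds will need tuning against real traffic, a simple way to gather that data is to record how each lookup resolves. A minimal sketch (the counter and wrapper names are made up here) that wraps retrieve_answer:

```python
from collections import Counter

# Hypothetical counters for observing how lookups resolve in practice.
cache_stats = Counter()

def retrieve_answer_with_stats(new_question: str):
    """Wrap retrieve_answer() and record whether the lookup hit or missed."""
    answer = retrieve_answer(new_question)
    cache_stats["hit" if answer is not None else "miss"] += 1
    return answer

def hit_rate() -> float:
    """Fraction of lookups served from the cache (0.0 if none recorded)."""
    total = cache_stats["hit"] + cache_stats["miss"]
    return cache_stats["hit"] / total if total else 0.0
```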
Benefits:
Speed: Fuzzy matching is fast and handles near-identical queries quickly.
Accuracy: Embedding verification ensures semantically similar queries are recognized.
Cost Efficiency: Reduces expensive API calls by caching similar questions intelligently.
Scalability: Regular eviction and update processes keep the cache size manageable.
Conclusion
Implementing this hybrid caching strategy—combining fuzzy string matching with embedding-based semantic checks—will enable Maeser to quickly and efficiently respond to similar questions from students. This approach minimizes API costs, improves response times, and maintains answer accuracy. Feedback and fine-tuning based on real-world usage will be key to achieving optimal performance.
It would be interesting if you could keep track of how many times each question is asked. That way, frequently asked questions would carry more weight than questions asked only once.
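A minimal sketch of how that could look, assuming a hit_count field is added to each cache entry: increment it on every cache hit, and prefer evicting rarely asked questions when trimming the cache.

```python
def record_hit(question: str):
    """Increment the hit counter for a cached question (hypothetical hit_count field)."""
    entry = embedding_cache.get(question)
    if entry is not None:
        entry["hit_count"] = entry.get("hit_count", 0) + 1

def evict_least_asked(max_size: int):
    """Evict the least frequently asked questions until the cache fits max_size."""
    while len(embedding_cache) > max_size:
        rarest = min(embedding_cache, key=lambda k: embedding_cache[k].get("hit_count", 0))
        del embedding_cache[rarest]
```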