feature: pagefind in public page #2992

Merged
merged 5 commits into from
Dec 20, 2024
@cdxker cdxker commented Dec 20, 2024

Added a Pagefind option to the public page settings.

This mainly requires new environment variables. We have an S3 bucket that we use internally; PM me for the ACCESS_KEY and SECRET_KEY. Once those are set, your index can be uploaded to the bucket.

SEARCH_COMPONENT_URL="http://localhost:8000"
PAGEFIND_CDN_BASE_URL="https://pagefind-testing-index.trieve.ai"
S3_ENDPOINT_PAGEFIND=https://pagefind-index-west.s3.us-west-1.amazonaws.com
S3_ACCESS_KEY_PAGEFIND=**************
S3_SECRET_KEY_PAGEFIND=****************************************
S3_BUCKET_PAGEFIND=pagefind-index-west
AWS_REGION_PAGEFIND=us-west-1
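
For anyone wiring these variables up locally, here is a minimal sketch of reading them at startup and failing early with the name of whichever one is missing. The `PagefindEnv` struct and its `from_lookup`/`from_env` helpers are hypothetical names used only for illustration; they are not taken from this PR, and the real server may load its configuration differently.

```rust
use std::env;

/// Hypothetical container for the Pagefind settings listed above.
/// The struct and field names are assumptions, not part of this PR.
#[derive(Debug, Clone, PartialEq)]
pub struct PagefindEnv {
    pub search_component_url: String,
    pub cdn_base_url: String,
    pub s3_endpoint: String,
    pub s3_access_key: String,
    pub s3_secret_key: String,
    pub s3_bucket: String,
    pub aws_region: String,
}

impl PagefindEnv {
    /// Builds the config from any name -> value lookup, so it can be
    /// exercised without touching the real process environment.
    pub fn from_lookup<F>(lookup: F) -> Result<Self, String>
    where
        F: Fn(&str) -> Option<String>,
    {
        // Turn a missing variable into an error that names it.
        let required = |name: &str| {
            lookup(name).ok_or_else(|| format!("missing environment variable {name}"))
        };
        Ok(Self {
            search_component_url: required("SEARCH_COMPONENT_URL")?,
            cdn_base_url: required("PAGEFIND_CDN_BASE_URL")?,
            s3_endpoint: required("S3_ENDPOINT_PAGEFIND")?,
            s3_access_key: required("S3_ACCESS_KEY_PAGEFIND")?,
            s3_secret_key: required("S3_SECRET_KEY_PAGEFIND")?,
            s3_bucket: required("S3_BUCKET_PAGEFIND")?,
            aws_region: required("AWS_REGION_PAGEFIND")?,
        })
    }

    /// Reads the same variables from the process environment.
    pub fn from_env() -> Result<Self, String> {
        Self::from_lookup(|name| env::var(name).ok())
    }
}
```

Calling `PagefindEnv::from_env()` once at startup would surface a missing key before the first index upload is attempted.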


Comment on lines 264 to 310
if dataset_config.PAGEFIND_ENABLED {
    let pagefind_worker_message = PagefindIndexWorkerMessage {
        dataset_id: payload.dataset_id,
        created_at: chrono::Utc::now().naive_utc(),
        attempt_number: 0,
    };

    let serialized_message =
        serde_json::to_string(&pagefind_worker_message).map_err(|_| {
            ServiceError::InternalServerError(
                "Failed to serialize message".to_string(),
            )
        });

    let maybe_redis = redis_pool
        .get()
        .await
        .map_err(|err| ServiceError::BadRequest(err.to_string()));

    let response: Result<(), ServiceError> = match (serialized_message.clone(), maybe_redis) {
        (Ok(message), Ok(mut redis_conn)) => {
            redis::cmd("lpush")
                .arg("pagefind-index-ingestion")
                .arg(&message)
                .query_async::<_, ()>(&mut *redis_conn)
                .await
                .map_err(|err| ServiceError::BadRequest(err.to_string()))
                .map(|_| ())
        },
        (Err(serial_error), Ok(_)) => {
            Err(ServiceError::InternalServerError(format!("couldn't get serialized message {:?}", serial_error)))
        }
        (Ok(_), Err(redis_error)) => {
            Err(ServiceError::InternalServerError(format!("couldn't get redis conn {:?}", redis_error)))
        }
        (Err(serial_error), Err(redis_error)) => {
            Err(ServiceError::InternalServerError(format!("couldn't get serialize message and couldn't redis conn {:?} {:?}", serial_error, redis_error)))
        }
    };

    match response {
        Ok(_) => log::info!("Queue'd dataset for pagefind indexing"),
        Err(e) => log::error!("Failed to start pagefind indexing {:?}", e)
    }
}
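
As a possible simplification of the four-arm match above, the same flow can be sketched with early returns via `?`. This is only an illustration under stated assumptions: `ServiceError` is re-declared here as a stand-in with just the two variants used in the diff, and the `Queue` trait with its `InMemoryQueue` implementation is a hypothetical replacement for the Redis connection so the snippet compiles on its own. One behavioral difference to note: when both serialization and the connection fail, this version reports only the serialization error, whereas the diff reports both.

```rust
/// Stand-in for the server's error type, limited to the variants in the diff.
#[derive(Debug, Clone, PartialEq)]
pub enum ServiceError {
    InternalServerError(String),
    BadRequest(String),
}

/// Hypothetical abstraction over a connection that supports a left push.
pub trait Queue {
    fn lpush(&mut self, key: &str, value: &str) -> Result<(), String>;
}

/// In-memory queue used only to make the sketch self-contained.
pub struct InMemoryQueue(pub Vec<(String, String)>);

impl Queue for InMemoryQueue {
    fn lpush(&mut self, key: &str, value: &str) -> Result<(), String> {
        // A left push places the new element at the head of the list.
        self.0.insert(0, (key.to_string(), value.to_string()));
        Ok(())
    }
}

/// Pushes the serialized message, returning at the first failure.
pub fn enqueue<Q: Queue>(
    serialized: Result<String, ServiceError>,
    conn: Result<&mut Q, ServiceError>,
) -> Result<(), ServiceError> {
    let message = serialized.map_err(|e| {
        ServiceError::InternalServerError(format!("couldn't get serialized message {:?}", e))
    })?;
    let conn = conn.map_err(|e| {
        ServiceError::InternalServerError(format!("couldn't get redis conn {:?}", e))
    })?;
    conn.lpush("pagefind-index-ingestion", &message)
        .map_err(ServiceError::BadRequest)
}
```

The trade-off is less detail in the double-failure case in exchange for a shorter function that reads top to bottom.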

@cdxker cdxker (Member, Author) commented

Currently we automatically trigger a full Pagefind reingest any time a batch of chunks is updated. We will likely move this logic if we get charged too much.
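
If the cost does become a problem, one option (purely illustrative, and not what this PR implements) would be to skip the enqueue when the same dataset was queued recently. The sketch below assumes a hypothetical `ReindexThrottle` kept in process memory and keyed by a dataset ID string; a real deployment with several server instances would more likely need shared state, for example in Redis.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical per-dataset throttle for full Pagefind reingests.
pub struct ReindexThrottle {
    min_interval: Duration,
    last_queued: HashMap<String, Instant>,
}

impl ReindexThrottle {
    pub fn new(min_interval: Duration) -> Self {
        Self {
            min_interval,
            last_queued: HashMap::new(),
        }
    }

    /// Returns true when the caller should enqueue a reingest for
    /// `dataset_id` at time `now`, and records that time. Returns false
    /// when the dataset was already queued within `min_interval`.
    pub fn should_enqueue(&mut self, dataset_id: &str, now: Instant) -> bool {
        match self.last_queued.get(dataset_id) {
            Some(last) if now.duration_since(*last) < self.min_interval => false,
            _ => {
                self.last_queued.insert(dataset_id.to_string(), now);
                true
            }
        }
    }
}
```

A caller would check `should_enqueue(&dataset_id, Instant::now())` before pushing to the `pagefind-index-ingestion` list. A batch that arrives inside the interval is skipped rather than deferred, so a trailing reingest would still be needed to pick up the last changes.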

@cdxker cdxker force-pushed the cd/pagefind-in-publicpage branch from 434a445 to adcb79e on December 20, 2024 22:02
@cdxker cdxker force-pushed the cd/pagefind-in-publicpage branch from adcb79e to d2f385e on December 20, 2024 22:07
@fedhacks fedhacks left a comment

lgtm

@fedhacks fedhacks merged commit 98ef056 into main Dec 20, 2024
10 of 11 checks passed