Expose source_model parameter for vector-enabled collections #1606

Open
wants to merge 12 commits into
base: main
from


Hazel-Datastax
Contributor

What this PR does:
Expose source_model parameter for vector-enabled collections

Which issue(s) this PR fixes:
Fixes JiraC2-3495

Checklist

  • Changes manually tested
  • Automated Tests added/updated
  • Documentation added/updated
  • CLA Signed: DataStax CLA

Comment on lines -282 to -297
// convert a vector jsonNode from cql table comment to vectorConfig, used for collection
private static VectorConfig.ColumnVectorDefinition fromJson(
JsonNode jsonNode, ObjectMapper objectMapper) {
// dimension, similarityFunction, must exist
int dimension = jsonNode.get("dimension").asInt();
SimilarityFunction similarityFunction =
SimilarityFunction.fromString(jsonNode.get("metric").asText());

return VectorConfig.ColumnVectorDefinition.fromJson(
DocumentConstants.Fields.VECTOR_EMBEDDING_TEXT_FIELD,
dimension,
similarityFunction,
jsonNode,
objectMapper);
}

Contributor Author

Removed this unused method. Its logic has been moved to the VectorConfig class.

@Hazel-Datastax Hazel-Datastax marked this pull request as ready for review October 28, 2024 20:46
@Hazel-Datastax Hazel-Datastax requested a review from a team as a code owner October 28, 2024 20:46
Comment on lines 15 to 29
SimilarityFunction.DOT_PRODUCT,
- "openai_v3_small",
+ "openai-v3-small",
SimilarityFunction.DOT_PRODUCT,
- "openai_v3_large",
+ "openai-v3-large",
SimilarityFunction.DOT_PRODUCT,
"bert",
SimilarityFunction.DOT_PRODUCT,
"gecko",
SimilarityFunction.DOT_PRODUCT,
- "nv_qa_4",
+ "nv-qa-4",
SimilarityFunction.DOT_PRODUCT,
- "cohere_v3",
+ "cohere-v3",
SimilarityFunction.DOT_PRODUCT,
"other",
SimilarityFunction.COSINE);
Contributor Author

According to the documentation, these values should use hyphens, not underscores. Since changing the values here will affect the tables code, I will create another PR to replace these hardcoded values with an enum.
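
For illustration only, the enum mentioned above could look roughly like the sketch below. The class, enum, and accessor names (SourceModelEnumSketch, Model, Metric, apiName, fromName) are hypothetical and are not taken from this PR; only the hyphenated model names and their metrics come from the map in the diff.

```java
import java.util.Arrays;
import java.util.Optional;

// Hypothetical sketch: all type and method names here are assumptions,
// not the classes used in the PR.
class SourceModelEnumSketch {

  enum Metric {
    COSINE("cosine"),
    DOT_PRODUCT("dot_product");

    private final String apiName;

    Metric(String apiName) {
      this.apiName = apiName;
    }

    String apiName() {
      return apiName;
    }
  }

  enum Model {
    // Hyphenated names and default metrics as listed in the diff.
    OPENAI_V3_SMALL("openai-v3-small", Metric.DOT_PRODUCT),
    OPENAI_V3_LARGE("openai-v3-large", Metric.DOT_PRODUCT),
    BERT("bert", Metric.DOT_PRODUCT),
    GECKO("gecko", Metric.DOT_PRODUCT),
    NV_QA_4("nv-qa-4", Metric.DOT_PRODUCT),
    COHERE_V3("cohere-v3", Metric.DOT_PRODUCT),
    OTHER("other", Metric.COSINE);

    private final String apiName;
    private final Metric metric;

    Model(String apiName, Metric metric) {
      this.apiName = apiName;
      this.metric = metric;
    }

    String apiName() {
      return apiName;
    }

    Metric metric() {
      return metric;
    }

    // Exact-match lookup; returns empty for unknown or underscore-style names.
    static Optional<Model> fromName(String name) {
      return Arrays.stream(values()).filter(m -> m.apiName.equals(name)).findFirst();
    }
  }

  public static void main(String[] args) {
    System.out.println(Model.fromName("nv-qa-4").map(Model::metric));
    System.out.println(Model.fromName("nv_qa_4").isPresent());
  }
}
```

Keeping the name and the default metric on the same constant would mean the collection and table code paths read from one definition instead of two string maps.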

Comment on lines +200 to 229
String sourceModel = vector.sourceModel();
String metric = vector.metric();

// decide sourceModel and metric value
if (sourceModel != null) {
if (metric == null) {
// (1) sourceModel is provided but metric is not - set metric to cosine or dot_product based
// on the map
metric = SUPPORTED_SOURCES.get(sourceModel).getMetric();
}
// (2) both sourceModel and metric are provided - do nothing
} else {
if (metric != null) {
// (3) sourceModel is not provided but metric is - set sourceModel to 'other'
sourceModel = SourceModel.OTHER.getSourceModel();
} else {
// (4) both sourceModel and metric are not provided - set sourceModel to 'other' and metric
// to 'cosine'
sourceModel = SourceModel.OTHER.getSourceModel();
metric = SimilarityFunction.COSINE.getMetric();
}
}

if (service != null) {
// Validate service configuration and auto populate vector dimension.
vectorDimension = validateVectorize.validateService(service, vectorDimension);
vector =
new CreateCollectionCommand.Options.VectorSearchConfig(
- vectorDimension, vector.metric(), vector.vectorizeConfig());
+ vectorDimension, metric, sourceModel, vector.vectorizeConfig());
} else {
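
The four cases in the comments of this hunk can be exercised in isolation with a small sketch like the one below. It is not the PR's code: the class name, the Resolved record, and the string-keyed map are hypothetical stand-ins for SUPPORTED_SOURCES and the command options. The explicit check for an unknown source model is also an assumption; the hunk calls SUPPORTED_SOURCES.get(sourceModel).getMetric() directly, which presumably relies on the value having been validated earlier.

```java
import java.util.Map;

// Hypothetical sketch of the defaulting rules; names are assumptions.
class SourceModelDefaultsSketch {

  record Resolved(String sourceModel, String metric) {}

  // Stand-in for SUPPORTED_SOURCES, limited to the entries visible in the diff.
  static final Map<String, String> METRIC_BY_SOURCE =
      Map.of(
          "openai-v3-small", "dot_product",
          "openai-v3-large", "dot_product",
          "bert", "dot_product",
          "gecko", "dot_product",
          "nv-qa-4", "dot_product",
          "cohere-v3", "dot_product",
          "other", "cosine");

  static Resolved resolve(String sourceModel, String metric) {
    if (sourceModel != null) {
      if (metric == null) {
        // (1) sourceModel only: derive the metric from the map.
        String derived = METRIC_BY_SOURCE.get(sourceModel);
        if (derived == null) {
          // Assumption: fail loudly instead of hitting a NullPointerException.
          throw new IllegalArgumentException("Unknown source_model: " + sourceModel);
        }
        metric = derived;
      }
      // (2) both provided: keep the user's values.
    } else {
      // (3) and (4): sourceModel falls back to "other".
      sourceModel = "other";
      if (metric == null) {
        // (4) neither provided: metric falls back to "cosine".
        metric = "cosine";
      }
    }
    return new Resolved(sourceModel, metric);
  }

  public static void main(String[] args) {
    System.out.println(resolve("openai-v3-small", null));
    System.out.println(resolve("openai-v3-small", "cosine"));
    System.out.println(resolve(null, "euclidean"));
    System.out.println(resolve(null, null));
  }
}
```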
Contributor Author

Do we want to infer the sourceModel from the service configuration? For example, if the user specifies only openai and text-embedding-3-small in the service, should we set the sourceModel to openai-v3-small? Currently, we use other as the default value.
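
To make the question concrete, here is a minimal sketch of what such inference could look like. Everything in it is hypothetical: the class name, the inferSourceModel helper, and the provider/model lookup key are assumptions, and the table contains only the single pairing mentioned in this comment. An explicit sourceModel always wins; anything not in the table keeps the current default of other.

```java
import java.util.Map;

// Hypothetical sketch; not part of this PR.
class ServiceSourceModelSketch {

  // Key is "<provider>/<modelName>". Only the pairing from the comment is
  // listed; any further entries would need to be confirmed first.
  static final Map<String, String> INFERRED_SOURCE_MODEL =
      Map.of("openai/text-embedding-3-small", "openai-v3-small");

  static String inferSourceModel(String explicitSourceModel, String provider, String modelName) {
    if (explicitSourceModel != null) {
      // A user-supplied value is never overridden.
      return explicitSourceModel;
    }
    if (provider == null || modelName == null) {
      // No service information available: keep the current default.
      return "other";
    }
    return INFERRED_SOURCE_MODEL.getOrDefault(provider + "/" + modelName, "other");
  }

  public static void main(String[] args) {
    System.out.println(inferSourceModel(null, "openai", "text-embedding-3-small"));
    System.out.println(inferSourceModel(null, "openai", "some-unlisted-model"));
    System.out.println(inferSourceModel("bert", "openai", "text-embedding-3-small"));
  }
}
```

If this were adopted, the inferred sourceModel would also change the default metric whenever the user omits metric, so the behavior change would be visible to existing callers.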

Collaborator

@vkarpov15 vkarpov15 left a comment

LGTM


2 participants