Merged
@@ -20,6 +20,9 @@
"rating desc"
],
"queryType": "simple",
"sessionId": "mysessionid",
"featuresMode": "enabled",
"scoringStatistics": "global",
"scoringParameters": [
"currentLocation--122.123,44.77233"
],
@@ -74,6 +77,16 @@
"Fancy <em>Hotel</em>"
]
},
"@search.features": {
"description": {
"uniqueTokenMatches": 1.0,
"similarityScore": 0.023745812
},
"title": {
"uniqueTokenMatches": 1.0,
"similarityScore": 0.016049799
}
},
"description": "Best hotel in town",
"docId": "2",
"title": "Fancy Hotel"
@@ -23,6 +23,9 @@
"search": "nice hotels",
"searchFields": "title,description",
"searchMode": "any",
"sessionId": "mysessionid",
"featuresMode": "enabled",
"scoringStatistics": "global",
"select": "docId,title,description",
"skip": 0,
"top": 10
@@ -56,6 +59,9 @@
"minimumCoverage": null,
"orderby": "search.score() desc,rating desc",
"queryType": "simple",
"sessionId": "mysessionid",
"featuresMode": "enabled",
"scoringStatistics": "global",
"scoringParameters": [
"currentLocation--122.123,44.77233"
],
@@ -86,6 +92,16 @@
"Fancy <em>Hotel</em>"
]
},
"@search.features": {
"description": {
"uniqueTokenMatches": 1.0,
"similarityScore": 0.023745812
},
"title": {
"uniqueTokenMatches": 1.0,
"similarityScore": 0.016049799
}
},
"description": "Best hotel in town",
"docId": "2",
"title": "Fancy Hotel"
@@ -248,6 +248,51 @@
"name": "SearchParameters"
}
},
{
"name": "featuresMode",
"in": "query",
"type": "string",
"enum": [
"disabled",
"enabled"
],
"x-ms-enum": {
"name": "FeaturesMode",
"modelAsString": false
},
"x-nullable": false,
"description": "A value that specifies whether the results should include scoring features, such as per-field similarity.",
"x-ms-parameter-grouping": {
"name": "SearchParameters"
}
},
{
"name": "scoringStatistics",
"in": "query",
"type": "string",
"enum": [
"local",
"global"
],
"x-ms-enum": {
"name": "ScoringStatistics",
"modelAsString": false
},
"x-nullable": false,
"description": "A value that specifies whether we want to calculate scoring statistics (such as document frequency) globally for more consistent scoring, or locally, for lower latency.",
"x-ms-parameter-grouping": {
"name": "SearchParameters"
}
},
{
"name": "sessionId",
"in": "query",
"type": "string",
"description": "A value to be used to create a sticky session, which can help to get more consistent results. As long as the same sessionId is used, a best-effort attempt will be made to target the same replica set. Be wary that reusing the same sessionId values repeatedly can interfere with the load balancing of the requests across replicas and adversely affect the performance of the search service. The value used as sessionId cannot start with a '_' character.",
"x-ms-parameter-grouping": {
"name": "SearchParameters"
}
},
{
"name": "$select",
"in": "query",
@@ -857,6 +902,27 @@
"additionalProperties": true,
"description": "A single bucket of a facet query result. Reports the number of documents with a field value falling within a particular range or having a particular value or interval."
},
"SearchFeatures": {
"properties": {
"uniqueTokenMatches": {
"type": "number",
"readOnly": true,
"format": "double",
"description": "The number of unique tokens from the search query that matched this field."
Member

is this always a whole number? what does it being a decimal type mean?

Member Author (@shmed, Mar 24, 2020)

As of now, we only have those two features (unique tokens and similarity score). While we expect "unique tokens" to be whole numbers, almost all the features we intend to add won't be. We also expect most customers to simply flatten those structures into vectors of floats for machine learning use cases. While features are defined as strongly typed objects, we really see them as a dictionary of floats (where the label/key describes what the float is).

Member

Gotcha. It's all number to JS fortunately for me. 😄

Member

If customers want to use this value in a vector of floats for ML training or prediction, that's super easy for them to do regardless of how we return the property. My bigger concern is that every person using this API is going to wonder "Why is that a double? Am I missing something here?" That was my first thought too, anyway.

Member Author

Sorry for the late reply. That's good feedback; I can see people asking themselves the question. Our vision of the feature response was really just a bag of floats describing the query-to-document relationship. The names are labels to describe how the values were calculated, but we never intended for those values to be used as discrete values. As we evolve it, there's even a chance that we start to apply normalization factors to those values to make them more useful. I don't necessarily mind converting some of those values to integers in the swagger if you think it would otherwise make their usage too confusing, but I also see value in keeping them the way they are on the server side.
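
The "flatten into a vector of floats" usage described in this thread might look like the sketch below. It uses the `@search.features` values from the example response in this PR. `flatten_features` is a hypothetical helper, not part of any SDK, and the sorted `field.feature` layout is only one possible convention.

```python
from typing import Dict, List, Tuple

# "@search.features" payload copied from the example response in this PR.
search_features = {
    "description": {"uniqueTokenMatches": 1.0, "similarityScore": 0.023745812},
    "title": {"uniqueTokenMatches": 1.0, "similarityScore": 0.016049799},
}


def flatten_features(
    features: Dict[str, Dict[str, float]]
) -> Tuple[List[str], List[float]]:
    """Flatten per-field features into parallel lists of labels and floats.

    Field names and feature names are sorted so that the vector layout
    stays stable from one document to the next (hypothetical helper).
    """
    labels: List[str] = []
    values: List[float] = []
    for field in sorted(features):
        for name in sorted(features[field]):
            labels.append(f"{field}.{name}")
            # Every feature is treated as a float, whole number or not.
            values.append(float(features[field][name]))
    return labels, values


labels, vector = flatten_features(search_features)
```

Because every value is converted with `float`, it makes no difference to this kind of consumer whether `uniqueTokenMatches` is declared as an integer or a double in the swagger.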

},
"similarityScore": {
"type": "number",
"readOnly": true,
"format": "double",
"description": "The similarity score computed between the search query and this field."
}
},
"required": [
"uniqueTokenMatches",
"similarityScore"
],
"description": "A list of features describing the scoring of a specific field against the search query."
},
"DocumentSearchResult": {
"properties": {
"@odata.count": {
@@ -930,6 +996,15 @@
"readOnly": true,
"x-ms-client-name": "Highlights",
"description": "Text fragments from the document that indicate the matching search terms, organized by each applicable field; null if hit highlighting was not enabled for the query."
},
"@search.features": {
"type": "object",
"additionalProperties": {
"$ref": "#/definitions/SearchFeatures"
},
"readOnly": true,
"x-ms-client-name": "Features",
"description": "Features describing how each field of the document scored against the search query, keyed by field name."
}
},
"additionalProperties": true,
@@ -1039,6 +1114,30 @@
},
"description": "Specifies the syntax of the search query. The default is 'simple'. Use 'full' if your query uses the Lucene query syntax."
},
"FeaturesMode": {
"type": "string",
"enum": [
"disabled",
Member

Is there a reason this is an enum of `disabled`/`enabled` rather than a simple boolean? Do we expect there to be a third state someday?

Member Author

Yes, we expect to extend this enum as we start adding new "features" through versioning. We will likely add new values, such as maybe "enableV2" (not sure about the names yet).

Member

Not sure if it's already shipped on the server side, but maybe having it be "None", "V1", "V2" would make more semantic sense? But yeah, I get the difficulty here.

Member Author

Good suggestion. Unfortunately the preview API is already shipped and being used. However I'll bring that suggestion as we revisit the API for GA.

Member

AutoRest has extensions that allow you to change the name of a swagger enum but not the value you send out over the wire: https://github.com/Azure/autorest/tree/master/docs/extensions#x-ms-enum

If we think we can come up with better names, it would be great to change that sooner rather than later so we can expose them properly in our new client libraries.

I don't think we need to block this PR on getting that done now though.

Member Author

The thing is, we aren't yet sure how the feature API will evolve. We published this as a preview to get early feedback on it. We wanted to make the API flexible enough to allow us to use versioning, but we aren't really ready to commit to it yet. That's why we started with enabled/disabled, with the option to add more values to the enum in the future if needed.
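
To make the wire contract discussed in this thread concrete, here is a minimal sketch of a client building a request body with the three parameters this PR adds. `build_search_request` is a hypothetical helper, not an SDK call. The allowed values and defaults are taken from the swagger descriptions in this diff, and the tuples would need updating if the enums are extended later.

```python
from typing import Any, Dict, Optional

# Values currently listed in the swagger enums; more may be added later.
FEATURES_MODES = ("disabled", "enabled")
SCORING_STATISTICS = ("local", "global")


def build_search_request(
    search: str,
    session_id: Optional[str] = None,
    features_mode: str = "disabled",
    scoring_statistics: str = "local",
) -> Dict[str, Any]:
    """Build a search request body (hypothetical helper, not an SDK call)."""
    if features_mode not in FEATURES_MODES:
        raise ValueError(f"unsupported featuresMode: {features_mode!r}")
    if scoring_statistics not in SCORING_STATISTICS:
        raise ValueError(f"unsupported scoringStatistics: {scoring_statistics!r}")
    # The swagger description states that sessionId cannot start with '_'.
    if session_id is not None and session_id.startswith("_"):
        raise ValueError("sessionId cannot start with '_'")

    body: Dict[str, Any] = {
        "search": search,
        "featuresMode": features_mode,
        "scoringStatistics": scoring_statistics,
    }
    # sessionId is optional, so it is only sent when the caller provides one.
    if session_id is not None:
        body["sessionId"] = session_id
    return body


request_body = build_search_request(
    "nice hotels",
    session_id="mysessionid",
    features_mode="enabled",
    scoring_statistics="global",
)
```

The strings sent over the wire stay `enabled`/`disabled` regardless of how a generated client names the enum members, which is the point of the `x-ms-enum` suggestion above.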

"enabled"
],
"x-ms-enum": {
"name": "FeaturesMode",
"modelAsString": false
},
"description": "A value that specifies whether the results should include scoring features, such as per-field similarity. The default is 'disabled'. Use 'enabled' to expose additional scoring features."
},
"ScoringStatistics": {
"type": "string",
"enum": [
"local",
"global"
],
"x-ms-enum": {
"name": "ScoringStatistics",
"modelAsString": false
},
"description": "A value that specifies whether we want to calculate scoring statistics (such as document frequency) globally for more consistent scoring, or locally, for lower latency. The default is 'local'. Use 'global' to aggregate scoring statistics globally before scoring. Using global scoring statistics can increase latency of search queries."
},
"AutocompleteMode": {
"type": "string",
"enum": [
@@ -1103,6 +1202,18 @@
"$ref": "#/definitions/QueryType",
"description": "A value that specifies the syntax of the search query. The default is 'simple'. Use 'full' if your query uses the Lucene query syntax."
},
"featuresMode": {
"$ref": "#/definitions/FeaturesMode",
"description": "A value that specifies whether the results should include scoring features, such as per-field similarity. The default is 'disabled'. Use 'enabled' to expose additional scoring features."
},
"scoringStatistics": {
"$ref": "#/definitions/ScoringStatistics",
"description": "A value that specifies whether we want to calculate scoring statistics (such as document frequency) globally for more consistent scoring, or locally, for lower latency. The default is 'local'. Use 'global' to aggregate scoring statistics globally before scoring. Using global scoring statistics can increase latency of search queries."
},
"sessionId": {
"type": "string",
"description": "A value to be used to create a sticky session, which can help to get more consistent results. As long as the same sessionId is used, a best-effort attempt will be made to target the same replica set. Be wary that reusing the same sessionId values repeatedly can interfere with the load balancing of the requests across replicas and adversely affect the performance of the search service. The value used as sessionId cannot start with a '_' character."
},
"scoringParameters": {
"type": "array",
"items": {