Skip to content

Conversation

@benwtrent
Copy link
Member

This commit makes the two following changes (along with some refactoring)

  • Nlp results will now indicate if the input was truncated or not
  • The default truncation is now none instead of first

@elasticmachine elasticmachine added the Team:ML Meta label for the ML team label Oct 27, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/ml-core (Team:ML)

@benwtrent
Copy link
Member Author

@elasticmachine update branch

Copy link
Member

@davidkyle davidkyle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Override
public int hashCode() {
return Objects.hash(Arrays.deepHashCode(inference), resultsField);
int result = Objects.hash(super.hashCode(), resultsField);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
int result = Objects.hash(super.hashCode(), resultsField);
return Objects.hash(super.hashCode(), Arrays.deepHashCode(inference), resultsField);

@Override
public int hashCode() {
return Objects.hash(Arrays.hashCode(inference), resultsField);
int result = Objects.hash(super.hashCode(), resultsField);
Copy link
Member

@davidkyle davidkyle Oct 28, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment; why not return Objects.hash(super.hashCode(), resultsField, Arrays.hashCode(inference))?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these were auto generated by Intellij :D. So I can update them.

int numTokens = withSpecialTokens ? wordPieceTokens.size() + 2 : wordPieceTokens.size();
boolean isTruncated = false;
if (numTokens > maxSequenceLength) {
isTruncated = true;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would be tempted to put this is the switch case for FIRST & SECOND because NONE does not truncate it throws instead. Same result just a matter of preference

@benwtrent benwtrent added auto-backport-and-merge auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) labels Oct 28, 2021
@elasticsearchmachine elasticsearchmachine merged commit 375fc77 into elastic:master Oct 28, 2021
@benwtrent benwtrent deleted the feature/ml-update-nlp-truncation branch October 28, 2021 14:41
benwtrent added a commit to benwtrent/elasticsearch that referenced this pull request Oct 28, 2021
…uncated (elastic#79942)

This commit makes the two following changes (along with some
refactoring)  - Nlp results will now indicate if the input was truncated
or not  - The default truncation is now `none` instead of `first`
@elasticsearchmachine
Copy link
Collaborator

💚 Backport successful

Status Branch Result
8.0

weizijun added a commit to weizijun/elasticsearch that referenced this pull request Oct 28, 2021
…formance

* upstream/master: (153 commits)
  [ML] update truncation default & adding field output when input is truncated (elastic#79942)
  [ML] stop using isAllowedByLicense for model license checks (elastic#79908)
  [ML] Retain built-in ML roles granting Kibana privileges (elastic#80014)
  [Transform] remove old mixed cluster BWC layers, not required for 8x (elastic#79927)
  Increase test timeout for CoordinatorTests testAllSearchesExecuted
  [Transform] add rolling upgrade tests for upgrade endpoint (elastic#79721)
  [ML] Update trained model docs for truncate parameter for bert tokenization (elastic#79652)
  `CoordinatorTests` sometimes needs three term bumps (elastic#79574)
  [ML] Account for service being triggered twice in tests (elastic#80000)
  SearchContext: remove unused variable (elastic#79917)
  Revert "Deprecate resolution loss on date field (elastic#78921)" (elastic#79914)
  Re-enable GeoIpDownloaderIT#testStartWithNoDatabases() (elastic#79907)
  Fix SnapshotBasedIndexRecoveryIT#testSeqNoBasedRecoveryIsUsedAfterPrimaryFailOver (elastic#79469)
  Fix RecoverySourceHandlerTests (elastic#79546)
  SQL: stabilize SqlSearchPageTimeoutIT (elastic#79928)
  Wait 3 seconds for the server to reload trust (elastic#79778)
  Skip automatically preserved request headers when rewriting (elastic#79973)
  Check whether stdout is a real console (elastic#79882)
  Convert remote license checker to use LicensedFeature (elastic#79876)
  Miscellaneous fixes for LDAP SDK v6 upgrade (elastic#79891)
  ...

# Conflicts:
#	libs/x-content/src/main/java/org/elasticsearch/xcontent/support/filtering/FilterPath.java
#	libs/x-content/src/test/java/org/elasticsearch/xcontent/support/filtering/FilterPathGeneratorFilteringTests.java
#	libs/x-content/src/test/java/org/elasticsearch/xcontent/support/filtering/FilterPathTests.java
elasticsearchmachine pushed a commit that referenced this pull request Oct 28, 2021
…uncated (#79942) (#80022)

This commit makes the two following changes (along with some
refactoring)  - Nlp results will now indicate if the input was truncated
or not  - The default truncation is now `none` instead of `first`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) :ml Machine learning >non-issue Team:ML Meta label for the ML team v8.0.0-beta1 v8.1.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants