diff --git a/README.md b/README.md
index 407b52a..474749e 100644
--- a/README.md
+++ b/README.md
@@ -71,12 +71,14 @@ Download or [clone](https://www.mathworks.com/help/matlab/matlab_prog/use-source
 
 ## Example: Classify Text Data Using BERT
 The simplest use of a pretrained BERT model is to use it as a feature extractor. In particular, you can use the BERT model to convert documents to feature vectors which you can then use as inputs to train a deep learning classification network.
-The example [`ClassifyTextDataUsingBERT.m`](./ClassifyTextDataUsingBERT.m) shows how to use a pretrained BERT model to classify failure events given a data set of factory reports.
+The example [`ClassifyTextDataUsingBERT.m`](./ClassifyTextDataUsingBERT.m) shows how to use a pretrained BERT model to classify failure events given a data set of factory reports. This example requires the `factoryReports.csv` data set from the Text Analytics Toolbox example [Prepare Text Data for Analysis](https://www.mathworks.com/help/textanalytics/ug/prepare-text-data-for-analysis.html).
 
 ## Example: Fine-Tune Pretrained BERT Model
 To get the most out of a pretrained BERT model, you can retrain and fine tune the BERT parameters weights for your task.
-The example [`FineTuneBERT.m`](./FineTuneBERT.m) shows how to fine-tune a pretrained BERT model to classify failure events given a data set of factory reports.
+The example [`FineTuneBERT.m`](./FineTuneBERT.m) shows how to fine-tune a pretrained BERT model to classify failure events given a data set of factory reports. This example requires the `factoryReports.csv` data set from the Text Analytics Toolbox example [Prepare Text Data for Analysis](https://www.mathworks.com/help/textanalytics/ug/prepare-text-data-for-analysis.html).
+
+The example [`FineTuneBERTJapanese.m`](./FineTuneBERTJapanese.m) shows the same workflow using a pretrained Japanese-BERT model. This example requires the `factoryReportsJP.csv` data set from the Text Analytics Toolbox example [Analyze Japanese Text Data](https://www.mathworks.com/help/textanalytics/ug/analyze-japanese-text.html), available in R2023a or later.
 
 ## Example: Analyze Sentiment with FinBERT
 FinBERT is a sentiment analysis model trained on financial text data and fine-tuned for sentiment analysis.
diff --git a/predictMaskedToken.m b/predictMaskedToken.m
index 7aeeb6b..45a8452 100644
--- a/predictMaskedToken.m
+++ b/predictMaskedToken.m
@@ -6,7 +6,7 @@
 % replaces instances of mdl.Tokenizer.MaskToken in the string text with
 % the most likely token according to the BERT model mdl.
 
-% Copyright 2021 The MathWorks, Inc.
+% Copyright 2021-2023 The MathWorks, Inc.
 arguments
     mdl {mustBeA(mdl,'struct')}
     str {mustBeText}
@@ -44,7 +44,7 @@
     tokens = fulltok.tokenize(pieces(i));
     if ~isempty(tokens)
         % "" tokenizes to empty - awkward
-        x = cat(2,x,fulltok.encode(tokens));
+        x = cat(2,x,fulltok.encode(tokens{1}));
     end
     if i
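For context on the `encode` change in the second hunk: the tokenizer's `tokenize` accepts a batch of input strings and returns a cell array with one entry per input, so for a single piece of text the caller must unwrap the single element (`tokens{1}`) before encoding, rather than passing the whole cell array. A rough Python analogy of the same container-unwrapping bug, using a toy tokenizer (all names here are hypothetical, not the toolbox API):

```python
# Toy analogy: tokenize() returns one token list per input document
# (like a MATLAB cell array), so a single-document caller must unwrap
# the outer container before encoding.

VOCAB = {"hello": 0, "world": 1, "[UNK]": 2}

def tokenize(texts):
    """Return one token list per input document."""
    return [t.lower().split() for t in texts]

def encode(tokens):
    """Expects a flat list of token strings, not a list of lists."""
    return [VOCAB.get(tok, VOCAB["[UNK]"]) for tok in tokens]

docs = tokenize(["hello world"])  # [["hello", "world"]]
# Buggy form: encode(docs) iterates over inner lists and fails,
# because a list cannot be used as a dict key.
ids = encode(docs[0])             # unwrap first, analogous to tokens{1}
print(ids)                        # [0, 1]
```

The MATLAB fix is the same idea: `tokens` is the per-document container, and `tokens{1}` is the token sequence that `encode` actually expects.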