[text analytics] add sample stories and improve documents #15429
Changes from all commits
f56c9c1
93015f1
fd2c6ec
a8412ab
ac52fa3
2fbb59c
8552faa
d82a295
0314c20
db1b14d
2b4e831
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -147,7 +147,7 @@ The input for each operation is passed as a **list** of documents. | |
|
|
||
| Each document can be passed as a string in the list, e.g. | ||
| ```python | ||
| documents = ["I hated the movie. It was so slow!", "The movie made it into my top ten favorites.", "What a great movie!"] | ||
| documents = ["I hated the movie. It was so slow!", "The movie made it into my top ten favorites. What a great movie!"] | ||
| ``` | ||
|
|
||
| or, if you wish to pass in a per-item document `id` or `language`/`country_hint`, they can be passed as a list of | ||
|
|
@@ -158,8 +158,7 @@ or a dict-like representation of the object: | |
| ```python | ||
| documents = [ | ||
| {"id": "1", "language": "en", "text": "I hated the movie. It was so slow!"}, | ||
| {"id": "2", "language": "en", "text": "The movie made it into my top ten favorites."}, | ||
| {"id": "3", "language": "en", "text": "What a great movie!"} | ||
| {"id": "2", "language": "en", "text": "The movie made it into my top ten favorites. What a great movie!"}, | ||
| ] | ||
| ``` | ||
|
|
||
|
|
@@ -210,7 +209,7 @@ endpoint="https://<region>.api.cognitive.microsoft.com/" | |
| text_analytics_client = TextAnalyticsClient(endpoint, credential) | ||
|
|
||
| documents = [ | ||
| "I did not like the restaurant. The food was too spicy.", | ||
| "I did not like the restaurant. The food was somehow both too spicy and underseasoned. Additionally, I thought the location was too far away from the playhouse.", | ||
| "The restaurant was decorated beautifully. The atmosphere was unlike any other restaurant I've been to.", | ||
| "The food was yummy. :)" | ||
| ] | ||
|
|
@@ -244,8 +243,10 @@ endpoint="https://<region>.api.cognitive.microsoft.com/" | |
| text_analytics_client = TextAnalyticsClient(endpoint, credential) | ||
|
|
||
| documents = [ | ||
| "Microsoft was founded by Bill Gates and Paul Allen.", | ||
| "Redmond is a city in King County, Washington, United States, located 15 miles east of Seattle.", | ||
| """ | ||
| Microsoft was founded by Bill Gates and Paul Allen. Its headquarters are located in Redmond. Redmond is a | ||
| city in King County, Washington, United States, located 15 miles east of Seattle. | ||
| """, | ||
| "Jeff bought three dozen eggs because there was a 50% discount." | ||
| ] | ||
|
|
||
|
|
@@ -280,7 +281,7 @@ endpoint="https://<region>.api.cognitive.microsoft.com/" | |
| text_analytics_client = TextAnalyticsClient(endpoint, credential) | ||
|
|
||
| documents = [ | ||
| "Microsoft was founded by Bill Gates and Paul Allen.", | ||
| "Microsoft was founded by Bill Gates and Paul Allen. Its headquarters are located in Redmond.", | ||
| "Easter Island, a Chilean territory, is a remote volcanic island in Polynesia." | ||
| ] | ||
|
|
||
|
|
@@ -318,8 +319,10 @@ endpoint="https://<region>.api.cognitive.microsoft.com/" | |
| text_analytics_client = TextAnalyticsClient(endpoint, credential) | ||
|
|
||
| documents = [ | ||
| "The employee's SSN is 859-98-0987.", | ||
| "The employee's phone number is 555-555-5555." | ||
| """ | ||
| We have an employee called Parker who cleans up after customers. The employee's | ||
| SSN is 859-98-0987, and their phone number is 555-555-5555. | ||
| """ | ||
| ] | ||
| response = text_analytics_client.recognize_pii_entities(documents, language="en") | ||
| result = [doc for doc in response if not doc.is_error] | ||
|
|
@@ -351,8 +354,10 @@ text_analytics_client = TextAnalyticsClient(endpoint, credential) | |
|
|
||
| documents = [ | ||
| "Redmond is a city in King County, Washington, United States, located 15 miles east of Seattle.", | ||
| "I need to take my cat to the veterinarian.", | ||
| "I will travel to South America in the summer." | ||
| """ | ||
| I need to take my cat to the veterinarian. He has been sick recently, and I need to take him | ||
|
Contributor
😿
Contributor
Author
so sorry
||
| before I travel to South America for the summer. | ||
| """, | ||
| ] | ||
|
|
||
| response = text_analytics_client.extract_key_phrases(documents, language="en") | ||
|
|
@@ -379,7 +384,10 @@ endpoint="https://<region>.api.cognitive.microsoft.com/" | |
| text_analytics_client = TextAnalyticsClient(endpoint, credential) | ||
|
|
||
| documents = [ | ||
| "This is written in English.", | ||
| """ | ||
| This whole document is written in English. In order for the whole document to be written | ||
| in English, every sentence also has to be written in English, which it is. | ||
| """, | ||
| "Il documento scritto in italiano.", | ||
| "Dies ist in deutsche Sprache verfasst." | ||
| ] | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -37,11 +37,11 @@ async def alternative_document_input(self): | |
| text_analytics_client = TextAnalyticsClient(endpoint=endpoint, credential=AzureKeyCredential(key)) | ||
|
|
||
| documents = [ | ||
| {"id": "0", "language": "en", "text": "I had the best day of my life."}, | ||
| {"id": "1", "language": "en", | ||
| {"id": "0", "country_hint": "US", "text": "I had the best day of my life. I decided to go sky-diving and it made me appreciate my whole life so much more. I developed a deep-connection with my instructor as well."}, | ||
| {"id": "1", "country_hint": "GB", | ||
|
Member
Just curious: why is the whole object on one line in some cases and split across lines in others? I can't find the pattern.
Contributor
Author
For dicts, I don't know whether it would work to have them on multiple lines; I'm not sure how it would render. Otherwise, I try to make the input as document-like as possible. Where I have to print the document I can't do that, since that format introduces whitespace that looks weird when printed.
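On the question of layout: as far as Python itself is concerned, a dict literal can be split across lines freely, and adjacent string literals inside parentheses are joined at compile time, so no extra whitespace ends up in the text. The snippet below is only a sketch of that layout using review text from this diff; how it renders in the docs is a separate question that it does not answer.

```python
# Illustrative layout only: the service is not called here.
documents = [
    {
        "id": "0",
        "country_hint": "US",
        # Adjacent literals are concatenated, so the text contains no newlines.
        "text": (
            "I had the best day of my life. "
            "I decided to go sky-diving and it made me appreciate my whole life so much more."
        ),
    },
    {
        "id": "1",
        "country_hint": "GB",
        "text": "This was a waste of my time. The speaker put me to sleep.",
    },
]

for document in documents:
    print(document["id"], repr(document["text"]))
```

Each fragment needs its own trailing space, otherwise the last word of one fragment runs into the first word of the next.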
||
| "text": "This was a waste of my time. The speaker put me to sleep."}, | ||
| {"id": "2", "language": "es", "text": "No tengo dinero ni nada que dar..."}, | ||
| {"id": "3", "language": "fr", | ||
| {"id": "2", "country_hint": "MX", "text": "No tengo dinero ni nada que dar..."}, | ||
| {"id": "3", "country_hint": "FR", | ||
| "text": "L'hôtel n'était pas très confortable. L'éclairage était trop sombre."} | ||
| ] | ||
| async with text_analytics_client: | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -13,6 +13,10 @@ | |
| This sample demonstrates how to analyze sentiment in documents. | ||
| An overall and per-sentence sentiment is returned. | ||
|
|
||
| In this sample we will be a skydiving company going through reviews people have left for our company. | ||
| We will extract the reviews that we are certain have a positive sentiment and post them onto our | ||
| website to attract more divers. | ||
|
|
||
| USAGE: | ||
| python sample_analyze_sentiment_async.py | ||
|
|
||
|
|
@@ -28,6 +32,14 @@ | |
| class AnalyzeSentimentSampleAsync(object): | ||
|
|
||
| async def analyze_sentiment_async(self): | ||
| print( | ||
| "In this sample we will be combing through reviews customers have left about their " | ||
| "experience using our skydiving company, Contoso." | ||
| ) | ||
| print( | ||
| "We start out with a list of reviews. Let us extract the reviews we are sure are " | ||
| "positive, so we can display them on our website and get even more customers!" | ||
| ) | ||
| # [START analyze_sentiment_async] | ||
| from azure.core.credentials import AzureKeyCredential | ||
| from azure.ai.textanalytics.aio import TextAnalyticsClient | ||
|
|
@@ -36,38 +48,64 @@ async def analyze_sentiment_async(self): | |
| key = os.environ["AZURE_TEXT_ANALYTICS_KEY"] | ||
|
|
||
| text_analytics_client = TextAnalyticsClient(endpoint=endpoint, credential=AzureKeyCredential(key)) | ||
|
|
||
| documents = [ | ||
| "I had the best day of my life.", | ||
| "This was a waste of my time. The speaker put me to sleep.", | ||
| "No tengo dinero ni nada que dar...", | ||
| "L'hôtel n'était pas très confortable. L'éclairage était trop sombre." | ||
| """I had the best day of my life. I decided to go sky-diving and it made me appreciate my whole life so much more. | ||
| I developed a deep-connection with my instructor as well, and I feel as if I've made a life-long friend in her.""", | ||
| """This was a waste of my time. All of the views on this drop are extremely boring, all I saw was grass. 0/10 would | ||
| not recommend to any divers, even first timers.""", | ||
| """This was pretty good! The sights were ok, and I had fun with my instructors! Can't complain too much about my experience""", | ||
| """I only have one word for my experience: WOW!!! I can't believe I have had such a wonderful skydiving company right | ||
| in my backyard this whole time! I will definitely be a repeat customer, and I want to take my grandmother skydiving too, | ||
| I know she'll love it!""" | ||
| ] | ||
|
|
||
| async with text_analytics_client: | ||
| result = await text_analytics_client.analyze_sentiment(documents) | ||
|
|
||
| docs = [doc for doc in result if not doc.is_error] | ||
|
|
||
| print("Let's visualize the sentiment of each of these documents") | ||
| for idx, doc in enumerate(docs): | ||
| print("Document text: {}".format(documents[idx])) | ||
| print("Overall sentiment: {}".format(doc.sentiment)) | ||
| # [END analyze_sentiment_async] | ||
| print("Overall confidence scores: positive={}; neutral={}; negative={} \n".format( | ||
| doc.confidence_scores.positive, | ||
| doc.confidence_scores.neutral, | ||
| doc.confidence_scores.negative, | ||
| )) | ||
| for sentence in doc.sentences: | ||
| print("Sentence '{}' has sentiment: {}".format(sentence.text, sentence.sentiment)) | ||
| print("...Sentence is {} characters from the start of the document and is {} characters long".format( | ||
| sentence.offset, len(sentence.text) | ||
| )) | ||
| print("...Sentence confidence scores: positive={}; neutral={}; negative={}".format( | ||
| sentence.confidence_scores.positive, | ||
| sentence.confidence_scores.neutral, | ||
| sentence.confidence_scores.negative, | ||
| )) | ||
| print("------------------------------------") | ||
|
|
||
| print("Now, let us extract all of the positive reviews") | ||
| positive_reviews = [doc for doc in docs if doc.sentiment == 'positive'] | ||
|
|
||
| print("We want to be very confident that our reviews are positive since we'll be posting them on our website.") | ||
| print("We're going to confirm our chosen reviews are positive using two different tests") | ||
|
|
||
| print( | ||
| "First, we are going to check how confident the sentiment analysis model is that a document is positive. " | ||
| "Let's go with a 90% confidence." | ||
| ) | ||
| positive_reviews = [ | ||
| review for review in positive_reviews | ||
| if review.confidence_scores.positive >= 0.9 | ||
| ] | ||
|
|
||
| print( | ||
| "Finally, we also want to make sure every sentence is positive so we only showcase our best selves!" | ||
| ) | ||
| positive_reviews_final = [] | ||
| for idx, review in enumerate(positive_reviews): | ||
| print("Looking at positive review #{}".format(idx + 1)) | ||
| any_sentence_not_positive = False | ||
| for sentence in review.sentences: | ||
| print("...Sentence '{}' has sentiment '{}' with confidence scores '{}'".format( | ||
| sentence.text, | ||
| sentence.sentiment, | ||
| sentence.confidence_scores | ||
| ) | ||
| ) | ||
| if sentence.sentiment != 'positive': | ||
| any_sentence_not_positive = True | ||
| if not any_sentence_not_positive: | ||
| positive_reviews_final.append(review) | ||
|
|
||
| print("We now have the final list of positive reviews we are going to display on our website!") | ||
|
Member
lol, great imagination :)
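For reference, the two checks in this sample (document-level confidence of at least 0.9, then every sentence positive) can be tried without calling the service. The sketch below is not the sample's code: `Scores`, `Sentence`, `Review`, `FailedReview`, and `select_showcase_reviews` are hypothetical stand-ins that only mirror the attributes the sample reads. It also pairs each input text with its result before dropping errors, on the assumption that results come back in input order; indexing `documents[idx]` after filtering would drift out of step if any document failed.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Scores:  # hypothetical stand-in for the confidence scores object
    positive: float
    neutral: float
    negative: float


@dataclass
class Sentence:  # hypothetical stand-in for a per-sentence result
    text: str
    sentiment: str


@dataclass
class Review:  # hypothetical stand-in for a successful document result
    sentiment: str
    confidence_scores: Scores
    sentences: List[Sentence] = field(default_factory=list)
    is_error: bool = False


@dataclass
class FailedReview:  # hypothetical stand-in for an errored document
    is_error: bool = True


def select_showcase_reviews(documents, results, threshold=0.9):
    """Return (text, result) pairs that pass both positivity checks."""
    selected = []
    # Zip before filtering so each text stays attached to its own result.
    for text, result in zip(documents, results):
        if result.is_error:
            continue
        if result.sentiment != "positive":
            continue
        if result.confidence_scores.positive < threshold:
            continue
        if all(s.sentiment == "positive" for s in result.sentences):
            selected.append((text, result))
    return selected


documents = [
    "Best day of my life.",
    "",
    "Pretty good. The sights were ok.",
    "WOW! I will be back.",
    "It was fine, I guess.",
]
results = [
    Review("positive", Scores(0.98, 0.01, 0.01), [Sentence("Best day of my life.", "positive")]),
    FailedReview(),
    Review("positive", Scores(0.95, 0.04, 0.01),
           [Sentence("Pretty good.", "positive"), Sentence("The sights were ok.", "neutral")]),
    Review("positive", Scores(0.99, 0.01, 0.00),
           [Sentence("WOW!", "positive"), Sentence("I will be back.", "positive")]),
    Review("positive", Scores(0.85, 0.10, 0.05), [Sentence("It was fine, I guess.", "positive")]),
]

selected = select_showcase_reviews(documents, results)
for text, _ in selected:
    print(text)
```

Using `all(...)` replaces the `any_sentence_not_positive` flag with the same outcome; the made-up scores above are only there to exercise each branch.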
||
|
|
||
|
|
||
| async def main(): | ||
|
|
||
nit: the """ """ type string adds newlines to output
Yeah, sorry. I added that to make it more clear this is a document visually. There are no print statements though, so I'm going to ignore your nit for now
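To make the nit concrete: a triple-quoted literal keeps its line breaks and indentation as part of the string, so that whitespace is part of the text sent to the service and of anything printed from it. One possible way to keep the document-like layout in source while producing a single-line string is `textwrap.dedent` plus a join. This is only a sketch using one of the documents from this diff, not a change proposed in the PR.

```python
import textwrap

raw = """
    We have an employee called Parker who cleans up after customers. The employee's
    SSN is 859-98-0987, and their phone number is 555-555-5555.
"""

# Strip the shared indentation and outer newlines, then join the wrapped lines.
document = " ".join(textwrap.dedent(raw).strip().splitlines())

print(repr(raw))       # shows the embedded newlines and leading spaces
print(repr(document))  # single line, no extra whitespace
```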