Commit 837bcf8

[formrecognizer] v2.1 (#15448)
* [formrecognizer] 2.1-preview.2 gen and impl (#14929)
* regen
* add invoice impl
* regen with image/bmp content type for all analyze methods
* support word appearance (gen code only)
* support page_range kwonly arg on content API
* add bounding_box to FormTable
* fix spacing
* add invoice tests
* add invoice samples
* revert erroneous search commit
* add docstring samples for business cards and invoices
* add aka link for fields to business card and invoice samples
* add invoice to readme
* add support for image/bmp
* regen with new changes and hook up gen code
* update changelog
* add tests for content - specify pages
* regen with text appearance on line instead of word
* add appearance to exposed model for FormLine
* update testcase
* fix and add table bounding box check to testcase
* skip receipt/business cards tests until features deployed; uncomment some tests where bugs were fixed
* don't record large request bodies
* fixes to tests/testcase
* more updates to tests
* update logging tests
* update content samples
* moving recordings to another PR
* fix pylint
* fix
* let invoices return tables
* pylint fix
* update readme examples to include invoice for now
* update content_type docstring description
* update grammar
* adding v2.1 recordings (receipt/business card skipped for now) (#14928)
* rerecord an invoice test
Co-authored-by: iscai-msft <[email protected]>
* [formrecognizer] add copy tests for new features in v2.1 (#14987)
* add copy tests for new features in v2.1
* rerecord test
* test that submodels are the same and fix get custom model for composed model
* [formrecognizer] adds language param (#14984)
* regen latest swagger in PR
* add language to form recognizer clients and update changelog
* add basic tests for language
* update docstrings
* newlines
* docstring feedback
* update language docstring to point to service docs for available language codes
* [form recognizer] add tests for invoice multipage (#15012)
* [formrecognizer] fix sphinx errors and unify transform testcase for FormFields (#15056)
* fix sphinx errors
* unify method for checking transform of FormFields and update tests
* switch invoices test to use unified form fields transform check
* accidental whitespace
* [formrecognizer] allow None for required params when passing a continuation_token into an LRO method (#15010)
* update clients such that when using a continuation token, you can pass None for any required params in method
* update continuation token tests
* receipt test - total value fixed (#15159)
* add python 3.9 to setup classifiers (#15208)
* [formrecognizer] adds tests for forms in other languages (#15173)
* adding language option to testcase to get blob url of form
* add sync tests
* add async tests
* add async recordings
* fix testcase for recordings
* [formrecognizer] unskip receipt/business card tests and re-record with preview.2 (#15270)
* rerecord business card and receipt tests with preview.2
* preface commented out test assertions with FIXME
* [formrecognizer] Remove business card ContactNames page_number workaround (#15275)
* revert setting business card ContactNames page number now that service returns this value
* remove workaround in tests
* uncomment receipt test assertions - regression fixed (#15345)
* [formrecognizer] doc updates (#15346)
* docs fixes, wip
* more docs updates
* more doc updates
* [formrecognizer] try using existing resource in live tests (#15483)
* try using existing FR resource
* rename
* fix env var name (#15486)
* [formrecognizer] updates to test in prod (#15491)
* add python 3.9 to live tests matrix
* sleep for 10 mins between resource creating and calls to endpoints
* receipt regression not fixed in prod yet - comment assertions out
* update timeout for live runs for FR (#15492)
* remove sample tests - will be run in smoke tests (#15494)
Co-authored-by: iscai-msft <[email protected]>
Co-authored-by: iscai-msft <[email protected]>
1 parent 7ef0743 commit 837bcf8

491 files changed

+90232
-1099214
lines changed

sdk/formrecognizer/azure-ai-formrecognizer/CHANGELOG.md

Lines changed: 10 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -7,7 +7,9 @@ This version of the SDK defaults to the latest supported API version, which curr
77
**New features**
88

99
- New methods `begin_recognize_business_cards` and `begin_recognize_business_cards_from_url` introduced to the SDK. Use these
10-
methods to recognize data from business cards.
10+
methods to recognize data from business cards
11+
- New methods `begin_recognize_invoices` and `begin_recognize_invoices_from_url` introduced to the SDK. Use these
12+
methods to recognize data from invoices
1113
- Recognize receipt methods now take keyword argument `locale` to optionally indicate the locale of the receipt for
1214
improved results
1315
- Added ability to create a composed model from the `FormTrainingClient` by calling method `begin_create_composed_model()`
@@ -21,6 +23,13 @@ also be populated with any selection marks found on the page
2123
- Added model type `CustomFormModelProperties` that includes information like if a model is a composed model
2224
- Added property `model_id` to `CustomFormSubmodel` and `TrainingDocumentInfo`
2325
- Added properties `model_id` and `form_type_confidence` to `RecognizedForm`
26+
- `appearance` property added to `FormLine` to indicate the style of extracted text - like "handwriting" or "other"
27+
- Added keyword argument `pages` to `begin_recognize_content` and `begin_recognize_content_from_url` to specify the page
28+
numbers to analyze
29+
- Added property `bounding_box` to `FormTable`
30+
- Content-type `image/bmp` now supported by recognize content and prebuilt models
31+
- Added keyword argument `language` to `begin_recognize_content` and `begin_recognize_content_from_url` to specify
32+
which language to process document in
2433

2534
**Dependency updates**
2635
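The `pages` and `language` keyword arguments listed in the changelog entries above are both optional. The following is a minimal sketch of collecting them before a content-recognition call. The helper `build_content_options` is hypothetical, and the value formats (page ranges such as `"1-3"` as strings, a language code such as `"en"`) are assumptions to be checked against the service documentation.

```python
def build_content_options(pages=None, language=None):
    """Collect optional keyword arguments for a content-recognition call.

    Only the keyword names `pages` and `language` come from the changelog;
    this helper and the value formats are illustrative assumptions.
    """
    options = {}
    if pages is not None:
        # Assumption: page numbers and ranges are passed as strings.
        options["pages"] = [str(p) for p in pages]
    if language is not None:
        options["language"] = language
    return options


options = build_content_options(pages=["1-3", 5], language="en")
print(options)

# Assumed usage with a real client (not executed here):
# poller = form_recognizer_client.begin_recognize_content(form, **options)
```

Omitting a keyword entirely, rather than passing `None`, leaves the choice of default to the client.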

sdk/formrecognizer/azure-ai-formrecognizer/README.md

Lines changed: 42 additions & 7 deletions
@@ -5,8 +5,10 @@ from form documents. It includes the following main functionalities:
55

66
* Custom models - Recognize field values and table data from forms. These models are trained with your own data, so they're tailored to your forms.
77
* Content API - Recognize text, table structures, and selection marks, along with their bounding box coordinates, from documents. Corresponds to the REST service's Layout API.
8-
* Prebuilt receipt model - Recognize data from sales receipts using a prebuilt model.
9-
* Prebuilt business card model - Recognize data from business cards using a prebuilt model.
8+
* Prebuilt models - Recognize data using the following prebuilt models
9+
* Receipt model - Recognize data from sales receipts using a prebuilt model.
10+
* Business card model - Recognize data from business cards using a prebuilt model.
11+
* Invoice model - Recognize data from invoices using a prebuilt model.
1012

1113
[Source code][python-fr-src] | [Package (PyPI)][python-fr-pypi] | [API reference documentation][python-fr-ref-docs]| [Product documentation][python-fr-product-docs] | [Samples][python-fr-samples]
1214

@@ -132,8 +134,10 @@ form_recognizer_client = FormRecognizerClient(
132134
`FormRecognizerClient` provides operations for:
133135

134136
- Recognizing form fields and content using custom models trained to recognize your custom forms. These values are returned in a collection of `RecognizedForm` objects.
135-
- Recognizing common fields from sales receipts, using a pre-trained receipt model. These fields and metadata are returned in a collection of `RecognizedForm` objects.
136-
- Recognizing common fields from business cards, using a pre-trained business card model. These fields and metadata are returned in a collection of `RecognizedForm` objects.
137+
- Recognizing common fields from the following form types using prebuilt models. These fields and metadata are returned in a collection of `RecognizedForm` objects.
138+
- Sales receipts. See fields found on a receipt [here][service_recognize_receipt].
139+
- Business cards. See fields found on a business card [here][service_recognize_business_cards].
140+
- Invoices. See fields found on an invoice [here][service_recognize_invoice].
137141
- Recognizing form content, including tables, lines, words, and selection marks, without the need to train a model. Form content is returned in a collection of `FormPage` objects.
138142

139143
Sample code snippets are provided to illustrate using a FormRecognizerClient [here](#recognize-forms-using-a-custom-model "Recognize Forms Using a Custom Model").
@@ -156,7 +160,7 @@ Long-running operations are operations which consist of an initial request sent
156160
followed by polling the service at intervals to determine whether the operation has completed or failed, and if it has
157161
succeeded, to get the result.
158162

159-
Methods that train models, recognize values from forms, or copy models are modeled as long-running operations.
163+
Methods that train models, recognize values from forms, or copy/compose models are modeled as long-running operations.
160164
The client exposes a `begin_<method-name>` method that returns an `LROPoller` or `AsyncLROPoller`. Callers should wait
161165
for the operation to complete by calling `result()` on the poller object returned from the `begin_<method-name>` method.
162166
Sample code snippets are provided to illustrate using long-running operations [below](#examples "Examples").
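The `begin_<method-name>` / `result()` pattern described above can be pictured with a small stand-in. This is only a sketch of the polling idea: `SketchPoller` and `begin_fake_operation` are hypothetical names, the status strings are assumptions, and none of this is the SDK's actual `LROPoller` implementation.

```python
import time


class SketchPoller:
    """Hypothetical stand-in for a poller; not the SDK's LROPoller."""

    def __init__(self, check_status, fetch_result, interval=0.0):
        self._check_status = check_status
        self._fetch_result = fetch_result
        self._interval = interval

    def result(self):
        # Poll until the operation reports success or failure.
        while True:
            status = self._check_status()
            if status == "succeeded":
                return self._fetch_result()
            if status == "failed":
                raise RuntimeError("operation failed")
            time.sleep(self._interval)


def begin_fake_operation():
    # Simulate a service that needs two polls before it completes.
    statuses = iter(["running", "running", "succeeded"])
    return SketchPoller(lambda: next(statuses), lambda: {"pages": 1})


poller = begin_fake_operation()
outcome = poller.result()
print(outcome)
```

The caller only sees the `begin_...` call and the blocking `result()` call; the polling loop stays hidden inside the poller.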
@@ -170,6 +174,7 @@ The following section provides several code snippets covering some of the most c
170174
* [Recognize Content](#recognize-content "Recognize Content")
171175
* [Recognize Receipts](#recognize-receipts "Recognize receipts")
172176
* [Recognize Business Cards](#recognize-business-cards "Recognize business cards")
177+
* [Recognize Invoices](#recognize-invoices "Recognize invoices")
173178
* [Train a Model](#train-a-model "Train a model")
174179
* [Manage Your Models](#manage-your-models "Manage Your Models")
175180

@@ -216,7 +221,7 @@ result = poller.result()
216221
```
217222

218223
### Recognize Content
219-
Recognize text and table structures, along with their bounding box coordinates, from documents.
224+
Recognize text, selection marks, and table structures, along with their bounding box coordinates, from documents.
220225

221226
```python
222227
from azure.ai.formrecognizer import FormRecognizerClient
@@ -235,6 +240,7 @@ page = poller.result()
235240

236241
table = page[0].tables[0] # page 1, table 1
237242
print("Table found on page {}:".format(table.page_number))
243+
print("Table location {}:".format(table.bounding_box))
238244
for cell in table.cells:
239245
    print("Cell text: {}".format(cell.text))
240246
    print("Location: {}".format(cell.bounding_box))
@@ -309,6 +315,30 @@ for business_card in result:
309315
print("{}: {} has confidence {}".format(item.name, item.value, item.confidence))
310316
```
311317

318+
### Recognize Invoices
319+
Recognize data from invoices using a prebuilt model. Invoice fields recognized by the service can be found [here][service_recognize_invoice].
320+
321+
```python
322+
from azure.ai.formrecognizer import FormRecognizerClient
323+
from azure.core.credentials import AzureKeyCredential
324+
325+
endpoint = "https://<region>.api.cognitive.microsoft.com/"
326+
credential = AzureKeyCredential("<api_key>")
327+
328+
form_recognizer_client = FormRecognizerClient(endpoint, credential)
329+
330+
with open("<path to your invoice>", "rb") as fd:
331+
    invoice = fd.read()
332+
333+
poller = form_recognizer_client.begin_recognize_invoices(invoice)
334+
result = poller.result()
335+
336+
for invoice in result:
337+
    for name, field in invoice.fields.items():
338+
        print("{}: {} has confidence {}".format(name, field.value, field.confidence))
339+
```
340+
341+
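The invoice snippet above prints every field with its confidence. One possible follow-up, sketched here under assumptions, is to keep only fields above a confidence threshold. The helper `confident_fields`, the `0.8` threshold, and the sample field names and values are all made up for illustration; `SimpleNamespace` objects stand in for the field objects so the sketch runs without the service.

```python
from types import SimpleNamespace


def confident_fields(fields, min_confidence=0.8):
    """Return {name: value} for fields at or above min_confidence.

    `fields` is any mapping of name -> object with `value` and
    `confidence` attributes, as used in the README loop.
    """
    kept = {}
    for name, field in fields.items():
        if field.confidence is not None and field.confidence >= min_confidence:
            kept[name] = field.value
    return kept


# Made-up sample data standing in for `invoice.fields`.
sample_fields = {
    "VendorName": SimpleNamespace(value="Contoso", confidence=0.95),
    "InvoiceTotal": SimpleNamespace(value=110.0, confidence=0.6),
}
high = confident_fields(sample_fields)
print(high)
```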
312342
### Train a model
313343
Train a custom model on your own form type. The resulting model can be used to recognize values from the types of forms it was trained on.
314344
Provide a container SAS URL to your Azure Storage Blob container where you're storing the training documents.
@@ -439,13 +469,14 @@ These code samples show common scenario operations with the Azure Form Recognize
439469
* Recognize receipts: [sample_recognize_receipts.py][sample_recognize_receipts]
440470
* Recognize receipts from a URL: [sample_recognize_receipts_from_url.py][sample_recognize_receipts_from_url]
441471
* Recognize business cards: [sample_recognize_business_cards.py][sample_recognize_business_cards]
472+
* Recognize invoices: [sample_recognize_invoices.py][sample_recognize_invoices]
442473
* Recognize content: [sample_recognize_content.py][sample_recognize_content]
443474
* Recognize custom forms: [sample_recognize_custom_forms.py][sample_recognize_custom_forms]
444475
* Train a model without labels: [sample_train_model_without_labels.py][sample_train_model_without_labels]
445476
* Train a model with labels: [sample_train_model_with_labels.py][sample_train_model_with_labels]
446477
* Manage custom models: [sample_manage_custom_models.py][sample_manage_custom_models]
447478
* Copy a model between Form Recognizer resources: [sample_copy_model.py][sample_copy_model]
448-
* Create a composed model from a collection of models trained with labels: |[sample_create_composed_model.py][sample_create_composed_model]
479+
* Create a composed model from a collection of models trained with labels: [sample_create_composed_model.py][sample_create_composed_model]
449480

450481
### Async APIs
451482
This library also includes a complete async API supported on Python 3.5+. To use it, you must
@@ -456,6 +487,7 @@ are found under the `azure.ai.formrecognizer.aio` namespace.
456487
* Recognize receipts: [sample_recognize_receipts_async.py][sample_recognize_receipts_async]
457488
* Recognize receipts from a URL: [sample_recognize_receipts_from_url_async.py][sample_recognize_receipts_from_url_async]
458489
* Recognize business cards: [sample_recognize_business_cards_async.py][sample_recognize_business_cards_async]
490+
* Recognize invoices: [sample_recognize_invoices_async.py][sample_recognize_invoices_async]
459491
* Recognize content: [sample_recognize_content_async.py][sample_recognize_content_async]
460492
* Recognize custom forms: [sample_recognize_custom_forms_async.py][sample_recognize_custom_forms_async]
461493
* Train a model without labels: [sample_train_model_without_labels_async.py][sample_train_model_without_labels_async]
@@ -510,6 +542,7 @@ This project has adopted the [Microsoft Open Source Code of Conduct][code_of_con
510542
[default_azure_credential]: https://github.com/Azure/azure-sdk-for-python/tree/master/sdk/identity/azure-identity#defaultazurecredential
511543
[service_recognize_receipt]: https://aka.ms/formrecognizer/receiptfields
512544
[service_recognize_business_cards]: https://aka.ms/formrecognizer/businesscardfields
545+
[service_recognize_invoice]: https://aka.ms/formrecognizer/invoicefields
513546
[sdk_logging_docs]: https://docs.microsoft.com/azure/developer/python/azure-sdk-logging
514547

515548
[cla]: https://cla.microsoft.com
@@ -531,6 +564,8 @@ This project has adopted the [Microsoft Open Source Code of Conduct][code_of_con
531564
[sample_recognize_receipts_async]: https://github.com/Azure/azure-sdk-for-python/tree/master/sdk/formrecognizer/azure-ai-formrecognizer/samples/async_samples/sample_recognize_receipts_async.py
532565
[sample_recognize_business_cards]: https://github.com/Azure/azure-sdk-for-python/tree/master/sdk/formrecognizer/azure-ai-formrecognizer/samples/sample_recognize_business_cards.py
533566
[sample_recognize_business_cards_async]: https://github.com/Azure/azure-sdk-for-python/tree/master/sdk/formrecognizer/azure-ai-formrecognizer/samples/async_samples/sample_recognize_business_cards_async.py
567+
[sample_recognize_invoices]: https://github.com/Azure/azure-sdk-for-python/tree/master/sdk/formrecognizer/azure-ai-formrecognizer/samples/sample_recognize_invoices.py
568+
[sample_recognize_invoices_async]: https://github.com/Azure/azure-sdk-for-python/tree/master/sdk/formrecognizer/azure-ai-formrecognizer/samples/async_samples/sample_recognize_invoices_async.py
534569
[sample_train_model_with_labels]: https://github.com/Azure/azure-sdk-for-python/tree/master/sdk/formrecognizer/azure-ai-formrecognizer/samples/sample_train_model_with_labels.py
535570
[sample_train_model_with_labels_async]: https://github.com/Azure/azure-sdk-for-python/tree/master/sdk/formrecognizer/azure-ai-formrecognizer/samples/async_samples/sample_train_model_with_labels_async.py
536571
[sample_train_model_without_labels]: https://github.com/Azure/azure-sdk-for-python/tree/master/sdk/formrecognizer/azure-ai-formrecognizer/samples/sample_train_model_without_labels.py

sdk/formrecognizer/azure-ai-formrecognizer/azure/ai/formrecognizer/__init__.py

Lines changed: 8 additions & 0 deletions
@@ -8,6 +8,12 @@
88
from ._form_recognizer_client import FormRecognizerClient
99
from ._form_training_client import FormTrainingClient
1010

11+
from ._generated.v2_1_preview_2.models import (
12+
    Appearance,
13+
    Style
14+
)
15+
16+
1117
from ._models import (
1218
    FormElement,
1319
    LengthUnit,
@@ -67,6 +73,8 @@
6773
'FieldValueType',
6874
'CustomFormModelProperties',
6975
'FormSelectionMark',
76+
'Appearance',
77+
'Style'
7078
]
7179

7280
__VERSION__ = VERSION

sdk/formrecognizer/azure-ai-formrecognizer/azure/ai/formrecognizer/_api_versions.py

Lines changed: 2 additions & 2 deletions
@@ -9,8 +9,8 @@
99
class FormRecognizerApiVersion(str, Enum):
1010
    """Form Recognizer API versions supported by this package"""
1111

12-
    #: this is the default version
13-
    V2_1_PREVIEW = "2.1-preview.1"
12+
    #: This is the default version
13+
    V2_1_PREVIEW = "2.1-preview.2"
1414
    V2_0 = "2.0"
1515

1616
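Because the enum in this hunk mixes in `str`, its members compare equal to plain version strings and can be looked up by value. The sketch below re-declares the class locally, copying only the two members visible in the diff, so it runs without the package installed; it is an illustration of the `str`/`Enum` pattern, not the package's module.

```python
from enum import Enum


class FormRecognizerApiVersion(str, Enum):
    """Local copy of the enum shown in the diff, for illustration only."""

    V2_1_PREVIEW = "2.1-preview.2"
    V2_0 = "2.0"


# Members behave like strings in comparisons.
print(FormRecognizerApiVersion.V2_1_PREVIEW == "2.1-preview.2")

# A plain string can be converted back to the enum member.
print(FormRecognizerApiVersion("2.0") is FormRecognizerApiVersion.V2_0)

# The raw version string is available through `.value`.
print(FormRecognizerApiVersion.V2_0.value)
```

Code that pinned the string `"2.1-preview.1"` would no longer match `V2_1_PREVIEW` after this change, since the member now carries `"2.1-preview.2"`.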
