Add LayoutLMv3 #17060
Merged
Commits (60):

- 11cf1dc Make forward pass work (NielsRogge)
- c53e7a3 More improvements (NielsRogge)
- 57a18fa Remove unused imports (NielsRogge)
- 1492657 Remove timm dependency (NielsRogge)
- c6eec65 Improve loss calculation of token classifier (NielsRogge)
- 3ef05ae Fix most tests (NielsRogge)
- 2fa1744 Add docs (NielsRogge)
- aec5dcb Add model integration test (NielsRogge)
- aa2cb8a Make all tests pass (NielsRogge)
- 5263e36 Add LayoutLMv3FeatureExtractor (NielsRogge)
- b4ebf2b Improve integration test + make fixup (NielsRogge)
- 4c526cf Add example script (NielsRogge)
- 35cd97b Fix style (NielsRogge)
- 379ad5b Add LayoutLMv3Processor (NielsRogge)
- 4e4a098 Fix style (NielsRogge)
- f30855e Add option to add visual labels (NielsRogge)
- 335acf9 Make more tokenizer tests pass (NielsRogge)
- d5325c9 Fix more tests (NielsRogge)
- 8ce33c1 Make more tests pass (NielsRogge)
- c0d3130 Fix bug and improve docs (NielsRogge)
- 70f0b25 Fix import of processors (NielsRogge)
- f50e0c8 Improve docstrings (NielsRogge)
- 3380acd Fix toctree and improve docs (NielsRogge)
- 3e59888 Fix auto tokenizer (NielsRogge)
- 75b7471 Move tests to model folder (NielsRogge)
- d36a98f Move tests to model folder (NielsRogge)
- 99a7b53 change default behavior add_prefix_space (SaulLu)
- c160f1c add prefix space for fast (SaulLu)
- bb35959 add_prefix_spcae set to True for Fast (SaulLu)
- 74b61e3 no space before `unique_no_split` token (SaulLu)
- c89bd61 add test to hightligh special treatment of added tokens (SaulLu)
- 009dc6f fix `test_batch_encode_dynamic_overflowing` by building a long enough… (SaulLu)
- c240cf5 fix `test_full_tokenizer` with add_prefix_token
- 4275185 Fix tokenizer integration test (NielsRogge)
- ca64765 Make the code more readable (NielsRogge)
- e3a2375 Add tests for LayoutLMv3Processor (NielsRogge)
- 1debfd8 Fix style (NielsRogge)
- a8404bf Add model to README and update init (NielsRogge)
- 1c8f447 Apply suggestions from code review (NielsRogge)
- 174afe1 Replace asserts by value errors (NielsRogge)
- 63d3fea Add suggestion by @ducviet00 (NielsRogge)
- 04d7c75 Add model to doc tests (NielsRogge)
- e501974 Simplify script (NielsRogge)
- f8d5874 Improve README (NielsRogge)
- dcbbba2 a step ahead to fix (SaulLu)
- e1ddad4 Update pair_input_test (NielsRogge)
- 33b0c15 Make all tokenizer tests pass - phew (NielsRogge)
- 721e54e Make style (NielsRogge)
- 7a39664 Add LayoutLMv3 to CI job (NielsRogge)
- e149953 Fix auto mapping (NielsRogge)
- 033e584 Fix CI job name (NielsRogge)
- 9f8b757 Make all processor tests pass (NielsRogge)
- 49f2087 Make tests of LayoutLMv2 and LayoutXLM consistent (NielsRogge)
- c043fd2 Add copied from statements to fast tokenizer (NielsRogge)
- cc214f2 Add copied from statements to slow tokenizer (NielsRogge)
- fd68510 Remove add_visual_labels attribute (NielsRogge)
- 80f4fac Fix tests
- 672af36 Add link to notebooks (NielsRogge)
- 458638d Improve docs of LayoutLMv3Processor (NielsRogge)
- b7ce27e Fix reference to section (NielsRogge)
<!--Copyright 2022 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# LayoutLMv3

## Overview

The LayoutLMv3 model was proposed in [LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking](https://arxiv.org/abs/2204.08387) by Yupan Huang, Tengchao Lv, Lei Cui, Yutong Lu, Furu Wei.
LayoutLMv3 simplifies [LayoutLMv2](layoutlmv2) by using patch embeddings (as in [ViT](vit)) instead of leveraging a CNN backbone, and pre-trains the model on 3 objectives: masked language modeling (MLM), masked image modeling (MIM)
and word-patch alignment (WPA).
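To illustrate the patch-embedding idea from the overview: a ViT-style model cuts the document image into fixed-size, non-overlapping patches and embeds each one, instead of running a CNN backbone. The following dependency-free sketch only demonstrates the patchification step; the 224-pixel image size and 16-pixel patch size are assumed ViT-style defaults, not values read from the LayoutLMv3 configuration.

```python
# Sketch of ViT-style patchification: split an H x W x C image into
# non-overlapping P x P patches and flatten each patch into one vector.
# The sizes (224, 16) are assumed ViT-style defaults for illustration only.
def patchify(image, patch_size=16):
    """image: H x W x C nested lists; returns a list of flattened patches."""
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = []
            for row in range(top, top + patch_size):
                for col in range(left, left + patch_size):
                    patch.extend(image[row][col])  # append the C channel values
            patches.append(patch)
    return patches

# A 224 x 224 RGB image yields (224 / 16) ** 2 = 196 patches,
# each of length 16 * 16 * 3 = 768.
image = [[[0, 0, 0] for _ in range(224)] for _ in range(224)]
patches = patchify(image)
print(len(patches), len(patches[0]))  # 196 768
```

In the real model each flattened patch is then projected by a learned linear layer to the hidden size, which is what replaces the CNN feature extractor of LayoutLMv2.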
The abstract from the paper is the following:

*Self-supervised pre-training techniques have achieved remarkable progress in Document AI. Most multimodal pre-trained models use a masked language modeling objective to learn bidirectional representations on the text modality, but they differ in pre-training objectives for the image modality. This discrepancy adds difficulty to multimodal representation learning. In this paper, we propose LayoutLMv3 to pre-train multimodal Transformers for Document AI with unified text and image masking. Additionally, LayoutLMv3 is pre-trained with a word-patch alignment objective to learn cross-modal alignment by predicting whether the corresponding image patch of a text word is masked. The simple unified architecture and training objectives make LayoutLMv3 a general-purpose pre-trained model for both text-centric and image-centric Document AI tasks. Experimental results show that LayoutLMv3 achieves state-of-the-art performance not only in text-centric tasks, including form understanding, receipt understanding, and document visual question answering, but also in image-centric tasks such as document image classification and document layout analysis.*

Tips:

- In terms of data processing, LayoutLMv3 is identical to its predecessor [LayoutLMv2](layoutlmv2), except that:
    - images need to be resized and normalized with channels in regular RGB format. LayoutLMv2 on the other hand normalizes the images internally and expects the channels in BGR format.
    - text is tokenized using byte-pair encoding (BPE), as opposed to WordPiece.
  Due to these differences in data preprocessing, one can use [`LayoutLMv3Processor`] which internally combines a [`LayoutLMv3FeatureExtractor`] (for the image modality) and a [`LayoutLMv3Tokenizer`]/[`LayoutLMv3TokenizerFast`] (for the text modality) to prepare all data for the model.
- Regarding usage of [`LayoutLMv3Processor`], we refer to the [usage guide](layoutlmv2#usage-LayoutLMv2Processor) of its predecessor.
- Demo notebooks for LayoutLMv3 can be found [here](https://github.com/NielsRogge/Transformers-Tutorials/tree/master/LayoutLMv3).
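When preparing inputs by hand instead of relying on [`LayoutLMv3Processor`], note that the LayoutLMv2 usage guide referenced above expects word bounding boxes on a 0-1000 scale relative to the image size. A minimal sketch of that normalization follows; the helper name is ours for illustration, not a function from the transformers library.

```python
# Illustrative helper: scale an (x0, y0, x1, y1) pixel bounding box to the
# 0-1000 range that LayoutLM-family models expect. Not a transformers API.
def normalize_bbox(bbox, width, height):
    """bbox in pixels; width/height are the page image dimensions."""
    x0, y0, x1, y1 = bbox
    return [
        int(1000 * x0 / width),
        int(1000 * y0 / height),
        int(1000 * x1 / width),
        int(1000 * y1 / height),
    ]

# e.g. a word box from an OCR engine on a 762 x 1000 pixel page
print(normalize_bbox((15, 120, 110, 140), width=762, height=1000))  # [19, 120, 144, 140]
```

The processor performs this step for you when OCR results (words plus pixel boxes) are passed in together with the image.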
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/layoutlmv3_architecture.png"
alt="drawing" width="600"/>

<small> LayoutLMv3 architecture. Taken from the <a href="https://arxiv.org/abs/2204.08387">original paper</a>. </small>

This model was contributed by [nielsr](https://huggingface.co/nielsr). The original code can be found [here](https://github.com/microsoft/unilm/tree/master/layoutlmv3).

## LayoutLMv3Config

[[autodoc]] LayoutLMv3Config

## LayoutLMv3FeatureExtractor

[[autodoc]] LayoutLMv3FeatureExtractor
    - __call__

## LayoutLMv3Tokenizer

[[autodoc]] LayoutLMv3Tokenizer
    - __call__
    - save_vocabulary

## LayoutLMv3TokenizerFast

[[autodoc]] LayoutLMv3TokenizerFast
    - __call__

## LayoutLMv3Processor

[[autodoc]] LayoutLMv3Processor
    - __call__

## LayoutLMv3Model

[[autodoc]] LayoutLMv3Model
    - forward

## LayoutLMv3ForSequenceClassification

[[autodoc]] LayoutLMv3ForSequenceClassification
    - forward

## LayoutLMv3ForTokenClassification

[[autodoc]] LayoutLMv3ForTokenClassification
    - forward

## LayoutLMv3ForQuestionAnswering

[[autodoc]] LayoutLMv3ForQuestionAnswering
    - forward