4 changes: 2 additions & 2 deletions docs/source/index.rst
@@ -145,8 +145,8 @@ conversion utilities for the following models:
27. :doc:`T5 <model_doc/t5>` (from Google AI) released with the paper `Exploring the Limits of Transfer Learning with a
Unified Text-to-Text Transformer <https://arxiv.org/abs/1910.10683>`__ by Colin Raffel and Noam Shazeer and Adam
Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
28. :doc:`TAPAS <model_doc/tapas>` (from Google AI) released with the paper `TAPAS: Weakly Supervised Table Parsing via
Pre-training <https://arxiv.org/abs/2004.02349>`__ by Jonathan Herzig, Paweł Krzysztof Nowak, Thomas Müller,
Francesco Piccinno and Julian Martin Eisenschlos.
29. :doc:`Transformer-XL <model_doc/transformerxl>` (from Google/CMU) released with the paper `Transformer-XL:
Attentive Language Models Beyond a Fixed-Length Context <https://arxiv.org/abs/1901.02860>`__ by Zihang Dai*,
99 changes: 49 additions & 50 deletions docs/source/model_doc/tapas.rst
@@ -5,85 +5,84 @@ Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The TAPAS model was proposed in `TAPAS: Weakly Supervised Table Parsing via Pre-training
<https://arxiv.org/abs/2004.02349>`__ by Jonathan Herzig, Paweł Krzysztof Nowak, Thomas Müller, Francesco Piccinno and
Julian Martin Eisenschlos. It's a BERT-based model specifically designed (and pre-trained) for answering questions
about tabular data. Compared to BERT, TAPAS uses relative position embeddings and has 7 token types that encode tabular
structure. TAPAS is pre-trained on the masked language modeling (MLM) objective on a large dataset comprising millions
of tables from English Wikipedia and corresponding texts. For question answering, TAPAS has 2 heads on top: a cell
selection head and an aggregation head, for (optionally) performing aggregations (such as counting or summing) among
selected cells. TAPAS has been fine-tuned on several datasets: SQA (Sequential Question Answering by Microsoft), WTQ
(Wiki Table Questions by Stanford University) and WikiSQL (by Salesforce). It achieves state-of-the-art on both SQA and
WTQ, while having comparable performance to SOTA on WikiSQL, with a much simpler architecture.
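To make the aggregation step concrete, here is a hedged, standalone sketch (not part of the TAPAS code itself) of how a predicted aggregation operator could be applied to the values of the selected cells; the operator ids follow the NONE/SUM/AVERAGE/COUNT convention used elsewhere in this documentation:

```python
# Illustrative sketch only: applies an aggregation operator, as predicted by
# the aggregation head, to the numeric values of the cells picked by the cell
# selection head. Operator ids: 0 = NONE, 1 = SUM, 2 = AVERAGE, 3 = COUNT.

def apply_aggregation(op_id, cell_values):
    if op_id == 1:  # SUM
        return sum(cell_values)
    if op_id == 2:  # AVERAGE
        return sum(cell_values) / len(cell_values)
    if op_id == 3:  # COUNT
        return len(cell_values)
    return cell_values  # NONE: return the selected cells unchanged

# e.g. "What is the total number of movies?" -> SUM over the selected cells
print(apply_aggregation(1, [87.0, 53.0, 69.0]))  # 209.0
```

In the real model, the operator is not given but predicted from logits, and for operators like SUM the training signal is only the final (weakly supervised) answer, not the cell selection itself.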

The abstract from the paper is the following:

*Answering natural language questions over tables is usually seen as a semantic parsing task. To alleviate the
collection cost of full logical forms, one popular approach focuses on weak supervision consisting of denotations
instead of logical forms. However, training semantic parsers from weak supervision poses difficulties, and in addition,
the generated logical forms are only used as an intermediate step prior to retrieving the denotation. In this paper, we
present TAPAS, an approach to question answering over tables without generating logical forms. TAPAS trains from weak
supervision, and predicts the denotation by selecting table cells and optionally applying a corresponding aggregation
operator to such selection. TAPAS extends BERT's architecture to encode tables as input, initializes from an effective
joint pre-training of text segments and tables crawled from Wikipedia, and is trained end-to-end. We experiment with
three different semantic parsing datasets, and find that TAPAS outperforms or rivals semantic parsing models by
improving state-of-the-art accuracy on SQA from 55.1 to 67.2 and performing on par with the state-of-the-art on WIKISQL
and WIKITQ, but with a simpler model architecture. We additionally find that transfer learning, which is trivial in our
setting, from WIKISQL to WIKITQ, yields 48.7 accuracy, 4.2 points above the state-of-the-art.*

In addition, the authors have further pre-trained TAPAS to recognize table entailment, by creating a balanced dataset
of millions of automatically created training examples which are learned in an intermediate step prior to fine-tuning.
The authors of TAPAS call this further pre-training intermediate pre-training (since TAPAS is first pre-trained on MLM,
and then on another dataset). They found that intermediate pre-training further improves performance on SQA, achieving
a new state-of-the-art as well as state-of-the-art on TabFact, a large-scale dataset with 16k Wikipedia tables for
table entailment (a binary classification task). For more details, see their new paper: `Understanding tables with
intermediate pre-training <https://arxiv.org/abs/2010.00571>`__ by Julian Martin Eisenschlos, Syrine Krichene and
Thomas Müller.

The original code can be found `here <https://github.com/google-research/tapas>`__.

Tips:

- TAPAS is a model that uses relative position embeddings by default (restarting the position embeddings at every cell
of the table). According to the authors, this usually results in a slightly better performance, and allows you to
encode longer sequences without running out of embeddings. If you don't want this, you can set the
`reset_position_index_per_cell` parameter of :class:`~transformers.TapasConfig` to False.
- TAPAS has checkpoints fine-tuned on SQA, which are capable of answering questions related to a table in a
conversational set-up. This means that you can ask follow-up questions such as "what is his age?" related to the
previous question. Note that the forward pass of TAPAS is a bit different in case of a conversational set-up: in that
case, you have to feed every training example one by one to the model, such that the `prev_label_ids` token type ids
can be overwritten by the predicted `label_ids` of the model to the previous question.
- TAPAS is similar to BERT and therefore relies on the masked language modeling (MLM) objective. It is therefore
efficient at predicting masked tokens and at NLU in general, but is not optimal for text generation. Models trained
with a causal language modeling (CLM) objective are better in that regard.
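The first tip can be illustrated with a small standalone sketch (this is not the actual `TapasTokenizer` implementation, only an illustration of the idea): restarting position indices at every cell keeps each index small, so long tables do not exhaust the fixed budget of learned position embeddings.

```python
# Illustrative sketch of per-cell position index resetting. Each inner list
# holds the word pieces of one table cell; position ids restart at 0 for
# every cell instead of increasing monotonically over the whole sequence.

def per_cell_position_ids(cells):
    """Return one position id per token, restarting at 0 for each cell."""
    position_ids = []
    for cell_tokens in cells:
        position_ids.extend(range(len(cell_tokens)))
    return position_ids

cells = [["brad", "pitt"], ["leonardo", "di", "caprio"], ["george", "clooney"]]
print(per_cell_position_ids(cells))  # [0, 1, 0, 1, 2, 0, 1]
```

With global positions the last token here would get index 6; with per-cell resetting no index exceeds the longest cell, which is what allows encoding longer tables without running out of embeddings.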


Usage
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If you just want to perform inference (i.e. making predictions) in a non-conversational setup, you can do the
following:

.. code-block::

>>> from transformers import TapasTokenizer, TapasForQuestionAnswering
>>> import pandas as pd

>>> model_name = 'tapas-base-finetuned-wtq'
>>> model = TapasForQuestionAnswering.from_pretrained(model_name)
>>> tokenizer = TapasTokenizer.from_pretrained(model_name)

>>> data = {'Actors': ["Brad Pitt", "Leonardo Di Caprio", "George Clooney"], 'Number of movies': ["87", "53", "69"]}
>>> queries = ["What is the name of the first actor?", "How many movies has George Clooney played in?", "What is the total number of movies?"]
>>> table = pd.DataFrame(data)
>>> inputs = tokenizer(table, queries, return_tensors='pt')
>>> logits, logits_agg = model(**inputs)
>>> answer_coordinates_batch, aggregation_predictions = tokenizer.convert_logits_to_predictions(inputs, logits, logits_agg)

>>> # let's print out the results:
>>> id2aggregation = {0: "NONE", 1: "SUM", 2: "AVERAGE", 3: "COUNT"}
>>> aggregation_predictions_string = [id2aggregation[x] for x in aggregation_predictions]

>>> answers = []
>>> for coordinates in answer_coordinates_batch:
...     if len(coordinates) == 1:
...         # only a single cell:
...         answers.append(table.iat[coordinates[0]])
...     else:
...         # multiple cells
...         cell_values = []
...         for coordinate in coordinates:
...             cell_values.append(table.iat[coordinate])
...         answers.append(", ".join(cell_values))

>>> display(table)
>>> print("")
>>> for query, answer, predicted_agg in zip(queries, answers, aggregation_predictions_string):
2 changes: 1 addition & 1 deletion src/transformers/__init__.py
@@ -562,10 +562,10 @@
)
from .modeling_tapas import (
TAPAS_PRETRAINED_MODEL_ARCHIVE_LIST,
TapasForMaskedLM,
TapasForQuestionAnswering,
TapasForSequenceClassification,
TapasModel,
load_tf_weights_in_tapas,
)
from .modeling_transfo_xl import (
24 changes: 12 additions & 12 deletions src/transformers/configuration_tapas.py
@@ -17,20 +17,20 @@

from .configuration_utils import PretrainedConfig


TAPAS_PRETRAINED_CONFIG_ARCHIVE_MAP = {"tapas-base": "", "tapas-large": ""}  # to be added


class TapasConfig(PretrainedConfig):
r"""
This is the configuration class to store the configuration of a :class:`~transformers.TapasModel`. It is used to
instantiate a TAPAS model according to the specified arguments, defining the model architecture. Instantiating a
configuration with the defaults will yield a similar configuration to that of the TAPAS `tapas-base-finetuned-sqa`
architecture. Configuration objects inherit from :class:`~transformers.PreTrainedConfig` and can be used to control
the model outputs. Read the documentation from :class:`~transformers.PretrainedConfig` for more information.

Hyperparameters additional to BERT are taken from run_task_main.py and hparam_utils.py of the original
implementation. Original implementation available at https://github.com/google-research/tapas/tree/master.

Args:
vocab_size (:obj:`int`, `optional`, defaults to 30522):
@@ -87,9 +87,9 @@ class TapasConfig(PretrainedConfig):
average_approximation_function: (:obj:`string`, `optional`, defaults to :obj:`"ratio"`):
Method to calculate expected average of cells in the relaxed case.
cell_selection_preference: (:obj:`float`, `optional`, defaults to None):
Preference for cell selection in ambiguous cases. Only applicable in case of weak supervision for
aggregation (WTQ, WikiSQL). If the total mass of the aggregation probabilities (excluding the "NONE"
operator) is higher than this hyperparameter, then aggregation is predicted for an example.
answer_loss_cutoff: (:obj:`float`, `optional`, defaults to None):
Ignore examples with answer loss larger than cutoff.
max_num_rows: (:obj:`int`, `optional`, defaults to 64):
@@ -109,7 +109,7 @@ class TapasConfig(PretrainedConfig):
disable_per_token_loss: (:obj:`bool`, `optional`, defaults to :obj:`False`):
Disable any (strong or weak) supervision on cells.
span_prediction: (:obj:`string`, `optional`, defaults to :obj:`"none"`):
Span selection mode to use. Currently only "none" is supported.

Example::

@@ -19,7 +19,12 @@

import torch

from transformers import (
TapasConfig,
TapasForQuestionAnswering,
TapasForSequenceClassification,
load_tf_weights_in_tapas,
)
from transformers.utils import logging


@@ -41,14 +46,14 @@ def convert_tf_checkpoint_to_pytorch(tf_checkpoint_path, tapas_config_file, pyto
# select_one_column = True,
# allow_empty_column_selection = False,
# temperature = 0.0352513)

# SQA config
config = TapasConfig()

print("Building PyTorch model from configuration: {}".format(str(config)))
# model = TapasForMaskedLM(config)
model = TapasForQuestionAnswering(config)
# model = TapasForSequenceClassification(config)

# Load weights from tf checkpoint
load_tf_weights_in_tapas(model, config, tf_checkpoint_path)
8 changes: 4 additions & 4 deletions src/transformers/file_utils.py
@@ -191,10 +191,10 @@

except ImportError:
_tokenizers_available = False

try:
import torch_scatter

# Check we're not importing a "torch_scatter" directory somewhere
_scatter_available = hasattr(torch_scatter, "__version__") and hasattr(torch_scatter, "scatter")