GitHub - StonyBrookNLP/BioNLI: [EMNLP2022] BioNLI: Generating a Biomedical NLI Dataset Using Lexico-semantic Constraints for Adversarial Examples

What is BioNLI?

BioNLI is a biomedical NLI dataset built using controllable text generation.

This is the official page for the paper:

BioNLI: Generating a Biomedical NLI Dataset Using Lexico-semantic Constraints for Adversarial Examples
accepted at EMNLP2022 (Findings).

BioNLI is the first dataset in biomedical natural language inference. It contains abstracts from the biomedical literature paired with mechanistic hypotheses, where the inconsistent hypotheses are generated with nine different strategies.

Example

The following is an example of an entry in the BioNLI dataset. Some supporting text has been removed to save space. The premise is a set of sentences about two biomedical entities. The consistent hypothesis is the original conclusion sentence from the paper's abstract; the inconsistent hypothesis is a sentence generated with one of the nine strategies.
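The entry structure described above can be sketched in Python as follows. This is only an illustration: the class name, the field names, and the label strings are assumptions made for this sketch and do not necessarily match the released files.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class BioNLIEntry:
    # All field names are hypothetical, chosen for illustration only.
    premise: str                  # supporting sentences from the abstract
    consistent_hypothesis: str    # original conclusion sentence
    inconsistent_hypothesis: str  # generated (perturbed) conclusion
    strategy: str                 # which of the nine strategies produced it


def to_nli_pairs(entry: BioNLIEntry) -> List[Tuple[str, str, str]]:
    """Expand one entry into (premise, hypothesis, label) triples."""
    return [
        (entry.premise, entry.consistent_hypothesis, "consistent"),
        (entry.premise, entry.inconsistent_hypothesis, "inconsistent"),
    ]


# Placeholder text, not taken from the dataset.
example = BioNLIEntry(
    premise="<supporting sentences about two biomedical entities>",
    consistent_hypothesis="<original conclusion sentence>",
    inconsistent_hypothesis="<generated conclusion sentence>",
    strategy="<strategy name>",
)
pairs = to_nli_pairs(example)
```

Each entry yields two premise-hypothesis pairs that share the same premise and differ only in the hypothesis and its label.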

Coming Soon

Dataset Statistics

There are two versions of this dataset: the full distribution, which contains all possible perturbations, and the balanced distribution. Both share the same test set. For the full distribution, we generate as many perturbations as possible for the dev and test sets, but each training instance is perturbed only once.
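The difference between perturbing an instance once and generating every possible perturbation can be sketched as below. This is a toy illustration, not the authors' generation code: the two strategies are simple stand-ins rather than the paper's nine, and picking the single training perturbation at random is an assumption.

```python
import random
from typing import Callable, List, Optional

# A strategy maps a conclusion sentence to a perturbed one,
# or returns None when it does not apply to that sentence.
Strategy = Callable[[str], Optional[str]]


def flip_direction(sentence: str) -> Optional[str]:
    # Toy stand-in strategy (hypothetical).
    if "increases" in sentence:
        return sentence.replace("increases", "decreases")
    return None


def negate(sentence: str) -> Optional[str]:
    # Toy stand-in strategy (hypothetical).
    if " is " in sentence:
        return sentence.replace(" is ", " is not ")
    return None


def perturb_all(sentence: str, strategies: List[Strategy]) -> List[str]:
    """Dev/test style: keep every perturbation that applies."""
    results = []
    for strategy in strategies:
        perturbed = strategy(sentence)
        if perturbed is not None:
            results.append(perturbed)
    return results


def perturb_once(sentence: str, strategies: List[Strategy],
                 rng: random.Random) -> Optional[str]:
    """Training style: keep a single perturbation (random choice assumed)."""
    candidates = perturb_all(sentence, strategies)
    return rng.choice(candidates) if candidates else None


strategies = [flip_direction, negate]
all_outputs = perturb_all("A increases B.", strategies)
one_output = perturb_once("A increases B.", strategies, random.Random(0))
```

Under this scheme a dev or test instance can contribute several inconsistent hypotheses, while a training instance contributes at most one.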

Full Distribution:

Balanced Distribution:

Download the data

The dataset can be downloaded here:

The full set can be downloaded from here.

The balanced set can be downloaded from here.

To access the test set please contact me.

License

BioNLI is distributed under the CC BY 4.0 license.

Liked us? Cite us!

Please use the following bibtex entry:

@inproceedings{bastan-etal-2022-bionli,
    title = "{B}io{NLI}: Generating a Biomedical {NLI} Dataset Using Lexico-semantic Constraints for Adversarial Examples",
    author = "Bastan, Mohaddeseh  and
      Surdeanu, Mihai  and
      Balasubramanian, Niranjan",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2022",
    month = dec,
    year = "2022",
    address = "Abu Dhabi, United Arab Emirates",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.findings-emnlp.374",
    pages = "5093--5104",
    
}
