Incorrect assertion in pipeline test test_dbmdz_english()

### System Info

- `transformers` version: 4.22.0.dev0
- Platform: macOS-12.4-x86_64-i386-64bit
- Python version: 3.9.13
- Huggingface_hub version: 0.8.1
- PyTorch version (GPU?): 1.11.0 (False)
- Tensorflow version (GPU?): 2.9.1 (False)
- Flax version (CPU?/GPU?/TPU?): 0.5.2 (cpu)
- Jax version: 0.3.6
- JaxLib version: 0.3.5
- Using GPU in script?: n
- Using distributed or parallel set-up in script?: n

### Who can help?

@Narsil

### Information

- [ ] The official example scripts
- [ ] My own modified scripts

### Tasks

- [ ] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)

### Reproduction

`RUN_SLOW=1 RUN_PIPELINE_TESTS=yes pytest tests/pipelines/test_pipelines_token_classification.py::TokenClassificationPipelineTests::test_dbmdz_english`

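Fails with two notable diffs, both in the "UN" entity: the `start`/`end` offsets in the assertion (22, 24) don't match the offsets in the input string itself (18, 20), so they are off by four characters, and the `index` doesn't match (the assertion expects 7, the pipeline returns 6). The rounded scores for the two "I-PER" tokens also differ by 0.001. Output: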

```
======================================= FAILURES ========================================
__________________ TokenClassificationPipelineTests.test_dbmdz_english __________________

self = <tests.pipelines.test_pipelines_token_classification.TokenClassificationPipelineTests testMethod=test_dbmdz_english>

    @require_torch
    @slow
    def test_dbmdz_english(self):
        # Other sentence
        NER_MODEL = "dbmdz/bert-large-cased-finetuned-conll03-english"
        model = AutoModelForTokenClassification.from_pretrained(NER_MODEL)
        tokenizer = AutoTokenizer.from_pretrained(NER_MODEL, use_fast=True)
        sentence = """Enzo works at the UN"""
        token_classifier = pipeline("ner", model=model, tokenizer=tokenizer)
        output = token_classifier(sentence)
>       self.assertEqual(
            nested_simplify(output),
            [
                {"entity": "I-PER", "score": 0.997, "word": "En", "start": 0, "end": 2, "index": 1},
                {"entity": "I-PER", "score": 0.996, "word": "##zo", "start": 2, "end": 4, "index": 2},
                {"entity": "I-ORG", "score": 0.999, "word": "UN", "start": 22, "end": 24, "index": 7},
            ],
        )
E       AssertionError: Lists differ: [{'en[24 chars] 0.998, 'index': 1, 'word': 'En', 'start': 0, [179 chars] 20}] != [{'en[24 chars] 0.997, 'word': 'En', 'start': 0, 'end': 2, 'i[179 chars]: 7}]
E       
E       First differing element 0:
E       {'ent[15 chars]'score': 0.998, 'index': 1, 'word': 'En', 'start': 0, 'end': 2}
E       {'ent[15 chars]'score': 0.997, 'word': 'En', 'start': 0, 'end': 2, 'index': 1}
E       
E         [{'end': 2,
E           'entity': 'I-PER',
E           'index': 1,
E       -   'score': 0.998,
E       ?                ^
E       
E       +   'score': 0.997,
E       ?                ^
E       
E           'start': 0,
E           'word': 'En'},
E          {'end': 4,
E           'entity': 'I-PER',
E           'index': 2,
E       -   'score': 0.997,
E       ?                ^
E       
E       +   'score': 0.996,
E       ?                ^
E       
E           'start': 2,
E           'word': '##zo'},
E       -  {'end': 20,
E       ?           ^
E       
E       +  {'end': 24,
E       ?           ^
E       
E           'entity': 'I-ORG',
E       -   'index': 6,
E       ?            ^
E       
E       +   'index': 7,
E       ?            ^
E       
E           'score': 0.999,
E       -   'start': 18,
E       ?            ^^
E       
E       +   'start': 22,
E       ?            ^^
E       
E           'word': 'UN'}]

tests/pipelines/test_pipelines_token_classification.py:284: AssertionError
```
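The offsets can be checked against the sentence without loading the model. The sketch below uses only plain Python string operations. The token list is an assumption: only `En`, `##zo` and `UN` are confirmed by the pipeline output above, and the remaining tokens are a guess at how the tokenizer splits the sentence.

```python
sentence = "Enzo works at the UN"

# Character offsets of "UN" computed directly from the input string.
start = sentence.index("UN")
end = start + len("UN")
print(start, end)       # 18 20
print(len(sentence))    # 20

# The offsets in the current assertion (22, 24) lie past the end of the string.
print(repr(sentence[22:24]))  # ''

# ASSUMPTION: the tokenizer yields exactly these tokens for this sentence.
# Only "En", "##zo" and "UN" are confirmed by the pipeline output above.
tokens = ["[CLS]", "En", "##zo", "works", "at", "the", "UN", "[SEP]"]
print(tokens.index("UN"))  # 6
```

Under that assumption, both the offsets (18, 20) and the `index` (6) returned by the pipeline are consistent with the input, which suggests the expected values in the test are what need updating.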

### Expected behavior

The test passes (a green dot in the pytest output).
