[Tokenizer] Inconsistent behavior when decoding a single ID and a list of the single ID #29489

@Ki-Seki

Description

System Info

  • transformers version: 4.39.0.dev0
  • Platform: Linux-5.4.0-163-generic-x86_64-with-glibc2.10
  • Python version: 3.8.18
  • Huggingface_hub version: 0.20.3
  • Safetensors version: 0.4.2
  • Accelerate version: 0.27.2
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.1.2+cu121 (True)
  • Tensorflow version (GPU?): 2.13.1 (True)
  • Flax version (CPU?/GPU?/TPU?): 0.7.0 (cpu)
  • Jax version: 0.4.13
  • JaxLib version: 0.4.13
  • Using GPU in script?: no need
  • Using distributed or parallel set-up in script?: no need

Who can help?

@ArthurZucker

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Code

from transformers import AutoTokenizer

# Slow (Python) BERT tokenizer: decode a bare int ID vs. a one-element list of the same ID.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased", use_fast=False)
int_single_id = tokenizer.vocab_size - 1
list_single_id = [tokenizer.vocab_size - 1]
print(f'<<<<{tokenizer.decode(int_single_id)}>>>>')
print(f'<<<<{tokenizer.decode(list_single_id)}>>>>')

# Same check with another slow tokenizer that shares the BERT vocabulary.
tokenizer = AutoTokenizer.from_pretrained("facebook/dpr-ctx_encoder-single-nq-base", use_fast=False)
int_single_id = tokenizer.vocab_size - 1
list_single_id = [tokenizer.vocab_size - 1]
print(f'<<<<{tokenizer.decode(int_single_id)}>>>>')
print(f'<<<<{tokenizer.decode(list_single_id)}>>>>')

# Rough estimate: around 15 tokenizers in the library show this issue.

Output

<<<<# # ~>>>>
<<<<##~>>>>
<<<<# # ~>>>>
<<<<##~>>>>

Expected behavior

Consistent behavior: decoding the single ID should produce the same output as decoding the one-element list, e.g. ##~ in both cases.

Suspected cause: in src/transformers/tokenization_utils.py, the _decode function incorrectly takes the spaces_between_special_tokens path when given a single int and then joins the sub-tokens with spaces.
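
For illustration, here is a minimal sketch (not the library's actual implementation) of how the extra spaces could arise, together with a workaround that wraps a bare ID in a list before decoding. It reuses the bert-base-uncased slow tokenizer from the reproduction above:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased", use_fast=False)
token_id = tokenizer.vocab_size - 1  # corresponds to the token "##~"

# Hypothesis: with a bare int, convert_ids_to_tokens returns the token string
# itself rather than a list, so joining its elements with spaces splits the
# token character by character -- which matches the observed output.
token = tokenizer.convert_ids_to_tokens(token_id)  # "##~" (a str, not a list)
print(" ".join(token))                             # "# # ~"

# Workaround until this is fixed: always pass a list to decode.
print(tokenizer.decode([token_id]))                # "##~"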

Labels

Core: Tokenization
Good Second Issue
