
Fix KeyErrors for attributes not available in model_kwargs #470

Closed
ankurneog wants to merge 2 commits into huggingface:main from ankurneog:anneog_fix_bucket_size

@ankurneog
Contributor

These are fixes for issues caught while running the transformers tests. The issues are tracked in #445 and #446.

Impacted test cases (listed for T5, but these are common test cases that affect all language models):

==============================================================
FAILED test_modeling_t5.py::T5ModelTest::test_beam_search_generate - KeyError: 'limit_hpu_graphs'
FAILED test_modeling_t5.py::T5ModelTest::test_beam_search_generate_dict_output - KeyError: 'limit_hpu_graphs'
FAILED test_modeling_t5.py::T5ModelTest::test_beam_search_generate_dict_outputs_use_cache - KeyError: 'limit_hpu_graphs'
FAILED test_modeling_t5.py::T5ModelTest::test_constrained_beam_search_generate - KeyError: 'limit_hpu_graphs'
FAILED test_modeling_t5.py::T5ModelTest::test_constrained_beam_search_generate_dict_output - KeyError: 'limit_hpu_graphs'
FAILED test_modeling_t5.py::T5ModelTest::test_greedy_generate - KeyError: 'bucket_size'
FAILED test_modeling_t5.py::T5ModelTest::test_greedy_generate_dict_outputs - KeyError: 'bucket_size'
FAILED test_modeling_t5.py::T5ModelTest::test_greedy_generate_dict_outputs_use_cache - KeyError: 'bucket_size'
FAILED test_modeling_t5.py::T5ModelTest::test_sample_generate - KeyError: 'limit_hpu_graphs'
FAILED test_modeling_t5.py::T5ModelTest::test_sample_generate_dict_output - KeyError: 'limit_hpu_graphs'
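
The diff itself is not shown on this page. As a minimal sketch, assuming the fix replaces direct indexing of Gaudi-specific entries in `model_kwargs` with `dict.get()` plus a default, the pattern might look like the block below. The helper names (`read_gaudi_kwarg`, `resolve_generation_options`) and the default values (`bucket_size=-1`, `limit_hpu_graphs=False`) are hypothetical and may not match what optimum-habana actually uses.

```python
from typing import Any, Dict

# Hypothetical defaults for illustration only; the values actually used by
# optimum-habana are not shown in this PR.
GAUDI_KWARG_DEFAULTS: Dict[str, Any] = {
    "bucket_size": -1,          # assumed: a non-positive value means "no bucketing"
    "limit_hpu_graphs": False,  # assumed: HPU graphs are not limited by default
}


def read_gaudi_kwarg(model_kwargs: Dict[str, Any], name: str) -> Any:
    """Return model_kwargs[name] if present, otherwise the assumed default.

    Indexing with model_kwargs[name] raises KeyError when the caller never set
    the key, which is the situation stock Transformers tests create.
    """
    return model_kwargs.get(name, GAUDI_KWARG_DEFAULTS[name])


def resolve_generation_options(model_kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Collect the Gaudi-specific options without raising on missing keys."""
    return {name: read_gaudi_kwarg(model_kwargs, name) for name in GAUDI_KWARG_DEFAULTS}


if __name__ == "__main__":
    stock_kwargs = {"use_cache": True}  # what a plain Transformers test might pass

    try:
        stock_kwargs["bucket_size"]  # the failing access pattern
    except KeyError as err:
        print("KeyError:", err)  # → KeyError: 'bucket_size'

    print(resolve_generation_options(stock_kwargs))
    # → {'bucket_size': -1, 'limit_hpu_graphs': False}
```

With `.get()`, a caller that does pass the key keeps its value unchanged; the default is only used when the key is absent, so Gaudi-specific call sites and generic Transformers tests can share the same code path.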

@regisss: Please help with the review.

@ankurneog
Contributor Author

Status after the fixes:

(conda_qnpu1) (anneog_fix_bucket_size) anneog@anneog-vm-u20:t5 $ python -m pytest -vs test_modeling_t5.py
===================================================================================================== test session starts ======================================================================================================
platform linux -- Python 3.8.18, pytest-7.4.2, pluggy-1.3.0 -- /home/anneog/anaconda3/envs/conda_qnpu1/bin/python
cachedir: .pytest_cache
hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/home/anneog/github/ankurneog/optimum-habana/tests/transformers/tests/models/t5/.hypothesis/examples')
metadata: {'Python': '3.8.18', 'Platform': 'Linux-5.4.0-163-generic-x86_64-with-glibc2.17', 'Packages': {'pytest': '7.4.2', 'pluggy': '1.3.0'}, 'Plugins': {'hypothesis': '6.87.1', 'xdist': '3.3.1', 'metadata': '3.0.0', 'html': '4.0.2', 'timeout': '2.1.0', 'forked': '1.6.0', 'random-order': '1.0.2'}}
Test order randomisation NOT enabled. Enable with --random-order or --random-order-bucket=<bucket_type>
rootdir: /home/anneog/github/ankurneog/optimum-habana
configfile: setup.cfg
plugins: hypothesis-6.87.1, xdist-3.3.1, metadata-3.0.0, html-4.0.2, timeout-2.1.0, forked-1.6.0, random-order-1.0.2
collecting ... [WARNING|utils.py:179] 2023-10-18 08:28:06,329 >> optimum-habana v1.8.0.dev0 has been validated for SynapseAI v1.12.0 but habana-frameworks v1.13.0.198 was found, this could lead to undefined behavior!
[WARNING|utils.py:196] 2023-10-18 08:28:06,349 >> Could not run hl-smi, please follow the installation guide: https://docs.habana.ai/en/latest/Installation_Guide/index.html.
collected 152 items

test_modeling_t5.py::T5ModelTest::test_assisted_decoding_matches_greedy_search SKIPPED (test is slow)
test_modeling_t5.py::T5ModelTest::test_assisted_decoding_sample ============================= HABANA PT BRIDGE CONFIGURATION ===========================
PT_HPU_LAZY_MODE = 1
PT_RECIPE_CACHE_PATH =
PT_CACHE_FOLDER_DELETE = 0
PT_HPU_RECIPE_CACHE_CONFIG =
PT_HPU_MAX_COMPOUND_OP_SIZE = 9223372036854775807
PT_HPU_LAZY_ACC_PAR_MODE = 1
PT_HPU_ENABLE_REFINE_DYNAMIC_SHAPES = 0
---------------------------: System Configuration :---------------------------
Num CPU Cores : 8
CPU RAM : 40852220 KB

PASSED
test_modeling_t5.py::T5ModelTest::test_attention_outputs PASSED
test_modeling_t5.py::T5ModelTest::test_beam_sample_generate SKIPPED (Beam search sampling is not supported by optimum-habana yet)
test_modeling_t5.py::T5ModelTest::test_beam_sample_generate_dict_output SKIPPED (Beam search sampling is not supported by optimum-habana yet)
test_modeling_t5.py::T5ModelTest::test_beam_search_generate PASSED
test_modeling_t5.py::T5ModelTest::test_beam_search_generate_dict_output PASSED
test_modeling_t5.py::T5ModelTest::test_beam_search_generate_dict_outputs_use_cache PASSED
test_modeling_t5.py::T5ModelTest::test_can_use_safetensors PASSED
test_modeling_t5.py::T5ModelTest::test_config PASSED
test_modeling_t5.py::T5ModelTest::test_config_and_model_silu_gated PASSED
test_modeling_t5.py::T5ModelTest::test_constrained_beam_search_generate PASSED
test_modeling_t5.py::T5ModelTest::test_constrained_beam_search_generate_dict_output PASSED
test_modeling_t5.py::T5ModelTest::test_contrastive_generate PASSED
test_modeling_t5.py::T5ModelTest::test_contrastive_generate_dict_outputs_use_cache PASSED
test_modeling_t5.py::T5ModelTest::test_contrastive_generate_low_memory PASSED
test_modeling_t5.py::T5ModelTest::test_correct_missing_keys Some weights of T5ForSequenceClassification were not initialized from the model checkpoint at /tmp/tmpyu2jq74p and are newly initialized: ['classification_head.out_proj.bias', 'classification_head.dense.weight', 'classification_head.out_proj.weight', 'classification_head.dense.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
PASSED
test_modeling_t5.py::T5ModelTest::test_cpu_offload SKIPPED (test requires CUDA)
test_modeling_t5.py::T5ModelTest::test_decoder_model_past PASSED
test_modeling_t5.py::T5ModelTest::test_decoder_model_past_with_3d_attn_mask PASSED
test_modeling_t5.py::T5ModelTest::test_decoder_model_past_with_attn_mask PASSED
test_modeling_t5.py::T5ModelTest::test_decoder_model_past_with_large_inputs PASSED
test_modeling_t5.py::T5ModelTest::test_determinism PASSED
test_modeling_t5.py::T5ModelTest::test_disk_offload SKIPPED (Does not work on the tiny model as we keep hitting edge cases.)
test_modeling_t5.py::T5ModelTest::test_encoder_decoder_shared_weights PASSED
test_modeling_t5.py::T5ModelTest::test_export_to_onnx SKIPPED (Test has a segmentation fault on torch 1.8.0)
test_modeling_t5.py::T5ModelTest::test_feed_forward_chunking PASSED
test_modeling_t5.py::T5ModelTest::test_forward_signature PASSED
test_modeling_t5.py::T5ModelTest::test_from_pretrained_no_checkpoint PASSED
test_modeling_t5.py::T5ModelTest::test_generate_from_inputs_embeds_decoder_only PASSED
test_modeling_t5.py::T5ModelTest::test_generate_with_head_masking SKIPPED (Segmentation fault is observed)
test_modeling_t5.py::T5ModelTest::test_generate_with_past_key_values SKIPPED (Skipped for Gaudi)
test_modeling_t5.py::T5ModelTest::test_generate_without_input_ids PASSED
test_modeling_t5.py::T5ModelTest::test_gradient_checkpointing_backward_compatibility PASSED
test_modeling_t5.py::T5ModelTest::test_gradient_checkpointing_enable_disable PASSED
test_modeling_t5.py::T5ModelTest::test_greedy_generate PASSED
test_modeling_t5.py::T5ModelTest::test_greedy_generate_dict_outputs PASSED
test_modeling_t5.py::T5ModelTest::test_greedy_generate_dict_outputs_use_cache PASSED
test_modeling_t5.py::T5ModelTest::test_group_beam_search_generate SKIPPED (Group beam search is not supported by optimum-habana)
test_modeling_t5.py::T5ModelTest::test_group_beam_search_generate_dict_output SKIPPED (Group beam search is not supported by optimum-habana)
test_modeling_t5.py::T5ModelTest::test_head_pruning PASSED
test_modeling_t5.py::T5ModelTest::test_head_pruning_integration PASSED
test_modeling_t5.py::T5ModelTest::test_head_pruning_save_load_from_config_init PASSED
test_modeling_t5.py::T5ModelTest::test_head_pruning_save_load_from_pretrained PASSED
test_modeling_t5.py::T5ModelTest::test_headmasking PASSED
test_modeling_t5.py::T5ModelTest::test_hidden_states_output PASSED
test_modeling_t5.py::T5ModelTest::test_initialization PASSED
test_modeling_t5.py::T5ModelTest::test_inputs_embeds PASSED
test_modeling_t5.py::T5ModelTest::test_left_padding_compatibility PASSED
test_modeling_t5.py::T5ModelTest::test_load_save_without_tied_weights PASSED
test_modeling_t5.py::T5ModelTest::test_load_with_mismatched_shapes Some weights of T5ForSequenceClassification were not initialized from the model checkpoint at /tmp/tmpeadng85e and are newly initialized because the shapes did not match:
- classification_head.out_proj.weight: found shape torch.Size([2, 32]) in the checkpoint and torch.Size([42, 32]) in the model instantiated
- classification_head.out_proj.bias: found shape torch.Size([2]) in the checkpoint and torch.Size([42]) in the model instantiated
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Some weights of T5Model were not initialized from the model checkpoint at /tmp/tmpeadng85e and are newly initialized because the shapes did not match:
- transformer.shared.weight: found shape torch.Size([99, 32]) in the checkpoint and torch.Size([10, 32]) in the model instantiated
- transformer.encoder.embed_tokens.weight: found shape torch.Size([99, 32]) in the checkpoint and torch.Size([10, 32]) in the model instantiated
- transformer.decoder.embed_tokens.weight: found shape torch.Size([99, 32]) in the checkpoint and torch.Size([10, 32]) in the model instantiated
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
PASSED
test_modeling_t5.py::T5ModelTest::test_model PASSED
test_modeling_t5.py::T5ModelTest::test_model_common_attributes PASSED
test_modeling_t5.py::T5ModelTest::test_model_from_pretrained SKIPPED (test is slow)
test_modeling_t5.py::T5ModelTest::test_model_is_small PASSED
test_modeling_t5.py::T5ModelTest::test_model_main_input_name PASSED
test_modeling_t5.py::T5ModelTest::test_model_outputs_equivalence PASSED
test_modeling_t5.py::T5ModelTest::test_model_parallel_beam_search SKIPPED (test requires multiple GPUs)
test_modeling_t5.py::T5ModelTest::test_model_parallel_equal_results SKIPPED (test requires multiple GPUs)
test_modeling_t5.py::T5ModelTest::test_model_parallelism SKIPPED (test requires multiple GPUs)
test_modeling_t5.py::T5ModelTest::test_model_parallelization SKIPPED (test requires multiple GPUs)
test_modeling_t5.py::T5ModelTest::test_model_v1_1 PASSED
test_modeling_t5.py::T5ModelTest::test_model_weights_reload_no_missing_tied_weights Some weights of T5Model were not initialized from the model checkpoint at /tmp/tmpx_1cw9xh and are newly initialized: ['encoder.block.1.layer.0.SelfAttention.v.weight', 'encoder.block.0.layer.0.SelfAttention.q.weight', 'decoder.block.1.layer.1.layer_norm.weight', 'decoder.block.1.layer.2.DenseReluDense.wo.weight', 'encoder.final_layer_norm.weight', 'encoder.block.0.layer.1.DenseReluDense.wi.weight', 'decoder.block.1.layer.1.EncDecAttention.o.weight', 'decoder.block.0.layer.1.EncDecAttention.q.weight', 'decoder.block.0.layer.0.SelfAttention.k.weight', 'decoder.embed_tokens.weight', 'encoder.block.1.layer.0.layer_norm.weight', 'decoder.block.0.layer.2.DenseReluDense.wi.weight', 'decoder.block.1.layer.0.SelfAttention.k.weight', 'decoder.block.0.layer.0.layer_norm.weight', 'decoder.final_layer_norm.weight', 'decoder.block.1.layer.0.SelfAttention.q.weight', 'encoder.block.1.layer.0.SelfAttention.o.weight', 'decoder.block.1.layer.2.DenseReluDense.wi.weight', 'encoder.block.1.layer.0.SelfAttention.k.weight', 'decoder.block.0.layer.0.SelfAttention.o.weight', 'decoder.block.0.layer.1.EncDecAttention.v.weight', 'decoder.block.0.layer.1.EncDecAttention.o.weight', 'encoder.block.0.layer.0.SelfAttention.k.weight', 'decoder.block.1.layer.0.SelfAttention.v.weight', 'encoder.block.1.layer.1.DenseReluDense.wi.weight', 'decoder.block.0.layer.0.SelfAttention.relative_attention_bias.weight', 'decoder.block.1.layer.0.SelfAttention.o.weight', 'encoder.block.1.layer.1.DenseReluDense.wo.weight', 'decoder.block.1.layer.1.EncDecAttention.q.weight', 'encoder.block.0.layer.0.SelfAttention.relative_attention_bias.weight', 'encoder.block.0.layer.1.DenseReluDense.wo.weight', 'decoder.block.0.layer.2.DenseReluDense.wo.weight', 'decoder.block.1.layer.2.layer_norm.weight', 'decoder.block.1.layer.0.layer_norm.weight', 'encoder.block.0.layer.0.layer_norm.weight', 'encoder.block.0.layer.1.layer_norm.weight', 'decoder.block.0.layer.1.layer_norm.weight', 'decoder.block.0.layer.2.layer_norm.weight', 'encoder.block.1.layer.1.layer_norm.weight', 'decoder.block.1.layer.1.EncDecAttention.k.weight', 'encoder.block.0.layer.0.SelfAttention.v.weight', 'decoder.block.0.layer.1.EncDecAttention.k.weight', 'shared.weight', 'encoder.embed_tokens.weight', 'encoder.block.0.layer.0.SelfAttention.o.weight', 'decoder.block.0.layer.0.SelfAttention.v.weight', 'decoder.block.1.layer.1.EncDecAttention.v.weight', 'decoder.block.0.layer.0.SelfAttention.q.weight', 'encoder.block.1.layer.0.SelfAttention.q.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Some weights of T5ForConditionalGeneration were not initialized from the model checkpoint at /tmp/tmpus62arwo and are newly initialized: ['encoder.block.1.layer.0.SelfAttention.v.weight', 'encoder.block.0.layer.0.SelfAttention.q.weight', 'decoder.block.1.layer.1.layer_norm.weight', 'decoder.block.1.layer.2.DenseReluDense.wo.weight', 'encoder.final_layer_norm.weight', 'encoder.block.0.layer.1.DenseReluDense.wi.weight', 'decoder.block.1.layer.1.EncDecAttention.o.weight', 'decoder.block.0.layer.1.EncDecAttention.q.weight', 'decoder.block.0.layer.0.SelfAttention.k.weight', 'decoder.embed_tokens.weight', 'encoder.block.1.layer.0.layer_norm.weight', 'decoder.block.0.layer.2.DenseReluDense.wi.weight', 'decoder.block.1.layer.0.SelfAttention.k.weight', 'decoder.block.0.layer.0.layer_norm.weight', 'lm_head.weight', 'decoder.final_layer_norm.weight', 'decoder.block.1.layer.0.SelfAttention.q.weight', 'encoder.block.1.layer.0.SelfAttention.o.weight', 'decoder.block.1.layer.2.DenseReluDense.wi.weight', 'encoder.block.1.layer.0.SelfAttention.k.weight', 'decoder.block.0.layer.0.SelfAttention.o.weight', 'decoder.block.0.layer.1.EncDecAttention.v.weight', 'decoder.block.0.layer.1.EncDecAttention.o.weight', 'encoder.block.0.layer.0.SelfAttention.k.weight', 'decoder.block.1.layer.0.SelfAttention.v.weight', 'encoder.block.1.layer.1.DenseReluDense.wi.weight', 'decoder.block.0.layer.0.SelfAttention.relative_attention_bias.weight', 'decoder.block.1.layer.0.SelfAttention.o.weight', 'encoder.block.1.layer.1.DenseReluDense.wo.weight', 'decoder.block.1.layer.1.EncDecAttention.q.weight', 'encoder.block.0.layer.0.SelfAttention.relative_attention_bias.weight', 'encoder.block.0.layer.1.DenseReluDense.wo.weight', 'decoder.block.0.layer.2.DenseReluDense.wo.weight', 'decoder.block.1.layer.2.layer_norm.weight', 'decoder.block.1.layer.0.layer_norm.weight', 'encoder.block.0.layer.0.layer_norm.weight', 'encoder.block.0.layer.1.layer_norm.weight', 'decoder.block.0.layer.1.layer_norm.weight', 'decoder.block.0.layer.2.layer_norm.weight', 'encoder.block.1.layer.1.layer_norm.weight', 'decoder.block.1.layer.1.EncDecAttention.k.weight', 'encoder.block.0.layer.0.SelfAttention.v.weight', 'decoder.block.0.layer.1.EncDecAttention.k.weight', 'shared.weight', 'encoder.embed_tokens.weight', 'encoder.block.0.layer.0.SelfAttention.o.weight', 'decoder.block.0.layer.0.SelfAttention.v.weight', 'decoder.block.1.layer.1.EncDecAttention.v.weight', 'decoder.block.0.layer.0.SelfAttention.q.weight', 'encoder.block.1.layer.0.SelfAttention.q.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Some weights of T5ForSequenceClassification were not initialized from the model checkpoint at /tmp/tmp24kk_52d and are newly initialized: ['encoder.block.1.layer.0.SelfAttention.v.weight', 'encoder.block.0.layer.0.SelfAttention.q.weight', 'decoder.block.1.layer.1.layer_norm.weight', 'decoder.block.1.layer.2.DenseReluDense.wo.weight', 'encoder.final_layer_norm.weight', 'encoder.block.0.layer.1.DenseReluDense.wi.weight', 'decoder.block.1.layer.1.EncDecAttention.o.weight', 'decoder.block.0.layer.1.EncDecAttention.q.weight', 'decoder.block.0.layer.0.SelfAttention.k.weight', 'decoder.embed_tokens.weight', 'encoder.block.1.layer.0.layer_norm.weight', 'decoder.block.0.layer.2.DenseReluDense.wi.weight', 'decoder.block.1.layer.0.SelfAttention.k.weight', 'decoder.block.0.layer.0.layer_norm.weight', 'decoder.final_layer_norm.weight', 'decoder.block.1.layer.0.SelfAttention.q.weight', 'encoder.block.1.layer.0.SelfAttention.o.weight', 'decoder.block.1.layer.2.DenseReluDense.wi.weight', 'classification_head.dense.bias', 'encoder.block.1.layer.0.SelfAttention.k.weight', 'decoder.block.0.layer.0.SelfAttention.o.weight', 'decoder.block.0.layer.1.EncDecAttention.v.weight', 'classification_head.out_proj.bias', 'decoder.block.0.layer.1.EncDecAttention.o.weight', 'encoder.block.0.layer.0.SelfAttention.k.weight', 'decoder.block.1.layer.0.SelfAttention.v.weight', 'encoder.block.1.layer.1.DenseReluDense.wi.weight', 'decoder.block.0.layer.0.SelfAttention.relative_attention_bias.weight', 'decoder.block.1.layer.0.SelfAttention.o.weight', 'encoder.block.1.layer.1.DenseReluDense.wo.weight', 'decoder.block.1.layer.1.EncDecAttention.q.weight', 'encoder.block.0.layer.0.SelfAttention.relative_attention_bias.weight', 'encoder.block.0.layer.1.DenseReluDense.wo.weight', 'decoder.block.0.layer.2.DenseReluDense.wo.weight', 'classification_head.out_proj.weight', 'decoder.block.1.layer.2.layer_norm.weight', 'decoder.block.1.layer.0.layer_norm.weight', 'encoder.block.0.layer.0.layer_norm.weight', 'encoder.block.0.layer.1.layer_norm.weight', 'decoder.block.0.layer.1.layer_norm.weight', 'decoder.block.0.layer.2.layer_norm.weight', 'encoder.block.1.layer.1.layer_norm.weight', 'decoder.block.1.layer.1.EncDecAttention.k.weight', 'encoder.block.0.layer.0.SelfAttention.v.weight', 'decoder.block.0.layer.1.EncDecAttention.k.weight', 'shared.weight', 'encoder.embed_tokens.weight', 'classification_head.dense.weight', 'encoder.block.0.layer.0.SelfAttention.o.weight', 'decoder.block.0.layer.0.SelfAttention.v.weight', 'decoder.block.1.layer.1.EncDecAttention.v.weight', 'decoder.block.0.layer.0.SelfAttention.q.weight', 'encoder.block.1.layer.0.SelfAttention.q.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Some weights of T5ForQuestionAnswering were not initialized from the model checkpoint at /tmp/tmp5lyes6p6 and are newly initialized: ['encoder.block.1.layer.0.SelfAttention.v.weight', 'encoder.block.0.layer.0.SelfAttention.q.weight', 'decoder.block.1.layer.1.layer_norm.weight', 'decoder.block.1.layer.2.DenseReluDense.wo.weight', 'encoder.final_layer_norm.weight', 'encoder.block.0.layer.1.DenseReluDense.wi.weight', 'decoder.block.1.layer.1.EncDecAttention.o.weight', 'decoder.block.0.layer.1.EncDecAttention.q.weight', 'decoder.block.0.layer.0.SelfAttention.k.weight', 'decoder.embed_tokens.weight', 'encoder.block.1.layer.0.layer_norm.weight', 'decoder.block.0.layer.2.DenseReluDense.wi.weight', 'decoder.block.1.layer.0.SelfAttention.k.weight', 'decoder.block.0.layer.0.layer_norm.weight', 'qa_outputs.bias', 'decoder.final_layer_norm.weight', 'decoder.block.1.layer.0.SelfAttention.q.weight', 'encoder.block.1.layer.0.SelfAttention.o.weight', 'decoder.block.1.layer.2.DenseReluDense.wi.weight', 'encoder.block.1.layer.0.SelfAttention.k.weight', 'decoder.block.0.layer.0.SelfAttention.o.weight', 'decoder.block.0.layer.1.EncDecAttention.v.weight', 'decoder.block.0.layer.1.EncDecAttention.o.weight', 'encoder.block.0.layer.0.SelfAttention.k.weight', 'decoder.block.1.layer.0.SelfAttention.v.weight', 'encoder.block.1.layer.1.DenseReluDense.wi.weight', 'decoder.block.0.layer.0.SelfAttention.relative_attention_bias.weight', 'decoder.block.1.layer.0.SelfAttention.o.weight', 'encoder.block.1.layer.1.DenseReluDense.wo.weight', 'decoder.block.1.layer.1.EncDecAttention.q.weight', 'encoder.block.0.layer.0.SelfAttention.relative_attention_bias.weight', 'encoder.block.0.layer.1.DenseReluDense.wo.weight', 'decoder.block.0.layer.2.DenseReluDense.wo.weight', 'decoder.block.1.layer.2.layer_norm.weight', 'decoder.block.1.layer.0.layer_norm.weight', 'encoder.block.0.layer.0.layer_norm.weight', 'encoder.block.0.layer.1.layer_norm.weight', 'decoder.block.0.layer.1.layer_norm.weight', 'decoder.block.0.layer.2.layer_norm.weight', 'encoder.block.1.layer.1.layer_norm.weight', 'decoder.block.1.layer.1.EncDecAttention.k.weight', 'encoder.block.0.layer.0.SelfAttention.v.weight', 'decoder.block.0.layer.1.EncDecAttention.k.weight', 'shared.weight', 'encoder.embed_tokens.weight', 'qa_outputs.weight', 'encoder.block.0.layer.0.SelfAttention.o.weight', 'decoder.block.0.layer.0.SelfAttention.v.weight', 'decoder.block.1.layer.1.EncDecAttention.v.weight', 'decoder.block.0.layer.0.SelfAttention.q.weight', 'encoder.block.1.layer.0.SelfAttention.q.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
PASSED
test_modeling_t5.py::T5ModelTest::test_multi_gpu_data_parallel_forward SKIPPED (test requires multiple GPUs)
test_modeling_t5.py::T5ModelTest::test_past_key_values_format PASSED
test_modeling_t5.py::T5ModelTest::test_problem_types PASSED
test_modeling_t5.py::T5ModelTest::test_resize_embeddings_untied PASSED
test_modeling_t5.py::T5ModelTest::test_resize_position_vector_embeddings PASSED
test_modeling_t5.py::T5ModelTest::test_resize_tokens_embeddings SKIPPED (Skipped for Gaudi)
test_modeling_t5.py::T5ModelTest::test_retain_grad_hidden_states_attentions PASSED
test_modeling_t5.py::T5ModelTest::test_sample_generate PASSED
test_modeling_t5.py::T5ModelTest::test_sample_generate_dict_output PASSED
test_modeling_t5.py::T5ModelTest::test_save_load PASSED
test_modeling_t5.py::T5ModelTest::test_save_load_fast_init_from_base Some weights of CopyClass were not initialized from the model checkpoint at /tmp/tmp_k74wbhw and are newly initialized: ['classification_head.out_proj.weight', 'classification_head.dense.weight', 'classification_head.dense.bias', 'classification_head.out_proj.bias', 'decoder.block.0.layer.2.layer_norm.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Some weights of CopyClass were not initialized from the model checkpoint at /tmp/tmp_k74wbhw and are newly initialized: ['classification_head.out_proj.weight', 'classification_head.dense.weight', 'classification_head.dense.bias', 'classification_head.out_proj.bias', 'decoder.block.0.layer.2.layer_norm.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Some weights of CopyClass were not initialized from the model checkpoint at /tmp/tmpectzs0s6 and are newly initialized: ['decoder.block.0.layer.1.layer_norm.weight', 'qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Some weights of CopyClass were not initialized from the model checkpoint at /tmp/tmpectzs0s6 and are newly initialized: ['decoder.block.0.layer.1.layer_norm.weight', 'qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
PASSED
test_modeling_t5.py::T5ModelTest::test_save_load_fast_init_to_base Some weights of CopyClass were not initialized from the model checkpoint at /tmp/tmpc7s_4xpb and are newly initialized: ['transformer.encoder.block.0.layer.0.SelfAttention.q.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Some weights of CopyClass were not initialized from the model checkpoint at /tmp/tmpc7s_4xpb and are newly initialized: ['transformer.encoder.block.0.layer.0.SelfAttention.q.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Some weights of CopyClass were not initialized from the model checkpoint at /tmp/tmp37rnccg_ and are newly initialized: ['decoder.block.0.layer.0.SelfAttention.o.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Some weights of CopyClass were not initialized from the model checkpoint at /tmp/tmp37rnccg_ and are newly initialized: ['decoder.block.0.layer.0.SelfAttention.o.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
PASSED
test_modeling_t5.py::T5ModelTest::test_save_load_keys_to_ignore_on_save PASSED
test_modeling_t5.py::T5ModelTest::test_shift_right PASSED
test_modeling_t5.py::T5ModelTest::test_tie_model_weights PASSED
test_modeling_t5.py::T5ModelTest::test_tied_weights_keys PASSED
test_modeling_t5.py::T5ModelTest::test_torch_fx PASSED
test_modeling_t5.py::T5ModelTest::test_torch_fx_output_loss PASSED
test_modeling_t5.py::T5ModelTest::test_torchscript_output_attentions SKIPPED (test is slow)
test_modeling_t5.py::T5ModelTest::test_torchscript_output_hidden_state SKIPPED (test is slow)
test_modeling_t5.py::T5ModelTest::test_torchscript_simple SKIPPED (test is slow)
test_modeling_t5.py::T5ModelTest::test_training SKIPPED (Skipped for Gaudi : TODO)
test_modeling_t5.py::T5ModelTest::test_training_gradient_checkpointing SKIPPED (Skipped for Gaudi : TODO)
test_modeling_t5.py::T5ModelTest::test_v1_1_resize_embeddings PASSED
test_modeling_t5.py::T5ModelTest::test_with_lm_head PASSED
test_modeling_t5.py::T5ModelTest::test_with_sequence_classification_head PASSED
test_modeling_t5.py::T5EncoderOnlyModelTest::test_attention_outputs PASSED
test_modeling_t5.py::T5EncoderOnlyModelTest::test_can_use_safetensors PASSED
test_modeling_t5.py::T5EncoderOnlyModelTest::test_config PASSED
test_modeling_t5.py::T5EncoderOnlyModelTest::test_correct_missing_keys PASSED
test_modeling_t5.py::T5EncoderOnlyModelTest::test_cpu_offload SKIPPED (test requires CUDA)
test_modeling_t5.py::T5EncoderOnlyModelTest::test_determinism PASSED
test_modeling_t5.py::T5EncoderOnlyModelTest::test_disk_offload SKIPPED (test requires CUDA)
test_modeling_t5.py::T5EncoderOnlyModelTest::test_feed_forward_chunking PASSED
test_modeling_t5.py::T5EncoderOnlyModelTest::test_forward_signature PASSED
test_modeling_t5.py::T5EncoderOnlyModelTest::test_from_pretrained_no_checkpoint PASSED
test_modeling_t5.py::T5EncoderOnlyModelTest::test_gradient_checkpointing_backward_compatibility PASSED
test_modeling_t5.py::T5EncoderOnlyModelTest::test_gradient_checkpointing_enable_disable PASSED
test_modeling_t5.py::T5EncoderOnlyModelTest::test_head_pruning PASSED
test_modeling_t5.py::T5EncoderOnlyModelTest::test_head_pruning_integration PASSED
test_modeling_t5.py::T5EncoderOnlyModelTest::test_head_pruning_save_load_from_config_init PASSED
test_modeling_t5.py::T5EncoderOnlyModelTest::test_head_pruning_save_load_from_pretrained PASSED
test_modeling_t5.py::T5EncoderOnlyModelTest::test_headmasking PASSED
test_modeling_t5.py::T5EncoderOnlyModelTest::test_hidden_states_output PASSED
test_modeling_t5.py::T5EncoderOnlyModelTest::test_initialization PASSED
test_modeling_t5.py::T5EncoderOnlyModelTest::test_inputs_embeds PASSED
test_modeling_t5.py::T5EncoderOnlyModelTest::test_load_save_without_tied_weights PASSED
test_modeling_t5.py::T5EncoderOnlyModelTest::test_load_with_mismatched_shapes PASSED
test_modeling_t5.py::T5EncoderOnlyModelTest::test_model PASSED
test_modeling_t5.py::T5EncoderOnlyModelTest::test_model_common_attributes PASSED
test_modeling_t5.py::T5EncoderOnlyModelTest::test_model_is_small PASSED
test_modeling_t5.py::T5EncoderOnlyModelTest::test_model_main_input_name PASSED
test_modeling_t5.py::T5EncoderOnlyModelTest::test_model_outputs_equivalence PASSED
test_modeling_t5.py::T5EncoderOnlyModelTest::test_model_parallel_beam_search SKIPPED (test requires multiple GPUs)
test_modeling_t5.py::T5EncoderOnlyModelTest::test_model_parallel_equal_results SKIPPED (test requires multiple GPUs)
test_modeling_t5.py::T5EncoderOnlyModelTest::test_model_parallelism SKIPPED (test requires multiple GPUs)
test_modeling_t5.py::T5EncoderOnlyModelTest::test_model_parallelization SKIPPED (test requires multiple GPUs)
test_modeling_t5.py::T5EncoderOnlyModelTest::test_model_weights_reload_no_missing_tied_weights Some weights of T5EncoderModel were not initialized from the model checkpoint at /tmp/tmp7n8sg07g and are newly initialized: ['encoder.block.1.layer.1.DenseReluDense.wo.weight', 'encoder.block.0.layer.0.SelfAttention.relative_attention_bias.weight', 'encoder.block.1.layer.0.SelfAttention.v.weight', 'encoder.block.0.layer.1.DenseReluDense.wo.weight', 'encoder.final_layer_norm.weight', 'encoder.block.0.layer.1.DenseReluDense.wi.weight', 'encoder.block.1.layer.1.DenseReluDense.wi.weight', 'encoder.block.0.layer.0.layer_norm.weight', 'encoder.block.0.layer.1.layer_norm.weight', 'encoder.block.1.layer.1.layer_norm.weight', 'encoder.block.1.layer.0.layer_norm.weight', 'encoder.block.0.layer.0.SelfAttention.v.weight', 'shared.weight', 'encoder.block.1.layer.0.SelfAttention.q.weight', 'encoder.embed_tokens.weight', 'encoder.block.1.layer.0.SelfAttention.o.weight', 'encoder.block.0.layer.0.SelfAttention.o.weight', 'encoder.block.1.layer.0.SelfAttention.k.weight', 'encoder.block.0.layer.0.SelfAttention.k.weight', 'encoder.block.0.layer.0.SelfAttention.q.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
PASSED
test_modeling_t5.py::T5EncoderOnlyModelTest::test_multi_gpu_data_parallel_forward SKIPPED (test requires multiple GPUs)
test_modeling_t5.py::T5EncoderOnlyModelTest::test_problem_types PASSED
test_modeling_t5.py::T5EncoderOnlyModelTest::test_resize_embeddings_untied PASSED
test_modeling_t5.py::T5EncoderOnlyModelTest::test_resize_position_vector_embeddings PASSED
test_modeling_t5.py::T5EncoderOnlyModelTest::test_resize_tokens_embeddings SKIPPED (Skipped for Gaudi)
test_modeling_t5.py::T5EncoderOnlyModelTest::test_retain_grad_hidden_states_attentions PASSED
test_modeling_t5.py::T5EncoderOnlyModelTest::test_save_load PASSED
test_modeling_t5.py::T5EncoderOnlyModelTest::test_save_load_fast_init_from_base PASSED
test_modeling_t5.py::T5EncoderOnlyModelTest::test_save_load_fast_init_to_base Some weights of CopyClass were not initialized from the model checkpoint at /tmp/tmpgu9tx7sa and are newly initialized: ['decoder.block.1.layer.1.layer_norm.weight', 'decoder.block.1.layer.2.DenseReluDense.wo.weight', 'encoder.block.0.layer.1.DenseReluDense.wi.weight', 'decoder.block.1.layer.1.EncDecAttention.o.weight', 'decoder.block.0.layer.1.EncDecAttention.q.weight', 'decoder.block.0.layer.0.SelfAttention.k.weight', 'decoder.block.1.layer.0.SelfAttention.k.weight', 'decoder.block.0.layer.2.DenseReluDense.wi.weight', 'decoder.block.0.layer.0.layer_norm.weight', 'decoder.final_layer_norm.weight', 'decoder.block.1.layer.0.SelfAttention.q.weight', 'decoder.block.1.layer.2.DenseReluDense.wi.weight', 'decoder.block.0.layer.0.SelfAttention.o.weight', 'decoder.block.0.layer.1.EncDecAttention.v.weight', 'decoder.block.0.layer.1.EncDecAttention.o.weight', 'decoder.block.1.layer.0.SelfAttention.v.weight', 'decoder.block.0.layer.0.SelfAttention.relative_attention_bias.weight', 'decoder.block.1.layer.0.SelfAttention.o.weight', 'decoder.block.1.layer.1.EncDecAttention.q.weight', 'decoder.block.0.layer.2.DenseReluDense.wo.weight', 'decoder.block.1.layer.2.layer_norm.weight', 'decoder.block.1.layer.0.layer_norm.weight', 'decoder.block.0.layer.1.layer_norm.weight', 'decoder.block.0.layer.2.layer_norm.weight', 'decoder.block.1.layer.1.EncDecAttention.k.weight', 'decoder.block.0.layer.1.EncDecAttention.k.weight', 'decoder.block.0.layer.0.SelfAttention.v.weight', 'decoder.block.1.layer.1.EncDecAttention.v.weight', 'decoder.block.0.layer.0.SelfAttention.q.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Some weights of CopyClass were not initialized from the model checkpoint at /tmp/tmpgu9tx7sa and are newly initialized: ['decoder.block.1.layer.1.layer_norm.weight', 'decoder.block.1.layer.2.DenseReluDense.wo.weight', 'encoder.block.0.layer.1.DenseReluDense.wi.weight', 'decoder.block.1.layer.1.EncDecAttention.o.weight', 'decoder.block.0.layer.1.EncDecAttention.q.weight', 'decoder.block.0.layer.0.SelfAttention.k.weight', 'decoder.block.1.layer.0.SelfAttention.k.weight', 'decoder.block.0.layer.2.DenseReluDense.wi.weight', 'decoder.block.0.layer.0.layer_norm.weight', 'decoder.final_layer_norm.weight', 'decoder.block.1.layer.0.SelfAttention.q.weight', 'decoder.block.1.layer.2.DenseReluDense.wi.weight', 'decoder.block.0.layer.0.SelfAttention.o.weight', 'decoder.block.0.layer.1.EncDecAttention.v.weight', 'decoder.block.0.layer.1.EncDecAttention.o.weight', 'decoder.block.1.layer.0.SelfAttention.v.weight', 'decoder.block.0.layer.0.SelfAttention.relative_attention_bias.weight', 'decoder.block.1.layer.0.SelfAttention.o.weight', 'decoder.block.1.layer.1.EncDecAttention.q.weight', 'decoder.block.0.layer.2.DenseReluDense.wo.weight', 'decoder.block.1.layer.2.layer_norm.weight', 'decoder.block.1.layer.0.layer_norm.weight', 'decoder.block.0.layer.1.layer_norm.weight', 'decoder.block.0.layer.2.layer_norm.weight', 'decoder.block.1.layer.1.EncDecAttention.k.weight', 'decoder.block.0.layer.1.EncDecAttention.k.weight', 'decoder.block.0.layer.0.SelfAttention.v.weight', 'decoder.block.1.layer.1.EncDecAttention.v.weight', 'decoder.block.0.layer.0.SelfAttention.q.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
PASSED
test_modeling_t5.py::T5EncoderOnlyModelTest::test_save_load_keys_to_ignore_on_save PASSED
test_modeling_t5.py::T5EncoderOnlyModelTest::test_tie_model_weights PASSED
test_modeling_t5.py::T5EncoderOnlyModelTest::test_tied_weights_keys PASSED
    test_modeling_t5.py::T5EncoderOnlyModelTest::test_torch_fx PASSED
    test_modeling_t5.py::T5EncoderOnlyModelTest::test_torch_fx_output_loss PASSED
    test_modeling_t5.py::T5EncoderOnlyModelTest::test_torchscript_output_attentions SKIPPED (test is slow)
    test_modeling_t5.py::T5EncoderOnlyModelTest::test_torchscript_output_hidden_state SKIPPED (test is slow)
    test_modeling_t5.py::T5EncoderOnlyModelTest::test_torchscript_simple SKIPPED (test is slow)
    test_modeling_t5.py::T5EncoderOnlyModelTest::test_training SKIPPED (Skipped for Gaudi : TODO)
    test_modeling_t5.py::T5EncoderOnlyModelTest::test_training_gradient_checkpointing SKIPPED (Skipped for Gaudi : TODO)
    test_modeling_t5.py::T5ModelIntegrationTests::test_contrastive_search_t5 SKIPPED (test is slow)
    test_modeling_t5.py::T5ModelIntegrationTests::test_small_byt5_integration_test SKIPPED (test is slow)
    test_modeling_t5.py::T5ModelIntegrationTests::test_small_generation SKIPPED (test is slow)
    test_modeling_t5.py::T5ModelIntegrationTests::test_small_integration_test SKIPPED (test is slow)
    test_modeling_t5.py::T5ModelIntegrationTests::test_small_v1_1_integration_test SKIPPED (test is slow)
    test_modeling_t5.py::T5ModelIntegrationTests::test_summarization SKIPPED (test is slow)
    test_modeling_t5.py::T5ModelIntegrationTests::test_torch_quant SKIPPED (test is slow)
    test_modeling_t5.py::T5ModelIntegrationTests::test_translation_en_to_de SKIPPED (test is slow)
    test_modeling_t5.py::T5ModelIntegrationTests::test_translation_en_to_fr SKIPPED (test is slow)
    test_modeling_t5.py::T5ModelIntegrationTests::test_translation_en_to_ro SKIPPED (test is slow)
    test_modeling_t5.py::TestAsymmetricT5::test_defaulting_to_symmetry PASSED
    test_modeling_t5.py::TestAsymmetricT5::test_small_decoder PASSED

======================================================================================= warnings summary =======================================================================================
../../test_modeling_common.py:2043
/home/anneog/github/ankurneog/optimum-habana/tests/transformers/tests/test_modeling_common.py:2043: PytestUnknownMarkWarning: Unknown pytest.mark.accelerate_tests - is this a typo? You can register custom marks to avoid this warning - for details, see https://docs.pytest.org/en/stable/how-to/mark.html
@mark.accelerate_tests

../../test_modeling_common.py:2081
/home/anneog/github/ankurneog/optimum-habana/tests/transformers/tests/test_modeling_common.py:2081: PytestUnknownMarkWarning: Unknown pytest.mark.accelerate_tests - is this a typo? You can register custom marks to avoid this warning - for details, see https://docs.pytest.org/en/stable/how-to/mark.html
@mark.accelerate_tests

../../test_modeling_common.py:2117
/home/anneog/github/ankurneog/optimum-habana/tests/transformers/tests/test_modeling_common.py:2117: PytestUnknownMarkWarning: Unknown pytest.mark.accelerate_tests - is this a typo? You can register custom marks to avoid this warning - for details, see https://docs.pytest.org/en/stable/how-to/mark.html
@mark.accelerate_tests

tests/transformers/tests/models/t5/test_modeling_t5.py::T5ModelTest::test_beam_search_generate
tests/transformers/tests/models/t5/test_modeling_t5.py::T5ModelTest::test_beam_search_generate_dict_output
tests/transformers/tests/models/t5/test_modeling_t5.py::T5ModelTest::test_beam_search_generate_dict_outputs_use_cache
/home/anneog/github/ankurneog/optimum-habana/optimum/habana/transformers/generation/utils.py:1921: UserWarning: max_length is deprecated in this function, use stopping_criteria=StoppingCriteriaList(MaxLengthCriteria(max_length=max_length)) instead.
warnings.warn(

tests/transformers/tests/models/t5/test_modeling_t5.py::T5ModelTest::test_beam_search_generate
tests/transformers/tests/models/t5/test_modeling_t5.py::T5ModelTest::test_constrained_beam_search_generate
tests/transformers/tests/models/t5/test_modeling_t5.py::T5ModelTest::test_contrastive_generate_low_memory
tests/transformers/tests/models/t5/test_modeling_t5.py::T5ModelTest::test_sample_generate
/home/anneog/github/ankurneog/optimum-habana/optimum/habana/transformers/generation/utils.py:430: UserWarning: You have modified the pretrained model configuration to control generation. This is a deprecated strategy to control generation and will be removed soon, in a future version. Please use a generation configuration file (see https://huggingface.co/docs/transformers/main_classes/text_generation )
warnings.warn(

tests/transformers/tests/models/t5/test_modeling_t5.py::T5ModelTest::test_constrained_beam_search_generate
tests/transformers/tests/models/t5/test_modeling_t5.py::T5ModelTest::test_constrained_beam_search_generate_dict_output
/home/anneog/github/ankurneog/optimum-habana/optimum/habana/transformers/generation/utils.py:2573: UserWarning: max_length is deprecated in this function, use stopping_criteria=StoppingCriteriaList(MaxLengthCriteria(max_length=max_length)) instead.
warnings.warn(

tests/transformers/tests/models/t5/test_modeling_t5.py::T5ModelTest::test_greedy_generate
tests/transformers/tests/models/t5/test_modeling_t5.py::T5ModelTest::test_greedy_generate_dict_outputs
tests/transformers/tests/models/t5/test_modeling_t5.py::T5ModelTest::test_greedy_generate_dict_outputs_use_cache
/home/anneog/github/ankurneog/optimum-habana/optimum/habana/transformers/generation/utils.py:1210: UserWarning: max_length is deprecated in this function, use stopping_criteria=StoppingCriteriaList([MaxLengthCriteria(max_length=max_length)]) instead.
warnings.warn(

tests/transformers/tests/models/t5/test_modeling_t5.py::T5ModelTest::test_sample_generate
tests/transformers/tests/models/t5/test_modeling_t5.py::T5ModelTest::test_sample_generate_dict_output
/home/anneog/github/ankurneog/optimum-habana/optimum/habana/transformers/generation/utils.py:1614: UserWarning: max_length is deprecated in this function, use stopping_criteria=StoppingCriteriaList(MaxLengthCriteria(max_length=max_length)) instead.
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=================================================================== 107 passed, 45 skipped, 17 warnings in 751.38s (0:12:31) ===================================================================
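The three PytestUnknownMarkWarning entries above do not affect the results. The warning text itself suggests fixing them by registering the custom mark. Below is a minimal sketch of doing that from a conftest.py. It assumes the file is placed where it covers test_modeling_common.py, and the marker description string is made up. pytest_configure is a standard pytest hook, and config.addinivalue_line("markers", ...) is the documented way to register a marker.

```python
# conftest.py (hypothetical placement, e.g. next to test_modeling_common.py)


def pytest_configure(config):
    """Register custom markers so pytest stops emitting PytestUnknownMarkWarning."""
    # Same format as the `markers` ini option: "<name>: <description>".
    config.addinivalue_line(
        "markers",
        "accelerate_tests: tests that exercise the accelerate integration (illustrative description)",
    )
```

An equivalent option is a `markers =` entry in the project's pytest ini/toml configuration.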

ssarkar2 requested a review from regisss on October 19, 2023 22:58
regisss (Collaborator) commented Oct 20, 2023

@ankurneog I took a closer look at these tests. For a given decoding strategy (e.g. beam search), they compare the output obtained by calling model.generate with the output obtained by directly calling the method associated with that strategy (e.g. model.beam_search). It is just a test to make sure that generate works properly.

The issue here is that the generate method does a lot of processing before calling beam_search (or any other decoding strategy), and, in particular, it sets the model kwargs:

# determine whether introduce trim_logits feature
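To make the failure mode concrete, here is a minimal, self-contained sketch. It does not use the real optimum-habana code: ToyModel, greedy_search_tolerant, _decode and the default value -1 are hypothetical stand-ins. Only the kwarg name bucket_size comes from the failing greedy tests above (the beam search and sampling tests fail the same way on limit_hpu_graphs). generate fills in the key before dispatching, so calling the decoding method directly with bare kwargs raises a KeyError. Reading the key with dict.get and a default is one defensive way to avoid that.

```python
from typing import Any, List


class ToyModel:
    """Hypothetical stand-in for a model whose decoding method reads an HPU-specific kwarg."""

    def generate(self, input_ids: List[int], **model_kwargs: Any) -> List[int]:
        # Mimics the preprocessing done by generate(): fill in the kwargs the
        # decoding method expects before dispatching (default value is made up).
        model_kwargs.setdefault("bucket_size", -1)
        return self.greedy_search(input_ids, **model_kwargs)

    def greedy_search(self, input_ids: List[int], **model_kwargs: Any) -> List[int]:
        # Strict access: raises KeyError when the caller bypassed generate().
        bucket_size = model_kwargs["bucket_size"]
        return self._decode(input_ids, bucket_size)

    def greedy_search_tolerant(self, input_ids: List[int], **model_kwargs: Any) -> List[int]:
        # Defensive access: fall back to the same default when the key is missing.
        bucket_size = model_kwargs.get("bucket_size", -1)
        return self._decode(input_ids, bucket_size)

    @staticmethod
    def _decode(input_ids: List[int], bucket_size: int) -> List[int]:
        # Dummy decoding step: ignores bucket_size and appends token 0.
        return input_ids + [0]


if __name__ == "__main__":
    model = ToyModel()
    ids = [1, 2, 3]
    print(model.generate(ids))  # [1, 2, 3, 0]
    try:
        model.greedy_search(ids)  # what the failing tests do: skip generate()
    except KeyError as err:
        print("KeyError:", err)  # KeyError: 'bucket_size'
    print(model.greedy_search_tolerant(ids) == model.generate(ids))  # True
```

Hardening every decoding method like this is only one option. Since users always go through generate, an alternative is to keep the decoding methods strict and have the tests pass the same kwargs that generate would have set.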

Now, users should never call beam_search directly; they will always call generate. Let's look at the test more closely:

ankurneog (Contributor, Author)

Thanks @regisss for the clear explanation. Please take a look at #482.

ankurneog closed this on Oct 24, 2023