"ValueError: not enough values to unpack (expected 2, got 1)" Output when Executing the Model #31

Open
yagmurrozdemir opened this issue Mar 20, 2024 · 0 comments


yagmurrozdemir commented Mar 20, 2024

I get the output below when I run the model by executing the run.sh file you provided:

```
root@82167638a7b3:/paper_evaporate/evaporate/evaporate# bash run.sh
Data lake
Chunking files: 100%|████████████████████████| 100/100 [00:00<00:00, 516.76it/s]

Data-lake: fda_510ks, Train size: 10

Extracting purpose for submission (1 / 16)
For attribute purpose for submission
-- Starting with 1144 chunks
-- Ending with 100 chunks
-- 93 starting chunks in sample files
-- 10 chunks in sample files
Extracting attribute purpose for submission using LM: 100%|█| 10/10 [00:00<00:00
Generating functions for attribute purpose for submission: 100%|█| 10/10 [00:00<
Extraction fraction: 1.0
Top 10 scripts:
function_10; Score: {'average_f1': 1.0, 'median_f1': 1.0, 'extraction_fraction': 1.0, 'prior_average_f1': 1.0, 'prior_median_f1': 1.0}
function_1; Score: {'average_f1': 0.9000685871056241, 'median_f1': 1.0, 'extraction_fraction': 1.0, 'prior_average_f1': 0.9000685871056241, 'prior_median_f1': 1.0}
function_3; Score: {'average_f1': 0.9000685871056241, 'median_f1': 1.0, 'extraction_fraction': 1.0, 'prior_average_f1': 0.9000685871056241, 'prior_median_f1': 1.0}
function_5; Score: {'average_f1': 0.9000685871056241, 'median_f1': 1.0, 'extraction_fraction': 1.0, 'prior_average_f1': 0.9000685871056241, 'prior_median_f1': 1.0}
function_7; Score: {'average_f1': 0.9000685871056241, 'median_f1': 1.0, 'extraction_fraction': 1.0, 'prior_average_f1': 0.9000685871056241, 'prior_median_f1': 1.0}
function_9; Score: {'average_f1': 0.9000685871056241, 'median_f1': 1.0, 'extraction_fraction': 1.0, 'prior_average_f1': 0.9000685871056241, 'prior_median_f1': 1.0}
function_11; Score: {'average_f1': 0.9000685871056241, 'median_f1': 1.0, 'extraction_fraction': 1.0, 'prior_average_f1': 0.9000685871056241, 'prior_median_f1': 1.0}
function_15; Score: {'average_f1': 0.9000685871056241, 'median_f1': 1.0, 'extraction_fraction': 1.0, 'prior_average_f1': 0.9000685871056241, 'prior_median_f1': 1.0}
function_0; Score: {'average_f1': 0.8753193960511034, 'median_f1': 1.0, 'extraction_fraction': 1.0, 'prior_average_f1': 0.8753193960511034, 'prior_median_f1': 1.0}
function_13; Score: {'average_f1': 0.8753193960511034, 'median_f1': 1.0, 'extraction_fraction': 1.0, 'prior_average_f1': 0.8753193960511034, 'prior_median_f1': 1.0}
Best script overall: function_10; Score: {'average_f1': 1.0, 'median_f1': 1.0, 'extraction_fraction': 1.0, 'prior_average_f1': 1.0, 'prior_median_f1': 1.0}
Apply the scripts to the data lake and save the metadata. Taking the top 10 scripts per field.
Applying function function_10...
Applying function function_1...
Applying function function_3...
Applying function function_5...
Applying function function_7...
Applying function function_9...
Applying function function_11...
Applying function function_15...
Applying function function_0...
Applying function function_13...
Applying key function_10: 100%|███████████| 100/100 [00:00<00:00, 348364.12it/s]
Applying key function_1: 100%|█████████████| 100/100 [00:00<00:00, 21459.73it/s]
Applying key function_3: 100%|█████████████| 100/100 [00:00<00:00, 51577.77it/s]
Applying key function_5: 100%|█████████████| 100/100 [00:00<00:00, 52083.75it/s]
Applying key function_7: 100%|█████████████| 100/100 [00:00<00:00, 21242.36it/s]
Applying key function_9: 100%|█████████████| 100/100 [00:00<00:00, 50321.58it/s]
Applying key function_11: 100%|████████████| 100/100 [00:00<00:00, 64537.68it/s]
Applying key function_15: 100%|████████████| 100/100 [00:00<00:00, 21027.24it/s]
Applying key function_0: 100%|████████████| 100/100 [00:00<00:00, 580125.03it/s]
Applying key function_13: 100%|███████████| 100/100 [00:00<00:00, 788403.01it/s]
100%|████████████████████████████████████| 100/100 [00:00<00:00, 3153612.03it/s]
/usr/local/lib/python3.10/dist-packages/numpy/core/fromnumeric.py:3504: RuntimeWarning: Mean of empty slice.
  return _methods._mean(a, axis=axis, dtype=dtype,
/usr/local/lib/python3.10/dist-packages/numpy/core/_methods.py:129: RuntimeWarning: invalid value encountered in scalar divide
  ret = ret.dtype.type(ret / rcount)
Average abstains across documents: nan
Average unique votes per document: nan
Shape of test_votes: (0,)
Shape of test_votes: []
Traceback (most recent call last):
  File "/paper_evaporate/evaporate/evaporate/run_profiler.py", line 510, in <module>
    main()
  File "/paper_evaporate/evaporate/evaporate/run_profiler.py", line 506, in main
    run_experiment(profiler_args)
  File "/paper_evaporate/evaporate/evaporate/run_profiler.py", line 416, in run_experiment
    num_toks, success = run_profiler(
  File "/paper_evaporate/evaporate/evaporate/profiler.py", line 685, in run_profiler
    file2metadata, num_toks = combine_extractions(
  File "/paper_evaporate/evaporate/evaporate/profiler.py", line 159, in combine_extractions
    preds, used_deps, missing_files = run_ws(
  File "/paper_evaporate/evaporate/evaporate/./weak_supervision/run_ws.py", line 195, in run_ws
    n_test, m = test_votes.shape
ValueError: not enough values to unpack (expected 2, got 1)
Data lake
Chunking files: 100%|████████████████████████| 100/100 [00:00<00:00, 505.13it/s]

Data-lake: fda_510ks, Train size: 10
Chunks in sample file ./data/evaporate/fda-ai-pmas/510k/K150526.txt: 5
Chunks in sample file ./data/evaporate/fda-ai-pmas/510k/K151046.txt: 16
Chunks in sample file ./data/evaporate/fda-ai-pmas/510k/K180886.txt: 5
Chunks in sample file ./data/evaporate/fda-ai-pmas/510k/K181525.txt: 9
Chunks in sample file ./data/evaporate/fda-ai-pmas/510k/K151265.txt: 11
Chunks in sample file ./data/evaporate/fda-ai-pmas/510k/K171641.txt: 10
Chunks in sample file ./data/evaporate/fda-ai-pmas/510k/K182472.txt: 4
Chunks in sample file ./data/evaporate/fda-ai-pmas/510k/K161714.txt: 8
Chunks in sample file ./data/evaporate/fda-ai-pmas/510k/K162042.txt: 13
Chunks in sample file ./data/evaporate/fda-ai-pmas/510k/K170974.txt: 12
Directly extracting metadata from chunks: 100%|█| 10/10 [00:00<00:00, 50.24it/s]
@k = 16 --- Recall: 0.500, Precision: 0.500, F1: 0.500
@k = 1 --- Recall: 0.000, Precision: 0.000, F1: 0.000
@k = 5 --- Recall: 0.250, Precision: 0.800, F1: 0.381
@k = 10 --- Recall: 0.375, Precision: 0.600, F1: 0.462
@k = 15 --- Recall: 0.438, Precision: 0.467, F1: 0.452
@k = 20 --- Recall: 0.625, Precision: 0.500, F1: 0.556
@k = 25 --- Recall: 0.625, Precision: 0.500, F1: 0.556
@k = 30 --- Recall: 0.625, Precision: 0.500, F1: 0.556
@k = 35 --- Recall: 0.625, Precision: 0.500, F1: 0.556
@k = 40 --- Recall: 0.625, Precision: 0.500, F1: 0.556
@k = 45 --- Recall: 0.625, Precision: 0.500, F1: 0.556
@k = 50 --- Recall: 0.625, Precision: 0.500, F1: 0.556
@k = 100 --- Recall: 0.625, Precision: 0.500, F1: 0.556
@k = 20 --- Recall: 0.625, Precision: 0.500, F1: 0.556

Extracting 510(k) number (1 / 20)
For attribute 510(k) number
-- Starting with 1144 chunks
-- Ending with 174 chunks
-- 93 starting chunks in sample files
-- 18 chunks in sample files
Extracting attribute 510(k) number using LM: 100%|█| 10/10 [00:00<00:00, 283.58i
Generating functions for attribute 510(k) number: 100%|█| 10/10 [00:00<00:00, 12
Extraction fraction: 1.0
Top 10 scripts:
function_34; Score: {'average_f1': 0.9473684210526316, 'median_f1': 0.9473684210526316, 'extraction_fraction': 1.0, 'prior_average_f1': 0.09473684210526316, 'prior_median_f1': 0.0}
function_29; Score: {'average_f1': 0.7777777777777777, 'median_f1': 0.6666666666666666, 'extraction_fraction': 1.0, 'prior_average_f1': 0.2333333333333333, 'prior_median_f1': 0.0}
function_26; Score: {'average_f1': 0.6666666666666666, 'median_f1': 0.6666666666666666, 'extraction_fraction': 1.0, 'prior_average_f1': 0.13333333333333333, 'prior_median_f1': 0.0}
function_31; Score: {'average_f1': 0.6666666666666666, 'median_f1': 0.6666666666666666, 'extraction_fraction': 1.0, 'prior_average_f1': 0.13333333333333333, 'prior_median_f1': 0.0}
function_0; Score: {'average_f1': 0.6415151515151516, 'median_f1': 0.6666666666666666, 'extraction_fraction': 1.0, 'prior_average_f1': 0.6415151515151516, 'prior_median_f1': 0.6666666666666666}
function_28; Score: {'average_f1': 0.6415151515151516, 'median_f1': 0.6666666666666666, 'extraction_fraction': 1.0, 'prior_average_f1': 0.6415151515151516, 'prior_median_f1': 0.6666666666666666}
function_1; Score: {'average_f1': 0.6387205387205388, 'median_f1': 0.6666666666666666, 'extraction_fraction': 1.0, 'prior_average_f1': 0.5748484848484849, 'prior_median_f1': 0.6666666666666666}
function_2; Score: {'average_f1': 0.6387205387205388, 'median_f1': 0.6666666666666666, 'extraction_fraction': 1.0, 'prior_average_f1': 0.5748484848484849, 'prior_median_f1': 0.6666666666666666}
function_3; Score: {'average_f1': 0.6387205387205388, 'median_f1': 0.6666666666666666, 'extraction_fraction': 1.0, 'prior_average_f1': 0.5748484848484849, 'prior_median_f1': 0.6666666666666666}
function_6; Score: {'average_f1': 0.6387205387205388, 'median_f1': 0.6666666666666666, 'extraction_fraction': 1.0, 'prior_average_f1': 0.5748484848484849, 'prior_median_f1': 0.6666666666666666}
Best script overall: function_34; Score: {'average_f1': 0.9473684210526316, 'median_f1': 0.9473684210526316, 'extraction_fraction': 1.0, 'prior_average_f1': 0.09473684210526316, 'prior_median_f1': 0.0}
Apply the scripts to the data lake and save the metadata. Taking the top 10 scripts per field.
Applying function function_34...
Applying function function_29...
Applying function function_26...
Applying function function_31...
Applying function function_0...
Applying function function_28...
Applying function function_1...
Applying function function_2...
Applying function function_3...
Applying function function_6...
Applying key function_34: 100%|███████████| 100/100 [00:00<00:00, 480447.19it/s]
Applying key function_29: 100%|███████████| 100/100 [00:00<00:00, 353949.70it/s]
Applying key function_26: 100%|███████████| 100/100 [00:00<00:00, 465516.54it/s]
Applying key function_31: 100%|███████████| 100/100 [00:00<00:00, 522329.27it/s]
Applying key function_0: 100%|████████████| 100/100 [00:00<00:00, 323634.57it/s]
Applying key function_28: 100%|███████████| 100/100 [00:00<00:00, 309314.45it/s]
Applying key function_1: 100%|████████████| 100/100 [00:00<00:00, 565270.08it/s]
Applying key function_2: 100%|████████████| 100/100 [00:00<00:00, 326404.98it/s]
Applying key function_3: 100%|████████████| 100/100 [00:00<00:00, 423667.07it/s]
Applying key function_6: 100%|████████████| 100/100 [00:00<00:00, 292693.93it/s]
100%|█████████████████████████████████████| 100/100 [00:00<00:00, 961996.33it/s]
/usr/local/lib/python3.10/dist-packages/numpy/core/fromnumeric.py:3504: RuntimeWarning: Mean of empty slice.
  return _methods._mean(a, axis=axis, dtype=dtype,
/usr/local/lib/python3.10/dist-packages/numpy/core/_methods.py:129: RuntimeWarning: invalid value encountered in scalar divide
  ret = ret.dtype.type(ret / rcount)
Average abstains across documents: nan
Average unique votes per document: nan
Shape of test_votes: (0,)
Shape of test_votes: []
Traceback (most recent call last):
  File "/paper_evaporate/evaporate/evaporate/run_profiler.py", line 510, in <module>
    main()
  File "/paper_evaporate/evaporate/evaporate/run_profiler.py", line 506, in main
    run_experiment(profiler_args)
  File "/paper_evaporate/evaporate/evaporate/run_profiler.py", line 416, in run_experiment
    num_toks, success = run_profiler(
  File "/paper_evaporate/evaporate/evaporate/profiler.py", line 685, in run_profiler
    file2metadata, num_toks = combine_extractions(
  File "/paper_evaporate/evaporate/evaporate/profiler.py", line 159, in combine_extractions
    preds, used_deps, missing_files = run_ws(
  File "/paper_evaporate/evaporate/evaporate/./weak_supervision/run_ws.py", line 195, in run_ws
    n_test, m = test_votes.shape
ValueError: not enough values to unpack (expected 2, got 1)
```
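Looking at the log, the crash happens because run_ws.py unpacks `test_votes.shape` into two values while the array is empty and one-dimensional (`Shape of test_votes: (0,)`). The `RuntimeWarning: Mean of empty slice` messages and the `nan` averages point the same way: no votes reach the weak-supervision step at all. A minimal sketch of the failure mode, in plain NumPy rather than the repo's code:

```python
import numpy as np

# test_votes should be a 2-D (n_test, m) matrix with one column per
# extraction function. If no votes are collected it ends up empty and 1-D.
test_votes = np.array([])          # shape (0,), matching the log output

try:
    n_test, m = test_votes.shape   # needs a 2-tuple, gets a 1-tuple
except ValueError as e:
    print(e)                       # not enough values to unpack (expected 2, got 1)

# An empty but still 2-D matrix would unpack fine, so the inputs themselves
# (the per-function extractions) seem to be missing, not merely misshapen:
n_test, m = np.empty((0, 10)).shape  # OK: n_test == 0, m == 10
```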

Also, since the directory structure the code expects is not clearly documented, I arranged the directories based on my own interpretation and modified the arguments in run.sh as follows:

```
python3 run_profiler.py --data_lake fda_510ks --num_attr_to_cascade 50 --num_top_k_scripts 10 --train_size 10 --combiner_mode ws --use_dynamic_backoff --KEYS "$keys" --data_dir "./data/evaporate/fda-ai-pmas/510k/" --base_data_dir "./data/evaporate/data/fda_510ks" --gold_extractions_file "./data/evaporate/data/fda_510ks/table.json"

python3 run_profiler.py --data_lake fda_510ks --num_attr_to_cascade 50 --num_top_k_scripts 10 --train_size 10 --combiner_mode ws --use_dynamic_backoff --KEYS "$keys" --do_end_to_end --data_dir "./data/evaporate/fda-ai-pmas/510k/" --base_data_dir "./data/evaporate/data/fda_510ks" --gold_extractions_file "./data/evaporate/data/fda_510ks/table.json"
```
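Concretely, the layout I ended up with (reconstructed from the paths in the commands above; the .txt names are the sample files listed in the log) is:

```
./data/evaporate/
├── fda-ai-pmas/
│   └── 510k/
│       ├── K150526.txt
│       ├── K151046.txt
│       └── ...                # raw 510(k) documents (--data_dir)
└── data/
    └── fda_510ks/
        └── table.json         # gold extractions (--gold_extractions_file)
```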
Can you help me identify the problem here? I suspect it is related to the file paths (that is, the directory structure), but I have not been able to find the right layout despite trying every combination I could think of. Could you share the exact directory structure required so that the original run.sh file runs successfully as-is? Alternatively, could you tell me what I should change for my own structure? Thanks in advance.
