I'm impressed by the simplicity and accuracy of the GuessLang model. However, the model is built on the TensorFlow Estimator API, which is outdated and cumbersome to use from other languages. I want to update the GuessLang model to TF 2.0 with a better architecture, so it can be deployed from both Python and other languages, and hopefully reach a higher accuracy.
I got the error below when running GuessLangTools to download and prepare the dataset. How can I get around this problem? For example, can I manually add TOML repos, or skip this language?
Thanks
07:54:07 WARNING: Checking extensions: "h" is associated with more than one language: ['C', 'C++', 'Objective-C']
07:54:07 WARNING: Checking extensions: "hh" is associated with more than one language: ['C++', 'PHP']
07:54:07 WARNING: Checking extensions: "m" is associated with more than one language: ['Matlab', 'Objective-C']
07:54:07 WARNING: Checking extensions: "pl" is associated with more than one language: ['Perl', 'Prolog']
07:54:07 INFO: Found in the cache: /mnt/hdd/guesslangtools/dataset/01_repositories_dataset.tar.gz
07:54:07 INFO: Found in the cache: /mnt/hdd/guesslangtools/dataset/02_repositories_dataset.csv
07:54:07 INFO: Found in the cache: /mnt/hdd/guesslangtools/dataset/03_shrunk_repositories_dataset.csv
07:54:07 INFO: Found in the cache: /mnt/hdd/guesslangtools/dataset/04_altered_repositories_dataset.csv
07:54:07 INFO: Found in the cache: /mnt/hdd/guesslangtools/dataset/05_selected_repositories.csv
07:54:07 INFO: Found in the cache: /mnt/hdd/guesslangtools/dataset/06_prepare_repositories_to_download.csv
07:54:07 INFO: Found in the cache: /mnt/hdd/guesslangtools/dataset/07_downloaded_repositories.csv
07:54:07 INFO: Found in the cache: /mnt/hdd/guesslangtools/dataset/09_deduplicated_files.csv
07:54:07 INFO: Found in the cache: /mnt/hdd/guesslangtools/dataset/09_deduplicated_files.csv
07:54:07 INFO: Split repositories by usage: train, valid &test
07:54:07 INFO: This operation should take few seconds...
07:54:35 INFO: Total downloaded repositories: 272292
07:54:35 INFO: Assembly nb repositories, train: 4408, valid: 944, test: 944
07:54:35 INFO: Batchfile nb repositories, train: 3956, valid: 847, test: 847
07:54:35 INFO: C nb repositories, train: 4403, valid: 943, test: 943
07:54:35 INFO: C# nb repositories, train: 4297, valid: 920, test: 920
07:54:35 INFO: C++ nb repositories, train: 4401, valid: 943, test: 943
07:54:35 INFO: Clojure nb repositories, train: 4871, valid: 1043, test: 1043
07:54:35 INFO: CMake nb repositories, train: 4084, valid: 875, test: 875
07:54:35 INFO: COBOL nb repositories, train: 225, valid: 48, test: 48
07:54:35 INFO: CoffeeScript nb repositories, train: 4063, valid: 870, test: 870
07:54:35 INFO: CSS nb repositories, train: 3930, valid: 841, test: 841
07:54:35 INFO: CSV nb repositories, train: 2, valid: 1, test: 1
07:54:35 INFO: Dart nb repositories, train: 2825, valid: 605, test: 605
07:54:35 INFO: DM nb repositories, train: 269, valid: 57, test: 57
07:54:35 INFO: Dockerfile nb repositories, train: 1683, valid: 360, test: 360
07:54:35 INFO: Elixir nb repositories, train: 4065, valid: 871, test: 871
07:54:35 INFO: Erlang nb repositories, train: 3698, valid: 792, test: 792
07:54:35 INFO: Fortran nb repositories, train: 4411, valid: 945, test: 945
07:54:35 INFO: Go nb repositories, train: 4494, valid: 962, test: 962
07:54:35 INFO: Groovy nb repositories, train: 4561, valid: 977, test: 977
07:54:35 INFO: Haskell nb repositories, train: 4944, valid: 1059, test: 1059
07:54:35 INFO: HTML nb repositories, train: 4159, valid: 890, test: 890
07:54:35 INFO: INI nb repositories, train: 4, valid: 1, test: 1
07:54:35 INFO: Java nb repositories, train: 4427, valid: 948, test: 948
07:54:35 INFO: JavaScript nb repositories, train: 4378, valid: 937, test: 937
07:54:35 INFO: JSON nb repositories, train: 33, valid: 6, test: 6
07:54:35 INFO: Julia nb repositories, train: 3966, valid: 849, test: 849
07:54:35 INFO: Kotlin nb repositories, train: 4234, valid: 906, test: 906
07:54:35 INFO: Lisp nb repositories, train: 4518, valid: 968, test: 968
07:54:35 INFO: Lua nb repositories, train: 4183, valid: 895, test: 895
07:54:35 INFO: Makefile nb repositories, train: 3968, valid: 849, test: 849
07:54:35 INFO: Markdown nb repositories, train: 2393, valid: 512, test: 512
07:54:35 INFO: Matlab nb repositories, train: 4691, valid: 1005, test: 1005
07:54:35 INFO: Objective-C nb repositories, train: 4483, valid: 960, test: 960
07:54:35 INFO: OCaml nb repositories, train: 4441, valid: 951, test: 951
07:54:35 INFO: Pascal nb repositories, train: 4229, valid: 906, test: 906
07:54:35 INFO: Perl nb repositories, train: 4389, valid: 940, test: 940
07:54:36 INFO: PHP nb repositories, train: 3539, valid: 757, test: 757
07:54:36 INFO: PowerShell nb repositories, train: 3889, valid: 833, test: 833
07:54:36 INFO: Prolog nb repositories, train: 3443, valid: 737, test: 737
07:54:36 INFO: Python nb repositories, train: 4458, valid: 954, test: 954
07:54:36 INFO: R nb repositories, train: 4743, valid: 1016, test: 1016
07:54:36 INFO: Ruby nb repositories, train: 4421, valid: 947, test: 947
07:54:36 INFO: Rust nb repositories, train: 3771, valid: 808, test: 808
07:54:36 INFO: Scala nb repositories, train: 4289, valid: 918, test: 918
07:54:36 INFO: Shell nb repositories, train: 4138, valid: 886, test: 886
07:54:36 INFO: SQL nb repositories, train: 3069, valid: 657, test: 657
07:54:36 INFO: Swift nb repositories, train: 4310, valid: 923, test: 923
07:54:36 INFO: TeX nb repositories, train: 4545, valid: 973, test: 973
Traceback (most recent call last):
File "/home/cao/anaconda3/envs/onnx/bin/gltool", line 8, in <module>
sys.exit(main())
File "/home/cao/anaconda3/envs/onnx/lib/python3.7/site-packages/guesslangtools/__main__.py", line 153, in main
run_workflow(config)
File "/home/cao/anaconda3/envs/onnx/lib/python3.7/site-packages/guesslangtools/app.py", line 19, in run_workflow
source_files.split(config)
File "/home/cao/anaconda3/envs/onnx/lib/python3.7/site-packages/guesslangtools/common.py", line 175, in wrapped
result = func(config, *args, **kw)
File "/home/cao/anaconda3/envs/onnx/lib/python3.7/site-packages/guesslangtools/workflow/source_files.py", line 268, in split
f'Need more than {MIN_REPOSITORIES}, '
RuntimeError: Need more than 3, only 2 repositories usable for language TOML
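TOML is probably not the only language that is short on repositories: in the log above, CSV, INI and JSON also have very few. To see which languages sit close to the limit, the "nb repositories" lines can be scanned with a small script. This is only a sketch based on the log format shown here; `sparse_languages` and the threshold of 50 are hypothetical names and values chosen for illustration, not part of GuessLangTools.

```python
import re

# Matches log lines such as:
# "07:54:35 INFO: CSV nb repositories, train: 2, valid: 1, test: 1"
LINE_RE = re.compile(
    r"INFO: (?P<lang>.+?) nb repositories, "
    r"train: (?P<train>\d+), valid: (?P<valid>\d+), test: (?P<test>\d+)"
)


def sparse_languages(log_text, threshold=50):
    """Return {language: total repositories} for totals below threshold.

    The threshold is an arbitrary illustration value, not the
    MIN_REPOSITORIES constant used by GuessLangTools.
    """
    sparse = {}
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if match is None:
            continue  # Not a per-language summary line.
        total = sum(int(match[key]) for key in ("train", "valid", "test"))
        if total < threshold:
            sparse[match["lang"]] = total
    return sparse


# A few lines copied from the log above.
sample = """\
07:54:35 INFO: COBOL nb repositories, train: 225, valid: 48, test: 48
07:54:35 INFO: CSV nb repositories, train: 2, valid: 1, test: 1
07:54:35 INFO: INI nb repositories, train: 4, valid: 1, test: 1
07:54:35 INFO: JSON nb repositories, train: 33, valid: 6, test: 6
"""

print(sparse_languages(sample))
```

TOML itself never gets a summary line because the run aborts first, so it will not show up in the result; the script only flags the other languages that might hit the same check.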
Did you ever figure out a solution? I'm getting a different error, a UnicodeDecodeError in repositories_dataset.py, line 67, but I'm really just looking for an updated version of the model using TF 2 so that I can convert it to CoreML.