Empty Results #28

Open
superctj opened this issue Jul 3, 2024 · 2 comments
Labels
bug Something isn't working

Comments

superctj commented Jul 3, 2024

Thank you for open-sourcing this package! I was wondering whether the following behavior is expected when running metacrafter scan-file --format short world+City.csv:

Processing file /data/bird_sql/train_csv/world+City.csv

2024-07-03 02:21:56,613 - root - DEBUG - Start processing None

2024-07-03 02:21:56,632 - root - DEBUG - Processing 1000 records of None

2024-07-03 02:21:56,651 - root - DEBUG - Processing 2000 records of None

2024-07-03 02:21:56,670 - root - DEBUG - Processing 3000 records of None

2024-07-03 02:21:56,689 - root - DEBUG - Processing 4000 records of None

No results

The first five lines of the CSV file are:

ID,Name,CountryCode,District,Population

1,Kabul,AFG,Kabol,1780000

2,Qandahar,AFG,Qandahar,237500

3,Herat,AFG,Herat,186800

4,Mazar-e-Sharif,AFG,Balkh,127800

I expected the CountryCode column to be recognized by metacrafter. Is there anything I am missing or doing wrong?

By the way, I found the message "Start processing None" confusing. It comes from this line, which sets fromfile to None. These debug messages could probably be made more informative.
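One way such messages could avoid printing "None" is sketched below. This is only an illustration, not metacrafter's actual code: the logger name scan_demo, the helpers source_label and log_progress, and the fallback text are all hypothetical.

```python
import logging

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("scan_demo")


def source_label(fromfile, fallback="<unnamed source>"):
    """Return a printable name for the data source instead of 'None'."""
    return str(fromfile) if fromfile else fallback


def log_progress(fromfile, count):
    """Log a progress message and return it so callers can inspect it."""
    message = "Processing %d records of %s" % (count, source_label(fromfile))
    logger.debug(message)
    return message
```

With a helper like this, a missing file name shows up as a readable placeholder rather than the literal string "None".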

Collaborator

ivbeg commented Jul 3, 2024

@superctj this happened because, for some reason, the identification rules were not installed with the package. Rules are YAML files that are loaded when the tool launches.
However, metacrafter uses the .metacrafter file to find rules if they are not in the package directory. You can configure it to point to the rules path in the repository https://github.com/apicrafter/metacrafter

For example, my .metacrafter file looks like this:

rulepath:
  - /home/ibegtin/reps/metacrafter/rules
  - /home/ibegtin/reps/metacrafter-rules/rules

and it's located in the home dir.
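A quick way to produce such a file and sanity-check the listed directories is sketched below. It is a minimal sketch under stated assumptions: the helper names render_config and count_rule_files are hypothetical, and it assumes rule files use the .yaml extension. It does not call metacrafter itself.

```python
from pathlib import Path


def render_config(rule_dirs):
    """Return .metacrafter text listing each directory under 'rulepath'."""
    lines = ["rulepath:"]
    lines += [f"  - {d}" for d in rule_dirs]
    return "\n".join(lines) + "\n"


def count_rule_files(rule_dirs):
    """Map each directory to the number of .yaml files found below it.

    Assumption: rules are stored as *.yaml files. A missing directory
    is reported with a count of 0.
    """
    counts = {}
    for d in rule_dirs:
        path = Path(d)
        counts[str(d)] = len(list(path.rglob("*.yaml"))) if path.is_dir() else 0
    return counts
```

The text returned by render_config can be saved as .metacrafter in the home directory, for example with (Path.home() / ".metacrafter").write_text(...). A count of 0 for a directory suggests the rules were not found at that path.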

The second rule path points to the metacrafter-rules repository https://github.com/apicrafter/metacrafter-rules
It is not yet a Python package, so you need to install it separately with the python setup.py install command, since some rules use additional Python code.

The final result for a CSV file built from these five lines should look like this:
[image]

I will take a deeper look at why the rules were not installed, and will probably switch to updating rules from the repository automatically on first launch.

As for the debug messages, you are right, they should be more polished. I will take a look at that too.

ivbeg self-assigned this on Jul 3, 2024
ivbeg added the bug (Something isn't working) label on Jul 3, 2024
Author

superctj commented Jul 3, 2024

Thank you @ivbeg for the quick response! Looking forward to the new release of the package.
