Load knowledgebase from zipped json file of pd.DataFrame #238
Comments
I tried to use a JSON file instead of a pickle file due to the large size. The regression knowledgebase in JSON format is ~30 MB.
Hmm, actually, the gzipped pickle file is less than 20 MB while the gzipped TSV file is more than 30 MB, so I think we can add both options (JSON/pickle).
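The size comparison above can be reproduced with pandas' built-in gzip support. This is a hedged sketch with a toy DataFrame (the column names and contents are illustrative, not the actual knowledgebase schema):

```python
import os
import pandas as pd

# Toy stand-in for the knowledgebase: a DataFrame whose "parameters"
# column holds Python dicts, as described in the issue.
df = pd.DataFrame({
    "algorithm": ["RandomForestRegressor"] * 1000,
    "parameters": [{"n_estimators": i, "max_depth": 3} for i in range(1000)],
    "score": [0.5] * 1000,
})

# Gzipped pickle: dict cells survive the round-trip natively.
df.to_pickle("kb.pkl.gz", compression="gzip")

# Gzipped JSON: dict cells are serialized as nested JSON objects.
df.to_json("kb.json.gz", orient="records", compression="gzip")

print(os.path.getsize("kb.pkl.gz"), os.path.getsize("kb.json.gz"))

# Round-trip check: pickle preserves the dicts exactly.
back = pd.read_pickle("kb.pkl.gz")
```

Supporting both formats would then just be a matter of dispatching on the file extension at load time.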
Hmm, the new solution above stopped working once the classification knowledgebase was merged with the large regression knowledgebase. I tried using a new JSON encoder to dump the dict into a JSON file, but pandas cannot read it back. So I kept the current permHash solution.
I monitored the time usage of deduplicating the results; it is not very slow, taking only ~5 seconds on my PC.
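For context, a permHash-style deduplication can be sketched as hashing each parameter dict into a stable string and dropping duplicate rows on that key. This is an assumption about the approach (the actual permHash implementation may differ); `perm_hash` is a hypothetical helper:

```python
import hashlib
import json
import pandas as pd

# Toy DataFrame with a duplicated (algorithm, parameters) row.
df = pd.DataFrame({
    "algorithm": ["A", "A", "B"],
    "parameters": [{"x": 1}, {"x": 1}, {"x": 2}],
})

def perm_hash(params):
    # Serialize with sorted keys so equal dicts always hash identically.
    return hashlib.md5(json.dumps(params, sort_keys=True).encode()).hexdigest()

# Hash the unhashable dict column, dedupe on it, then drop the helper column.
df["_hash"] = df["parameters"].map(perm_hash)
deduped = df.drop_duplicates(subset=["algorithm", "_hash"]).drop(columns="_hash")
print(len(deduped))  # 2 unique (algorithm, parameters) rows remain
```

Hashing sidesteps the fact that dict cells are unhashable and cannot be passed to `drop_duplicates` directly.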
Instead of using a TSV file, we can use a pickle file (pre-generated from a pd.DataFrame) to load the results into AI. This way is very fast because no `eval` step is needed to convert the parameters into a Python dictionary; the dictionary format can be pickled within the pd.DataFrame. But there is one issue with using pickle on the regression knowledgebase: the pickle file is over 200 MB due to its large number of results, while the classification knowledgebase is only 8 MB.