
Load knowledgebase from zipped json file of pd.DataFrame #238

Open
weixuanfu opened this issue Jan 16, 2020 · 5 comments

weixuanfu commented Jan 16, 2020


Instead of using a tsv file, we can use a pickle file (pre-generated from a pd.DataFrame) to load the results into AI. This is much faster because no eval step is needed to convert the parameter strings into Python dictionaries; the dictionaries can be pickled directly within the pd.DataFrame.

But there is one issue with using pickle for the regression knowledgebase: the pickle file is over 200 MB because of its large number of results, while the classification knowledgebase is only 8 MB.
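
For illustration, a minimal sketch of the two loading paths, assuming a `parameters` column of stringified dicts in the tsv (the file names and column name are placeholders, not the actual ones in the repo):

```python
import pandas as pd

# Current path (assumed layout): the tsv stores parameters as strings,
# so every row needs an eval() to turn the string back into a dict.
kb = pd.read_csv("regression_knowledgebase.tsv", sep="\t")
kb["parameters"] = kb["parameters"].apply(eval)

# Proposed path: pre-generate a pickle of the DataFrame whose "parameters"
# column already holds dict objects, then load it directly -- no eval needed.
kb.to_pickle("regression_knowledgebase.pkl")
kb = pd.read_pickle("regression_knowledgebase.pkl")
```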

@weixuanfu weixuanfu changed the title Load knowledgebase via pickle Load knowledgebase from pickle file of pd.DataFrame Jan 16, 2020
@weixuanfu weixuanfu changed the title Load knowledgebase from pickle file of pd.DataFrame Load knowledgebase from json file of pd.DataFrame Jan 23, 2020
weixuanfu (Contributor Author) commented:

I tried using a json file instead of the pickle file because of the large size. The regression knowledgebase in json format is ~30 MB.
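
A rough sketch of the json route (the paths and `orient` are illustrative choices, not necessarily what ends up in the repo):

```python
import pandas as pd

# Dump the pre-generated DataFrame to json and read it back; nested dicts in
# the "parameters" column survive the round trip without any eval step.
kb = pd.read_pickle("regression_knowledgebase.pkl")
kb.to_json("regression_knowledgebase.json", orient="records")
kb = pd.read_json("regression_knowledgebase.json", orient="records")
```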

@weixuanfu weixuanfu changed the title Load knowledgebase from json file of pd.DataFrame Load knowledgebase from zipped pickle file of pd.DataFrame Jan 23, 2020
@weixuanfu weixuanfu changed the title Load knowledgebase from zipped pickle file of pd.DataFrame Load knowledgebase from zipped json file of pd.DataFrame Jan 23, 2020
weixuanfu (Contributor Author) commented:

Hmm, actually, the gzipped pickle file is less than 20 MB while the gzipped tsv file is more than 30 MB, so I think we can add both options (json/pickle).
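
For reference, a quick way to reproduce the size comparison (paths are placeholders):

```python
import os

import pandas as pd

# Write the knowledgebase out in both gzipped formats and compare file sizes.
kb = pd.read_pickle("regression_knowledgebase.pkl")
kb.to_pickle("regression_knowledgebase.pkl.gz", compression="gzip")
kb.to_csv("regression_knowledgebase.tsv.gz", sep="\t", index=False, compression="gzip")

for path in ("regression_knowledgebase.pkl.gz", "regression_knowledgebase.tsv.gz"):
    print(path, round(os.path.getsize(path) / 1e6, 1), "MB")
```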


weixuanfu commented Jan 28, 2020

[screenshot: timing comparison of deduplication approaches]

The screenshot shows that drop_duplicates or DataFrame.apply without hashing is much faster, even though I added one more step to convert the frozensets back to dicts.
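
A minimal sketch of the frozenset trick, with a made-up DataFrame (the column names and parameter values are just for illustration):

```python
import pandas as pd

kb = pd.DataFrame({
    "algorithm": ["RandomForestRegressor", "RandomForestRegressor", "SVR"],
    "parameters": [{"n_estimators": 100}, {"n_estimators": 100}, {"C": 1.0}],
})

# dicts are unhashable, so drop_duplicates cannot compare them directly;
# frozenset(d.items()) gives a hashable, order-independent stand-in.
kb["parameters"] = kb["parameters"].apply(lambda d: frozenset(d.items()))
kb = kb.drop_duplicates()

# the extra step mentioned above: convert the frozensets back to dicts
kb["parameters"] = kb["parameters"].apply(dict)
```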


weixuanfu commented Jan 28, 2020

Hmm, the new solution above does not work once the classification knowledgebase is merged with the large regression knowledgebase.

I tried using a custom JSON encoder to dump the dicts into a json file, but pandas cannot read it back, so I kept the current permHash solution.
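
For context, a rough illustration of a parameter-hash approach; the actual permHash implementation in the codebase may well differ, and the column names below are assumed:

```python
import hashlib
import json

import pandas as pd

kb = pd.DataFrame({
    "algorithm": ["RandomForestRegressor", "RandomForestRegressor"],
    "parameters": [{"n_estimators": 100}, {"n_estimators": 100}],
})

def param_hash(params):
    # stable key: json with sorted keys, then a short digest
    canonical = json.dumps(params, sort_keys=True, default=str)
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()

# deduplicate on the algorithm plus the hash of its parameter dict
kb["paramHash"] = kb["parameters"].apply(param_hash)
kb = kb.drop_duplicates(subset=["algorithm", "paramHash"])
```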

weixuanfu (Contributor Author) commented:

I monitored the time usage: deduplicating the results is not very slow, taking only ~5 seconds on my PC. The step that updates AI with the regression knowledgebase took ~1 minute, which needs some improvement.
