Custom dataset functionality #42

Open
DmitryKey opened this issue Sep 30, 2021 · 2 comments

@DmitryKey
Contributor

I need to implement a custom dataset and its handling, and I have been thinking about the easiest way to approach it.

I've implemented a half-way solution that kept me going and allowed me to plug in a custom dataset -- in fact, it is a dataset derived from BIGANN by reducing its dimensionality with a neural network.

I will show the code for what I needed to change, and I am happy to discuss this further!

@maumueller
Collaborator

@DmitryKey I'm not sure where the actual dimensionality reduction happens in #43. It seems that you just needed to place the entry into datasets.py, which is the right approach.
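For illustration, registering a custom dataset in a DATASETS-style dictionary could look like the sketch below. The class name `ReducedBigANN` and the exact attributes are hypothetical, not the repository's actual code; the point is that an entry maps a recognized name to a factory that carries the dataset's own dtype and dimensionality.

```python
# Hypothetical sketch of a DATASETS-dict entry for a BIGANN-derived dataset
# whose vectors were reduced by an external neural network. Names and fields
# are illustrative, not the framework's exact API.

class ReducedBigANN:
    """A reduced-dimensionality dataset derived from BIGANN."""

    def __init__(self, name, nb_vectors, dims, dtype="float32"):
        self.name = name
        self.nb_vectors = nb_vectors
        self.dims = dims      # reduced dimensionality, lower than the original
        self.dtype = dtype    # may differ from the original BIGANN dtype (uint8)

# Registry mapping a recognized dataset name to a factory, in the spirit of
# the DATASETS dictionary in datasets.py.
DATASETS = {
    "reduced-bigann-10M": lambda: ReducedBigANN("reduced-bigann-10M",
                                                10_000_000, 64),
}

ds = DATASETS["reduced-bigann-10M"]()
print(ds.name, ds.dims, ds.dtype)
```

The factory indirection keeps dataset construction lazy, so listing available datasets does not trigger any I/O.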

What is the pipeline that you had in mind that could improve the process?

@DmitryKey
Contributor Author

DmitryKey commented Oct 5, 2021

@maumueller there is no implementation of the dimensionality-reduction step here -- it is done elsewhere, in a separate neural network built by a teammate.

In order to try this new, reduced-dimensionality dataset, I need to treat it as a seventh dataset, if that makes sense, because it has a different dtype than the original (non-reduced) dataset and a different (lower) number of dimensions.

So I was thinking that, in addition to changing datasets.py, I'd need to change the I/O, because my dataset can live somewhere else, such as on local disk or in blob storage.
Another issue I ran into was that I still had to give my dataset a name recognized by the framework -- ideally I would like to control this part as well, but by changing the DATASETS dictionary I don't see how that connects to the I/O, such as the dataset path.

DmitryKey added a commit to DmitryKey/big-ann-benchmarks that referenced this issue Oct 5, 2021