[REQUEST] dataloader accept information of 'selected fields' #236
Comments
@thfai2000 @leebyron I would like to contribute to this. Shall I go ahead?
@anweshknayak I'm glad that you looked into my problem. Thanks. Actually I've started brainstorming; let me share my thoughts. One way is to extend DataLoader, though it may not be efficient performance-wise. Below is my draft (in a rush). I've never run or tested it. XD

```js
class DataWithFieldsLoader extends DataLoader {
  // My terminology:
  // fieldSet = an array of fields, e.g. ['field1', 'field2', 'field3']
  // info = an object {key, fields} where fields is a fieldSet,
  //   e.g. {key: 'key1', fields: ['field1', 'field2']}
  // info carries the information of what fields are being selected from the database
  constructor(batchLoadFn, options = {}) {
    let actualBatchLoadFn = (infoArray) => {
      // spread infoArray and de-duplicate fields:
      // make an array of fields with unique field names
      let { keys, fields } = destructInfoArray(infoArray)
      return batchLoadFn(keys, fields)
    }
    let originalCacheKeyFn = options.cacheKeyFn || (key => key)
    let actualCacheKeyFn = info =>
      `${originalCacheKeyFn(info.key)}#${info.fields.join(',')}`
    super(actualBatchLoadFn, {
      ...options,
      cacheKeyFn: actualCacheKeyFn
    })
    // we need a map that records what fields a data value contains
    this.fieldMap = new Map()
    this.originalCacheKeyFn = originalCacheKeyFn
  }

  load(key, fields) {
    return this.loadMany([key], fields).then(([value]) => value)
  }

  loadMany(keys, fields) {
    let infoArray = constructInfoArray(keys, fields)
    infoArray = infoArray.map(info => {
      let fieldSetArray = this.fieldMap.get(this.originalCacheKeyFn(info.key))
      // e.g. fieldSetArray = [['field1', 'field3'], ['field1', 'field2'], ['field3', 'field4']]
      // it is the history of field retrievals for the object with that key;
      // we have to find the first fieldSet that
      // fulfils the basic requirement stated in the info object
      let foundFieldSet = null
      if (fieldSetArray) {
        foundFieldSet = fieldSetArray.find(set => includesAllFields(set, info.fields))
      }
      return {
        ...info,
        fields: foundFieldSet || info.fields
      }
    })
    return super.loadMany(infoArray).then(values => {
      // record the latest fieldSet into fieldMap
      // assumption: values.length === infoArray.length
      values.forEach((value, idx) => {
        let mapKey = this.originalCacheKeyFn(infoArray[idx].key)
        let fieldSetArray = this.fieldMap.get(mapKey)
        if (!fieldSetArray) {
          fieldSetArray = []
          this.fieldMap.set(mapKey, fieldSetArray)
        }
        // TODO: don't push the fieldSet if the same one already exists
        fieldSetArray.push(infoArray[idx].fields)
      })
      return values
    })
  }
}

function constructInfoArray(keys, fields) {
  // TODO
  // return [{key, fields}, {key, fields}]
}

function destructInfoArray(infoArray) {
  // TODO
  // return {keys, fields}
}

function includesAllFields(targetFieldSet, searchFields) {
  // TODO:
  // return true if targetFieldSet contains all of searchFields
}
```
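For the three TODO helpers above, a minimal sketch might look like this (untested, assuming the same `{key, fields}` shape used in the draft):

```javascript
// Sketch implementations of the three helpers left as TODOs above.

// Pair each key with the shared fields array.
function constructInfoArray(keys, fields = []) {
  return keys.map(key => ({ key, fields }));
}

// Split an array of {key, fields} objects back into a keys array
// and a single de-duplicated fields array for the batch query.
function destructInfoArray(infoArray) {
  const keys = infoArray.map(info => info.key);
  const fields = [...new Set(infoArray.flatMap(info => info.fields))];
  return { keys, fields };
}

// True when every field in searchFields is present in targetFieldSet.
function includesAllFields(targetFieldSet, searchFields) {
  return searchFields.every(f => targetFieldSet.includes(f));
}
```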
DataLoader is currently popular and widely used by many developers, so a big change to this standard would need careful consideration. I think it may be a good idea to make a POC first and run tests on some projects (e.g. my private web system) to examine the actual benefit. If the benefit is small but the change is big, it isn't worthwhile. I created a repository for this. Please feel free to edit or use it.
I don't think there is anything stopping you from doing this with the current API. You don't just have to pass IDs to the DataLoader; the key can be anything you want. So you could for example have this:

```js
carLoader.loadMany([
  { id: 1, fields: ["field1", "field2"] },
  { id: 2, fields: ["field1"] },
])
```

In your DataLoader's batch function you can map through the keys and merge any for the same IDs before retrieving.
So I think you have all the parts necessary to do this now. It just requires a bit more work than a normal DataLoader.
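That merge step could be sketched like this (`mergeKeys` is a hypothetical helper, not part of the DataLoader API): keys with the same id are collapsed and their field lists unioned before the database call.

```javascript
// Collapse duplicate ids in a batch of {id, fields} keys, taking the
// union of all requested fields for each id. A batch function could
// run this first and then issue a single query per distinct id.
function mergeKeys(keys) {
  const merged = new Map(); // id -> Set of field names
  for (const { id, fields = [] } of keys) {
    if (!merged.has(id)) merged.set(id, new Set());
    for (const f of fields) merged.get(id).add(f);
  }
  return [...merged.entries()].map(([id, fields]) => ({
    id,
    fields: [...fields]
  }));
}
```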
For those looking for a solution:

```ts
import DataLoader from "dataloader";
import { groupBy } from "lodash";

/**
 * -----------------
 * createFieldLoader
 * -----------------
 */
type KeyType = string | number;
type KeyAndFields<K, F> = { key: K; fields?: F[] };

function createFullHash(k: string, fieldHash: string) {
  return `${k}:${fieldHash}`;
}

function addHashes<K, F>(keyAndFields: KeyAndFields<K, F>) {
  const fieldSet = new Set(keyAndFields.fields);
  const fieldHash = "f:" + Array.from(fieldSet).sort().join(":");
  const fullHash = createFullHash(`${keyAndFields.key}`, fieldHash);
  return { ...keyAndFields, fieldHash, fullHash };
}

/**
 * A DataLoader that batches the loading of data for the same set of fields.
 * Requires sortIdKey to be passed to help it find the values to be sorted against.
 */
export function createFieldLoader<
  V,
  F extends keyof V,
  K extends KeyType = string
>(batchLoadFn: (keys: K[], fields?: F[]) => Promise<V[]>, sortIdKey: keyof V) {
  type LoadFnKey = { key: K; fields?: F[] };
  return new DataLoader(
    async (loadKeys: LoadFnKey[]) => {
      const loadKeysWithHash = loadKeys.map(k => addHashes(k));
      const store = new Map<string, V>();
      // 1. Form batches to load at once based on fieldHash. All ids with the
      //    same field hash should be loaded at the same time.
      const fieldHashGroups = groupBy(loadKeysWithHash, "fieldHash");
      await Promise.all(
        Object.entries(fieldHashGroups).map(async ([fieldHash, keysToLoad]) => {
          // 2. In each batch, load values based on the ids and fields.
          //    Make sure the id field is present here.
          const ids = keysToLoad.map(k => k.key);
          let fields = keysToLoad[0]?.fields;
          if (fields?.length) {
            fields = Array.from(new Set([...fields, sortIdKey as F])); // make sure we add the id field
          }
          const values = await batchLoadFn(ids, fields);
          // 3. Create the fullHash again for each returned value and put it in the store.
          values.forEach(v => {
            const fullHash = createFullHash(`${v[sortIdKey]}`, fieldHash);
            store.set(fullHash, v);
          });
        })
      );
      // 4. Get values based on the fullHash, in the same order as requested.
      return loadKeysWithHash.map(lk => store.get(lk.fullHash));
    },
    { cacheKeyFn: kf => addHashes(kf).fullHash }
  );
}
```
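A note on the design above: the cache key makes field order irrelevant, because the field list is de-duplicated and sorted before hashing. The same idea in plain JavaScript:

```javascript
// Sketch of the hashing scheme used by createFieldLoader above:
// de-duplicate and sort the field names so that the order in which a
// caller lists fields does not create a separate cache entry.
function fieldHash(fields = []) {
  return "f:" + [...new Set(fields)].sort().join(":");
}

function fullHash(key, fields) {
  return `${key}:${fieldHash(fields)}`;
}
```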
Test cases:

```ts
import { createFieldLoader } from "./dataLoader";

type Model = {
  id: string;
  title: string;
  eid: string;
  description: string;
};

const A: Model = { id: "a", title: "A", eid: "1", description: "A-Model" };
const B: Model = { id: "b", title: "B", eid: "2", description: "B-Model" };
const C: Model = { id: "c", title: "C", eid: "3", description: "C-Model" };
const D: Model = { id: "d", title: "D", eid: "4", description: "D-Model" };

class MockDataModel {
  static items = [D, B, A, C];

  static async getByIds(ids: string[]) {
    return MockDataModel.items.filter(item => ids.includes(item.id));
  }

  static async getByIdsWithFields(
    ids: Model["id"][],
    fields?: (keyof Model)[]
  ) {
    const items = await MockDataModel.getByIds(ids);
    if (!fields?.length) {
      return items;
    }
    return items.map(item => {
      const newItem: Partial<Model> = {};
      fields.forEach(f => (newItem[f] = item[f]));
      return newItem;
    });
  }

  static async getByExternalIds(eids: string[]) {
    return MockDataModel.items.filter(item => eids.includes(item.eid));
  }

  static async getByExternalIdsWithFields(
    eids: string[],
    fields?: (keyof Model)[]
  ) {
    const items = await MockDataModel.getByExternalIds(eids);
    if (!fields?.length) {
      return items;
    }
    return items.map(item => {
      const newItem: Partial<Model> = {};
      fields.forEach(f => (newItem[f] = item[f]));
      return newItem;
    });
  }
}

describe("createFieldLoader", () => {
  const fieldLoadFn = MockDataModel.getByIdsWithFields;

  it("should sort the items in the order asked for", async () => {
    const loader = createFieldLoader(fieldLoadFn, "id");
    const [a, b, c] = await Promise.all([
      loader.load({ key: A.id }),
      loader.load({ key: B.id }),
      loader.load({ key: C.id })
    ]);
    expect(a).toEqual(A);
    expect(b).toEqual(B);
    expect(c).toEqual(C);
  });

  it("should only load duplicate items once", async () => {
    const spyLoadFn = jest.fn(fieldLoadFn);
    const loader = createFieldLoader(spyLoadFn, "id");
    await Promise.all([
      loader.load({ key: A.id }),
      loader.load({ key: B.id }),
      loader.load({ key: B.id }),
      loader.load({ key: B.id }),
      loader.load({ key: B.id })
    ]);
    expect(spyLoadFn).toBeCalledWith(["a", "b"], undefined);
  });

  it("should return undefined for the items not existing", async () => {
    const loader = createFieldLoader(fieldLoadFn, "id");
    const [a, y, b, x] = await Promise.all([
      loader.load({ key: A.id }),
      loader.load({ key: "y" }),
      loader.load({ key: B.id }),
      loader.load({ key: "x" })
    ]);
    expect(a).toEqual(A);
    expect(b).toEqual(B);
    expect(x).toBeUndefined();
    expect(y).toBeUndefined();
  });

  it("should only load with only the fields requested", async () => {
    const loader = createFieldLoader(fieldLoadFn, "id");
    const [a1, a2, a3, a4, b] = await Promise.all([
      loader.load({ key: A.id, fields: ["id", "eid"] }),
      loader.load({ key: A.id, fields: ["id", "eid"] }),
      loader.load({ key: A.id, fields: ["description"] }),
      loader.load({ key: A.id }),
      loader.load({ key: B.id })
    ]);
    // should load everything for B since we didn't request any specific fields for it
    expect(b).toEqual(B);
    expect(a1).toEqual({ id: A.id, eid: A.eid });
    expect(a2).toEqual({ id: A.id, eid: A.eid });
    expect(a3).toEqual({ id: A.id, description: A.description });
    // should load everything for a4 since we didn't request any specific fields for it
    expect(a4).toEqual(A);
  });

  it("should work for other ids too in the data model", async () => {
    const loader = createFieldLoader(
      MockDataModel.getByExternalIdsWithFields,
      "eid"
    );
    const [a1, a2, a3, a4, b] = await Promise.all([
      loader.load({ key: A.eid, fields: ["id", "eid"] }),
      loader.load({ key: A.eid, fields: ["id", "eid"] }),
      loader.load({ key: A.eid, fields: ["id", "description"] }),
      loader.load({ key: A.eid }),
      loader.load({ key: B.eid })
    ]);
    // should load everything for B since we didn't request any specific fields for it
    expect(b).toEqual(B);
    expect(a1).toEqual({ id: A.id, eid: A.eid });
    expect(a2).toEqual({ id: A.id, eid: A.eid });
    expect(a3).toEqual({ id: A.id, description: A.description, eid: A.eid });
    // should load everything for a4 since we didn't request any specific fields for it
    expect(a4).toEqual(A);
  });
});
```
I believe that would dedupe the requests: if I ask for id1 with field1 and id1 with field2, the data loader would query id1 with both field1 and field2, doing one request instead of two. But then you will need another step to map the corresponding fields back to each call.
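That extra step of mapping fields back to each call could be a simple projection of the merged row down to each call's requested fields (a sketch; `pickFields` is a hypothetical helper):

```javascript
// Project a fully-loaded row down to only the fields a particular
// call asked for, so each caller gets exactly what it requested.
function pickFields(row, fields) {
  const out = {};
  for (const f of fields) out[f] = row[f];
  return out;
}
```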
What problem am I trying to solve?

DataLoader currently solves the problem of a collection of data leading to multiple database accesses. But `batchLoadFn` may query unnecessary data from the database in order to suit all kinds of resolvers. Here is my example: there is a Graph Type called `Car`, and it represents my database table `cars`.

sql database
car.graphql
resolvers.js
the dataloader

If the GraphQL query doesn't involve both the `lastOwner` and `lastAccidentLocation` fields at the same time, there will be unnecessary database query cost, because the SQL query will make unnecessary table joins/selections. If `car_trade_records` and `car_accident_records` are huge tables, or involve some complex data manipulation, the unnecessary cost will be significant.

Query examples:

Someone might think that we can 'join other tables' on demand (in the resolvers) like below. But that causes additional database queries: if 4 car records are selected, 8 (4 * 2 = 8) more SQL queries will be made.
The new feature I'd like to have

If DataLoader can accept more information about what attributes will be selected, it can reduce unnecessary cost when querying data.

Proposed changes to the DataLoader functions:

load(key, fields)
- `key`: a key value to load.
- `fields`: (optional) an array of fields to load. Default value = []

loadMany(keys, fields)
- `keys`: an array of key values to load.
- `fields`: (optional) an array of fields to load. Default value = []

new DataLoader(batchLoadFn [, options])
- `batchLoadFn`: function(keys, fields)
  - `keys`: an array of keys
  - `fields`: an array of the fields that are being queried.

`load()` and `loadMany()` will return a cached value only if both the `key` and the `fields` match those passed to `batchLoadFn()`.
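The proposed caching rule (cache hit only when both key and fields match) could be sketched roughly like this. `FieldAwareLoader` is a hypothetical name; batching and error handling are omitted, so this is only an illustration of the semantics, not the real DataLoader:

```javascript
// A minimal, illustrative loader implementing the proposed
// load(key, fields) signature. A value is cached per (key, fields)
// pair: the same key with different fields triggers a new query.
class FieldAwareLoader {
  constructor(batchLoadFn) {
    this.batchLoadFn = batchLoadFn; // (keys, fields) => Promise<values>
    this.cache = new Map();
  }

  // Sorting makes the cache key independent of field order.
  cacheKey(key, fields = []) {
    return `${key}#${[...fields].sort().join(',')}`;
  }

  async load(key, fields = []) {
    const ck = this.cacheKey(key, fields);
    if (!this.cache.has(ck)) {
      // The real proposal would batch these calls; here we call directly.
      const values = await this.batchLoadFn([key], fields);
      this.cache.set(ck, values[0]);
    }
    return this.cache.get(ck);
  }
}
```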
Example 1

Expected console outputs:

Example 2: demonstrates the real case

If it is implemented, no matter which fields are selected on Type `Car`, only one single SQL query will be made.