Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Apply functions to row header #21

Open
ArcticTee opened this issue Nov 7, 2018 · 8 comments
Open

Apply functions to row header #21

ArcticTee opened this issue Nov 7, 2018 · 8 comments

Comments

@ArcticTee
Copy link

Hi,

first of all, this is a great library, thank you for keeping this up! To enhance the functionality I would like to do the following:

I just tried to import a sheet which has a header row with duplicates like this:

test | test | 1
1 | 2 | 2
2 | 3 | 2

This will result in a json object like this:

[
  {"test": 2, "1": 2},
  {"test": 3, "1": 2},
]

I would like get the option to apply multiple functions to the header row to transform the column names when using the auto columnToKey mapping - maybe something like this:

const result = excelToJson({
	sourceFile: 'SOME-EXCEL-FILE.xlsx',
	columnToKey: {
		'*': '{{columnHeader}}',
                 'transformSingleKey':
                                              [
                                                 removeSpecialCharacters,
                                                 replaceNumberColumnNames,
                                              ]
                 'transformAllKeys':
                                              [
                                                 suffixDuplicates
                                              ]
	}
});

Where the transformSingleKey - Array will map over the columnnames and apply all functions

const cleandedUpCols = colnames.map(
      pipeFunctions(
        removeSpecialCharacters,
        replaceNumberColumnNames,
      ),
    );

And the transformAllKeys - Array will work the whole column array

suffixDuplicates(cleanedUp)

Helper functions:

const suffixDuplicates = (list) => {
  // Containers

  const count = { };
  let itemCount;
  return list.map((item) => {
    itemCount = count[item];
    itemCount = itemCount === null ? 1 : itemCount + 1;
    count[item] = itemCount;
    // starting with 2
    return itemCount > 1 ? `${item}_${itemCount}` : item;
  });
};
const removeSpecialCharacters = key => key.replace(/[^a-zA-Z0-9]/g, '');

export const replaceNumberColumnNames = key => (Number.isNaN(key) ? key : `c${key}`);


const pipeFunctions = (...funcsToApply) => (currentValue,
  index,
  arrayToMap) => funcsToApply.reduce(
  (accum,
    cFunc) => cFunc(accum, index, arrayToMap), currentValue,
);

Thank you

@DiegoZoracKy
Copy link
Owner

Hi @ArcticTee, thanks for bringing that case, and a solution which I believe is a very good one. And it seems to match with a suggestion that I've left open a few months ago. Take a look at this issue and tell me what you think of this approach:

#4

@ArcticTee
Copy link
Author

Yeah, this option is fine as long as you know the name of your columns.

I am currently working on a general import of Excel files and I do not know the column format of that files when implementing it. The user of this import tool shall give the sheet name and if the file contains (and if so where the header row lies) in a GUI.

Therefore I would be happy if the enhancement for single keys and all keys for unknown columns would be added 👍

@DiegoZoracKy
Copy link
Owner

I believe that both ideas can be joined. In fact, I can see two different issues here.

The first one is to never let data be discarded due to multiple column headers with the same name. By default it should suffixDuplicates as you suggested, and this behavior could be disabled via config.

The second issue is about the interface to apply a pipeline of data transformations.

I believe that the first one is what will solve your current problem, am I right?

@ArcticTee
Copy link
Author

ArcticTee commented Nov 7, 2018

I believe that the first one is what will solve your current problem, am I right?

Just partly. Because it is just one transformation our tool needs to do. We are using a MongoDB which accepts JSON inputs.

E.g. this suffixDuplicate will prevent data loss during the import. But it needs to look at all the keys, to find duplicates.

But another practical problem the pipeline/transformation would solve is that invalid column names (which can be checked by just looking at a single key) can be resolved. e.g. replaceNumberColumnNames will add a 'c' prefix to all columns that are just a number. For mongoDB this is necessary because they do not support column names that are just a number.

So for my use case it would be great if pipelines for single and all keys would be supported. The suffixDuplicates is just a function for our import tool, which this project can benefit from as well.

@ArcticTee
Copy link
Author

@DiegoZoracKy Any updates on this? Let me know if I can help you.

@DiegoZoracKy
Copy link
Owner

DiegoZoracKy commented Nov 21, 2018 via email

@ArcticTee
Copy link
Author

Yeah. I am happy you are on it.

@tijevlam
Copy link

hi there, since this has been a long time. Is there any news on this issue? In my case i would like to apply a camelCase function to the headerrow but applying a function to {{columnHeader}} results in undefined keys.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants