Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

new option IMPF: importable fields #8

Closed
wants to merge 1 commit into from

Conversation

sebres
Copy link
Contributor

@sebres sebres commented Oct 20, 2016

new option IMPF introduced: describes fields to be imported into the table(s), default nr,nf,0-), can be used to exact specifying which fields should be imported from input (increases import performance, minimizes overhead and simplifies output of statements like select * from a, if e. g. fields like anr, anf and a0 are not necessary or too many input fields available).

Examples:

# with `-IMPF nr,1-2` - select anr, a1, a2 only:
echo -e "A B C D\nA B C\nA B" | sqawk -FS " " -IMPF nr,1-2 -output json "select * from a"

[{"anr":"1","a1":"A","a2":"B"},{"anr":"2","a1":"A","a2":"B"},{"anr":"3","a1":"A","a2":"B"}]

# without `-IMPF` - select anr, anf, a0 (whole record) and 10 fieds (default):
echo -e "A B C D\nA B C\nA B" | sqawk -FS " " -output json "select * from a"

[{"anr":"1","anf":"5","a0":"A B C D","a1":"A","a2":"B","a3":"C","a4":"D","a5":"","a6":"","a7":"","a8":"","a9":"","a10":""},{"anr":"2","anf":"4","a0":"A B C","a1":"A","a2":"B","a3":"C","a4":"","a5":"","a6":"","a7":"","a8":"","a9":"","a10":""},{"anr":"3","anf":"3","a0":"A B","a1":"A","a2":"B","a3":"","a4":"","a5":"","a6":"","a7":"","a8":"","a9":"","a10":""}]

…e table(s), default `nr,nf,0-`), can be used to exact specifying which fields should be imported from input (increases import performance, minimizes overhead and simplifies output of statements like `select * from a`, if e. g. fields like `anr`, `anf` and `a0` are not necessary or too many input fields available).
@dbohdan
Copy link
Owner

dbohdan commented Oct 21, 2016

I like the idea of having this functionality and your code looks fine but I am not sure the design of the feature is right. Some thoughts:

  1. Setting -snf and enabling/disabling the special columns may be better as separate options.
  2. Alternatively, you could give the user a general way to map the columns returned by the parser to the fields in the table, like a generalized version of columns=. E.g., let the user say something like "map column 2 to field a1, column 1 to field a2, column 5 to field a3, drop columns 3-4 and 6-end". It would replace two or more of -IMPF, columns= and merge=. (I think merge= may be overcomplicated as it is now.)
  3. I can see the benefit of disabling a0 but I don't think there is a performance reason to disable anr and anf. While it makes the result of select * prettier a better way to handle this case may be to build a special SQL function into Sqawk that selects all columns except for the ones named (except for the empty ones?).

I'm very interested in ideas to do with the second point.

@dbohdan
Copy link
Owner

dbohdan commented Nov 28, 2016

I'm closing this pull request because the functionality it adds (other than disabling nr and nf) has been implemented in 1fa13f7 and b361f57 through the file options skip and F0. I would be glad to hear your feedback on those. (Note that they are not final for version 1.0 and I may still replace them with a single option like map.)

@dbohdan dbohdan closed this Nov 28, 2016
@sebres
Copy link
Contributor Author

sebres commented Nov 29, 2016

Well, although looks not bad, but ...

Partially I don't like it, e. g.:

echo -e "A B C D\nA B C\nA B" | sqawk -FS " " -output json "select * from a" NF=1 MNF=crop skip=3-10 F0=0
[{"anr":"1","anf":"3","a1":"A","a2":"B"},{"anr":"2","anf":"3","a1":"A","a2":"B"},{"anr":"3","anf":"3","a1":"A","a2":"B"}]

Cons that I see:

  • negative handling (I should specify not what I want see, but what I want NOT see)
  • too wordy and too not understandable, just compare:
    NF=1 MNF=crop skip=3-10 F0=0 with suggested -IMPF nr,1-2
  • I miss the possibility to skip anr and anf (e. g. target json is strict and should not have free fields at all). And when it'll be implemented, the command line will be still longer (syntax yet wordy)...
  • I miss the possibility to disable all fields except 1 and 2 (something like skip=3-), so if we don't know how much fields in the input file (dynamical) we should write something like skip=3-10000

On the whole, I find my solution better (and I will continue to use it)...

@dbohdan
Copy link
Owner

dbohdan commented Nov 29, 2016

Thanks for the feedback! To me, the points about the negative handling and the wordiness seem especially strong. My primary reason for rejecting -IMPF was that it would result in three different column/field numberings used at different points, which could be confusing, particularly if I extended IMPF to support multiple ranges. E.g., if you said IMPF=nr,5-7,9-15 merge=5-7 'select a5, a7 from a' the numbers 5 and 7 would refer to different fields/columns in each of the two options and in the query. Using skip avoids that problem but, as you point out, the way skip works may itself be pretty confusing.

Ultimately, the problem might lie in trying to cram into a few command line options the kind thing that requires nothing less than a scripting language than to handle fully and well (a-la Greenspun's tenth rule).

As for shaping JSON output, I do not consider that Sqawk's job. What I found was that, in practice, you'd sooner or later need a specialized tool like jq for it. (I highly recommend jq; it was an inspiration for Sqawk in the first place.) In your example you could use jq like this:

echo -e 'A B C D\nA B C\nA B' \
| sqawk -FS ' ' -output json 'select * from a' NF=1 MNF=crop skip=3-10 F0=0 \
| jq 'map(del(.anr) | del(.anf))'

For a more advanced example (converting JSON to Tcl dictionaries with jq) check out jqlang/jq#1228.

Another thing to note is that it's unfortunate that you have to maintain a fork of Sqawk if you want to use functionality that isn't accepted into the core. Sqawk may need a plugin interface.

Once again, I appreciate your feedback. I've created an issue (#9) for revising these options.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants