Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make yaml check error on duplicate keys #81

Closed
asottile opened this issue Sep 16, 2015 · 17 comments
Closed

Make yaml check error on duplicate keys #81

asottile opened this issue Sep 16, 2015 · 17 comments

Comments

@asottile
Copy link
Member

We make this mistake all the time:

# foo.yaml

'key1': false
'key2': true
'key1': true
@ssbarnea
Copy link
Contributor

ssbarnea commented Jul 1, 2018

I am not sure about check-yaml implementation but I am using the yamllint linter which is probably the most advanced, and it does catch duplicate keys.

  - repo: https://github.com/adrienverge/yamllint.git
    rev: v1.11.1
    hooks:
      - id: yamllint
        files: \.(yaml|yml)$

Obviously that this should not be taken as a reason not to fix it official pre-commit hooks. I would myself support getting yamllint into main repo, maybe even removing the check-yaml one.

@vin01
Copy link

vin01 commented Jul 12, 2018

Related: pyyaml issue (in NEW state but inactive).

Similar situation in Goland: go-yaml/yaml#154, go-swagger/go-swagger#1464

yamllint handles it well: adrienverge/yamllint@f6bab05

Instead of re-implementing the same in check-yaml, I would also like to replace the same with yamllint. Will be happy to send a PR if that makes sense.

@asottile
Copy link
Member Author

yamllint is already supported via pre-commit so it probably just makes more sense to suggest yaml lint for this behaviour (instead of replacing / rewriting this hook)

@ssbarnea
Copy link
Contributor

@asottile The reality is that at this moment yamllint hook is not as visible as the check-yaml on https://pre-commit.com/hooks.html page.

Just search for yaml on this page and imagine that you are a developer that is interested about pre-commit but never used it. I bet 9/10 of them will pick the check-yaml hook which is hosted in the main repository.

I would say that we should better find a way to avoid this confusion and hint users towards the current best yamllint maybe? I would personally prefer to have it directly in pre-commit

@asottile
Copy link
Member Author

I suspect the answer to that is to "improve search".

I'm currently trying to reduce the dependencies in this repository as it is already quite slow to install and has had some conflict issues in the past. Introducing another dependency for yaml linting would go in the opposite direction. Additionally I want to give credit where credit is due by leading users to the repositories which wrote these great tools! Lastly, yaml lint and check-yaml have very different goals - - the former being a full blown linter and the latter just being a quick syntax check.

The eventual goal for this repository is to only deal with simple, unopinionated, language agnostic checks. It currently isn't quite doing that because of the python-specific checks but I hope to slowly move those out.

@ssbarnea
Copy link
Contributor

Let's assume that pre-commit success will explode and it will be inundated with a huge number of hooks available. As each test is usually inexpensive people will be likely to want to add more to their projects, like 20-30 or even more. If each of this would point to another git repository installation would become a real issue.

I am in favour of having hooks defined in a centralized repository (registry?) (without loosing ability to have additional repos) but at the same time I do not expect this repository to be used for development of these hook. Hook dependencies should be installed per hook instead of per repository.

One way or another I think you will struggle with "simple", it will grow regardless your effort to keep the goal small. I think that the trick here is to decouple development and maintenance of hooks from their publication in a centralised registry.

There is also one thing that concerns me a lot, security: without a central "trustable" registry it would be too easy to add some cool hook today which could inject some nasty stuff into thousands of project two years from now. I am not saying that someone will do it by design, but once you have 10 git repositories pointing to 10 different individuals on github, you are creating an easy way for someone to hack you build process: they need to hack only one of these accounts. This is one place where package repositories are doing a little bit better job. For example if something bad happens on pypi, i can contact them by email or irc report the malicious item and in (hopefully) minutes it will be taken down. I doubt this will happen so quick with github repositories or other git servers.

One thing for sure, future is going to be interesting ;)

@asottile
Copy link
Member Author

If each of this would point to another git repository installation would become a real issue.

I'm not convinced of this -- I have several thousand git repos cloned on my machine and it's not currently an issue. pre-commit also is smart about sharing these in a central cache (it's not an NxN problem, just an N problem).

I am in favour of having hooks defined in a centralized repository (registry?)

A good portion of the design is intentionally away from centralized control -- because it is entirely done through git it makes it very easy for users to pick and choose, modify in site-specific ways, or even run entirely off the grid with private repositories. I don't see this changing, it's one of the core design decisions. The closest to a "central registry" is the hooks.html page (and associated json file) -- yes the discoverability and such could be better here but for now it mostly serves its purpose.

Hook dependencies should be installed per hook instead of per repository.

This would actually make the disk space problem much much worse -- in almost all cases the installed environment's disk usage dwarfs the repository size.

There is also one thing that concerns me a lot, security

I mean, if you find a repository you don't want to use just remove it from your configuration :P. That said I do at some point want to add a hardening option (sha1 locking) but I haven't found a sound design for it that isn't just a merge-conflict-by-design.

@asottile
Copy link
Member Author

#351 adds checking for duplicate keys

@matt212
Copy link

matt212 commented Jan 31, 2019

can we disable the duplicate keys error message since anyof
swagger-api/swagger-ui#3803 is still not support
atleast i can use different payload with same endpoint ! and have a workaround on same

@asottile
Copy link
Member Author

@matt212 could you provide a sample yaml file? the yaml specification forbids duplicate keys. the linked issue seems to have a list of maps (not duplicate keys)

@matt212
Copy link

matt212 commented Jan 31, 2019

@asottile i have same endpoint with different payload structure and since we have cannot choose from multiple payload option oneOf swagger-api/swagger-ui#3803
hence one can use same endpoint twice with different payloads. however that swagger identifies as duplicate keys

@asottile
Copy link
Member Author

This repo has nothing to do with swagger as far as I know, you'll need to provide a minimal example and invocation so I can see what you're even trying to do. Please include commands you're running and file contents and what you expect

@asottile
Copy link
Member Author

@matt212 according to the swagger spec that is invalid: https://swagger.io/docs/specification/paths-and-operations/

This also means that it is impossible to have multiple paths that differ only in query string [...] This is because OpenAPI considers a unique operation as a combination of a path and the HTTP method

@matt212
Copy link

matt212 commented Jan 31, 2019

@asottile i knew about what my problem statement is if i have one endpoint and multiple payload how to use it in swagger or in openapi3 mode btw small workaround is "oneof" works in openapi3 but not in swagger 😞

@asottile
Copy link
Member Author

🤷‍♂️

if you want a workaround you can use args: [--unsafe], but like, I don't see how this helps with swagger because it can't represent dupe keys in json anyway

@matt212
Copy link

matt212 commented Jan 31, 2019

@asottile 😞😞

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

4 participants