Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

YAML format issues in modelFiles/yml/HumanGEM.yml #169

Closed
3 tasks done
dweindl opened this issue May 10, 2020 · 14 comments · Fixed by #173
Closed
3 tasks done

YAML format issues in modelFiles/yml/HumanGEM.yml #169

dweindl opened this issue May 10, 2020 · 14 comments · Fixed by #173

Comments

@dweindl
Copy link

dweindl commented May 10, 2020

Hi, great project! I was trying to look into the yaml model, but ran into some issues:

Description of the issue:

YAML errors in modelFiles/yml/HumanGEM.yml preventing parsing the file.

Expected feature/value/output:

Expected to be able to load modelFiles/yml/HumanGEM.yml with any yaml parser.

Current feature/value/output:

Trying to load modelFiles/yml/HumanGEM.yml fails.

Main issue is using ":" in unquoted strings, leading to

syntax error: mapping values are not allowed here (syntax)

as well as long list of other issues which seem to be minor.

Reproducing these results:

yamllint modelFiles/yml/HumanGEM.yml

-> long list of issues

or

import yaml
yaml_file = "modelFiles/yml/HumanGEM2.yml"
with open(yaml_file) as f:
    model = yaml.safe_load(f)

-> [...] yaml.scanner.ScannerError: mapping values are not allowed here

I hereby confirm that I have:

  • Tested my code on my own computer for running the model
  • Done this analysis in the master branch of the repository
  • Checked that a similar issue does not exist already
@haowang-bioinfo
Copy link
Member

haowang-bioinfo commented May 10, 2020

@dweindl thank you posting the issue.

The parsing error is because the specification of "HumanGEM.yml" is slightly different from the one defined in Cobrapy. The HumanGEM Yaml format is originated from RAVEN for the purpose of tracking changes only, and later adjusted according to specific needs (#27, #71).

One can use either the SBML "HumanGEM.xml" or MATLAB "HumanGEM.mat" file for importing model.

@dweindl
Copy link
Author

dweindl commented May 10, 2020

Thanks for the quick reply, @Hao-Chalmers . The issue occurs independently of cobrapy. The current file is not valid yaml. I think most of it would be fixed by quoting strings during yaml export in

value = field{pos};

the same way as in

value = ['"',field{pos},'"'];

@haowang-bioinfo
Copy link
Member

Can you please confirm if this error can be fixed by systematically adding double quotes?

@dweindl
Copy link
Author

dweindl commented May 10, 2020

Confirming that double quotes for strings will do the job. After running regex replacement ([^:]*): +([^"!].*[^"])$ -> \1: "\2", the file can be loaded.

Unfortunately, I can't test the proposed export fix due to lack of a matlab license.

@haowang-bioinfo
Copy link
Member

Thanks, will implement the fix accordingly.

@haowang-bioinfo haowang-bioinfo self-assigned this May 11, 2020
@haowang-bioinfo
Copy link
Member

@dweindl the Yaml file had been updated by systematically enclosing string elements with double quote in this fix branch. The new Yaml file can be successfully imported with following code without error. Please double check this.

import yaml
yaml_file = "HumanGEM.yml"
with open(yaml_file) as f:
    model = yaml.safe_load(f)

@dweindl
Copy link
Author

dweindl commented May 18, 2020

@dweindl the Yaml file had been updated by systematically enclosing string elements with double quote in this fix branch. The new Yaml file can be successfully imported with following code without error. Please double check this.

Confirming that the file can be loaded now. Thanks @Hao-Chalmers.

yamllint still reports a number of issues, but they don't prevent using the file:

yamllint HumanGEM.yml 
HumanGEM.yml
  1:1       warning  missing document start "---"  (document-start)
  3:16      error    too many spaces before colon  (colons)
  4:16      error    too many spaces before colon  (colons)
  5:16      error    too many spaces before colon  (colons)
  6:16      error    too many spaces before colon  (colons)
  7:16      error    too many spaces before colon  (colons)
  7:81      error    line too long (86 > 80 characters)  (line-length)
  8:16      error    too many spaces before colon  (colons)
  10:16     error    too many spaces before colon  (colons)
  11:16     error    too many spaces before colon  (colons)
  12:16     error    too many spaces before colon  (colons)
  12:81     error    line too long (391 > 80 characters)  (line-length)
  14:3      error    wrong indentation: expected 4 but found 2  (indentation)
[...many more...]

@haowang-bioinfo
Copy link
Member

haowang-bioinfo commented May 18, 2020

@dweindl good to know this, pushing this fix to devel now.

You are welcome to provide solutions to fix other warnings/errors.

@dweindl
Copy link
Author

dweindl commented May 18, 2020

You are welcome to provide solutions to fix other warnings/errors.

  • 1:1 warning missing document start "---" (document-start)

    Insert --- as first line

  • 3:16 error too many spaces before colon (colons) and other errors are due to

    fprintf(fid,[' short_name : "',model.id,'"\n']);
    fprintf(fid,[' full_name : "',model.description,'"\n']);
    fprintf(fid,[' version : "',model.version,'"\n']);
    fprintf(fid,[' date : "',datestr(now,29),'"\n']); % 29=YYYY-MM-DD
    fprintf(fid,[' authors : "',model.annotation.authorList,'"\n']);
    fprintf(fid,[' email : "',model.annotation.email,'"\n']);
    fprintf(fid,[' organization: "',model.annotation.organization,'"\n']);
    fprintf(fid,[' taxonomy : "',model.annotation.taxonomy,'"\n']);
    fprintf(fid,[' github : "',model.annotation.sourceUrl,'"\n']);
    fprintf(fid,[' description : "',model.annotation.note,'"\n']);

    having spaces between keyword and colon.

  • wrong indentation: expected 8 but found 6 and others:

    Currently the file is indented using two spaces per level, but yaml expects 4. Would have to be adapted in all the fprintf statements. I guess most parsers are tolerant enough to ignore that.

  • line too long

    Could be fixed using some of the approaches in https://stackoverflow.com/questions/3790454/how-do-i-break-a-string-over-multiple-lines. Again, most parsers can probably handle the current state.

@JonathanRob
Copy link
Collaborator

Thanks for the suggestions, @dweindl. We are aiming to continue moving toward a standardized and valid yaml file; however, we also would like to ensure that we maintain compatibility with the cobrapy toolbox. Therefore, we may be a bit slow to address some of these additional issues unless they directly affect functionality, since we will also want to verify that any such changes do not disrupt cobrapy compatibility.

As far as I understand, the cobrapy yaml format does not quite follow yaml standards - I could be mistaken, and @BenjaSanchez is likely to know more about this. I would therefore recommend that you make an issue/suggestion on the cobrapy repo that they move toward a fully valid yaml file format, and we of course would follow such a movement.

@dweindl
Copy link
Author

dweindl commented May 19, 2020

Thanks for pointing out the potential issues with cobrapy, @JonathanRob. I wasn't aware of that. As I am not a user of cobrapy, I will rather not complain about potentially different yaml used there. The changes in #170 addressed the issue sufficiently. Thanks @Hao-Chalmers. Feel free to close this.

@haowang-bioinfo
Copy link
Member

@dweindl Thank you for the input that is highly appreciated and always welcome :)

Now close this issue.

@BenjaSanchez
Copy link

@JonathanRob I'm actually not sure if our current standard is 100% equal with cobrapy, I will be testing that soon as I'm working in compatibility issues. If I find anything noteworthy I will let you know here, thanks @dweindl for informing of those issues, will likely be useful for my analysis

@mihai-sysbio mihai-sysbio mentioned this issue May 27, 2020
2 tasks
@haowang-bioinfo haowang-bioinfo linked a pull request May 27, 2020 that will close this issue
2 tasks
@haowang-bioinfo
Copy link
Member

All remaining yamllint warnings/errors reported in this issue had been fixed in #173, except line too long

migp11 added a commit to bsc-life/Human-GEM that referenced this issue May 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants