mzML file: String.trim() fails, CVParamType.getValue() is null #2115

Open
jwframe28 opened this issue Apr 2, 2025 · 5 comments

jwframe28 commented Apr 2, 2025

CheckCentroid
java -Xmx453G -cp /home/projects/elinav/birgit/fragpipe2/fragpipe/lib/fragpipe-22.0.jar:/home/projects/elinav/birgit/fragpipe2/fragpipe/tools/batmass-io-1.33.4.jar com.dmtavt.fragpipe.util.CheckCentroid /home/projects/elinav/johnf/databases/PXD_PEPTIDOMICS3/pride_downloads/PXD003533/Q001_80K_Fr1_18uL_T90_C25nano.mzML 9
java.lang.NullPointerException: Cannot invoke "String.trim()" because the return value of "umich.ms.fileio.filetypes.mzml.jaxb.CVParamType.getValue()" is null
at umich.ms.fileio.filetypes.mzml.MZMLRunHeaderParser.lookupInstrumentVendor(MZMLRunHeaderParser.java:335)
at umich.ms.fileio.filetypes.mzml.MZMLRunHeaderParser.parse(MZMLRunHeaderParser.java:224)
at umich.ms.fileio.filetypes.mzml.MZMLFile.parseRunInfo(MZMLFile.java:88)
at umich.ms.fileio.filetypes.mzml.MZMLFile.fetchRunInfo(MZMLFile.java:78)
at umich.ms.fileio.filetypes.mzml.MZMLFile.fetchRunInfo(MZMLFile.java:32)
at umich.ms.fileio.filetypes.xmlbased.AbstractXMLBasedDataSource.parse(AbstractXMLBasedDataSource.java:121)
at umich.ms.datatypes.scancollection.impl.ScanCollectionDefault.loadData(ScanCollectionDefault.java:807)
at umich.ms.datatypes.scancollection.impl.ScanCollectionDefault.loadData(ScanCollectionDefault.java:791)
at com.dmtavt.fragpipe.util.CheckCentroid.isCentroid(CheckCentroid.java:76)
at com.dmtavt.fragpipe.util.CheckCentroid.main(CheckCentroid.java:39)
Process 'CheckCentroid' finished, exit code: 1
Process returned non-zero exit code, stopping

- Describe the issue or question:

I converted some .mgf files to mzML with FileConverter, tried running FragPipe on them, and got the above error.

The fix was to edit the following portion of my mzML file so that the cvParam has value="null". Before, there was no value attribute on this element.

What is causing this and how can I avoid it in the future?

    <instrumentConfigurationList count="1">
            <instrumentConfiguration id="ic_0">
                    <cvParam cvRef="MS" accession="MS:1000031" name="instrument model" value="null" />
                    <softwareRef ref="so_in_0" />
            </instrumentConfiguration>
    </instrumentConfigurationList>
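
The manual edit described above could be scripted roughly as follows. This is a minimal sketch, not a FragPipe or FileConverter feature: the function name `fill_missing_value` is hypothetical, it assumes the element is written as a self-closing `<cvParam ... />` tag as in the snippet, and it uses the placeholder `"null"` only because that is the value reported to work in this issue.

```python
import re

# Matches a self-closing <cvParam ... /> element and captures its attributes.
CV_PARAM = re.compile(r"<cvParam\b([^>]*?)\s*/>")


def fill_missing_value(xml_text, accession="MS:1000031", placeholder="null"):
    """Add value="<placeholder>" to cvParam elements with the given
    accession that have no value attribute. Other elements are untouched."""

    def fix(match):
        attrs = match.group(1)
        has_value = re.search(r"\bvalue\s*=", attrs) is not None
        is_target = f'accession="{accession}"' in attrs
        if has_value or not is_target:
            return match.group(0)  # leave the element exactly as it was
        return f'<cvParam{attrs} value="{placeholder}" />'

    return CV_PARAM.sub(fix, xml_text)
```

To use it, read the mzML file as text, pass the text through `fill_missing_value`, and write the result to a new file. Note that inserting text shifts byte offsets, so if the file is an indexed mzML its offset index will no longer match; whether that matters for FragPipe is not covered in this thread.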
@fcyu fcyu self-assigned this Apr 2, 2025
Member

fcyu commented Apr 2, 2025

What is causing this

The mzML file doesn't follow the schema.

how can I avoid it in the future?

  1. Avoid using MGF files.
  2. If you must use MGF files, avoid converting them to mzML format.
  3. If you must convert the format, choose a converter that produces mzML files that comply with the schema.

Best,

Fengchao

Author

jwframe28 commented Apr 3, 2025

Ah, I see. For this I am somewhat stuck with .mgf or .mzML files, based on what is available on ProteomeXchange.

If I run with MGF files, I get the following errors (an excerpt, as the full log is too large):

RawFileReader reading tool. Copyright (c) 2016 by Thermo Fisher Scientific, Inc. All rights reserved.
timdTOF .d reading tool. Copyright (c) 2022 by Bruker Daltonics GmbH & Co. KG. All rights reserved.
System OS: Linux, Architecture: amd64
Java Info: 23.0.2-internal, OpenJDK 64-Bit Server VM, Oracle Corporation
JVM started with 615 GB memory
Checking database...
Failed in checking pride_downloads/PXD003533b/Qts08_001_Fr4_80K1000_200uLeq_T215.mgf. Will ignore it.
java.lang.RuntimeException: Could not parse the MGF title: query=143414
Failed in checking Q001_80K_Fr1_18uL_T90_C25nano.mgf. Will ignore it.
java.lang.RuntimeException: Could not parse the MGF title: query=482847
Failed in checking Q002_pDS80_B_TFA_4uLeq_4uL_T54_nano.mgf. Will ignore it.
java.lang.RuntimeException: Could not parse the MGF title: query=505104
Checking spectral files...
Qts08_001_Fr4_80K1000_200uLeq_T215.mgf: Scans = 0; ITMS: false; FTMS: false; Isolation sizes = []
Q001_80K_Fr1_18uL_T90_C25nano.mgf: Scans = 0; ITMS: false; FTMS: false; Isolation sizes = []
Q002_pDS80_B_TFA_4uLeq_4uL_T54_nano.mgf: Scans = 0; ITMS: false; FTMS: false; Isolation sizes = []
Failed in checking Qts15_011_80K1000_Fr1_noIAA_1o2ug_T120.mgf. Will ignore it.
java.lang.RuntimeException: Could not parse the MGF title: query=306796
Failed in checking

Member

fcyu commented Apr 3, 2025

The "title" field of the MGF format has no restrictions, which makes it hard to support all "flavors" of the MGF format. We only support a few common ones, and it seems that yours is not one of them.
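
To see which title "flavor" a given set of MGF files uses, something like the following could help. This is only a diagnostic sketch: `title_shapes` is a hypothetical helper, it assumes titles appear on lines starting with `TITLE=`, and it says nothing about which flavors FragPipe actually accepts.

```python
import re
from collections import Counter


def title_shapes(lines):
    """Count the distinct 'shapes' of TITLE= lines in an MGF file,
    where every run of digits in a title is replaced by '#'."""
    shapes = Counter()
    for line in lines:
        if line.startswith("TITLE="):
            title = line[len("TITLE="):].strip()
            shapes[re.sub(r"\d+", "#", title)] += 1
    return shapes
```

Calling it on an open file, for example `title_shapes(open(path))` with your own `path`, gives a quick summary of how the titles are structured before deciding whether rewriting them is practical.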

Best,

Fengchao

Author

Ahh okay, I see. I can try to edit the titles, or I can try to find some related .raw files.

One major issue I have is that I have a slightly large custom database and although I'm working on a cluster, with all of the input files (mzML or raw), it seems to require a ton of memory.

Is there a way I can run FragPipe in batches on subsets of my full manifest? It's a Miniconda installation of FragPipe, as I don't have sudo access on the cluster.

Member

fcyu commented Apr 3, 2025

You could set the "split database" option in the MSFragger tab to a value greater than 1. It will split the database into chunks to reduce the memory footprint.
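
On the separate question of running subsets of the manifest, a rough sketch of chunking a manifest file is shown below. It is not a FragPipe feature and not something recommended in this thread: `split_manifest` is a hypothetical helper that only assumes the manifest is a text file with one input file per line, and whether results from separate batches can be combined afterwards is not addressed here.

```python
def split_manifest(lines, batch_size):
    """Split manifest lines into consecutive batches of at most
    batch_size entries, skipping blank lines."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    rows = [line for line in lines if line.strip()]
    return [rows[i:i + batch_size] for i in range(0, len(rows), batch_size)]
```

Each returned batch can then be written out as its own manifest file and run as a separate job.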

Best,

Fengchao
