Initiated integration of ColabFold-AF3 pipeline. Draft state of AF3 json input generation. #680

Open
wants to merge 9 commits into
base: main

@rachelse rachelse commented Feb 3, 2025

Major

batch.py

  • Added an argument group for AF3-specific arguments.
    • --af3-json enables AF3 JSON generation.
    • --fasta-type enables handling of non-protein molecules in the FASTA file.
      Each header should follow the format > entityID | entityType | numberOfEntity
      e.g. a FASTA file with 1 homodimer and 3 copies of a ligand:
      > Prot1|protein|2
      PROTEINSEQUENCE
      > Lig1|ccd|3
      ATP
      
  • Added functions
    • generate_af3_input processes FASTA and a3m files and produces JSON files when --af3-json is specified. (In this mode, structure inference is not run.)
    • get_queries_af3 handles FASTA files describing complexes with non-protein molecules when --fasta-type is 1.
    • parse_fasta_af3 is called by get_queries_af3 to parse a single FASTA file.
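
To make the header convention above concrete, here is a minimal sketch of how a `> entityID | entityType | numberOfEntity` FASTA file could be parsed. It is not the `parse_fasta_af3` implementation from this PR; the `Entity` dataclass and the `parse_entity_fasta` function are hypothetical names used only for illustration, and the sketch assumes that `numberOfEntity` is the copy count of the entity.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Entity:
    """One FASTA record (hypothetical container, not part of ColabFold)."""
    entity_id: str
    entity_type: str
    copies: int
    sequence: str


def parse_entity_fasta(text: str) -> List[Entity]:
    """Parse FASTA text whose headers look like '> id | type | copies'."""
    entities: List[Entity] = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Split the header on '|' and tolerate spaces around each field.
            fields = [field.strip() for field in line[1:].split("|")]
            if len(fields) != 3:
                raise ValueError(f"expected 3 '|'-separated fields, got: {line!r}")
            entity_id, entity_type, copies = fields
            entities.append(Entity(entity_id, entity_type.lower(), int(copies), ""))
        elif entities:
            # Sequence (or CCD code) lines are appended to the latest record.
            entities[-1].sequence += line
        else:
            raise ValueError("sequence data found before the first header")
    return entities


example = """> Prot1|protein|2
PROTEINSEQUENCE
> Lig1|ccd|3
ATP
"""
parsed = parse_entity_fasta(example)
```

For the example file from the description, `parsed` holds two records: `Prot1` with two copies and `Lig1` with three copies, whose "sequence" is the CCD code `ATP`.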

utils.py

  • Added the AF3Utils class to fill in the elements of the JSON file.

Subsidiary

  • msa2json.py

    This code was adapted from Yoshi's code. It generates JSON inputs from a given a3m file or a directory containing a3m files.

  • af3json.py

    An adaptation of msa2json.py that generates the default AF3 JSON format. With this JSON, AF3 produces the MSAs using its own pipeline.
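
As a rough illustration of the "default" JSON described above, the sketch below builds an input with no MSA fields, so AF3 would be expected to run its own MSA pipeline. It is not the code in af3json.py or AF3Utils; `build_af3_input` is a hypothetical helper, and the field names (`name`, `modelSeeds`, `sequences`, `dialect`, `version`, `ccdCodes`) follow the AlphaFold 3 input format as I understand it and should be checked against the AF3 input documentation.

```python
import json
import string
from typing import Dict, List, Tuple


def build_af3_input(name: str, entities: List[Tuple[str, str, int]], seed: int = 1) -> Dict:
    """Build a minimal AF3-style input dict (hypothetical helper).

    entities holds (entity_type, sequence_or_ccd_code, copies) tuples.
    """
    letters = string.ascii_uppercase
    next_chain = 0
    sequences = []
    for entity_type, value, copies in entities:
        if next_chain + copies > len(letters):
            raise ValueError("this sketch only assigns single-letter chain IDs (A-Z)")
        # One chain ID per copy, assigned in order: A, B, C, ...
        ids = list(letters[next_chain:next_chain + copies])
        next_chain += copies
        if entity_type == "protein":
            # No unpairedMsa/pairedMsa fields: MSA generation is left to AF3.
            sequences.append({"protein": {"id": ids, "sequence": value}})
        elif entity_type == "ccd":
            sequences.append({"ligand": {"id": ids, "ccdCodes": [value]}})
        else:
            raise ValueError(f"entity type not covered by this sketch: {entity_type}")
    return {
        "name": name,
        "modelSeeds": [seed],
        "sequences": sequences,
        "dialect": "alphafold3",
        "version": 1,
    }


doc = build_af3_input("demo", [("protein", "PROTEINSEQUENCE", 2), ("ccd", "ATP", 3)])
text = json.dumps(doc, indent=2)
```

Here the homodimer gets chain IDs A and B and the three ATP copies get C, D, and E. RNA, DNA, SMILES ligands, templates, and modifications are deliberately left out of the sketch.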

Progress

  • Convert fasta/msa to json
    • colabfold_batch
    • colabfold_search
  • Support FASTA file with multiple molecule types
    • single fasta file per complex
    • directory with multiple fasta files
  • Support template
  • Support unpairedMsaPath and pairedMsaPath
  • Test pipeline to validate json file
    (each molecule type, MSA pairing, template, custom CCD, RNA MSA, modifications, etc.)
  • Validation
    • molecule types
      • single protein, protein complex, protein+ligand
      • protein+RNA, protein+DNA, RNA, DNA, ligand
    • msa-pairing
      • UnpairedMSA/PairedMSA
      • dependency on --pair-mode
      • specify with path to MSA file
