
Enter neutralization data (metadata tables)

Kaiming Tao edited this page Jun 18, 2022 · 3 revisions

List of metadata tables

For instructions on creating a new RefID, please read here.

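Each reference (RefID) is recorded with the following columns:

| Column name | Description | Format | Default | Comment |
|---|---|---|---|---|
| ref_name | RefID | | | |
| doi | DOI | | | |
| url | Provide a URL if the reference has no DOI | | | |
| first_author | | Surname, Initials (no dots) | NULL | |
| year | | YYYY | | |
| date_added | | YYYY-MM-DD | | |
| date_updated | | YYYY-MM-DD | NULL | |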
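Before committing a new reference row, it can help to check it against the Format column. The following is a minimal sketch, not part of the database tooling; `validate_reference` is a hypothetical helper, and the rule that a row needs either a DOI or a URL is an assumption drawn from the `url` description.

```python
import re
from datetime import date


def _is_iso_date(value):
    """True if `value` is a real calendar date written as YYYY-MM-DD."""
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        return False
    try:
        date.fromisoformat(value)
    except ValueError:
        return False
    return True


def validate_reference(row):
    """Return a list of format problems in one reference row (empty if none).

    `row` maps column names to strings; missing or empty columns are skipped
    except for the doi/url rule below.
    """
    problems = []
    # Assumption: every reference needs a DOI, or a URL when it has no DOI.
    if not row.get("doi") and not row.get("url"):
        problems.append("provide doi, or url if there is no DOI")
    year = row.get("year") or ""
    if year and not re.fullmatch(r"\d{4}", year):
        problems.append("year must be YYYY")
    for col in ("date_added", "date_updated"):
        value = row.get(col) or ""
        # "NULL" is the documented default for date_updated.
        if value and value != "NULL" and not _is_iso_date(value):
            problems.append(f"{col} must be YYYY-MM-DD")
    # Initials are written without dots, e.g. "Smith, AB".
    if "." in (row.get("first_author") or ""):
        problems.append("first_author initials should have no dots")
    return problems
```

An empty list means the row matches the documented formats; each string in a non-empty list names one column to fix.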

The isolates.d folder organizes isolates by study. All files in this folder are merged into the isolates.csv file.
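The merge step can be pictured as concatenating every per-study CSV file under one shared header. This is a sketch under the assumption that all files in the folder share identical header rows and that plain concatenation in file-name order is sufficient; the project's actual build script may differ. `merge_csv_folder` is a hypothetical name, and the same idea would apply to isolate_mutations.d.

```python
import csv
from pathlib import Path


def merge_csv_folder(folder, output):
    """Concatenate every *.csv file in `folder` into a single `output` CSV.

    Files are read in name order. All files are assumed to share one header
    row; a file with a different header raises ValueError.
    Returns the number of data rows written.
    """
    files = sorted(Path(folder).glob("*.csv"))
    if not files:
        raise ValueError(f"no CSV files found in {folder}")
    header = None
    rows = []
    for path in files:
        with path.open(newline="", encoding="utf-8") as fh:
            reader = csv.DictReader(fh)
            if header is None:
                header = reader.fieldnames
            elif reader.fieldnames != header:
                raise ValueError(f"{path.name}: header differs from {files[0].name}")
            rows.extend(reader)
    with open(output, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=header)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

For example, `merge_csv_folder("isolates.d", "isolates.csv")` would write one combined table. Rejecting mismatched headers, rather than silently filling gaps, surfaces a per-study file that drifted from the shared column layout.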

In the database, isolates are used to record variants and mutations. Each neutralization data point should be linked to an isolate. Each isolate name represents a list of mutations, and an isolate may belong to a variant. Please see Record Variants, isolates, and mutations for more details.

Look up the isolate and its mutations from the paper using the following sources, in order of preference:

  1. GISAID/GenBank. Find the sequence by GISAID ID or GISAID name, download the FASTA file, and upload it to the SARS-CoV-2 Sequence Analysis program. You can download the mutation list by selecting Spreadsheets (CSV) > Mutation list, and you can find the PANGOLIN name by clicking the "Analyze" button.
  2. Mutation list in paper. Some papers provide the full list of mutations for each virus. Use this list as the alternative source when the paper does not provide GISAID/GenBank accessions. You need to convert the genes and positions to the database format.
  3. Consensus. If the paper reports only variant names or PANGOLIN names, find the corresponding isolate in the isolates table and use "[PANGOLIN NAME] full genome" for an authentic virus assay or "[PANGOLIN NAME] Spike" for a pseudovirus assay.
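The consensus naming rule in step 3 is mechanical enough to express as a small function. This is a sketch only: `consensus_iso_name` and the assay labels `"authentic"` and `"pseudovirus"` are hypothetical inputs chosen for illustration, and the optional `known_isolates` set stands in for the iso_name column of the isolates table.

```python
def consensus_iso_name(pangolin_name, assay, known_isolates=None):
    """Return the consensus isolate name for a variant reported only by name.

    `assay` is "authentic" (authentic virus) or "pseudovirus"; these labels
    are hypothetical inputs for this sketch. If `known_isolates` (a set of
    iso_name values from the isolates table) is given, the name must exist.
    """
    suffixes = {"authentic": "full genome", "pseudovirus": "Spike"}
    if assay not in suffixes:
        raise ValueError(f"unknown assay type: {assay!r}")
    name = f"{pangolin_name} {suffixes[assay]}"
    if known_isolates is not None and name not in known_isolates:
        raise LookupError(f"{name!r} is not in the isolates table")
    return name
```

Raising `LookupError` when the consensus isolate is missing mirrors the instruction to find the corresponding isolate in the isolates table rather than invent one.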

Columns of the isolates table:

| Column name | Description | Format | Default | Comment |
|---|---|---|---|---|
| iso_name | Isolate name | | | |
| var_name | | | NULL | Please see the variants table. |
| site_directed | TRUE for a virus with a single mutation created in the lab; otherwise FALSE | | | |
| gisaid_id | GISAID ID without the EPI_ISL_ prefix | | NULL | |
| genbank_accn | GenBank accession number | | NULL | |
| sra_accn | SRA accession number | | NULL | |
| expandable | TRUE in most cases; FALSE for non-SARS-CoV-2 viruses | | | |
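Because gisaid_id is stored without its EPI_ISL_ prefix, IDs copied from a paper usually need normalizing. The helper below is a hypothetical sketch; it assumes the part after the prefix is purely numeric (as in EPI_ISL_402124) and maps empty input to the table default NULL.

```python
def normalize_gisaid_id(raw):
    """Strip the EPI_ISL_ prefix so the value matches the gisaid_id format.

    Returns "NULL" for missing values, mirroring the column's default.
    Assumes the accession after the prefix is numeric; anything else raises.
    """
    if raw is None or not raw.strip() or raw.strip().upper() == "NULL":
        return "NULL"
    value = raw.strip()
    prefix = "EPI_ISL_"
    # Accept the prefix in any letter case, e.g. "epi_isl_402124".
    if value.upper().startswith(prefix):
        value = value[len(prefix):]
    if not value.isdigit():
        raise ValueError(f"unexpected GISAID ID: {raw!r}")
    return value
```

Raising on non-numeric input flags IDs that were pasted with extra text, instead of writing them to the table unchanged.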
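
Columns of the variants table:

| Column name | Description | Format | Default | Comment |
|---|---|---|---|---|
| var_name | Variant name (naming rules) | | | |
| as_wildtype | TRUE if this variant should be treated as a wild type; otherwise FALSE | | | |
| consensus_availability | | | FALSE | |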

The isolate_mutations.d folder organizes isolate mutations by study. All files in this folder are merged into the isolate_mutations.csv file.

The isolate_mutations table stores a list of mutations for each isolate in the isolates table. If you are adding a new isolate to the isolates.csv table, then you should add the mutation list to the isolate_mutations table.
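One way to catch a forgotten mutation list is to compare the two merged files. This sketch, with the hypothetical name `isolates_missing_mutations`, reports every iso_name that appears in isolates.csv but has no row in isolate_mutations.csv. Isolates that legitimately carry no mutations (for example a wild-type reference) would also be reported, so the output is a list to review rather than a list of errors.

```python
import csv


def isolates_missing_mutations(isolates_csv, mutations_csv):
    """Return iso_names in `isolates_csv` with no rows in `mutations_csv`.

    Order follows isolates_csv. Both files are assumed to have an
    iso_name column.
    """
    with open(isolates_csv, newline="", encoding="utf-8") as fh:
        isolates = [row["iso_name"] for row in csv.DictReader(fh)]
    with open(mutations_csv, newline="", encoding="utf-8") as fh:
        with_mutations = {row["iso_name"] for row in csv.DictReader(fh)}
    return [name for name in isolates if name not in with_mutations]
```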

| Column name | Description | Format | Default | Comment |
|---|---|---|---|---|
| iso_name | Isolate name | | | |
| gene | Gene name (see column 4 of the SARS-CoV-2 genome / gene position conversion table) | | | |
| position | Amino acid position | | | |
| amino_acid | Single-letter amino acid code. Enter a mixture of two AAs as two rows; do not enter mixtures of three or more. Use ins for an insertion, del for a deletion, and stop for a stop codon. | | | |
| count | | | NULL | |
| total | | | NULL | |
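The amino_acid rules translate directly into row-building logic: one row per amino acid, two rows for a two-AA mixture, nothing for larger mixtures. The sketch below is illustrative only; `mutation_rows` is a hypothetical helper, the gene name "S" in the usage is an example value, and count and total are simply left at their NULL default.

```python
# The 20 standard single-letter amino acid codes.
VALID_AA = set("ACDEFGHIKLMNPQRSTVWY")
# Special values for insertions, deletions, and stop codons.
SPECIAL = {"ins", "del", "stop"}


def mutation_rows(iso_name, gene, position, amino_acids):
    """Build isolate_mutations rows for one gene position.

    `amino_acids` lists the calls at this position, e.g. ["K"], ["K", "Q"]
    for a two-AA mixture, or ["del"]. Mixtures of three or more are not
    entered, so an empty list is returned for them.
    """
    if not amino_acids:
        raise ValueError("at least one amino acid is required")
    if len(amino_acids) >= 3:
        return []
    rows = []
    for aa in amino_acids:
        if aa not in VALID_AA and aa not in SPECIAL:
            raise ValueError(f"invalid amino_acid: {aa!r}")
        rows.append({"iso_name": iso_name, "gene": gene, "position": position,
                     "amino_acid": aa, "count": "NULL", "total": "NULL"})
    return rows
```

For example, `mutation_rows("A1", "S", 484, ["K", "Q"])` yields two rows that differ only in amino_acid, which is how the table represents a two-AA mixture.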