4. Command line arguments
Whether TALLSorts is run in testing mode or training mode (the mode argument referenced below). Accepts the following arguments:
- test (default): Supply a counts matrix to classify.
- train: Supply a counts matrix and true classifications to generate a TALLSorts classification model, which can then be used in test mode.
Path to a samples (rows) x genes (columns) CSV file representing a raw counts matrix.
- If mode=test, this will be the testing matrix.
- If mode=train, this will be the matrix on which the classifiers are trained.
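As a rough illustration of the expected layout, the sketch below writes a tiny samples x genes counts matrix using only the Python standard library. The sample names and count values are made up, and the blank corner cell with sample IDs in the first column is an assumption about the CSV layout rather than something stated here.

```python
import csv
import io

# Illustrative gene labels (Ensembl IDs) and made-up raw counts.
genes = ["ENSG00000000003", "ENSG00000000005", "ENSG00000000419"]
counts = {
    "sample_1": [120, 0, 534],
    "sample_2": [98, 3, 611],
}

def write_counts_matrix(handle, genes, counts):
    """Write a samples (rows) x genes (columns) raw counts matrix as CSV."""
    writer = csv.writer(handle)
    # Assumed header layout: blank corner cell, then one column per gene.
    writer.writerow([""] + genes)
    for sample, row in counts.items():
        if len(row) != len(genes):
            raise ValueError(f"{sample}: expected {len(genes)} counts, got {len(row)}")
        writer.writerow([sample] + row)

buffer = io.StringIO()
write_counts_matrix(buffer, genes, counts)
csv_text = buffer.getvalue()
print(csv_text)
```

Writing to a real file instead of a StringIO buffer only requires passing an open file handle (opened with newline="") to write_counts_matrix.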
If mode=test and you are using the default TALLSorts model, this specifies whether the gene labels in your samples matrix are in Ensembl ID or gene symbol form. Accepts the following arguments:
- id (default): Ensembl ID
- symbol: Gene symbol
Ignored if mode=train.
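If you are unsure which form your gene labels are in, a quick check along the following lines may help. This is a hypothetical helper, not part of TALLSorts. It relies only on the fact that human Ensembl gene IDs consist of "ENSG" followed by 11 digits, and it deliberately does not accept IDs with a version suffix (such as ".5").

```python
import re

# Human Ensembl gene IDs: "ENSG" followed by exactly 11 digits (no version suffix).
ENSEMBL_GENE_ID = re.compile(r"^ENSG\d{11}$")

def looks_like_ensembl_ids(labels):
    """Return True if every label looks like an unversioned human Ensembl gene ID."""
    labels = list(labels)
    return bool(labels) and all(ENSEMBL_GENE_ID.match(label) for label in labels)

id_check = looks_like_ensembl_ids(["ENSG00000000003", "ENSG00000000419"])
symbol_check = looks_like_ensembl_ids(["TSPAN6", "DPM1"])
print(id_check, symbol_check)
```

A True result suggests the id setting; a False result means the labels are gene symbols, versioned IDs, or a mixture, and are worth inspecting by hand.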
Output directory. Defaults to the current working directory.
- If mode=test, this is the directory where the testing report will be saved.
- If mode=train, this is the directory where the trained classifier model object file custom.pkl.gz will be stored.
- If mode=test, this is the path of the classifier model object file with extension .pkl.gz. Defaults to the TALLSorts default model stored at <root>/models/tallsorts/tallsorts_default_model.pkl.gz.
- If mode=train, this argument will be ignored.
Path to a CSV file with samples (rows) x subtypes (columns), representing the true classification of each sample. A cell is 1 if the sample belongs to the subtype, or 0 if not.
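The 1/0 layout described above can be produced from a simple sample-to-subtype mapping, as in the sketch below. The sample IDs and subtype labels are hypothetical, and the sketch assumes one subtype per sample; whether a sample should also be marked with a 1 for any parent subtype is not stated here.

```python
import csv
import io

# Hypothetical sample IDs and subtype labels, for illustration only.
true_labels = {
    "sample_1": "subtype_A",
    "sample_2": "subtype_B",
    "sample_3": "subtype_A",
}

def build_sample_sheet(true_labels):
    """Return rows of a samples x subtypes table with 1/0 membership cells."""
    subtypes = sorted(set(true_labels.values()))
    rows = [[""] + subtypes]  # assumed header: blank corner cell, then subtype labels
    for sample, label in true_labels.items():
        rows.append([sample] + [1 if label == subtype else 0 for subtype in subtypes])
    return rows

sheet = build_sample_sheet(true_labels)
buffer = io.StringIO()
csv.writer(buffer).writerows(sheet)
print(buffer.getvalue())
```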
If provided, input genes will be filtered by the same method used when generating the default TALLSorts model. Please refer to our publication's Supplementary Information for a description of our method. Leave this flag out if your counts matrix is already pre-filtered.
Important: If you intend to use this flag, you must have pyensembl installed as per Step 5 of our Installation guide, and your gene labels in the sample-sheet must be in Ensembl ID form.
Path to a CSV file that describes the hierarchical relationships of the subtypes. Its rows are of the form <subtype label>,<parent label>. If a subtype has no parent, leave <parent label> blank. Subtype labels should correspond to column headings in the sample-sheet CSV file.
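To make the row format concrete, here is a minimal sketch that reads such a file into a subtype-to-parent mapping. The subtype labels are hypothetical, and the sketch assumes the file has no header row. The final check, that every parent also has a row of its own, is an optional sanity check rather than a documented requirement.

```python
import csv
import io

# Hypothetical hierarchy file contents; a blank second field means "no parent".
hierarchy_csv = """subtype_A,
subtype_A1,subtype_A
subtype_A2,subtype_A
subtype_B,
"""

def read_hierarchy(handle):
    """Map each subtype label to its parent label (None for top-level subtypes)."""
    parents = {}
    for row in csv.reader(handle):
        if not row:
            continue  # skip blank lines
        subtype = row[0].strip()
        parent = row[1].strip() if len(row) > 1 else ""
        parents[subtype] = parent or None
    return parents

parents = read_hierarchy(io.StringIO(hierarchy_csv))
# Optional sanity check: parents that never appear as a subtype in their own row.
unknown = {p for p in parents.values() if p is not None and p not in parents}
print(parents)
print("parents without their own row:", unknown)
```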
Path to a CSV file that provides arguments to the sklearn LogisticRegression class on which TALLSorts runs. Its rows are subtype labels, and its columns are parameters of the LogisticRegression class.
Leave this argument out, or leave cells blank, to use the following default values:
random_state=0, max_iter=10000, tol=0.0001, penalty='l1', solver='saga', C=0.2, class_weight='balanced'
See sklearn's docs for more info.
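The sketch below illustrates how blank cells could fall back to the defaults listed above. It is not TALLSorts' own parsing code: the file contents, the subtype labels, and the resolve_params helper are hypothetical, and casting each cell to the type of its default value is an assumption.

```python
import csv
import io

# Default values as listed above.
DEFAULTS = {
    "random_state": 0, "max_iter": 10000, "tol": 0.0001, "penalty": "l1",
    "solver": "saga", "C": 0.2, "class_weight": "balanced",
}

# Hypothetical parameters file: rows are subtypes, columns are parameter names.
params_csv = """,C,max_iter
subtype_A,0.5,
subtype_B,,5000
"""

def resolve_params(handle, defaults):
    """Return {subtype: parameter dict}, filling blank or missing cells from defaults."""
    reader = csv.reader(handle)
    header = next(reader)[1:]  # parameter names, skipping the corner cell
    resolved = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        params = dict(defaults)
        for name, cell in zip(header, row[1:]):
            if cell.strip() == "":
                continue  # blank cell: keep the default
            # Assumption: cast the cell to the type of the default, if one exists.
            params[name] = type(defaults[name])(cell) if name in defaults else cell
        resolved[row[0]] = params
    return resolved

resolved = resolve_params(io.StringIO(params_csv), DEFAULTS)
print(resolved["subtype_A"])
print(resolved["subtype_B"])
```

Each resulting dictionary holds keyword arguments in the form that sklearn's LogisticRegression constructor accepts.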
Number of cores to use when training TALLSorts in parallel. Uses joblib's parallel_backend. Defaults to 1.
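For readers unfamiliar with joblib, the following sketch shows the general pattern of running per-subtype work under parallel_backend. It is not TALLSorts' training code: train_one is a hypothetical stand-in, the subtype labels are made up, and the "threading" backend is chosen here only to keep the example lightweight.

```python
from joblib import Parallel, delayed, parallel_backend

def train_one(subtype):
    """Hypothetical stand-in for fitting one subtype's classifier."""
    return f"{subtype}: trained"

subtypes = ["subtype_A", "subtype_B", "subtype_C"]
n_jobs = 2  # plays the role of the "number of cores" argument

# The context manager sets the backend and worker count for Parallel calls inside it.
with parallel_backend("threading", n_jobs=n_jobs):
    results = Parallel()(delayed(train_one)(subtype) for subtype in subtypes)

print(results)
```

Results come back in the same order as the inputs, regardless of which worker finishes first.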