Reads filtration change seq length + RF model update #5

Open
wants to merge 363 commits into
base: controlled_shuffles

@TsabarM TsabarM commented Sep 30, 2020

No description provided.

@TsabarM TsabarM changed the base branch from master to controlled_shuffles September 30, 2020 09:42
@@ -42,7 +42,7 @@ def run_pipeline(fastq_path, barcode2samplename_path, samplename2biologicalcondi

     module_parameters = [fastq_path, first_phase_output_path, first_phase_logs_path,
                          barcode2samplename_path, left_construct, right_construct,
-                         max_mismatches_allowed, min_sequencing_quality, first_phase_done_path,
+                         max_mismatches_allowed, min_sequencing_quality, minimal_length_required,first_phase_done_path,
Member

This doesn't match the definition in read_filtration/module_wrapper.py.
There, it is defined as a named parameter (one that starts with --), but here you pass it as a positional parameter.
The order is also wrong: minimal_length_required lands in the position where the done path is expected.

To summarize, this change is wrong and doesn't work.
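
To illustrate the mismatch, here is a minimal sketch using `argparse`. The parser below is hypothetical (the argument names `fastq_path` and `done_file_path` and the default value are assumptions, not the actual contents of module_wrapper.py); it only shows how a value meant for a named argument ends up in the wrong slot when passed positionally, and how to pass it with its flag instead.

```python
import argparse


def build_parser():
    # Hypothetical parser; the real module_wrapper.py may declare other arguments.
    parser = argparse.ArgumentParser()
    parser.add_argument('fastq_path')
    parser.add_argument('done_file_path')
    # Named (optional) argument: it must be passed as "--minimal_length_required <value>".
    parser.add_argument('--minimal_length_required', type=int, default=3)
    return parser


def build_argv(fastq_path, done_file_path, minimal_length_required):
    # Positional values come first, in the order the parser declares them;
    # the named value is passed together with its flag.
    return [fastq_path, done_file_path,
            '--minimal_length_required', str(minimal_length_required)]


# Wrong: the minimal length is passed positionally and is parsed as the done path.
wrong = build_parser().parse_args(['reads.fastq', '6'])

# Right: the minimal length is passed with its flag.
right = build_parser().parse_args(build_argv('reads.fastq', 'first_phase_done.txt', 6))
```

In the wrong call, `done_file_path` receives `'6'` and `minimal_length_required` silently keeps its default, which is the kind of failure described above.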


def get_hyperparameters_grid(seed):
    # Number of trees in random forest
    n_estimators = [int(x) for x in np.linspace(start=100, stop=2000, num=20)]
Member

Set parameters using command-line arguments; they shouldn't be hardcoded.
This remark applies to the entire file, not just this line.
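
One possible way to do this is sketched below. The flag names (`--n_estimators_start`, `--n_estimators_stop`, `--n_estimators_num`) are hypothetical, the defaults simply mirror the values currently hardcoded, and `evenly_spaced_ints` is a pure-Python stand-in for the `np.linspace` call so that the sketch has no dependencies.

```python
import argparse


def parse_grid_args(argv=None):
    # Hypothetical flags; defaults mirror the currently hardcoded values.
    parser = argparse.ArgumentParser()
    parser.add_argument('--n_estimators_start', type=int, default=100)
    parser.add_argument('--n_estimators_stop', type=int, default=2000)
    parser.add_argument('--n_estimators_num', type=int, default=20)
    return parser.parse_args(argv)


def evenly_spaced_ints(start, stop, num):
    # Stand-in for [int(x) for x in np.linspace(start, stop, num)]:
    # num evenly spaced values from start to stop, both endpoints included.
    if num == 1:
        return [start]
    step = (stop - start) / (num - 1)
    return [int(start + i * step) for i in range(num)]


def get_n_estimators_grid(args):
    return evenly_spaced_ints(args.n_estimators_start,
                              args.n_estimators_stop,
                              args.n_estimators_num)


default_grid = get_n_estimators_grid(parse_grid_args([]))
custom_grid = get_n_estimators_grid(parse_grid_args(
    ['--n_estimators_start', '10', '--n_estimators_stop', '50', '--n_estimators_num', '5']))
```

With this shape, the same script can run different grids without code edits, and the values used in a given run are visible in the command that launched it.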

for i in range(num_of_configurations_to_sample):
    configuration = {}
    for key in hyperparameters_grid:
        configuration[key] = np.random.choice(hyperparameters_grid[key], size=1)[0]
Member

Is this seeded? Would we get the same results on every experiment run?
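
For reference, a minimal sketch of seeded sampling follows. It uses the standard library's `random.Random` as a stand-in for NumPy so that it has no dependencies; with NumPy, the analogous step would be calling `np.random.seed(seed)` before sampling or drawing from `np.random.default_rng(seed)`. The function name, its signature, and the example grid are assumptions made for illustration.

```python
import random


def sample_configurations(hyperparameters_grid, num_of_configurations_to_sample, seed):
    # A private generator seeded once: the same seed yields the same
    # sequence of choices, and the global random state is left untouched.
    rng = random.Random(seed)
    configurations = []
    for _ in range(num_of_configurations_to_sample):
        configuration = {key: rng.choice(values)
                         for key, values in hyperparameters_grid.items()}
        configurations.append(configuration)
    return configurations


# Illustrative grid only; not the values used in the PR.
grid = {'n_estimators': [100, 500, 1000], 'max_depth': [10, 50, None]}
first = sample_configurations(grid, 5, seed=42)
second = sample_configurations(grid, 5, seed=42)
```

Here `first` and `second` are identical because both runs start from the same seed, which is what makes an experiment repeatable.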

data.drop(['sample_name', 'label'], axis=1, inplace=True)
# a matrix of the actual feature values
X_train = data[train_rows_mask].values
X_test = data[test_rows_mask].values
Member

Neither X_test nor Y_test is used anywhere in the (modified) code.

yael1994 and others added 30 commits May 3, 2022 12:19
change the path of the wsl tutorial
add new script for summary reads in one csv file
The motif samples were sorted by sort_by_num_samples, sort_by_unique_memebers, sort_by_cluster_size

now it's sort_by_num_samples, sort_by_cluster_size, sort_by_unique_memebers
where unique members go from low to high
changed the order of the samples
fixed bug of biological condition type value

4 participants