This repository has been archived by the owner on Jan 22, 2024. It is now read-only.

Generate Synthetic dataset. #88

Open: wants to merge 36 commits into base: main

Conversation

RishabGoel (Contributor)

No description provided.

@@ -5,6 +5,8 @@ git clone https://[email protected]/googleprivate/compressive-ip

Contributor:

Don't merge this file.

@@ -53,19 +57,31 @@ def generate_dataset(
    else:
      test_file_writer.write(record_bytes)

def get_target_index(target, keep_errors_only):
  error_idx_offset = 1 if keep_errors_only else 1000
Contributor:

Q: Is it OK having error indexes hardcoded here, or will it make maintenance hard later?
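One way to address this, sketched below: lift the two magic numbers into named module-level constants so there is a single place to update them. The constant names and the return statement are illustrative guesses; only the offset values (1 and 1000) come from the diff.

```python
# Hypothetical module-level constants for the two hardcoded offsets.
ERROR_IDX_OFFSET_ERRORS_ONLY = 1
ERROR_IDX_OFFSET_ALL = 1000


def get_target_index(target, keep_errors_only):
  error_idx_offset = (ERROR_IDX_OFFSET_ERRORS_ONLY if keep_errors_only
                      else ERROR_IDX_OFFSET_ALL)
  # The rest of the function is not shown in the diff; shifting the target
  # by the offset is only a guess at its shape.
  return target + error_idx_offset
```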

@@ -180,18 +180,19 @@ def main(experiment_id=None, study_id=None, dataset_path=None, skip_create=False
  if experiment_id is None:
Contributor:

Merge conflict. Don't check in.

RishabGoel and others added 18 commits January 3, 2022 09:44
- Adds edge_* features to dataset, 6 edge types
- Sweeps
- GGNN implementation (tests still timing out)
- Adds run_test (with an option for no subsampling) to eval for 1 epoch
- Adds inspect_edges to analyze_data
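For concreteness, a minimal sketch of what per-edge features like these typically look like as parallel arrays; the field names follow the edge_* naming above, but the exact schema is an assumption:

```python
import numpy as np

# Six edge types, per the commit message above.
NUM_EDGE_TYPES = 6

edge_sources = np.array([0, 1, 2])  # source node index of each edge
edge_dests = np.array([1, 2, 0])    # destination node index of each edge
edge_types = np.array([0, 3, 5])    # edge type id in [0, NUM_EDGE_TYPES)
```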
A dry-run sweep generates the commands for the sweep without running them. This is useful for resuming old runs on different machines than they were originally run on, or for resuming just a subset of old runs.
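A minimal sketch of the dry-run behavior described above; run_sweep and train.py are hypothetical stand-ins, not the repository's actual sweep runner:

```python
import subprocess


def run_sweep(sweep_flags, dry_run=False):
  """Runs (or, in dry-run mode, just prints) one command per flag setting."""
  for flags in sweep_flags:
    cmd = ['python', 'train.py'] + flags
    if dry_run:
      # Print instead of executing, so the commands can be replayed later,
      # e.g. to resume a subset of old runs on a different machine.
      print(' '.join(cmd))
    else:
      subprocess.run(cmd, check=True)


run_sweep([['--seed=0'], ['--seed=1']], dry_run=True)
```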
Allows setting the number of training steps and the seed, so we can run multiple runs of a single model and compute the variance of the metrics. Colab for generating commands is here: https://colab.research.google.com/drive/1axwI8dGJ1_wTLIKJsLEx0FazHaPXEu72#scrollTo=IroRFMZyl6kR&uniqifier=3
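A minimal sketch of the variance computation this enables; train_and_eval is a hypothetical stand-in for a single training run that returns one metric value:

```python
import statistics


def metric_variance(train_and_eval, num_train_steps, seeds=(0, 1, 2, 3, 4)):
  """Trains one model per seed and returns the mean and variance of the metric."""
  metrics = [train_and_eval(num_train_steps=num_train_steps, seed=seed)
             for seed in seeds]
  return statistics.mean(metrics), statistics.variance(metrics)
```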
- Configs for GGNN:
  - config.ggnn_use_fixed_num_layers = True
  - config.ggnn_layers = 3
- New dataset with edge info for GGNN
  - generates edge_sources_shape on the fly
- We're filtering the same examples as before
- Sweep for GGNN experiments
- Overwriting top checkpoints after preemption (a new checkpoints dir would be better) to avoid failure on restart
- Supports both fixed num layers and num_steps num layers for GGNNs
- Code for generating a sampled test set with roughly equal error and no-error examples (see the sketch below)
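A minimal sketch of the balanced sampling in the last item; has_error is a hypothetical predicate supplied by the caller, and the 50/50 split is the "roughly equal" target from the commit message:

```python
import random


def sample_balanced_test_set(examples, has_error, n, seed=0):
  """Samples n examples, roughly half with errors and half without."""
  rng = random.Random(seed)
  with_errors = [ex for ex in examples if has_error(ex)]
  without_errors = [ex for ex in examples if not has_error(ex)]
  sampled = (rng.sample(with_errors, n // 2)
             + rng.sample(without_errors, n - n // 2))
  rng.shuffle(sampled)
  return sampled
```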
The SGD optimizer state has changed, so our naive existing method of loading old checkpoints doesn't always work. This works around that for test.
The restore logic now skips init (which was unnecessary and slow anyway), loads the old checkpoint state, and then keeps only the params, dropping opt_state.

Also in this commit: the ability to restore from an LSTM into an Exception IPA-GNN or regular IPA-GNN.
To do this, set --config.finetune=LSTM.
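A minimal sketch of the params-only restore described above, assuming a flax TrainState-style state and flax.training.checkpoints; the function name is illustrative, not the repository's actual restore code:

```python
from flax.training import checkpoints


def restore_params_only(ckpt_dir, state):
  """Loads an old checkpoint but keeps only its params."""
  # Restore the raw checkpoint contents (target=None returns a plain
  # nested dict, so no init is needed first).
  restored = checkpoints.restore_checkpoint(ckpt_dir, target=None)
  # Keep only the params; drop the stale opt_state so the current
  # optimizer initializes fresh.
  return state.replace(params=restored['params'])
```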
Merged assert error generation