
Commit

Adding video, some final fixes.
Summary: Adding video, and made minor fixes to docs.  allow-large-files

Reviewed By: echo-xiao9

Differential Revision: D63348509

fbshipit-source-id: 7e066911296a4b2518ef314d035433218c433fd2
YLouWashU authored and facebook-github-bot committed Sep 24, 2024
1 parent 70592af commit 7717cf2
Showing 10 changed files with 796 additions and 747 deletions.
9 changes: 5 additions & 4 deletions docs/ATEK_Data_Store.md
@@ -1,6 +1,6 @@
# ATEK Data Store

-ATEK Data Store is a data platform where preprocessed open Aria datasets in WebDataset (WDS) formats, with selected preprocessing configurations, are available for users to directly download and load into PyTorch.
+ATEK Data Store is a data platform where preprocessed open Aria datasets in [WebDataset](https://github.com/webdataset/webdataset) (WDS) format, with selected preprocessing configurations, are available for users to directly download and load into PyTorch.

## ATEK datasets in WDS format

@@ -26,7 +26,7 @@ To access the data:

1. Click the **access link** in the above table; you will find the **Access The Dataset** button at the bottom of the page. Input your email address, and you will be redirected to a page with a button to download **[dataset] in ATEK format (PyTorch ready)**.

-![Download button](./images/atek_data_store_download_button.png)
+<img src="./images/atek_data_store_download_button.png" width="600">

2. This will download a json file, e.g. `[dataset_name]_ATEK_download_urls.json`, that contains the URLs of the actual preprocessed data. Note that for the same dataset, all preprocessing configurations' URLs are contained in the same json file.

@@ -42,14 +42,15 @@ To access the data:

where:

-- `--config-name` specifies which [preprocessing configuration](./preprocessing_configurations.md) you would like to download.
+- `--config-name` specifies which [preprocessing configuration](./preprocessing_configurations.md) you would like to download. You should choose one from [this table](#atek-datasets-in-wds-format).
- `--download-wds-to-local`: users can remove this flag to create **streamable** yaml files instead of downloading the WDS files locally.

Users can also specify other options, including the maximum number of sequences to download, the training-validation split ratio, etc. See the [src code](../tools/atek_wds_data_downloader.py) for details.

4. **Note that these URLs will EXPIRE AFTER 30 DAYS**; users will need to re-download the json file and re-generate the streamable yaml files.
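The exact schema of the downloaded URL file is not documented here, so the sketch below assumes a simple hypothetical layout (a `configs` mapping keyed by preprocessing configuration name) purely to illustrate inspecting which configurations a `[dataset_name]_ATEK_download_urls.json` covers:

```python
import json

# Hypothetical contents of a [dataset_name]_ATEK_download_urls.json file --
# the real schema may differ; this is illustration only.
raw = """
{
  "dataset_name": "ASE",
  "configs": {
    "cubercnn": {"train": ["https://example.com/ase/cubercnn/shard-000.tar"]},
    "efm": {"train": ["https://example.com/ase/efm/shard-000.tar"]}
  }
}
"""

urls = json.loads(raw)
# All preprocessing configurations' URLs live in the same json file,
# so listing the top-level config keys shows what is available.
config_names = sorted(urls["configs"])
print(config_names)  # ['cubercnn', 'efm']
```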

-These steps will download ATEK preprocessed WebDataset files with the following folder structure. Note that if the download breaks in the middle, simply run it again to pick up from the middle.
+## Downloaded WDS files
+Following the above steps will download ATEK preprocessed WebDataset files with the following folder structure. Note that if the download breaks in the middle, simply run it again to resume.

```bash
./downloaded_local_wds
2 changes: 1 addition & 1 deletion docs/Install.md
@@ -2,7 +2,7 @@

We provide 2 ways to install ATEK:

-1. If you just need **the core functionalities of ATEK**, including data pre-processing, data loading, and visualization, you can simply [install ATEK's core lib](#core-lib-installation)
+1. If you just need the core functionalities of ATEK, including data pre-processing, data loading, and visualization, you can simply [install **ATEK core lib**](#core-lib-installation)
2. If you want to run the CubeRCNN demos and all task-specific evaluation benchmarking, you can follow this guide to [install **full dependencies**](#install-all-dependencies-using-mambaconda).

## Core lib installation
34 changes: 24 additions & 10 deletions docs/example_cubercnn_customization.md
@@ -139,22 +139,18 @@ print(f"Loading WDS into CubeRCNN format, each sample contains the following key

## CubeRCNN model training / inference

-With the created Pytorch DataLoader, user will be able to easily run model training / inference for CubeRCNN model:
+With the created PyTorch DataLoader, users can easily run model training or inference for the CubeRCNN model.

+**Training script**
```python
-# Load pre-trained model for training / inference
-model_config, model = create_inference_model(
-    model_config_file, model_ckpt_path, use_cpu_only = use_cpu_only
-)
+# Load pre-trained model for training
+model_config, model = create_training_model(model_config_file, model_ckpt_path)

-# training / inference loop
+# Training loop
for cubercnn_input_data in tqdm(
    cubercnn_dataloader,
-    desc="Training / Inference progress: ",
+    desc="Training progress: ",
):
-    # Inference step
-    cubercnn_model_output = model(cubercnn_input_data)

    # Training step
    loss = model(cubercnn_input_data)
    losses = sum(loss.values())
@@ -163,3 +159,21 @@
    optimizer.step()
    ...
```
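The training loop above sums a dict of named losses into one scalar before backpropagation. As a dependency-free sketch of that accumulation pattern (`DummyModel` and `DummyOptimizer` are stand-ins invented here for illustration, not ATEK or CubeRCNN APIs):

```python
# Stand-ins invented for illustration; a real run uses the CubeRCNN model
# and a torch optimizer, and calls losses.backward() on real tensors.
class DummyModel:
    def __call__(self, batch):
        # In training mode the model returns a dict of named losses.
        return {"loss_cls": 0.5, "loss_box": 0.5}

class DummyOptimizer:
    def zero_grad(self):
        pass

    def step(self):
        pass

model = DummyModel()
optimizer = DummyOptimizer()
dataloader = [{"image": "frame_0"}, {"image": "frame_1"}]  # stand-in batches

for batch in dataloader:
    loss = model(batch)
    losses = sum(loss.values())  # collapse named losses into one scalar
    optimizer.zero_grad()
    # losses.backward() would go here with real tensors
    optimizer.step()

print(losses)  # 1.0
```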


+**Inference script**
+
+```python
+# Load pre-trained model for inference
+model_config, model = create_inference_model(model_config_file, model_ckpt_path)
+
+# Inference loop
+for cubercnn_input_data in tqdm(
+    cubercnn_dataloader,
+    desc="Inference progress: ",
+):
+    # Inference step
+    cubercnn_model_output = model(cubercnn_input_data)
+    ...
+```
2 changes: 1 addition & 1 deletion docs/example_training.md
@@ -88,4 +88,4 @@ Here is the tensorboard results for ASE trained results, where the left 2 figure

Once model training is finished, users can proceed to [example_inference.md](./example_inference.md) to run model inference on the trained weights.

-We also provided 2 sets of CubeRCNN trained weights by us, one on ASE 10K dataset, the other on ADT dataset. The weights can be downloaded [here](https://www.projectaria.com/async/sample/download/?bucket=adt&filename=ATEK_example_model_weights.tar)
+We also provide 2 sets of CubeRCNN weights trained by us: one on the ASE 10K dataset, the other on the ADT dataset. The weights can be downloaded [here](https://www.projectaria.com/async/sample/download/?bucket=atek&filename=ATEK_example_model_weights.tar). By downloading this file, you acknowledge that you have read, understood, and agree to be bound by the terms of the [CC-BY-NC 4.0 license](https://creativecommons.org/licenses/by-nc/4.0/deed.en) software license.
Binary file added docs/images/atek_github_video_small.webm
8 changes: 4 additions & 4 deletions docs/preprocessing.md
@@ -17,9 +17,9 @@ Before ATEK, users will need to hand-craft all these code on their own, which is

## Simple customization through preprocessing config

-ATEK allows users to **customize the preprocessing workflow by simply modifying the preprocessing configuration yaml file** (see [preprocessing_configurations.md](./preprocessing_configurations.md) for details).
+ATEK allows users to **customize the preprocessing workflow by simply modifying the preprocessing configuration yaml file** (see the [Preprocessing configurations page](./preprocessing_configurations.md) for details).
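As an illustration, a minimal config of that shape might look as follows. Only `atek_config_name` is a field confirmed by these docs; treat the rest of the snippet as an invented placeholder, not the actual schema:

```yaml
# Illustrative sketch only -- see preprocessing_configurations.md for the
# actual supported configurations and schema.
atek_config_name: cubercnn   # selects which preprocessor to build
```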

-The following is the core code to load an open Aria data sequence, preprocess according to a given configuration file, and write the preprocessed results to disk as WebDataset ([full example](../examples/Demo_1_data_preprocessing.ipynb)). We also use a visualization library based on `ReRun` to visualize the preprocessed results. The results are stored as `Dict` in memory containing tensors, strings, and sub-dicts, and also saved to local disk in WebDataset (WDS) format for further use.
+The following is the core code to load an open Aria data sequence, preprocess according to a given configuration file, and write the preprocessed results to disk as WebDataset. We also use a visualization library based on `ReRun` to visualize the preprocessed results. The results are stored as `Dict` in memory containing tensors, strings, and sub-dicts, and also saved to local disk in WebDataset (WDS) format for further use.

```python
from omegaconf import OmegaConf
@@ -35,11 +35,11 @@ num_samples = preprocessor.process_all_samples(write_to_wds_flag = True, viz_fla

### `create_general_atek_preprocessor_from_conf`

-This is a factory method that initializes a `GeneralAtekPreprocessor` based on a configuration object. It selects the appropriate preprocessor configuration for ATEK using the `atek_config_name` field in the provided Omega configuration. See [here](./preprocessing_configurations.md) for currently supported configs.
+This is a factory method that initializes a `GeneralAtekPreprocessor` based on a configuration object. It selects the appropriate preprocessor configuration for ATEK using the `atek_config_name` field in the provided Omega configuration.

#### Parameters

-- **conf** (`DictConfig`): Configuration object with preprocessing settings. The `atek_config_name` key specifies the preprocessor type,
+- **conf** (`DictConfig`): Configuration object with preprocessing settings.
- **raw_data_folder** (`str`): Path to the folder with raw data files.
- **sequence_name** (`str`): Name of the data sequence to process.
- **output_wds_folder** (`Optional[str]`): Path for saving preprocessed data in WebDataset (WDS) format. If `None`, data is not saved in WDS format.

