docs: bootstrap basic documentation
kjappelbaum committed Aug 12, 2024
1 parent c3831f1 commit 07ee7a1
Showing 4 changed files with 134 additions and 0 deletions.
Binary file added .DS_Store
Binary file not shown.
114 changes: 114 additions & 0 deletions docs/api/sampler.md
@@ -0,0 +1,114 @@
# Sampler Module

## Overview

The `sampler` module provides functionality for generating text samples based on templates and data. It is primarily used for creating datasets for natural language processing tasks in chemistry and related fields. The main class in this module is `TemplateSampler`, which allows for flexible text generation with support for multiple choice questions and class balancing.

## TemplateSampler

### Class: TemplateSampler

The `TemplateSampler` class is responsible for sampling and generating text based on templates and data.

#### Initialization

```python
def __init__(self, df: pd.DataFrame, meta: Dict, config: Dict, column_datafield_sampler: Optional[Callable] = None)
```

- `df`: A pandas DataFrame containing the dataset.
- `meta`: A dictionary containing metadata about the dataset, including identifiers and targets.
- `config`: A dictionary containing configuration parameters for the sampler.
- `column_datafield_sampler`: An optional callable for custom sampling from multiple options.
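
The exact call signature expected for `column_datafield_sampler` is not documented here. As a minimal sketch, assuming the sampler passes a sequence of candidate values and uses the single value returned, a reproducible custom callable could look like this (`make_seeded_choice` and `names` are hypothetical names used only for illustration):

```python
import random
from typing import Callable, Sequence


def make_seeded_choice(seed: int) -> Callable[[Sequence[str]], str]:
    """Return a callable that picks one option reproducibly.

    Assumption: the sampler calls this with a sequence of candidate
    values and uses the one value that comes back.
    """
    rng = random.Random(seed)  # private RNG so global state is untouched

    def choose(options: Sequence[str]) -> str:
        return rng.choice(list(options))

    return choose


pick = make_seeded_choice(seed=0)
names = ["solubility", "aqueous solubility", "water solubility"]
print(pick(names))
```

If the assumption holds, such a callable would be passed as `TemplateSampler(df, meta, config, column_datafield_sampler=pick)`.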

#### Main Methods

##### sample

```python
def sample(self, sample: Optional[pd.Series], template: str) -> str
```

Generates a text sample based on a template and a data sample.

- `sample`: A row from the dataset. If None, a random sample is chosen.
- `template`: The template string to be filled.
- Returns: The completed text sample with all variables replaced by their values.
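
To illustrate the general idea of filling a template from one row, here is a simplified stand-in. It is not the `TemplateSampler` implementation: it only handles plain `{column}` placeholders and ignores the richer placeholder syntax (such as `{SMILES#}`) used in this module. The function name `fill_template` is hypothetical.

```python
import re
from typing import Any, Mapping


def fill_template(template: str, row: Mapping[str, Any]) -> str:
    """Replace each plain {column} placeholder with the value from `row`."""

    def substitute(match: re.Match) -> str:
        key = match.group(1)
        if key not in row:
            raise KeyError(f"no value for placeholder {key!r}")
        return str(row[key])

    # \w+ matches only simple names, so other placeholder forms are left alone
    return re.sub(r"\{(\w+)\}", substitute, template)


row = {"SMILES": "CCO", "logP": -0.31}
print(fill_template("The molecule {SMILES} has a logP of {logP}.", row))
```

A dict is used here to keep the sketch dependency-free; a `pd.Series` row supports the same `in` and `[]` lookups by label.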

##### enable_class_balancing

```python
def enable_class_balancing(self, column: str)
```

Enables class-balanced sampling for a specified column.

- `column`: The column to use for balancing.

##### disable_class_balancing

```python
def disable_class_balancing(self)
```

Disables class-balanced sampling and reverts to the original dataset.
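
The enable/disable pair can be pictured with the sketch below. It assumes balancing is done by downsampling every class to the size of the smallest one and that the original DataFrame is kept so it can be restored; the documentation above does not say which balancing strategy `TemplateSampler` actually uses. `BalancedView` and `downsample_to_smallest_class` are hypothetical names.

```python
import pandas as pd


def downsample_to_smallest_class(df: pd.DataFrame, column: str, seed: int = 0) -> pd.DataFrame:
    """Return rows of `df` with every class in `column` equally frequent."""
    smallest = int(df[column].value_counts().min())
    return (
        df.groupby(column)
        .sample(n=smallest, random_state=seed)  # draw `smallest` rows per class
        .reset_index(drop=True)
    )


class BalancedView:
    """Hypothetical helper mimicking the enable/disable pattern."""

    def __init__(self, df: pd.DataFrame) -> None:
        self._original = df  # kept so balancing can be undone
        self.df = df

    def enable_class_balancing(self, column: str) -> None:
        self.df = downsample_to_smallest_class(self._original, column)

    def disable_class_balancing(self) -> None:
        self.df = self._original


df = pd.DataFrame({
    "SMILES": ["CCO", "CCC", "CCN", "c1ccccc1", "CC(=O)O"],
    "toxic": [0, 0, 0, 1, 1],
})
view = BalancedView(df)
view.enable_class_balancing("toxic")
print(view.df["toxic"].value_counts().to_dict())
view.disable_class_balancing()
print(len(view.df))
```

Downsampling discards rows from the larger classes; oversampling the smaller classes is an equally plausible strategy and would keep all rows.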

#### Usage Examples

Basic usage:

```python
import pandas as pd
from chemnlp.data.sampler import TemplateSampler

# Prepare your data, metadata, and config
df = pd.DataFrame(...)
meta = {...}
config = {...}

# Initialize the sampler
sampler = TemplateSampler(df, meta, config)

# Define a template
template = "The molecule with SMILES {SMILES#} has a {property#} of {value#}."

# Generate a sample
result = sampler.sample(df.iloc[0], template)
print(result)
```

Using class balancing:

```python
# Enable class balancing
sampler.enable_class_balancing("target_column")

# Generate balanced samples
balanced_results = [sampler.sample(None, template) for _ in range(100)]

# Disable class balancing when done
sampler.disable_class_balancing()
```

Multiple choice question:

```python
multiple_choice_template = """
Question: What is the {property__names__noun} of the molecule with SMILES {SMILES#}?
Options: {%multiple_choice_enum%4%aA1}
{value%}
Answer: {%multiple_choice_result}
"""

mc_result = sampler.sample(df.iloc[0], multiple_choice_template)
print(mc_result)
```
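
Conceptually, a multiple choice sample needs a shuffled list of labelled options and the label of the correct one. The sketch below shows that idea in isolation. It does not interpret the `{%multiple_choice_enum%...}` template syntax and is not taken from the library; `build_multiple_choice` and the example values are hypothetical.

```python
import random
import string
from typing import List, Sequence, Tuple


def build_multiple_choice(
    correct: str,
    distractors: Sequence[str],
    n_options: int = 4,
    seed: int = 0,
) -> Tuple[List[str], str]:
    """Return labelled option lines and the label of the correct answer."""
    if len(distractors) < n_options - 1:
        raise ValueError("not enough distractors for the requested option count")
    rng = random.Random(seed)
    # Pick the wrong answers, add the right one, then shuffle the order
    options = rng.sample(list(distractors), n_options - 1) + [correct]
    rng.shuffle(options)
    labels = string.ascii_uppercase[:n_options]
    lines = [f"{label}. {option}" for label, option in zip(labels, options)]
    answer = labels[options.index(correct)]
    return lines, answer


lines, answer = build_multiple_choice("-0.31", ["1.20", "0.45", "2.10", "-1.05"])
print("\n".join(lines))
print("Answer:", answer)
```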

## Notes

- The `TemplateSampler` class supports various types of templates, including those with multiple choice questions.
- Class balancing can be useful for creating balanced datasets for machine learning tasks.
- The sampler can handle both categorical and continuous data types, with proper formatting for continuous values.
- Custom sampling functions can be provided for more control over how values are selected from multiple options.

For more detailed information on the implementation and advanced usage, please refer to the source code and unit tests.
Empty file added docs/index.md
Empty file.
20 changes: 20 additions & 0 deletions mkdocs.yml
@@ -0,0 +1,20 @@
site_name: ChemNLP Documentation
theme:
  name: material
  palette:
    primary: teal
nav:
  - Home: index.md
  - User Guide:
    - Installation: user-guide/installation.md
    - Quick Start: user-guide/quickstart.md
  - API Reference:
    - Sampler Module: api/sampler.md
  - Examples:
    - Basic Usage: examples/basic-usage.md
    - Advanced Techniques: examples/advanced-techniques.md
  - Contributing: contributing.md
  - Changelog: changelog.md
markdown_extensions:
  - pymdownx.highlight
  - pymdownx.superfences
