The framework provides a set of well-known automated evaluation metrics for text generation tasks.
The library includes the following metrics:
- BLANC: paper
- MoverScore: paper
- BLEU: paper
- METEOR: paper
- ROUGE: paper
- chrF: paper
- BERTScore: paper
- BARTScore: paper
- Data statistics metrics: paper
  - Compression
  - Coverage
  - Length
  - Novelty
  - Density
  - Repetition
- ROUGE-WE: paper
- S3: paper
- BaryScore: paper
- DepthScore: paper
- InfoLM: paper
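To give a feel for what the data statistics metrics in the list above measure, here is a minimal sketch of three of them. The definitions are assumptions based on common usage in summarization evaluation: compression as the source-to-summary length ratio, novelty as the share of summary n-grams absent from the source, and repetition as the share of n-gram occurrences that repeat within the summary. The function names are hypothetical, tokenization is plain whitespace splitting, and this is not LUNA's implementation.

```python
from collections import Counter
from typing import List, Tuple


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return all contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def compression(source: str, summary: str) -> float:
    """Ratio of source length to summary length, in whitespace tokens."""
    summary_len = len(summary.split())
    return len(source.split()) / summary_len if summary_len else 0.0


def novelty(source: str, summary: str, n: int = 2) -> float:
    """Fraction of summary n-grams that never occur in the source."""
    summary_ngrams = ngrams(summary.split(), n)
    if not summary_ngrams:
        return 0.0
    source_ngrams = set(ngrams(source.split(), n))
    novel = sum(1 for gram in summary_ngrams if gram not in source_ngrams)
    return novel / len(summary_ngrams)


def repetition(summary: str, n: int = 2) -> float:
    """Fraction of summary n-gram occurrences that belong to a repeated n-gram."""
    counts = Counter(ngrams(summary.split(), n))
    total = sum(counts.values())
    if not total:
        return 0.0
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / total
```

The library's own versions of these metrics may tokenize and normalize differently, so treat the numbers from this sketch as illustrative only.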
Clone the repository and install the library from the root:
git clone https://github.com/Moonlight-Syntax/LUNA.git
cd LUNA
pip install .
Alternatively, use poetry: run poetry install from the root.
You can either use the Calculator to evaluate metrics or integrate the metric code directly.
The easiest way to evaluate NLG models is to execute the following snippet:
from luna.calculate import Calculator

# Choose to compute in a sequential or a parallel setting
calculator = Calculator(execute_parallel=True)
metrics_dict = calculator.calculate(
    metrics=[depth_score, s3_metrics],  # both are LUNA's metrics
    candidates=candidates,
    references=references,
)
print(metrics_dict)
# {"DepthScore": ..., "S3": ...}
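To illustrate what the sequential and parallel settings mean in practice, the sketch below scores the same data with several metric objects, either one after another or in a thread pool. It is a hypothetical stand-in and not LUNA's Calculator: the run_metrics function and the ExactMatch and LengthRatio classes are invented for this example, and the only assumption about a metric is that it exposes an evaluate_batch method.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


class ExactMatch:
    """Hypothetical toy metric: 1.0 if candidate equals reference, else 0.0."""

    def evaluate_batch(self, candidates: List[str], references: List[str]) -> List[float]:
        return [float(c == r) for c, r in zip(candidates, references)]


class LengthRatio:
    """Hypothetical toy metric: candidate length over reference length, in tokens."""

    def evaluate_batch(self, candidates: List[str], references: List[str]) -> List[float]:
        return [len(c.split()) / max(len(r.split()), 1) for c, r in zip(candidates, references)]


def run_metrics(metrics, candidates, references, execute_parallel=False) -> Dict[str, List[float]]:
    """Score the same data with every metric, keyed by the metric's class name."""

    def score(metric):
        return metric.evaluate_batch(candidates, references)

    if execute_parallel:
        # pool.map preserves the order of the input metrics.
        with ThreadPoolExecutor() as pool:
            results = list(pool.map(score, metrics))
    else:
        results = [score(metric) for metric in metrics]
    return {type(m).__name__: r for m, r in zip(metrics, results)}


candidates = ["a small cat", "hello world"]
references = ["a small cat", "hello there world"]
scores = run_metrics([ExactMatch(), LengthRatio()], candidates, references, execute_parallel=True)
print(scores)
```

Both settings return the same scores; the parallel one only pays off when individual metrics are slow, which is typical for model-based metrics.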
All the metrics in the library follow the same interface:
from typing import List, Optional

class Metrics:
    def evaluate_batch(self, hypothesyses: List[str], references: Optional[List[str]]) -> List[float]:
        ...  # score every hypothesis against its reference

    def evaluate_example(self, hypothesys: str, reference: Optional[str]) -> float:
        ...  # score a single hypothesis
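As a rough illustration of this interface, the sketch below implements a toy token-overlap F1 metric with the same two methods and the same parameter names. The TokenOverlapF1 class is hypothetical and is not part of LUNA; it does not inherit from any library base class and only mirrors the method signatures shown above.

```python
from collections import Counter
from typing import List, Optional


class TokenOverlapF1:
    """Hypothetical toy metric: token-level F1 between hypothesis and reference."""

    def evaluate_example(self, hypothesys: str, reference: Optional[str]) -> float:
        if reference is None:
            raise ValueError("TokenOverlapF1 requires a reference")
        hyp_tokens = hypothesys.lower().split()
        ref_tokens = reference.lower().split()
        # Multiset intersection: each shared token counts min(count) times.
        overlap = sum((Counter(hyp_tokens) & Counter(ref_tokens)).values())
        if overlap == 0:
            return 0.0
        precision = overlap / len(hyp_tokens)
        recall = overlap / len(ref_tokens)
        return 2 * precision * recall / (precision + recall)

    def evaluate_batch(self, hypothesyses: List[str], references: Optional[List[str]]) -> List[float]:
        if references is None or len(references) != len(hypothesyses):
            raise ValueError("need exactly one reference per hypothesis")
        # The batch version simply applies the per-example scorer pairwise.
        return [self.evaluate_example(h, r) for h, r in zip(hypothesyses, references)]


metric = TokenOverlapF1()
print(metric.evaluate_example("the cat sat", "the cat slept"))
```

Delegating evaluate_batch to evaluate_example keeps the two methods consistent; a model-based metric would more likely do the reverse and batch its forward passes.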
Thus, to evaluate your own examples, run the following code:
# MetricName is a placeholder for any metric class from the list above
from luna import MetricName

metric = MetricName()
result = metric.evaluate_example("Example generated by a bad model", "Gold example")
results = metric.evaluate_batch(
    ["Example generated by a bad model 1", "Example generated by a bad model 2"],
    ["Gold example 1", "Gold example 2"],
)
We welcome issues and pull requests. We hope LUNA's functionality is broad enough, but it can always be extended and improved.
We use pre-commit hooks to check the code before committing.
To install the hooks, run the following:
pip install pre-commit
pre-commit install
After that, every commit will trigger standard code-style checks, including black, isort, etc.
Tests for luna are located in the tests directory. To run them, execute:
pytest tests