
azure-ai-evaluation_1.0.0

@azure-sdk azure-sdk released this 13 Nov 22:31
58c158e

1.0.0 (2024-11-13)

Breaking Changes

  • The parallel parameter has been removed from composite evaluators: QAEvaluator, ContentSafetyChatEvaluator, and ContentSafetyMultimodalEvaluator. To control evaluator parallelism, you can now use the _parallel keyword argument, though please note that this private parameter may change in the future.
  • Parameters query_response_generating_prompty_kwargs and user_simulator_prompty_kwargs have been renamed to query_response_generating_prompty_options and user_simulator_prompty_options in the Simulator's call method.

Bugs Fixed

  • Fixed an issue where the output_path parameter in the evaluate API did not support relative paths.
  • Outputs of adversarial simulators are of type JsonLineList, and the helper function to_eval_qr_json_lines now outputs context from both user and assistant turns, along with the category if it exists in the conversation.
  • Fixed an issue where, during long-running simulations, the API token could expire, causing a "Forbidden" error. Users can now set the environment variable AZURE_TOKEN_REFRESH_INTERVAL to refresh the token more frequently, preventing expiration and ensuring continuous operation of the simulation.
  • Fixed the evaluate function not producing aggregated metrics if ANY values to be aggregated were None, NaN, or
    otherwise difficult to process. Such values are now ignored entirely, so the aggregated metric of [1, 2, 3, NaN]
    is 2, not 1.5.

Other Changes

  • Refined error messages for service-based evaluators and simulators.
  • Tracing has been disabled due to a Cosmos DB initialization issue.
  • Introduced environment variable AI_EVALS_DISABLE_EXPERIMENTAL_WARNING to disable the warning message for experimental features.
  • Changed the randomization pattern for AdversarialSimulator so that each Adversarial harm category (e.g., Hate + Unfairness, Self-Harm, Violence, Sex) is represented almost equally in the AdversarialSimulator outputs. Previously, for max_simulation_results=200, a user might see 140 results in the 'Hate + Unfairness' category and 40 in 'Self-Harm'. Now, users will see 50 results for each of Hate + Unfairness, Self-Harm, Violence, and Sex.
  • For the DirectAttackSimulator, the prompt templates used to generate simulated outputs for each Adversarial harm category will no longer be in a randomized order by default. To override this behavior, pass randomize_order=True when you call the DirectAttackSimulator, for example:
import asyncio

from azure.ai.evaluation.simulator import DirectAttackSimulator
from azure.identity import DefaultAzureCredential

# azure_ai_project, scenario, and callback are assumed to be defined earlier
adversarial_simulator = DirectAttackSimulator(azure_ai_project=azure_ai_project, credential=DefaultAzureCredential())
outputs = asyncio.run(
    adversarial_simulator(
        scenario=scenario,
        target=callback,
        randomize_order=True
    )
)
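The even category split described for AdversarialSimulator can be sketched with a simple round-robin; this illustrates the resulting distribution, not the simulator's actual implementation:

```python
import itertools
from collections import Counter

categories = ["Hate + Unfairness", "Self-Harm", "Violence", "Sex"]
max_simulation_results = 200

# Cycling through the categories yields an almost equal number of results
# per category; 200 results over 4 categories gives exactly 50 each.
assigned = list(itertools.islice(itertools.cycle(categories), max_simulation_results))
print(Counter(assigned))  # each of the 4 categories appears 50 times
```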