[Valeo][Evals] Same prompts with same data but results are not consistent #935

Open
mabdelaziz999 opened this issue Sep 24, 2024 · 1 comment

@mabdelaziz999

Environment:
(Hint: "Report Extension Issue on Github" command will fill these out for you.)

Version information
Gemini Code Assist 2.17.0 (September 2024)
VS Code 1.93.1

Cloud Code Extension version:

VSCode version:

OS: Windows 10

Cloud SDK:

Skaffold:

Kubectl:

Description: Running the same prompt with the same data gives inconsistent results from run to run, and some of the results are very bad.
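
To put a rough number on "inconsistent", the outputs of several runs can be compared pairwise. The snippet below is only a sketch: `pairwise_similarity` is a hypothetical helper, not part of Gemini Code Assist, and it assumes the response text of each run has been copied into a Python string by hand. It uses the standard-library `difflib.SequenceMatcher`, which measures character-level overlap and does not judge test-case quality.

```python
from difflib import SequenceMatcher
from itertools import combinations


def pairwise_similarity(outputs):
    """Mean SequenceMatcher ratio over all pairs of run outputs.

    1.0 means every run produced identical text; values near 0.0
    mean the runs share almost no text. (Hypothetical helper.)
    """
    pairs = list(combinations(outputs, 2))
    if not pairs:
        # Zero or one output: nothing to compare, treat as consistent.
        return 1.0
    total = sum(SequenceMatcher(None, a, b).ratio() for a, b in pairs)
    return total / len(pairs)
```

Attaching the score together with the raw outputs of a few runs would make the variation easier for maintainers to reproduce.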

Repro step:

  • Give GCA (Gemini Code Assist) a requirement statement and ask it to design test cases for that requirement.
  • Results vary significantly between runs and are sometimes very bad.
  • prompt:
    Instructions to do your task:
    Use a self-asking method to breakdown complex part into simpler fine-grained
    Breakdown any of the sub parts if possible to more simpler
    Go through one by one subparts very carefully and verify step by step
    Based on your suggestions to the sub part try to optimize the original
    Criticize your modifications by self asking questions and answer them one by one, about why the modification will not be good
    Based on the criticism, evaluate the answer very carefully, taking into consideration that there is a chance it can be an incorrect modification and verify step by step
    Based on evaluation, select one of the two following actions
    (a) Generate the final answer if you are very confident that the answer is correct
    (b) If you are not very sure if the answer is correct, then go to the first step to regenerate more self asking questions and repeat all the steps one more time. Be skeptical and be more biased to select option a if you have any doubts
    Repeat the steps until you feel confident about the answer
    Show your confidence score as percentage based on steps above
    Generate as many unique test cases as needed to ensure complete coverage of the requirements.

Consider the following when generating test cases:

  • Input Validation: Test different types of inputs, including valid, invalid, and edge cases (e.g., null values, empty strings, maximum length).
  • Functional Coverage: Ensure that all functional requirements are covered by at least one test case.
  • User Interactions: Test different user interactions and their outcomes.
  • System Behavior: Consider system behavior under different conditions, such as normal operation, error conditions, and edge cases.
  • Performance and Load: Include test cases that check for performance issues, like handling large data sets or high concurrency.
  • Security Considerations: Include test cases that assess security aspects, such as unauthorized access or data leaks.
  • Boundary Conditions: Ensure that test cases cover minimum and maximum limits, as well as any other relevant boundary conditions.

Given the following requirement statement and all related requirements from already given requirements statements list,
generate a comprehensive list of test cases that covers all possible scenarios, including valid, invalid, edge cases, and boundary conditions.
Each test case should include the following details: Test ID, Requirement ID, Description, Input Values, Expected Outcome,
and test steps in Robot testing Framework format. output should be sequential and not tabular.
enclose each test case with "" and "" tags.
test id should be unique, start with requirement id. ex, "Test ID: PBMW_SRS_PARK_107_005"
don't reply with any explanation.

Requirement statement:
{Requirement statement}

example format:
Test ID: PBMW_SRS_PARK_58_006
Requirement ID: PBMW_SRS_PARK_58
Description:

Input Values:
Sig_ReqWipMode = WTD_WREQ_NO_REQ
PARAM_RebootsCount = CFG_n_REBOOTS_Counter_2
CFG_EN_AP_PARK = ENABLED
PARAM_EEPROMSavedParkPos = CLD_α_DP_POSITION
Sleep Request = False

Expected Outcome:
Sig_ReqPosition = unchanged
PARAM_RebootsCount = unchanged

Test Steps:
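
The prompt above asks for unique Test IDs that start with the Requirement ID. A minimal sketch of how a generated response could be checked against those two rules is shown here. `check_test_ids` is a hypothetical helper, and it assumes the response keeps the `Test ID:` / `Requirement ID:` line layout of the example format, with each `Requirement ID:` line following its `Test ID:` line.

```python
import re
from collections import Counter

# Matches lines such as "Test ID: PBMW_SRS_PARK_58_006".
FIELD = re.compile(r"^(Test ID|Requirement ID):[ \t]*(\S+)[ \t]*$", re.MULTILINE)


def check_test_ids(output):
    """Return a list of rule violations found in a generated response."""
    problems = []
    test_ids = []
    current_test_id = None
    for name, value in FIELD.findall(output):
        if name == "Test ID":
            current_test_id = value
            test_ids.append(value)
        elif current_test_id is not None:
            # Rule: the Test ID must start with the Requirement ID.
            if not current_test_id.startswith(value + "_"):
                problems.append(f"{current_test_id} does not start with {value}")
            current_test_id = None
    # Rule: every Test ID must be unique.
    for test_id, count in Counter(test_ids).items():
        if count > 1:
            problems.append(f"{test_id} appears {count} times")
    return problems
```

Running this over each response gives a concrete count of format violations per run, which is one way to describe how "bad" a given result is.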

kschaab self-assigned this Sep 25, 2024
@kschaab
Contributor

kschaab commented Sep 25, 2024

Thank you for your issue report @mabdelaziz999. If you are able, would you please report this through the leave feedback option in the extension?

ttosta-google added the "question" (Further information is requested) label Sep 26, 2024
3 participants