[Valeo][Evals] Same prompts with same data but results are not consistent #935

mabdelaziz999 · 2024-09-24T16:05:52Z

Environment:
(Hint: "Report Extension Issue on Github" command will fill these out for you.)

Version information
Gemini Code Assist 2.17.0 (September 2024)
VS Code 1.93.1

Cloud Code Extension version:

VSCode version:

OS: Windows 10

Cloud SDK:

Skaffold:

Kubectl:

Description: Running the same prompt with the same data gives inconsistent results. Some of the results are very poor.

Repro steps:

Give GCA a requirement statement and ask it to design test cases that test the given requirement.
The results vary significantly between runs and are sometimes very poor.
Prompt:
Instructions to do your task:
Use a self-asking method to breakdown complex part into simpler fine-grained
Breakdown any of the sub parts if possible to more simpler
Go through one by one subparts very carefully and verify step by step
Based on your suggestions to the sub part try to optimize the original
Criticize your modifications by self asking questions and answer them one by one, about why the modification will not be good
Based on the criticism, evaluate the answer very carefully, taking into consideration that there is a chance it can be an incorrect modification and verify step by step
Based on evaluation, select one of the two following actions
(a) Generate the final answer if you are very confident that the answer is correct
(b) If you are not very sure if the answer is correct, then go to the first step to regenerate more self asking questions and repeat all the steps one more time. Be skeptical and be more biased to select option a if you have any doubts
Repeat the steps until you feel confident about the answer
Show your confidence score as percentage based on steps above
Generate as many unique test cases as needed to ensure complete coverage of the requirements.

Consider the following when generating test cases:

Input Validation: Test different types of inputs, including valid, invalid, and edge cases (e.g., null values, empty strings, maximum length).
Functional Coverage: Ensure that all functional requirements are covered by at least one test case.
User Interactions: Test different user interactions and their outcomes.
System Behavior: Consider system behavior under different conditions, such as normal operation, error conditions, and edge cases.
Performance and Load: Include test cases that check for performance issues, like handling large data sets or high concurrency.
Security Considerations: Include test cases that assess security aspects, such as unauthorized access or data leaks.
Boundary Conditions: Ensure that test cases cover minimum and maximum limits, as well as any other relevant boundary conditions.

Given the following requirement statement and all related requirements from already given requirements statements list,
generate a comprehensive list of test cases that covers all possible scenarios, including valid, invalid, edge cases, and boundary conditions.
Each test case should include the following details: Test ID, Requirement ID, Description, Input Values, Expected Outcome,
and test steps in Robot testing Framework format. output should be sequential and not tabular.
enclose each test case with "" and "" tags.
test id should be unique, start with requirement id. ex, "Test ID: PBMW_SRS_PARK_107_005"
don't reply with any explanation.

Requirement statement:
{Requirement statement}

example format:
Test ID: PBMW_SRS_PARK_58_006
Requirement ID: PBMW_SRS_PARK_58
Description:

Input Values:
Sig_ReqWipMode = WTD_WREQ_NO_REQ
PARAM_RebootsCount = CFG_n_REBOOTS_Counter_2
CFG_EN_AP_PARK = ENABLED
PARAM_EEPROMSavedParkPos = CLD_α_DP_POSITION
Sleep Request = False

Expected Outcome:
Sig_ReqPosition = unchanged
PARAM_RebootsCount = unchanged

Test Steps:
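
One way to put numbers on "results vary significantly" is to save the raw response from several runs of the same prompt and compare them offline. The sketch below is only an illustration under stated assumptions: it assumes each response contains lines of the form `Test ID: ...` as in the example format above, and the helper names (`extract_test_ids`, `check_ids`, `pairwise_overlap`) are hypothetical, not part of Gemini Code Assist. Test ID overlap is a rough proxy that mostly reflects how many test cases each run produced, not whether their content matches.

```python
import re
from itertools import combinations

# Matches lines such as "Test ID: PBMW_SRS_PARK_58_006"
# (line format assumed from the example in the prompt).
TEST_ID_RE = re.compile(r"^Test ID:[ \t]*(\S+)[ \t]*$", re.MULTILINE)


def extract_test_ids(response: str) -> list[str]:
    """Return every Test ID found in one response, in order of appearance."""
    return TEST_ID_RE.findall(response)


def check_ids(test_ids: list[str], requirement_id: str) -> list[str]:
    """Report IDs that break the prompt's rules: unique, prefixed by the requirement ID."""
    problems = []
    seen = set()
    for tid in test_ids:
        if not tid.startswith(requirement_id + "_"):
            problems.append(f"{tid}: does not start with {requirement_id}")
        if tid in seen:
            problems.append(f"{tid}: duplicate")
        seen.add(tid)
    return problems


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def pairwise_overlap(responses: list[str]) -> list[float]:
    """Jaccard similarity of the Test ID sets for every pair of runs."""
    id_sets = [set(extract_test_ids(r)) for r in responses]
    return [jaccard(a, b) for a, b in combinations(id_sets, 2)]
```

Overlap values close to 1.0 across all pairs would suggest the runs at least agree on the set of test IDs, while low values or a non-empty `check_ids` result give a concrete data point to attach to a report like this one.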

kschaab · 2024-09-25T20:52:03Z

Thank you for your issue report @mabdelaziz999. If you are able, would you please report this through the leave feedback option in the extension?

kschaab self-assigned this Sep 25, 2024

ttosta-google added the "question" (Further information is requested) label Sep 26, 2024
