Create MVP AI console #934
Conversation
Very cool. This looks pretty awesome. Haven't gotten a chance to actually run and test out the code yet. I'm definitely curious how the workflow will feel compared to the more text and command line driven approach previously.
Definitely seems like an improvement over the previous workflow.
Also, the lambdas on the event handlers! Didn't know we could do that now. Makes things much nicer.
Yes, that would be pretty helpful. One case is supporting a producer that returns the full output (and not just the diff). That affects the visual editor UI slightly since it shows the diff fragment. It would also be helpful to have goldens that could be used for both diff and full outputs. That's probably already possible with the patched.py output, but I haven't tested whether that's the case.

In the end, the diff approach seems like it will be the most efficient, especially once it can handle multiple diff changes. One question I had: what would the diff look like if I wanted to add a new function at, say, line X of the code, which is a blank line? How would it determine which blank line to replace? I was also wondering if returning the diff format in JSON could improve things, especially with enforced JSON structured output.
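On the JSON idea, a minimal stdlib-only sketch of what a JSON-encoded diff edit could look like; the field names (`path`, `search`, `replace`) and the `DiffEdit`/`parse_diff_response` helpers are purely illustrative, not anything defined in this PR:

```python
import json
from dataclasses import dataclass


@dataclass
class DiffEdit:
    # Field names are illustrative, not from this PR.
    path: str     # file to modify
    search: str   # exact text to find (needs enough context to be unique)
    replace: str  # replacement text


def parse_diff_response(raw: str) -> list[DiffEdit]:
    """Validate a model's JSON response into structured edits."""
    data = json.loads(raw)
    return [DiffEdit(**edit) for edit in data["edits"]]


raw = '{"edits": [{"path": "main.py", "search": "x = 1", "replace": "x = 2"}]}'
for edit in parse_diff_response(raw):
    print(edit.path, "->", edit.replace)
```

An enforced schema like this at least makes malformed responses fail loudly at parse time instead of producing a silently misapplied patch.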
Yeah I think that would be very useful. It could also be helpful if more than one example could be generated per example input.
Yes +1
+1 - agree it should be doable.
I think you need bigger replacement targets, otherwise you'll get ambiguity about what to replace: https://aider.chat/docs/unified-diffs.html

Trying out udiff would be interesting, as models like Gemini Flash seem to not understand the diff patch format and want to return the unified diff format instead.

I'm a little bearish on the JSON format because I've seen Gemini return very strange results when you enforce a JSON schema in the response (the model starts repeating tokens over and over again). There's also a paper suggesting structured responses can hurt performance: https://arxiv.org/abs/2408.02442. Of course, like everything else in AI, it's worth experimenting :)
Agree. I've noticed that generating more outputs with the exact same input can produce significantly different results.
Thanks for the detailed review!
Yeah you're right. I'll open an issue and fix this
…On Wed, Sep 11, 2024, 8:57 AM Richard To ***@***.***> wrote:
In ai/src/ai/common/example.py
<#934 (comment)>:
> + message: str | None = None
+
+
+class EvaluatedExampleOutput(BaseModel):
+ time_spent_secs: float
+ tokens: int
+ output: ExampleOutput
+ expect_results: list[ExpectResult]
+
+
+class EvaluatedExample(BaseModel):
+ expected: ExpectedExample
+ outputs: list[EvaluatedExampleOutput]
+
+
+class GoldenExample(BaseExample):
I think the problem that I encountered with this is that the timestamp
does not appear to be the timestamp of when the file was created in git.
Instead it ends up being the time the file was created when pulled from
git (at least, that's what I noticed when I was looking into this over
the weekend).
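For context on the timestamp issue: git doesn't store file creation times, and a clone or pull sets each file's mtime to the checkout time. One possible workaround is to ask git history for the commit that first added the file; this is a sketch only, and the helper names below are illustrative:

```python
import subprocess
from datetime import datetime


def parse_added_time(git_log_output: str) -> datetime:
    """Oldest ISO timestamp in reverse-chronological `git log` output."""
    return datetime.fromisoformat(git_log_output.strip().splitlines()[-1])


def file_added_time(path: str) -> datetime:
    """Author time of the commit that first added `path`.

    Illustrative helper: mtimes after clone/pull reflect checkout time,
    so we query git history instead of the filesystem.
    """
    out = subprocess.run(
        ["git", "log", "--diff-filter=A", "--follow", "--format=%aI", "--", path],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_added_time(out)
```

`--diff-filter=A` restricts the log to commits that add the file, and `%aI` emits a strict ISO 8601 author date that `datetime.fromisoformat` can parse directly.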
Overview
This PR introduces an AI Console, which is a Mesop CRUD and dashboard app, along with a core set of modules that provide a more structured approach, particularly around the entities and their persistence. See console.py for an overview of the entities.
Workflows supported
Screenshot:
Future work
Feature parity
There are still a couple of things left to do to reach feature parity with our existing AI modules, after which we can delete them:
More ideas
Having a UI makes it easier to improve our workflows in the future, for example, we could: