Create MVP AI console #934
Conversation
Very cool. This looks pretty awesome. Haven't gotten a chance to actually run and test out the code yet. I'm definitely curious how the workflow will feel compared to the more text and command line driven approach previously.
Definitely seems like an improvement over the previous workflow.
Also, the lambdas on the event handlers! Didn't know we could do that now. Makes things much nicer.
Yes, that would be pretty helpful. One case is supporting a producer that returns the full output (and not just the diff). That affects the visual editor UI slightly since it shows the diff fragment. It would also be helpful to have goldens that could be used for both diff and full outputs. That's probably already possible with the patched.py output, but I haven't tested whether that's the case.

In the end, the diff approach seems like it will be the most efficient, especially once it can handle multiple diff changes. One question I had: what would the diff look like if I wanted to add a new function at, say, line X of the code, which is a blank line? How would it determine which blank line to replace? I was also wondering if returning the diff format in JSON could improve things, especially with enforced JSON structured output.
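On the JSON idea, a minimal stdlib-only sketch of what a JSON-encoded diff edit could look like; the field names (`path`, `search`, `replace`) and the `DiffEdit`/`parse_diff_response` helpers are purely illustrative, not anything defined in this PR:

```python
import json
from dataclasses import dataclass


@dataclass
class DiffEdit:
    # Field names are illustrative, not from this PR.
    path: str     # file to modify
    search: str   # exact text to find (needs enough context to be unique)
    replace: str  # replacement text


def parse_diff_response(raw: str) -> list[DiffEdit]:
    """Validate a model's JSON response into structured edits."""
    data = json.loads(raw)
    return [DiffEdit(**edit) for edit in data["edits"]]


raw = '{"edits": [{"path": "main.py", "search": "x = 1", "replace": "x = 2"}]}'
for edit in parse_diff_response(raw):
    print(edit.path, "->", edit.replace)
```

An enforced schema like this at least makes malformed responses fail loudly at parse time instead of producing a silently misapplied patch.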
Yeah I think that would be very useful. It could also be helpful if more than one example could be generated per example input.
Yes +1
+1 - agree it should be doable.
I think you need bigger replacement targets, otherwise you'll get ambiguity about what to replace: https://aider.chat/docs/unified-diffs.html

Trying out udiff would be interesting, as models like Gemini Flash seem to not understand the diff patch format and want to return the unified diff format instead.

I'm a little bearish on the JSON format because I've seen Gemini return very strange results when you enforce a JSON schema in the response (the model starts repeating tokens over and over again). There's also a paper suggesting structured responses can hurt performance: https://arxiv.org/abs/2408.02442. Of course, like everything else in AI, it's worth experimenting :)
Agree. I've noticed that generating more outputs with the exact same input can produce significantly different results.
Thanks for the detailed review!
Yeah you're right. I'll open an issue and fix this
…On Wed, Sep 11, 2024, 8:57 AM Richard To ***@***.***> wrote:
In ai/src/ai/common/example.py
<#934 (comment)>:
> + message: str | None = None
+
+
+class EvaluatedExampleOutput(BaseModel):
+ time_spent_secs: float
+ tokens: int
+ output: ExampleOutput
+ expect_results: list[ExpectResult]
+
+
+class EvaluatedExample(BaseModel):
+ expected: ExpectedExample
+ outputs: list[EvaluatedExampleOutput]
+
+
+class GoldenExample(BaseExample):
I think the problem that I encountered with this is that the timestamp
does not appear to be the timestamp of when the file was created in git.
Instead it ends up being the time the file was created when pulled from
git (at least, that's what I noticed when I was looking into this over
the weekend).
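For context on the timestamp issue: git doesn't store file creation times, and a clone or pull sets each file's mtime to the checkout time. One possible workaround is to ask git history for the commit that first added the file; this is a sketch only, and the helper names below are illustrative:

```python
import subprocess
from datetime import datetime


def parse_added_time(git_log_output: str) -> datetime:
    """Oldest ISO timestamp in reverse-chronological `git log` output."""
    return datetime.fromisoformat(git_log_output.strip().splitlines()[-1])


def file_added_time(path: str) -> datetime:
    """Author time of the commit that first added `path`.

    Illustrative helper: mtimes after clone/pull reflect checkout time,
    so we query git history instead of the filesystem.
    """
    out = subprocess.run(
        ["git", "log", "--diff-filter=A", "--follow", "--format=%aI", "--", path],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_added_time(out)
```

`--diff-filter=A` restricts the log to commits that add the file, and `%aI` emits a strict ISO 8601 author date that `datetime.fromisoformat` can parse directly.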
Overview
This PR introduces an AI Console, which is a Mesop CRUD and dashboard app, along with a core set of modules that provide a more structured approach, particularly around the entities and their persistence. See console.py for an overview of the entities.
Workflows supported
Screenshot:
Future work
Feature parity
There are still a couple of things left to do to reach feature parity with our existing AI modules, after which we can delete them:
More ideas
Having a UI makes it easier to improve our workflows in the future, for example, we could: