feat(tokenizer):add support for open source llm tokenizers #1701
Codecov Report
Coverage diff, main vs. #1701:
             main    #1701      +/-
Coverage   39.62%   15.93%  -23.70%
Files          57       57
Lines        6006     6024      +18
Branches     1338     1457     +119
Hits         2380      960    -1420
Misses       3433     5027    +1594
Partials      193       37     -156
Flags with carried forward coverage won't be shown.
Force-pushed from 9ea982a to a842f46.
@olgavrou @AaronWard @kevin666aa @SDcodehub Hello everyone, this is a workaround I implemented so that I could continue with my work, but I'd like to discuss a proper implementation with you before starting the refactoring.
@Halpph is this still the workaround you are using?
I'm not actively using it at the moment, but yes.
This PR is against AutoGen 0.2. AutoGen 0.2 has been moved to the 0.2 branch. Please rebase your PR on the 0.2 branch or update it to work with the new AutoGen 0.4 that is now in main.
@Halpph If you can update the PR to resolve the conflicts and get CI to pass, we can look at bringing this forward.
Closing as stale; please reopen if you would like to bring it up to date.
Hello everyone, I read the contributing guide, but for some reason the tests always fail with the following error:
_____ ERROR collecting test/test_function_utils.py _____
test/test_function_utils.py:288: in <module>
    class Currency(BaseModel):
pydantic/main.py:197: in pydantic.main.ModelMetaclass.__new__
    ???
pydantic/fields.py:497: in pydantic.fields.ModelField.infer
    ???
pydantic/fields.py:469: in pydantic.fields.ModelField._get_field_info
    ???
E   ValueError: `Field` default cannot be set in `Annotated` for 'amount'
I tried to fix it, but I'm very busy and haven't managed to make it work yet. I hope you can take a look, and I'll try to run the tests again.
Why are these changes needed?
This PR resolves the following issue: https://github.com/microsoft/autogen/issues/1666
Previously, when serving open source LLMs, we always tokenized with cl100k_base. With this change, each model can use its own native tokenizer, specified in the OAI_CONFIG_LIST.
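To illustrate the idea, here is a minimal sketch of per-model tokenizer selection with a cl100k_base fallback. It is not the code in this PR: the `tokenizer` config key, the helper names, and the choice of Hugging Face `AutoTokenizer` for loading a native tokenizer are all assumptions made for illustration.

```python
from typing import Any, Callable, Dict, List

# In this sketch a tokenizer is any function from text to a list of tokens.
Tokenizer = Callable[[str], List[Any]]


def load_cl100k() -> Tokenizer:
    """Fallback tokenizer: tiktoken's cl100k_base encoding."""
    import tiktoken  # imported lazily so the sketch loads without it

    return tiktoken.get_encoding("cl100k_base").encode


def load_hf(name: str) -> Tokenizer:
    """Load a model's native tokenizer by name (assumes Hugging Face)."""
    from transformers import AutoTokenizer  # lazy, optional dependency

    tok = AutoTokenizer.from_pretrained(name)
    # Skip special tokens so only the text itself is counted.
    return lambda text: tok.encode(text, add_special_tokens=False)


def count_tokens(
    text: str,
    config: Dict[str, Any],
    load_native: Callable[[str], Tokenizer] = load_hf,
    load_default: Callable[[], Tokenizer] = load_cl100k,
) -> int:
    """Count tokens with the configured tokenizer, else cl100k_base.

    `config` stands for one entry of the OAI_CONFIG_LIST; the
    "tokenizer" key is hypothetical and only used in this sketch.
    """
    name = config.get("tokenizer")
    tokenizer = load_native(name) if name else load_default()
    return len(tokenizer(text))
```

An entry such as `{"model": "my-local-model", "tokenizer": "some-org/some-model"}` (placeholder names) would then be counted with the native tokenizer, while entries without the key keep the previous cl100k_base behavior.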
Related issue number
Closes #1666
Checks
Sadly, I didn't manage to run the checks because of the error mentioned above.