
[BUG] Is there a paper or doc describing the Mixtral-8x7B and Mixtral-8x22B model architectures? #233

Open
vipannalla opened this issue Nov 7, 2024 · 0 comments
Labels
bug Something isn't working

Comments

@vipannalla

Python -VV

N/A

Pip Freeze

N/A

Reproduction Steps

N/A

Expected Behavior

N/A

Additional Context

We have a Mixtral implementation in JAX that works fine with the 8x7B model but generates garbage output with the 8x22B model. I couldn't find any paper or doc describing the detailed architecture. Is there one?

The only difference I've noticed is that Mixtral-8x7B uses tokenizer-v1 while Mixtral-8x22B uses tokenizer-v3. I also understand they have different parameter counts (47B vs. 141B). Beyond that, do the two models share essentially the same architecture? Are there subtle implementation differences I'm missing? Where can I find more details?
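For what it's worth, here is a minimal sketch of a side-by-side diff of the two models' `config.json` values. The numbers below are copied from the public Hugging Face configs as I understand them (not verified here, please check against `mistralai/Mixtral-8x7B-v0.1` and `mistralai/Mixtral-8x22B-v0.1`); mismatching any of these in a port, especially `hidden_size`, `num_hidden_layers`, or `vocab_size`, would produce garbage output:

```python
# Published config values for the two Mixtral models, as I understand them
# (please verify against the official Hugging Face config.json files).
mixtral_8x7b = {
    "hidden_size": 4096,
    "num_hidden_layers": 32,
    "num_attention_heads": 32,
    "num_key_value_heads": 8,
    "intermediate_size": 14336,
    "num_local_experts": 8,
    "num_experts_per_tok": 2,
    "vocab_size": 32000,   # tokenizer-v1
}
mixtral_8x22b = {
    "hidden_size": 6144,
    "num_hidden_layers": 56,
    "num_attention_heads": 48,
    "num_key_value_heads": 8,
    "intermediate_size": 16384,
    "num_local_experts": 8,
    "num_experts_per_tok": 2,
    "vocab_size": 32768,   # tokenizer-v3
}

def config_diff(a, b):
    """Return {key: (value_a, value_b)} for every key whose values differ."""
    return {k: (a[k], b[k]) for k in a if a[k] != b[k]}

print(config_diff(mixtral_8x7b, mixtral_8x22b))
```

If the diff only touches sizes (hidden width, layer count, head count, FFN width, vocab) and not the MoE routing keys, that would suggest the block structure is the same and a size/tokenizer mismatch is the likelier culprit.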

Thanks

Suggested Solutions

No response

@vipannalla vipannalla added the bug Something isn't working label Nov 7, 2024