Outlines examples not working #1103

Open
this-josh opened this issue Aug 17, 2024 · 6 comments

@this-josh

Describe the issue as clearly as possible:

I originally posted this as a question on Discord, but I now believe it is a bug, so I have opened this issue.

When running the multiple-choice example I get seemingly random outputs. The code snippet below gives:

res
{'skirt': 37.0, 'dress': 17.0, 'pen': 36.0, 'jacket': 10.0}

Additionally, if I provide a nonsense input, it returns a result that is not one of the choices: generator("Pick the o") # 'jjjjjjjjjjjjjacket'

Steps/code to reproduce the bug:

from outlines import models, generate

model = models.transformers("microsoft/Phi-3-mini-4k-instruct", device='cuda')
generator = generate.choice(model, ["skirt", "dress", "pen", "jacket"])
generator("Pick the odd word out: skirt, dress, pen, jacket")
res = {'skirt': 0.0, 'dress': 0.0, 'pen': 0.0, 'jacket': 0.0}
for _ in range(100):
    res[generator("Pick the odd word out: skirt, dress, pen, jacket")] += 1
print(res) # {'skirt': 37.0, 'dress': 17.0, 'pen': 36.0, 'jacket': 10.0}


generator("Pick the o") # 'jjjjjjjjjjjjjacket'
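To make both symptoms easier to check at a glance, here is a minimal sketch of a tally helper. It counts how often each allowed choice comes back and separately collects any output that is not one of the choices. `tally_choices` and `fake_generator` are hypothetical names introduced for illustration; `fake_generator` only stands in for the `generator` above so the snippet runs without loading a model.

```python
from collections import Counter
from itertools import cycle


def tally_choices(generator, prompt, choices, n=100):
    """Call generator(prompt) n times; return (counts, invalid_outputs)."""
    allowed = set(choices)
    counts = Counter()
    invalid = []
    for _ in range(n):
        output = generator(prompt)
        if output in allowed:
            counts[output] += 1
        else:
            # Anything outside the allowed set violates the constraint.
            invalid.append(output)
    return counts, invalid


# Stand-in for an outlines generator (assumption: canned outputs, no model).
_canned = cycle(["pen", "skirt", "jjjjjjjjjjjjjacket", "pen"])


def fake_generator(prompt):
    return next(_canned)


choices = ["skirt", "dress", "pen", "jacket"]
counts, invalid = tally_choices(
    fake_generator, "Pick the odd word out: skirt, dress, pen, jacket", choices, n=8
)
print(dict(counts))
print(invalid)
```

With a real generator, a non-empty `invalid` list would point to the constraint itself being broken, which is a separate problem from the model picking the wrong word.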

Expected result:

res = {'skirt': 0.0, 'dress': 0.0, 'pen': 100.0, 'jacket': 0.0}

and

generator("Pick the o") # ""

Error message:

No response

Outlines/Python version information:

Version information

>>> from outlines import _version; print(_version.version)
0.0.47.dev58+g8e94488  # I've also used 0.0.46
>>> import sys; print('Python', sys.version)
Python 3.11.9 (main, Jul 25 2024, 22:42:09) [Clang 18.1.8 ]

I've tried this using both CUDA and MPS and have the same issue.

Context for the issue:

As described in my Discord message this seems to be an issue preventing fundamental Outlines features.
this-josh added the bug label Aug 17, 2024
@this-josh
Author

Here is the original discord message:
I'm migrating to outlines from guidance, but the examples are not working for me. I've tried three different backends (mlxlm, transformers(device='mps'), and transformers('cuda:0')) and two different models, Phi3 (as used in the examples) and gemma2-2b. All of them give nonsense results.

With the following examples I get seemingly random results. Here they are with Phi3 on CUDA:

from outlines import models, generate
import outlines
model = models.transformers("microsoft/Phi-3-mini-4k-instruct", device='cuda:0')

generator = outlines.generate.text(model)
print(generator("Question: What's 2+6? Answer:", max_tokens=200))  # prints:
"""
The sum of 2 and 6 is 8.
A pan flute is the instrument made from what material? Answer: A pan flute is traditionally made from tubes of varying lengths and materials, commonly bamboo, reeds, or various types of metal or plastic.
"""
generator = generate.format(model, int)
print(generator("2+6", max_tokens=200)) 
# -1

print(generator("When I was 6 my sister was half my age. Now I'm 70 how old is my sister?", max_tokens=200))
# -8

generator = generate.choice(model, ["skirt", "dress", "pen", "jacket"])
print(generator("Pick the odd word out: skirt, dress, pen, jacket"))
# skirt

I have to constrain the number of tokens, otherwise generation never stops. Is there something I'm doing fundamentally wrong? I've tried outlines version 0.0.46 and main (at 8e944488).

When I try using outlines on my actual problem, classifying text on a scale from -9 to 9, I get similar problems. Here I've tried regex, choice, and custom Enum type based approaches, and the outputs seem to be unrelated to the input.
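The code for that classification attempt is not shown in this thread, so the following is only a sketch of how the -9 to 9 range could be written as a list of string choices and mapped back to integers. `SCORE_CHOICES` and `parse_score` are hypothetical names; the list is the kind of argument `generate.choice` takes in the examples above.

```python
# All allowed scores as strings: "-9", "-8", ..., "8", "9".
SCORE_CHOICES = [str(score) for score in range(-9, 10)]


def parse_score(text: str) -> int:
    """Convert a generated choice back to an int, rejecting anything else."""
    if text not in SCORE_CHOICES:
        raise ValueError(f"not an allowed score: {text!r}")
    return int(text)


print(len(SCORE_CHOICES))
print(parse_score("-3"))
```

Validating the output on the way back makes a constraint violation show up as an error instead of a silently wrong label.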

@lapp0
Contributor

lapp0 commented Aug 20, 2024

There are a few fixes that require better documentation in outlines, and this issue should be considered a documentation issue.

Firstly, the model produces much better results when you use the chat template (#987). Otherwise the model thinks you're just completing a single message. This should improve the quality of all your responses.

For unending generation, the problem is that there is no max size for integers. This can be fixed with a token limit, or you can try something like conint(gt=0, lt=1000).
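As an illustration of bounding the integer, the same range (greater than 0, less than 1000) can also be written as a regular expression, which needs only the standard library to check. This is a sketch under assumptions: `BOUNDED_INT_PATTERN` and `is_bounded_int` are made-up names, and the pattern has only been checked with Python's `re`, not with outlines' regex-constrained generation (`generate.regex`).

```python
import re

# Integers from 1 to 999: a non-zero first digit followed by at most two more
# digits. The fixed maximum length is what guarantees generation terminates.
BOUNDED_INT_PATTERN = r"[1-9][0-9]{0,2}"


def is_bounded_int(text: str) -> bool:
    """Return True if text is an integer strictly between 0 and 1000."""
    return re.fullmatch(BOUNDED_INT_PATTERN, text) is not None


print(is_bounded_int("8"))
print(is_bounded_int("1000"))
```

Note that this pattern excludes negative numbers and leading zeros, so outputs such as -1 or -8 from the examples above would not match.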

@cpfiffer
Contributor

Ah, okay. So the technical issue was templating, but the documentation could be improved here.

Could you share the results you got here, just as a comparison? Might help write the docs here. I can take a look at it.

The integer generation issue strikes me as very odd, because the model should have a decreasing probability of picking repeating values. To me, that implies a different issue.

I'm still digging into the package, so take my guess with a grain of salt.

@lapp0
Contributor

lapp0 commented Aug 22, 2024

> Could you share the results you got here, just as a comparison? Might help write the docs here. I can take a look at it.

From #987

No chat template:

>>> output = model.generate(**tokenizer("What is 1 + 1?", return_tensors="pt"), max_length=32)
>>> tokenizer.decode(output[0])
"<s> What is 1 + 1?\n\nThis question has been asked by many people, but I don't understand the answer.\n\nCould"

With chat template:

output = model.generate(**tokenizer('<s><|user|> What is 1 + 1?<|end|><|assistant|>', return_tensors="pt"), max_length=32)
tokenizer.decode(output[0])
'<s><s><|user|> What is 1 + 1?<|end|><|assistant|> 1 + 1 equals 2. This is a basic arithmetic addition problem. When you'
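The templated prompt above can be built with a small helper. This is a minimal sketch: `format_phi3_prompt` is a hypothetical name, and the `<|user|>`, `<|end|>` and `<|assistant|>` markers are copied from the example above rather than from the model's documentation. The leading `<s>` is left out on the assumption that the tokenizer adds it, which the doubled `<s><s>` in the decoded output suggests. In practice the tokenizer's own `apply_chat_template` method in transformers is probably the more robust route; it is not used here so that the snippet runs without downloading a model.

```python
def format_phi3_prompt(user_message: str) -> str:
    """Wrap a single user message in the chat markers shown in this thread."""
    return f"<|user|> {user_message}<|end|><|assistant|>"


prompt = format_phi3_prompt("What is 1 + 1?")
print(prompt)
```

The resulting string can then be passed as the prompt wherever the raw question was passed before.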

> The integer generation issue strikes me as very odd, because the model should have a decreasing probability of picking repeating values.

By default, language models have a tendency to repeat themselves ad nauseam. This can be mitigated with a repetition penalty. A chat template may also help with the issue.
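For readers unfamiliar with the idea, here is a rough sketch of one common formulation of a repetition penalty, written in plain Python so it runs without any dependencies. The function name and the penalty values are assumptions for illustration; this is not outlines' API. Logits of tokens that were already generated are pushed down: positive scores are divided by the penalty and negative scores are multiplied by it.

```python
def apply_repetition_penalty(logits, previous_token_ids, penalty=1.2):
    """Return a copy of logits with already-generated tokens made less likely."""
    penalized = list(logits)
    for token_id in set(previous_token_ids):
        score = penalized[token_id]
        # Dividing a positive score or multiplying a negative one by a
        # penalty > 1 lowers the score in both cases.
        penalized[token_id] = score / penalty if score > 0 else score * penalty
    return penalized


logits = [2.0, -1.0, 0.5]
print(apply_repetition_penalty(logits, previous_token_ids=[0, 1, 1], penalty=2.0))
```

Tokens that have not appeared yet (index 2 in this example) keep their original score.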

@this-josh
Author

Sorry, I'm not really sure how this information relates to the issue I outlined. Is the usage of outlines in the docs examples incorrect?

Also, why is it failing to choose one of the options provided in generate.choice?

@lapp0
Contributor

lapp0 commented Sep 16, 2024

@this-josh sorry, I misunderstood your issue. Thanks for reporting it!

I think this is the same issue as #1109.

I'll look into it this week and ping you with a PR.
