support emu3-chat #2322
mi804
commented
Oct 23, 2024
- Document Updates
- More Models or Datasets Support
@@ -317,6 +317,8 @@ def __post_init__(self):
connector='aligner',
generator=['gen_vision_model', 'gen_aligner', 'gen_head', 'gen_embed'])

EMU3_CHAT_KEYS = MultiModelKeys(language_model='model', )
Please check this.
swift/llm/utils/template.py
Outdated
image_placeholder = ['<|image token|>']

def __init__(self):
    Template.__init__(self, [], [' User: {{QUERY}}. Assistant:'], ['<|extra_204|>'], ['<|extra_204|>'], self.system,
Please check the leading space before 'User:' in the prompt template.
swift/llm/utils/template.py
Outdated
inputs, _ = super()._encode(example)
if len(inputs) == 0:
    return inputs, {}
inputs['input_ids'] = [self.tokenizer.bos_token_id] + inputs['input_ids']
Please add the bos_token in Template.__init__(self, [],..) instead of prepending it here.
updated
self.tokenizer.boi_token + self.tokenizer.processor.prefix_template.format(H=h, W=w)
+ self.tokenizer.img_token + imgstr + self.tokenizer.eol_token + self.tokenizer.eof_token
+ self.tokenizer.eoi_token)
image_prompts.append(self.tokenizer.encode(image_prompt))
Please check whether add_special_tokens=False is needed in the encode call.
Checked; it is the same as the official Emu3 implementation.
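For readers following the add_special_tokens question, here is a minimal sketch of the concern, not of the Emu3 tokenizer itself. `ToyTokenizer` and `BOS_ID` are hypothetical stand-ins, and the sketch assumes a tokenizer that prepends a BOS token by default. Whether the real tokenizer does so is exactly what the reviewer asked to verify, and the author reports that the current call matches the official implementation.

```python
from typing import List

BOS_ID = 0  # hypothetical BOS id used only by this toy example


class ToyTokenizer:
    """Hypothetical stand-in: one id per character, optional BOS prefix."""

    def encode(self, text: str, add_special_tokens: bool = True) -> List[int]:
        ids = [ord(ch) for ch in text]
        return [BOS_ID] + ids if add_special_tokens else ids


tokenizer = ToyTokenizer()

# The main text is encoded once and carries the single intended BOS.
text_ids = tokenizer.encode('User: ')

# Encoding an image-prompt fragment with the default adds a second BOS ...
image_default = tokenizer.encode('<img>')
# ... while add_special_tokens=False yields only the fragment's own tokens.
image_plain = tokenizer.encode('<img>', add_special_tokens=False)

with_stray_bos = text_ids + image_default
without_stray_bos = text_ids + image_plain
```

If the tokenizer in use never adds special tokens on its own, both calls produce the same ids and the flag makes no difference.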
+ 1:]
added_tokens_len += len(img_tokens) - 1

return {'input_ids': input_ids, 'labels': labels}
Please move the post_encode code into encode.
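The `added_tokens_len += len(img_tokens) - 1` bookkeeping in the fragment above can be sketched in isolation. This is a simplified illustration, not the PR's code: the function name `splice_image_tokens`, its parameters, and the choice to mask image positions in `labels` with -100 are all assumptions made for the example.

```python
from typing import List, Optional, Tuple

IGNORE_INDEX = -100  # assumed label value for positions excluded from the loss


def splice_image_tokens(
    input_ids: List[int],
    labels: Optional[List[int]],
    placeholder_id: int,
    image_prompts: List[List[int]],
) -> Tuple[List[int], Optional[List[int]]]:
    """Replace the i-th placeholder token with the i-th image token sequence."""
    positions = [i for i, tok in enumerate(input_ids) if tok == placeholder_id]
    if len(positions) != len(image_prompts):
        raise ValueError('number of placeholders and images differ')

    input_ids = list(input_ids)
    labels = list(labels) if labels is not None else None
    added_tokens_len = 0  # shift caused by earlier replacements
    for pos, img_tokens in zip(positions, image_prompts):
        idx = pos + added_tokens_len
        input_ids = input_ids[:idx] + img_tokens + input_ids[idx + 1:]
        if labels is not None:
            labels = labels[:idx] + [IGNORE_INDEX] * len(img_tokens) + labels[idx + 1:]
        # One placeholder was removed and len(img_tokens) tokens were inserted.
        added_tokens_len += len(img_tokens) - 1
    return input_ids, labels


ids, lbls = splice_image_tokens([1, 99, 2, 99, 3], [1, 99, 2, 99, 3], 99, [[7, 8], [9]])
```

Because the placeholder positions are computed once up front, each later position has to be shifted by the net number of tokens added so far, which is why the offset grows by `len(img_tokens) - 1` rather than `len(img_tokens)`.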