
How do I use the original Google ALBERT, which is distributed as a pb file? #29

Closed
PteroMaplePT opened this issue Nov 10, 2019 · 9 comments

Comments
@PteroMaplePT

The examples only show ckpt files. Is there a way, or an example, to use https://tfhub.dev/google/albert_base/2 directly?

@bojone
Owner

bojone commented Nov 10, 2019

import numpy as np
from bert4keras.models import build_transformer_model
from bert4keras.tokenizers import SpTokenizer

# Paths inside the extracted TF Hub archive; the variables/ directory
# holds a standard TensorFlow checkpoint.
config_path = '/root/kg/bert/albert_base_en_tfhub/albert_config.json'
checkpoint_path = '/root/kg/bert/albert_base_en_tfhub/variables/variables'
spm_path = '/root/kg/bert/albert_base_en_tfhub/assets/30k-clean.model'

tokenizer = SpTokenizer(spm_path)  # sentencepiece tokenizer
model = build_transformer_model(config_path, checkpoint_path, model='albert')

token_ids, segment_ids = tokenizer.encode('language model')
print(model.predict([np.array([token_ids]), np.array([segment_ids])]))

Note that albert_config.json is not included in the archive; you need to save it yourself.
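Before loading, it can help to verify that the extracted directory matches the layout the snippet above expects. A minimal sketch, assuming a TF checkpoint ships a variables.index file; the helper name missing_tfhub_files is illustrative, not part of bert4keras:

```python
from pathlib import Path

def missing_tfhub_files(root):
    """Return the files expected by the loading snippet above that are absent."""
    root = Path(root)
    expected = [
        root / 'albert_config.json',             # saved manually (see note above)
        root / 'variables' / 'variables.index',  # TF checkpoint index file
        root / 'assets' / '30k-clean.model',     # sentencepiece model
    ]
    return [str(p) for p in expected if not p.exists()]
```

If this returns a non-empty list, the archive was probably not extracted (or the config not saved) where the paths point.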

@PteroMaplePT
Author

Thanks!

@SchenbergZY

SchenbergZY commented Nov 24, 2019

Sorry, after reading this example I still don't see how to convert the pb file from https://tfhub.dev/google/albert_base/2 into a ckpt... or maybe that's not what the example is doing?

@bojone
Owner

bojone commented Nov 24, 2019

@SchenbergZY No conversion is needed: the files under the variables directory are a ckpt, and the snippet above loads them directly.

@SchenbergZY

The thing is, from https://tfhub.dev/google/albert_base/2 I can only download a file named 2.tar; after extracting it I get a single extensionless file called "2" (a pb file, maybe?), and no ckpt-type files.

@bojone
Owner

bojone commented Nov 24, 2019

@SchenbergZY What you download is 2.tar.gz; extracting it yields a folder named 2 with many files inside. If that's not what you get, re-download it and learn how to extract a tar.gz file. I doubt Google serves a unique download result just for you.
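For anyone stuck on the extraction step, here is a minimal sketch using Python's standard tarfile module (the function name is illustrative):

```python
import tarfile

def extract_tfhub_archive(archive_path, dest_dir):
    """Extract a .tar.gz TF Hub archive and return the member names inside it."""
    with tarfile.open(archive_path, 'r:gz') as archive:
        names = archive.getnames()
        archive.extractall(dest_dir)
    return names
```

On the command line, `tar -xzf 2.tar.gz` does the same thing.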

@SchenbergZY

Thanks. By reading bert-for-tf2 I found out how to download 2.tar.gz.

@koryako

koryako commented Mar 25, 2021

from bert4keras.models import build_transformer_model
from bert4keras.tokenizers import Tokenizer
import numpy as np

config_path = './albert_large/albert_config.json'
checkpoint_path = './albert_large/model.ckpt-best'
dict_path = './albert_large/30k-clean.vocab'

tokenizer = Tokenizer(dict_path, do_lower_case=True)  # build the tokenizer
model = build_transformer_model(config_path, checkpoint_path, model='albert')  # build the model and load the weights

# Encoding test
token_ids, segment_ids = tokenizer.encode(u'are you ok')

print('\n ===== predicting =====\n')
print(model.predict([np.array([token_ids]), np.array([segment_ids])]))

This raises the following error:

AttributeError: 'Tokenizer' object has no attribute '_token_unk_id'

@Teddy-SC

(quoting koryako's comment above)

I ran into the same problem. I switched to SpTokenizer, but it has no .match function, so the bert+ner example fails.
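A likely cause, offered as an assumption rather than a confirmed diagnosis: the plain Tokenizer expects a BERT-style vocab containing a literal [UNK] entry, while the sentencepiece 30k-clean.vocab spells its unknown token <unk>, so the unknown-token id is never found. A minimal sketch of the mismatch (find_unk_id is illustrative, not a bert4keras internal):

```python
def find_unk_id(vocab_tokens, unk_token='[UNK]'):
    """Look up the unknown-token id in a plain vocab list; None if absent."""
    token_to_id = {tok: i for i, tok in enumerate(vocab_tokens)}
    return token_to_id.get(unk_token)

# A sentencepiece .vocab (like 30k-clean.vocab) spells the unknown token '<unk>':
sentencepiece_style = ['<pad>', '<unk>', '[CLS]', '[SEP]']
# A BERT-style vocab spells it '[UNK]':
bert_style = ['[PAD]', '[UNK]', '[CLS]', '[SEP]']

print(find_unk_id(sentencepiece_style))  # None
print(find_unk_id(bert_style))           # 1
```

The workaround consistent with the first reply in this thread is to load the 30k-clean.model file with SpTokenizer instead of pointing Tokenizer at the .vocab file.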
