
How do I use the original Google ALBERT, which is distributed as a pb file? #29

Closed
PteroMaplePT opened this issue Nov 10, 2019 · 9 comments

Comments
@PteroMaplePT

The examples only show ckpt files. Is there a way, or an example, to use https://tfhub.dev/google/albert_base/2 directly?

@bojone
Owner

bojone commented Nov 10, 2019

import numpy as np
from bert4keras.models import build_transformer_model
from bert4keras.tokenizers import SpTokenizer

# Paths inside the extracted TF Hub archive; the variables/ directory
# holds a standard TensorFlow checkpoint.
config_path = '/root/kg/bert/albert_base_en_tfhub/albert_config.json'
checkpoint_path = '/root/kg/bert/albert_base_en_tfhub/variables/variables'
spm_path = '/root/kg/bert/albert_base_en_tfhub/assets/30k-clean.model'

tokenizer = SpTokenizer(spm_path)  # sentencepiece tokenizer
model = build_transformer_model(config_path, checkpoint_path, model='albert')

token_ids, segment_ids = tokenizer.encode('language model')
print(model.predict([np.array([token_ids]), np.array([segment_ids])]))

Note that albert_config.json is not included in the archive; you need to save it yourself.
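Before loading, it can help to verify that the extracted directory matches the layout the snippet above expects. A minimal sketch, assuming a TF checkpoint ships a variables.index file; the helper name missing_tfhub_files is illustrative, not part of bert4keras:

```python
from pathlib import Path

def missing_tfhub_files(root):
    """Return the files expected by the loading snippet above that are absent."""
    root = Path(root)
    expected = [
        root / 'albert_config.json',             # saved manually (see note above)
        root / 'variables' / 'variables.index',  # TF checkpoint index file
        root / 'assets' / '30k-clean.model',     # sentencepiece model
    ]
    return [str(p) for p in expected if not p.exists()]
```

If this returns a non-empty list, the archive was probably not extracted (or the config not saved) where the paths point.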

@PteroMaplePT
Author

Thanks!

@SchenbergZY

SchenbergZY commented Nov 24, 2019

Sorry, after reading this example I still don't see how to convert the pb file from https://tfhub.dev/google/albert_base/2 into a ckpt... or maybe that's not what the example is doing?

@bojone
Owner

bojone commented Nov 24, 2019

@SchenbergZY No conversion is needed: the files under the variables directory are a ckpt, and the snippet above loads them directly.

@SchenbergZY

The thing is, from https://tfhub.dev/google/albert_base/2 I can only download a file named 2.tar; after extracting it I get a single extensionless file called "2" (a pb file, maybe?), and no ckpt-type files.

@bojone
Owner

bojone commented Nov 24, 2019

@SchenbergZY What you download is 2.tar.gz; extracting it yields a folder named 2 with many files inside. If that's not what you get, re-download it and learn how to extract a tar.gz file. I doubt Google serves a unique download result just for you.
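For anyone stuck on the extraction step, here is a minimal sketch using Python's standard tarfile module (the function name is illustrative):

```python
import tarfile

def extract_tfhub_archive(archive_path, dest_dir):
    """Extract a .tar.gz TF Hub archive and return the member names inside it."""
    with tarfile.open(archive_path, 'r:gz') as archive:
        names = archive.getnames()
        archive.extractall(dest_dir)
    return names
```

On the command line, `tar -xzf 2.tar.gz` does the same thing.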

@SchenbergZY

Thanks. By reading bert-for-tf2 I found out how to download 2.tar.gz.

@koryako

koryako commented Mar 25, 2021

from bert4keras.models import build_transformer_model
from bert4keras.tokenizers import Tokenizer
import numpy as np

config_path = './albert_large/albert_config.json'
checkpoint_path = './albert_large/model.ckpt-best'
dict_path = './albert_large/30k-clean.vocab'

tokenizer = Tokenizer(dict_path, do_lower_case=True)  # build the tokenizer
model = build_transformer_model(config_path, checkpoint_path, model='albert')  # build the model and load the weights

# Encoding test
token_ids, segment_ids = tokenizer.encode(u'are you ok')

print('\n ===== predicting =====\n')
print(model.predict([np.array([token_ids]), np.array([segment_ids])]))

This raises the following error:

AttributeError: 'Tokenizer' object has no attribute '_token_unk_id'

@Teddy-SC

(quoting koryako's comment above)

I ran into the same problem. I switched to SpTokenizer, but it has no .match function, so the bert+ner example fails.
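A likely cause, offered as an assumption rather than a confirmed diagnosis: the plain Tokenizer expects a BERT-style vocab containing a literal [UNK] entry, while the sentencepiece 30k-clean.vocab spells its unknown token <unk>, so the unknown-token id is never found. A minimal sketch of the mismatch (find_unk_id is illustrative, not a bert4keras internal):

```python
def find_unk_id(vocab_tokens, unk_token='[UNK]'):
    """Look up the unknown-token id in a plain vocab list; None if absent."""
    token_to_id = {tok: i for i, tok in enumerate(vocab_tokens)}
    return token_to_id.get(unk_token)

# A sentencepiece .vocab (like 30k-clean.vocab) spells the unknown token '<unk>':
sentencepiece_style = ['<pad>', '<unk>', '[CLS]', '[SEP]']
# A BERT-style vocab spells it '[UNK]':
bert_style = ['[PAD]', '[UNK]', '[CLS]', '[SEP]']

print(find_unk_id(sentencepiece_style))  # None
print(find_unk_id(bert_style))           # 1
```

The workaround consistent with the first reply in this thread is to load the 30k-clean.model file with SpTokenizer instead of pointing Tokenizer at the .vocab file.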
