rknn-llm: is it possible to specify which backend an operator runs on? #225
When running LLM inference, can the user specify which operators run on the CPU and which run on the NPU, or is this all decided internally? Thanks.

Comments
Hi, this is currently not supported.
Thank you. A follow-up question: does rkllm offer any way to get the outputs of intermediate layers, for example to print the attention scores or the probability tensor after the softmax?
Currently only LAST_HIDDEN_LAYER can be returned.
How is this result used? Is there a demo? @waydong
Set this through the RKLLMInferMode parameter. For a demo, see this Python code: https://github.com/airockchip/rknn-llm/blob/main/examples/rkllm_server_demo/rkllm_server/flask_server.py
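For reference, a minimal sketch of what setting that mode looks like with a ctypes wrapper. The enum values and the struct layout below are assumptions modeled on the linked flask_server.py demo, not the authoritative API; the exact definitions live in rkllm.h in the repo and should be checked there.

```python
# Hedged sketch: enum values and struct fields are assumed to mirror the
# ctypes wrapper in the linked flask_server.py demo. Verify against rkllm.h.
import ctypes

RKLLM_INFER_GENERATE = 0                # normal token generation
RKLLM_INFER_GET_LAST_HIDDEN_LAYER = 1   # return the last hidden layer instead

class RKLLMInferParam(ctypes.Structure):
    # Assumed layout; the real struct in rkllm.h may carry more fields.
    _fields_ = [("mode", ctypes.c_int),
                ("lora_params", ctypes.c_void_p),
                ("prompt_cache_params", ctypes.c_void_p)]

infer_param = RKLLMInferParam()
infer_param.mode = RKLLM_INFER_GET_LAST_HIDDEN_LAYER

# The inference call would then pass this struct to rkllm_run, e.g.
#   rkllm_lib.rkllm_run(handle, ctypes.byref(rkllm_input),
#                       ctypes.byref(infer_param), None)
# and, per the demo, the registered callback receives the hidden states as
# a float buffer together with its embd_size and num_tokens.
```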
Yes, that is the parameter I set to obtain the last_hidden_layer.bin file. I then wrote C++ functions to compute the logits and the softmax and to predict the output token, but the result is not very reasonable; the output reads somewhat incoherently. I am not sure where the problem is. Is there anything in particular that needs attention or a specific step that must be implemented?
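One frequent cause of incoherent output in this kind of post-processing is skipping the model's final RMSNorm before the lm_head projection (LLaMA-style models apply it between the last decoder block and the output head), or projecting every position instead of only the last token. Below is a minimal NumPy sketch of the post-processing under those assumptions; the file names, the hidden size, and the weight dumps are hypothetical, with the norm and head weights taken from the source Hugging Face checkpoint that the .rkllm model was converted from. If the runtime already applies the final norm before dumping, the rms_norm step would be skipped.

```python
# Hedged sketch: reconstruct a next-token prediction from a dumped
# last_hidden_layer.bin. File names, EMBD_SIZE, and the weight dumps are
# hypothetical placeholders, not part of the rkllm API.
import numpy as np

EMBD_SIZE = 2048  # assumed hidden size; must match the converted model

hidden = np.fromfile("last_hidden_layer.bin", dtype=np.float32)
hidden = hidden.reshape(-1, EMBD_SIZE)  # [num_tokens, embd_size]
last = hidden[-1]                       # only the final position predicts the next token

# Hypothetical weight dumps exported from the original checkpoint.
norm_gamma = np.fromfile("final_norm_weight.bin", dtype=np.float32)  # [embd_size]
lm_head = np.fromfile("lm_head_weight.bin", dtype=np.float32).reshape(
    -1, EMBD_SIZE)                                                   # [vocab, embd_size]

def rms_norm(x, gamma, eps=1e-6):
    # RMSNorm as used by LLaMA-style models: divide by the RMS, scale by gamma.
    return x / np.sqrt(np.mean(x * x) + eps) * gamma

def softmax(z):
    # Subtract the max before exponentiating for numerical stability.
    e = np.exp(z - z.max())
    return e / e.sum()

logits = lm_head @ rms_norm(last, norm_gamma)  # [vocab]
probs = softmax(logits)
print("predicted token id:", int(np.argmax(probs)))
```

If the token ids still decode to gibberish, it is also worth confirming that the dump is float32 in row-major [num_tokens, embd_size] order and that the lm_head weights come from the exact checkpoint that was converted.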