
How can multiple models be served as a single service? #874

Open
iceriver97 opened this issue Sep 3, 2020 · 16 comments
Assignees
Labels
serving Issues of hub serving

Comments

@iceriver97

Can I use Hub Serving to publish multiple models as a single service? That is, start multiple models as services on one port, and then access the different services through different URLs?

@Steffy-zxf Steffy-zxf added the serving Issues of hub serving label Sep 3, 2020
@ShenYuhan
Contributor

ShenYuhan commented Sep 4, 2020

Yes, you can. Please refer to https://github.com/PaddlePaddle/PaddleHub/blob/release/v1.8/paddlehub/serving/templates/serving_config.json
That configuration file uses two models; just specify this configuration file when starting the service, and you will get what you need.
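For reference, a two-module configuration in the style of that template might look like the sketch below. The module names and versions here are illustrative; substitute the modules you actually want to serve:

```json
{
  "modules_info": {
    "lac": {
      "init_args": { "version": "2.2.0" },
      "predict_args": {}
    },
    "senta_lstm": {
      "init_args": { "version": "1.2.0" },
      "predict_args": {}
    }
  },
  "port": 8866,
  "use_multiprocess": false,
  "workers": 2
}
```

The service is then started with the configuration file instead of `-m`, e.g. `hub serving start --config serving_config.json` (flag name per the v1.8 docs; check `hub serving start --help` for your version).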

@iceriver97 iceriver97 reopened this Sep 7, 2020
@iceriver97
Author

@ShenYuhan When I start a certain model this way (via the configuration file), the system hangs, but if I start the same model on its own with -m, there is no problem. Why is that?

@iceriver97
Author

@ShenYuhan Here is the error message:

/anaconda3/envs/py36-pp/lib/python3.6/site-packages/paddle/fluid/executor.py:1093: UserWarning: There are no operators in the program to be executed. If you pass Program manually, please use fluid.program_guard to ensure the current Program is being used.
  warnings.warn(error_info)
2020-09-07 11:19:38  -  

--------------------------------------------
C++ Call Stacks (More useful to developers):
--------------------------------------------
0   std::string paddle::platform::GetTraceBackString<std::string const&>(std::string const&, char const*, int)
1   paddle::platform::EnforceNotMet::EnforceNotMet(std::string const&, char const*, int)
2   paddle::memory::detail::AlignedMalloc(unsigned long)
3   paddle::memory::detail::CPUAllocator::Alloc(unsigned long*, unsigned long)
4   paddle::memory::detail::BuddyAllocator::RefillPool(unsigned long)
5   paddle::memory::detail::BuddyAllocator::Alloc(unsigned long)
6   void* paddle::memory::legacy::Alloc<paddle::platform::CPUPlace>(paddle::platform::CPUPlace const&, unsigned long)
7   paddle::memory::allocation::NaiveBestFitAllocator::AllocateImpl(unsigned long)
8   paddle::memory::allocation::AllocatorFacade::Alloc(paddle::platform::Place const&, unsigned long)
9   paddle::memory::allocation::AllocatorFacade::AllocShared(paddle::platform::Place const&, unsigned long)
10  paddle::memory::AllocShared(paddle::platform::Place const&, unsigned long)
11  paddle::framework::Tensor::mutable_data(paddle::platform::Place const&, paddle::framework::proto::VarType_Type, unsigned long)
12  paddle::framework::ir::GraphPatternDetector::operator()(paddle::framework::ir::Graph*, std::function<void (std::unordered_map<paddle::framework::ir::PDNode*, paddle::framework::ir::Node*, std::hash<paddle::framework::ir::PDNode*>, std::equal_to<paddle::framework::ir::PDNode*>, std::allocator<std::pair<paddle::framework::ir::PDNode* const, paddle::framework::ir::Node*> > > const&, paddle::framework::ir::Graph*)>)
13  paddle::framework::ir::FCFusePass::ApplyFCPattern(paddle::framework::ir::Graph*, bool) const
14  paddle::framework::ir::FCFusePass::ApplyImpl(paddle::framework::ir::Graph*) const
15  paddle::framework::ir::Pass::Apply(paddle::framework::ir::Graph*) const
16  paddle::inference::analysis::IRPassManager::Apply(std::unique_ptr<paddle::framework::ir::Graph, std::default_delete<paddle::framework::ir::Graph> >)
17  paddle::inference::analysis::IrAnalysisPass::RunImpl(paddle::inference::analysis::Argument*)
18  paddle::inference::analysis::Analyzer::RunAnalysis(paddle::inference::analysis::Argument*)
19  paddle::AnalysisPredictor::OptimizeInferenceProgram()
20  paddle::AnalysisPredictor::PrepareProgram(std::shared_ptr<paddle::framework::ProgramDesc> const&)
21  paddle::AnalysisPredictor::Init(std::shared_ptr<paddle::framework::Scope> const&, std::shared_ptr<paddle::framework::ProgramDesc> const&)
22  std::unique_ptr<paddle::PaddlePredictor, std::default_delete<paddle::PaddlePredictor> > paddle::CreatePaddlePredictor<paddle::AnalysisConfig, (paddle::PaddleEngineKind)2>(paddle::AnalysisConfig const&)
23  std::unique_ptr<paddle::PaddlePredictor, std::default_delete<paddle::PaddlePredictor> > paddle::CreatePaddlePredictor<paddle::AnalysisConfig>(paddle::AnalysisConfig const&)

----------------------
Error Message Summary:
----------------------
Error: Alloc 524288000 error!
  [Hint: Expected posix_memalign(&p, alignment, size) == 0, but received posix_memalign(&p, alignment, size):12 != 0:0.] at (/paddle/paddle/fluid/memory/detail/system_allocator.cc:59)

@iceriver97
Author

iceriver97 commented Sep 7, 2020

@ShenYuhan Are there any restrictions when publishing multiple models as a single service?

@ShenYuhan
Contributor

Could you share your configuration file so we can take a look?

@iceriver97
Author

@ShenYuhan The configuration file contents are as follows:

{
  "modules_info": {
    "jieba_paddle": {
      "init_args": {
        "version": "1.0.0"
      },
      "predict_args": {

      }
    },
    "lac": {
      "init_args": {
        "version": "2.2.0"
      },
      "predict_args": {

      }
    },
    "jieba_rank": {
      "init_args": {
        "version": "1.0.0"
      },
      "predict_args": {

      }
    },
    "chinese-roberta-wwm-ext-large_finetuned": {
      "init_args": {
        "version": "1.0.0"
      },
      "predict_args": {

      }
    }
  },
  "port": 8866,
  "use_multiprocess": true,
  "workers": 16,
  "timeout": 300
}
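With a configuration like the one above, each module is reachable through its own URL on the shared port. A minimal client sketch, under the assumption that each module from `modules_info` is exposed at `http://<host>:<port>/predict/<module_name>` (some PaddleHub 1.x releases use `/predict/text/<module_name>` instead, so check the docs for your version):

```python
import json

# Base address of the multi-module service started from the config above.
BASE = "http://127.0.0.1:8866"

def predict_url(module_name, base=BASE):
    # One route per module: this is how several models share a single port.
    return "{}/predict/{}".format(base, module_name)

def build_request(module_name, texts):
    # PaddleHub text modules typically expect a JSON body like {"text": [...]}.
    return predict_url(module_name), json.dumps({"text": texts})

if __name__ == "__main__":
    # Same port, different paths, different models.
    for module in ("jieba_paddle", "lac"):
        url, payload = build_request(module, ["hello world"])
        print(url)
        # Send with e.g. requests.post(url, data=payload,
        #     headers={"Content-Type": "application/json"})
```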

@ShenYuhan
Contributor

Try lowering workers and see whether the error still occurs.
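A note on why this helps: the earlier failure (`Alloc 524288000 error!` with `posix_memalign(...):12`) is errno 12, ENOMEM, i.e. the host ran out of memory. With `"use_multiprocess": true` and `"workers": 16`, each worker process loads its own copy of all four modules, including the large RoBERTa model, so total memory use is roughly sixteen times that of a single process. A reduced fragment (other keys unchanged) might look like:

```json
{
  "use_multiprocess": true,
  "workers": 2
}
```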

@iceriver97
Author

iceriver97 commented Sep 8, 2020

@ShenYuhan Thanks, after lowering it to 2 it runs normally. But when I try to use the GPU I get:

----------------------
Error Message Summary:
----------------------
ExternalError:  Cublas error, CUBLAS_STATUS_EXECUTION_FAILED  at (/paddle/paddle/fluid/operators/math/blas_impl.cu.h:34)

I have already set use_multiprocess to false; where is the problem?

@ShenYuhan
Contributor

Could you paste the complete error log?

@iceriver97
Author

@ShenYuhan After installing the CUDA 9.0 patches, a new error appears. Here is the complete error message:

--------------------------------------------
C++ Call Stacks (More useful to developers):
--------------------------------------------
0   std::string paddle::platform::GetTraceBackString<char const*>(char const*&&, char const*, int)
1   paddle::platform::EnforceNotMet::EnforceNotMet(std::__exception_ptr::exception_ptr, char const*, int)
2   paddle::platform::GetCurrentDeviceId()
3   paddle::platform::RecordedCudaMalloc(void**, unsigned long, int)
4   paddle::memory::allocation::CUDAAllocator::AllocateImpl(unsigned long)
5   paddle::memory::allocation::AlignedAllocator::AllocateImpl(unsigned long)
6   paddle::memory::allocation::AutoGrowthBestFitAllocator::AllocateImpl(unsigned long)
7   paddle::memory::allocation::RetryAllocator::AllocateImpl(unsigned long)
8   paddle::memory::allocation::AllocatorFacade::Alloc(paddle::platform::Place const&, unsigned long)
9   paddle::memory::allocation::AllocatorFacade::AllocShared(paddle::platform::Place const&, unsigned long)
10  paddle::memory::AllocShared(paddle::platform::Place const&, unsigned long)
11  paddle::framework::Tensor::mutable_data(paddle::platform::Place const&, paddle::framework::proto::VarType_Type, unsigned long)
12  paddle::PaddleTensorToLoDTensor(paddle::PaddleTensor const&, paddle::framework::LoDTensor*, paddle::platform::Place const&)
13  paddle::AnalysisPredictor::SetFeed(std::vector<paddle::PaddleTensor, std::allocator<paddle::PaddleTensor> > const&, paddle::framework::Scope*)
14  paddle::AnalysisPredictor::Run(std::vector<paddle::PaddleTensor, std::allocator<paddle::PaddleTensor> > const&, std::vector<paddle::PaddleTensor, std::allocator<paddle::PaddleTensor> >*, int)

----------------------
Error Message Summary:
----------------------
ExternalError:  Cuda error(3), initialization error.
  [Advise: The API call failed because the CUDA driver and runtime could not be initialized. ] at (/paddle/paddle/fluid/platform/gpu_info.cc:183)

@iceriver97
Author

@ShenYuhan Thanks. Do you know how to fix this?

@ShenYuhan
Contributor

Try excluding the models one by one to see which model causes the error, then try that one on its own.

@iceriver97
Author

@ShenYuhan Hello, I tried the official lac model. When I start it with `hub serving start -m lac --use_gpu` there is no error, but when I put that model into the configuration file and set `predict_args: use_gpu` to True, the problem above appears.
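For reference, the configuration-file equivalent of `--use_gpu` would be a fragment like the sketch below (note that JSON booleans are lowercase `true`, not `True`). One plausible cause of the `Cuda error(3), initialization error` above, worth ruling out, is that a CUDA context initialized in a parent process does not survive `fork()`, so GPU prediction can fail inside forked workers even when the same model works in a single process; keeping `use_multiprocess` off for GPU serving avoids that failure mode:

```json
{
  "modules_info": {
    "lac": {
      "init_args": { "version": "2.2.0" },
      "predict_args": { "use_gpu": true }
    }
  },
  "port": 8866,
  "use_multiprocess": false,
  "workers": 1
}
```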

@iceriver97
Author

@ShenYuhan Have you tried the official lac model?

@ShenYuhan
Contributor

The error looks like insufficient GPU memory. Is any other program occupying the GPU while you run this?

@iceriver97
Author

@ShenYuhan Thanks! No, there are no other programs. Do the two ways of starting a model, via the configuration file versus directly, have different GPU memory requirements? Why is there no problem when the model is started on its own?
