
Async Request Management Proposal #42

Closed
afabiani opened this issue Nov 11, 2019 · 0 comments

Introduction

The scope of this document is to present a reliable technical solution which, under certain specific conditions discussed in the next sections, can guarantee that heavy-load execution requests won’t be executed twice.

Requirements and scope

Requirements and Assumptions

We are assuming here that only one instance of GeoServer will be responsible for the process scheduling and execution.

In other words, the assumption is that the current architecture won’t include a cluster of GeoServer instances, but only one instance at a time, while we can have a cluster of processing machines to offload the processing tasks, as per the image below.

image

Technical Solution

The technical proposal is to allow the remote processing endpoints to decide whether they can execute a certain process or not. Through the process configurations, we already know exactly which processes can be managed by each endpoint, as well as their possible inputs and input types.

The idea would be to assign each processing node a capacity (an integer from 1 to 100), set by the system administrator, which represents a measure of the processing resources available on the node; this value can be updated at runtime to refine the estimate.
We would also have each process declare its weight, an integer from 1 to 100, set by the developer, which represents a rough measure of the processing resources needed by a standard execution of the process.
Each process will also be given the possibility to manipulate its weight dynamically, based on the inputs of the execution request, to account for different inputs.
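
As a quick, purely hypothetical illustration of how node capacity and process weights combine (the numbers below are illustrative, not taken from the proposal):

# purely illustrative numbers: a node with capacity 80 already running two
# processes of weight 30 and 20
capacity = 80
running_weights = [30, 20]

residual_capacity = capacity - sum(running_weights)  # 80 - 50 = 30

# a new request whose (possibly input-adjusted) weight is 25 fits on this node,
# while one weighing 40 would be skipped and tried on the next node
print(residual_capacity >= 25)  # True
print(residual_capacity >= 40)  # False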

On a processing node we also account for:

  • Blacklisted processes. If a blacklisted process is running, no other process will be scheduled until the blacklisted process completes.

  • Load Average. If the average CPU and memory load over the last XXX minutes on a node stays above a configurable threshold then no other process can be run on that node.

When GeoServer, upon a WPS Execute request, searches for a suitable processing node to execute a certain process, it will ask all available nodes, in round-robin, whether they can take care of the execution or not; each node will compare its residual capacity against the request and decide if it can execute it or not.

The residual capacity will be kept updated on the local node as the difference between the initial capacity and the sum of the weights of the processes running on that node. If the residual capacity is bigger than the weight requested to run the new process (possibly adjusted to take into account the inputs of the specific request), the new process will run; otherwise, we will check the next processing node.

If at a certain point in time no processing node has enough residual capacity to run a WPS process, GeoServer will return an exception to the requestor.
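
A minimal sketch of this selection loop on the GeoServer side, assuming a hypothetical list of node proxies that expose the same execute() entry point described in the implementation details below (dispatch_execute and NoAvailableCapacityException are illustrative names, not part of the existing codebase):

class NoAvailableCapacityException(Exception):
    """Raised when no processing node can accept the execution request."""


def dispatch_execute(nodes, execute_request):
    # ask the available processing nodes, in round-robin, whether they can take
    # care of the execution; each node answers -1 when its blacklist, load
    # average or residual capacity does not allow a new execution
    for node in nodes:
        result = node.execute(execute_request)
        if result != -1:
            return result
    # no node has enough residual capacity: report the failure to the requestor
    raise NoAvailableCapacityException(
        "no processing node has enough residual capacity to run this process")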

From an implementation point of view, we will add “template classes” able to describe a process capacity weight and also allowing a user to assign, statically or dynamically, a coefficient, in order to optimize and tune up the capacity estimations.

Implementation Details

image

ProcessWeight template class

A template class “ProcessWeight” is used to describe the process weights. The weights are estimations that will impact the remote processing machine’s residual capacity.

class ProcessWeight(object):
    process_id = ""
    weight = 1          # an integer in the range [1, 100]
    coefficient = 1.0

    # ability to customize the process load on a per-request basis
    def request_weight(self, exec_request):
        # this one is the default implementation
        return self.coefficient * self.weight

Global variables on the remote processing machine

Notice that service_processor.py, the main daemon running on each remote processing machine and orchestrating the remote process executions, keeps the following global variables synchronized.

# total capacity of this node, a value in the range [0, 100]; it can be updated
# at runtime by editing the node configuration
capacity = 100

# current load on this node; it is updated when a process starts or ends
load = 0

# cpu_usage and available_mem averaged over the last XXX minutes, in [0, 100]
load_average = 0
load_threshold = 100

# a list of names the processing machine checks before starting a new process
black_listed_running_processes = [...]
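
The proposal does not specify how load_average is sampled; a minimal sketch, assuming the psutil package is available on the processing machine (the sampling interval and the five-sample window are illustrative choices, not part of the existing daemon):

import psutil
from collections import deque

# keep the last few samples, e.g. one per minute over the last XXX minutes
_samples = deque(maxlen=5)


def sample_load():
    # called periodically by the daemon (illustrative; the actual scheduling is
    # up to service_processor.py)
    global load_average
    cpu = psutil.cpu_percent(interval=1)   # CPU usage percentage
    mem = psutil.virtual_memory().percent  # memory usage percentage
    _samples.append(max(cpu, mem))         # track the most loaded resource
    load_average = sum(_samples) / len(_samples)  # a value in [0, 100]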

Execution pseudo-logic on each processing machine

def execute(execute_request):
    # "load" is the node-global variable defined above; it must be read and
    # updated inside the synchronized block
    global load

    # if the load average is above the threshold then we cannot run another
    # process and we should answer with a proper message
    if load_average > load_threshold:
        return -1

    # if there is at least a single blacklisted process running there is no
    # available capacity to run another process and we should answer with a
    # proper message
    if len(black_listed_running_processes) > 0:
        return -1

    # compute the residual capacity
    residual_capacity = capacity - load
    if residual_capacity <= 0:
        return -1  # no residual capacity left, no execution

    # we have capacity, is it enough?
    # PCk is the concrete ProcessWeight instance registered for this process
    request_load = PCk.request_weight(execute_request)
    if residual_capacity >= request_load:
        # ok we can execute: update the load, execute and return
        load += request_load
        return PCk.execute(execute_request)

    # residual capacity is not enough; skip to the next remote endpoint
    return -1

As we stated above, this pseudocode runs in a synchronized block, since processes can die while we try to run new ones.

The service processor daemon already envisages the possibility of recovering both from process exceptions, whenever a remote process throws an error for any reason, and from potential deadlocks: there are already some configuration variables allowing the administrator to kill a process that does not send any feedback within a certain amount of time.

In both cases, the overall load will be updated accordingly, by freeing resources for further executions.
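
A minimal sketch of how the node-local load could be released when a process completes, fails or is killed by the watchdog (the release name and the lock are illustrative; the actual service_processor.py may organise this differently):

import threading

load_lock = threading.Lock()


def release(request_load):
    # called whenever a remote process completes, throws an exception or is
    # killed after not sending feedback for too long; frees the capacity the
    # process was holding on this node
    global load
    with load_lock:
        load = max(0, load - request_load)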

Some additional configuration details

It will be possible, from the service configuration, to instantiate concrete implementations of the “ProcessWeight” class through one of the following methods:

In case we don’t need to redefine the request_weight(self, exec_request) logic, we can simply ask the Service Processor to instantiate concrete classes from the service_config.properties by just defining the weight and coefficient values.

process_weight = {weight: 20, coefficient: 1.0}

For more complex cases, where we might want to redefine request_weight(self, exec_request) dynamically, according to the exec_request parameters, we can ask the Service Processor to instantiate concrete classes through the “introspection” mechanism by defining the class path:

process_weight = "my_service.my_process.MyProcessCapacity"
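
For instance, a hypothetical MyProcessCapacity matching the class path above could adjust its weight according to a size hint carried by the execution request (the input_size attribute below is an illustrative name, not an actual field of the request):

class MyProcessCapacity(ProcessWeight):
    process_id = "my_service.my_process"
    weight = 20
    coefficient = 1.0

    def request_weight(self, exec_request):
        # purely illustrative: grow the weight with the size of the inputs,
        # capping it at 100 so it stays within the valid range
        size_factor = 1.0 + getattr(exec_request, "input_size", 0) / 1000.0
        return min(100, self.coefficient * self.weight * size_factor)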

@afabiani afabiani self-assigned this Nov 11, 2019
afabiani pushed a commit to geosolutions-it/wps-remote that referenced this issue Nov 11, 2019
afabiani pushed a commit that referenced this issue Nov 11, 2019
[Fixes #42] Async Request Management Proposal