Blitzen is a python framework built on top of python's multiprocessing module intended to maximize processor utilization on individual computers and clusters.
pip install blitzen
Blitzen uses a dispatcher
to provide concurrent processing features.
Below are the dispatchers available and their use case.
Multicore dispatcher is similar to multiprocessing.Pool
in that you initialize the dispatcher and can pass it tasks to complete concurrently.
The notable differences are:
- Workers you create with a dispatcher will remain open until you shutdown the dispatcher.
- You can queue multiple tasks, and the dispatcher will complete all of them using the specified number of processes.
- Workers are Exception resistant, meaning if the worker subprocess crashes, the dispatcher will shut down the process and initialize a new one until the specified number of workers are actively monitor incoming tasks.
Example:
import time
import random
from blitzen import MulticoreDispatcher
def f1():
delay = random.randint(3,5)
time.sleep(delay)
print('Finished after', delay, 'secs.')
return delay
def f2(delay):
time.sleep(delay)
raise ValueError('Throwing error')
if __name__ == '__main__':
dispatcher = MulticoreDispatcher(workers=4)
task_id = dispatcher.run(f2, 4)
task_ids = [
dispatcher.run(f1)
for _ in range(5)
]
All tasks are either started or queued immediately upon running the dispatcher.run()
call.
dispatcher.run()
returns the dispatcher's task_id for the task you just passed it. This is used if you want to get specific task results from the dispatcher or tasks results in a specific order.
print('Fetching Results.')
results = dispatcher.get_results(task_ids)
print('Recevied results from dispatcher:', results)
The results are always returned in the order of task_ids
.
Eventually you will have to shutdown the dispatcher to close the underlying worker processes. You can do this two ways.
dispatcher.shutdown()
Terminates all worker subprocesses and deallocates memory resources.dispatcher.join()
returns all results in task order as they were passed to the dispatcher, then callsdispatcher.shutdown()
.
print('Fetching Results.')
results = dispatcher.join()
print('Recevied results from dispatcher:', results)
Critical Note: Once you request a result from the dispatcher, it is removed the dispatchers memory by default. To override this feature use dispatcher.get_results(clear=False)
A full example is visible here.
DistributedDispatcher lets you utilize a cluster for concurrent computing. It will handle packet synchronization between clients, servers, and drivers. It will also log all activity on the clients and servers.
The clients and servers need access to the same functions, so it is likely a good idea to make a common file that will be used to import functions that arent natively in python.
#common.py
import time
def time_consuming_function(delay):
time.sleep(delay)
return delay
from blitzen.utils import get_local_ip
from blitzen.distributed import DistributedDispatcher
from common import time_consuming_function
if __name__ == '__main__':
ip = get_local_ip()
dispatcher = DistributedDispatcher(server_ip=ip)
dispatcher.spawn_server(duration=30) #Run server for 30 seconds
from blitzen.utils import get_local_ip
from blitzen.distributed import DistributedDispatcher
from common import time_consuming_function
if __name__ == '__main__':
ip = '192.168.1.2' #Server IP
dispatcher = DistributedDispatcher(server_ip=ip)
dispatcher.spawn_client(workers=6)
from blitzen.utils import get_local_ip
from blitzen.distributed import DistributedDispatcher
from common import time_consuming_function
if __name__ == '__main__':
ip = '192.168.1.2' #Server IP
dispatcher = DistributedDispatcher(server_ip=ip)
#With DistributedDispatcher you can specify an amount
#of time each client has to finish their task
#with the `timeout` keyword
task_ids = [
dispatcher.run(time_consuming_function, i+5, timeout=7)
for i in range(5)
]
print('Requesting results.')
results = dispatcher.join()
print(results)
DistributedDispatchers provide some logging, using the python logging module, so you can monitor your cluster. By default the logger is configured to report all logging info of level logging.INFO
. To some this can be excessive, so blitzen provides some functions to control the logger.
You can get the logger using
logger = blitzen.logging.get_logger()
This logger is used by all blitzen submodules.
The other blitzen.logging fuctions are:
set_logfile(filename)
lets you set a file to log out to.set_loglevel(level)
lets you set the logging level using the same convention as python'slogging
module.disable()
disabled the logger.enable()
enables the logger.
A full example visible here.