Background tasks


⚠️ DRAFT

Background

Some tasks are too heavy for the usual API service to handle, and because they may take significant time we also cannot let them hang the API service while it waits for the task to finish. These tasks need to run in the background, preferably on a separate service. The main use case for these background tasks right now is importing data from other services similar to BookLogr, as well as exporting your own data from BookLogr into more usable formats such as CSV and HTML files.

Design choices

  • Use the same database as the API service (Postgres).
  • Have the ability to run several workers on different servers (but not require it).

Database table

tablename = tasks

| name       | type                          |
|------------|-------------------------------|
| id         | integer (primary)             |
| type       | string (NOT NULL)             |
| data       | string (NOT NULL)             |
| status     | string (default=fresh)        |
| worker     | string (NULL)                 |
| created_at | datetime (default=servertime) |
| updated_on | datetime (NULL)               |
| created_by | integer (NOT NULL)            |
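
For illustration, a minimal sketch of the DDL this table implies, assuming Postgres and psycopg2; the connection string is a placeholder and the concrete column types are an interpretation of the table above, not a final schema.

```python
import psycopg2

# Sketch of the tasks table DDL; types are an interpretation of the
# table above, not a final schema.
DDL = """
CREATE TABLE IF NOT EXISTS tasks (
    id         SERIAL PRIMARY KEY,
    type       TEXT NOT NULL,
    data       TEXT NOT NULL,
    status     TEXT NOT NULL DEFAULT 'fresh',
    worker     TEXT,
    created_at TIMESTAMP NOT NULL DEFAULT now(),
    updated_on TIMESTAMP,
    created_by INTEGER NOT NULL
);
"""

with psycopg2.connect("dbname=booklogr") as conn:  # placeholder DSN
    with conn.cursor() as cur:
        cur.execute(DDL)
```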

Explanation of table columns

type

What type of background task it is. This is used to determine which worker needs to pick it up.

data

Payload data for the worker. Can be anything, but preferably JSON formatted. For import workers this can be the actual data to import, a link to a CSV file, or an API key. The owner_id of the user the books will be imported into also has to be in the payload. Each type of worker can set its own payload format.

status

The status of the task. At first this is set to fresh, which indicates to workers that the task needs to be picked up. When a worker picks up the task the status is changed to started. When a worker finishes a task successfully the status is changed to success. If the task fails the status is changed to failure.

worker

A unique ID identifying the worker that picked up the task. This ID is set on the worker in the format "nickname:UUID", where the nickname is optional. If no nickname is set, only "UUID" is used as the unique identifier.

created_at

Date and time when the task was created.

updated_on

Date and time when the task was last updated. Updating is done by the worker.

created_by

The ID of the user who created the task.

On background task creation, insert type, data, and created_by into the database. Take the returned ID and NOTIFY on the 'task_created' channel with the ID as the payload. Workers LISTEN on 'task_created'. A worker then does a quick lookup in the tasks table using the ID from the NOTIFY payload; if the type matches the worker's type, the worker changes the status to "started", sets worker to its own ID, and sets updated_on to the current datetime.
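
A rough sketch of both sides of this flow, assuming Python with psycopg2; the worker type "export_csv", the connection handling, and the function names are assumptions for illustration, not the final implementation:

```python
import select
import uuid
import psycopg2

# Worker ID in the "nickname:UUID" format described above.
WORKER_ID = f"export-worker:{uuid.uuid4()}"

def create_task(conn, task_type, data, created_by):
    """API side: insert the task, then NOTIFY with the new ID as payload."""
    with conn.cursor() as cur:
        cur.execute(
            "INSERT INTO tasks (type, data, created_by) "
            "VALUES (%s, %s, %s) RETURNING id",
            (task_type, data, created_by),
        )
        task_id = cur.fetchone()[0]
        cur.execute("SELECT pg_notify('task_created', %s)", (str(task_id),))
    conn.commit()
    return task_id

def worker_loop(conn, my_type="export_csv"):
    """Worker side: LISTEN for task IDs and claim matching fresh tasks."""
    conn.autocommit = True
    with conn.cursor() as cur:
        cur.execute("LISTEN task_created")
        while True:
            # Block until Postgres delivers a notification (60 s timeout).
            if select.select([conn], [], [], 60) == ([], [], []):
                continue
            conn.poll()
            while conn.notifies:
                note = conn.notifies.pop(0)
                # Claim the task only if the type matches and it is still
                # fresh; the status check also stops two workers from
                # claiming the same task.
                cur.execute(
                    "UPDATE tasks SET status = 'started', worker = %s, "
                    "updated_on = now() "
                    "WHERE id = %s AND type = %s AND status = 'fresh'",
                    (WORKER_ID, int(note.payload), my_type),
                )
                if cur.rowcount == 1:
                    pass  # run the task here, then set success/failure
```

Claiming the task with a conditional UPDATE rather than a plain lookup is one way to satisfy the "several workers on different servers" design choice without extra locking.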

If the worker successfully finishes the task without any errors, it sets the status to "success" and updates updated_on accordingly. If the worker encounters any errors during its work, it should stop, set the status to "failure", and update updated_on accordingly.

There is no callback from the worker to the API or to the user. The user will need to poll the API with the task ID to get its current status.

Routes needed

  • POST /v1/tasks create task
  • GET /v1/tasks/<id> get task status by ID (sketched below)
  • POST /v1/tasks/<id>/retry retry previously failed task
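
A minimal sketch of the status-polling route, assuming a Flask-style API; the framework choice, connection string, and response shape are assumptions:

```python
from flask import Flask, jsonify
import psycopg2

app = Flask(__name__)

@app.get("/v1/tasks/<int:task_id>")
def get_task(task_id):
    # Look the task up and return its current status for polling clients.
    # A per-request connection keeps the sketch simple.
    with psycopg2.connect("dbname=booklogr") as conn, conn.cursor() as cur:
        cur.execute(
            "SELECT id, type, status, created_at, updated_on "
            "FROM tasks WHERE id = %s",
            (task_id,),
        )
        row = cur.fetchone()
    if row is None:
        return jsonify(error="task not found"), 404
    return jsonify(
        id=row[0],
        type=row[1],
        status=row[2],
        created_at=row[3].isoformat() if row[3] else None,
        updated_on=row[4].isoformat() if row[4] else None,
    )
```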

Export data worker

An export data worker will be the first POC task worker to be made. It will follow the standard outlined above and create a CSV file of all books for the given user. This CSV file will then be saved to a location that is shared between the API and worker services. To accomplish this we will use Docker's shared volume concept, so the same directory is mounted in both services.

A database table called csv_export will have to be created. Once the CSV export is completed, the worker will insert a row into this table. The existence of a row indicates that there is data for the user to download.
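
A hedged sketch of the export step, assuming a books table with title, author, and status columns and /exports as the shared volume mount point; the table schema, directory, and filename pattern are all assumptions:

```python
import csv
import psycopg2

EXPORT_DIR = "/exports"  # assumed mount point of the shared Docker volume

def export_books_csv(conn, task_id, owner_id):
    """Write all books for owner_id to a CSV file on the shared volume."""
    filename = f"books_{owner_id}_{task_id}.csv"  # illustrative pattern
    with conn.cursor() as cur:
        cur.execute(
            "SELECT title, author, status FROM books WHERE owner_id = %s",
            (owner_id,),
        )
        rows = cur.fetchall()
    with open(f"{EXPORT_DIR}/{filename}", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["title", "author", "status"])
        writer.writerows(rows)
    # Record the finished export so the API can offer it for download.
    with conn.cursor() as cur:
        cur.execute(
            "INSERT INTO csv_export (filename, owner_id) VALUES (%s, %s)",
            (filename, owner_id),
        )
    conn.commit()
    return filename
```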

Database table

tablename = csv_export

| name       | type                          |
|------------|-------------------------------|
| id         | integer (primary)             |
| filename   | string (NOT NULL)             |
| owner_id   | integer (NOT NULL)            |
| created_at | datetime (default=servertime) |
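
As with the tasks table, a sketch of the DDL this implies, with the same caveats about placeholder types and connection string:

```python
import psycopg2

# Sketch of the csv_export table DDL implied by the table above.
DDL = """
CREATE TABLE IF NOT EXISTS csv_export (
    id         SERIAL PRIMARY KEY,
    filename   TEXT NOT NULL,
    owner_id   INTEGER NOT NULL,
    created_at TIMESTAMP NOT NULL DEFAULT now()
);
"""

with psycopg2.connect("dbname=booklogr") as conn, conn.cursor() as cur:
    cur.execute(DDL)
```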

A worker shall never be directly exposed to the end user. All communication should go through the API. This does, however, require modifications to the API for some of the workers that will be created.