docs: Document the various job objects
jpmckinney committed Jul 24, 2024
1 parent 170c59e commit aa40ad8
Showing 2 changed files with 82 additions and 0 deletions.
1 change: 1 addition & 0 deletions docs/contributing/api.rst
@@ -7,6 +7,7 @@ Interfaces
.. automodule:: scrapyd.interfaces
   :members:
   :undoc-members:
   :special-members:

Config
------
81 changes: 81 additions & 0 deletions docs/contributing/index.rst
@@ -61,3 +61,84 @@ SCRAPY_SETTINGS_MODULE
The Python path to the `settings <https://docs.scrapy.org/en/latest/topics/settings.html#designating-the-settings>`__ module of the project.

This is usually the module from the `entry points <https://setuptools.pypa.io/en/latest/userguide/entry_point.html>`__ of the egg, but can be the module from the ``[settings]`` section of a :ref:`scrapy.cfg<config-settings>` file. See ``scrapyd/environ.py``.

Jobs
~~~~

A **pending job** is a ``dict`` object (referred to as a "message"), accessible via an :py:interface:`~scrapyd.interfaces.ISpiderQueue`'s :meth:`~scrapyd.interfaces.ISpiderQueue.pop` or :meth:`~scrapyd.interfaces.ISpiderQueue.list` methods.

.. note:: The short-lived object returned by :py:interface:`~scrapyd.interfaces.IPoller`'s :meth:`~scrapyd.interfaces.IPoller.poll` method is also referred to as a "message".

- The :ref:`schedule.json` webservice calls :py:interface:`~scrapyd.interfaces.ISpiderScheduler`'s :meth:`~scrapyd.interfaces.ISpiderScheduler.schedule` method. The ``SpiderScheduler`` implementation of :meth:`~scrapyd.interfaces.ISpiderScheduler.schedule` adds the message to the project's :py:interface:`~scrapyd.interfaces.ISpiderQueue`.
- The default :ref:`application` sets up a `TimerService <https://docs.twisted.org/en/stable/api/twisted.application.internet.TimerService.html>`__ to call :py:interface:`~scrapyd.interfaces.IPoller`'s :meth:`~scrapyd.interfaces.IPoller.poll` method at the interval set by :ref:`poll_interval`.
- :py:interface:`~scrapyd.interfaces.IPoller` has a :attr:`~scrapyd.interfaces.IPoller.queues` attribute that implements a ``__getitem__`` method to get a project's :py:interface:`~scrapyd.interfaces.ISpiderQueue` by project name.
- The ``QueuePoller`` implementation of :meth:`~scrapyd.interfaces.IPoller.poll` calls a project's :py:interface:`~scrapyd.interfaces.ISpiderQueue`'s :meth:`~scrapyd.interfaces.ISpiderQueue.pop` method, adds a ``_project`` key to the message and renames the ``name`` key to ``_spider``, and fires a callback.
- The ``Launcher`` service had previously added this callback to the `Deferred <https://docs.twisted.org/en/stable/core/howto/defer.html>`__ returned by :py:interface:`~scrapyd.interfaces.IPoller`'s :meth:`~scrapyd.interfaces.IPoller.next` method.
- The ``Launcher`` service adapts the message to instantiate a ``ScrapyProcessProtocol`` (`ProcessProtocol <https://docs.twisted.org/en/stable/api/twisted.internet.protocol.ProcessProtocol.html>`__) object, adds a callback, and `spawns a process <https://docs.twisted.org/en/stable/core/howto/process.html>`__.
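
The key changes that the poller applies to a queued message can be sketched in plain Python. This is only an illustration under stated assumptions, not Scrapyd's implementation: ``to_poller_message`` is a hypothetical helper, and the project, spider, and job values are made up.

```python
def to_poller_message(project, queue_message):
    """Hypothetical helper: reshape a queue message as the list above describes."""
    message = dict(queue_message)  # copy, so the queued dict is left untouched
    message["_project"] = project  # the poller adds the project name
    message["_spider"] = message.pop("name")  # "name" is renamed to "_spider"
    return message


# Made-up example of a message as it might sit in a project's queue.
queued = {
    "name": "example_spider",
    "_job": "job-1",
    "settings": {"DOWNLOAD_DELAY": "2"},
    "category": "books",
}
polled = to_poller_message("example_project", queued)
```

All other keys (``_job``, ``settings``, and the spider arguments) pass through unchanged in this sketch.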

A **running job** is a ``ScrapyProcessProtocol`` object, accessible via ``Launcher.processes`` (a ``dict``), in which each key is a slot's number (an ``int``).

- ``Launcher`` has a ``finished`` attribute, which is an :py:interface:`~scrapyd.interfaces.IJobStorage`.
- When the process ends, the callback fires. The ``Launcher`` service calls :py:interface:`~scrapyd.interfaces.IJobStorage`'s :meth:`~scrapyd.interfaces.IJobStorage.add` method, passing the ``ScrapyProcessProtocol`` as input.

A **finished job** is an object with the attributes ``project``, ``spider``, ``job``, ``start_time`` and ``end_time``, accessible via an :py:interface:`~scrapyd.interfaces.IJobStorage`'s :meth:`~scrapyd.interfaces.IJobStorage.list` or :meth:`~scrapyd.interfaces.IJobStorage.__iter__` methods.
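
To make the shape of a finished job concrete, here is a minimal in-memory sketch. ``FinishedJob`` and ``InMemoryJobStorage`` are hypothetical names chosen for illustration; only the five attributes and the ``add``, ``list`` and ``__iter__`` methods come from the text above, and the class does not formally declare that it implements the interface.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class FinishedJob:
    """Hypothetical container with the attributes named above."""

    project: str
    spider: str
    job: str
    start_time: datetime
    end_time: datetime


class InMemoryJobStorage:
    """Hypothetical storage exposing add(), list() and __iter__()."""

    def __init__(self):
        self._jobs = []

    def add(self, job):
        # Any object with the five attributes would do here.
        self._jobs.append(job)

    def list(self):
        return self._jobs.copy()

    def __iter__(self):
        return iter(self._jobs)


storage = InMemoryJobStorage()
finished = FinishedJob(
    project="example_project",
    spider="example_spider",
    job="job-1",
    start_time=datetime(2024, 7, 24, 12, 0, 0),
    end_time=datetime(2024, 7, 24, 12, 5, 0),
)
storage.add(finished)
```

Iterating over ``storage`` or calling ``storage.list()`` then yields the stored objects in insertion order.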

.. list-table::
   :header-rows: 1
   :stub-columns: 1

   * - Concept
     - ISpiderQueue
     - IPoller
     - ScrapyProcessProtocol
     - Job
   * - Project
     - *not specified*
     - _project
     - project
     - project
   * - Spider
     - name
     - _spider
     - spider
     - spider
   * - Job ID
     - _job
     - _job
     - job
     - job
   * - Egg version
     - _version
     - _version
     - ✗
     - ✗
   * - Scrapy settings
     - settings
     - settings
     - args (``-s k=v``)
     - ✗
   * - Spider arguments
     - *remaining keys*
     - *remaining keys*
     - args (``-a k=v``)
     - ✗
   * - Environment variables
     - ✗
     - ✗
     - env
     - ✗
   * - Process ID
     - ✗
     - ✗
     - pid
     - ✗
   * - Start time
     - ✗
     - ✗
     - start_time
     - start_time
   * - End time
     - ✗
     - ✗
     - end_time
     - end_time
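
The "Scrapy settings" and "Spider arguments" rows can be read as a mapping from message keys to command-line arguments. The sketch below shows one way to do that mapping. ``message_to_args`` is a hypothetical helper, and two things in it are assumptions made for the example rather than documented behavior: underscore-prefixed keys are treated as handled elsewhere, and arguments are sorted. Scrapyd's launcher may filter or order arguments differently.

```python
def message_to_args(message):
    """Hypothetical helper: map a poller message to -s/-a arguments."""
    # Assumption: keys starting with "_" (project, spider, job, version)
    # are handled separately and are not passed as spider arguments.
    remaining = {k: v for k, v in message.items() if not k.startswith("_")}
    settings = remaining.pop("settings", {})

    args = []
    for key, value in sorted(settings.items()):
        args += ["-s", f"{key}={value}"]  # Scrapy settings
    for key, value in sorted(remaining.items()):
        args += ["-a", f"{key}={value}"]  # spider arguments (remaining keys)
    return args


# Made-up poller message for illustration.
message = {
    "_project": "example_project",
    "_spider": "example_spider",
    "_job": "job-1",
    "settings": {"DOWNLOAD_DELAY": "2"},
    "category": "books",
}
args = message_to_args(message)
```

Sorting the keys is a choice made here only to keep the output deterministic.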
