
Commit bf5e1bd

docs: Improve interfaces documentation, in particular field availability and inter-dependencies
1 parent ea65213 commit bf5e1bd

2 files changed: +97 −62 lines changed

docs/config.rst

+1 −1
@@ -185,7 +185,7 @@ Options
 - ``scrapyd.poller.QueuePoller``. When using the default :ref:`application` and :ref:`launcher` values:

   - The launcher adds :ref:`max_proc` capacity at startup, and one capacity each time a Scrapy process ends.
-  - The :ref:`application` starts a timer so that, every :ref:`poll_interval` seconds, a job starts if there's capacity: that is, if the number of Scrapy processes that are running is less than the :ref:`max_proc` value.
+  - The :ref:`application` starts a timer so that, every :ref:`poll_interval` seconds, jobs start if there's capacity: that is, if the number of Scrapy processes that are running is less than the :ref:`max_proc` value.

 - Implement your own, using the ``IPoller`` interface
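The timer wiring this bullet describes can be sketched in a few lines (a hedged sketch assuming Twisted's ``TimerService``, which the default application uses for polling; ``StubPoller`` is a stand-in, not Scrapyd's ``QueuePoller``):

```python
from twisted.application.internet import TimerService
from twisted.application.service import Application


class StubPoller:
    def poll(self):
        # Start jobs while fewer than max_proc Scrapy processes are running.
        print("checking capacity, starting jobs if possible")


application = Application("scrapyd-sketch")
poll_interval = 5.0  # the poll_interval option, in seconds

# Every poll_interval seconds, the application calls poller.poll().
TimerService(poll_interval, StubPoller().poll).setServiceParent(application)
```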

scrapyd/interfaces.py

+96 −61
@@ -3,180 +3,215 @@

 class IEggStorage(Interface):
     """
-    A component that handles storing and retrieving eggs.
+    A component to store project eggs.
     """

     def put(eggfile, project, version):
-        """Store the egg (passed in the file object) under the given project and
-        version"""
+        """
+        Store the egg (a file object), which represents a ``version`` of the ``project``.
+        """

     def get(project, version=None):
-        """Return a tuple (version, file) for the egg matching the specified
-        project and version. If version is None, the latest version is
-        returned. If no egg is found for the given project/version (None, None)
-        should be returned."""
+        """
+        Return ``(version, file)`` for the egg matching the ``project`` and ``version``.
+
+        If ``version`` is ``None``, the latest version and corresponding file are returned.
+
+        If no egg is found, ``(None, None)`` is returned.
+
+        .. tip:: Remember to close the ``file`` when done.
+        """

     def list(project):
-        """Return the list of versions which have eggs stored (for the given
-        project) in order (the latest version is the currently used)."""
+        """
+        Return all versions of the ``project`` in order, with the latest version last.
+        """

     def list_projects():
         """
-        Return the list of projects from the stored eggs.
+        Return all projects in storage.

         .. versionadded:: 1.3.0
            Move this logic into the interface and its implementations, to allow customization.
         """

     def delete(project, version=None):
-        """Delete the egg stored for the given project and version. If should
-        also delete the project if no versions are left"""
+        """
+        Delete the egg matching the ``project`` and ``version``. Delete the ``project`` if no versions remain.
+        """
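To make the revised contract concrete, here is a minimal, hypothetical in-memory implementation (Scrapyd's real default stores eggs on the filesystem; the class name and the naive version sorting below are illustrative only):

```python
from io import BytesIO

from zope.interface import implementer

from scrapyd.interfaces import IEggStorage


@implementer(IEggStorage)
class InMemoryEggStorage:
    """Illustrative only: keeps eggs in a dict instead of on disk."""

    def __init__(self):
        self._eggs = {}  # {project: {version: egg bytes}}

    def put(self, eggfile, project, version):
        self._eggs.setdefault(project, {})[version] = eggfile.read()

    def get(self, project, version=None):
        versions = self._eggs.get(project, {})
        if not versions:
            return None, None  # no egg found
        if version is None:
            # Naive "latest" pick; a real storage would sort version-aware.
            version = sorted(versions)[-1]
        return version, BytesIO(versions[version])  # caller closes the file

    def list(self, project):
        # All versions in order, latest last, per the docstring.
        return sorted(self._eggs.get(project, {}))

    def list_projects(self):
        return list(self._eggs)

    def delete(self, project, version=None):
        if version is None:
            self._eggs.pop(project, None)
        else:
            self._eggs.get(project, {}).pop(version, None)
            if not self._eggs.get(project):  # no versions remain
                self._eggs.pop(project, None)
```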

 class IPoller(Interface):
     """
-    A component that polls for projects that need to run.
+    A component that tracks capacity for new jobs, and starts jobs when ready.
     """

     queues = Attribute(
         """
         An object (like a ``dict``) with a ``__getitem__`` method that accepts a project's name and returns its
-        :py:interface:`spider queue<scrapyd.interfaces.ISpiderQueue>`.
+        :py:interface:`spider queue<scrapyd.interfaces.ISpiderQueue>` of pending jobs.
         """
     )

     def poll():
-        """Called periodically to poll for projects"""
+        """
+        Called periodically to start jobs if there's capacity.
+        """

     def next():
-        """Return the next message.
+        """
+        Return the next pending job.

-        It should return a Deferred which will get fired when there is a new
-        project that needs to run, or already fired if there was a project
-        waiting to run already.
+        It should return a Deferred that is fired once there's capacity, or an already-fired Deferred if there's
+        already capacity.

-        The message is a dict containing (at least):
+        The pending job is a ``dict`` containing at least the ``_project`` name, ``_spider`` name and ``_job`` ID.
+        The job ID is unique, at least within the project.

-        - the name of the project to be run in the ``_project`` key
-        - the name of the spider to be run in the ``_spider`` key
-        - a unique identifier for this run in the ``_job`` key
+        The pending job is later passed to :meth:`scrapyd.interfaces.IEnvironment.get_environment`.

-        This message will be passed later to :meth:`scrapyd.interfaces.IEnvironment.get_environment`.
+        .. seealso:: :meth:`scrapyd.interfaces.ISpiderQueue.pop`
         """

     def update_projects():
-        """Called when projects may have changed, to refresh the available
-        projects, including at initialization"""
+        """
+        Called when projects may have changed, to refresh the available projects, including at initialization.
+        """
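The ``next()``/``poll()`` handshake can be sketched as follows (a simplified model of the default poller with illustrative names; a real implementation also refreshes projects and may receive Deferreds from the queues):

```python
from twisted.internet.defer import Deferred


class PollerSketch:
    """Simplified IPoller flow: next() parks one Deferred per free launcher
    slot; poll() fires one parked Deferred per available pending job."""

    def __init__(self, queues):
        self.queues = queues  # {project: spider queue}
        self._waiting = []    # Deferreds handed out to the launcher

    def next(self):
        # One Deferred per free slot; fired later, when a job is popped.
        deferred = Deferred()
        self._waiting.append(deferred)
        return deferred

    def poll(self):
        # Called every poll_interval seconds by the application's timer.
        while self._waiting:
            message = self._pop_pending_job()
            if message is None:
                return  # no pending jobs; keep the Deferreds parked
            self._waiting.pop(0).callback(message)

    def _pop_pending_job(self):
        for project, queue in self.queues.items():
            message = queue.pop()
            if message is not None:
                # At least _project, _spider and _job, per IPoller.next().
                message = dict(message)
                message["_project"] = project
                message["_spider"] = message.pop("name")
                return message
        return None
```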

 class ISpiderQueue(Interface):
+    """
+    A component to store pending jobs.
+
+    The ``dict`` keys used by the chosen ``ISpiderQueue`` implementation must match the chosen:
+
+    - :ref:`launcher` service (which calls :meth:`scrapyd.interfaces.IPoller.next`)
+    - :py:interface:`~scrapyd.interfaces.IEnvironment` implementation (see :meth:`scrapyd.interfaces.IPoller.next`)
+    - :ref:`webservices<config-services>` that schedule, cancel or list pending jobs
+    """
+
     def add(name, priority, **spider_args):
         """
-        Add a spider to the queue given its name a some spider arguments.
-
-        This method can return a deferred.
+        Add a pending job, given the spider ``name``, crawl ``priority`` and keyword arguments. Depending on the
+        implementation, the keyword arguments might include the ``_job`` ID, egg ``_version`` and Scrapy
+        ``settings``; keyword arguments that the implementation doesn't recognize are treated as spider arguments.

         .. versionchanged:: 1.3.0
            Add the ``priority`` parameter.
         """

     def pop():
-        """Pop the next message from the queue. The messages is a dict
-        containing a key ``name`` with the spider name and other keys as spider
-        attributes.
-
-        This method can return a deferred."""
+        """
+        Pop the next pending job. The pending job is a ``dict`` containing the spider ``name``. Depending on the
+        implementation, other keys might include the ``_job`` ID, egg ``_version`` and Scrapy ``settings``; keys
+        that the receiver doesn't recognize are treated as spider arguments.
+        """

     def list():
-        """Return a list with the messages in the queue. Each message is a dict
-        which must have a ``name`` key (with the spider name), and other optional
-        keys that will be used as spider arguments, to create the spider.
-
-        This method can return a deferred."""
+        """
+        Return the pending jobs.
+
+        .. seealso:: :meth:`scrapyd.interfaces.ISpiderQueue.pop`
+        """

     def count():
-        """Return the number of spiders in the queue.
-
-        This method can return a deferred."""
+        """
+        Return the number of pending jobs.
+        """

     def remove(func):
-        """Remove all elements from the queue for which func(element) is true,
-        and return the number of removed elements.
+        """
+        Remove pending jobs for which ``func(job)`` is true, and return the number of removed pending jobs.
         """

     def clear():
-        """Clear the queue.
-
-        This method can return a deferred."""
+        """
+        Remove all pending jobs.
+        """
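A toy in-memory ``ISpiderQueue`` makes the ``dict`` contract concrete (illustrative only; Scrapyd's default queue persists to SQLite):

```python
from zope.interface import implementer

from scrapyd.interfaces import ISpiderQueue


@implementer(ISpiderQueue)
class InMemorySpiderQueue:
    """Illustrative only: pending jobs in a list, highest priority first."""

    def __init__(self):
        self._jobs = []  # [(priority, message), ...]

    def add(self, name, priority=0.0, **spider_args):
        # Unrecognized keyword arguments travel along as spider arguments.
        message = dict(spider_args, name=name)
        self._jobs.append((priority, message))
        self._jobs.sort(key=lambda item: item[0], reverse=True)

    def pop(self):
        if self._jobs:
            return self._jobs.pop(0)[1]
        return None

    def list(self):
        return [message for _, message in self._jobs]

    def count(self):
        return len(self._jobs)

    def remove(self, func):
        before = len(self._jobs)
        self._jobs = [(p, m) for p, m in self._jobs if not func(m)]
        return before - len(self._jobs)

    def clear(self):
        self._jobs = []
```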

 class ISpiderScheduler(Interface):
     """
-    A component to schedule spider runs.
+    A component to schedule jobs.
     """

     def schedule(project, spider_name, priority, **spider_args):
         """
-        Schedule a spider for the given project.
+        Schedule a crawl.

         .. versionchanged:: 1.3.0
            Add the ``priority`` parameter.
         """

     def list_projects():
-        """Return the list of available projects"""
+        """
+        Return all projects that can be scheduled.
+        """

     def update_projects():
-        """Called when projects may have changed, to refresh the available
-        projects, including at initialization"""
+        """
+        Called when projects may have changed, to refresh the available projects, including at initialization.
+        """
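In the default setup, scheduling amounts to pushing into the matching project's queue. A hypothetical sketch (the real scheduler also refreshes its queues when projects change):

```python
from zope.interface import implementer

from scrapyd.interfaces import ISpiderScheduler


@implementer(ISpiderScheduler)
class SchedulerSketch:
    """Illustrative only: delegates each crawl to the project's spider queue."""

    def __init__(self, queues):
        self.queues = queues  # {project: ISpiderQueue}

    def schedule(self, project, spider_name, priority, **spider_args):
        self.queues[project].add(spider_name, priority, **spider_args)

    def list_projects(self):
        return list(self.queues)

    def update_projects(self):
        pass  # a real implementation would re-scan the egg storage here
```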

 class IEnvironment(Interface):
     """
-    A component to generate the environment of crawler processes.
+    A component to generate the environment of jobs.
+
+    The chosen ``IEnvironment`` implementation must match the chosen :ref:`launcher` service.
     """

     def get_settings(message):
         """
         Return the Scrapy settings to use for running the process.

-        ``message`` is the message received from the :meth:`scrapyd.interfaces.IPoller.next` method.
+        Depending on the chosen :ref:`launcher`, this would be one or more of ``LOG_FILE`` and ``FEEDS``.

         .. versionadded:: 1.4.2
            Support for overriding Scrapy settings via ``SCRAPY_`` environment variables was removed in Scrapy 2.8.
+
+        :param message: the pending job received from the :meth:`scrapyd.interfaces.IPoller.next` method
         """

     def get_environment(message, slot):
-        """Return the environment variables to use for running the process.
+        """
+        Return the environment variables to use for running the process.

-        ``message`` is the message received from the :meth:`scrapyd.interfaces.IPoller.next` method.
-        ``slot`` is the ``Launcher`` slot where the process will be running.
+        Depending on the chosen :ref:`launcher`, this would be one or more of ``SCRAPY_PROJECT``,
+        ``SCRAPYD_EGG_VERSION`` and ``SCRAPY_SETTINGS_MODULE``.
+
+        :param message: the pending job received from the :meth:`scrapyd.interfaces.IPoller.next` method
+        :param slot: the :ref:`launcher` slot for tracking the process
         """

156186

157187
class IJobStorage(Interface):
158188
"""
159-
A component that handles storing and retrieving finished jobs.
189+
A component to store finished jobs.
160190
161191
.. versionadded:: 1.3.0
162192
"""
163193

164194
def add(job):
165-
"""Add a finished job in the storage."""
195+
"""
196+
Add a finished job in the storage.
197+
"""
166198

167199
def list():
168200
"""
169-
Return a list of the finished jobs.
201+
Return the finished jobs.
170202
171203
.. seealso:: :meth:`scrapyd.interfaces.IJobStorage.__iter__`
172204
"""
173205

174206
def __len__():
175-
"""Return a number of the finished jobs."""
207+
"""
208+
Return the number of finished jobs.
209+
"""
176210

177211
def __iter__():
178212
"""
179213
Iterate over the finished jobs in reverse order by ``end_time``.
180214
181-
A job has the attributes ``project``, ``spider``, ``job``, ``start_time`` and ``end_time``.
215+
A job has the attributes ``project``, ``spider``, ``job``, ``start_time`` and ``end_time`` and may have the
216+
attributes ``args`` (``scrapy crawl`` CLI arguments) and ``env`` (environment variables).
182217
"""

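Finally, a hypothetical in-memory ``IJobStorage``; the ``Job`` record mirrors the attributes listed under ``__iter__`` (Scrapyd ships its own memory- and database-backed implementations):

```python
from zope.interface import implementer

from scrapyd.interfaces import IJobStorage


class Job:
    """Minimal finished-job record, per the __iter__ docstring."""

    def __init__(self, project, spider, job, start_time, end_time, args=None, env=None):
        self.project, self.spider, self.job = project, spider, job
        self.start_time, self.end_time = start_time, end_time
        self.args, self.env = args, env


@implementer(IJobStorage)
class InMemoryJobStorage:
    """Illustrative only: finished jobs in a list, newest end_time first."""

    def __init__(self):
        self._jobs = []

    def add(self, job):
        self._jobs.append(job)

    def list(self):
        return list(iter(self))

    def __len__(self):
        return len(self._jobs)

    def __iter__(self):
        # Reverse order by end_time, per the interface docstring.
        return iter(sorted(self._jobs, key=lambda j: j.end_time, reverse=True))
```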