docs: Copy-edit and clarify entry_points requirement
jpmckinney committed Jul 21, 2024
1 parent f172109 commit a6914f5
Showing 2 changed files with 115 additions and 87 deletions.
README.rst (201 changes: 114 additions & 87 deletions)
Scrapyd-client is a client for Scrapyd_. It provides:

Command line tools:

- `scrapyd-deploy`_, to deploy your project to a Scrapyd server
- `scrapyd-client`_, to interact with your project once deployed

Python client:

- ``ScrapydClient``, to interact with Scrapyd within your Python code (see the sketch below)

It is configured using the `Scrapy configuration file`_.
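
As a quick sketch of the Python client (assumptions: the default target is a local Scrapyd on port 6800, and the ``projects()`` and ``jobs()`` method names mirror the CLI subcommands; check ``scrapyd-client --help`` for the exact API):

.. code-block:: python

   from scrapyd_client import ScrapydClient

   client = ScrapydClient()  # assumed default: http://localhost:6800

   for project in client.projects():
       print(client.jobs(project=project))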

.. _Scrapyd: https://scrapyd.readthedocs.io
.. |PyPI Version| image:: https://img.shields.io/pypi/v/scrapyd-client.svg
:target: https://pypi.org/project/scrapyd-client/
scrapyd-deploy
--------------

Deploying your project to a Scrapyd server involves:

#. `Eggifying <https://setuptools.pypa.io/en/latest/deprecated/python_eggs.html>`__ your project.
#. Uploading the egg to the Scrapyd server through the `addversion.json <https://scrapyd.readthedocs.org/en/latest/api.html#addversion-json>`__ webservice.

The ``scrapyd-deploy`` tool automates the process of building the egg and pushing it to the target Scrapyd server.
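
For a sense of what this automation does, the upload step amounts to an HTTP POST like the sketch below (illustrative only; the real tool also reads ``scrapy.cfg`` targets and handles authentication and version naming, and the egg filename and project name here are made up):

.. code-block:: python

   import requests

   # Upload a pre-built egg to Scrapyd's addversion.json webservice.
   with open("myproject-1287453519.egg", "rb") as egg:
       response = requests.post(
           "http://localhost:6800/addversion.json",
           data={"project": "myproject", "version": "1287453519"},
           files={"egg": egg},
       )
   print(response.json())  # e.g. {"status": "ok", "spiders": [...]}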

Deploying a project
~~~~~~~~~~~~~~~~~~~

#. Change (``cd``) to the root of your project (the directory containing the ``scrapy.cfg`` file)
#. Eggify your project and upload it to the target:

   .. code-block:: shell

      scrapyd-deploy <target> -p <project>

If you don't have a ``setup.py`` file in the root of your project, one will be created. If you have one, it must set the ``entry_points`` keyword argument in the ``setup()`` function call, for example:

.. code-block:: python
   :emphasize-lines: 5

   setup(
       name = 'project',
       version = '1.0',
       packages = find_packages(),
       entry_points = {'scrapy': ['settings = projectname.settings']},
   )
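
The entry point is how a Scrapyd-style host locates your project's settings module inside the egg. A rough sketch of that lookup (illustrative, not Scrapyd's actual code; the ``group`` keyword requires Python 3.10+):

.. code-block:: python

   from importlib.metadata import entry_points

   # Find the settings module declared under the "scrapy" entry-point group.
   for ep in entry_points(group="scrapy"):
       if ep.name == "settings":
           print(ep.value)  # e.g. "projectname.settings"
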
If the command is successful, you should see a JSON response, like:

.. code-block:: none

   Deploying myproject-1287453519 to http://localhost:6800/addversion.json
   Server response (200):
   {"status": "ok", "spiders": ["spider1", "spider2"]}

To save yourself from having to specify the target and project, you can configure your defaults in the `Scrapy configuration file`_.
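
For example (a sketch; the target name, URL, and project name are placeholders), a deploy target in ``scrapy.cfg`` might look like:

.. code-block:: ini

   [deploy:example]
   url = http://localhost:6800/
   project = myproject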

Versioning
~~~~~~~~~~

By default, ``scrapyd-deploy`` uses the current timestamp for generating the project version. You can pass a custom version using ``--version``:

.. code-block:: shell

   scrapyd-deploy <target> -p <project> --version <version>

The version must be comparable with `Version <https://github.com/scrapy/scrapyd/issues/426>`__. Scrapyd will use the greatest version, unless a version is specified.
See `Scrapyd's documentation <https://scrapyd.readthedocs.io/en/latest/overview.html>`__ on how it determines the latest version.
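
For intuition about version ordering, a sketch using ``packaging.version`` (the comparison scheme referenced in the linked issue; illustrative, and requires the third-party ``packaging`` distribution):

.. code-block:: python

   from packaging.version import Version

   # Timestamp versions sort chronologically, since larger integers compare greater.
   assert Version("1287453519") < Version("1287453520")

   # Comparison is not plain string ordering.
   assert Version("1.9") < Version("1.10")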

If you use Mercurial or Git, you can use ``HG`` or ``GIT`` respectively as the argument supplied to ``--version`` to use the current revision as the version. You can save yourself having to specify the version parameter by adding it to your target's entry in ``scrapy.cfg``:

.. code-block:: ini

   ...
   version = HG

Note: The ``version`` keyword argument in the ``setup()`` function call in the ``setup.py`` file has no meaning to Scrapyd.

Include dependencies
~~~~~~~~~~~~~~~~~~~~

#. Create a `requirements.txt <https://pip.pypa.io/en/latest/reference/requirements-file-format/>`__ file at the root of your project, alongside the ``scrapy.cfg`` file
#. Use the ``--include-dependencies`` option when building or deploying your project:

   .. code-block:: bash

      scrapyd-deploy --include-dependencies
Alternatively, you can install the dependencies directly on the Scrapyd server.

Include data files
~~~~~~~~~~~~~~~~~~

#. Create a ``setup.py`` file at the root of your project, alongside the ``scrapy.cfg`` file, if you don't have one:

   .. code-block:: shell

      scrapyd-deploy --build-egg=/dev/null

#. Set the ``package_data`` and ``include_package_data`` keyword arguments in the ``setup()`` function call in the ``setup.py`` file. For example:

   .. code-block:: python
      :emphasize-lines: 8-9

      from setuptools import setup, find_packages

      setup(
          name = 'project',
          version = '1.0',
          packages = find_packages(),
          entry_points = {'scrapy': ['settings = projectname.settings']},
          package_data = {'projectname': ['path/to/*.json']},
          include_package_data = True,
      )
Local settings
~~~~~~~~~~~~~~

You may want to keep certain settings local and not have them deployed to Scrapyd.

#. Create a ``local_settings.py`` file at the root of your project, alongside the ``scrapy.cfg`` file
#. Add the following to your project's settings file:

   .. code-block:: python

      try:
          from local_settings import *
      except ImportError:
          pass

``scrapyd-deploy`` doesn't deploy anything outside of the project module, so the ``local_settings.py`` file won't be deployed.
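
For example, a hypothetical ``local_settings.py`` for development might hold overrides such as:

.. code-block:: python

   # local_settings.py (hypothetical example) -- applied only where this
   # file exists; it is never uploaded to Scrapyd.
   HTTPCACHE_ENABLED = True
   CONCURRENT_REQUESTS = 2
   LOG_LEVEL = "DEBUG"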

Troubleshooting
~~~~~~~~~~~~~~~

- Problem: A settings file for local development is being included in the egg.

  Solution: See `Local settings`_. Or, exclude the module from the egg. If using scrapyd-client's default ``setup.py`` file, change the ``find_packages()`` call:

  .. code-block:: python
     :emphasize-lines: 4

     setup(
         name = 'project',
         version = '1.0',
         packages = find_packages(),
         entry_points = {'scrapy': ['settings = projectname.settings']},
     )

  to:

  .. code-block:: python
     :emphasize-lines: 4

     setup(
         name = 'project',
         version = '1.0',
         packages = find_packages(exclude=["myproject.devsettings"]),
         entry_points = {'scrapy': ['settings = projectname.settings']},
     )
- Problem: Code using ``__file__`` breaks when run in Scrapyd.

  Solution: Use `pkgutil.get_data <https://docs.python.org/library/pkgutil.html#pkgutil.get_data>`__ instead. For example, change:

  .. code-block:: python

     path = os.path.dirname(os.path.realpath(__file__))  # BAD
     open(os.path.join(path, "tools", "json", "test.json"), "rb").read()

  to:

  .. code-block:: python

     import pkgutil

     pkgutil.get_data("projectname", "tools/json/test.json")
- Be careful when writing to disk in your project, as Scrapyd will most likely be running under a different user, which may not have write access to certain directories. If you can, avoid writing to disk, and always use `tempfile <https://docs.python.org/library/tempfile.html>`__ for temporary files (see the sketch below).
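
  A sketch of the ``tempfile`` approach (illustrative; the filename and JSON payload are made up):

  .. code-block:: python

     import tempfile

     # Write to a system-provided temporary location instead of a hard-coded
     # path, so the code works under whichever user runs Scrapyd.
     with tempfile.NamedTemporaryFile(mode="w", suffix=".json", delete=False) as f:
         f.write('{"status": "ok"}')
         print(f.name)  # e.g. /tmp/tmpab12cd.json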

scrapyd-client
--------------

For a reference on each subcommand, invoke ``scrapyd-client <subcommand> --help``.

Where filtering with wildcards is possible, it is facilitated with `fnmatch <https://docs.python.org/library/fnmatch.html>`__.
The ``--project`` option can be omitted if one is found in a ``scrapy.cfg``.
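
For example, ``fnmatch`` patterns behave like shell wildcards (a quick illustration):

.. code-block:: python

   import fnmatch

   # "*" matches any run of characters, "?" matches a single character.
   print(fnmatch.fnmatch("quotes_spider", "quotes*"))  # True
   print(fnmatch.filter(["s1", "s2", "other"], "s?"))  # ['s1', 's2']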

deploy
~~~~~~

To list all available projects on one target, use the ``-L`` option::

   scrapyd-deploy -L example

While your target needs to be defined with its URL in ``scrapy.cfg``,
you can use `netrc <https://www.gnu.org/software/inetutils/manual/html_node/The-_002enetrc-file.html>`__ for username and password, like so::

   machine scrapyd.example.com
       login scrapy
       password secret
setup.py (1 change: 1 addition & 0 deletions)

     install_requires=[
         "uberegg>=0.1.1",
         "requests",
+        "setuptools",
         "scrapy>=0.17",
         "urllib3",
         "w3lib",