docs: How-To Guide - Adding required fields coverage validation #247

Merged: 8 commits, May 6, 2020
2 changes: 1 addition & 1 deletion docs/source/conf.py
@@ -28,7 +28,7 @@
# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
-extensions = ["sphinx.ext.autodoc"]
+extensions = ["sphinx.ext.autodoc", "sphinx.ext.autosectionlabel"]

# Add any paths that contain templates here, relative to this directory.
templates_path = ["_templates"]
4 changes: 4 additions & 0 deletions docs/source/getting-started.rst
@@ -1,3 +1,5 @@
.. _getting-started:

Getting started
===============

@@ -328,6 +330,8 @@ And then modify the spider code to use the newly defined item:
Now we need to create our schematics model in `validators.py` file that will contain
all the validation rules:

.. _quote-item-validation-schema:

.. code-block:: python

# tutorial/validators.py
7 changes: 7 additions & 0 deletions docs/source/howto/index.rst
@@ -0,0 +1,7 @@
“How-to” guides
===============

.. toctree::
   :maxdepth: 1

   required-fields-coverage-validation
89 changes: 89 additions & 0 deletions docs/source/howto/required-fields-coverage-validation.rst
@@ -0,0 +1,89 @@
How do I add required fields coverage validation?
=================================================

When you enable :ref:`item validation <item-validation>` in your project, you can
use *ValidationMonitorMixin* in your monitor to perform extra
checks on your results.
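
As a rough sketch of what enabling item validation involves, the project settings
might look like the fragment below. The module path ``tutorial.validators.QuoteItem``
and the priority numbers are assumptions based on the tutorial project layout, so
adapt them to your own project.

```python
# tutorial/settings.py
# Sketch only: the "tutorial.*" module path and the priority values are
# assumptions taken from the tutorial project layout.
SPIDERMON_ENABLED = True

EXTENSIONS = {
    "spidermon.contrib.scrapy.extensions.Spidermon": 500,
}

# Validates every scraped item against the configured models.
ITEM_PIPELINES = {
    "spidermon.contrib.scrapy.pipelines.ItemValidationPipeline": 800,
}

SPIDERMON_VALIDATION_MODELS = (
    "tutorial.validators.QuoteItem",
)
```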

Suppose we have the :ref:`validation schema <quote-item-validation-schema>` from the
:ref:`getting started <getting-started>` section of our documentation, where the **author**
field is required. We want to add a new monitor that ensures no more than 20% of the
returned items are missing the **author** field.

.. note:: The methods presented below only check coverage of fields
   that are defined as **required** in your validation schema.

*ValidationMonitorMixin* provides the *check_missing_required_fields_percent* method,
which takes a list of field names and the maximum fraction of items allowed to have
those fields missing (``0.2`` means 20%). Using it, we can create a monitor that enforces our validation rule:

.. code-block:: python

    from spidermon import Monitor
    from spidermon.contrib.monitors.mixins import ValidationMonitorMixin

    class CoverageValidationMonitor(Monitor, ValidationMonitorMixin):

        def test_required_fields_with_minimum_coverage(self):
            allowed_missing_percentage = 0.2
            self.check_missing_required_fields_percent(
                field_names=["author"],
                allowed_percent=allowed_missing_percentage
            )
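
To make the threshold concrete, the arithmetic behind a "no more than 20% missing"
rule can be sketched in plain Python. This is only an illustration: the helper
``missing_fraction`` and the sample items are hypothetical, and the code does not
reproduce how the mixin is implemented internally.

```python
def missing_fraction(items, field_name):
    """Return the fraction of items where field_name is absent or empty."""
    if not items:
        return 0.0
    missing = sum(1 for item in items if not item.get(field_name))
    return missing / len(items)


# Hypothetical scraped items: one of the four has no author.
items = [
    {"quote": "a", "author": "Jane Austen"},
    {"quote": "b", "author": "Mark Twain"},
    {"quote": "c"},
    {"quote": "d", "author": "Albert Einstein"},
]

allowed_missing_percentage = 0.2
fraction = missing_fraction(items, "author")

# The rule holds only while the missing fraction stays at or below the limit.
check_passes = fraction <= allowed_missing_percentage
print(fraction, check_passes)
```

Here one item out of four lacks the field, so the missing fraction is above the
allowed ``0.2`` and a rule like the one in the monitor would be violated.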

We can also set an absolute number of items that are allowed to have the field
missing. For this, use the *check_missing_required_fields*
method. The following monitor fails if more than 10 of the returned items are
missing the **author** field.

.. code-block:: python

    class CoverageValidationMonitor(Monitor, ValidationMonitorMixin):

        def test_required_fields_with_minimum_coverage(self):
            allowed_missing_items = 10
            self.check_missing_required_fields(
                field_names=["author"],
                allowed_count=allowed_missing_items
            )
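
Choosing between an absolute count and a percentage depends on how much the crawl
size varies. The sketch below compares the two kinds of threshold on the same number
of missing items. The helper functions and the crawl sizes are hypothetical and are
not part of Spidermon.

```python
def passes_count_check(missing_count, allowed_count):
    """Absolute rule: at most allowed_count items may miss the field."""
    return missing_count <= allowed_count


def passes_percent_check(missing_count, total_items, allowed_percent):
    """Relative rule: at most allowed_percent (a fraction) may miss the field."""
    return missing_count / total_items <= allowed_percent


missing_authors = 15  # hypothetical number of items without an author

# The absolute rule ignores the crawl size.
count_result = passes_count_check(missing_authors, allowed_count=10)

# The relative rule gives different answers for small and large crawls.
small_crawl_result = passes_percent_check(missing_authors, 50, 0.2)
large_crawl_result = passes_percent_check(missing_authors, 1000, 0.2)

print(count_result, small_crawl_result, large_crawl_result)
```

With 15 missing items, the absolute rule is violated regardless of crawl size, while
the relative rule is violated for the 50-item crawl and satisfied for the 1000-item one.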

Multiple fields
---------------

What if we want to validate more than one field? There are two ways to do it, depending on whether
you want the same threshold for all fields or a different one for each field.

To use the same threshold, pass a list with all the field names to the desired
validation method:

.. code-block:: python

    class CoverageValidationMonitor(Monitor, ValidationMonitorMixin):

        def test_required_fields_with_minimum_coverage(self):
            allowed_missing_percentage = 0.2
            self.check_missing_required_fields_percent(
                field_names=["author", "author_url"],
                allowed_percent=allowed_missing_percentage
            )

However, if you want a different rule for each field, define a separate test
method for each field in your monitor:

.. code-block:: python

    class CoverageValidationMonitor(Monitor, ValidationMonitorMixin):

        def test_min_coverage_author_field(self):
            allowed_missing_percentage = 0.2
            self.check_missing_required_fields_percent(
                field_names=["author"],
                allowed_percent=allowed_missing_percentage
            )

        def test_min_coverage_author_url_field(self):
            allowed_missing_items = 10
            self.check_missing_required_fields(
                field_names=["author_url"],
                allowed_count=allowed_missing_items
            )
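
The idea of per-field rules can also be pictured as a mapping from field name to its
own limit. The following plain-Python sketch is only an illustration under that
assumption: ``fields_over_threshold``, the ``thresholds`` mapping and the sample items
are hypothetical and not part of Spidermon.

```python
def fields_over_threshold(items, thresholds):
    """Return the fields whose missing fraction exceeds their own limit."""
    failing = []
    for field_name, allowed_fraction in thresholds.items():
        missing = sum(1 for item in items if not item.get(field_name))
        if missing / len(items) > allowed_fraction:
            failing.append(field_name)
    return failing


# Hypothetical items: one lacks "author", three lack "author_url".
items = [
    {"author": "Jane Austen", "author_url": "/author/Jane-Austen"},
    {"author": "Mark Twain"},
    {"author_url": "/author/Albert-Einstein"},
    {"author": "Steve Martin"},
    {"author": "Andre Gide"},
]

# Field name -> maximum fraction of items allowed to miss that field.
thresholds = {"author": 0.2, "author_url": 0.5}

failing_fields = fields_over_threshold(items, thresholds)
print(failing_fields)
```

Each field is judged against its own limit, which mirrors having one test method per
field in the monitor above.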
1 change: 1 addition & 0 deletions docs/source/index.rst
@@ -36,5 +36,6 @@ Contents
settings
configuring-slack-for-spidermon
configuring-telegram-for-spidermon
howto/index
actions
changelog