Update README #9

Merged
merged 17 commits into from
Jun 26, 2020
3 changes: 0 additions & 3 deletions README.md

This file was deleted.

77 changes: 77 additions & 0 deletions README.rst
@@ -0,0 +1,77 @@
===========
itemloaders
===========

.. image:: https://img.shields.io/pypi/v/itemloaders.svg
   :target: https://pypi.python.org/pypi/itemloaders
   :alt: PyPI Version

.. image:: https://img.shields.io/pypi/pyversions/itemloaders.svg
   :target: https://pypi.python.org/pypi/itemloaders
   :alt: Supported Python Versions

.. image:: https://travis-ci.com/scrapy/itemloaders.svg?branch=master
   :target: https://travis-ci.com/scrapy/itemloaders
   :alt: Build Status

.. image:: https://codecov.io/github/scrapy/itemloaders/coverage.svg?branch=master
   :target: https://codecov.io/gh/scrapy/itemloaders
   :alt: Coverage report

.. image:: https://readthedocs.org/projects/itemloaders/badge/?version=latest
   :target: https://itemloaders.readthedocs.io/en/latest/?badge=latest
   :alt: Documentation Status


``itemloaders`` is a library that helps you collect data into models.

It's especially useful when you need to standardize the data from many sources.
For example, it allows you to have all your casting and parsing rules in a
single place.

Also, it comes in handy to extract data from web pages, as it supports
data extraction using CSS and XPath Selectors.
Member

@kmike kmike May 25, 2020

This description is a bit confusing to me. Are we suggesting that itemloaders is a general-purpose library, not tied to web scraping, which may also come in handy for web scraping? Is it really going to be used this way?

Collaborator Author

I guess it is not restricted to web scraping.
If I want to load a dict from an XML source, it could be used, right?
The same goes for reading from a JSON source or something else.

So, we can either tie the description to web scraping or leave it open, as a library to standardize the process of extracting/loading data from a source.

Member

What about explicitly mentioning HTML and XML as the sources of data in the first paragraph, and in the third paragraph replace “comes in handy” with “is specially useful” and move the CSS and XPath part to the first paragraph?

Collaborator Author

Updated @Gallaecio


Here is an example to get you started::

    from itemloaders import ItemLoader
    from parsel import Selector

    html_data = '''
    <!DOCTYPE html>
    <html>
        <head>
            <title>Some random product page</title>
        </head>
        <body>
            <div class="product_name">Some random product page</div>
            <p id="price">$ 100.12</p>
        </body>
    </html>
    '''
    loader = ItemLoader(selector=Selector(html_data))
    loader.add_xpath('name', '//div[@class="product_name"]/text()')
    loader.add_xpath('name', '//div[@class="product_title"]/text()')
    loader.add_css('price', '#price::text')
    loader.add_value('last_updated', 'today')  # you can also use literal values
    item = loader.load_item()
    item
    # {'name': ['Some random product page'], 'price': ['$ 100.12'], 'last_updated': ['today']}

For more information, check out the `documentation <https://itemloaders.readthedocs.io/en/latest/>`_.

============
Contributing
============

All contributions are welcome!

* If you want to review some code, check the open
  `Pull Requests here <https://github.com/scrapy/itemloaders/pulls>`_

* If you want to submit a code change:

  * File an `issue here <https://github.com/scrapy/itemloaders/issues>`_,
    if there isn't one yet
  * Fork this repository
  * Create a branch to work on your changes
  * Push your local branch and submit a Pull Request
2 changes: 1 addition & 1 deletion setup.py
@@ -1,6 +1,6 @@
 from setuptools import setup, find_packages

-with open('README.md') as f:
+with open('README.rst') as f:
     long_description = f.read()

 setup(