===========
itemloaders
===========

.. image:: https://img.shields.io/pypi/v/itemloaders.svg
   :target: https://pypi.python.org/pypi/itemloaders
   :alt: PyPI Version

.. image:: https://img.shields.io/pypi/pyversions/itemloaders.svg
   :target: https://pypi.python.org/pypi/itemloaders
   :alt: Supported Python Versions

.. image:: https://travis-ci.com/scrapy/itemloaders.svg?branch=master
   :target: https://travis-ci.com/scrapy/itemloaders
   :alt: Build Status

.. image:: https://codecov.io/github/scrapy/itemloaders/coverage.svg?branch=master
   :target: https://codecov.io/gh/scrapy/itemloaders
   :alt: Coverage report

.. image:: https://readthedocs.org/projects/itemloaders/badge/?version=latest
   :target: https://itemloaders.readthedocs.io/en/latest/?badge=latest
   :alt: Documentation Status


``itemloaders`` is a library that helps you collect data from HTML and XML sources.

It is particularly handy for extracting data from web pages, as it supports
data extraction using CSS and XPath selectors.

It's especially useful when you need to standardize data from many sources.
For example, it allows you to keep all your casting and parsing rules in a
single place.

Here is an example to get you started::

    from itemloaders import ItemLoader
    from parsel import Selector

    html_data = '''
    <!DOCTYPE html>
    <html>
        <head>
            <title>Some random product page</title>
        </head>
        <body>
            <div class="product_name">Some random product page</div>
            <p id="price">$ 100.12</p>
        </body>
    </html>
    '''
    loader = ItemLoader(selector=Selector(text=html_data))
    loader.add_xpath('name', '//div[@class="product_name"]/text()')
    loader.add_xpath('name', '//div[@class="product_title"]/text()')  # no match, so nothing is added
    loader.add_css('price', '#price::text')
    loader.add_value('last_updated', 'today')  # you can also use literal values
    item = loader.load_item()
    print(item)
    # {'name': ['Some random product page'], 'price': ['$ 100.12'], 'last_updated': ['today']}


For more information, check out the `documentation <https://itemloaders.readthedocs.io/en/latest/>`_.

Contributing
============

All contributions are welcome!

* If you want to review some code, check the open
  `pull requests <https://github.com/scrapy/itemloaders/pulls>`_.

* If you want to submit a code change:

  * File an `issue <https://github.com/scrapy/itemloaders/issues>`_ if there isn't one yet.
  * Fork this repository.
  * Create a branch to work on your changes.
  * Push your local branch and submit a pull request.