Update README #9
Merged
1686e1f
Update README
ejulio 752296e
Update README
ejulio 9471679
Update README
ejulio e7420c1
Update README
ejulio 7316e6d
Update README
ejulio 053cd60
Remove maintaining guide from README
ejulio 953a7c8
Fix README example
ejulio 53151c2
Update README.rst
ejulio dc4f08b
Update README
ejulio 34c4f40
Merge
ejulio 0bf3952
Update README
ejulio 78aec7c
Update README
ejulio 7dbdfdf
Update README
ejulio cab95fc
Update README.rst
ejulio 94530e4
Update README.rst
ejulio 53ab961
Update README.rst
ejulio f34f25b
Update README.rst
ejulio
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,77 @@ | ||
=========== | ||
itemloaders | ||
=========== | ||
|
||
.. image:: https://img.shields.io/pypi/v/itemloaders.svg | ||
:target: https://pypi.python.org/pypi/itemloaders | ||
:alt: PyPI Version | ||
|
||
.. image:: https://img.shields.io/pypi/pyversions/itemloaders.svg | ||
:target: https://pypi.python.org/pypi/itemloaders | ||
:alt: Supported Python Versions | ||
|
||
.. image:: https://travis-ci.com/scrapy/itemloaders.svg?branch=master | ||
:target: https://travis-ci.com/scrapy/itemloaders | ||
:alt: Build Status | ||
|
||
.. image:: https://codecov.io/github/scrapy/itemloaders/coverage.svg?branch=master | ||
:target: https://codecov.io/gh/scrapy/itemloaders | ||
:alt: Coverage report | ||
|
||
.. image:: https://readthedocs.org/projects/itemloaders/badge/?version=latest | ||
:target: https://itemloaders.readthedocs.io/en/latest/?badge=latest | ||
:alt: Documentation Status | ||
|
||
|
||
``itemloaders`` is a library that helps you collect data into models. | ||
|
||
It's especially useful when you need to standardize the data from many sources. | ||
For example, it allows you to have all your casting and parsing rules in a | ||
single place. | ||
|
||
It also comes in handy for extracting data from web pages, as it supports | ||
data extraction using CSS and XPath selectors. | ||
|
||
|
||
Here is an example to get you started:: | ||
|
||
from itemloaders import ItemLoader | ||
from parsel import Selector | ||
|
||
html_data = ''' | ||
<!DOCTYPE html> | ||
<html> | ||
<head> | ||
<title>Some random product page</title> | ||
</head> | ||
<body> | ||
<div class="product_name">Some random product page</div> | ||
<p id="price">$ 100.12</p> | ||
</body> | ||
</html> | ||
''' | ||
loader = ItemLoader(selector=Selector(html_data)) | ||
loader.add_xpath('name', '//div[@class="product_name"]/text()') | ||
loader.add_xpath('name', '//div[@class="product_title"]/text()')  # no match on this page, so nothing is added | ||
loader.add_css('price', '#price::text') | ||
loader.add_value('last_updated', 'today') # you can also use literal values | ||
item = loader.load_item() | ||
item | ||
# {'name': ['Some random product page'], 'price': ['$ 100.12'], 'last_updated': ['today']} | ||
|
||
For more information, check out the `documentation <https://itemloaders.readthedocs.io/en/latest/>`_. | ||
|
||
============ | ||
Contributing | ||
============ | ||
|
||
All contributions are welcome! | ||
|
||
* If you want to review some code, check the open | ||
`Pull Requests here <https://github.com/scrapy/itemloaders/pulls>`_ | ||
|
||
* If you want to submit a code change | ||
    * File an `issue here <https://github.com/scrapy/itemloaders/issues>`_, | ||
      if there isn't one yet | ||
    * Fork this repository | ||
    * Create a branch to work on your changes | ||
    * Push your local branch and submit a Pull Request | ||
|
This description is a bit confusing to me. Are we suggesting that itemloaders is a general thing, not related to web scraping, which may also come in handy for web scraping? Is it really going to be used this way?
I guess it is not restricted to web scraping. If I want to load a `dict` from an XML source, it could be used, right? Similarly, it could read from a JSON source or something else.
So, we can have the description related to web scraping, or leave it open as a library to standardize the process of extracting/loading data from a source.
What about explicitly mentioning HTML and XML as the sources of data in the first paragraph, and in the third paragraph replace “comes in handy” with “is specially useful” and move the CSS and XPath part to the first paragraph?
Updated @Gallaecio