More details in the Jobs table #39

Closed
Blender3D opened this issue Mar 7, 2014 · 3 comments

@Blender3D

The Jobs table in the web interface is really bare. The Scrapy stats collector contains a lot of valuable data, which should be included in this table.

I see a few ways of accessing this data:

  1. Parsing logs. This seems like unnecessary work and will only give access to crawl statistics after a crawl has finished.
  2. Subclassing CrawlerProcess and overriding the methods that start/stop the reactor, removing the need to launch scrapyd.runner as a separate process. This gives us direct access to crawler.stats.get_stats() and has the added benefit of using a single reactor to run multiple crawls (see the sketch after this list).
  3. Using scrapy.contrib.webservice.stats.StatsResource. This doesn't rely on an unstable API (unlike 2), but will force us to parse log files to determine the webservice port.
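
To make option 2 concrete, here's a minimal sketch of reading the stats collector from an in-process crawl, assuming a modern Scrapy API (newer than what existed when this issue was filed); MySpider is a hypothetical spider:

```python
from scrapy import Spider
from scrapy.crawler import CrawlerProcess

class MySpider(Spider):
    name = "my_spider"
    start_urls = ["http://example.com"]

    def parse(self, response):
        yield {"title": response.css("title::text").get()}

# Running the crawl in-process (instead of via scrapyd.runner) keeps a
# handle on the Crawler object, so its stats collector can be read directly.
process = CrawlerProcess(settings={"LOG_LEVEL": "INFO"})
crawler = process.create_crawler(MySpider)
process.crawl(crawler)
process.start()  # blocks until the crawl finishes

# The kind of data the Jobs table could display:
print(crawler.stats.get_stats())
```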

Aside from a prettier UI, Scrapyd needs some other useful upgrades: scheduling periodic crawls, queues, retrying, and so on. They don't seem difficult to implement, but I don't have the time to do this myself, and I don't know whether the community even has interest in Scrapyd.

Thoughts?

Digenis commented Mar 8, 2014

I wouldn't favour parsing logs, at least not by default: crawls can produce big log files, and parsing them would delay the rendering of the Jobs table. The built-in webservice already provides enough functionality, but I understand the need for more features; I myself once ended up patching it locally because I couldn't find any documentation on custom resource classes. More extensive documentation on resource classes, plus a contrib/ package, would unleash the creativity of even more users and their useful ideas without cluttering the builtins. I think the community has interest in Scrapyd; it just takes much more effort to get involved without detailed documentation to start from (compared to Scrapy's docs).
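
For illustration, a hedged sketch of such a custom resource class: WsResource is Scrapyd's real JSON resource base class, but the JobStats endpoint and the launcher attributes it reads are assumptions about internals that may have changed:

```python
from scrapyd.webservice import WsResource

class JobStats(WsResource):
    """Hypothetical endpoint exposing extra detail about running jobs."""

    def render_GET(self, txrequest):
        # self.root is the Scrapyd application root; the launcher keeps
        # a mapping of the currently running crawl processes (assumed).
        running = [
            {"project": p.project, "spider": p.spider, "job": p.job}
            for p in self.root.launcher.processes.values()
        ]
        return {"node_name": self.root.nodename,
                "status": "ok",
                "running": running}
```

It would then be wired in through the [services] section of scrapyd.conf (e.g. jobstats.json = myproject.webservice.JobStats), the same mechanism the built-in endpoints use; documenting that pattern is exactly what a contrib/ package could cover.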

jayzeng commented Jul 4, 2014

I do agree Scrapyd needs more powerful features for different needs, but adding more features adds unnecessary overhead for those who need the absolute minimum. We should think about adding plugins and exposing/managing them through a settings file and/or the web UI.
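
A minimal sketch of that plugin idea, assuming a hypothetical [plugins] section in scrapyd.conf (Scrapy's load_object and Scrapyd's Config are real; the section name and plugin interface are invented for illustration):

```python
from scrapy.utils.misc import load_object
from scrapyd.config import Config

def load_plugins(config=None):
    """Instantiate every class listed in a (hypothetical) [plugins]
    section of scrapyd.conf, so the default install stays minimal."""
    config = config or Config()
    plugins = []
    # Config.items() yields (name, dotted_path) pairs for a section.
    for name, path in config.items("plugins", default=[]):
        cls = load_object(path)  # e.g. "myproject.plugins.JobStatsUI"
        plugins.append(cls())
    return plugins
```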

@jpmckinney

Closing as this feature request has not attracted additional interest since 2014.
