Scrapyd w/ docker is not persisting *.db file #251

Closed
ivandir opened this issue Oct 14, 2017 · 5 comments
Labels
type: question (a user support question)

Comments

ivandir commented Oct 14, 2017

Hi,

I thought the *.db file associated with my project was supposed to persist in my local Docker volume, but scrapyd seems to overwrite the project's sqlite3 db file. I have jobs that haven't been processed yet, and when I restart the scrapyd instance, the jobs disappear (i.e. no persistence) because a new db file is created.

Is this the intended behavior for Docker deployments?

Digenis added the status: insufficient info (a maintainer has asked for more information from the reporter) label on Oct 16, 2017
Digenis (Member) commented Oct 16, 2017

@ivandir,
scrapyd is expected to reuse the sqlite3 files when they already exist.

Did you configure a directory for the db files?
If you didn't, scrapyd uses an in-memory database.
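
For reference, a minimal sketch of the relevant setting in scrapyd.conf (the path here is just an illustration; point it at a directory backed by your mounted volume):

[scrapyd]
dbs_dir = /var/lib/scrapyd/dbs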

By jobs that haven't been processed yet, you mean jobs waiting in the pending state, right?
Currently there's no feature to persist running or finished jobs.

By "restart the scrapyd instance" do you mean the whole container?

santteegt commented

Indeed, spider jobs don't persist after restarting the docker container. Here's my config file:

[scrapyd]
eggs_dir          = /var/lib/scrapyd/eggs
logs_dir          = /var/lib/scrapyd/logs
items_dir         = /var/lib/scrapyd/items
dbs_dir           = /var/lib/scrapyd/dbs
jobs_to_keep      = 5
max_proc          = 0
max_proc_per_cpu  = 4
finished_to_keep  = 100
poll_interval     = 5
bind_address      = 0.0.0.0
http_port         = 6800
debug             = off
runner            = scrapyd.runner
application       = scrapyd.app.application
launcher          = scrapyd.launcher.Launcher

[services]
schedule.json     = scrapyd.webservice.Schedule
cancel.json       = scrapyd.webservice.Cancel
addversion.json   = scrapyd.webservice.AddVersion
listprojects.json = scrapyd.webservice.ListProjects
listversions.json = scrapyd.webservice.ListVersions
listspiders.json  = scrapyd.webservice.ListSpiders
delproject.json   = scrapyd.webservice.DeleteProject
delversion.json   = scrapyd.webservice.DeleteVersion
listjobs.json     = scrapyd.webservice.ListJobs
daemonstatus.json = scrapyd.webservice.DaemonStatus

And here's an example of how I'm deploying it using docker-compose:

scrapyd:
    build: ./scrapyd
    networks:
      default:
        ipv4_address: 10.255.4.15
    ports:
      - "6800:6800"
    volumes:
      - ./scrapyd_data:/var/lib/scrapyd
    restart: always

A file called <project_name>.db is created in the right directory; however, when the container is restarted, all spider jobs disappear from the list.
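
One way to narrow this down (a sketch using the schedule.json and listjobs.json endpoints registered in the config above; myproject and myspider are placeholders):

# schedule a job, then list pending/running/finished jobs
curl http://localhost:6800/schedule.json -d project=myproject -d spider=myspider
curl "http://localhost:6800/listjobs.json?project=myproject"

# restart the container and compare the output
docker-compose restart scrapyd
curl "http://localhost:6800/listjobs.json?project=myproject"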

ivandir (Author) commented Aug 31, 2020

Any updates on this ticket?

Digenis removed the status: insufficient info (a maintainer has asked for more information from the reporter) label on Apr 13, 2021
Digenis (Member) commented Apr 13, 2021

The "db files" store spider queues.

What exactly disappears?
The queue of spiders not yet run,
or the list of finished spiders?

If it's the first and your docker volume mount point is correctly configured, then it's a bug.
If it's the second, persistence of finished jobs is planned for scrapyd 1.3: #359
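
To see what a db file actually holds, you can inspect it directly (a sketch; it assumes the sqlite3 CLI is available and that the queue lives in scrapyd's default spider_queue table, and myproject.db is a placeholder):

# list tables, then dump the pending queue entries
sqlite3 scrapyd_data/dbs/myproject.db ".tables"
sqlite3 scrapyd_data/dbs/myproject.db "SELECT * FROM spider_queue;"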

jpmckinney added the type: question (a user support question) label on Sep 23, 2021
jpmckinney (Contributor) commented

This sounds like #359, not a Docker-specific issue, so closing.
