
[Bug]: sqlite3.OperationalError: attempt to write a readonly database #1441

Closed
peachkeel opened this issue Nov 30, 2023 · 11 comments
Labels
bug Something isn't working

Comments

@peachkeel

peachkeel commented Nov 30, 2023

What happened?

When adding embeddings to Chroma via Ragna, a RAG orchestration framework, the error in the title (`sqlite3.OperationalError: attempt to write a readonly database`) was raised. The error seems to originate from within the `chromadb` package.

This bug was originally reported to Ragna as issue Quansight/ragna#190 and @pmeier, the maintainer there, suggested the issue be upstreamed here.

Versions

Chroma v0.4.15, Python 3.10.12, Ubuntu 22.04.3 LTS

Relevant log output

ERROR:huey:Unhandled exception in task fef3ca77-d786-4459-a3da-a522175cf737.
Traceback (most recent call last):
  File "/home/ubuntu/.local/lib/python3.10/site-packages/huey/api.py", line 382, in _execute
    task_value = task.execute()
  File "/home/ubuntu/.local/lib/python3.10/site-packages/huey/api.py", line 807, in execute
    return func(*args, **kwargs)
  File "/home/ubuntu/.local/lib/python3.10/site-packages/ragna/core/_queue.py", line 43, in execute
    return fn(self, *args, **kwargs)
  File "/home/ubuntu/.local/lib/python3.10/site-packages/ragna/source_storages/_chroma.py", line 67, in store
    collection.add(
  File "/home/ubuntu/.local/lib/python3.10/site-packages/chromadb/api/models/Collection.py", line 100, in add
    self._client._add(ids, self.id, embeddings, metadatas, documents)
  File "/home/ubuntu/.local/lib/python3.10/site-packages/chromadb/telemetry/opentelemetry/__init__.py", line 127, in wrapper
    return f(*args, **kwargs)
  File "/home/ubuntu/.local/lib/python3.10/site-packages/chromadb/api/segment.py", line 328, in _add
    self._validate_embedding_record(coll, r)
  File "/home/ubuntu/.local/lib/python3.10/site-packages/chromadb/telemetry/opentelemetry/__init__.py", line 127, in wrapper
    return f(*args, **kwargs)
  File "/home/ubuntu/.local/lib/python3.10/site-packages/chromadb/api/segment.py", line 716, in _validate_embedding_record
    self._validate_dimension(collection, len(record["embedding"]), update=True)
  File "/home/ubuntu/.local/lib/python3.10/site-packages/chromadb/telemetry/opentelemetry/__init__.py", line 127, in wrapper
    return f(*args, **kwargs)
  File "/home/ubuntu/.local/lib/python3.10/site-packages/chromadb/api/segment.py", line 728, in _validate_dimension
    self._sysdb.update_collection(id=id, dimension=dim)
  File "/home/ubuntu/.local/lib/python3.10/site-packages/chromadb/telemetry/opentelemetry/__init__.py", line 127, in wrapper
    return f(*args, **kwargs)
  File "/home/ubuntu/.local/lib/python3.10/site-packages/chromadb/db/mixins/sysdb.py", line 600, in update_collection
    result = cur.execute(sql, params)
sqlite3.OperationalError: attempt to write a readonly database
INFO:huey.consumer:All workers have stopped.
INFO:huey.consumer:Consumer exiting.
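The last frame is a plain `sqlite3` error raised from `update_collection`. As a minimal, stdlib-only sketch (not Chroma code, and not a reproduction of the multi-process scenario discussed in this thread), the snippet below shows SQLite producing the same message whenever a write goes through a handle it treats as read-only. The table name, column names and file name are made up for illustration.

```python
import os
import sqlite3
import tempfile


def reproduce_readonly_error() -> str:
    """Return the message SQLite gives for a write on a read-only handle."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.sqlite3")  # hypothetical file name

        # Create the database with an ordinary read-write connection.
        rw = sqlite3.connect(path)
        rw.execute("CREATE TABLE collections (id TEXT PRIMARY KEY, dimension INTEGER)")
        rw.execute("INSERT INTO collections VALUES ('c1', NULL)")
        rw.commit()
        rw.close()

        # Reopen the same file read-only via a URI and try an UPDATE,
        # loosely mirroring the update_collection() frame in the traceback.
        ro = sqlite3.connect(f"file:{path}?mode=ro", uri=True)
        try:
            ro.execute("UPDATE collections SET dimension = 384 WHERE id = 'c1'")
        except sqlite3.OperationalError as exc:
            return str(exc)
        finally:
            ro.close()
    return "no error"


print(reproduce_readonly_error())  # attempt to write a readonly database
```

SQLite uses this one message for every refused write in the read-only family, so the text alone does not say why the database was considered read-only.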
@pmeier

pmeier commented Nov 30, 2023

To make this issue more self-contained:

@HammadB
Collaborator

HammadB commented Nov 30, 2023

Are you running multiple processes / is some other process accessing chroma?

@peachkeel
Author

Good question, @HammadB. This error does have the feel of a race condition, and it is entirely possible that multiple "workers" (Ragna's term) were running at the same time. Maybe @pmeier can chime in here and give a little more architectural detail about Ragna, because I'm not familiar with all of the internals.

@pmeier

pmeier commented Nov 30, 2023

> Are you running multiple processes / is some other process accessing chroma?

That could be the case. Depending on the setup, we might have ended up with a single Chroma client that is used by multiple threads. @peachkeel, do you remember whether you were running with multiple worker threads? If you still have access to the logs, scroll to the very top. There should be a log message stating how many threads were started.

@peachkeel
Author

I'm afraid those logs are long gone, @pmeier 😕

We haven't seen the error recently, but we're usually running in UI mode (single worker).

@tazarov
Contributor

tazarov commented Dec 1, 2023

@pmeier, @peachkeel,

Chroma is thread-safe: as long as you create a single client and pass a reference to it to your threads, you shouldn't run into this problem.

Problems start when you have multiple clients, e.g., you instantiate a new client in each thread, or you have multiple processes that each create a separate Chroma client.
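A minimal sketch of the "one client, many threads" pattern: build the client once and hand the same reference to every thread. To keep it runnable without Chroma, `SharedStore` is a hypothetical stand-in (an SQLite connection guarded by a lock); with Chroma you would construct a single `chromadb.PersistentClient(path=...)` in its place. This illustrates the usage pattern only, not how Chroma implements thread safety internally.

```python
import sqlite3
import threading
from concurrent.futures import ThreadPoolExecutor
from functools import partial


class SharedStore:
    """Hypothetical stand-in for a thread-safe client such as a Chroma client."""

    instances = 0  # counts how many clients were ever created

    def __init__(self, path: str = ":memory:") -> None:
        SharedStore.instances += 1
        # check_same_thread=False lets other threads use this connection;
        # the lock below serializes access so that sharing is safe.
        self._conn = sqlite3.connect(path, check_same_thread=False)
        self._lock = threading.Lock()
        with self._lock:
            self._conn.execute("CREATE TABLE IF NOT EXISTS docs (id TEXT PRIMARY KEY)")
            self._conn.commit()

    def add(self, doc_id: str) -> None:
        with self._lock:
            self._conn.execute("INSERT INTO docs VALUES (?)", (doc_id,))
            self._conn.commit()

    def count(self) -> int:
        with self._lock:
            return self._conn.execute("SELECT COUNT(*) FROM docs").fetchone()[0]


def worker(store: SharedStore, worker_id: int) -> None:
    # Every thread receives the same store; none of them creates its own.
    for i in range(25):
        store.add(f"w{worker_id}-doc{i}")


store = SharedStore()  # created exactly once, up front
with ThreadPoolExecutor(max_workers=4) as pool:
    # list() forces iteration so an exception in any worker surfaces here.
    list(pool.map(partial(worker, store), range(4)))

print(SharedStore.instances, store.count())  # 1 100
```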

@pmeier

pmeier commented Dec 1, 2023

In that case, I think Ragna is not at fault. Unless @peachkeel actually spawned multiple worker processes, i.e., by invoking `ragna worker` multiple times, we should be fine. Running `ragna worker -n 4` creates one Chroma client that is shared by 4 threads.

@peachkeel
Author

> In that case, I think Ragna is not at fault. Unless @peachkeel actually spawned multiple worker processes, i.e. invoking ragna worker multiple times, we should be fine. Running ragna worker -n 4 will create one Chroma client that 4 threads will use.

Well, @pmeier & @tazarov, if that's the case, I think it is likely that I'm at fault:

Quansight/ragna#176 (comment)

FYI, `ragna worker --num-threads 4 && ragna api && ragna ui` blocks, because `&&` waits for each process to finish. I tried using single ampersands (i.e., `ragna worker --num-threads 4 & ragna api & ragna ui`) to run things in the background, but the different processes seemed to step on each other (e.g., a total of 5 threads were started on the queue across two different processes, there was contention for ports, etc.) and I got scared.

I'm pretty sure the `sqlite3.OperationalError` happened while I was experimenting with the above to speed up the system.

Should Ragna create a lock file or pid file to prevent multiple processes from running with independent worker threads?

@pmeier

pmeier commented Dec 1, 2023

> Should Ragna create a lock file or pid file to prevent multiple processes from running with independent worker threads?

No need, as we have eliminated the task queue / worker: Quansight/ragna#205. So if this did happen because we had multiple clients in different processes, it will not happen again.

@peachkeel
Author

I'm going to close the issue in both repositories. Thanks everyone!

@HammadB
Copy link
Collaborator

HammadB commented Dec 1, 2023

For posterity's sake:

  1. Chroma is thread-safe: you can access a client from multiple threads.
  2. You can create many clients in the same process and access them all from multiple threads. (This corrects what @tazarov said above.)
  3. Chroma is not process-safe: you cannot access the same persistent path from multiple processes.
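Point 3 is the constraint behind this issue. Where a deployment cannot rule out a second process, one option along the lines of the lock-file idea raised earlier in the thread is to have each process take an exclusive advisory lock tied to the persistent path before creating its client. A minimal sketch, assuming POSIX `fcntl.flock` on a local filesystem; `PathLock`, the sibling `.lock` file and the paths are hypothetical and not part of Chroma or Ragna. `flock` is advisory, so it only protects against processes that also use the same guard.

```python
import fcntl
import os
import tempfile
from typing import Optional


class PathLock:
    """Hypothetical guard: at most one process may use `persist_dir` at a time."""

    def __init__(self, persist_dir: str) -> None:
        # Lock a sibling file rather than anything inside the data directory.
        self.lock_path = os.path.abspath(persist_dir) + ".lock"
        self._fd: Optional[int] = None

    def acquire(self) -> bool:
        fd = os.open(self.lock_path, os.O_RDWR | os.O_CREAT, 0o644)
        try:
            # flock() locks belong to the open file description, so they are
            # released automatically if the holder exits or crashes.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            os.close(fd)  # someone else already holds the lock
            return False
        # Record the holder's PID for humans; the flock enforces exclusion.
        os.ftruncate(fd, 0)
        os.write(fd, f"{os.getpid()}\n".encode())
        self._fd = fd
        return True

    def release(self) -> None:
        if self._fd is not None:
            fcntl.flock(self._fd, fcntl.LOCK_UN)
            os.close(self._fd)
            self._fd = None


def demo() -> tuple:
    with tempfile.TemporaryDirectory() as tmp:
        persist_dir = os.path.join(tmp, "chroma-data")  # hypothetical path
        # Two separate open() calls conflict under flock() even within one
        # process, so two PathLock objects can stand in for two processes.
        first, second = PathLock(persist_dir), PathLock(persist_dir)
        got_first = first.acquire()
        got_second = second.acquire()  # refused while `first` holds the lock
        first.release()
        got_after = second.acquire()   # allowed once `first` has released
        second.release()
        return got_first, got_second, got_after


print(demo())  # (True, False, True)
```

In a real worker, `acquire()` would run before the client is created (e.g. `chromadb.PersistentClient(path=persist_dir)`), and the process would exit with an error if it returned `False`.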
