os error 22 when indexing #370

Open
mmisiewicz opened this issue Dec 12, 2024 · 5 comments

@mmisiewicz

On macOS 15.2, running Postgres 16 from Homebrew and building Lantern from master, the following inscrutable error is thrown when using an external index:

[+] [Lantern External Index] New connection: 127.0.0.1:63501
[*] [Lantern External Index] Number of available CPU cores: 20
[*] [Lantern External Index] Index Params - pq: false, metric_kind: Cos, quantization: I8, dim: 768, m: 34, ef_construction: 256, ef: 256, num_subvectors: 0, num_centroids: 0, element_bits: 32
[*] [Lantern External Index] Creating index with parameters dimensions=768 m=34 ef=256 ef_construction=256, hardware_acceleration=serial
[*] [Lantern External Index] Estimated capcity is 38979200
[+] [Lantern External Index] Indexed 3897920 tuples [speed 1110 tuples/s]...
[+] [Lantern External Index] Indexing took 6948s, indexed 7764322 items
[*] [Lantern External Index] Start streaming index
[+] [Lantern External Index] Writing index to file took 12s722ms
[+] [Lantern External Index] Reading index file took 1s662ms
[X] [Lantern External Index] Indexing error: Invalid argument (os error 22)

The corresponding command on the postgres side:

 create index on  embeddings_denormalized using lantern_hnsw (cast(vec as vector(768)) dist_vec_cos_ops) with (m = 34, ef_construction = 256, ef = 256, dim = 768, quant_bits = 8, external=true) where abs(ulid_hash(page_id) % 100) < 20;
INFO:  done init usearch index
INFO:  connecting to external indexing server on 127.0.0.1:8998
INFO:  successfully connected to external indexing server
ERROR:  external index error: Invalid argument (os error 22)
Time: 6963287.590 ms (01:56:03.288)

The partial index (the where clause) has no impact; I only added it because the table is large and debugging this problem is a PITA.

Lantern was invoked using:

lantern-cli start-indexing-server --tmp-dir /opt/homebrew/var/

Thinking that this might be Gatekeeper-related, I tried changing tmp-dir, but it had no impact.

@var77
Collaborator

var77 commented Dec 12, 2024

Hi @mmisiewicz, thanks for reporting the issue. Can you try the following cases and tell us which ones succeed, so we can narrow down where the issue is coming from?

  1. Index a smaller dataset (e.g. 10k items) with the same parameters. You can create a table of 10k items from your original table with CREATE TABLE embeddings_test AS SELECT * FROM embeddings_denormalized LIMIT 10000; and then run the indexing on the embeddings_test table.
  2. If that fails too, try indexing the embeddings_test table without scalar quantization: create index on embeddings_test using lantern_hnsw (cast(vec as vector(768)) dist_vec_cos_ops) with (m = 34, ef_construction = 256, ef = 256, dim = 768, external=true);

Also, can you share the data type of the vec column? If it is REAL[], you can avoid the cast and use dist_cos_ops directly.
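The two test cases above can be scripted for several subset sizes. The following is only a minimal sketch that prints the SQL rather than running it; the helper name `subset_statements` and the `embeddings_test_<rows>` table names are made up for illustration, while the index options are copied from the commands in this thread.

```python
from typing import List, Optional


def subset_statements(source: str, subset: str, rows: int,
                      quant_bits: Optional[int] = 8) -> List[str]:
    """Build SQL for a LIMIT-ed copy of `source` and an external index on it.

    Pass quant_bits=None to drop scalar quantization (case 2 above).
    """
    options = ["m = 34", "ef_construction = 256", "ef = 256", "dim = 768"]
    if quant_bits is not None:
        options.append(f"quant_bits = {quant_bits}")
    options.append("external = true")

    create_table = (
        f"CREATE TABLE {subset} AS SELECT * FROM {source} LIMIT {rows};"
    )
    create_index = (
        f"CREATE INDEX ON {subset} USING lantern_hnsw "
        "(cast(vec as vector(768)) dist_vec_cos_ops) "
        f"WITH ({', '.join(options)});"
    )
    return [create_table, create_index]


if __name__ == "__main__":
    # Hypothetical subset sizes; adjust to whatever is practical to test.
    for rows in (10_000, 100_000, 1_000_000):
        name = f"embeddings_test_{rows}"
        for stmt in subset_statements("embeddings_denormalized", name, rows):
            print(stmt)
```

The printed statements can be pasted into psql one size at a time, first with the default `quant_bits=8` and then with `quant_bits=None`.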

@mmisiewicz
Author

Hey @var77 - vec is a halfvec from the latest pgvector.

Indexing subsets of 10,000 and 100,000 rows worked OK, and a subset of 1MM rows worked at least once. Beyond that, I get OS Error 22 when running the indexing server on my M1 Ultra.

I observed a similar-seeming error when running the index request against an indexing server on x86 Linux: OS Error 11, resource temporarily unavailable. Could that be a clue?

Is there a way to increase verbosity to find out where these errors are coming from?
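As a small aid while more verbose output is unavailable, the raw OS error numbers can at least be mapped to their symbolic names. This is only a sketch using Python's standard `errno` and `os` modules; the helper name `describe` is made up. Errno numbering is platform-specific (for example, 11 is EAGAIN on Linux but a different error on macOS), so it should be run on the machine that reported the error.

```python
import errno
import os


def describe(code: int) -> str:
    """Return the symbolic name and message for a raw OS error number."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"os error {code}: {name} ({os.strerror(code)})"


if __name__ == "__main__":
    # 22 was reported on macOS, 11 on x86 Linux in this thread.
    for code in (22, 11):
        print(describe(code))
```

This only names the error; it does not show which system call inside the indexing server returned it.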

@mmisiewicz
Author

The issue also occurs when running Lantern as a Postgres background worker.

(39242) [local]:5432 mike@mike=# create index on bigtable using lantern_hnsw (cast(vec as vector(768)) dist_vec_cos_ops) with (m = 34, ef_construction = 256, ef = 256, dim = 768, quant_bits = 8, external=true);
INFO:  done init usearch index
INFO:  connecting to external indexing server on 127.0.0.1:8998
INFO:  successfully connected to external indexing server
ERROR:  external index error: Invalid argument (os error 22)
Time: 33025415.155 ms (09:10:25.415)

@mmisiewicz
Author

mmisiewicz commented Dec 22, 2024

One more clue: I am able to reproduce the issue using the autotune tool on a table with the embedding stored in a float4[] column, which I think rules out both the index parameters and halfvec as the cause.

➜  lantern-cli autotune-index --uri 'postgresql://localhost/mike' --table "test_emb_tune" --column "vec" --metric-kind cos  --recall 99 --test-data-size 100000 --k 500
[+] [Lantern Index Autotune] Progress 5%
[+] [Lantern Index Autotune] Progress 15%
[+] [Lantern Index Autotune] Progress 25%
[+] [Lantern Index Autotune] Progress 35%
[+] [Lantern Index Autotune] Progress 45%
[+] [Lantern Index Autotune] Progress 55%
[+] [Lantern Index Autotune] Progress 65%
[+] [Lantern Index Autotune] Progress 70%
[*] [Lantern Index Autotune] ========== Results for job 36978 ==========
[*] [Lantern Index Autotune] result(recall=99.42%, latency=190.9ms, indexing_duration=5s) index_params(m=6, ef=64, efc=32)
[*] [Lantern Index Autotune] result(recall=99.42%, latency=197.3ms, indexing_duration=5s) index_params(m=8, ef=64, efc=40)
[*] [Lantern Index Autotune] result(recall=99.42%, latency=190.7ms, indexing_duration=6s) index_params(m=12, ef=64, efc=48)
[*] [Lantern Index Autotune] result(recall=99.42%, latency=190.8ms, indexing_duration=8s) index_params(m=16, ef=76, efc=60)
[*] [Lantern Index Autotune] result(recall=99.42%, latency=193.2ms, indexing_duration=18s) index_params(m=32, ef=96, efc=96)
[*] [Lantern Index Autotune] result(recall=99.42%, latency=188.1ms, indexing_duration=36s) index_params(m=48, ef=128, efc=128)
[+] [Lantern Index Autotune] Progress 100%
➜  lantern-cli autotune-index --uri 'postgresql://localhost/mike' --table "test_emb_tune" --column "vec" --metric-kind cos  --recall 99 --test-data-size 1000000 --k 10
[+] [Lantern Index Autotune] Progress 5%
[X] [Lantern Index Autotune] db error: ERROR: external index error: Invalid argument (os error 22)

Note that when the table contains 1MM rows, the first create index command tested by the autotuner (m = 6) failed.
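Since 100,000 rows succeed and 1MM rows fail here, the failing row count could be narrowed down with a binary search. The sketch below assumes the failure is deterministic and monotonic in row count, which this thread does not confirm (a 1MM-row run succeeded at least once earlier). The `fails` callback is a hypothetical stand-in for "create a subset of n rows, build the index, report whether it errored".

```python
from typing import Callable


def smallest_failing(good: int, bad: int,
                     fails: Callable[[int], bool],
                     tolerance: int = 1) -> int:
    """Binary search between a known-good and a known-bad row count.

    Returns a failing row count that is at most `tolerance` rows above
    a succeeding one.
    """
    while bad - good > tolerance:
        mid = (good + bad) // 2
        if fails(mid):
            bad = mid    # failure reproduced: threshold is at or below mid
        else:
            good = mid   # success: threshold is above mid
    return bad


if __name__ == "__main__":
    # Stand-in probe with an invented threshold, for demonstration only.
    # A real probe would run the CREATE TABLE ... LIMIT n and CREATE INDEX
    # statements and return True when the index build errors out.
    pretend_threshold = 350_000
    result = smallest_failing(100_000, 1_000_000,
                              lambda n: n >= pretend_threshold,
                              tolerance=10_000)
    print(f"failure first seen at or below {result} rows")
```

A coarse `tolerance` keeps the number of index builds small, which matters when each build can take minutes or hours.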

The test table has a very minimal schema:

(39242) [local]:5432 mike@mike=# \d test_emb_tune
                   Table "test_emb_tune"
 Column  |         Type          | Collation | Nullable | Default
---------+-----------------------+-----------+----------+---------
 page_id | character varying(26) |           |          |
 idx     | integer               |           |          |
 vec     | real[]                |           |          |
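To get a feel for the scale of data involved in the passing and failing runs, the raw vector payload can be estimated with simple arithmetic. This is a rough sketch under stated assumptions: 768 dimensions for every table (only confirmed for the first table in this thread), 4 bytes per element for real[], 1 byte per element with I8 quantization, and no accounting for HNSW graph links or file headers. The comparison against 2^31 - 1 bytes is just a commonly seen boundary for 32-bit sizes; nothing in this thread establishes that it is the cause.

```python
INT32_MAX = 2**31 - 1  # assumption: a boundary worth checking, not a diagnosis


def raw_vector_bytes(rows: int, dim: int, bytes_per_element: int) -> int:
    """Size of the vector data alone, ignoring graph links and headers."""
    return rows * dim * bytes_per_element


if __name__ == "__main__":
    cases = {
        "100k rows, float32": raw_vector_bytes(100_000, 768, 4),
        "1MM rows, float32": raw_vector_bytes(1_000_000, 768, 4),
        "1MM rows, i8": raw_vector_bytes(1_000_000, 768, 1),
        "7,764,322 rows, i8": raw_vector_bytes(7_764_322, 768, 1),
    }
    for label, size in cases.items():
        over = size > INT32_MAX
        print(f"{label}: {size / 2**30:.2f} GiB, above 2^31 - 1 bytes: {over}")
```

If the failing cases line up with a particular size, checking the resulting index file size in the tmp-dir would be a more direct measurement than this estimate.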

@var77
Collaborator

var77 commented Dec 23, 2024

Thanks for sharing the details. Can you try one more thing as well:

  1. Generate an SSL certificate: openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout /tmp/lantern-key.pem -out /tmp/lantern-cert.pem -subj '/C=US/ST=California/L=San Francisco/O=Lantern/CN=lantern.dev'
  2. Run the indexing server using that certificate: lantern-cli start-indexing-server --cert /tmp/lantern-cert.pem --key /tmp/lantern-key.pem
  3. Set lantern_extras.external_index_secure=true and retry indexing.

Meanwhile, I will try to think of a way to get more verbose output.
