[Bug]: [benchmark][cluster] queryNode panic in DQL & DDL scenario with all mmap params enabled #38604

Open
1 task done
wangting0128 opened this issue Dec 20, 2024 · 0 comments
Labels

  • kind/bug: Issues or changes related to a bug
  • test/benchmark: benchmark test
  • triage/accepted: Indicates an issue or PR is ready to be actively worked on.
Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version: master-20241219-3d360c06-amd64
- Deployment mode(standalone or cluster): cluster
- MQ type(rocksmq, pulsar or kafka): pulsar
- SDK version(e.g. pymilvus v2.0.0rc2): 2.5.0rc124
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

argo task: fouramf-hpxxf

server:

NAME                                                              READY   STATUS      RESTARTS        AGE     IP              NODE         NOMINATED NODE   READINESS GATES
fouramf-hpxxf-26-4547-etcd-0                                      1/1     Running     0               17h     10.104.19.8     4am-node28   <none>           <none>
fouramf-hpxxf-26-4547-etcd-1                                      1/1     Running     0               17h     10.104.25.203   4am-node30   <none>           <none>
fouramf-hpxxf-26-4547-etcd-2                                      1/1     Running     0               17h     10.104.18.19    4am-node25   <none>           <none>
fouramf-hpxxf-26-4547-milvus-datanode-68bb4599bb-6p26w            1/1     Running     1 (17h ago)     17h     10.104.6.220    4am-node13   <none>           <none>
fouramf-hpxxf-26-4547-milvus-indexnode-f8fc974b9-2vsv5            1/1     Running     1 (17h ago)     17h     10.104.9.233    4am-node14   <none>           <none>
fouramf-hpxxf-26-4547-milvus-indexnode-f8fc974b9-7cpg6            1/1     Running     1 (17h ago)     17h     10.104.19.3     4am-node28   <none>           <none>
fouramf-hpxxf-26-4547-milvus-indexnode-f8fc974b9-9xsh4            1/1     Running     0               17h     10.104.6.213    4am-node13   <none>           <none>
fouramf-hpxxf-26-4547-milvus-indexnode-f8fc974b9-fsxbr            1/1     Running     0               17h     10.104.25.194   4am-node30   <none>           <none>
fouramf-hpxxf-26-4547-milvus-mixcoord-84b6667759-gbdtw            1/1     Running     1 (17h ago)     17h     10.104.6.218    4am-node13   <none>           <none>
fouramf-hpxxf-26-4547-milvus-proxy-765cc9ffc7-f9mbm               1/1     Running     1 (17h ago)     17h     10.104.6.212    4am-node13   <none>           <none>
fouramf-hpxxf-26-4547-milvus-querynode-8459b6bcf-4ptsj            1/1     Running     1 (6h54m ago)   17h     10.104.34.214   4am-node37   <none>           <none>
fouramf-hpxxf-26-4547-minio-0                                     1/1     Running     0               17h     10.104.25.202   4am-node30   <none>           <none>
fouramf-hpxxf-26-4547-minio-1                                     1/1     Running     0               17h     10.104.19.9     4am-node28   <none>           <none>
fouramf-hpxxf-26-4547-minio-2                                     1/1     Running     0               17h     10.104.18.22    4am-node25   <none>           <none>
fouramf-hpxxf-26-4547-minio-3                                     1/1     Running     0               17h     10.104.23.45    4am-node27   <none>           <none>
fouramf-hpxxf-26-4547-pulsarv3-bookie-0                           1/1     Running     0               17h     10.104.25.207   4am-node30   <none>           <none>
fouramf-hpxxf-26-4547-pulsarv3-bookie-1                           1/1     Running     0               17h     10.104.19.13    4am-node28   <none>           <none>
fouramf-hpxxf-26-4547-pulsarv3-bookie-2                           1/1     Running     0               17h     10.104.23.46    4am-node27   <none>           <none>
fouramf-hpxxf-26-4547-pulsarv3-bookie-init-mh9gf                  0/1     Completed   0               17h     10.104.6.216    4am-node13   <none>           <none>
fouramf-hpxxf-26-4547-pulsarv3-broker-0                           1/1     Running     0               17h     10.104.6.221    4am-node13   <none>           <none>
fouramf-hpxxf-26-4547-pulsarv3-broker-1                           1/1     Running     0               17h     10.104.14.239   4am-node18   <none>           <none>
fouramf-hpxxf-26-4547-pulsarv3-proxy-0                            1/1     Running     0               17h     10.104.6.223    4am-node13   <none>           <none>
fouramf-hpxxf-26-4547-pulsarv3-proxy-1                            1/1     Running     0               17h     10.104.25.195   4am-node30   <none>           <none>
fouramf-hpxxf-26-4547-pulsarv3-pulsar-init-2wpwq                  0/1     Completed   0               17h     10.104.6.219    4am-node13   <none>           <none>
fouramf-hpxxf-26-4547-pulsarv3-recovery-0                         1/1     Running     0               17h     10.104.6.222    4am-node13   <none>           <none>
fouramf-hpxxf-26-4547-pulsarv3-zookeeper-0                        1/1     Running     0               17h     10.104.19.11    4am-node28   <none>           <none>
fouramf-hpxxf-26-4547-pulsarv3-zookeeper-1                        1/1     Running     0               17h     10.104.25.208   4am-node30   <none>           <none>
fouramf-hpxxf-26-4547-pulsarv3-zookeeper-2                        1/1     Running     0               17h     10.104.18.23    4am-node25   <none>           <none>
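
The RESTARTS column in the listing above is the quickest way to see which pod restarted mid-run. The following is a minimal sketch of filtering such a listing for restarted pods. The sample rows are taken from the listing and trimmed to the first five columns; the helper name `restarted_pods` is made up for illustration and is not part of any tool used in this report.

```python
import re

# Rows taken from the pod listing, trimmed to the first five columns.
POD_TABLE = """\
NAME                                                              READY   STATUS      RESTARTS        AGE
fouramf-hpxxf-26-4547-milvus-proxy-765cc9ffc7-f9mbm               1/1     Running     1 (17h ago)     17h
fouramf-hpxxf-26-4547-milvus-querynode-8459b6bcf-4ptsj            1/1     Running     1 (6h54m ago)   17h
fouramf-hpxxf-26-4547-minio-0                                     1/1     Running     0               17h
"""


def restarted_pods(table):
    """Return (pod name, restart count, time since last restart) for pods with restarts."""
    found = []
    for line in table.splitlines()[1:]:  # skip the header row
        # Columns are separated by two or more spaces; "1 (17h ago)" stays intact.
        cols = re.split(r"\s{2,}", line.strip())
        if len(cols) < 5:
            continue
        name, restarts = cols[0], cols[3]
        count = int(restarts.split()[0])
        if count > 0:
            match = re.search(r"\((.+?) ago\)", restarts)
            found.append((name, count, match.group(1) if match else None))
    return found


for name, count, since in restarted_pods(POD_TABLE):
    print(f"{name}: {count} restart(s), last {since} ago")
```

In the full listing, the queryNode is the only pod whose last restart (6h54m ago) does not coincide with the 17h deployment age.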

querynode_panic.log
[screenshot: 2024-12-20 10 51 32]
[image]

Expected Behavior

No response

Steps To Reproduce

1. create a collection with fields: "id", "float_vector" (128 dim), "float_vector_1" (768 dim), "sparse_float_vector", "bfloat16_vector" (256 dim), "int64_1", "varchar_1"
2. build index
   - HNSW: float_vector
   - DISKANN: float_vector_1
   - SPARSE_INVERTED_INDEX: sparse_float_vector
   - IVF_SQ8: bfloat16_vector
   - INVERTED: int64_1, varchar_1
3. insert 20m rows
4. flush
5. rebuild index
6. load collection
7. concurrent requests
   - scene_hybrid_search_test
     (collection: create->insert->flush->index->load->hybrid_search->drop)
   - scene_test
     (collection: create->insert->flush->index->drop)
   - scene_test_partition_hybrid_search
     (partition: create->insert->flush->index again->load->hybrid_search->release->hybrid_search failed->drop)
   - search
   - hybrid_search
   - query
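
Steps 1 and 2 can be sketched with pymilvus roughly as follows. This is only an illustration, not the benchmark client's actual code. The field names, dimensions, index types, metric types and build parameters are taken from this report and its client config. The collection name, the `uri` argument, the VARCHAR `max_length`, and the choice of the `MilvusClient` API are assumptions.

```python
# Field and index layout from steps 1-2 and the client config of this report.
# Assumed (not in the report): collection name, VARCHAR max_length, client API.
FIELDS = [
    # (field name, pymilvus DataType member name, extra keyword arguments)
    ("id", "INT64", {"is_primary": True}),
    ("float_vector", "FLOAT_VECTOR", {"dim": 128}),
    ("float_vector_1", "FLOAT_VECTOR", {"dim": 768}),
    ("sparse_float_vector", "SPARSE_FLOAT_VECTOR", {}),
    ("bfloat16_vector", "BFLOAT16_VECTOR", {"dim": 256}),
    ("int64_1", "INT64", {}),
    ("varchar_1", "VARCHAR", {"max_length": 256}),  # max_length is an assumption
]

INDEXES = [
    # (field name, index type, metric type or None for scalar indexes, build params)
    ("float_vector", "HNSW", "L2", {"M": 8, "efConstruction": 200}),
    ("float_vector_1", "DISKANN", "IP", {}),
    ("sparse_float_vector", "SPARSE_INVERTED_INDEX", "IP", {"drop_ratio_build": 0.2}),
    ("bfloat16_vector", "IVF_SQ8", "L2", {"nlist": 2048}),
    ("int64_1", "INVERTED", None, {}),
    ("varchar_1", "INVERTED", None, {}),
]


def create_benchmark_collection(uri, name="mmap_repro"):
    """Create the collection and build its indexes on a running Milvus instance."""
    from pymilvus import DataType, MilvusClient  # imported lazily; requires pymilvus

    client = MilvusClient(uri=uri)
    schema = MilvusClient.create_schema(auto_id=False)
    for field_name, type_name, extra in FIELDS:
        schema.add_field(field_name=field_name, datatype=getattr(DataType, type_name), **extra)
    client.create_collection(collection_name=name, schema=schema)

    index_params = client.prepare_index_params()
    for field_name, index_type, metric_type, params in INDEXES:
        kwargs = {"field_name": field_name, "index_type": index_type, "params": params}
        if metric_type is not None:
            kwargs["metric_type"] = metric_type
        index_params.add_index(**kwargs)
    client.create_index(collection_name=name, index_params=index_params)
    return client
```

Creating the collection and the indexes in separate calls leaves the collection unloaded, which matches the order of the steps above, where loading only happens at step 6.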

Milvus Log

No response

Anything else?

server config: fouramf-server-all-fields-mmap-cluster

    extraConfigFiles:
      user.yaml: |+
        queryNode:
          mmap:
            vectorField: true
            vectorIndex: true
            scalarField: true
            scalarIndex: true
    queryNode:
      resources:
        limits:
          cpu: '32'
          memory: 32Gi
        requests:
          cpu: '16'
          memory: 32Gi
      replicas: 1
      nodeSelector:
        node-role/nvme: 'true'
    indexNode:
      resources:
        limits:
          cpu: '4.0'
          memory: 16Gi
        requests:
          cpu: '2.0'
          memory: 4Gi
      replicas: 4
    dataNode:
      resources:
        limits:
          cpu: '2.0'
          memory: 16Gi
        requests:
          cpu: '2.0'
          memory: 5Gi
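
For scale, the raw dense vector payload of the main collection can be compared with the queryNode memory limit above. The sketch below is back-of-the-envelope arithmetic only. It assumes "20m" means 20 million rows, and it ignores the sparse vector field, the scalar fields, and all index structures, so the real footprint is larger.

```python
ROWS = 20_000_000  # "20m" read as 20 million rows (assumption)
BYTES_PER_ELEMENT = {"float32": 4, "bfloat16": 2}

# (field name, dimension, element type) for the dense vector fields in this report
VECTOR_FIELDS = [
    ("float_vector", 128, "float32"),
    ("float_vector_1", 768, "float32"),
    ("bfloat16_vector", 256, "bfloat16"),
]

bytes_per_row = sum(dim * BYTES_PER_ELEMENT[dtype] for _, dim, dtype in VECTOR_FIELDS)
total_bytes = bytes_per_row * ROWS
total_gib = total_bytes / 2**30

MEMORY_LIMIT_GIB = 32  # queryNode memory limit from the server config

print(f"dense vector bytes per row: {bytes_per_row}")
print(f"raw dense vector data: {total_gib:.2f} GiB (limit: {MEMORY_LIMIT_GIB} GiB)")
```

Under these assumptions the raw dense vectors alone come to about 76 GiB, well above the 32Gi limit of the single queryNode replica, so this configuration depends on mmap to serve the collection.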

client config: fouramf-client-all-vector-types-dql-ddl

    dataset_params:
      metric_type: L2
      dim: 128
      scalars_index:
        int64_1:
          index_type: INVERTED
        varchar_1:
          index_type: INVERTED
      vectors_index:
        float_vector_1:
          index_type: DISKANN
          index_param: {}
          metric_type: IP
        sparse_float_vector:
          index_type: SPARSE_INVERTED_INDEX
          index_param:
            drop_ratio_build: 0.2
          metric_type: IP
        bfloat16_vector:
          index_type: IVF_SQ8
          index_param:
            nlist: 2048
          metric_type: L2
      scalars_params:
        float_vector_1:
          params:
            dim: 768
          other_params:
            dataset: laion2b_multi
            column_name: float32_vector
        sparse_float_vector:
          other_params:
            dim: 10000
            sparse_range:
            - 1
            - 20
        bfloat16_vector:
          params:
            dim: 256
      dataset_name: sift
      dataset_size: 20m
      ni_per: 10000
    collection_params:
      other_fields:
      - float_vector_1
      - sparse_float_vector
      - bfloat16_vector
      - int64_1
      - varchar_1
      shards_num: 2
    index_params:
      index_type: HNSW
      index_param:
        M: 8
        efConstruction: 200
    concurrent_params:
      concurrent_number: 20
      during_time: 24h
      interval: 20
    concurrent_tasks:
    - type: scene_hybrid_search_test
      weight: 1
      params:
        nq: 2
        top_k: 5
        reqs:
        - search_param:
            nprobe: 128
          anns_field: float_vector
          expr: bool_1 == True
          top_k: 100
        - search_param:
            nprobe: 32
          anns_field: binary_vector_scene_hybrid_search_test_1
          expr: bool_1 != True
          top_k: 10
        - search_param:
            search_list: 30
          anns_field: float16_vector_scene_hybrid_search_test_2
          expr: int64_1 >= 1500
          top_k: 5
        - search_param:
            drop_ratio_search: 0.1
          anns_field: sparse_float_vector_scene_hybrid_search_test_3
          expr: varchar_1 like "1%"
          top_k: 10
        rerank:
          RRFRanker: []
        output_fields:
        - "*"
        timeout: 600
        random_data: true
        dataset: local
        dim: 128
        shards_num: 2
        data_size: 3000
        nb: 3000
        index_type: IVF_SQ8
        index_param:
          nlist: 2048
        metric_type: L2
        other_fields:
        - binary_vector_scene_hybrid_search_test_1
        - float16_vector_scene_hybrid_search_test_2
        - sparse_float_vector_scene_hybrid_search_test_3
        - int64_1
        - bool_1
        - varchar_1
        replica_number: 1
        scalars_params:
          binary_vector_scene_hybrid_search_test_1:
            params:
              dim: 512
            other_params:
              dataset: binary
          float16_vector_scene_hybrid_search_test_2:
            params:
              dim: 64
        scalars_index:
          int64_1: {}
          bool_1:
            index_type: BITMAP
          varchar_1:
            index_type: INVERTED
        vectors_index:
          binary_vector_scene_hybrid_search_test_1:
            index_type: BIN_IVF_FLAT
            index_param:
              nlist: 2048
            metric_type: JACCARD
          float16_vector_scene_hybrid_search_test_2:
            index_type: DISKANN
            index_param: {}
            metric_type: IP
          sparse_float_vector_scene_hybrid_search_test_3:
            index_type: SPARSE_WAND
            index_param:
              drop_ratio_build: 0.2
            metric_type: IP
        hybrid_search_counts: 10
    - type: scene_test
      weight: 1
      params:
        dim: 128
        data_size: 3000
        nb: 3000
        index_type: IVF_SQ8
        index_param:
          nlist: 2048
        metric_type: L2
    - type: scene_test_partition_hybrid_search
      weight: 1
      params:
        nq: 1
        top_k: 1
        reqs:
        - search_param:
            ef: 32
          anns_field: float_vector
          top_k: 10
        - search_param:
            search_list: 30
          anns_field: float_vector_1
          top_k: 10
        - search_param:
            drop_ratio_search: 0.3
          anns_field: sparse_float_vector
          top_k: 30
        - search_param:
            nprobe: 16
          anns_field: bfloat16_vector
          top_k: 400
        rerank:
          RRFRanker: []
        output_fields:
        - "*"
        timeout: 6000
        random_data: true
        hybrid_search_counts: 10
        data_size: 3000
        ni: 3000
    - type: search
      weight: 1
      params:
        nq: 1000
        top_k: 1
        search_param:
          nprobe: 1000
        expr: int64_1 >= 0
        timeout: 6000
        random_data: true
        partition_names:
        - _default
    - type: hybrid_search
      weight: 1
      params:
        nq: 1
        top_k: 100
        reqs:
        - search_param:
            ef: 32
          anns_field: float_vector
          expr: int64_1 > 100000
          top_k: 10
        - search_param:
            search_list: 30
          anns_field: float_vector_1
          expr: id < 900000
          top_k: 10
        - search_param:
            drop_ratio_search: 0.3
          anns_field: sparse_float_vector
          expr: varchar_1 > "1"
          top_k: 30
        - search_param:
            nprobe: 16
          anns_field: bfloat16_vector
          top_k: 400
        rerank:
          WeightedRanker:
          - 0.85
          - 0.95
          - 0.51
          - 0.32
        output_fields:
        - "*"
        partition_names:
        - _default
        timeout: 6000
        random_data: true
    - type: query
      weight: 1
      params:
        expr: 'int64_1 > -1 && '
        output_fields:
        - "*"
        partition_names:
        - _default
        limit: 10
        timeout: 6000
        custom_expr: " {0} < id < {0} + 1000000"
        custom_range:
        - 0
        - 20000000
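
The `query` task's `expr` ends with a dangling `&& `, which suggests it is completed with `custom_expr` before the request is sent. The following sketch shows one way the final filter could be assembled. It assumes the client draws one value from `custom_range` and substitutes it for `{0}`; the actual behavior of the benchmark client is not shown in this report, and the function names are made up for illustration.

```python
import random

BASE_EXPR = "int64_1 > -1 && "              # `expr` from the query task
CUSTOM_EXPR = " {0} < id < {0} + 1000000"   # `custom_expr` from the query task
CUSTOM_RANGE = (0, 20_000_000)              # `custom_range` from the query task


def build_query_expr(start):
    """Join the fixed prefix with the range clause for a given start value."""
    return BASE_EXPR + CUSTOM_EXPR.format(start)


def random_query_expr(rng=random):
    """Pick a start value within custom_range (assumed uniform) and build the filter."""
    return build_query_expr(rng.randint(*CUSTOM_RANGE))


print(build_query_expr(5000000))
print(random_query_expr())
```

With a start value of 5000000, the combined filter selects rows with `id` between 5000000 and 6000000 that also satisfy `int64_1 > -1`.
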
@wangting0128 added the kind/bug, needs-triage, and test/benchmark labels on Dec 20, 2024
@wangting0128 added this to the 2.5.0 milestone on Dec 20, 2024
@yanliang567 added the triage/accepted label and removed the needs-triage label on Dec 20, 2024
@yanliang567 assigned sunby and unassigned yanliang567 on Dec 20, 2024
@yanliang567 modified the milestones: 2.5.0, 2.5.1 on Dec 24, 2024