Hi guys. We've been running a stock-standard Azure Container Apps deployment successfully since December 4th, 2023. It was running fine, with successful data querying, as of COB last Friday. Since Monday morning the container has been crashing and can't start up. As far as I'm aware, nothing on our side, automated or human, did anything to the resource. This is the second deployment this has happened to (it ran well for a few weeks, then the container suddenly started crashing), and I'm struggling to understand why. The log stream shows:
2024-01-17T06:32:52.25782 Connecting to the container 'qdrantapicontainerapp'...
2024-01-17T06:32:52.27576 Successfully Connected to container: 'qdrantapicontainerapp' [Revision: 'sygniasynapseqdranthttp--0tfisge-567f7bd697-5hr52', Replica: 'sygniasynapseqdranthttp--0tfisge']
2024-01-17T06:32:37.835011814Z 2: std::panicking::rust_panic_with_hook
2024-01-17T06:32:37.835016242Z at /rustc/a28077b28a02b92985b3a3faecf92813155f1ea1/library/std/src/panicking.rs:735:13
2024-01-17T06:32:37.835020700Z 3: std::panicking::begin_panic_handler::{{closure}}
2024-01-17T06:32:37.835024728Z at /rustc/a28077b28a02b92985b3a3faecf92813155f1ea1/library/std/src/panicking.rs:609:13
2024-01-17T06:32:37.835028695Z 4: std::sys_common::backtrace::__rust_end_short_backtrace
2024-01-17T06:32:37.835032312Z at /rustc/a28077b28a02b92985b3a3faecf92813155f1ea1/library/std/src/sys_common/backtrace.rs:170:18
2024-01-17T06:32:37.835037161Z 5: rust_begin_unwind
2024-01-17T06:32:37.835041559Z at /rustc/a28077b28a02b92985b3a3faecf92813155f1ea1/library/std/src/panicking.rs:597:5
2024-01-17T06:32:37.835046238Z 6: core::panicking::panic_fmt
2024-01-17T06:32:37.835049925Z at /rustc/a28077b28a02b92985b3a3faecf92813155f1ea1/library/core/src/panicking.rs:72:14
2024-01-17T06:32:37.835055635Z 7: collection::shards::shard_holder::ShardHolder::load_shards::{{closure}}.110038
2024-01-17T06:32:37.835059663Z 8: storage::content_manager::toc::TableOfContent::new
2024-01-17T06:32:37.835063911Z 9: qdrant::main
2024-01-17T06:32:37.835067928Z 10: std::sys_common::backtrace::__rust_begin_short_backtrace
2024-01-17T06:32:37.835072066Z 11: main
2024-01-17T06:32:37.835075943Z 12:
2024-01-17T06:32:37.835079880Z 13: __libc_start_main
2024-01-17T06:32:37.835083507Z 14: _start
2024-01-17T06:32:37.835086994Z
2024-01-17T06:32:37.835092183Z 2024-01-17T06:32:37.834886Z ERROR qdrant::startup: Panic occurred in file /qdrant/lib/collection/src/shards/replica_set/mod.rs at line 246: Failed to load local shard "./storage/collections/[redacted]/0": Service internal error: RocksDB open error: IO error: No such file or directory: while unlink() file: ./storage/collections/[redacted]/0/segments/23a17757-59d1-4649-acbb-7b5b183af4bb/LOG.old.1705084754144915: No such file or directory
If I browse to the file it's looking for in the Azure portal, it's reported as being marked for deletion by an SMB client. As far as I know, no human action did this, and all other files are accessible. This is also the only LOG.old file that has contents; all the others are 0 bytes. We can't delete the file because it's already marked for deletion, and I can't upload any sort of replacement, so short of redeploying everything I'm not sure where to go from here. I set the soft-delete period to the minimum (1 day) in the hope that once the file was deleted it would sort itself out, but the file hasn't been deleted and is still present but inaccessible. I'm really hoping I don't have to do a complete redeploy to fix this, so any assistance in understanding why this happened would be highly appreciated.
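For anyone landing here in the same state: the panic above comes from qdrant's shard loader asking RocksDB to unlink a stale `LOG.old.*` file, which fails because the file sits on an Azure Files (SMB) mount in a pending-delete state: some SMB client set the delete disposition but a handle is still open, so the file stays visible yet untouchable. One thing worth trying before a full redeploy is force-closing the stale handles on that file. Below is a minimal sketch using the `azure-storage-file-share` Python SDK; the account URL, share name, file path, and credential are placeholders for this environment, not values from the issue, and this is untested against this exact failure.

```python
# pip install azure-storage-file-share
from azure.storage.fileshare import ShareFileClient

# Placeholder values -- substitute your storage account, the share mounted
# into the Container App, and the exact path qdrant reported in the panic.
file_client = ShareFileClient(
    account_url="https://<account>.file.core.windows.net",
    share_name="<qdrant-storage-share>",
    file_path="collections/<collection>/0/segments/<segment-uuid>/LOG.old.1705084754144915",
    credential="<account-key>",
)

# Show which SMB sessions still pin the file in its pending-delete state.
for handle in file_client.list_handles():
    print(handle.id, handle.client_ip, handle.open_time)

# Force-close every open handle; once the last handle closes, the delete
# disposition can take effect and the file should finally disappear.
print(file_client.close_all_handles())
```

Once the file is actually gone, restarting the Container Apps revision should let qdrant's shard loader get past the unlink. Treat this as a starting point rather than a confirmed fix.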
Thanks so much
Please provide us with the following information:
This issue is for a: (mark with an x)
- [X] bug report -> please search issues before submitting
- [ ] feature request
- [ ] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)
Minimal steps to reproduce
No idea. It was working fine, and then something marked the LOG.old file for deletion.
Any log messages given by the failure
Expected/desired behavior
That the working deployment continues to work
OS and Version?
Azure Container Apps, so probably Linux
Versions
Mention any other details that might be useful
Thanks! We'll be in touch soon.
When I last looked at it, it was. We needed to get our chat service up and running again, and I wasn't prepared to recreate the resource and restore the data a second time, only for this to happen again, for a third time, in a month. We moved to Azure AI Search.