
Clickhouse corrupt after server failure #2832

Closed
RoyvanEmpel opened this issue Feb 26, 2024 · 11 comments
@RoyvanEmpel

Environment

self-hosted (https://develop.sentry.dev/self-hosted/)

What are you trying to accomplish?

Hi,

The server that our self-hosted Sentry runs on had a hardware failure, causing it to shut down. After getting everything back up and running, Sentry no longer starts.

Running docker compose up -d, I get the following output:

dependency failed to start: container sentry-self-hosted-clickhouse-1 is unhealthy

I tried reinstalling using ./install.sh, and after that I tried updating to the latest version, but I again get this output:

▶ Bootstrapping and migrating Snuba ...
Container sentry-self-hosted-zookeeper-1 Creating
Container sentry-self-hosted-redis-1 Creating
Container sentry-self-hosted-clickhouse-1 Creating
Container sentry-self-hosted-redis-1 Created
Container sentry-self-hosted-zookeeper-1 Created
Container sentry-self-hosted-kafka-1 Creating
Container sentry-self-hosted-clickhouse-1 Created
Container sentry-self-hosted-kafka-1 Created
Container sentry-self-hosted-clickhouse-1 Starting
Container sentry-self-hosted-zookeeper-1 Starting
Container sentry-self-hosted-redis-1 Starting
Container sentry-self-hosted-redis-1 Started
Container sentry-self-hosted-clickhouse-1 Started
Container sentry-self-hosted-zookeeper-1 Started
Container sentry-self-hosted-zookeeper-1 Waiting
Container sentry-self-hosted-zookeeper-1 Healthy
Container sentry-self-hosted-kafka-1 Starting
Container sentry-self-hosted-kafka-1 Started
dependency failed to start: container sentry-self-hosted-clickhouse-1 is unhealthy
Error in install/bootstrap-snuba.sh:3.
'$dcr snuba-api bootstrap --no-migrate --force' exited with status 1
-> ./install.sh:main:32
--> install/bootstrap-snuba.sh:source:3

Cleaning up...

How are you getting stuck?

Because of this, Sentry isn't starting. It seems that the ClickHouse DB is corrupted. Can I just drop its files, and will it recover? I am guessing that issues are stored in it.
Maybe there is a command to attempt a self-repair?

Oh, and by the way:
this page says that the latest version is 24.3.0, while it is actually 24.2.0:
https://develop.sentry.dev/self-hosted/releases/

Where in the product are you?

Other

Link

No response

DSN

No response

Version

24.2.0

@getsantry

getsantry bot commented Feb 26, 2024

Assigning to @getsentry/support for routing ⏲️

@RoyvanEmpel
Author

(version 21.8.13.1.altinitystable (altinity build))
2024.02.26 10:39:41.282861 [ 266 ] {} default.replays_local (38df915e-e158-46ca-b8df-915ee15806ca): Detaching broken part /var/lib/clickhouse/store/38d/38df915e-e158-46ca-b8df-915ee15806ca/90-20240226_268916_268916_0. If it happened after update, it is likely because of backward incompability. You need to resolve this manually
2024.02.26 10:39:41.283485 [ 300 ] {} auto DB::MergeTreeData::loadDataParts(bool)::(anonymous class)::operator()() const: Code: 27, e.displayText() = DB::ParsingException: Cannot parse input: expected 'columns format version: 1\n' at end of stream., Stack trace (when copying this message, always include the lines below):
0. DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator > const&, int, bool) @ 0x8fe7c9a in /usr/bin/clickhouse

  1. DB::throwAtAssertionFailed(char const*, DB::ReadBuffer&) @ 0x9041917 in /usr/bin/clickhouse
  2. DB::NamesAndTypesList::readText(DB::ReadBuffer&) @ 0xfcb6a18 in /usr/bin/clickhouse
  3. DB::IMergeTreeDataPart::loadColumns(bool) @ 0x10d416db in /usr/bin/clickhouse
  4. DB::IMergeTreeDataPart::loadColumnsChecksumsIndexes(bool, bool) @ 0x10d40b89 in /usr/bin/clickhouse
  5. ? @ 0x10dd672f in /usr/bin/clickhouse
  6. ThreadPoolImpl::worker(std::__1::__list_iterator<ThreadFromGlobalPool, void*>) @ 0x902b638 in /usr/bin/clickhouse
  7. ThreadFromGlobalPool::ThreadFromGlobalPool<void ThreadPoolImpl::scheduleImpl(std::__1::function<void ()>, int, std::__1::optional)::'lambda0'()>(void&&, void ThreadPoolImpl::scheduleImpl(std::__1::function<void ()>, int, std::__1::optional)::'lambda0'()&&...)::'lambda'()::operator()() @ 0x902d1df in /usr/bin/clickhouse
  8. ThreadPoolImplstd::__1::thread::worker(std::__1::__list_iterator<std::__1::thread, void*>) @ 0x902891f in /usr/bin/clickhouse
  9. ? @ 0x902c203 in /usr/bin/clickhouse
  10. start_thread @ 0x9609 in /usr/lib/x86_64-linux-gnu/libpthread-2.31.so
  11. clone @ 0x122293 in /usr/lib/x86_64-linux-gnu/libc-2.31.so
    (version 21.8.13.1.altinitystable (altinity build))
    2024.02.26 10:39:41.283514 [ 300 ] {} default.replays_local (38df915e-e158-46ca-b8df-915ee15806ca): Detaching broken part /var/lib/clickhouse/store/38d/38df915e-e158-46ca-b8df-915ee15806ca/90-20240226_265746_268912_786. If it happened after update, it is likely because of backward incompability. You need to resolve this manually
    2024.02.26 10:39:41.283524 [ 294 ] {} auto DB::MergeTreeData::loadDataParts(bool)::(anonymous class)::operator()() const: Code: 27, e.displayText() = DB::ParsingException: Cannot parse input: expected 'columns format version: 1\n' at end of stream., Stack trace (when copying this message, always include the lines below):
  0. DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator > const&, int, bool) @ 0x8fe7c9a in /usr/bin/clickhouse
  1. DB::throwAtAssertionFailed(char const*, DB::ReadBuffer&) @ 0x9041917 in /usr/bin/clickhouse
  2. DB::NamesAndTypesList::readText(DB::ReadBuffer&) @ 0xfcb6a18 in /usr/bin/clickhouse
  3. DB::IMergeTreeDataPart::loadColumns(bool) @ 0x10d416db in /usr/bin/clickhouse
  4. DB::IMergeTreeDataPart::loadColumnsChecksumsIndexes(bool, bool) @ 0x10d40b89 in /usr/bin/clickhouse
  5. ? @ 0x10dd672f in /usr/bin/clickhouse
  6. ThreadPoolImpl::worker(std::__1::__list_iterator<ThreadFromGlobalPool, void*>) @ 0x902b638 in /usr/bin/clickhouse
  7. ThreadFromGlobalPool::ThreadFromGlobalPool<void ThreadPoolImpl::scheduleImpl(std::__1::function<void ()>, int, std::__1::optional)::'lambda0'()>(void&&, void ThreadPoolImpl::scheduleImpl(std::__1::function<void ()>, int, std::__1::optional)::'lambda0'()&&...)::'lambda'()::operator()() @ 0x902d1df in /usr/bin/clickhouse
  8. ThreadPoolImplstd::__1::thread::worker(std::__1::__list_iterator<std::__1::thread, void*>) @ 0x902891f in /usr/bin/clickhouse
  9. ? @ 0x902c203 in /usr/bin/clickhouse
  10. start_thread @ 0x9609 in /usr/lib/x86_64-linux-gnu/libpthread-2.31.so
  11. clone @ 0x122293 in /usr/lib/x86_64-linux-gnu/libc-2.31.so
    (version 21.8.13.1.altinitystable (altinity build))
    2024.02.26 10:39:41.283550 [ 294 ] {} default.replays_local (38df915e-e158-46ca-b8df-915ee15806ca): Detaching broken part /var/lib/clickhouse/store/38d/38df915e-e158-46ca-b8df-915ee15806ca/90-20240226_268908_268908_0. If it happened after update, it is likely because of backward incompability. You need to resolve this manually
    2024.02.26 10:39:41.285074 [ 248 ] {} auto DB::MergeTreeData::loadDataParts(bool)::(anonymous class)::operator()() const: Code: 27, e.displayText() = DB::ParsingException: Cannot parse input: expected 'columns format version: 1\n' at end of stream., Stack trace (when copying this message, always include the lines below):
  0. DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator > const&, int, bool) @ 0x8fe7c9a in /usr/bin/clickhouse
  1. DB::throwAtAssertionFailed(char const*, DB::ReadBuffer&) @ 0x9041917 in /usr/bin/clickhouse
  2. DB::NamesAndTypesList::readText(DB::ReadBuffer&) @ 0xfcb6a18 in /usr/bin/clickhouse
  3. DB::IMergeTreeDataPart::loadColumns(bool) @ 0x10d416db in /usr/bin/clickhouse
  4. DB::IMergeTreeDataPart::loadColumnsChecksumsIndexes(bool, bool) @ 0x10d40b89 in /usr/bin/clickhouse
  5. ? @ 0x10dd672f in /usr/bin/clickhouse
  6. ThreadPoolImpl::worker(std::__1::__list_iterator<ThreadFromGlobalPool, void*>) @ 0x902b638 in /usr/bin/clickhouse
  7. ThreadFromGlobalPool::ThreadFromGlobalPool<void ThreadPoolImpl::scheduleImpl(std::__1::function<void ()>, int, std::__1::optional)::'lambda0'()>(void&&, void ThreadPoolImpl::scheduleImpl(std::__1::function<void ()>, int, std::__1::optional)::'lambda0'()&&...)::'lambda'()::operator()() @ 0x902d1df in /usr/bin/clickhouse
  8. ThreadPoolImplstd::__1::thread::worker(std::__1::__list_iterator<std::__1::thread, void*>) @ 0x902891f in /usr/bin/clickhouse
  9. ? @ 0x902c203 in /usr/bin/clickhouse
  10. start_thread @ 0x9609 in /usr/lib/x86_64-linux-gnu/libpthread-2.31.so
  11. clone @ 0x122293 in /usr/lib/x86_64-linux-gnu/libc-2.31.so
    (version 21.8.13.1.altinitystable (altinity build))
    2024.02.26 10:39:41.285096 [ 248 ] {} default.replays_local (38df915e-e158-46ca-b8df-915ee15806ca): Detaching broken part /var/lib/clickhouse/store/38d/38df915e-e158-46ca-b8df-915ee15806ca/90-20240226_268913_268913_0. If it happened after update, it is likely because of backward incompability. You need to resolve this manually
    2024.02.26 10:39:41.285278 [ 248 ] {} auto DB::MergeTreeData::loadDataParts(bool)::(anonymous class)::operator()() const: Code: 27, e.displayText() = DB::ParsingException: Cannot parse input: expected 'columns format version: 1\n' at end of stream., Stack trace (when copying this message, always include the lines below):
  0. DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator > const&, int, bool) @ 0x8fe7c9a in /usr/bin/clickhouse
  1. DB::throwAtAssertionFailed(char const*, DB::ReadBuffer&) @ 0x9041917 in /usr/bin/clickhouse
  2. DB::NamesAndTypesList::readText(DB::ReadBuffer&) @ 0xfcb6a18 in /usr/bin/clickhouse
  3. DB::IMergeTreeDataPart::loadColumns(bool) @ 0x10d416db in /usr/bin/clickhouse
  4. DB::IMergeTreeDataPart::loadColumnsChecksumsIndexes(bool, bool) @ 0x10d40b89 in /usr/bin/clickhouse
  5. ? @ 0x10dd672f in /usr/bin/clickhouse
  6. ThreadPoolImpl::worker(std::__1::__list_iterator<ThreadFromGlobalPool, void*>) @ 0x902b638 in /usr/bin/clickhouse
  7. ThreadFromGlobalPool::ThreadFromGlobalPool<void ThreadPoolImpl::scheduleImpl(std::__1::function<void ()>, int, std::__1::optional)::'lambda0'()>(void&&, void ThreadPoolImpl::scheduleImpl(std::__1::function<void ()>, int, std::__1::optional)::'lambda0'()&&...)::'lambda'()::operator()() @ 0x902d1df in /usr/bin/clickhouse
  8. ThreadPoolImplstd::__1::thread::worker(std::__1::__list_iterator<std::__1::thread, void*>) @ 0x902891f in /usr/bin/clickhouse
  9. ? @ 0x902c203 in /usr/bin/clickhouse
  10. start_thread @ 0x9609 in /usr/lib/x86_64-linux-gnu/libpthread-2.31.so
  11. clone @ 0x122293 in /usr/lib/x86_64-linux-gnu/libc-2.31.so
    (version 21.8.13.1.altinitystable (altinity build))
    2024.02.26 10:39:41.285303 [ 248 ] {} default.replays_local (38df915e-e158-46ca-b8df-915ee15806ca): Detaching broken part /var/lib/clickhouse/store/38d/38df915e-e158-46ca-b8df-915ee15806ca/90-20240226_268905_268905_0. If it happened after update, it is likely because of backward incompability. You need to resolve this manually
    2024.02.26 10:39:41.294025 [ 297 ] {} auto DB::MergeTreeData::loadDataParts(bool)::(anonymous class)::operator()() const: Code: 27, e.displayText() = DB::ParsingException: Cannot parse input: expected 'columns format version: 1\n' at end of stream., Stack trace (when copying this message, always include the lines below):
  0. DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator > const&, int, bool) @ 0x8fe7c9a in /usr/bin/clickhouse
  1. DB::throwAtAssertionFailed(char const*, DB::ReadBuffer&) @ 0x9041917 in /usr/bin/clickhouse
  2. DB::NamesAndTypesList::readText(DB::ReadBuffer&) @ 0xfcb6a18 in /usr/bin/clickhouse
  3. DB::IMergeTreeDataPart::loadColumns(bool) @ 0x10d416db in /usr/bin/clickhouse
  4. DB::IMergeTreeDataPart::loadColumnsChecksumsIndexes(bool, bool) @ 0x10d40b89 in /usr/bin/clickhouse
  5. ? @ 0x10dd672f in /usr/bin/clickhouse
  6. ThreadPoolImpl::worker(std::__1::__list_iterator<ThreadFromGlobalPool, void*>) @ 0x902b638 in /usr/bin/clickhouse
  7. ThreadFromGlobalPool::ThreadFromGlobalPool<void ThreadPoolImpl::scheduleImpl(std::__1::function<void ()>, int, std::__1::optional)::'lambda0'()>(void&&, void ThreadPoolImpl::scheduleImpl(std::__1::function<void ()>, int, std::__1::optional)::'lambda0'()&&...)::'lambda'()::operator()() @ 0x902d1df in /usr/bin/clickhouse
  8. ThreadPoolImplstd::__1::thread::worker(std::__1::__list_iterator<std::__1::thread, void*>) @ 0x902891f in /usr/bin/clickhouse
  9. ? @ 0x902c203 in /usr/bin/clickhouse
  10. start_thread @ 0x9609 in /usr/lib/x86_64-linux-gnu/libpthread-2.31.so
  11. clone @ 0x122293 in /usr/lib/x86_64-linux-gnu/libc-2.31.so
    (version 21.8.13.1.altinitystable (altinity build))
    2024.02.26 10:39:41.294052 [ 297 ] {} default.replays_local (38df915e-e158-46ca-b8df-915ee15806ca): Detaching broken part /var/lib/clickhouse/store/38d/38df915e-e158-46ca-b8df-915ee15806ca/90-20240226_265746_268907_781. If it happened after update, it is likely because of backward incompability. You need to resolve this manually
    2024.02.26 10:39:41.318674 [ 44 ] {} Application: Caught exception while loading metadata: Code: 231, e.displayText() = DB::Exception: Suspiciously many (30) broken parts to remove.: Cannot attach table default.replays_local from metadata file /var/lib/clickhouse/store/d48/d48a4d38-4b97-43bb-948a-4d384b9733bb/replays_local.sql from query ATTACH TABLE default.replays_local UUID '38df915e-e158-46ca-b8df-915ee15806ca' (replay_id UUID, debug_id UUID, count_info_events UInt8 MATERIALIZED (debug_id != '00000000-0000-0000-0000-000000000000') + (info_id != '00000000-0000-0000-0000-000000000000'), count_warning_events UInt8 MATERIALIZED warning_id != '00000000-0000-0000-0000-000000000000', count_error_events UInt8 MATERIALIZED (error_id != '00000000-0000-0000-0000-000000000000') + (fatal_id != '00000000-0000-0000-0000-000000000000'), info_id UUID, warning_id UUID, error_id UUID, fatal_id UUID, replay_type LowCardinality(Nullable(String)), error_sample_rate Nullable(Float64), session_sample_rate Nullable(Float64), event_hash UUID, segment_id Nullable(UInt16), trace_ids Array(UUID), _trace_ids_hashed Array(UInt64) MATERIALIZED arrayMap(t -> cityHash64(t), trace_ids), title Nullable(String), url Nullable(String), urls Array(String), is_archived Nullable(UInt8), error_ids Array(UUID), _error_ids_hashed Array(UInt64) MATERIALIZED arrayMap(t -> cityHash64(t), error_ids), project_id UInt64, timestamp DateTime, replay_start_timestamp Nullable(DateTime), platform LowCardinality(String), environment LowCardinality(Nullable(String)), release Nullable(String), dist Nullable(String), ip_address_v4 Nullable(IPv4), ip_address_v6 Nullable(IPv6), user Nullable(String), user_id Nullable(String), user_name Nullable(String), user_email Nullable(String), os_name LowCardinality(Nullable(String)), os_version Nullable(String), browser_name LowCardinality(Nullable(String)), browser_version Nullable(String), device_name LowCardinality(Nullable(String)), device_brand 
LowCardinality(Nullable(String)), device_family LowCardinality(Nullable(String)), device_model LowCardinality(Nullable(String)), sdk_name LowCardinality(Nullable(String)), sdk_version LowCardinality(Nullable(String)), tags.key Array(String), tags.value Array(String), click_node_id UInt32 DEFAULT 0, click_tag LowCardinality(String) DEFAULT '', click_id String DEFAULT '', click_class Array(String), click_text String DEFAULT '', click_role LowCardinality(String) DEFAULT '', click_alt String DEFAULT '', click_testid String DEFAULT '', click_aria_label String DEFAULT '', click_title String DEFAULT '', click_component_name String, click_is_dead UInt8, click_is_rage UInt8, count_errors UInt16 MATERIALIZED length(error_ids), count_urls UInt16 MATERIALIZED length(urls), retention_days UInt16, partition UInt16, offset UInt64, INDEX bf_trace_ids_hashed _trace_ids_hashed TYPE bloom_filter GRANULARITY 1, INDEX bf_error_ids_hashed _error_ids_hashed TYPE bloom_filter GRANULARITY 1, INDEX bf_user_id user_id TYPE bloom_filter GRANULARITY 1, INDEX bf_user_email user_email TYPE bloom_filter GRANULARITY 1, INDEX bf_ip_address_v4 ip_address_v4 TYPE bloom_filter GRANULARITY 1) ENGINE = ReplacingMergeTree PARTITION BY (retention_days, toMonday(timestamp)) ORDER BY (project_id, toStartOfDay(timestamp), cityHash64(replay_id), event_hash) TTL timestamp + toIntervalDay(retention_days) SETTINGS index_granularity = 8192: while loading database default from path /var/lib/clickhouse/metadata/default, Stack trace (when copying this message, always include the lines below):
  0. DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator > const&, int, bool) @ 0x8fe7c9a in /usr/bin/clickhouse
  1. DB::MergeTreeData::loadDataParts(bool) @ 0x10d88537 in /usr/bin/clickhouse
  2. DB::StorageMergeTree::StorageMergeTree(DB::StorageID const&, std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator > const&, DB::StorageInMemoryMetadata const&, bool, std::__1::shared_ptrDB::Context, std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator > const&, DB::MergeTreeData::MergingParams const&, std::__1::unique_ptr<DB::MergeTreeSettings, std::__1::default_deleteDB::MergeTreeSettings >, bool) @ 0x10fb2857 in /usr/bin/clickhouse
  3. ? @ 0x10fa7ce7 in /usr/bin/clickhouse
  4. DB::StorageFactory::get(DB::ASTCreateQuery const&, std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator > const&, std::__1::shared_ptrDB::Context, std::__1::shared_ptrDB::Context, DB::ColumnsDescription const&, DB::ConstraintsDescription const&, bool) const @ 0x10a44ec1 in /usr/bin/clickhouse
  5. DB::createTableFromAST(DB::ASTCreateQuery, std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator > const&, std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator > const&, std::__1::shared_ptrDB::Context, bool) @ 0xff9c705 in /usr/bin/clickhouse
  6. ? @ 0xff9a853 in /usr/bin/clickhouse
  7. ? @ 0xff9b83f in /usr/bin/clickhouse
  8. ThreadPoolImpl::worker(std::__1::__list_iterator<ThreadFromGlobalPool, void*>) @ 0x902b638 in /usr/bin/clickhouse
  9. ThreadFromGlobalPool::ThreadFromGlobalPool<void ThreadPoolImpl::scheduleImpl(std::__1::function<void ()>, int, std::__1::optional)::'lambda0'()>(void&&, void ThreadPoolImpl::scheduleImpl(std::__1::function<void ()>, int, std::__1::optional)::'lambda0'()&&...)::'lambda'()::operator()() @ 0x902d1df in /usr/bin/clickhouse
  10. ThreadPoolImplstd::__1::thread::worker(std::__1::__list_iterator<std::__1::thread, void*>) @ 0x902891f in /usr/bin/clickhouse
  11. ? @ 0x902c203 in /usr/bin/clickhouse
  12. start_thread @ 0x9609 in /usr/lib/x86_64-linux-gnu/libpthread-2.31.so
  13. clone @ 0x122293 in /usr/lib/x86_64-linux-gnu/libc-2.31.so
    (version 21.8.13.1.altinitystable (altinity build))
    2024.02.26 10:39:41.320792 [ 44 ] {} Application: DB::Exception: Suspiciously many (30) broken parts to remove.: Cannot attach table default.replays_local from metadata file /var/lib/clickhouse/store/d48/d48a4d38-4b97-43bb-948a-4d384b9733bb/replays_local.sql from query ATTACH TABLE default.replays_local UUID '38df915e-e158-46ca-b8df-915ee15806ca' (replay_id UUID, debug_id UUID, count_info_events UInt8 MATERIALIZED (debug_id != '00000000-0000-0000-0000-000000000000') + (info_id != '00000000-0000-0000-0000-000000000000'), count_warning_events UInt8 MATERIALIZED warning_id != '00000000-0000-0000-0000-000000000000', count_error_events UInt8 MATERIALIZED (error_id != '00000000-0000-0000-0000-000000000000') + (fatal_id != '00000000-0000-0000-0000-000000000000'), info_id UUID, warning_id UUID, error_id UUID, fatal_id UUID, replay_type LowCardinality(Nullable(String)), error_sample_rate Nullable(Float64), session_sample_rate Nullable(Float64), event_hash UUID, segment_id Nullable(UInt16), trace_ids Array(UUID), _trace_ids_hashed Array(UInt64) MATERIALIZED arrayMap(t -> cityHash64(t), trace_ids), title Nullable(String), url Nullable(String), urls Array(String), is_archived Nullable(UInt8), error_ids Array(UUID), _error_ids_hashed Array(UInt64) MATERIALIZED arrayMap(t -> cityHash64(t), error_ids), project_id UInt64, timestamp DateTime, replay_start_timestamp Nullable(DateTime), platform LowCardinality(String), environment LowCardinality(Nullable(String)), release Nullable(String), dist Nullable(String), ip_address_v4 Nullable(IPv4), ip_address_v6 Nullable(IPv6), user Nullable(String), user_id Nullable(String), user_name Nullable(String), user_email Nullable(String), os_name LowCardinality(Nullable(String)), os_version Nullable(String), browser_name LowCardinality(Nullable(String)), browser_version Nullable(String), device_name LowCardinality(Nullable(String)), device_brand LowCardinality(Nullable(String)), device_family LowCardinality(Nullable(String)), device_model 
LowCardinality(Nullable(String)), sdk_name LowCardinality(Nullable(String)), sdk_version LowCardinality(Nullable(String)), tags.key Array(String), tags.value Array(String), click_node_id UInt32 DEFAULT 0, click_tag LowCardinality(String) DEFAULT '', click_id String DEFAULT '', click_class Array(String), click_text String DEFAULT '', click_role LowCardinality(String) DEFAULT '', click_alt String DEFAULT '', click_testid String DEFAULT '', click_aria_label String DEFAULT '', click_title String DEFAULT '', click_component_name String, click_is_dead UInt8, click_is_rage UInt8, count_errors UInt16 MATERIALIZED length(error_ids), count_urls UInt16 MATERIALIZED length(urls), retention_days UInt16, partition UInt16, offset UInt64, INDEX bf_trace_ids_hashed _trace_ids_hashed TYPE bloom_filter GRANULARITY 1, INDEX bf_error_ids_hashed _error_ids_hashed TYPE bloom_filter GRANULARITY 1, INDEX bf_user_id user_id TYPE bloom_filter GRANULARITY 1, INDEX bf_user_email user_email TYPE bloom_filter GRANULARITY 1, INDEX bf_ip_address_v4 ip_address_v4 TYPE bloom_filter GRANULARITY 1) ENGINE = ReplacingMergeTree PARTITION BY (retention_days, toMonday(timestamp)) ORDER BY (project_id, toStartOfDay(timestamp), cityHash64(replay_id), event_hash) TTL timestamp + toIntervalDay(retention_days) SETTINGS index_granularity = 8192: while loading database default from path /var/lib/clickhouse/metadata/default

@getsantry

getsantry bot commented Feb 26, 2024

Routing to @getsentry/product-owners-settings-integrations for triage ⏲️

@Dhrumil-Sentry Dhrumil-Sentry transferred this issue from getsentry/sentry Feb 26, 2024
@hubertdeng123
Member

You may need to manually remove rows from your ClickHouse table, since it looks like corrupt data got in. replays_local seems to be the affected table in particular.
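For reference, a minimal sketch of how the table could be inspected or wiped, assuming the stock self-hosted compose file (where the service is named clickhouse) and that the ClickHouse server itself manages to come up:

```shell
# Sketch only: the service name "clickhouse" is the default in
# sentry/self-hosted, but verify it against your docker-compose.yml.

# Open an interactive client inside the ClickHouse container:
docker compose exec clickhouse clickhouse-client

# Or run one-off queries, e.g. to see how many rows the table holds:
docker compose exec clickhouse clickhouse-client \
  --query "SELECT count() FROM default.replays_local"

# Dropping the table deletes its data permanently; rerunning the Snuba
# migrations (./install.sh) should recreate the empty table afterwards:
docker compose exec clickhouse clickhouse-client \
  --query "DROP TABLE default.replays_local"
```

These are illustrative commands, not an official recovery procedure; take a backup of the volume first if the data matters.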

@RoyvanEmpel
Author

Hey, thanks for the response. Nobody in our company has worked with ClickHouse before, so we don't know how the database works or how to connect to it. Do we just drop files somewhere?

We don't really care about the existing replays and are wondering if there is a way to simply wipe the table so that it gets recreated. We just want the self-hosted Sentry up and running.

@hubertdeng123
Member

I believe you should be able to drop the replays_local table, but to be safe it may be better to start from a new ClickHouse volume if data is corrupt. That would involve removing the Docker volume for ClickHouse (sentry-clickhouse) and rerunning the install script.
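The volume-reset route could look like the following sketch. The volume name sentry-clickhouse is the default in sentry/self-hosted but is an assumption here; check docker volume ls first. This permanently deletes all event data stored in ClickHouse:

```shell
# Destructive sketch: wipes all ClickHouse data in self-hosted Sentry.
docker compose down                 # stop containers; named volumes survive
docker volume rm sentry-clickhouse  # delete the ClickHouse data volume
./install.sh                        # recreate the volume and rerun migrations
```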

@RoyvanEmpel
Author

Hey, if I remove the volumes for ClickHouse and Postgres, then I imagine I will lose a lot more than just the replays. Would all my configuration inside Sentry and my user accounts then also be gone?

@victorelec14
Contributor

victorelec14 commented Mar 4, 2024

I have a similar problem. In my case the VPS ran out of space, which caused a ClickHouse error (I think while writing data). Now ClickHouse won't start; I tried deleting the volume to see if it would migrate back, but nothing seems to happen.

2024.03.04 10:08:25.682096 [ 43 ] {} <Error> Application: DB::Exception: Suspiciously many (12) broken parts to remove.: Cannot attach table `default`.`outcomes_hourly_local` from metadata file /var/lib/clickhouse/metadata/default/outcomes_hourly_local.sql from query ATTACH TABLE default.outcomes_hourly_local (`org_id` UInt64, `project_id` UInt64, `key_id` UInt64, `timestamp` DateTime, `category` UInt8, `outcome` UInt8, `reason` LowCardinality(String), `quantity` UInt64, `times_seen` UInt64, `bytes_received` UInt64) ENGINE = SummingMergeTree PARTITION BY toMonday(timestamp) PRIMARY KEY (org_id, project_id, key_id, outcome, reason, timestamp) ORDER BY (org_id, project_id, key_id, outcome, reason, timestamp, category) TTL timestamp + toIntervalDay(90) SETTINGS index_granularity = 256: while loading database `default` from path /var/lib/clickhouse/metadata/default

sentry-self-hosted-clickhouse-1  |
sentry-self-hosted-clickhouse-1  | 0. DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int, bool) @ 0x8fe7c9a in /usr/bin/clickhouse
sentry-self-hosted-clickhouse-1  | 1. DB::throwAtAssertionFailed(char const*, DB::ReadBuffer&) @ 0x9041917 in /usr/bin/clickhouse
sentry-self-hosted-clickhouse-1  | 2. DB::NamesAndTypesList::readText(DB::ReadBuffer&) @ 0xfcb6a18 in /usr/bin/clickhouse
sentry-self-hosted-clickhouse-1  | 3. DB::IMergeTreeDataPart::loadColumns(bool) @ 0x10d416db in /usr/bin/clickhouse
sentry-self-hosted-clickhouse-1  | 4. DB::IMergeTreeDataPart::loadColumnsChecksumsIndexes(bool, bool) @ 0x10d40b89 in /usr/bin/clickhouse
sentry-self-hosted-clickhouse-1  | 5. ? @ 0x10dd672f in /usr/bin/clickhouse
sentry-self-hosted-clickhouse-1  | 6. ThreadPoolImpl<ThreadFromGlobalPool>::worker(std::__1::__list_iterator<ThreadFromGlobalPool, void*>) @ 0x902b638 in /usr/bin/clickhouse
sentry-self-hosted-clickhouse-1  | 7. ThreadFromGlobalPool::ThreadFromGlobalPool<void ThreadPoolImpl<ThreadFromGlobalPool>::scheduleImpl<void>(std::__1::function<void ()>, int, std::__1::optional<unsigned long>)::'lambda0'()>(void&&, void ThreadPoolImpl<ThreadFromGlobalPool>::scheduleImpl<void>(std::__1::function<void ()>, int, std::__1::optional<unsigned long>)::'lambda0'()&&...)::'lambda'()::operator()() @ 0x902d1df in /usr/bin/clickhouse
sentry-self-hosted-clickhouse-1  | 8. ThreadPoolImpl<std::__1::thread>::worker(std::__1::__list_iterator<std::__1::thread, void*>) @ 0x902891f in /usr/bin/clickhouse
sentry-self-hosted-clickhouse-1  | 9. ? @ 0x902c203 in /usr/bin/clickhouse
sentry-self-hosted-clickhouse-1  | 10. start_thread @ 0x9609 in /usr/lib/x86_64-linux-gnu/libpthread-2.31.so
sentry-self-hosted-clickhouse-1  | 11. clone @ 0x122293 in /usr/lib/x86_64-linux-gnu/libc-2.31.so
sentry-self-hosted-clickhouse-1  |  (version 21.8.13.1.altinitystable (altinity build))
sentry-self-hosted-clickhouse-1  | 2024.03.04 10:08:25.643432 [ 185 ] {} <Error> default.outcomes_raw_local: Detaching broken part /var/lib/clickhouse/data/default/outcomes_raw_local/20240226_4841837_4857253_12830. If it happened after update, it is likely because of backward incompability. You need to resolve this manually
sentry-self-hosted-clickhouse-1  | 2024.03.04 10:08:25.665171 [ 43 ] {} <Error> Application: Caught exception while loading metadata: Code: 231, e.displayText() = DB::Exception: Suspiciously many (12) broken parts to remove.: Cannot attach table `default`.`outcomes_hourly_local` from metadata file /var/lib/clickhouse/metadata/default/outcomes_hourly_local.sql from query ATTACH TABLE default.outcomes_hourly_local (`org_id` UInt64, `project_id` UInt64, `key_id` UInt64, `timestamp` DateTime, `category` UInt8, `outcome` UInt8, `reason` LowCardinality(String), `quantity` UInt64, `times_seen` UInt64, `bytes_received` UInt64) ENGINE = SummingMergeTree PARTITION BY toMonday(timestamp) PRIMARY KEY (org_id, project_id, key_id, outcome, reason, timestamp) ORDER BY (org_id, project_id, key_id, outcome, reason, timestamp, category) TTL timestamp + toIntervalDay(90) SETTINGS index_granularity = 256: while loading database `default` from path /var/lib/clickhouse/metadata/default, Stack trace (when copying this message, always include the lines below):
sentry-self-hosted-clickhouse-1  |
sentry-self-hosted-clickhouse-1  |
sentry-self-hosted-clickhouse-1  | 0. DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int, bool) @ 0x8fe7c9a in /usr/bin/clickhouse
sentry-self-hosted-clickhouse-1  | 1. DB::MergeTreeData::loadDataParts(bool) @ 0x10d88537 in /usr/bin/clickhouse
sentry-self-hosted-clickhouse-1  | 2. DB::StorageMergeTree::StorageMergeTree(DB::StorageID const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, DB::StorageInMemoryMetadata const&, bool, std::__1::shared_ptr<DB::Context>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, DB::MergeTreeData::MergingParams const&, std::__1::unique_ptr<DB::MergeTreeSettings, std::__1::default_delete<DB::MergeTreeSettings> >, bool) @ 0x10fb2857 in /usr/bin/clickhouse
sentry-self-hosted-clickhouse-1  | 3. ? @ 0x10fa7ce7 in /usr/bin/clickhouse
sentry-self-hosted-clickhouse-1  | 4. DB::StorageFactory::get(DB::ASTCreateQuery const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::shared_ptr<DB::Context>, std::__1::shared_ptr<DB::Context>, DB::ColumnsDescription const&, DB::ConstraintsDescription const&, bool) const @ 0x10a44ec1 in /usr/bin/clickhouse
sentry-self-hosted-clickhouse-1  | 5. DB::createTableFromAST(DB::ASTCreateQuery, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::shared_ptr<DB::Context>, bool) @ 0xff9c705 in /usr/bin/clickhouse
sentry-self-hosted-clickhouse-1  | 6. ? @ 0xff9a853 in /usr/bin/clickhouse
sentry-self-hosted-clickhouse-1  | 7. ? @ 0xff9b83f in /usr/bin/clickhouse
sentry-self-hosted-clickhouse-1  | 8. ThreadPoolImpl<ThreadFromGlobalPool>::worker(std::__1::__list_iterator<ThreadFromGlobalPool, void*>) @ 0x902b638 in /usr/bin/clickhouse
sentry-self-hosted-clickhouse-1  | 9. ThreadFromGlobalPool::ThreadFromGlobalPool<void ThreadPoolImpl<ThreadFromGlobalPool>::scheduleImpl<void>(std::__1::function<void ()>, int, std::__1::optional<unsigned long>)::'lambda0'()>(void&&, void ThreadPoolImpl<ThreadFromGlobalPool>::scheduleImpl<void>(std::__1::function<void ()>, int, std::__1::optional<unsigned long>)::'lambda0'()&&...)::'lambda'()::operator()() @ 0x902d1df in /usr/bin/clickhouse
sentry-self-hosted-clickhouse-1  | 10. ThreadPoolImpl<std::__1::thread>::worker(std::__1::__list_iterator<std::__1::thread, void*>) @ 0x902891f in /usr/bin/clickhouse
sentry-self-hosted-clickhouse-1  | 11. ? @ 0x902c203 in /usr/bin/clickhouse
sentry-self-hosted-clickhouse-1  | 12. start_thread @ 0x9609 in /usr/lib/x86_64-linux-gnu/libpthread-2.31.so
sentry-self-hosted-clickhouse-1  | 13. clone @ 0x122293 in /usr/lib/x86_64-linux-gnu/libc-2.31.so
sentry-self-hosted-clickhouse-1  |  (version 21.8.13.1.altinitystable (altinity build))
sentry-self-hosted-clickhouse-1  | 2024.03.04 10:08:25.682096 [ 43 ] {} <Error> Application: DB::Exception: Suspiciously many (12) broken parts to remove.: Cannot attach table `default`.`outcomes_hourly_local` from metadata file /var/lib/clickhouse/metadata/default/outcomes_hourly_local.sql from query ATTACH TABLE default.outcomes_hourly_local (`org_id` UInt64, `project_id` UInt64, `key_id` UInt64, `timestamp` DateTime, `category` UInt8, `outcome` UInt8, `reason` LowCardinality(String), `quantity` UInt64, `times_seen` UInt64, `bytes_received` UInt64) ENGINE = SummingMergeTree PARTITION BY toMonday(timestamp) PRIMARY KEY (org_id, project_id, key_id, outcome, reason, timestamp) ORDER BY (org_id, project_id, key_id, outcome, reason, timestamp, category) TTL timestamp + toIntervalDay(90) SETTINGS index_granularity = 256: while loading database `default` from path /var/lib/clickhouse/metadata/default

And after deleting the volume and trying the upgrade again:

docker rm -v sentry-self-hosted-clickhouse-1

 Bootstrapping and migrating Snuba ...
 Network sentry-self-hosted_default  Creating
 Network sentry-self-hosted_default  Created
 Container sentry-self-hosted-redis-1  Creating
 Container sentry-self-hosted-zookeeper-1  Creating
 Container sentry-self-hosted-clickhouse-1  Creating
 Container sentry-self-hosted-redis-1  Created
 Container sentry-self-hosted-clickhouse-1  Created
 Container sentry-self-hosted-zookeeper-1  Created
 Container sentry-self-hosted-kafka-1  Creating
 Container sentry-self-hosted-kafka-1  Created
 Container sentry-self-hosted-redis-1  Starting
 Container sentry-self-hosted-zookeeper-1  Starting
 Container sentry-self-hosted-clickhouse-1  Starting
 Container sentry-self-hosted-clickhouse-1  Started
 Container sentry-self-hosted-zookeeper-1  Started
 Container sentry-self-hosted-zookeeper-1  Waiting
 Container sentry-self-hosted-redis-1  Started
 Container sentry-self-hosted-zookeeper-1  Healthy
 Container sentry-self-hosted-kafka-1  Starting
 Container sentry-self-hosted-kafka-1  Started
dependency failed to start: container sentry-self-hosted-clickhouse-1 is unhealthy
Error in install/bootstrap-snuba.sh:3.
'$dcr snuba-api bootstrap --no-migrate --force' exited with status 1
-> ./install.sh:main:32
--> install/bootstrap-snuba.sh:source:3

Looks like you've already sent this error to us, we're on it :)

thanks

@victorelec14 (Contributor) commented Mar 4, 2024

I just solved it by increasing `max_suspicious_broken_parts` in the ClickHouse configuration; the default is 10 (100 in newer versions), and I raised it to 1000.

Keep in mind that the parts flagged as broken will be skipped, and you will not be able to recover them.

You can add this to your configuration.

<max_suspicious_broken_parts>1000</max_suspicious_broken_parts>

root@sentry:~/onpremise/clickhouse# cat config.xml
<yandex>
    <max_server_memory_usage_to_ram_ratio>
        <!-- This include is important!
         It is required for the version of Clickhouse
         used on ARM to read the environment variable. -->
        <include from_env="MAX_MEMORY_USAGE_RATIO"/>
    </max_server_memory_usage_to_ram_ratio>
    <logger>
        <level>warning</level>
        <console>true</console>
    </logger>
    <query_thread_log remove="remove"/>
    <query_log remove="remove"/>
    <text_log remove="remove"/>
    <trace_log remove="remove"/>
    <metric_log remove="remove"/>
    <asynchronous_metric_log remove="remove"/>

    <!-- Update: Required for newer versions of Clickhouse -->
    <session_log remove="remove"/>
    <part_log remove="remove"/>

    <profiles>
        <default>
            <log_queries>0</log_queries>
            <log_query_threads>0</log_query_threads>
        </default>
    </profiles>
    <merge_tree>
        <enable_mixed_granularity_parts>1</enable_mixed_granularity_parts>
        <max_suspicious_broken_parts>1000</max_suspicious_broken_parts>
    </merge_tree>
</yandex>
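Instead of editing `config.xml` directly, the same setting can be shipped as a drop-in override, since ClickHouse merges extra XML files from a `config.d` directory. A minimal sketch (the `clickhouse/config.d` path is an assumption; check what your `docker-compose.yml` actually mounts into the container):

```shell
# Write the override as its own file so upgrades to the base config.xml
# don't clobber it. The target directory is illustrative.
mkdir -p clickhouse/config.d
cat > clickhouse/config.d/broken-parts.xml <<'EOF'
<yandex>
    <merge_tree>
        <max_suspicious_broken_parts>1000</max_suspicious_broken_parts>
    </merge_tree>
</yandex>
EOF
# Sanity check: the setting appears exactly once in the override file.
grep -c max_suspicious_broken_parts clickhouse/config.d/broken-parts.xml   # prints: 1
```

After restarting the clickhouse container, the merged configuration should pick up the raised threshold.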

References:
ClickHouse/ClickHouse#41423
ClickHouse/ClickHouse#41619
https://clickhouse.com/docs/en/operations/settings/merge-tree-settings

@azaslavsky (Contributor) commented:
Thanks for throwing up that PR! I've left a comment, but it looks like a good thing to add.

@RoyvanEmpel (Author) commented:

After raising the limit, my Sentry's ClickHouse DB also starts. Thanks.

@github-actions github-actions bot locked and limited conversation to collaborators Mar 21, 2024