
Commit b1316da

Fix: ust-consumer: metadata thread not woken-up after version change
Issue observed
==============

The metadata regeneration test fails, very rarely, in the "streaming"
case on the CI. The interesting part of the test boils down to:

  1) start session
  2) launch an app tracing one event
  3) stop session
  4) delete metadata file
  5) start session
  6) regenerate metadata
  7) stop session
  8) destroy session
  9) read trace: babeltrace fails on an invalid metadata file.

The problem is hard to capture, but modifying the test allows us to see
that there appears to be a short window between steps 7 and 8 where the
metadata file is empty or doesn't exist.

Cause
=====

When metadata is regenerated, its version is bumped and the metadata
cache is "reset". In some cases, such as in this test, the new metadata
will have exactly the same size as before, since nothing happened to
change it (e.g. no new apps/probes were registered).

When this occurs, the metadata thread is not woken up by
consumer_metadata_cache_write() as it sees that max_offset of the
metadata cache didn't change; the data was replaced but it has the same
size.

The metadata consumption thread also checks for version bumps and resets
the amount of consumed metadata. Hence, had the "cache write" operation
woken up the metadata consumption thread, the stream's "ust metadata
pushed" state would have been reset and the new contents would have been
consumed.

Solution
========

The metadata stream's "ust metadata pushed" position is directly reset
to zero when a metadata version change is detected by the metadata
cache. The metadata poll thread is also woken up to resume the
consumption of the newly-available data.

It is unclear why the change to the consumption position was only done
on the metadata consumption thread's code path and not directly by the
session daemon command handling. Note that a session rotation will also
result in a reset of the pushed position and a wake-up of the metadata
poll thread from the command handling thread.

I am speculating that this couldn't be done due to the design of the
locking at the time of the original implementation (I haven't checked).

In implementing this change, the metadata reception code path is
untangled a bit to separate the logic that affects the metadata stream
from the logic that manages the metadata cache. I suspect the original
error stems from a mix-up/confusion between both concerns.

When a metadata version change happens, the metadata cache resets its
'max_offset' (in other words, its current size) and notifies the caller.
The caller then resets the "ust pushed metadata" position to zero and
wakes up the metadata thread to consume the new contents of the metadata
cache.

Known drawbacks
===============

None.

Signed-off-by: Jérémie Galarneau <[email protected]>
Change-Id: I142ef957140d497ac7fc4294ca65a55c12518598
1 parent 934ba8f commit b1316da
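
The cause and solution described in the commit message can be illustrated with a small stand-alone model. This is only a sketch: `toy_stream`, `toy_status`, `old_rule_wakes()`, `toy_classify()` and `toy_react()` are hypothetical names that do not exist in lttng-tools, and the "old rule" assumes, as the Cause section describes, that the pre-fix code compared the end of the write against an unchanged max_offset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the write status and the metadata stream. */
enum toy_status { TOY_NO_CHANGE, TOY_APPENDED, TOY_INVALIDATED };

struct toy_stream {
	uint64_t pushed;      /* models the "ust metadata pushed" position */
	unsigned int wakeups; /* counts wake-ups of the metadata poll thread */
};

/* Pre-fix rule (assumed): wake the poll thread only if the cache grew. */
static bool old_rule_wakes(uint64_t max_offset, uint64_t offset, uint64_t len)
{
	return offset + len > max_offset;
}

/* Post-fix rule: a version change always invalidates, whatever the size. */
static enum toy_status toy_classify(bool version_changed,
		uint64_t original_max_offset, uint64_t new_max_offset)
{
	if (version_changed) {
		return TOY_INVALIDATED;
	} else if (new_max_offset > original_max_offset) {
		return TOY_APPENDED;
	}

	/* The model's cache never shrinks without a version change. */
	assert(new_max_offset == original_max_offset);
	return TOY_NO_CHANGE;
}

/* Caller-side reaction: reset the consumed position, then wake up. */
static void toy_react(struct toy_stream *stream, enum toy_status status)
{
	switch (status) {
	case TOY_INVALIDATED:
		stream->pushed = 0;
		/* Fall-through. */
	case TOY_APPENDED:
		stream->wakeups++;
		break;
	case TOY_NO_CHANGE:
		break;
	}
}
```

For a regeneration that rewrites 100 bytes over a 100-byte cache, `old_rule_wakes(100, 0, 100)` is false, whereas `toy_react(&stream, toy_classify(true, 100, 100))` resets `pushed` to zero and records one wake-up.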


5 files changed, 144 insertions(+), 66 deletions(-)


src/common/consumer/consumer-metadata-cache.c (+35 -52)
@@ -23,6 +23,11 @@
 
 #include "consumer-metadata-cache.h"
 
+enum metadata_cache_update_version_status {
+	METADATA_CACHE_UPDATE_STATUS_VERSION_UPDATED,
+	METADATA_CACHE_UPDATE_STATUS_VERSION_NOT_UPDATED,
+};
+
 extern struct lttng_consumer_global_data consumer_data;
 
 /*
@@ -74,60 +79,23 @@ void metadata_cache_reset(struct consumer_metadata_cache *cache)
  * Check if the metadata cache version changed.
  * If it did, reset the metadata cache.
  * The metadata cache lock MUST be held.
- *
- * Returns 0 on success, a negative value on error.
  */
-static
-int metadata_cache_check_version(struct consumer_metadata_cache *cache,
-		uint64_t version)
+static enum metadata_cache_update_version_status metadata_cache_update_version(
+		struct consumer_metadata_cache *cache, uint64_t version)
 {
-	int ret = 0;
+	enum metadata_cache_update_version_status status;
 
 	if (cache->version == version) {
+		status = METADATA_CACHE_UPDATE_STATUS_VERSION_NOT_UPDATED;
 		goto end;
 	}
 
 	DBG("Metadata cache version update to %" PRIu64, version);
-	metadata_cache_reset(cache);
 	cache->version = version;
+	status = METADATA_CACHE_UPDATE_STATUS_VERSION_UPDATED;
 
 end:
-	return ret;
-}
-
-/*
- * Write a character on the metadata poll pipe to wake the metadata thread.
- * Returns 0 on success, -1 on error.
- */
-int consumer_metadata_wakeup_pipe(const struct lttng_consumer_channel *channel)
-{
-	int ret = 0;
-	const char dummy = 'c';
-
-	if (channel->monitor && channel->metadata_stream) {
-		ssize_t write_ret;
-
-		write_ret = lttng_write(channel->metadata_stream->ust_metadata_poll_pipe[1],
-				&dummy, 1);
-		if (write_ret < 1) {
-			if (errno == EWOULDBLOCK) {
-				/*
-				 * This is fine, the metadata poll thread
-				 * is having a hard time keeping-up, but
-				 * it will eventually wake-up and consume
-				 * the available data.
-				 */
-				ret = 0;
-			} else {
-				PERROR("Wake-up UST metadata pipe");
-				ret = -1;
-				goto end;
-			}
-		}
-	}
-
-end:
-	return ret;
+	return status;
 }
 
 /*
@@ -136,23 +104,31 @@ int consumer_metadata_wakeup_pipe(const struct lttng_consumer_channel *channel)
  * contiguous metadata in cache to the ring buffer. The metadata cache
  * lock MUST be acquired to write in the cache.
  *
- * Return 0 on success, a negative value on error.
+ * See `enum consumer_metadata_cache_write_status` for the meaning of the
+ * various returned status codes.
  */
-int consumer_metadata_cache_write(struct lttng_consumer_channel *channel,
+enum consumer_metadata_cache_write_status
+consumer_metadata_cache_write(struct lttng_consumer_channel *channel,
 		unsigned int offset, unsigned int len, uint64_t version,
 		const char *data)
 {
 	int ret = 0;
 	struct consumer_metadata_cache *cache;
+	enum consumer_metadata_cache_write_status status;
+	bool cache_is_invalidated = false;
+	uint64_t original_max_offset;
 
 	assert(channel);
 	assert(channel->metadata_cache);
 
 	cache = channel->metadata_cache;
+	ASSERT_LOCKED(cache->lock);
+	original_max_offset = cache->max_offset;
 
-	ret = metadata_cache_check_version(cache, version);
-	if (ret < 0) {
-		goto end;
+	if (metadata_cache_update_version(cache, version) ==
+			METADATA_CACHE_UPDATE_STATUS_VERSION_UPDATED) {
+		metadata_cache_reset(cache);
+		cache_is_invalidated = true;
 	}
 
 	DBG("Writing %u bytes from offset %u in metadata cache", len, offset);
@@ -162,18 +138,25 @@ int consumer_metadata_cache_write(struct lttng_consumer_channel *channel,
 				len - cache->cache_alloc_size + offset);
 		if (ret < 0) {
 			ERR("Extending metadata cache");
+			status = CONSUMER_METADATA_CACHE_WRITE_STATUS_ERROR;
 			goto end;
 		}
 	}
 
 	memcpy(cache->data + offset, data, len);
-	if (offset + len > cache->max_offset) {
-		cache->max_offset = offset + len;
-		ret = consumer_metadata_wakeup_pipe(channel);
+	cache->max_offset = max(cache->max_offset, offset + len);
+
+	if (cache_is_invalidated) {
+		status = CONSUMER_METADATA_CACHE_WRITE_STATUS_INVALIDATED;
+	} else if (cache->max_offset > original_max_offset) {
+		status = CONSUMER_METADATA_CACHE_WRITE_STATUS_APPENDED_CONTENT;
+	} else {
+		status = CONSUMER_METADATA_CACHE_WRITE_STATUS_NO_CHANGE;
+		assert(cache->max_offset == original_max_offset);
 	}
 
 end:
-	return ret;
+	return status;
 }
 
 /*

src/common/consumer/consumer-metadata-cache.h (+23 -2)
@@ -11,6 +11,27 @@
 
 #include <common/consumer/consumer.h>
 
+enum consumer_metadata_cache_write_status {
+	CONSUMER_METADATA_CACHE_WRITE_STATUS_ERROR = -1,
+	/*
+	 * New metadata content was appended to the cache successfully.
+	 * Previously available content remains valid.
+	 */
+	CONSUMER_METADATA_CACHE_WRITE_STATUS_APPENDED_CONTENT = 0,
+	/*
+	 * The new content pushed to the cache invalidated the content that
+	 * was already present. The contents of the cache should be re-read.
+	 */
+	CONSUMER_METADATA_CACHE_WRITE_STATUS_INVALIDATED,
+	/*
+	 * A metadata cache write can simply overwrite an already existing
+	 * section of the cache (and it should be a write-through with identical
+	 * data). From the caller's standpoint, there is no change to the state
+	 * of the cache.
+	 */
+	CONSUMER_METADATA_CACHE_WRITE_STATUS_NO_CHANGE,
+};
+
 struct consumer_metadata_cache {
 	char *data;
 	uint64_t cache_alloc_size;
@@ -35,13 +56,13 @@ struct consumer_metadata_cache {
 	pthread_mutex_t lock;
 };
 
-int consumer_metadata_cache_write(struct lttng_consumer_channel *channel,
+enum consumer_metadata_cache_write_status
+consumer_metadata_cache_write(struct lttng_consumer_channel *channel,
 		unsigned int offset, unsigned int len, uint64_t version,
 		const char *data);
 int consumer_metadata_cache_allocate(struct lttng_consumer_channel *channel);
 void consumer_metadata_cache_destroy(struct lttng_consumer_channel *channel);
 int consumer_metadata_cache_flushed(struct lttng_consumer_channel *channel,
 		uint64_t offset, int timer);
-int consumer_metadata_wakeup_pipe(const struct lttng_consumer_channel *channel);
 
 #endif /* CONSUMER_METADATA_CACHE_H */

src/common/consumer/consumer.c (+37)
@@ -877,6 +877,43 @@ static int write_relayd_stream_header(struct lttng_consumer_stream *stream,
 	return outfd;
 }
 
+/*
+ * Write a character on the metadata poll pipe to wake the metadata thread.
+ * Returns 0 on success, -1 on error.
+ */
+int consumer_metadata_wakeup_pipe(const struct lttng_consumer_channel *channel)
+{
+	int ret = 0;
+
+	DBG("Waking up metadata poll thread (writing to pipe): channel name = '%s'",
+			channel->name);
+	if (channel->monitor && channel->metadata_stream) {
+		const char dummy = 'c';
+		const ssize_t write_ret = lttng_write(
+				channel->metadata_stream->ust_metadata_poll_pipe[1],
+				&dummy, 1);
+
+		if (write_ret < 1) {
+			if (errno == EWOULDBLOCK) {
+				/*
+				 * This is fine, the metadata poll thread
+				 * is having a hard time keeping-up, but
+				 * it will eventually wake-up and consume
+				 * the available data.
+				 */
+				ret = 0;
+			} else {
+				PERROR("Failed to write to UST metadata pipe while attempting to wake-up the metadata poll thread");
+				ret = -1;
+				goto end;
+			}
+		}
+	}
+
+end:
+	return ret;
+}
+
 /*
  * Trigger a dump of the metadata content. Following/during the succesful
  * completion of this call, the metadata poll thread will start receiving
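
The wake-up mechanism moved by this hunk is an instance of the "self-pipe" pattern: one byte is written to a pipe that the poll thread watches, and a full pipe is treated as success because a wake-up is already pending. A minimal POSIX sketch of that pattern follows. The `wakeup_pipe_*` helpers are hypothetical and not part of lttng-tools; the sketch calls `write()` directly rather than `lttng_write()` and assumes both pipe ends are non-blocking.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: create a pipe whose two ends are non-blocking. */
static int wakeup_pipe_create(int fds[2])
{
	int i;

	if (pipe(fds) < 0) {
		return -1;
	}

	for (i = 0; i < 2; i++) {
		const int flags = fcntl(fds[i], F_GETFL);

		if (flags < 0 || fcntl(fds[i], F_SETFL, flags | O_NONBLOCK) < 0) {
			close(fds[0]);
			close(fds[1]);
			return -1;
		}
	}

	return 0;
}

/* Write one byte to wake the reader. Returns 0 on success, -1 on error. */
static int wakeup_pipe_notify(int write_fd)
{
	const char dummy = 'c';
	ssize_t ret;

	assert(write_fd >= 0);
	do {
		ret = write(write_fd, &dummy, 1);
	} while (ret < 0 && errno == EINTR);

	if (ret == 1) {
		return 0;
	}
	if (ret < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
		/* Pipe is full: the reader already has wake-ups pending. */
		return 0;
	}
	return -1;
}

/* Consume all pending wake-up bytes. Returns the count, or -1 on error. */
static long wakeup_pipe_drain(int read_fd)
{
	char buf[64];
	long total = 0;

	for (;;) {
		const ssize_t ret = read(read_fd, buf, sizeof(buf));

		if (ret > 0) {
			total += ret;
		} else if (ret == 0) {
			break; /* Write end closed. */
		} else if (errno == EAGAIN || errno == EWOULDBLOCK) {
			break; /* Nothing left to read. */
		} else if (errno != EINTR) {
			return -1;
		}
	}

	return total;
}
```

A reader thread would typically `poll()` on the read end for `POLLIN`, call `wakeup_pipe_drain()`, then consume whatever data became available.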

src/common/consumer/consumer.h (+1)
@@ -1052,5 +1052,6 @@ enum lttcomm_return_code lttng_consumer_init_command(
 int lttng_consumer_clear_channel(struct lttng_consumer_channel *channel);
 enum lttcomm_return_code lttng_consumer_open_channel_packets(
 		struct lttng_consumer_channel *channel);
+int consumer_metadata_wakeup_pipe(const struct lttng_consumer_channel *channel);
 
 #endif /* LIB_CONSUMER_H */

src/common/ust-consumer/ust-consumer.c (+48 -12)
@@ -1283,6 +1283,17 @@ static int snapshot_channel(struct lttng_consumer_channel *channel,
 	return ret;
 }
 
+static
+void metadata_stream_reset_cache_consumed_position(
+		struct lttng_consumer_stream *stream)
+{
+	ASSERT_LOCKED(stream->lock);
+
+	DBG("Reset metadata cache of session %" PRIu64,
+			stream->chan->session_id);
+	stream->ust_metadata_pushed = 0;
+}
+
 /*
  * Receive the metadata updates from the sessiond. Supports receiving
  * overlapping metadata, but is needs to always belong to a contiguous
@@ -1297,6 +1308,7 @@ int lttng_ustconsumer_recv_metadata(int sock, uint64_t key, uint64_t offset,
 {
 	int ret, ret_code = LTTCOMM_CONSUMERD_SUCCESS;
 	char *metadata_str;
+	enum consumer_metadata_cache_write_status cache_write_status;
 
 	DBG("UST consumer push metadata key %" PRIu64 " of len %" PRIu64, key, len);
 
@@ -1320,10 +1332,40 @@ int lttng_ustconsumer_recv_metadata(int sock, uint64_t key, uint64_t offset,
 	health_code_update();
 
 	pthread_mutex_lock(&channel->metadata_cache->lock);
-	ret = consumer_metadata_cache_write(channel, offset, len, version,
-			metadata_str);
+	cache_write_status = consumer_metadata_cache_write(
+			channel, offset, len, version, metadata_str);
 	pthread_mutex_unlock(&channel->metadata_cache->lock);
-	if (ret < 0) {
+	switch (cache_write_status) {
+	case CONSUMER_METADATA_CACHE_WRITE_STATUS_NO_CHANGE:
+		/*
+		 * The write entirely overlapped with existing contents of the
+		 * same metadata version (same content); there is nothing to do.
+		 */
+		break;
+	case CONSUMER_METADATA_CACHE_WRITE_STATUS_INVALIDATED:
+		/*
+		 * The metadata cache was invalidated (previously pushed
+		 * content has been overwritten). Reset the stream's consumed
+		 * metadata position to ensure the metadata poll thread consumes
+		 * the whole cache.
+		 */
+		pthread_mutex_lock(&channel->metadata_stream->lock);
+		metadata_stream_reset_cache_consumed_position(
+				channel->metadata_stream);
+		pthread_mutex_unlock(&channel->metadata_stream->lock);
+		/* Fall-through. */
+	case CONSUMER_METADATA_CACHE_WRITE_STATUS_APPENDED_CONTENT:
+		/*
+		 * In both cases, the metadata poll thread has new data to
+		 * consume.
+		 */
+		ret = consumer_metadata_wakeup_pipe(channel);
+		if (ret) {
+			ret_code = LTTCOMM_CONSUMERD_ERROR_METADATA;
+			goto end_free;
+		}
+		break;
+	case CONSUMER_METADATA_CACHE_WRITE_STATUS_ERROR:
 		/* Unable to handle metadata. Notify session daemon. */
 		ret_code = LTTCOMM_CONSUMERD_ERROR_METADATA;
 		/*
@@ -1332,6 +1374,8 @@ int lttng_ustconsumer_recv_metadata(int sock, uint64_t key, uint64_t offset,
 		 * waiting for the metadata cache to be flushed.
 		 */
 		goto end_free;
+	default:
+		abort();
 	}
 
 	if (!wait) {
@@ -2464,15 +2508,6 @@ int lttng_ustconsumer_close_wakeup_fd(struct lttng_consumer_stream *stream)
 	return ustctl_stream_close_wakeup_fd(stream->ustream);
 }
 
-static
-void metadata_stream_reset_cache_consumed_position(
-		struct lttng_consumer_stream *stream)
-{
-	DBG("Reset metadata cache of session %" PRIu64,
-			stream->chan->session_id);
-	stream->ust_metadata_pushed = 0;
-}
-
 /*
  * Write up to one packet from the metadata cache to the channel.
  *
@@ -3051,6 +3086,7 @@ int lttng_ustconsumer_data_pending(struct lttng_consumer_stream *stream)
 
 	assert(stream);
 	assert(stream->ustream);
+	ASSERT_LOCKED(stream->lock);
 
 	DBG("UST consumer checking data pending");
 