Update lib/events/dynamoevents to use aws-sdk-go-v2 #44363

Merged
rosstimothy merged 1 commit into master from tross/aws-sdk-v2-dynamoevents
Jul 23, 2024

Conversation

@rosstimothy
Contributor

This continues the conversion of DynamoDB components to the latest version of the AWS SDK, which was started in #44356.

This should have feature parity with the existing backend except for Prometheus metrics. To keep the changes here isolated, the metrics are omitted for the time being and will be added in a follow-up.

In addition, a few of the events test suite cases were updated to be more reliable when testing against a real backend.

@rosstimothy rosstimothy added the no-changelog Indicates that a PR does not require a changelog entry label Jul 17, 2024
@rosstimothy rosstimothy force-pushed the tross/aws-sdk-v2-dynamoevents branch 2 times, most recently from 953de8d to 811d9ac Compare July 18, 2024 13:19
@rosstimothy rosstimothy force-pushed the tross/aws-sdk-v2-dynamo branch 4 times, most recently from b08d9ec to eb52d5b Compare July 19, 2024 15:56
Base automatically changed from tross/aws-sdk-v2-dynamo to master July 19, 2024 16:43
@rosstimothy rosstimothy force-pushed the tross/aws-sdk-v2-dynamoevents branch from 811d9ac to 55311ac Compare July 19, 2024 19:06
@rosstimothy rosstimothy marked this pull request as ready for review July 19, 2024 19:47
@github-actions github-actions bot requested review from lxea and smallinsky July 19, 2024 19:48
@github-actions github-actions bot added audit-log Issues related to Teleports Audit Log size/sm labels Jul 19, 2024
@rosstimothy
Contributor Author

Friendly ping @smallinsky @lxea

@rosstimothy rosstimothy requested a review from zmb3 July 23, 2024 15:45
@public-teleport-github-review-bot public-teleport-github-review-bot bot removed the request for review from lxea July 23, 2024 16:13
@rosstimothy rosstimothy added this pull request to the merge queue Jul 23, 2024
Merged via the queue into master with commit 80e6b4d Jul 23, 2024
@rosstimothy rosstimothy deleted the tross/aws-sdk-v2-dynamoevents branch July 23, 2024 16:32
hugoShaka added a commit that referenced this pull request Sep 5, 2025
This commit fixes a data loss bug that caused the DynamoDB cursor
EventIndex field to be slightly altered by a lossy number conversion.
As this field is used to index events, paginated queries could return
the wrong events, from either before or after the requested page. In
the worst case, this could cause a livelock as the query continuously
processes the same events.

The data loss is caused by improper JSON unmarshalling of large
integers. It happened for these reasons:
- JSON has a single number type for both integers and floats, and many
  decoders treat it as "binary64". Go's encoding/json library uses the
  destination field type to decide whether a number is stored in an
  int64 or a float64; with an untyped destination such as `any`, it
  uses float64 by default.
- [The AWS SDK v2 migration PR](#44363)
  changed the cursor JSON unmarshalling logic to unmarshal the
  cursor into `map[string]any`. This caused every integer field of
  `event` to round-trip through float64.
- [The Emit event fallback PR](#40854)
  changed the EventIndex value from a small incremental integer to a
  large Unix nanosecond timestamp in case of conflict. Such a large
  value can no longer be stored exactly in a float64.
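
The precision loss described in the list above can be reproduced with a
minimal sketch. The `cursor` type, the helper names, and the sample
timestamp are hypothetical and only illustrate the mechanism; they are
not the actual Teleport types or the actual fix.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// cursor is a hypothetical stand-in for the pagination cursor.
type cursor struct {
	EventIndex int64 `json:"EventIndex"`
}

// viaMap decodes into map[string]any, so encoding/json stores the JSON
// number as a float64 before it is converted back to an int64.
func viaMap(data []byte) (int64, error) {
	var m map[string]any
	if err := json.Unmarshal(data, &m); err != nil {
		return 0, err
	}
	f, ok := m["EventIndex"].(float64)
	if !ok {
		return 0, fmt.Errorf("EventIndex is missing or not a number")
	}
	return int64(f), nil
}

// viaStruct decodes into a typed int64 field, so the value is parsed
// as an integer and never passes through float64.
func viaStruct(data []byte) (int64, error) {
	var c cursor
	if err := json.Unmarshal(data, &c); err != nil {
		return 0, err
	}
	return c.EventIndex, nil
}

func main() {
	// A Unix nanosecond timestamp (assumed sample value), far above 2^53.
	const original int64 = 1721750400123456789

	data, err := json.Marshal(cursor{EventIndex: original})
	if err != nil {
		panic(err)
	}
	lossy, err := viaMap(data)
	if err != nil {
		panic(err)
	}
	exact, err := viaStruct(data)
	if err != nil {
		panic(err)
	}
	fmt.Println("original:  ", original)
	fmt.Println("via map:   ", lossy, "(differs:", lossy != original, ")")
	fmt.Println("via struct:", exact, "(differs:", exact != original, ")")
}
```

At this magnitude, adjacent float64 values are 256 apart, so the
map-based decode can move the index by up to 128 nanoseconds in either
direction, while the typed decode returns it unchanged.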

The combination of these three factors corrupted the cursor EventIndex
and caused unexpected event query index offsets. When presented with a
nonexistent document, DynamoDB still hashes it and starts the query
from its supposed location in the index. This is why the issue went
undetected for so long. Its consequences were:
- duplicated events returned on two consecutive pages (this case was
  handled properly by the event forwarder as it keeps track of the last
  processed event)
- livelock if the number of duplicated events exceeds the page size
- non-forwarded events if the index offset was in the future
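
The consequences listed above can be illustrated with a small in-memory
simulation. This is a sketch under simplifying assumptions: a sorted
slice stands in for the DynamoDB index, a page is "the next N indexes
strictly greater than the cursor", and all names (`queryPage`,
`paginate`, `lossy`, `exact`) are hypothetical rather than taken from
the Teleport code.

```go
package main

import "fmt"

// queryPage returns up to pageSize indexes strictly greater than start.
// sorted must be in ascending order.
func queryPage(sorted []int64, start int64, pageSize int) []int64 {
	var page []int64
	for _, idx := range sorted {
		if idx > start {
			page = append(page, idx)
			if len(page) == pageSize {
				break
			}
		}
	}
	return page
}

// lossy mimics a cursor whose index round-trips through float64.
func lossy(i int64) int64 { return int64(float64(i)) }

// exact mimics a cursor that preserves the index.
func exact(i int64) int64 { return i }

// paginate reads pages until one comes back empty. It reports
// stalled=true if maxPages pages were read without reaching the end.
func paginate(sorted []int64, pageSize, maxPages int, roundTrip func(int64) int64) (delivered []int64, stalled bool) {
	var cursor int64
	for i := 0; i < maxPages; i++ {
		page := queryPage(sorted, cursor, pageSize)
		if len(page) == 0 {
			return delivered, false
		}
		delivered = append(delivered, page...)
		// The next cursor is derived from the last event of the page.
		cursor = roundTrip(page[len(page)-1])
	}
	return delivered, true
}

func main() {
	// A nanosecond-scale base aligned to 256, the float64 spacing at
	// this magnitude, so base itself survives the round trip.
	base := int64(1721750400123456789) &^ 255

	// Four events within 4ns: every lossy cursor rounds down to base.
	dense := []int64{base + 1, base + 2, base + 3, base + 4}
	_, stalled := paginate(dense, 2, 10, exact)
	fmt.Println("exact cursor stalled:", stalled)
	_, stalled = paginate(dense, 2, 10, lossy)
	fmt.Println("lossy cursor stalled:", stalled)

	// The cursor for base+200 rounds up to base+256 and skips base+250.
	sparse := []int64{base + 200, base + 250}
	got, _ := paginate(sparse, 1, 10, lossy)
	fmt.Println("delivered with lossy cursor:", len(got), "of", len(sparse))
}
```

In the dense case the lossy cursor always lands before the first event
of the page, so the same page is returned until the page limit is hit.
In the sparse case the cursor rounds up past an event that is then
never delivered.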
hugoShaka added a commit that referenced this pull request Sep 5, 2025
hugoShaka added a commit that referenced this pull request Sep 5, 2025
hugoShaka added a commit that referenced this pull request Sep 8, 2025
github-merge-queue bot pushed a commit that referenced this pull request Sep 9, 2025
backport-bot-workflows bot pushed a commit that referenced this pull request Sep 9, 2025
hugoShaka added a commit that referenced this pull request Sep 11, 2025
github-merge-queue bot pushed a commit that referenced this pull request Sep 11, 2025
github-merge-queue bot pushed a commit that referenced this pull request Sep 11, 2025
mmcallister pushed a commit that referenced this pull request Sep 22, 2025
Labels

audit-log (Issues related to Teleports Audit Log), no-changelog (Indicates that a PR does not require a changelog entry), size/sm

3 participants