Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug in lock-free SpscFifo::size() & SpscFifo::is_full() method #2156

Closed
elfenpiff opened this issue Jan 9, 2024 · 5 comments · Fixed by #2205
Closed

Bug in lock-free SpscFifo::size() & SpscFifo::is_full() method #2156

elfenpiff opened this issue Jan 9, 2024 · 5 comments · Fixed by #2205
Labels
question Further information is requested

Comments

@elfenpiff
Copy link
Contributor

Required information

Races that lead to random behavior in SpscFifo::size() and SpscFifo::is_full() method.
The SpscFifo has the contract that at most one thread can call push and at most one thread can call pop concurrently. All other methods can be called without restriction, since they do not modify the fifo.
One reasonable assumption is that size() returns in the range of [0; capacity[ and that the size it returns was valid at some timepoint in the past.

The implementation of the size method looks like:

return m_write_pos.load(std::memory_order_relaxed) - m_read_pos.load(std::memory_order_relaxed);

In the background, C++ will call the free function uint64_t operator-(uint64_t lhs, uint64_t rhs);. The evaluation order of the input arguments of the function can be freely chosen by the compiler.

Let's assume the following scenario.

  1. In the beginning, the value of m_write_pos == 5 and m_read_pos == 3
  2. The user calls SpscFifo::size() from thread A. After the value of m_write_pos.load(..) - (5) - is acquired the OS interrupts the method.
  3. Now, thread B and thread C are performing multiple push and pop operations according to the contract.
    The new state after the operations is m_write_pos == 8 and m_read_pos == 6.
  4. The OS schedules thread A again, and it continues by reading the value of m_read_pos.load(..) - (6) -. Now the subtraction is performed 5 - 6 == -1 which lead to an underflow. The underflow of an unsigned integer is well-defined, and the result is 2^64.
  5. The size() method returns 2^64.

Result: the size() method returns garbage values in a concurrent scenario.

is_full method has the same problem, but the severity should not be so severe since it returns a bool, but it can still happen that is_full returns a state that never existed in the past. If an algorithm is requiring that at least at some point in time the state was valid, it will fail.

Fix

Update the algorithm to

uint64_t write_pos = 0;
uint64_t read_pos = 0;
uint64_t size = 0;
do {
  // load current position value
  write_pos = m_write_pos.load(std::memory_order_relaxed);
  read_pos = m_read_pos.load(std::memory_order_relaxed);
  size = write_pos - read_pos;
  // ensure that the values did not change during the size calculation
} while (write_pos != m_write_pos.load(std::memory_order_relaxed) || read_pos != m_read_pos.load(std::memory_order_relaxed))
@budrus
Copy link
Contributor

budrus commented Jan 15, 2024

@elfenpiff I thought SoFi was an overflowing SPSC queue. With this in mind, would it be OK?. I also have doubts about your fix. You still have the problem as the other threads can change the world between the initial write and read position loads, or?

I think the contract has to be that only one thread changes the write position

@elBoberido
Copy link
Member

@budrus you misread the issue title. This is related to the FIFO not the SOFI and the SOFI already has the do-while loop. We just need to port it over.

@budrus
Copy link
Contributor

budrus commented Jan 15, 2024

Ahh, sorry I was coming from #2160 which is about SOFI. But the name is even SpScFiFo so we are talking about single producer + single consumer. My definition of this would be a single thread pushing and a single thread poping. Also not allowing the same publisher or subscriber to be used in more than one thread without something like a memory synchronizing mutex

@elBoberido
Copy link
Member

@budrus yeah, you are right. I totally overlooked that. this is a complete misuse of the SpscFifo. @elfenpiff do you have a use case for calling size or is_full from another thread than the producer or consumer thread?

@mossmaurice mossmaurice added bug Something isn't working question Further information is requested and removed bug Something isn't working labels Jan 22, 2024
@elBoberido
Copy link
Member

@elfenpiff closing this issue since the calling size and is_full from a different thread than the producer or consumer thread is not supported, at least not when these are running. The information one obtains from these functions is also absolutely useless in a third thread as long as the producer and/or consumer thread are running. If your use case relates to getting these information when either the producer or consumer thread isn't running anymore, then the current logic is safe to use and does not result in the issue you described.

@albtam albtam mentioned this issue Feb 25, 2024
22 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants