
Laravel websockets periodically failed to broadcast event #717

Closed
rakshitbharat opened this issue Mar 17, 2021 · 11 comments
Labels
help wanted (Extra attention is needed) · network (Issues caused by the network configuration)

Comments

@rakshitbharat

We are a small team and we are having trouble fixing the following bug:

We are implementing event broadcasting in a Laravel project using beyondcode/laravel-websockets, with a Vue.js client that listens on an Echo private channel. The first event broadcasts fine, but the next broadcast is not delivered to the client. After at least one minute with no activity, broadcasting works again.

What is strange is that this happens only on the production server; the development server is fine.

backend:
laravel 8.0
jetstream 1.3
pusher-php-server 4.1
inertiajs 0.2.4
beyondcode/laravel-websockets 1.11.1

frontend:
axios 0.19
pusher-js 7.0
vue 2.6.11
tailwindcss 2.0.3
laravel-echo 1.8.1

@simonbuehler
Contributor

Are you using a queue? Without a better diff of the dev vs. production config, this is not enough information to help.

@fardiansyah-bioelite

fardiansyah-bioelite commented Mar 19, 2021

Hi @simonbuehler, we have exactly the same issue as above. laravel-websockets is running behind an nginx proxy with SSL. The dev and prod servers use the same config, as follows:

location @ws {
    proxy_pass            http://127.0.0.1:6001;
    proxy_set_header      Host $host;
    proxy_read_timeout    60;
    proxy_connect_timeout 60;
    proxy_redirect        off;

    proxy_http_version 1.1;
    proxy_set_header   Upgrade $http_upgrade;
    proxy_set_header   Connection "upgrade";
    proxy_cache_bypass $http_upgrade;
}

After switching to a Laravel queue, event broadcasting is better, but it sometimes still hits the issue (an event not firing). Any suggestions for handling this situation? We hit the issue even with only one active user connected to the server.

We are on a cloud VM with 3 vCPUs, 4 GB RAM, and an SSD.
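
For reference, raising the proxy timeouts is one thing we may try: with proxy_read_timeout 60, nginx closes a proxied WebSocket connection after 60 seconds without data, which lines up with the one-minute symptom. This is only a guess on our side, not a confirmed fix; a sketch of the adjusted location block:

location @ws {
    proxy_pass         http://127.0.0.1:6001;
    proxy_http_version 1.1;
    proxy_set_header   Upgrade $http_upgrade;
    proxy_set_header   Connection "upgrade";
    proxy_set_header   Host $host;

    # Keep idle WebSocket connections open much longer than the 60s default;
    # pings from the client or server must still arrive within this window.
    proxy_read_timeout 3600;
    proxy_send_timeout 3600;
}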

@simonbuehler
Contributor

Maybe first try another BROADCAST_DRIVER (log) and see whether the events always get fired then. Do the events use broadcast() or broadcastNow()?
What does the debug log of artisan websockets:serve show, and does queue:listen show the events coming in?
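
To illustrate the difference I mean (a minimal sketch with a made-up OrderShipped event, not code from your project):

<?php

use Illuminate\Broadcasting\PrivateChannel;
use Illuminate\Contracts\Broadcasting\ShouldBroadcast;    // queued first, then broadcast by a worker
use Illuminate\Contracts\Broadcasting\ShouldBroadcastNow; // broadcast immediately, bypassing the queue

// Hypothetical event, for illustration only.
class OrderShipped implements ShouldBroadcast // swap in ShouldBroadcastNow to skip the queue
{
    public function broadcastOn()
    {
        return new PrivateChannel('orders.1');
    }
}

// Fired from a controller; with ShouldBroadcast this sits on the configured
// queue connection until a worker picks it up and pushes it to the server.
broadcast(new OrderShipped());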

@shoemoney

Before implementing the nchan.io module, I found that adding a buffer in nginx helped if the queue is sync. If you're using async queues, then that is very odd. Using BROADCAST_DRIVER=log is a great suggestion to make sure they are firing.

I highly recommend the nchan module, though... amazing flexibility on the frontend for pub/sub and sockets. It comes pretty much by default with all nginx installs.

https://nchan.io/

@fardiansyah-bioelite

fardiansyah-bioelite commented Mar 23, 2021

> Maybe first try another BROADCAST_DRIVER (log) and see whether the events always get fired then. Do the events use broadcast() or broadcastNow()?
> What does the debug log of artisan websockets:serve show, and does queue:listen show the events coming in?

Actually, I have several broadcast events. Some of them run in scheduled jobs and others in controllers. Using BROADCAST_DRIVER=log, I see that not all of these events get fired. Another exception in the log is below:

[2021-03-23 19:42:22] local.ERROR: pcntl_alarm() expects parameter 1 to be int, string given {"exception":"[object] (ErrorException(code: 0): pcntl_alarm() expects parameter 1 to be int, string given at /project_folder/vendor/laravel/framework/src/Illuminate/Queue/Worker.php:211)

I think the exception above is related to the following exception, so the event failed to get fired:

[2021-03-23 19:42:35] local.ERROR: App\Events\TimeoutWarningSent has been attempted too many times or run too long. The job may have previously timed out. {"exception":"[object] (Illuminate\\Queue\\MaxAttemptsExceededException(code: 0): App\\Events\\TimeoutWarningSent has been attempted too many times or run too long. The job may have previously timed out. at /project_folder/vendor/laravel/framework/src/Illuminate/Queue/Worker.php:717)

On queue:listen the event comes in, but after the Processing line it is shown as Failed instead of Processed.

@fardiansyah-bioelite

> Before implementing the nchan.io module, I found that adding a buffer in nginx helped if the queue is sync. If you're using async queues, then that is very odd. Using BROADCAST_DRIVER=log is a great suggestion to make sure they are firing.
>
> I highly recommend the nchan module, though... amazing flexibility on the frontend for pub/sub and sockets. It comes pretty much by default with all nginx installs.
>
> https://nchan.io/

Hi @shoemoney,

I currently use Redis as the queue driver. With BROADCAST_DRIVER set to log, the result is as shown in the log above.

Thanks for recommending nchan; I will take a look at it.

@shoemoney

> > Maybe first try another BROADCAST_DRIVER (log) and see whether the events always get fired then. Do the events use broadcast() or broadcastNow()?
> > What does the debug log of artisan websockets:serve show, and does queue:listen show the events coming in?
>
> Actually, I have several broadcast events. Some of them run in scheduled jobs and others in controllers. Using BROADCAST_DRIVER=log, I see that not all of these events get fired. Another exception in the log is below:
>
> [2021-03-23 19:42:22] local.ERROR: pcntl_alarm() expects parameter 1 to be int, string given {"exception":"[object] (ErrorException(code: 0): pcntl_alarm() expects parameter 1 to be int, string given at /project_folder/vendor/laravel/framework/src/Illuminate/Queue/Worker.php:211)
>
> I think the exception above is related to the following exception, so the event failed to get fired:
>
> [2021-03-23 19:42:35] local.ERROR: App\Events\TimeoutWarningSent has been attempted too many times or run too long. The job may have previously timed out. {"exception":"[object] (Illuminate\\Queue\\MaxAttemptsExceededException(code: 0): App\\Events\\TimeoutWarningSent has been attempted too many times or run too long. The job may have previously timed out. at /project_folder/vendor/laravel/framework/src/Illuminate/Queue/Worker.php:717)
>
> On queue:listen the event comes in, but after the Processing line it is shown as Failed instead of Processed.

I am not sure you can run it with timeouts like you are... the pcntl extension doesn't take a string the way the job scheduler does. Hopefully the BeyondCode people will chime in.

I have only run it as a daemon with the artisan command, so I am not sure about that.
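
If that timeout is wired through .env somewhere, a cast might be all it takes. Just a guess based on the error, using your TimeoutWarningSent event as the example and a made-up env key:

<?php

use Illuminate\Broadcasting\PrivateChannel;
use Illuminate\Contracts\Broadcasting\ShouldBroadcast;

class TimeoutWarningSent implements ShouldBroadcast
{
    // The queued BroadcastEvent job copies a public $timeout from the event,
    // and the worker hands it to pcntl_alarm(), which needs an int.
    public $timeout;

    public function __construct()
    {
        // env() returns strings for .env values, so cast it. BROADCAST_EVENT_TIMEOUT
        // is a hypothetical key; use whatever your .env actually defines.
        $this->timeout = (int) env('BROADCAST_EVENT_TIMEOUT', 60);
    }

    public function broadcastOn()
    {
        return new PrivateChannel('warnings');
    }
}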

@shoemoney

> > Before implementing the nchan.io module, I found that adding a buffer in nginx helped if the queue is sync. If you're using async queues, then that is very odd. Using BROADCAST_DRIVER=log is a great suggestion to make sure they are firing.
> > I highly recommend the nchan module, though... amazing flexibility on the frontend for pub/sub and sockets. It comes pretty much by default with all nginx installs.
> > https://nchan.io/
>
> Hi @shoemoney,
>
> I currently use Redis as the queue driver. With BROADCAST_DRIVER set to log, the result is as shown in the log above.
>
> Thanks for recommending nchan; I will take a look at it.

Yeah, if you use nginx, I think you will be very impressed. It is dynamic and flexible and works great with socket.io. You can match on just about anything.

For instance:

location ~ /pricestreams/(\w+)$ {
    nchan_pubsub;
    nchan_channel_id "$1" "OHLCV";
    nchan_group_max_subscribers 25;
    nchan_store_messages on;
    nchan_message_buffer_length 0;

    nchan_unsubscribe_request /upstream/unsub;
    nchan_subscribe_request /upstream/sub;

    nchan_websocket_ping_interval 30;
    nchan_websocket_client_heartbeat _ping _pong;
}

location = /pub/lastUpdates {
    nchan_publisher;
    nchan_channel_id "lastUpdates";

    nchan_store_messages off;
    nchan_message_buffer_length 0;
}

location ~ /pub/lastUpdates/(\w+)$ {
    nchan_publisher;
    nchan_channel_id "lastUpdates/$1";
    nchan_store_messages off;
    nchan_message_buffer_length 0;
}

location ~ /sub/account/(.*)$ {
    nchan_subscriber;
    nchan_channel_id "account/$1";
    nchan_group_max_subscribers 5;
    nchan_unsubscribe_request /upstream/unsub;
    nchan_subscribe_request /upstream/sub;
}

location ~ /pub/account/(.*)$ {
    nchan_publisher;
    nchan_channel_id "account/$1";

    nchan_store_messages off;
    nchan_message_buffer_length 0;
}

location = /upstream/unsub {
    proxy_pass https://api-{{ hostname }}/api/wsCallback/unsubscribe;
    proxy_ignore_client_abort on;
    proxy_set_header X-Subscriber-Type $nchan_subscriber_type;
    proxy_set_header X-Subscriber-Addr $remote_addr;
    proxy_set_header X-Channel-Id $nchan_channel_id;
    proxy_set_header X-Original-URI $request_uri;
}

location = /upstream/sub {
    proxy_pass https://api-{{ hostname }}/api/wsCallback/subscribe;
    proxy_set_header X-Subscriber-Type $nchan_subscriber_type;
    proxy_set_header X-Subscriber-Addr $remote_addr;
    proxy_set_header X-Message-Id $nchan_message_id;
    proxy_set_header X-Channel-Id $nchan_channel_id;
    proxy_set_header X-Original-URI $request_uri;
}

location /status {
    nchan_stub_status;
}

@fardiansyah-bioelite

fardiansyah-bioelite commented Mar 24, 2021

Hi @shoemoney,
Thanks for your great example. Honestly, nchan is a new thing for me; I had not heard about it before. Based on your experience, have you implemented horizontal scaling using nchan and socket.io?

We plan to try a third-party service for easier scaling and to reduce the complexity of managing a websockets server.

@shoemoney

> Hi @shoemoney,
> Thanks for your great example. Honestly, nchan is a new thing for me; I had not heard about it before. Based on your experience, have you implemented horizontal scaling using nchan and socket.io?
>
> We plan to try a third-party service for easier scaling and to reduce the complexity of managing a websockets server.

Yes. Because you can use Redis with it, it can scale to the moon.
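
The gist of the scaling setup (a rough sketch, with a placeholder Redis URL): point every nginx node at the same Redis server and they all see the same channels.

http {
    # Shared Redis server for all nchan nodes (placeholder URL).
    nchan_redis_url "redis://127.0.0.1:6379";

    server {
        listen 80;

        location ~ /pricestreams/(\w+)$ {
            nchan_pubsub;
            nchan_channel_id "$1";
            nchan_use_redis on;  # store channel state in Redis, not per-node memory
        }
    }
}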

Side note: I just started using this one this morning and am very impressed: https://github.com/walkor/phpsocket.io

@rennokki added the help wanted and network labels on Mar 30, 2021
@mdprotacio
Contributor

> What is strange is that this happens only on the production server; the development server is fine.

I'd assume you have horizontal scaling in place and multiple instances of the websocket server? That was also our problem, and we addressed it with #778. There are quite a lot of issues that we encountered and tried to address.

@mpociot closed this as completed on Feb 7, 2024