Server has a large number of CLOSE_WAIT (client has already closed) #633

Closed
cool-colo opened this issue Aug 24, 2016 · 11 comments

cool-colo commented Aug 24, 2016

Configuration: SRS as the edge server, nginx rtmp as the origin server.
Reproduction steps:

  1. Request non-existent streams over HTTP-FLV (requesting repeatedly with VLC has the same effect).
    for i in {1..2000}; do curl -m 1 "http://192.168.5.222:8090/live/livestream${i}.flv"; done
    Results:
  2. A large number of connections are left in CLOSE_WAIT.
    netstat -an | grep 8090 | grep CLOSE_WAIT | wc -l
    1963
  3. The edge server keeps trying to pull the 2000 non-existent streams from the origin server.

Note: This situation does not occur with rtmp requests. Could it be that the http server side is not closing the file descriptor?
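
The netstat count above can also be taken programmatically. The following is a minimal sketch, not part of SRS, assuming Linux and the IPv4 text layout of /proc/net/tcp, where the hexadecimal state code 08 denotes CLOSE_WAIT. The function name count_close_wait is hypothetical.

```cpp
#include <cassert>
#include <cstdio>
#include <sstream>
#include <string>

// Counts sockets in CLOSE_WAIT (state code 08) whose local port equals `port`,
// given text laid out like Linux /proc/net/tcp (IPv4 only).
int count_close_wait(const std::string& proc_net_tcp, unsigned port)
{
    assert(port <= 65535);

    std::istringstream in(proc_net_tcp);
    std::string line;
    int n = 0;

    while (std::getline(in, line)) {
        unsigned local_port = 0;
        unsigned state = 0;
        // Row layout: "sl: local_ip:local_port remote_ip:remote_port st ...", all in hex.
        // The header row does not match and is skipped.
        int matched = std::sscanf(line.c_str(),
            " %*d: %*8[0-9A-Fa-f]:%x %*8[0-9A-Fa-f]:%*x %x",
            &local_port, &state);
        if (matched != 2) {
            continue;
        }
        if (local_port == port && state == 0x08) {
            n++;
        }
    }
    return n;
}
```

To use it, read the whole of /proc/net/tcp into a string (for example with std::ifstream) and call count_close_wait(text, 8090).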

Configuration file:
listen 1936;
max_connections 8196;
pid objs/srs.pid;
srs_log_level verbose;
srs_log_file objs/srs.log;
http_server {
    enabled on;
    listen 8090;
    dir ./objs/nginx/html;
}
vhost __defaultVhost__ {
    mode remote;
    origin 127.0.0.1:1939;
    http_remux {
        enabled on;
        mount [vhost]/[app]/[stream].flv;
        hstrs on;
    }
}

SRS log:
[2016-08-24 13:52:29.861][trace][18243][1012] edge pull connected, url=rtmp://127.0.0.1:1939/live/livestream146, server=127.0.0.1:1939
[2016-08-24 13:52:29.861][trace][18243][6114] edge pull connected, url=rtmp://127.0.0.1:1939/live/livestream1002, server=127.0.0.1:1939
[2016-08-24 13:52:29.861][trace][18243][10044] edge pull connected, url=rtmp://127.0.0.1:1939/live/livestream1671, server=127.0.0.1:1939
[2016-08-24 13:52:29.861][trace][18243][7686] edge pull connected, url=rtmp://127.0.0.1:1939/live/livestream1268, server=127.0.0.1:1939
[2016-08-24 13:52:29.861][trace][18243][9228] edge pull connected, url=rtmp://127.0.0.1:1939/live/livestream1530, server=127.0.0.1:1939
[2016-08-24 13:52:29.866][warn][18243][11766][104] origin disconnected, retry. ret=1007
[2016-08-24 13:52:29.866][warn][18243][8412][104] origin disconnected, retry. ret=1007
[2016-08-24 13:52:29.867][warn][18243][1294][104] origin disconnected, retry. ret=1007
[2016-08-24 13:52:29.867][warn][18243][9888][104] origin disconnected, retry. ret=1007
[2016-08-24 13:52:29.867][warn][18243][11292][104] origin disconnected, retry. ret=1007
[2016-08-24 13:52:29.867][warn][18243][3832][104] origin disconnected, retry. ret=1007

Member

winlinvip commented Aug 24, 2016

That is how TCP behaves; this cannot be solved.

wanghuan578 commented Nov 28, 2016

I solved the resource leak and the CLOSE_WAIT problem with the following change.

int SrsLiveStream::serve_http(ISrsHttpResponseWriter* w, ISrsHttpMessage* r)
{
    int ret = ERROR_SUCCESS;
    ISrsStreamEncoder* enc = NULL;
    ssize_t down_volum = 0;

    time_t start_time;
    time_t last_time;
    time(&start_time);
    last_time = start_time;

    srs_assert(entry);

    if (srs_string_ends_with(entry->pattern, ".flv")) {
        w->header()->set_content_type("video/x-flv");
#ifdef SRS_PERF_FAST_FLV_ENCODER
        enc = new SrsFastFlvStreamEncoder();
#else
        enc = new SrsFlvStreamEncoder();
#endif
    } else if (srs_string_ends_with(entry->pattern, ".aac")) {
        w->header()->set_content_type("audio/x-aac");
        enc = new SrsAacStreamEncoder();
    } else if (srs_string_ends_with(entry->pattern, ".mp3")) {
        w->header()->set_content_type("audio/mpeg");
        enc = new SrsMp3StreamEncoder();
    } else if (srs_string_ends_with(entry->pattern, ".ts")) {
        w->header()->set_content_type("video/MP2T");
        enc = new SrsTsStreamEncoder();
    } else {
        ret = ERROR_HTTP_LIVE_STREAM_EXT;
        srs_error("http: unsupported pattern %s", entry->pattern.c_str());
        return ret;
    }
    SrsAutoFree(ISrsStreamEncoder, enc);
    
    // create consumer of source, ignore gop cache, use the audio gop cache.
    SrsConsumer* consumer = NULL;
    if ((ret = source->create_consumer(NULL, consumer, true, true, !enc->has_cache())) != ERROR_SUCCESS) {
        srs_error("http: create consumer failed. ret=%d", ret);
        return ret;
    }
    SrsAutoFree(SrsConsumer, consumer);
    srs_verbose("http: consumer created success.");

    SrsPithyPrint* pprint = SrsPithyPrint::create_http_stream();
    SrsAutoFree(SrsPithyPrint, pprint);
    
    SrsMessageArray msgs(SRS_PERF_MW_MSGS);
    
    // the memory writer.
    SrsStreamWriter writer(w);
    if ((ret = enc->initialize(&writer, cache)) != ERROR_SUCCESS) {
        srs_error("http: initialize stream encoder failed. ret=%d", ret);
        return ret;
    }
    
    // if gop cache enabled for encoder, dump to consumer.
    if (enc->has_cache()) {
        if ((ret = enc->dump_cache(consumer, source->jitter())) != ERROR_SUCCESS) {
            srs_error("http: dump cache to consumer failed. ret=%d", ret);
            return ret;
        }
    }
    
#ifdef SRS_PERF_FAST_FLV_ENCODER
    SrsFastFlvStreamEncoder* ffe = dynamic_cast<SrsFastFlvStreamEncoder*>(enc);
#endif

    int repeat = 0;
    int repeat_max = 0;
    bool ingester_auto_release = false;
    bool is_succeed = false;

    // when auto release is enabled, give up after more than repeat_max empty polls in a row.
    if (_srs_config->get_ingester_auto_release_enabled()) {
        //srs_trace("ingester auto release on");
        repeat_max = _srs_config->get_candidate_source_timeout();
        ingester_auto_release = true;
    }


    // TODO: free and erase the disabled entry after all related connections is closed.
    while (entry->enabled) {
        pprint->elapse();

        // get messages from consumer.
        // each msg in msgs.msgs must be free, for the SrsMessageArray never free them.
        int count = 0;
        if ((ret = consumer->dump_packets(&msgs, count)) != ERROR_SUCCESS) {
            srs_error("http: get messages from consumer failed. ret=%d", ret);
            return ret;
        }
        
        if (count <= 0) {
            srs_info("http: sleep %dms for no msg", SRS_CONSTS_RTMP_PULSE_TIMEOUT_US);
            // directly use sleep, do not use consumer wait.
            st_usleep(SRS_CONSTS_RTMP_PULSE_TIMEOUT_US);

            if (ingester_auto_release) { // wanghuan debug start
                // give up once the source has stayed empty for more than repeat_max polls.
                if (++repeat > repeat_max) {
                    srs_error("get rtmp stream: %s time out, quit.", entry->pattern.c_str());
                    srs_access("session :[%s] no available source found, quit. url = %s", r->query_get("session").c_str(), r->url().c_str());
                    return 1008;
                } else if (repeat > 1) {
                    srs_warn("%s source not found temporarily, retry times = %d", entry->pattern.c_str(), repeat);
                }
            } // wanghuan debug end

            // ignore when nothing got.
            continue;
        }

        // got messages, so reset the empty-poll counter.
        if (ingester_auto_release && repeat > 0) {
            repeat = 0;
        }

        // log the first successful delivery once per session.
        if (!is_succeed) {
            srs_access("session :[%s] source found, excellently. url = %s", r->query_get("session").c_str(), r->url().c_str());
            is_succeed = true;
        }

        if (pprint->can_print()) {
            srs_info("-> "SRS_CONSTS_LOG_HTTP_STREAM" http: got %d msgs, age=%d, min=%d, mw=%d", 
                count, pprint->age(), SRS_PERF_MW_MIN_MSGS, SRS_CONSTS_RTMP_PULSE_TIMEOUT_US / 1000);
        }
      
        ssize_t nbytes = 0;
        // sendout all messages.
#ifdef SRS_PERF_FAST_FLV_ENCODER
        if (ffe) {
            ret = ffe->write_tags(msgs.msgs, count, &nbytes);
        } else {
            ret = streaming_send_messages(enc, msgs.msgs, count);
        }
#else
        ret = streaming_send_messages(enc, msgs.msgs, count); // wanghuan debug
#endif

        // accumulate the bytes sent to this client.
        down_volum += nbytes;

        time_t now;
        time(&now);

        // periodically post an access record for this session.
        if (now - last_time > FLV_ACCESS_LOG_CHECK_INTERVAL) {
            //srs_trace("get flv checking...");

            void* handle = GetAccessShm();
            if (NULL != handle) {
                AccessInfo_t access;
                memset(&access, 0x00, sizeof(AccessInfo_t));
                access.AccessType = ACCESS_TYPE_GET_STREAM;
                strcpy(access.Address, r->get_http_client_ip()->c_str());
                srs_trace("access.address = %s", access.Address);

                std::string version = r->query_get("client_version");
                if (!version.empty()) {
                    strcpy(access.Client.Version, version.c_str());
                }
                access.DownloadVolum = down_volum;
                srs_trace("get DownloadVolum = %d", access.DownloadVolum);
                access.StartTime = start_time;
                strcpy(access.Uri, r->url().c_str());
                AccessPost(NULL, &access, handle);

                DestroyAccessShm(handle);
            } else {
                srs_trace("Get Shm Failed.");
            }

            last_time = now;
        }

        // free the messages.
        for (int i = 0; i < count; i++) {
            SrsSharedPtrMessage* msg = msgs.msgs[i];
            srs_freep(msg);
        }
        
        // check send error code.
        if (ret != ERROR_SUCCESS) {
            if (!srs_is_client_gracefully_close(ret)) {
                srs_error("http: send messages to client failed. ret=%d", ret);
            }

            //srs_trace("get flv checking...");

            // post the final access record, including the end time.
            void* handle = GetAccessShm();
            if (NULL != handle) {
                AccessInfo_t access;
                memset(&access, 0x00, sizeof(AccessInfo_t));
                access.AccessType = ACCESS_TYPE_GET_STREAM;
                strcpy(access.Address, r->get_http_client_ip()->c_str());
                std::string version = r->query_get("client_version");
                if (!version.empty()) {
                    strcpy(access.Client.Version, version.c_str());
                }
                access.DownloadVolum = down_volum;
                srs_trace("get DownloadVolum = %d end", access.DownloadVolum);
                access.StartTime = start_time;

                time_t now;
                time(&now);
                access.EndTime = now;

                strcpy(access.Uri, r->url().c_str());
                AccessPost(NULL, &access, handle);

                DestroyAccessShm(handle);
            }

            return ret;
        }
    }
   
    return ret;
}

Member

winlinvip commented Nov 28, 2016

  1. GitHub has a feature called pull requests (PRs); you can submit your change as a PR.
  2. You should first specify where the leak occurred, provide evidence of the problem, and then prescribe a solution.

Contributor

walkermi commented Apr 25, 2017

This is not a TCP issue but a bug in SRS. With HTTP-FLV enabled, when a player requests a non-existent stream, the program enters an infinite loop in serve_http. It never gets past the loop, so it never closes the socket and never sends a FIN, and CLOSE_WAIT connections accumulate.

Member

winlinvip commented Apr 25, 2017

Right. The fd status is not checked inside the loop, so the fd can only be closed after the loop exits. This should be fixed.
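
One way to check the fd status from inside an idle loop is to peek at the socket without blocking. What follows is only a minimal sketch with plain POSIX calls, assuming a stream socket. SRS itself runs on State Threads, and peer_has_closed and close_if_peer_gone are hypothetical helpers, not SRS functions and not necessarily what the eventual fix does.

```cpp
#include <cassert>
#include <cerrno>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Returns true when the peer has closed its end of a stream socket, which is
// the situation where the local socket stays in CLOSE_WAIT until we close it.
// Never blocks and never consumes pending data.
bool peer_has_closed(int fd)
{
    assert(fd >= 0);

    char probe;
    ssize_t n;
    do {
        n = recv(fd, &probe, 1, MSG_PEEK | MSG_DONTWAIT);
    } while (n < 0 && errno == EINTR);

    if (n == 0) {
        return true;   // end of stream: the peer sent its FIN
    }
    if (n > 0) {
        return false;  // unread data is pending, the connection is still usable
    }
    // EAGAIN/EWOULDBLOCK means "open, nothing to read"; any other error
    // (for example ECONNRESET) is treated as a dead connection.
    return !(errno == EAGAIN || errno == EWOULDBLOCK);
}

// Closes fd when the peer is gone, which sends our own FIN and lets the
// socket leave CLOSE_WAIT. Returns true if fd was closed.
bool close_if_peer_gone(int fd)
{
    if (!peer_has_closed(fd)) {
        return false;
    }
    close(fd);
    return true;
}
```

In a loop like serve_http, a check of this kind could run in the branch that handles an empty poll and return an error when the peer is gone, so that the connection is torn down instead of waiting for messages that never arrive.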

wanghuan578 commented Apr 25, 2017

It is clearly unrealistic to fix the fd leak elegantly under the current architecture. However, it can be resolved with my more conservative approach: give the ingester a timeout for pulling from the origin. If the ingester gets nothing within that time, it is released at the application level, which resolves the CLOSE_WAIT issue.

See ingester_auto_release in my previous comment.
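
The timeout idea can be modeled in isolation from SRS. The sketch below is a simplified, hypothetical model of the loop: serve_with_idle_budget, poll_source, enabled and max_empty_polls are illustrative names, with max_empty_polls standing in for repeat_max and enabled standing in for entry->enabled. The sleep between polls is left out.

```cpp
#include <cassert>
#include <functional>

enum ServeResult {
    kStreamDisabled, // the loop ended because the stream entry was disabled
    kSourceTimeout   // the source stayed empty for too many polls in a row
};

// Hypothetical model of a serve loop with an idle budget.
//   poll_source:     returns how many messages are ready (0 when there are none).
//   enabled:         tells whether the stream entry is still enabled.
//   max_empty_polls: number of consecutive empty polls that is still tolerated.
//   delivered:       receives the total number of messages handed out.
ServeResult serve_with_idle_budget(const std::function<int()>& poll_source,
                                   const std::function<bool()>& enabled,
                                   int max_empty_polls,
                                   int* delivered)
{
    assert(delivered != nullptr);
    assert(max_empty_polls >= 0);

    int empty_polls = 0;
    *delivered = 0;

    while (enabled()) {
        int count = poll_source();
        if (count <= 0) {
            // A real loop would sleep here before polling again.
            if (++empty_polls > max_empty_polls) {
                return kSourceTimeout; // leave the loop so the caller can close the socket
            }
            continue;
        }
        // Messages arrived, so the idle budget starts over.
        empty_polls = 0;
        *delivered += count;
    }
    return kStreamDisabled;
}
```

With a fixed sleep per empty poll, the effective timeout is roughly max_empty_polls multiplied by the sleep interval. Returning from the loop is what allows the caller to close the connection, so a request for a stream that never appears no longer holds its socket indefinitely.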

Member

winlinvip commented Apr 25, 2017

I will find time to fix it over the weekend.

Member

winlinvip commented Apr 25, 2017

If you think your solution is simple and reliable, you should submit a PR, because a PR shows exactly what was modified. If you just paste the code into an issue, I cannot tell what has changed.

zhoulipeng commented May 16, 2017

May I ask if this problem has been resolved? We encountered it during our usage.

zhoulipeng commented May 17, 2017

This issue was merged into 636, and the problem has been resolved in ae54501.

@winlinvip
Member

Fixed by f2b4bc7

@winlinvip winlinvip self-assigned this Sep 23, 2021
@winlinvip winlinvip added the Bug It might be a bug. label Sep 23, 2021
@winlinvip winlinvip added this to the 2.0 milestone Sep 23, 2021
@winlinvip winlinvip changed the title from "服务端大量CLOSE_WAIT(播放端已经关闭)" to "Server has a large number of CLOSE_WAIT (client has already closed)" Jul 27, 2023
@winlinvip winlinvip added the TransByAI Translated by AI/GPT. label Jul 27, 2023