Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some routes are not sent to peers when commands are sent too quick #736

Closed
ravikumar727 opened this issue Oct 19, 2017 · 12 comments
Closed

Comments

@ravikumar727
Copy link

ravikumar727 commented Oct 19, 2017

ISSUE TYPE
  • Bug Report
OS
Linux agent-01001.node 4.1.12-94.3.9.el7uek.x86_64 #2 SMP Fri Jul 14 20:09:40 PDT 2017 x86_64 x86_64 x86_64 GNU/Linux
VERSION
(docker) [agent-01001 /]# exabgp --version
ExaBGP : 3.4.20
Python : 2.7.5 (default, Sep  5 2016, 02:30:38)  [GCC 4.8.5 20150623 (Red Hat 4.8.5-9)]
Uname  : #2 SMP Fri Jul 14 20:09:40 PDT 2017
ENVIRONMENT
Empty output
CONFIGURATION

SUMMARY

We have an agent that runs as the ExaBGP child process. When the process starts, it does the bootstrap and gets the route details from the route provider service. The agent gets around 15000 routes in 1-2 seconds and pushes it ExaBGP as commands. I am observing that around 300 routes in 15000 are not sent to the BGP peers.

I have looked at the RIB and they have all the routes. I have instrumented the code in the store.py::updates to confirm if update groups are missing those routes. I couldn't find them missing there. With tcpdump, I have validated that these routes are not leaving the wire from the ExaBGP node.

When I added 50 millisecond sleep between commands from agent, it is able to handle without dropping any routes.

May I know where else can these routes be dropped? Any pointers to the module that could potentially miss these routes would be great.

		num_sorted_dict = len(self._modify_sorted)
		num_nlri_dict = len(self._modify_nlri)

		logger = Logger()

		# Print modified dict
		num_sorted_dict_entries = 0
		for key, value in self._modify_sorted.items():
			for key1, value1 in value.items():
				hex_key1 = self._convert_bytes_to_hexstring(key1)
				logger.store("store:updates modified list entry {} -> {} -> {}".format(key, hex_key1, value1.extensive()))
				num_sorted_dict_entries = num_sorted_dict_entries + 1
		logger.store("store:updates clearing _modify_nlri:{}, _modify_sorted:{}, entries in modified_sorted: {}".format(num_nlri_dict, num_sorted_dict, num_sorted_dict_entries))

		self._modify_sorted = {}
		self._modify_nlri = {}
@4000000059e7eaee39661d5c.s:2017-10-18_23:59:32.74975 Wed, 18 Oct 2017 23:59:32 | INFO     | 6989   | store         | store:updates clearing _modify_nlri:48, _modify_sorted:5, entries in modified_sorted: 48
@4000000059e7eaee39661d5c.s:2017-10-18_23:59:32.76814 Wed, 18 Oct 2017 23:59:32 | INFO     | 6989   | store         | store:updates clearing _modify_nlri:0, _modify_sorted:0, entries in modified_sorted: 0
@4000000059e7eaee39661d5c.s:2017-10-18_23:59:32.78154 Wed, 18 Oct 2017 23:59:32 | INFO     | 6989   | store         | store:updates clearing _modify_nlri:67, _modify_sorted:5, entries in modified_sorted: 67
@4000000059e7eaee39661d5c.s:2017-10-18_23:59:32.90707 Wed, 18 Oct 2017 23:59:32 | INFO     | 6989   | store         | store:updates clearing _modify_nlri:0, _modify_sorted:0, entries in modified_sorted: 0
@4000000059e7eaee39661d5c.s:2017-10-18_23:59:32.92866 Wed, 18 Oct 2017 23:59:32 | INFO     | 6989   | store         | store:updates clearing _modify_nlri:110, _modify_sorted:6, entries in modified_sorted: 110
@4000000059e7eaef2417ccac.s:2017-10-18_23:59:32.98609 Wed, 18 Oct 2017 23:59:32 | INFO     | 6989   | store         | store:updates clearing _modify_nlri:0, _modify_sorted:0, entries in modified_sorted: 0
@4000000059e7eaef2417ccac.s:2017-10-18_23:59:32.99091 Wed, 18 Oct 2017 23:59:32 | INFO     | 6989   | store         | store:updates clearing _modify_nlri:127, _modify_sorted:10, entries in modified_sorted: 127
@4000000059e7eaf02a39e9e4.s:2017-10-18_23:59:34.37156 Wed, 18 Oct 2017 23:59:34 | INFO     | 6989   | store         | store:updates clearing _modify_nlri:763, _modify_sorted:37, entries in modified_sorted: 763
@4000000059e7eaf129ce722c.s:2017-10-18_23:59:35.24238 Wed, 18 Oct 2017 23:59:35 | INFO     | 6989   | store         | store:updates clearing _modify_nlri:594, _modify_sorted:1, entries in modified_sorted: 594
@4000000059e7eaf20df547c4.s:2017-10-18_23:59:36.14813 Wed, 18 Oct 2017 23:59:36 | INFO     | 6989   | store         | store:updates clearing _modify_nlri:632, _modify_sorted:1, entries in modified_sorted: 632
@4000000059e7eaf30eea041c.s:2017-10-18_23:59:37.07389 Wed, 18 Oct 2017 23:59:37 | INFO     | 6989   | store         | store:updates clearing _modify_nlri:616, _modify_sorted:5, entries in modified_sorted: 616
@4000000059e7eaf41b9656fc.s:2017-10-18_23:59:38.09207 Wed, 18 Oct 2017 23:59:38 | INFO     | 6989   | store         | store:updates clearing _modify_nlri:578, _modify_sorted:11, entries in modified_sorted: 578
@4000000059e7eaf4395b28ac.s:2017-10-18_23:59:38.62089 Wed, 18 Oct 2017 23:59:38 | INFO     | 6989   | store         | store:updates clearing _modify_nlri:41, _modify_sorted:2, entries in modified_sorted: 41
@4000000059e7eaf52ff82c9c.s:2017-10-18_23:59:39.68822 Wed, 18 Oct 2017 23:59:39 | INFO     | 6989   | store         | store:updates clearing _modify_nlri:3259, _modify_sorted:44, entries in modified_sorted: 3259
@4000000059e7eaf52ff82c9c.s:2017-10-18_23:59:39.73234 Wed, 18 Oct 2017 23:59:39 | INFO     | 6989   | store         | store:updates clearing _modify_nlri:1, _modify_sorted:1, entries in modified_sorted: 1
@4000000059e7eaf52ff82c9c.s:2017-10-18_23:59:39.74336 Wed, 18 Oct 2017 23:59:39 | INFO     | 6989   | store         | store:updates clearing _modify_nlri:36, _modify_sorted:2, entries in modified_sorted: 36
@4000000059e7eaf7347534f4.s:2017-10-18_23:59:41.86196 Wed, 18 Oct 2017 23:59:41 | INFO     | 6989   | store         | store:updates clearing _modify_nlri:1179, _modify_sorted:19, entries in modified_sorted: 1179
@4000000059e7eaf80fa8d7ac.s:2017-10-18_23:59:41.98371 Wed, 18 Oct 2017 23:59:41 | INFO     | 6989   | store         | store:updates clearing _modify_nlri:1179, _modify_sorted:19, entries in modified_sorted: 1179
@4000000059e7eaf92885c974.s:2017-10-18_23:59:43.55874 Wed, 18 Oct 2017 23:59:43 | INFO     | 6989   | store         | store:updates clearing _modify_nlri:925, _modify_sorted:20, entries in modified_sorted: 925
@4000000059e7eaf92885c974.s:2017-10-18_23:59:43.64315 Wed, 18 Oct 2017 23:59:43 | INFO     | 6989   | store         | store:updates clearing _modify_nlri:925, _modify_sorted:20, entries in modified_sorted: 925
@4000000059e7eafa0848431c.s:2017-10-18_23:59:43.81502 Wed, 18 Oct 2017 23:59:43 | INFO     | 6989   | store         | store:updates clearing _modify_nlri:99, _modify_sorted:1, entries in modified_sorted: 99
@4000000059e7eafa0848431c.s:2017-10-18_23:59:43.83910 Wed, 18 Oct 2017 23:59:43 | INFO     | 6989   | store         | store:updates clearing _modify_nlri:99, _modify_sorted:1, entries in modified_sorted: 99
@4000000059e7eafb286cf9e4.s:2017-10-18_23:59:45.63509 Wed, 18 Oct 2017 23:59:45 | INFO     | 6989   | store         | store:updates clearing _modify_nlri:1202, _modify_sorted:6, entries in modified_sorted: 1202
@4000000059e7eafc019b04b4.s:2017-10-18_23:59:45.87167 Wed, 18 Oct 2017 23:59:45 | INFO     | 6989   | store         | store:updates clearing _modify_nlri:1202, _modify_sorted:6, entries in modified_sorted: 1202
@4000000059e7eafc019b04b4.s:2017-10-18_23:59:45.87265 Wed, 18 Oct 2017 23:59:45 | INFO     | 6989   | store         | store:updates clearing _modify_nlri:0, _modify_sorted:0, entries in modified_sorted: 0
@4000000059e7eafc019b04b4.s:2017-10-18_23:59:45.87307 Wed, 18 Oct 2017 23:59:45 | INFO     | 6989   | store         | store:updates clearing _modify_nlri:0, _modify_sorted:0, entries in modified_sorted: 0
@4000000059e7eb6f2cdf86a4.s:2017-10-18_23:59:47.63290 Wed, 18 Oct 2017 23:59:47 | INFO     | 6989   | store         | store:updates clearing _modify_nlri:843, _modify_sorted:5, entries in modified_sorted: 843
@4000000059e7eb6f2cdf86a4.s:2017-10-18_23:59:47.72664 Wed, 18 Oct 2017 23:59:47 | INFO     | 6989   | store         | store:updates clearing _modify_nlri:843, _modify_sorted:5, entries in modified_sorted: 843
STEPS TO REPRODUCE
EXPECTED RESULTS

All the routes announced to ExaBGP should be announced

ACTUAL RESULTS

Few routes are not announced to BGP peers

IMPORTANCE
@thomas-mangin
Copy link
Member

Are you flushing the data on the STDOUT between each command ? If not this likely the reason and you need to do it. I should update the documentation to make this a requirement.

@ravikumar727
Copy link
Author

@thomas-mangin ,
Yes. I am flushing the data between the commands.

I can see the route is added in store.py: insert_announced. Both dict_nlri & dict_sorted has the change.

However after below line in store.py:updates method, some routes are missed.
for attr_index, dict_change in dict_sorted.items():

I am trying to identify where it is getting dropped. Is there any change done to dict_sorted after routes are inserted?

@thomas-mangin
Copy link
Member

That's bad .. I can not look into it right now but will do.

@ravikumar727
Copy link
Author

ravikumar727 commented Oct 20, 2017

Thanks @thomas-mangin .

Btw, I am using only for IPv4 VPN Unicast family.

@thomas-mangin
Copy link
Member

Are all the updates announcement or do you have some withdraw ?

@ravikumar727
Copy link
Author

this is happening during bootstrap and will be only announcements.

@ravikumar727
Copy link
Author

ravikumar727 commented Oct 21, 2017

I found the issue. The incoming commands are added to RIB._modify_sorted through insert_announced method. The worker that is getting all the routes from RIB._modify_sorted and constructs the UPDATE message and sends it. Since our usage is pushing the commands during bootstrap faster than the worker could construct the message, the UPDATE message constructed leaves out some of the commands that were added after the enumerator started with dict_sorted.items(). However RIB._modify_sorted is cleared at the end of the enumeration that is causing the routes added after the enumeration start to be dropped and never announced to the peers.

I have the fix that addressed this issue. I have moved the clearing of RIB._modify_sorted to be done just after getting the reference in dict_sorted. I will send out the pull request soon.

2017-10-20_22:13:52.12815 Fri, 20 Oct 2017 22:13:52 | INFO | 5803 | store | store:updates clearing _modify_nlri:290, _modify_sorted:11, entries in modified_sorted: 290
2017-10-20_22:13:52.12824 Fri, 20 Oct 2017 22:13:52 | INFO | 5803 | message | Peer 10.19.128.1 ASN 12345 >> 283 UPDATE(s)

@ravikumar727
Copy link
Author

@thomas-mangin, pls take a look at #737

@thomas-mangin
Copy link
Member

Thank you very much @ravikumar727 for tracing and fixing this bug - I checked master and it seems the bug was fixed during the large code rewrite the RIB got (totally unexpected side effect).

@thomas-mangin
Copy link
Member

fixed with #737 - thank you.

@ravikumar727
Copy link
Author

@thomas-mangin , may i know if i can have a version released with the fix? we need to patch our system with the fix.

@thomas-mangin
Copy link
Member

thomas-mangin commented Oct 23, 2017

Upload 3.4.21 to GitHub and Pypi

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants