Building a static version on RHEL 6.x systems #15
To build a statically linked bruce, you can tweak the SConstruct file as shown by the diff below. When I do this, I see the warnings shown below. I'm not sure what would be a good workaround for the getpwnam() and getaddrinfo() issues. Maybe an easier option would be to set up an RHEL 6.x development box and tweak the third line of the file centos6/gcc482.spec to specify an alternate installation location for gcc 4.8.2. Assuming that you have write access to some directory, let's say /home/mgimelfarb, on the locked-down machines, you might try cloning bruce's git repo into /home/mgimelfarb/bruce on your development box, and editing that line of the spec file so gcc 4.8.2 is installed in /home/mgimelfarb/bruce/opt/gcc. After building gcc 4.8.2 and bruce, you could then make a tarball of the entire repo directory, transfer it to the locked-down boxes, and try running it there (remember to set the environment variables PATH=/home/mgimelfarb/bruce/opt/gcc/bin:$PATH and LD_LIBRARY_PATH=/home/mgimelfarb/bruce/opt/gcc/lib64).
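As a rough illustration of that deployment flow, here is a hypothetical sequence of shell commands; the tarball name is made up, and the paths follow the /home/mgimelfarb example above, so adjust both for your environment.

```sh
# Hypothetical packaging/deployment steps for the gcc-in-tarball approach;
# the tarball name and paths are placeholders based on the example above.

# On the development box, after building gcc 4.8.2 (with the modified prefix) and bruce:
tar czf bruce-with-gcc.tar.gz -C /home/mgimelfarb bruce

# On each locked-down box, unpack and point the environment at the bundled toolchain:
tar xzf bruce-with-gcc.tar.gz -C /home/mgimelfarb
export PATH=/home/mgimelfarb/bruce/opt/gcc/bin:$PATH
export LD_LIBRARY_PATH=/home/mgimelfarb/bruce/opt/gcc/lib64
```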
@dspeterson, thank you for the instructions. I actually tried doing a build_all with the following command line:
I'm not sure if that's something that could be easily corrected, but I wanted to mention it. Otherwise, I've been running some tests and so far, so good, but I did get a segfault on bruce.test:
I haven't looked into it yet.
Regarding the error building bruce/client/libbruce_client.so, it's probably because you're building a shared library while specifying -static. Since you're statically linking, I would tweak the build so that libbruce_client.so isn't built. Regarding the segfault, does it only occur when running unit tests on a statically linked bruce, or does it also occur when building in the usual manner without static linking?
@dspeterson, while I haven't done what you asked just yet, I did do a static debug build.
BTW, no issues with the shared library this time, which is somewhat inexplicable.
I believe the segfault and shared library build failure aren't reproducing for you in a debug build because the debug version isn't actually getting linked statically. The diff I posted above showing how to tweak the SConstruct file for static linking only enables it for release builds. Sorry about that - it was an oversight on my part. To also enable static linking for debug builds, you need to make the same change to LINKFLAGS inside set_debug_options() as the diff makes inside set_release_options().

When I do a debug build with static linking in my CentOS 6.7 development VM, I see the shared library build failure and also intermittent segfaults, which occur only when static linking is used. Looking at a core file, I see that bruce is crashing inside a call to getaddrinfo(), as shown in the stack trace below. I believe this is almost certainly related to the following warning emitted during the build:
Given this issue with getaddrinfo() and static linking, I think your best bet is to link dynamically, and try the workaround I suggest above which involves editing centos6/gcc482.spec so you end up with a big tarball containing both bruce and gcc 4.8.2. Admittedly, it's not a very pretty solution but I think it will work. About load testing with bruce and the mock kafka server, take a look at the command line args for the
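To make the set_debug_options() change described above more concrete, here is a hypothetical sketch; the real SConstruct layout may differ, and the env.Append call is an assumption about how the linker flags are set.

```python
# Hypothetical sketch only - mirrors the '-static' LINKFLAGS change that the earlier
# diff makes in set_release_options(); the actual SConstruct structure may differ.
def set_debug_options(env):
    # ... existing debug compile/link options ...
    env.Append(LINKFLAGS=['-static'])  # link debug builds statically as well
```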
When load testing with real kafka brokers, a good stress test is to shut down a broker while sending messages through bruce. Then wait a while, start the broker again, and run the kafka-preferred-replica-election.sh tool that comes with kafka to reassign partition leadership. In addition to doing a clean shutdown, try an unclean shutdown (i.e. kill -9) to simulate a broker crash. While performing these steps, you can watch bruce's log messages to see how it handles the failures. Also try shutting down more than one broker. You can try this with different workloads, including multiple client machines sending messages through bruce, to get an idea of the cluster's performance characteristics and its ability to tolerate and recover from failures. While doing this, try running some consumers and check the message flow to verify that there is no message loss.

When one or more brokers are down, use bruce's web interface to see if message backlogs develop, and if so how quickly. If messages start getting backlogged, see how quickly the backlogs disappear after bringing the broker(s) back up. This will help you decide whether your cluster is adequately provisioned to tolerate failures, how large a value to specify for bruce's --msg_buffer_max option, and whether enough message batching is being performed.

Bruce's batching configuration has a large influence on kafka cluster performance, and inadequate batching even for a single relatively low-volume topic can quickly degrade performance. The sensitivity of kafka cluster performance to reduced batching is greater than one might expect just from considering physical factors such as network latency and disk performance. When batching was greatly decreased or eliminated, we discovered by running the brokers under strace that they spent a good deal of time repeatedly adding and removing file descriptors to/from an epoll set, which is a slow operation. Modifying kafka's implementation to avoid this behavior would likely be difficult since it is written in Scala, and therefore relies on Scala APIs for simultaneously monitoring multiple TCP connections, rather than directly making epoll-related system calls as can be done in C or C++ code. Kafka's design relies heavily on batching to achieve good performance, so this sort of problem, which occurs only when batching is nearly or completely disabled, shouldn't be viewed as a major issue, but it's good to be aware of. If there are topics for which very low latency is desired, you may have to make a tradeoff between low latency and broker cluster performance.

At if(we), we replaced a legacy data pipeline with a new system consisting of bruce, kafka, and consumers (both internally developed at if(we) and open source consumers developed elsewhere). Before deploying to production, we did extensive testing of this sort in our QA environment to gain confidence in the integrity of the new data pipeline. We ran both old and new data pipelines in production for a while to facilitate a gradual switchover and provide a fallback option in case unexpected problems occurred.
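For reference, one such failure-injection cycle might look roughly like the following. This is a hypothetical outline: the kafka script paths, the broker PID, and the ZooKeeper address are placeholders, and the exact scripts depend on your kafka distribution.

```sh
# Hypothetical outline of one broker-failure cycle from the stress test above;
# script paths, the broker PID, and the ZooKeeper address are placeholders.

# 1. Keep a steady message load flowing through bruce from one or more client machines.

# 2. Take down one broker: a clean shutdown, or kill -9 to simulate a crash.
bin/kafka-server-stop.sh                 # clean shutdown
# kill -9 <broker-pid>                   # unclean shutdown

# 3. Wait a while, restart the broker, and reassign partition leadership.
bin/kafka-server-start.sh -daemon config/server.properties
bin/kafka-preferred-replica-election.sh --zookeeper zkhost:2181

# 4. Throughout, watch bruce's log messages and web interface for backlogs,
#    and run consumers to verify that no messages are lost.
```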
While there is a great writeup on building gcc 4.8.2 and then building bruce with it, I was wondering if there is a way to build bruce statically. If so, what would be needed to accomplish that on RHEL 6.x and derivatives (OEL 6.6 in my case)?
I've got a use case of running bruce on a set of locked-down machines with little remote access due to strict compliance requirements, and I'm looking for easier deployment options.