Upgrade exporter to include functionality for Pod autodiscovery in k8s environments #1
Conversation
@ClementineM12 fantastic work here 👏🏼 👏🏼 👏🏼
Implementing an auto-discovery mechanism for FPM metrics is beyond the extra mile 💯
One question that comes to mind: since we have implemented autodiscovery, why don't we deploy this once and scrape all PHP apps using a single exporter, instead of deploying one in each namespace?
This is indeed an item for discussion. When considering the option of deploying a single exporter for all namespaces, I foresee the following potential drawback:
In contrast, the option of deploying one exporter per namespace, I believe, comes with some advantages:
I believe that the primary downside of this approach is the additional resource overhead. Given that our PHP applications are high-priority, especially in the production environment, ensuring reliable and low-latency scraping is critical. One possible approach could be to differentiate the deployment strategy between development and production environments, for instance deploying a single exporter in development and multiple exporters in production. This is of course my point of view. Let me know if I am misunderstanding anything and what your opinion is, so that we have a clear path forward on our approach @dstrants. cc: @dimoschi
I agree with @ClementineM12 on the separation; the exporter itself should consume negligible resources, so it shouldn't be a problem. The FPM exporter also provides some useful statistics on the number of requests processed that can eventually be used for autoscaling, so being a reliable source is important. Regarding the approach, it is above and beyond; not that I don't know what you are capable of, but you still manage to amaze me! Great work! 🚀 I still haven't found the time to review it properly though.
Awesome work. It would be great to be able to set the port with a flag.
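For illustration, a minimal sketch of what such a flag could look like using Go's standard flag package; the flag name phpfpm.pod-port and this wiring are assumptions rather than the exporter's actual CLI, with 9000 taken from the default mentioned in the description.

```go
// Hypothetical sketch: expose the php-fpm pod port as a CLI flag.
// The flag name and wiring are assumptions, not the exporter's actual CLI;
// 9000 matches the default mentioned in the PR description.
package main

import (
	"flag"
	"fmt"
)

func main() {
	podPort := flag.Int("phpfpm.pod-port", 9000,
		"Port on which the php-fpm pods listen (not behind nginx)")
	flag.Parse()

	fmt.Printf("scraping php-fpm pods on port %d\n", *podPort)
}
```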
Awesome work @ClementineM12, you went the extra mile and beyond here. I honestly did not expect such a robust result when we created the ticket 💯
I have left a non-blocking UX suggestion that you can consider, only if you have the time.
I am not approving since there are pending requests from @dimoschi
Awesome piece of work, Christina, you nailed it 👏🏼 👏🏼 👏🏼
Done!
Christina, I am a bit out of date here. What is the status of this? I remember you had some issues with the pod tracker here. Have we figured it out? Is this ready for final review?
Hi @dstrants, the description I have is up to date with the current state of the ticket. I can either proceed with finding a solution, or review the issue and open a new ticket to address it. You can also test the exporter yourself, as I have it deployed in the content namespace.
@ClementineM12 I have just checked the code again and it looks great. Very good code organisation and readability, easy to follow despite my lack of familiarity with Go 👏🏼.
Regarding the issue you mention in the description, is the following error the one you are referring to?
time="2025-01-13T12:10:49Z" level=error msg="Inconsistent active and idle processes reported. Set
--phpfpm.fix-process-countto have this calculated by php-fpm_exporter instead."
If this is it, have you tried setting --phpfpm.fix-process-count on the exporter to see how it goes?
Hey @dstrants, thanks for reviewing the implementation! I actually just tried reproducing the error by scaling the deployment up and down, but could not.
This actually makes more sense. What I did was just delete the pod. Let me try your way and I will get back to you.
Again, great work. We have tested it in services-dev and it works like a charm. The error you are referring to is probably related to the way the Go runtime handles goroutines, and likely to a timing difference between updating the PoolManager and updating the metrics.
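As a rough illustration of the kind of synchronisation that would rule out such a timing gap, the sketch below guards a stand-in for the pool list with a read-write mutex; none of the names here are taken from the actual code, they are assumptions for illustration only.

```go
// Hypothetical sketch of serialising pool-list updates and metric scrapes.
// scrapeTargets stands in for the pool list the PoolManager maintains;
// none of these names are taken from the exporter's actual code.
package phpfpm

import "sync"

type scrapeTargets struct {
	mu   sync.RWMutex
	uris map[string]struct{} // e.g. tcp://10.0.0.12:9000/status
}

// remove drops a pod URI while no scrape is reading the set.
func (s *scrapeTargets) remove(uri string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.uris, uri)
}

// snapshot gives the metrics update a consistent copy of the set,
// so a concurrent remove cannot be observed halfway through a scrape.
func (s *scrapeTargets) snapshot() []string {
	s.mu.RLock()
	defer s.mu.RUnlock()
	out := make([]string, 0, len(s.uris))
	for uri := range s.uris {
		out = append(out, uri)
	}
	return out
}
```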
Looks good.
Implement functionality to dynamically retrieve and update pod IPs when PHP_FPM_K8S_AUTOTRACKING is set to true. This feature is particularly useful in Kubernetes environments where pod IPs change frequently, ensuring the system dynamically adapts to these changes without manual intervention.
Introduce three new input variables:
... the port on which the php-fpm service pod listens directly (not behind nginx). If this is not set, it will default to port 9000.
PodPhases in PoolManager
The following structure in the PoolManager was implemented in order to keep track of the different pod phases (e.g., running, deleted):
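A minimal sketch of such a structure is shown below; the field names and phase values are assumptions for illustration rather than the exact code from this PR.

```go
// Hypothetical sketch of tracking pod phases inside the PoolManager.
// Field names and phase values are assumptions for illustration.
package phpfpm

// PodPhase marks the lifecycle state of a tracked pod.
type PodPhase string

const (
	PodPhaseRunning PodPhase = "running"
	PodPhaseDeleted PodPhase = "deleted"
)

// PoolManager keeps one scrape target per discovered pod and remembers
// each pod's phase so that deleted pods can be dropped from scraping.
type PoolManager struct {
	Pools     map[string]string   // pod name -> pool URI, e.g. tcp://10.0.0.12:9000/status
	PodPhases map[string]PodPhase // pod name -> last observed phase
}
```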
Workflow Overview
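As a rough illustration of the loop this autotracking implies, the sketch below lists the php-fpm pods in a namespace with client-go and reports the running pods that would be registered with the PoolManager; the label selector, polling interval, and gating on PHP_FPM_K8S_AUTOTRACKING are assumptions rather than the PR's actual code.

```go
// Rough illustration of a pod-autodiscovery loop; the label selector,
// polling interval, and wiring are assumptions, not the PR's actual code.
package main

import (
	"context"
	"fmt"
	"os"
	"time"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	// Only run discovery when autotracking is switched on (per the description).
	if os.Getenv("PHP_FPM_K8S_AUTOTRACKING") != "true" {
		return
	}

	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	namespace := "content"    // namespace the implementation was tested in
	selector := "app=php-fpm" // hypothetical label selector

	for {
		pods, err := client.CoreV1().Pods(namespace).List(context.TODO(),
			metav1.ListOptions{LabelSelector: selector})
		if err == nil {
			for _, pod := range pods.Items {
				if pod.Status.Phase == corev1.PodRunning && pod.Status.PodIP != "" {
					// Here the real implementation would register
					// tcp://<pod-ip>:9000/status with the PoolManager and
					// mark vanished pods as deleted in PodPhases.
					fmt.Printf("tracking %s at %s\n", pod.Name, pod.Status.PodIP)
				}
			}
		}
		time.Sleep(30 * time.Second)
	}
}
```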
Important
There is an issue that needs to be addressed when a pod is removed from the pool manager. Currently, when a pod is dropped, the following error occurs, disrupting the connectivity of the exporter. Although the disruption is brief, we need to implement a smoother way to remove the pod URI from the pool manager without causing an error, as this is not an actual error.
Note
The implementation was tested in the content namespace in services-dev-k8s.