@anandamideShakyan
Contributor

Description

The current setup scripts generate excessive output. The goal is to reduce the log output by implementing the following changes:

  1. Avoid printing the files extracted by tar.
  2. Silence the download progress from curl by using an appropriate curl option.
  3. Suppress unnecessary CMake messages, logging only warnings instead of info or status messages.
  4. Introduce a new environment variable to enable the extensive logging when needed.
  5. These changes will help minimize log size while maintaining flexibility for detailed logging when required.
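
A minimal sketch of how items 1 to 4 could fit together. The variable name SETUP_VERBOSE and the function set_log_options are hypothetical and do not come from this PR; the curl flags (-s, -S, -L) and CMake's --log-level option (available in recent CMake versions) are standard, but the actual scripts may wire them differently.

```shell
#!/usr/bin/env bash
# SETUP_VERBOSE and set_log_options are hypothetical names, not part of this PR.
set_log_options() {
  if [ "${1:-0}" = "1" ]; then
    CURL_OPTIONS="-L"          # keep curl's default progress meter
    TAR_FLAGS="-xvzf"          # list every extracted file
    CMAKE_LOG_LEVEL="STATUS"   # CMake's default log level
  else
    CURL_OPTIONS="-sSL"        # -s hides progress, -S still reports errors
    TAR_FLAGS="-xzf"           # extract without listing files
    CMAKE_LOG_LEVEL="WARNING"  # hide status and notice messages
  fi
}

# Quiet by default; export SETUP_VERBOSE=1 to get the detailed logs back.
set_log_options "${SETUP_VERBOSE:-0}"
echo "curl ${CURL_OPTIONS} -o <file> <url>"
echo "tar ${TAR_FLAGS} <file>"
echo "cmake --log-level=${CMAKE_LOG_LEVEL} <source-dir>"
```

The echo lines only print the commands that would run, so the sketch can be tried without downloading or building anything.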

Motivation and Context

The current setup scripts generate excessive log output, which can overwhelm users and make it difficult to identify important information. This excessive logging increases log file sizes unnecessarily, which can be problematic for storage and debugging. By implementing changes to reduce log output, we aim to streamline the logging process, improve readability, and maintain flexibility for detailed logging when required.

Impact

  1. Improved Log Readability: By reducing unnecessary output, users can focus on critical information, making debugging and monitoring more efficient.
  2. Reduced Log Size: Suppressing excessive logging will significantly decrease log file sizes, saving storage space and improving performance.
  3. Flexibility: Introducing an environment variable to enable extensive logging ensures that detailed logs are still available when needed for debugging or analysis.
  4. Enhanced User Experience: A cleaner and more concise logging output will improve the overall user experience, especially for those who do not require verbose logging.

@anandamideShakyan anandamideShakyan requested a review from a team as a code owner January 29, 2025 10:33
@prestodb-ci prestodb-ci added the from:IBM PR from IBM label Jan 29, 2025
@prestodb-ci prestodb-ci requested review from a team, infvg and sh-shamsan and removed request for a team January 29, 2025 10:33
@majetideepak majetideepak requested a review from czentgr January 31, 2025 18:01
@majetideepak
Collaborator

@czentgr any feedback?

@majetideepak majetideepak changed the title from "[native] Reduce Excessive Logging During Setup and Build Process" to "[native] Allow options for wget and tar commands during setup" Jan 31, 2025
Collaborator

@majetideepak majetideepak left a comment

@anandamideShakyan Can you update the commit title to match the PR title? Thanks.

@majetideepak
Collaborator

@anandamideShakyan the commit title must be Allow options for wget and tar commands during setup
Can you please update? Thanks.

@majetideepak
Collaborator

@anandamideShakyan My bad. Please include the [native] prefix as well :).
[native] Allow options for wget and tar commands during setup

@majetideepak
Collaborator

Can you also rebase with the master branch? Thanks.

Collaborator

@majetideepak majetideepak left a comment

@czentgr czentgr merged commit 0299a29 into prestodb:master Feb 7, 2025
61 checks passed
Contributor

@PingLiuPing PingLiuPing left a comment

Hi @anandamideShakyan
I am looking at your fix and your PR description, which reads as follows:

  1. Avoid printing the files extracted by tar.
  2. Silence the download progress from curl by using an appropriate curl option.
  3. Suppress unnecessary CMake messages, logging only warnings instead of info or status messages.
  4. Introduce a new environment variable to enable the extensive logging when needed.
  5. These changes will help minimize log size while maintaining flexibility for detailed logging when required.

I cannot understand how adding -v to tar and an empty option to wget fixes the issues you described.
Can you please explain in more detail?
In particular, adding the -v option to the tar command makes the output verbose, which is the opposite of your original intention.

Thanks

@anandamideShakyan
Contributor Author

Hi @PingLiuPing. To keep the behaviour consistent with the old setup, I am passing an empty option to wget and -v to tar. Users can set these to "-q" and "" explicitly when needed. See Deepak's comments on the setup-centos.sh file.
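
Whether a user can really set an option variable to "" depends on how the script applies its default. The sketch below is an illustration under an assumption: it supposes the default is applied with shell parameter expansion, which this thread does not confirm, and DEMO_OPTIONS is a hypothetical variable name used only here.

```shell
#!/usr/bin/env bash
# DEMO_OPTIONS is a hypothetical variable; the real script's form may differ.

# Case 1: variable is unset. Both forms fall back to the default "-v".
unset DEMO_OPTIONS
unset_colon="${DEMO_OPTIONS:--v}"   # ':-' form: default when unset or empty
unset_dash="${DEMO_OPTIONS--v}"     # '-' form: default only when unset

# Case 2: the user explicitly sets the variable to the empty string.
DEMO_OPTIONS=""
empty_colon="${DEMO_OPTIONS:--v}"   # still "-v": the empty value is replaced
empty_dash="${DEMO_OPTIONS--v}"     # "": the user's empty value is kept

echo "unset: ':-' gives [${unset_colon}], '-' gives [${unset_dash}]"
echo "empty: ':-' gives [${empty_colon}], '-' gives [${empty_dash}]"
```

With the ':-' form an explicit empty value cannot switch off the default, so only the '-' form would honor a user who sets the variable to "".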

@PingLiuPing
Contributor

@anandamideShakyan Thank you.
However, this still doesn't fix what you described in this issue. By default, your code changes alter nothing. When a user sets TAR_OPTIONS to empty, it only affects gperf, which I think contributes just a very small part of the log file size.

Since WGET_OPTIONS and TAR_OPTIONS are local variables defined in this script, they only affect gperf

wget ${WGET_OPTIONS} http://ftp.gnu.org/pub/gnu/gperf/gperf-3.1.tar.gz &&
  tar ${TAR_OPTIONS} -xzf gperf-3.1.tar.gz &&

because these two variables are not used anywhere else, such as the Velox scripts.

So I cannot understand what benefit we get from adding these two variables to let users control wget and tar for gperf.

Also, TAR_OPTIONS is actually an environment variable that the tar command itself recognizes. So after your code changes, if TAR_OPTIONS has been exported somewhere before this script executes, the -v will affect every tar command that runs after the assignment.
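
The effect can be reproduced with a small, self-contained sketch. It assumes GNU tar (other tar implementations may ignore TAR_OPTIONS), and the file names and the workdir variable are illustrative only.

```shell
#!/usr/bin/env bash
# Assumes GNU tar; other implementations may not read TAR_OPTIONS.
unset TAR_OPTIONS
workdir="$(mktemp -d)"
echo "hello" > "$workdir/hello.txt"
tar -czf "$workdir/demo.tar.gz" -C "$workdir" hello.txt
mkdir "$workdir/out1" "$workdir/out2"

# Without TAR_OPTIONS in the environment, extraction prints nothing.
quiet_out="$(tar -xzf "$workdir/demo.tar.gz" -C "$workdir/out1")"

# Once TAR_OPTIONS is exported, every later tar call inherits it,
# even though this command line does not mention -v.
export TAR_OPTIONS="-v"
leaked_out="$(tar -xzf "$workdir/demo.tar.gz" -C "$workdir/out2")"
unset TAR_OPTIONS

echo "without TAR_OPTIONS: [${quiet_out}]"
echo "with exported TAR_OPTIONS=-v: [${leaked_out}]"
rm -rf "$workdir"
```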

@anandamideShakyan
Contributor Author

anandamideShakyan commented Mar 28, 2025

@PingLiuPing. You can find the velox PR here: https://github.com/facebookincubator/velox/pull/12201/files. The main reason for this change was to reduce the log size in the SPS pipeline in lakehouse. This was first raised as an internal PR, but then an open-source PR was also opened. The log size did get reduced to less than 16MB, which was the primary requirement; see: https://github.ibm.com/lakehouse/presto/issues/2606. As for TAR_OPTIONS, I guess we have to come up with a new variable name. Do you have a good suggestion?

@prestodb-ci prestodb-ci mentioned this pull request Mar 28, 2025
30 tasks
@PingLiuPing
Contributor

@anandamideShakyan, after checking the code changes, I can't find any place that uses TAR_OPTIONS in the velox scripts, which do most of the downloading and untarring of the third-party libraries.
So I would suggest simply removing TAR_OPTIONS from presto-native-execution/scripts/setup-centos.sh, or setting its default value to empty. A default value of -v does not help reduce the log size.

@majetideepak
Collaborator

I agree. We should remove TAR_OPTIONS from presto-native-execution/scripts/setup-centos.sh, since tar natively supports it and places its contents before the command-line options. Let's also remove the "-v" argument, since TAR_OPTIONS cannot override that.
https://www.gnu.org/software/tar/manual/html_section/using-tar-options.html
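
A short sketch of the difference between a hardcoded -v and a plain tar call. It assumes GNU tar, and the archive and directory names are illustrative rather than taken from the setup scripts.

```shell
#!/usr/bin/env bash
# Assumes GNU tar; names below are illustrative only.
unset TAR_OPTIONS
workdir="$(mktemp -d)"
echo "hello" > "$workdir/hello.txt"
tar -czf "$workdir/demo.tar.gz" -C "$workdir" hello.txt
mkdir "$workdir/a" "$workdir/b" "$workdir/c"

# Hardcoded -v: the file list is printed even with no TAR_OPTIONS set.
hardcoded_out="$(tar -v -xzf "$workdir/demo.tar.gz" -C "$workdir/a")"

# No -v in the script: quiet by default.
plain_out="$(tar -xzf "$workdir/demo.tar.gz" -C "$workdir/b")"

# The caller can still opt in to verbose output for a single invocation.
optin_out="$(TAR_OPTIONS=-v tar -xzf "$workdir/demo.tar.gz" -C "$workdir/c")"

echo "hardcoded -v: [${hardcoded_out}]"
echo "plain: [${plain_out}]"
echo "plain with TAR_OPTIONS=-v: [${optin_out}]"
rm -rf "$workdir"
```

Leaving -v out of the script keeps the default quiet while letting callers turn verbosity on through the environment, which matches the suggestion above.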

@anandamideShakyan
Contributor Author

I have removed TAR_OPTIONS and raised a PR : #24854

@PingLiuPing
Contributor

I have removed TAR_OPTIONS and raised a PR : #24854

Thanks @anandamideShakyan .
