Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MSHARED-1216] Use caching output stream #73

Merged
merged 1 commit into from
Apr 4, 2023
Merged

Conversation

gnodet
Copy link
Contributor

@gnodet gnodet commented Mar 24, 2023

JIRA issue: https://issues.apache.org/jira/browse/MSHARED-1216

Use a caching output stream when copying / filtering resources. This has the advantage of not modifying the file if the content has not changed (and thus will avoid having to compile / process / package the resource again and having a cascading effect in the reactor to downstream dependencies).
This deprecates the overwrite flag introduced by MSHARED-67 / MRESOURCES-21.

The change has the same effect as if always using overwrite=true, but only if the content has changed. If the content is unchanged, the file will not be modified and will not trigger downstream processing based on last modified time. Note that the overwrite flag was only effective when not filtering, which was incoherent too.

@gnodet gnodet changed the title Use caching output stream [MSHARED-1216] Use caching output stream Mar 27, 2023
@gnodet gnodet requested a review from olamy March 27, 2023 06:57
@olamy
Copy link
Member

olamy commented Mar 31, 2023

The change has the same effect as if always using overwrite=true, but only if the content has changed. If the content is unchanged, the file will not be modified and will not trigger downstream processing based on last modified time. Note that the overwrite flag was only effective when not filtering, which was incoherent too.

not sure exactly about the use case.
what do you mean with content?
in case of a pom with
<properties> <foo>bar</bar> </properties>
a file is filtering using this property.
second run using mvn ..... -Dfoo=blabla
Technically the content of the file hasn’t changed but the properties for filtering have.
if I remember correctly it was a reason why it was different in case of filtering activated or not.

@gnodet
Copy link
Contributor Author

gnodet commented Mar 31, 2023

The change has the same effect as if always using overwrite=true, but only if the content has changed. If the content is unchanged, the file will not be modified and will not trigger downstream processing based on last modified time. Note that the overwrite flag was only effective when not filtering, which was incoherent too.

not sure exactly about the use case. what do you mean with content? in case of a pom with <properties> <foo>bar</bar> </properties> a file is filtering using this property. second run using mvn ..... -Dfoo=blabla Technically the content of the file hasn’t changed but the properties for filtering have. if I remember correctly it was a reason why it was different in case of filtering activated or not.

@olamy By content, I refer to what is actually written to the file. All transformations which happen before writing to the file will be completely taken into account. The caching stream simply avoid rewriting the exact same file again and again. What happens underneath is that when data is pushed to the OutputStream, the stream will read what is in the file and any change will switch to a write-through mode. If what has been written to the stream is exactly the same has what is on the file system, the file will simply have been read, not written, but any change will be written to the disk. This ensures that overwriting a file with the same data will not touch the file at all.

The overwrite=true flag was introduced as a way to bypass the timestamp check which is usually used to avoid writing the file again. Imho, the problem is not the time spent to copy the file, but rather the fact that overwriting the file does incur some costs during the next build phases : if the file has changed, a re-compilation may occur, then a re-packaging, which cascades to the whole reactor. This new mechanism is more reliable than the timestamp mechanism, as it compares the actual content of the files.

@gnodet gnodet merged commit feef97d into apache:master Apr 4, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants