mirrored from git://gcc.gnu.org/git/gcc.git
-
Notifications
You must be signed in to change notification settings - Fork 4.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Releases/gcc 12 #65
Open
jacopobrusini
wants to merge
2,622
commits into
master
Choose a base branch
from
releases/gcc-12
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Releases/gcc 12 #65
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This is an unofficial mirror that has nothing to do with the GCC project, so submitting pull requests here is a waste of time. Also, I have no idea what this pull request is trying to do but it would never be accepted even if it was submitted to the right place. |
atahanozbayram
approved these changes
Apr 2, 2024
For function arguments/return, when it's BLK mode, it's put in a parallel with an expr_list, and the expr_list contains the real mode and registers. Current ix86_check_avx_upper_register only checked for SSE_REG_P, and failed to handle that. The patch extend the handle to each subrtx. gcc/ChangeLog: PR target/116512 * config/i386/i386.cc (ix86_check_avx_upper_register): Iterate subrtx to scan for avx upper register. (ix86_check_avx_upper_stores): Inline old ix86_check_avx_upper_register. (ix86_avx_u128_mode_needed): Ditto, and replace FOR_EACH_SUBRTX with call to new ix86_check_avx_upper_register. gcc/testsuite/ChangeLog: * gcc.target/i386/pr116512.c: New test. (cherry picked from commit ab214ef)
The intrin for non-optimized got a typo in mask type, which will cause the high bits of __mmask32 being unexpectedly zeroed. The test does not fail under O0 with current 1b since the testcase is wrong. We need to include avx512-mask-type.h after SIZE is defined, or it will always be __mmask8. That problem also happened in AVX10.2 testcases. I will write a seperate patch to fix that. gcc/ChangeLog: * config/i386/avx512fp16intrin.h (_mm512_mask_fpclass_ph_mask): Correct mask type to __mmask32. (_mm512_fpclass_ph_mask): Ditto. gcc/testsuite/ChangeLog: * gcc.target/i386/avx512fp16-vfpclassph-1c.c: New test.
Update analyze_parms not to disable function parameter analysis for -ffat-lto-objects. Tested on x86-64, there are no differences in zstd with "-O2 -flto=auto" -g "vs -O2 -flto=auto -g -ffat-lto-objects". PR ipa/116410 * ipa-modref.cc (analyze_parms): Always analyze function parameter for LTO. Signed-off-by: H.J. Lu <[email protected]> (cherry picked from commit 2f1689e)
Don't use temp for a PARALLEL BLKmode argument of an EXPR_LIST expression in a TImode register. Otherwise, the TImode variable will be put in the GPR save area which guarantees only 8-byte alignment. gcc/ PR target/116621 * config/i386/i386.cc (ix86_gimplify_va_arg): Don't use temp for a PARALLEL BLKmode container of an EXPR_LIST expression in a TImode register. gcc/testsuite/ PR target/116621 * gcc.target/i386/pr116621.c: New test. Signed-off-by: H.J. Lu <[email protected]> (cherry picked from commit fa7bbb0)
Loop distribution does different analysis with -g0/-g due to counting a debug stmt starting a BB against a limit which will everntually lead to different IVOPTs choices. I've fixed a possible IVOPTs issue on the way even though it doesn't make a difference here. PR tree-optimization/116290 * tree-loop-distribution.cc (determine_reduction_stmt_1): PHIs have no debug variants. Start with first non-debug real stmt. * tree-ssa-loop-ivopts.cc (find_givs_in_bb): Do not analyze debug stmts. * gcc.dg/pr116290.c: New testcase. (cherry picked from commit 5667400)
The following reverts a bogus fix done for PR101009 and instead makes sure we get into the same_access_functions () case when computing the distance vector for g[1] and g[1] where the constants ended up having different types. The generic code doesn't seem to handle loop invariant dependences. The special case gets us both ( 0 ) and ( 1 ) as distance vectors while formerly we got ( 1 ), which the PR101009 fix changed to ( 0 ) with bad effects on other cases as shown in this PR. PR tree-optimization/116768 * tree-data-ref.cc (build_classic_dist_vector_1): Revert PR101009 change. * tree-chrec.cc (eq_evolutions_p): Make sure (sizetype)1 and (int)1 compare equal. * gcc.dg/torture/pr116768.c: New testcase. (cherry picked from commit 5b5a36b)
@2) Transforming -fma (-a, b, -c) to fma (a, b, c) is only valid when not rounding towards -inf or +inf as the sign of the multiplication changes. PR middle-end/116891 * match.pd ((negate (IFN_FNMS@3 @0 @1 @2)) -> (IFN_FMA @0 @1 @2)): Only enable for !HONOR_SIGN_DEPENDENT_ROUNDING. (cherry picked from commit c53bd48)
On Mon, Oct 14, 2024 at 08:53:29AM +0200, Jakub Jelinek wrote: > > PR middle-end/116891 > > * match.pd ((negate (IFN_FNMS@3 @0 @1 @2)) -> (IFN_FMA @0 @1 @2)): > > Only enable for !HONOR_SIGN_DEPENDENT_ROUNDING. > > Guess it would be nice to have a testcase which FAILs without the patch and > PASSes with it, but it can be added later. I've added such a testcase now, and additionally found the fix only fixed one of the 4 problematic similar cases. Here is a patch which fixes the others too and adds the testcases. fma-pr116891.c FAILed without your patch, FAILs with your patch too (but only due to the bar/baz/qux checks) and PASSes with the patch. 2024-10-15 Jakub Jelinek <[email protected]> PR middle-end/116891 * match.pd ((negate (fmas@3 @0 @1 @2)) -> (IFN_FNMS @0 @1 @2)): Only enable for !HONOR_SIGN_DEPENDENT_ROUNDING. ((negate (IFN_FMS@3 @0 @1 @2)) -> (IFN_FNMA @0 @1 @2)): Likewise. ((negate (IFN_FNMA@3 @0 @1 @2)) -> (IFN_FMS @0 @1 @2)): Likewise. * gcc.dg/pr116891.c: New test. * gcc.target/i386/fma-pr116891.c: New test. (cherry picked from commit 4366f0c)
…ication For vector types we have to make sure the comparison result is a vector type and the resulting compare operation is supported. As the resulting compare is never an equality compare I didn't bother to check for the cbranch case. PR tree-optimization/117104 * match.pd ((cmp:c (minmax:c @0 @1) @0) -> (out @0 @1)): Properly guard the vector case. * gcc.dg/pr117104.c: New testcase. (cherry picked from commit f54d42e)
The diagnostics code fails to handle non-constant domain max. PR tree-optimization/117254 * gimple-ssa-warn-access.cc (maybe_warn_nonstring_arg): Check the array domain max is constant before using it. * gcc.dg/pr117254.c: New testcase. (cherry picked from commit d464a52)
STMT_VINFO_SLP_VECT_ONLY isn't properly computed as union of all group members and when the group is later split due to duplicates not all sub-groups inherit the flag. PR tree-optimization/117307 * tree-vect-data-refs.cc (vect_analyze_data_ref_accesses): Properly compute STMT_VINFO_SLP_VECT_ONLY. Set it on all parts of a split group. * gcc.dg/vect/pr117307.c: New testcase. (cherry picked from commit 1972230)
When we decompose a complex load only used as real and imaginary parts we fail to honor IL constraints which are that a BIT_FIELD_REF of register type should be outermost in a ref. The following simply avoids the transform when the complex load has such a BIT_FIELD_REF. PR tree-optimization/117417 * tree-ssa-forwprop.cc (pass_forwprop::execute): Avoid decomposing BIT_FIELD_REF complex load. * gcc.dg/torture/pr117417.c: New testcase. (cherry picked from commit d976daa)
This patch removes the (unnecessary) CPP_PRAGMA_EOL case from cp_parser_cache_defarg, which currently has the result that any pragmas in the NSDMI cause an error. PR c++/118147 gcc/cp/ChangeLog: * parser.cc (cp_parser_cache_defarg): Don't error when CPP_PRAGMA_EOL. gcc/testsuite/ChangeLog: * g++.dg/cpp0x/nsdmi-defer7.C: New test. Signed-off-by: Nathaniel Shead <[email protected]> (cherry picked from commit f3ccc57)
We are initializing both the call graph node count and the entry block count of the function with the head_count value from the profile. Count propagation algorithm may refine the entry block count and we may end up with a case where the call graph node count is set to zero but the entry block count is non-zero. That becomes a problem because we have this code in execute_fixup_cfg: profile_count num = node->count; profile_count den = ENTRY_BLOCK_PTR_FOR_FN (cfun)->count; bool scale = num.initialized_p () && !(num == den); Here if num is 0 but den is not 0, scale becomes true and we lose the counts in if (scale) bb->count = bb->count.apply_scale (num, den); This is what happened in the issue reported in PR116743 (a 10% regression in MySQL HAMMERDB tests). 3d9e676 made an improvement in AutoFDO count propagation, which caused a mismatch between the call graph node count (zero) and the entry block count (non-zero) and subsequent loss of counts as described above. The fix is to update the call graph node count once we've done count propagation. Tested on x86_64-pc-linux-gnu. gcc/ChangeLog: PR gcov-profile/116743 * auto-profile.cc (afdo_annotate_cfg): Fix mismatch between the call graph node count and the entry block count. (cherry picked from commit e683c6b)
…PR118255] We currently reject the following code === code here === template <int non_template> struct S { friend class non_template; }; class non_template {}; S<0> s; === code here === While EDG agrees with the current behaviour, clang and MSVC don't (see https://godbolt.org/z/69TGaabhd), and I believe that this code is valid, since the friend clause does not actually declare a type, so it cannot shadow anything. The fact that we didn't error out if the non_template class was declared before S backs this up as well. This patch fixes this by skipping the call to check_template_shadow for hidden bindings. PR c++/118255 gcc/cp/ChangeLog: * name-lookup.cc (pushdecl): Don't call check_template_shadow for hidden bindings. gcc/testsuite/ChangeLog: * g++.dg/lookup/pr99116-1.C: Adjust test expectation. * g++.dg/template/friend84.C: New test. (cherry picked from commit b5a0692)
…[PR118067] SImode and DImode moves from/to mask registers are valid only with AVX512BW, so mark relevant alternatives in *movsi_internal and *movdi_internal as such. PR target/118067 gcc/ChangeLog: * config/i386/i386.md (*movdi_internal): Disable alternatives from/to mask registers without AVX512BW. (*movsi_internal): Ditto.
Since the introduction of gdc.test/runnable/test23514.d, it's exposed an incorrect compilation when adding a 64-bit constant to a link-time address. The current cast to size_t causes a loss of precision, which can result in incorrect compilation. PR d/114434 gcc/d/ChangeLog: * expr.cc (ExprVisitor::visit (PtrExp *)): Get the offset as a dinteger_t rather than a size_t. (ExprVisitor::visit (SymOffExp *)): Likewise. gcc/testsuite/ChangeLog: * gdc.test/runnable/test23514.d: New test. (cherry picked from commit 9ab3895)
We disable gathers for zen4. It seems that gather has improved a bit compared to zen4 and Zen5 optimization manual suggests "Avoid GATHER instructions when the indices are known ahead of time. Vector loads followed by shuffles result in a higher load bandwidth." however the situation seems to be more complicated. gather is 5-10% loss on parest benchmark as well as 30% loss on sparse dot products in TSVC. Curiously enough breaking these out into microbenchmark reversed the situation and it turns out that the performance depends on how indices are distributed. gather is loss if indices are sequential, neutral if they are random and win for some strides (4, 8). This seems to be similar to earlier zens, so I think (especially for backporting znver5 support) that it makes sense to be conistent and disable gather unless we work out a good heuristics on when to use it. Since we typically do not know the indices in advance, I don't see how that can be done. I opened PR116582 with some examples of wins and loses gcc/ChangeLog: * config/i386/x86-tune.def (X86_TUNE_USE_GATHER_2PARTS): Disable for ZNVER5. (X86_TUNE_USE_SCATTER_2PARTS): Disable for ZNVER5. (X86_TUNE_USE_GATHER_4PARTS): Disable for ZNVER5. (X86_TUNE_USE_SCATTER_4PARTS): Disable for ZNVER5. (X86_TUNE_USE_GATHER_8PARTS): Disable for ZNVER5. (X86_TUNE_USE_SCATTER_8PARTS): Disable for ZNVER5. (cherry picked from commit d82edbe)
PR d/111650 gcc/d/ChangeLog: * decl.cc (get_fndecl_arguments): Move generation of frame type to ... (DeclVisitor::visit (FuncDeclaration *)): ... here, after the call to build_closure. gcc/testsuite/ChangeLog: * gdc.dg/pr111650.d: New test. (cherry picked from commit 4d4929f)
2025-01-23 John David Anglin <[email protected]> gcc/ChangeLog: * config/pa/pa32-regs.h (ADDITIONAL_REGISTER_NAMES): Change register 86 name to "%fr31L".
The loop checking for built-in constant operand restrictions was missing some operands due to the loop limit being too small. Fixing that exposed a testsuite failure which is caused by a typo in the pmxvi4ger8pp definition where we had made the PMASK field too small. 2025-01-16 Peter Bergner <[email protected]> gcc/ * config/rs6000/rs6000-builtin.cc (rs6000_expand_builtin): Use correct array size for the loop limit. * config/rs6000/rs6000-builtins.def: Fix field size for PMASK operand. (cherry picked from commit 1a2d63a)
For invalid constant operand values used in built-in functions, return const0_rtx to signify an error occurred during expansion. 2025-01-16 Peter Bergner <[email protected]> gcc/ * config/rs6000/rs6000-builtin.cc (rs6000_expand_builtin): Return const0_rtx when there is an error. gcc/testsuite/ * gcc.target/powerpc/mma-builtin-error.c: New test. (cherry picked from commit 0696af7)
NinaRanns
pushed a commit
to NinaRanns/gcc
that referenced
this pull request
Jan 28, 2025
…on-r15-7214-g0710024b5bd861 Contracts nonattr rebase on r15 7214 g0710024b5bd861
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Support for Apple Silicon!!!