Releases/gcc 12 #65

Open: wants to merge 2,622 commits into master

jacopobrusini

Support for Apple Silicon!!!

@jwakely (Contributor) commented Feb 21, 2024

This is an unofficial mirror that has nothing to do with the GCC project, so submitting pull requests here is a waste of time.

Also, I have no idea what this pull request is trying to do but it would never be accepted even if it was submitted to the right place.

GCC Administrator and others added 28 commits August 24, 2024 00:19
For function arguments/return, when it's BLK mode, it's put in a
parallel with an expr_list, and the expr_list contains the real mode
and registers.
The current ix86_check_avx_upper_register only checked for SSE_REG_P and
failed to handle that. The patch extends the handling to each subrtx.
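To illustrate why a top-level SSE_REG_P test misses these cases, here is a toy sketch. The node layout, the enum and both function names are hypothetical simplifications, not GCC's rtx API; the recursive walk only plays the role of FOR_EACH_SUBRTX.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for RTL: a node is either a
   register or a container (PARALLEL / EXPR_LIST) holding sub-nodes. */
enum kind { REG, PARALLEL, EXPR_LIST };

struct node {
    enum kind kind;
    bool is_sse_reg;            /* meaningful only when kind == REG */
    size_t n_ops;               /* number of sub-nodes for containers */
    const struct node *ops[4];  /* sub-nodes (fixed capacity for the sketch) */
};

/* Old behaviour: only the top-level node is tested. */
bool top_level_only(const struct node *x)
{
    return x->kind == REG && x->is_sse_reg;
}

/* New behaviour: visit the node and every sub-node, in the spirit of
   FOR_EACH_SUBRTX, and report whether any of them is an SSE register. */
bool any_subnode_sse(const struct node *x)
{
    assert(x != NULL);
    if (x->kind == REG)
        return x->is_sse_reg;
    assert(x->n_ops <= 4);
    for (size_t i = 0; i < x->n_ops; i++)
        if (any_subnode_sse(x->ops[i]))
            return true;
    return false;
}
```

For a BLKmode value modelled as PARALLEL containing an EXPR_LIST containing an SSE register, top_level_only reports false while any_subnode_sse reports true.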

gcc/ChangeLog:

	PR target/116512
	* config/i386/i386.cc (ix86_check_avx_upper_register): Iterate
	subrtx to scan for avx upper register.
	(ix86_check_avx_upper_stores): Inline old
	ix86_check_avx_upper_register.
	(ix86_avx_u128_mode_needed): Ditto, and replace
	FOR_EACH_SUBRTX with call to new
	ix86_check_avx_upper_register.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/pr116512.c: New test.

(cherry picked from commit ab214ef)
The intrinsic variant used without optimization (-O0) had a typo in its
mask type, which causes the high bits of __mmask32 to be unexpectedly zeroed.

The test does not fail under -O0 with the current 1b testcase because the
testcase is wrong: avx512-mask-type.h must be included after SIZE is defined,
otherwise the mask type will always be __mmask8. The same problem also occurs
in the AVX10.2 testcases. I will write a separate patch to fix that.
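Both problems come down to a 32-lane mask passing through a narrower type. The sketch below assumes the test header selects the mask type by comparing SIZE in `#if` (the exact logic of avx512-mask-type.h is an assumption here), with uint8_t and uint32_t standing in for __mmask8 and __mmask32; the type and function names are hypothetical. Inside `#if`, an identifier that is not a defined macro evaluates to 0, so a selection made before SIZE is defined falls into the narrowest branch.

```c
#include <assert.h>
#include <stdint.h>

/* Selection made BEFORE SIZE is defined: SIZE evaluates to 0 here,
   so the narrow 8-bit type is chosen. */
#if SIZE <= 8
typedef uint8_t early_mask_t;
#else
typedef uint32_t early_mask_t;
#endif

#define SIZE 32                   /* 32 half-precision lanes in 512 bits */

/* Selection made AFTER SIZE is defined: the 32-bit type is chosen. */
#if SIZE <= 8
typedef uint8_t late_mask_t;
#else
typedef uint32_t late_mask_t;
#endif

static_assert(sizeof(late_mask_t) * 8 >= SIZE, "mask type too narrow for SIZE lanes");

/* Passing a 32-lane mask through the narrow type drops lanes 8..31. */
uint32_t through_early(uint32_t mask)
{
    early_mask_t m = (early_mask_t)mask;
    return m;
}

uint32_t through_late(uint32_t mask)
{
    late_mask_t m = (late_mask_t)mask;
    return m;
}
```

For example, through_early(0xFFFF0001) yields 0x1, while through_late keeps all 32 bits.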

gcc/ChangeLog:

	* config/i386/avx512fp16intrin.h
	(_mm512_mask_fpclass_ph_mask): Correct mask type to __mmask32.
	(_mm512_fpclass_ph_mask): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx512fp16-vfpclassph-1c.c: New test.
Update analyze_parms not to disable function parameter analysis for
-ffat-lto-objects.  Tested on x86-64: there are no differences in zstd
with "-O2 -flto=auto -g" vs "-O2 -flto=auto -g -ffat-lto-objects".

	PR ipa/116410
	* ipa-modref.cc (analyze_parms): Always analyze function parameter
	for LTO.

Signed-off-by: H.J. Lu <[email protected]>
(cherry picked from commit 2f1689e)
Don't use temp for a PARALLEL BLKmode argument of an EXPR_LIST expression
in a TImode register.  Otherwise, the TImode variable will be put in
the GPR save area, which only guarantees 8-byte alignment.

gcc/

	PR target/116621
	* config/i386/i386.cc (ix86_gimplify_va_arg): Don't use temp for
	a PARALLEL BLKmode container of an EXPR_LIST expression in a
	TImode register.

gcc/testsuite/

	PR target/116621
	* gcc.target/i386/pr116621.c: New test.

Signed-off-by: H.J. Lu <[email protected]>
(cherry picked from commit fa7bbb0)
rguenth and others added 29 commits January 17, 2025 09:52
Loop distribution performs different analysis with -g0 and -g because it
counts a debug stmt starting a BB against a limit, which eventually
leads to different IVOPTs choices.  I've also fixed a possible IVOPTs
issue along the way, even though it doesn't make a difference here.
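A toy sketch of the invariant the loop-distribution fix restores: counting statements against a limit must skip debug statements, so that -g and -g0 give the same answer. The statement kinds and helper names are hypothetical; this is not GCC's gimple API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model: a basic block is a sequence of statements,
   some of which are debug statements that exist only with -g. */
enum stmt_kind { STMT_DEBUG, STMT_REAL };

/* Index of the first non-debug statement, or n if there is none. */
size_t first_real_stmt(const enum stmt_kind *bb, size_t n)
{
    size_t i = 0;
    while (i < n && bb[i] == STMT_DEBUG)
        i++;
    return i;
}

/* Does the block stay within LIMIT real statements?  Debug statements
   are never counted, so the answer is identical with -g and -g0. */
bool within_limit(const enum stmt_kind *bb, size_t n, size_t limit)
{
    assert(bb != NULL || n == 0);
    size_t count = 0;
    for (size_t i = first_real_stmt(bb, n); i < n; i++)
        if (bb[i] != STMT_DEBUG && ++count > limit)
            return false;
    return true;
}
```

A block {REAL, REAL} and the same block with debug statements interleaved, {DEBUG, REAL, DEBUG, REAL}, get the same verdict for any limit.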

	PR tree-optimization/116290
	* tree-loop-distribution.cc (determine_reduction_stmt_1): PHIs
	have no debug variants.  Start with first non-debug real stmt.
	* tree-ssa-loop-ivopts.cc (find_givs_in_bb): Do not analyze
	debug stmts.

	* gcc.dg/pr116290.c: New testcase.

(cherry picked from commit 5667400)
The following reverts a bogus fix done for PR101009 and instead makes
sure we get into the same_access_functions () case when computing
the distance vector for g[1] and g[1] where the constants ended up
having different types.  The generic code doesn't seem to handle
loop invariant dependences.  The special case gets us both
( 0 ) and ( 1 ) as distance vectors while formerly we got ( 1 ),
which the PR101009 fix changed to ( 0 ) with bad effects on other
cases as shown in this PR.

	PR tree-optimization/116768
	* tree-data-ref.cc (build_classic_dist_vector_1): Revert
	PR101009 change.
	* tree-chrec.cc (eq_evolutions_p): Make sure (sizetype)1
	and (int)1 compare equal.

	* gcc.dg/torture/pr116768.c: New testcase.

(cherry picked from commit 5b5a36b)
 @2)

Transforming -fma (-a, b, -c) to fma (a, b, c) is only valid when
not rounding towards -inf or +inf, as the sign of the multiplication
changes.
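Why directed rounding breaks the transform: fma computes round(a*b + c) in one step, so -fma(-a, b, -c) = -round(-(a*b + c)). Round-to-nearest is symmetric, so this equals round(a*b + c). Rounding toward +inf is not symmetric: rounding -x upward equals minus x rounded downward. The sketch checks this in a toy integer model in which a result is rounded once to a multiple of UNIT, a stand-in for the last representable place. The model, UNIT and fma_model are assumptions for illustration, not IEEE arithmetic or GCC code.

```c
#include <assert.h>

/* Toy model: exact integer results are rounded once to a multiple of UNIT,
   much as a fused multiply-add rounds the exact a*b + c once. */
#define UNIT 4

enum rmode { ROUND_NEAREST_EVEN, ROUND_UPWARD, ROUND_DOWNWARD };

/* Floor division; C's '/' truncates toward zero, which is wrong for x < 0. */
long floor_div(long x, long d)
{
    assert(d > 0);
    long q = x / d;
    if (x % d != 0 && x < 0)
        q -= 1;
    return q;
}

long round_to_unit(long x, enum rmode m)
{
    long lo = floor_div(x, UNIT) * UNIT;  /* largest multiple <= x  */
    long hi = (lo == x) ? x : lo + UNIT;  /* smallest multiple >= x */
    if (m == ROUND_UPWARD)
        return hi;
    if (m == ROUND_DOWNWARD)
        return lo;
    if (x - lo != hi - x)                 /* nearest, no tie */
        return (x - lo < hi - x) ? lo : hi;
    return (floor_div(lo, UNIT) % 2 == 0) ? lo : hi;  /* tie: even multiple */
}

/* Fused multiply-add in the toy model: a single rounding of the exact value. */
long fma_model(long a, long b, long c, enum rmode m)
{
    return round_to_unit(a * b + c, m);
}
```

With a = 3, b = 3, c = 2 the exact value is 11. Under ROUND_UPWARD, fma_model(3, 3, 2) gives 12 while -fma_model(-3, 3, -2) gives 8. Under ROUND_NEAREST_EVEN both give 12.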

	PR middle-end/116891
	* match.pd ((negate (IFN_FNMS@3 @0 @1 @2)) -> (IFN_FMA @0 @1 @2)):
	Only enable for !HONOR_SIGN_DEPENDENT_ROUNDING.

(cherry picked from commit c53bd48)
On Mon, Oct 14, 2024 at 08:53:29AM +0200, Jakub Jelinek wrote:
> >     PR middle-end/116891
> >     * match.pd ((negate (IFN_FNMS@3 @0 @1 @2)) -> (IFN_FMA @0 @1 @2)):
> >     Only enable for !HONOR_SIGN_DEPENDENT_ROUNDING.
>
> Guess it would be nice to have a testcase which FAILs without the patch and
> PASSes with it, but it can be added later.

I've added such a testcase now, and additionally found the fix only fixed
one of the 4 problematic similar cases.

Here is a patch which fixes the others too and adds the testcases.
fma-pr116891.c FAILed without your patch, FAILs with your patch too (but
only due to the bar/baz/qux checks) and PASSes with the patch.

2024-10-15  Jakub Jelinek  <[email protected]>

	PR middle-end/116891
	* match.pd ((negate (fmas@3 @0 @1 @2)) -> (IFN_FNMS @0 @1 @2)):
	Only enable for !HONOR_SIGN_DEPENDENT_ROUNDING.
	((negate (IFN_FMS@3 @0 @1 @2)) -> (IFN_FNMA @0 @1 @2)): Likewise.
	((negate (IFN_FNMA@3 @0 @1 @2)) -> (IFN_FMS @0 @1 @2)): Likewise.

	* gcc.dg/pr116891.c: New test.
	* gcc.target/i386/fma-pr116891.c: New test.

(cherry picked from commit 4366f0c)
…ication

For vector types we have to make sure the comparison result is a vector
type and the resulting compare operation is supported.  As the resulting
compare is never an equality compare I didn't bother to check for the
cbranch case.

	PR tree-optimization/117104
	* match.pd ((cmp:c (minmax:c @0 @1) @0) -> (out @0 @1)): Properly
	guard the vector case.

	* gcc.dg/pr117104.c: New testcase.

(cherry picked from commit f54d42e)
The diagnostics code fails to handle non-constant domain max.

	PR tree-optimization/117254
	* gimple-ssa-warn-access.cc (maybe_warn_nonstring_arg):
	Check the array domain max is constant before using it.

	* gcc.dg/pr117254.c: New testcase.

(cherry picked from commit d464a52)
STMT_VINFO_SLP_VECT_ONLY isn't properly computed as the union over all
group members, and when the group is later split due to duplicates,
not all sub-groups inherit the flag.
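A toy sketch of the two properties the fix describes, assuming a simplified group made of a half-open range of member indices. The names struct group, group_flag and split_group are hypothetical, not vectorizer API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model: a group is a half-open range of member
   indices plus the SLP-only flag recorded for the whole group. */
struct group {
    size_t begin, end;
    bool slp_vect_only;
};

/* The group flag is the union (logical OR) over every member,
   not just a subset such as the group leader. */
bool group_flag(const bool *member_slp_only, struct group g)
{
    assert(g.begin <= g.end);
    bool any = false;
    for (size_t i = g.begin; i < g.end; i++)
        any = any || member_slp_only[i];
    return any;
}

/* Split G at AT into [begin, at) and [at, end); both parts inherit
   the flag of the original group. */
void split_group(struct group g, size_t at,
                 struct group *first, struct group *second)
{
    assert(g.begin <= at && at <= g.end);
    first->begin = g.begin;
    first->end = at;
    first->slp_vect_only = g.slp_vect_only;
    second->begin = at;
    second->end = g.end;
    second->slp_vect_only = g.slp_vect_only;
}
```

Inheriting the original flag on a split keeps the conservative answer even for a part, such as [0, 1) below, whose own members would not require it.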

	PR tree-optimization/117307
	* tree-vect-data-refs.cc (vect_analyze_data_ref_accesses):
	Properly compute STMT_VINFO_SLP_VECT_ONLY.  Set it on all
	parts of a split group.

	* gcc.dg/vect/pr117307.c: New testcase.

(cherry picked from commit 1972230)
When we decompose a complex load only used as real and imaginary
parts we fail to honor IL constraints which are that a BIT_FIELD_REF
of register type should be outermost in a ref.  The following
simply avoids the transform when the complex load has such a
BIT_FIELD_REF.

	PR tree-optimization/117417
	* tree-ssa-forwprop.cc (pass_forwprop::execute): Avoid
	decomposing BIT_FIELD_REF complex load.

	* gcc.dg/torture/pr117417.c: New testcase.

(cherry picked from commit d976daa)
This patch removes the (unnecessary) CPP_PRAGMA_EOL case from
cp_parser_cache_defarg, which currently has the result that any pragmas
in the NSDMI cause an error.

	PR c++/118147

gcc/cp/ChangeLog:

	* parser.cc (cp_parser_cache_defarg): Don't error when
	CPP_PRAGMA_EOL.

gcc/testsuite/ChangeLog:

	* g++.dg/cpp0x/nsdmi-defer7.C: New test.

Signed-off-by: Nathaniel Shead <[email protected]>
(cherry picked from commit f3ccc57)
We are initializing both the call graph node count and
the entry block count of the function with the head_count value
from the profile.

Count propagation algorithm may refine the entry block count
and we may end up with a case where the call graph node count
is set to zero but the entry block count is non-zero. That becomes
a problem because we have this code in execute_fixup_cfg:

 profile_count num = node->count;
 profile_count den = ENTRY_BLOCK_PTR_FOR_FN (cfun)->count;
 bool scale = num.initialized_p () && !(num == den);

Here if num is 0 but den is not 0, scale becomes true and we
lose the counts in

 if (scale)
   bb->count = bb->count.apply_scale (num, den);

This is what happened in the issue reported in PR116743
(a 10% regression in MySQL HAMMERDB tests).
3d9e676 made an improvement in
AutoFDO count propagation, which caused a mismatch between
the call graph node count (zero) and the entry block count (non-zero)
and subsequent loss of counts as described above.

The fix is to update the call graph node count once we've done count propagation.
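The failure mode can be reproduced with plain integers. This sketch assumes counts are simple uint64_t values and drops profile_count's quality tracking and the initialized_p () check. count_t, apply_scale and fixup_counts here are hypothetical stand-ins that only mirror the quoted execute_fixup_cfg logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplification of profile_count: a plain integer count. */
typedef uint64_t count_t;

/* count * num / den, a toy apply_scale without overflow handling. */
count_t apply_scale(count_t count, count_t num, count_t den)
{
    assert(den != 0);
    return count * num / den;
}

/* Rescale every block when the node count differs from the entry count. */
void fixup_counts(count_t node_count, count_t entry_count,
                  count_t *bb, size_t n)
{
    bool scale = node_count != entry_count;  /* initialized_p () omitted */
    if (scale && entry_count != 0)           /* toy guard against den == 0 */
        for (size_t i = 0; i < n; i++)
            bb[i] = apply_scale(bb[i], node_count, entry_count);
}
```

With a stale node count of 0 and an entry count of 100, every block count is scaled to 0. Once the node count is synced to the entry count after propagation, scale is false and the counts survive.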

Tested on x86_64-pc-linux-gnu.

gcc/ChangeLog:
	PR gcov-profile/116743
	* auto-profile.cc (afdo_annotate_cfg): Fix mismatch between the call graph node count
	and the entry block count.

(cherry picked from commit e683c6b)
…PR118255]

We currently reject the following code

=== code here ===
template <int non_template> struct S { friend class non_template; };
class non_template {};
S<0> s;
=== code here ===

While EDG agrees with the current behaviour, clang and MSVC don't (see
https://godbolt.org/z/69TGaabhd), and I believe that this code is valid,
since the friend clause does not actually declare a type, so it cannot
shadow anything. The fact that we didn't error out if the non_template
class was declared before S backs this up as well.

This patch fixes this by skipping the call to check_template_shadow for
hidden bindings.

	PR c++/118255

gcc/cp/ChangeLog:

	* name-lookup.cc (pushdecl): Don't call check_template_shadow
	for hidden bindings.

gcc/testsuite/ChangeLog:

	* g++.dg/lookup/pr99116-1.C: Adjust test expectation.
	* g++.dg/template/friend84.C: New test.

(cherry picked from commit b5a0692)
…[PR118067]

SImode and DImode moves from/to mask registers are valid only with AVX512BW,
so mark relevant alternatives in *movsi_internal and *movdi_internal as such.

	PR target/118067

gcc/ChangeLog:

	* config/i386/i386.md (*movdi_internal):
	Disable alternatives from/to mask registers without AVX512BW.
	(*movsi_internal): Ditto.
The introduction of gdc.test/runnable/test23514.d exposed an
incorrect compilation when adding a 64-bit constant to a link-time
address.  The current cast to size_t causes a loss of precision, which
can result in incorrect compilation.
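A sketch of how narrowing an offset loses precision, assuming the affected configuration has a 32-bit size_t. Here uint32_t and uint64_t stand in for size_t and dinteger_t, and the function names are hypothetical.

```c
#include <stdint.h>

/* uint32_t plays the role of a 32-bit size_t (an assumption about the
   affected configuration); uint64_t plays the role of a 64-bit dinteger_t. */
uint64_t add_offset_narrow(uint64_t base, uint64_t offset)
{
    uint32_t truncated = (uint32_t)offset;  /* high 32 bits are discarded */
    return base + truncated;
}

uint64_t add_offset_wide(uint64_t base, uint64_t offset)
{
    return base + offset;                   /* full 64-bit offset kept */
}
```

For base 0x1000 and offset 0x100000000, the narrow version returns 0x1000 because the offset truncates to 0, while the wide version returns 0x100001000.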

	PR d/114434

gcc/d/ChangeLog:

	* expr.cc (ExprVisitor::visit (PtrExp *)): Get the offset as a
	dinteger_t rather than a size_t.
	(ExprVisitor::visit (SymOffExp *)): Likewise.

gcc/testsuite/ChangeLog:

	* gdc.test/runnable/test23514.d: New test.

(cherry picked from commit 9ab3895)
We disable gathers for Zen4.  It seems that gather has improved a bit compared
to Zen4, and the Zen5 optimization manual suggests "Avoid GATHER instructions when
the indices are known ahead of time. Vector loads followed by shuffles result
in a higher load bandwidth."  However, the situation seems to be more
complicated.

Gather is a 5-10% loss on the parest benchmark as well as a 30% loss on sparse dot
products in TSVC. Curiously enough, breaking these out into microbenchmarks
reversed the situation, and it turns out that the performance depends on
how indices are distributed: gather is a loss if indices are sequential,
neutral if they are random, and a win for some strides (4, 8).
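For reference, the access pattern at issue is an indexed load such as the sparse dot product below. When vectorized, x[idx[i]] is the kind of load that can be implemented with gather instructions. This is a plain C sketch with a hypothetical function name, not the TSVC or parest source.

```c
#include <assert.h>
#include <stddef.h>

/* Sparse dot product: val holds the nonzeros, idx their positions in x.
   The indexed load x[idx[i]] is the gather-style access. */
double sparse_dot(const double *val, const size_t *idx, const double *x,
                  size_t nnz, size_t xlen)
{
    double sum = 0.0;
    for (size_t i = 0; i < nnz; i++) {
        assert(idx[i] < xlen);  /* every index must lie inside x */
        sum += val[i] * x[idx[i]];
    }
    return sum;
}
```

Sequential indices (0, 1, 2, 3) and a stride-4 pattern (0, 4) are two of the index distributions contrasted above; the function computes the same result either way, and only the vectorized code's speed differs.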

This seems to be similar to earlier Zens, so I think (especially for
backporting znver5 support) that it makes sense to be consistent and disable
gather unless we work out a good heuristic for when to use it. Since we
typically do not know the indices in advance, I don't see how that can be done.

I opened PR116582 with some examples of wins and losses.

gcc/ChangeLog:

	* config/i386/x86-tune.def (X86_TUNE_USE_GATHER_2PARTS): Disable for
	ZNVER5.
	(X86_TUNE_USE_SCATTER_2PARTS): Disable for ZNVER5.
	(X86_TUNE_USE_GATHER_4PARTS): Disable for ZNVER5.
	(X86_TUNE_USE_SCATTER_4PARTS): Disable for ZNVER5.
	(X86_TUNE_USE_GATHER_8PARTS): Disable for ZNVER5.
	(X86_TUNE_USE_SCATTER_8PARTS): Disable for ZNVER5.

(cherry picked from commit d82edbe)
	PR d/111650

gcc/d/ChangeLog:

	* decl.cc (get_fndecl_arguments): Move generation of frame type to ...
	(DeclVisitor::visit (FuncDeclaration *)): ... here, after the call to
	build_closure.

gcc/testsuite/ChangeLog:

	* gdc.dg/pr111650.d: New test.

(cherry picked from commit 4d4929f)
2025-01-23  John David Anglin  <[email protected]>

gcc/ChangeLog:

	* config/pa/pa32-regs.h (ADDITIONAL_REGISTER_NAMES): Change
	register 86 name to "%fr31L".
The loop checking for built-in constant operand restrictions was missing
some operands due to the loop limit being too small.  Fixing that exposed
a testsuite failure which is caused by a typo in the pmxvi4ger8pp definition
where we had made the PMASK field too small.
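The first fix follows a general pattern: derive the loop limit from the array being scanned rather than from a separate, smaller constant. The table, its values and first_bad_operand below are hypothetical, not the rs6000 built-in definitions.

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof (a) / sizeof ((a)[0]))

/* Hypothetical restriction table: the largest value each constant operand
   may take (the numbers are made up for illustration). */
static const int operand_max[] = { 3, 15, 15, 255 };

/* Return the index of the first out-of-range operand, or -1 if all are
   valid.  OPS must have ARRAY_SIZE (operand_max) elements.  The loop limit
   is derived from the table, so the last operand is checked too. */
int first_bad_operand(const int *ops)
{
    assert(ops != NULL);
    for (size_t i = 0; i < ARRAY_SIZE(operand_max); i++)
        if (ops[i] < 0 || ops[i] > operand_max[i])
            return (int)i;
    return -1;
}
```

With a hard-coded limit of 3, the fourth operand would never be checked and {0, 0, 0, 256} would be accepted; deriving the limit from the table reports index 3.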

2025-01-16  Peter Bergner  <[email protected]>

gcc/
	* config/rs6000/rs6000-builtin.cc (rs6000_expand_builtin): Use correct
	array size for the loop limit.
	* config/rs6000/rs6000-builtins.def: Fix field size for PMASK operand.

(cherry picked from commit 1a2d63a)
For invalid constant operand values used in built-in functions, return
const0_rtx to signify an error occurred during expansion.

2025-01-16  Peter Bergner  <[email protected]>

gcc/
	* config/rs6000/rs6000-builtin.cc (rs6000_expand_builtin): Return
	const0_rtx when there is an error.

gcc/testsuite/
	* gcc.target/powerpc/mma-builtin-error.c: New test.

(cherry picked from commit 0696af7)
NinaRanns pushed a commit to NinaRanns/gcc that referenced this pull request Jan 28, 2025
…on-r15-7214-g0710024b5bd861

Contracts nonattr rebase on r15 7214 g0710024b5bd861