
Conversation

@NinaRanns (Owner) commented:

Thanks for taking the time to contribute to GCC! Please be advised that if you are
viewing this on github.com, the mirror there is unofficial and unmonitored.
The GCC community does not use github.com for its contributions. Instead, we use
a mailing list ([email protected]) for code submissions, code reviews, and
bug reports. Please send patches there instead.

@iains (Collaborator) left a comment:

I want to take a look at the generated IR too - but will stop here for now.

    {
      result = build_over_call (cand, LOOKUP_NORMAL, complain);
      result = maybe_contract_wrap_new_method_call (cand->fn, result);
    }
@iains (Collaborator) commented on Sep 7, 2024:

I wonder why "new" method call and not just method_call or call? (The hiccup here is that "new" has a specific meaning in C++ beyond the "just built" sense intended here.)

  return (flag_contracts
	  && !processing_template_decl
	  && DECL_ABSTRACT_ORIGIN (decl1) == NULL_TREE
	  && (DECL_ABSTRACT_ORIGIN (decl1) == NULL_TREE
@iains (Collaborator) commented:

IMO we really need to disentangle this from the (apparent) mis-use of DECL_ABSTRACT_ORIGIN - unless the documentation for that is out of date - we should check with the GCC C++ gurus. If the use of DECL_ABSTRACT_ORIGIN is an (ab)use, then that should be fixed (but not as part of this patch, FAOD).

  tree wrapdecl = get_contract_wrapper_function (fndecl);
  if (wrapdecl)
    {
      copy_and_remap_contracts (wrapdecl, fndecl);
    }
@iains (Collaborator) commented:

is there some action needed if the wrapdecl is error_mark_node or absent?

    = build_lang_decl_loc (loc, FUNCTION_DECL, get_identifier (name), wrapper_type);

  DECL_CONTEXT (wrapdecl) = NULL_TREE;
@iains (Collaborator) commented:

we will add this later, I guess?

  /* Copy any alignment the user added.  */
  DECL_USER_ALIGN (wrapdecl) = DECL_USER_ALIGN (fndecl);

  /* Apply attributes from the original fn.
@iains (Collaborator) commented:

I wonder if any non-contracts attributes are relevant here?

     instantiating as it will be done as a part of instantiation.  */
  copy_and_remap_contracts (wrapdecl, fndecl);

  DECL_ABSTRACT_ORIGIN (wrapdecl) = fndecl;
@iains (Collaborator) commented:

I still feel we're abusing this (according to the doc.) - it should be on our TODO to establish whether the abuse is real or the doc is out of date/insufficient.

  DECL_WEAK (wrapdecl) = false;

  // below is copied from build_contract_condition_function
  DECL_INTERFACE_KNOWN (wrapdecl) = true;
@iains (Collaborator) commented:

I'm not sure what this does .. (not saying it's wrong, just not sure of the implications)


  /* Update various inline related declaration properties.  */
  //DECL_DECLARED_INLINE_P (wrapdecl) = true;
  DECL_DISREGARD_INLINE_LIMITS (wrapdecl) = true;
@iains (Collaborator) commented:

I am not disagreeing with this - but note that it is a big hammer.


  if (fndecl == error_mark_node)
    return fndecl;

  /* We only need to wrap the function if it's a virtual function.  */
@iains (Collaborator) commented:

I would like us (informally) to see how tricky it would be to have a flag that allows us to apply caller-side contracts generally - it would help (perhaps) with some of the SG21 + coroutines discussion.

@NinaRanns merged commit 5c10362 into contracts-nonattr on Sep 11, 2024.
NinaRanns pushed a commit that referenced this pull request Sep 26, 2024
…o_debug_section [PR116614]

cat abc.C
  #define A(n) struct T##n {} t##n;
  #define B(n) A(n##0) A(n##1) A(n##2) A(n##3) A(n##4) A(n##5) A(n##6) A(n##7) A(n##8) A(n##9)
  #define C(n) B(n##0) B(n##1) B(n##2) B(n##3) B(n##4) B(n##5) B(n##6) B(n##7) B(n##8) B(n##9)
  #define D(n) C(n##0) C(n##1) C(n##2) C(n##3) C(n##4) C(n##5) C(n##6) C(n##7) C(n##8) C(n##9)
  #define E(n) D(n##0) D(n##1) D(n##2) D(n##3) D(n##4) D(n##5) D(n##6) D(n##7) D(n##8) D(n##9)
  E(1) E(2) E(3)
  int main () { return 0; }
./xg++ -B ./ -o abc{.o,.C} -flto -flto-partition=1to1 -O2 -g -fdebug-types-section -c
./xgcc -B ./ -o abc{,.o} -flto -flto-partition=1to1 -O2
(not included in testsuite as it takes a while to compile) FAILs with
lto-wrapper: fatal error: Too many copied sections: Operation not supported
compilation terminated.
/usr/bin/ld: error: lto-wrapper failed
collect2: error: ld returned 1 exit status

The following patch fixes that.  Most of the 64K+ section support for
reading and writing was already there years ago (and the reading side is
already used quite often), and a further bug in it was fixed by the PR104617
fix.

Yet, the fix isn't solely about removing the
  if (new_i - 1 >= SHN_LORESERVE)
    {
      *err = ENOTSUP;
      return "Too many copied sections";
    }
5 lines; the missing part was that the function only handled reading of
the .symtab_shndx section but not copying/updating of it.
If the result has fewer than 64K-epsilon sections, that actually wasn't
needed, but e.g. with -fdebug-types-section one can exceed that pretty
easily (reported to us on a WebKitGtk build on ppc64le).
Updating the section is slightly more complicated, because it basically
needs to be done in lock step with updating the .symtab section: if an entry
doesn't need to use SHN_XINDEX, the section should (or should be updated to)
contain an SHN_UNDEF entry, otherwise it needs to hold whatever would
otherwise be stored in st_shndx but couldn't fit.  But repeating all the
symtab decisions about what to discard and how to rewrite it just for that
would be ugly.

So, the patch instead emits the .symtab_shndx section (or sections) last,
prepares their content while processing the corresponding .symtab sections,
and in a second pass over just the .symtab_shndx sections emits the saved
content.
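
For context, the reason .symtab and .symtab_shndx have to stay in lock step is
the standard ELF escape hatch for large section counts.  A minimal C sketch of
how a consumer recovers a symbol's real section index (generic ELF behaviour,
not a quote of the simple-object-elf.c code):

  #include <elf.h>
  #include <stdint.h>
  #include <stddef.h>

  /* Entries of a SHT_SYMTAB_SHNDX section are 32-bit words in both ELF
     classes; entry N corresponds to symbol N of the associated symtab.  */
  static uint32_t
  symbol_section_index (const Elf64_Sym *sym, size_t symndx,
                        const uint32_t *shndx_table)
  {
    if (sym->st_shndx == SHN_XINDEX)
      return shndx_table[symndx];   /* real index kept in .symtab_shndx */
    return sym->st_shndx;           /* ordinary case: fits in 16 bits */
  }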

2024-09-07  Jakub Jelinek  <[email protected]>

	PR lto/116614
	* simple-object-elf.c (SHN_COMMON): Align comment with neighbouring
	comments.
	(SHN_HIRESERVE): Use uppercase hex digits instead of lowercase for
	consistency.
	(simple_object_elf_find_sections): Formatting fixes.
	(simple_object_elf_fetch_attributes): Likewise.
	(simple_object_elf_attributes_merge): Likewise.
	(simple_object_elf_start_write): Likewise.
	(simple_object_elf_write_ehdr): Likewise.
	(simple_object_elf_write_shdr): Likewise.
	(simple_object_elf_write_to_file): Likewise.
	(simple_object_elf_copy_lto_debug_section): Likewise.  Don't fail for
	new_i - 1 >= SHN_LORESERVE, instead arrange in that case to copy
	over .symtab_shndx sections, though emit those last and compute their
	section content when processing associated .symtab sections.  Handle
	simple_object_internal_read failure even in the .symtab_shndx reading
	case.
iains pushed a commit that referenced this pull request Nov 6, 2024
This patch folds svindex with constant arguments into a vector series.
We implemented this in svindex_impl::fold using the function build_vec_series.
For example,
svuint64_t f1 ()
{
  return svindex_u64 (10, 3);
}
compiled with -O2 -march=armv8.2-a+sve, is folded to {10, 13, 16, ...}
in the gimple pass lower.
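
Roughly, such a fold can be sketched as follows (a simplified sketch of a
gimple_folder-based fold using build_vec_series; the committed code may
differ in detail):

  gimple *
  svindex_impl::fold (gimple_folder &f) const
  {
    tree vec_type = TREE_TYPE (f.lhs);
    tree base = gimple_call_arg (f.call, 0);
    tree step = gimple_call_arg (f.call, 1);
    /* Only fold when both arguments are integer constants.  */
    if (TREE_CODE (base) != INTEGER_CST || TREE_CODE (step) != INTEGER_CST)
      return NULL;
    /* Replace the call with {base, base+step, base+2*step, ...}.  */
    return gimple_build_assign (f.lhs,
				build_vec_series (vec_type, base, step));
  }
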
This optimization benefits cases where svindex is used in combination with
other gimple-level optimizations.
For example,
svuint64_t f2 ()
{
    return svmul_x (svptrue_b64 (), svindex_u64 (10, 3), 5);
}
has previously been compiled to
f2:
        index   z0.d, #10, #3
        mul     z0.d, z0.d, #5
        ret
Now, it is compiled to
f2:
        mov     x0, 50
        index   z0.d, x0, #15
        ret

We added test cases checking
- the application of the transform during gimple for constant arguments,
- the interaction with another gimple-level optimization.

The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
OK for mainline?

Signed-off-by: Jennifer Schmitz <[email protected]>

gcc/
	* config/aarch64/aarch64-sve-builtins-base.cc
	(svindex_impl::fold): Add constant folding.

gcc/testsuite/
	* gcc.target/aarch64/sve/index_const_fold.c: New test.
iains pushed a commit that referenced this pull request Nov 6, 2024
We can make use of the integrated rotate step of the XAR instruction
to implement most vector integer rotates, as long as we zero out one
of the input registers for it.  This allows for a lower-latency sequence
than the fallback SHL+USRA, especially when we can hoist the zeroing operation
away from loops and hot parts.  This should be safe to do for 64-bit vectors
as well even though the XAR instructions operate on 128-bit values, as the
bottom 64-bit result is later accessed through the right subregs.
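
Per element, XAR computes ror (a ^ b, #imm), so with one operand zeroed it
degenerates to a plain rotate right.  A minimal C sketch of the per-lane
semantics for a 32-bit lane (assuming 0 < imm < 32):

  #include <stdint.h>

  /* XAR lane operation: XOR the two inputs, then rotate right by imm.
     With b == 0 this is simply ror (a, imm).  */
  static inline uint32_t
  xar_lane (uint32_t a, uint32_t b, unsigned imm)
  {
    uint32_t x = a ^ b;
    return (x >> imm) | (x << (32 - imm));
  }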

This strategy is used whenever we have XAR instructions; the logic
in aarch64_emit_opt_vec_rotate is adjusted to resort to
expand_rotate_as_vec_perm only when that is expected to generate a single REV*
instruction or when XAR instructions are not present.

With this patch we can generate for the input:
v4si
G1 (v4si r)
{
    return (r >> 23) | (r << 9);
}

v8qi
G2 (v8qi r)
{
  return (r << 3) | (r >> 5);
}
the assembly for +sve2:
G1:
        movi    v31.4s, 0
        xar     z0.s, z0.s, z31.s, #23
        ret

G2:
        movi    v31.4s, 0
        xar     z0.b, z0.b, z31.b, #5
        ret

instead of the current:
G1:
        shl     v31.4s, v0.4s, 9
        usra    v31.4s, v0.4s, 23
        mov     v0.16b, v31.16b
        ret
G2:
        shl     v31.8b, v0.8b, 3
        usra    v31.8b, v0.8b, 5
        mov     v0.8b, v31.8b
        ret

Bootstrapped and tested on aarch64-none-linux-gnu.

Signed-off-by: Kyrylo Tkachov <[email protected]>

gcc/

	* config/aarch64/aarch64.cc (aarch64_emit_opt_vec_rotate): Add
	generation of XAR sequences when possible.

gcc/testsuite/

	* gcc.target/aarch64/rotate_xar_1.c: New test.
NinaRanns pushed a commit that referenced this pull request Sep 12, 2025
Unlike Advanced SIMD, SVE has instructions to perform smin, smax, umin, umax
on 64-bit elements.  Thus, we can use them with the fixed-width V2DImode
expander.  Most of the machinery is already there on the define_insn side,
supporting V2DImode operands of the SVE pattern.  We just need to wire up
the RTL emission to the v2di standard names for the TARGET_SVE case.

So for the smin case we now generate:
min_di:
        ldr     q30, [x0]
        ptrue   p7.b, all
        ldr     q31, [x1]
        smin    z30.d, p7/m, z30.d, z31.d
        str     q30, [x2]
        ret

min_imm_di:
        ldr     q31, [x0]
        smin    z31.d, z31.d, #5
        str     q31, [x2]
        ret

instead of the previous:
min_di:
        ldr     q30, [x0]
        ldr     q31, [x1]
        cmgt    v29.2d, v30.2d, v31.2d
        bsl     v29.16b, v31.16b, v30.16b
        str     q29, [x2]
        ret

min_imm_di:
        ldr     q31, [x0]
        mov     z30.d, #5
        cmgt    v29.2d, v30.2d, v31.2d
        bsl     v29.16b, v31.16b, v30.16b
        str     q29, [x2]
        ret

The register operand case is the same length, though the new ptrue can now be
shared and moved away.  But the immediate operand case is obviously better
as the SVE immediate form doesn't require a predicate operand.

Bootstrapped and tested on aarch64-none-linux-gnu.

Signed-off-by: Kyrylo Tkachov <[email protected]>

gcc/

	* config/aarch64/iterators.md (sve_di_suf): New mode attribute.
	* config/aarch64/aarch64-sve.md (<optab><mode>3 SVE_INT_BINARY_MULTI):
	Rename to...
	(<optab><mode>3<sve_di_suf>): ... This.  Use SVE_I_SIMD_DI mode
	iterator.
	* config/aarch64/aarch64-simd.md (<su><maxmin>v2di3): Use the above
	for TARGET_SVE.

gcc/testsuite/

	* gcc.target/aarch64/sve/usminmax_di.c: New test.