Skip to content

Commit 4308f03

Browse files
committed
[lld-macho] Align cstrings less conservatively
Previously, we aligned every cstring to 16 bytes as a temporary hack to deal with #50135. However, it was highly wasteful in terms of binary size. To recap, in contrast to ELF, which puts strings that need different alignments into different sections, `clang`'s Mach-O backend puts them all in one section. Strings that need to be aligned have the .p2align directive emitted before them, which simply translates into zero padding in the object file. In other words, we have to infer the alignment of the cstrings from their addresses. We differ slightly from ld64 in how we've chosen to align these cstrings. Both LLD and ld64 preserve the number of trailing zeros in each cstring's address in the input object files. When deduplicating identical cstrings, both linkers pick the cstring whose address has more trailing zeros, and preserve the alignment of that address in the final binary. However, ld64 goes a step further and also preserves the offset of the cstring from the last section-aligned address. I.e. if a cstring is at offset 18 in the input, with a section alignment of 16, then both LLD and ld64 will ensure the final address is 2-byte aligned (since `18 == 16 + 2`). But ld64 will also ensure that the final address is of the form 16 * k + 2 for some k (which implies 2-byte alignment). Note that ld64's heuristic means that a dedup'ed cstring's final address is dependent on the order of the input object files. E.g. if in addition to the cstring at offset 18 above, we have a duplicate one in another file with a `.cstring` section alignment of 2 and an offset of zero, then ld64 will pick the cstring from the object file earlier on the command line (since both have the same number of trailing zeros in their address). So the final cstring may either be at some address `16 * k + 2` or at some address `2 * k`. I've opted not to follow this behavior primarily for implementation simplicity, and secondarily to save a few more bytes. It's not clear to me that preserving the section alignment + offset is ever necessary, and there are many cases that are clearly redundant. In particular, if an x86_64 object file contains some strings that are accessed via SIMD instructions, then the .cstring section in the object file will be 16-byte-aligned (since SIMD requires its operand addresses to be 16-byte aligned). However, there will typically also be other cstrings in the same file that aren't used via SIMD and don't need this alignment. They will be emitted at some arbitrary address `A`, but ld64 will treat them as being 16-byte aligned with an offset of `16 % A`. I have verified that the two repros in #50135 work well with the new alignment behavior. Fixes #54036. Reviewed By: #lld-macho, oontvoo Differential Revision: https://reviews.llvm.org/D121342
1 parent db65429 commit 4308f03

File tree

5 files changed

+229
-38
lines changed

5 files changed

+229
-38
lines changed

lld/MachO/SyntheticSections.cpp

Lines changed: 76 additions & 29 deletions
Original file line numberDiff line numberDiff line change
@@ -1341,7 +1341,10 @@ void CStringSection::finalizeContents() {
13411341
for (size_t i = 0, e = isec->pieces.size(); i != e; ++i) {
13421342
if (!isec->pieces[i].live)
13431343
continue;
1344-
uint32_t pieceAlign = MinAlign(isec->pieces[i].inSecOff, align);
1344+
// See comment above DeduplicatedCStringSection for how alignment is
1345+
// handled.
1346+
uint32_t pieceAlign =
1347+
1 << countTrailingZeros(isec->align | isec->pieces[i].inSecOff);
13451348
offset = alignTo(offset, pieceAlign);
13461349
isec->pieces[i].outSecOff = offset;
13471350
isec->isFinal = true;
@@ -1351,45 +1354,89 @@ void CStringSection::finalizeContents() {
13511354
}
13521355
size = offset;
13531356
}
1357+
13541358
// Mergeable cstring literals are found under the __TEXT,__cstring section. In
13551359
// contrast to ELF, which puts strings that need different alignments into
13561360
// different sections, clang's Mach-O backend puts them all in one section.
13571361
// Strings that need to be aligned have the .p2align directive emitted before
1358-
// them, which simply translates into zero padding in the object file.
1362+
// them, which simply translates into zero padding in the object file. In other
1363+
// words, we have to infer the desired alignment of these cstrings from their
1364+
// addresses.
13591365
//
1360-
// I *think* ld64 extracts the desired per-string alignment from this data by
1361-
// preserving each string's offset from the last section-aligned address. I'm
1362-
// not entirely certain since it doesn't seem consistent about doing this, and
1363-
// in fact doesn't seem to be correct in general: we can in fact can induce ld64
1364-
// to produce a crashing binary just by linking in an additional object file
1365-
// that only contains a duplicate cstring at a different alignment. See PR50563
1366-
// for details.
1366+
// We differ slightly from ld64 in how we've chosen to align these cstrings.
1367+
// Both LLD and ld64 preserve the number of trailing zeros in each cstring's
1368+
// address in the input object files. When deduplicating identical cstrings,
1369+
// both linkers pick the cstring whose address has more trailing zeros, and
1370+
// preserve the alignment of that address in the final binary. However, ld64
1371+
// goes a step further and also preserves the offset of the cstring from the
1372+
// last section-aligned address. I.e. if a cstring is at offset 18 in the
1373+
// input, with a section alignment of 16, then both LLD and ld64 will ensure the
1374+
// final address is 2-byte aligned (since 18 == 16 + 2). But ld64 will also
1375+
// ensure that the final address is of the form 16 * k + 2 for some k.
13671376
//
1368-
// On x86_64, the cstrings we've seen so far that require special alignment are
1369-
// all accessed by SIMD operations -- x86_64 requires SIMD accesses to be
1370-
// 16-byte-aligned. arm64 also seems to require 16-byte-alignment in some cases
1371-
// (PR50791), but I haven't tracked down the root cause. So for now, I'm just
1372-
// aligning all strings to 16 bytes. This is indeed wasteful, but
1373-
// implementation-wise it's simpler than preserving per-string
1374-
// alignment+offsets. It also avoids the aforementioned crash after
1375-
// deduplication of differently-aligned strings. Finally, the overhead is not
1376-
// huge: using 16-byte alignment (vs no alignment) is only a 0.5% size overhead
1377-
// when linking chromium_framework on x86_64.
1378-
DeduplicatedCStringSection::DeduplicatedCStringSection()
1379-
: builder(StringTableBuilder::RAW, /*Alignment=*/16) {}
1380-
1377+
// Note that ld64's heuristic means that a dedup'ed cstring's final address is
1378+
// dependent on the order of the input object files. E.g. if in addition to the
1379+
// cstring at offset 18 above, we have a duplicate one in another file with a
1380+
// `.cstring` section alignment of 2 and an offset of zero, then ld64 will pick
1381+
// the cstring from the object file earlier on the command line (since both have
1382+
// the same number of trailing zeros in their address). So the final cstring may
1383+
// either be at some address `16 * k + 2` or at some address `2 * k`.
1384+
//
1385+
// I've opted not to follow this behavior primarily for implementation
1386+
// simplicity, and secondarily to save a few more bytes. It's not clear to me
1387+
// that preserving the section alignment + offset is ever necessary, and there
1388+
// are many cases that are clearly redundant. In particular, if an x86_64 object
1389+
// file contains some strings that are accessed via SIMD instructions, then the
1390+
// .cstring section in the object file will be 16-byte-aligned (since SIMD
1391+
// requires its operand addresses to be 16-byte aligned). However, there will
1392+
// typically also be other cstrings in the same file that aren't used via SIMD
1393+
// and don't need this alignment. They will be emitted at some arbitrary address
1394+
// `A`, but ld64 will treat them as being 16-byte aligned with an offset of `16
1395+
// % A`.
13811396
void DeduplicatedCStringSection::finalizeContents() {
1382-
// Add all string pieces to the string table builder to create section
1383-
// contents.
1397+
// Find the largest alignment required for each string.
1398+
for (const CStringInputSection *isec : inputs) {
1399+
for (size_t i = 0, e = isec->pieces.size(); i != e; ++i) {
1400+
const StringPiece &piece = isec->pieces[i];
1401+
if (!piece.live)
1402+
continue;
1403+
auto s = isec->getCachedHashStringRef(i);
1404+
assert(isec->align != 0);
1405+
uint8_t trailingZeros = countTrailingZeros(isec->align | piece.inSecOff);
1406+
auto it = stringOffsetMap.insert(
1407+
std::make_pair(s, StringOffset(trailingZeros)));
1408+
if (!it.second && it.first->second.trailingZeros < trailingZeros)
1409+
it.first->second.trailingZeros = trailingZeros;
1410+
}
1411+
}
1412+
1413+
// Assign an offset for each string and save it to the corresponding
1414+
// StringPieces for easy access.
13841415
for (CStringInputSection *isec : inputs) {
1385-
for (size_t i = 0, e = isec->pieces.size(); i != e; ++i)
1386-
if (isec->pieces[i].live)
1387-
isec->pieces[i].outSecOff =
1388-
builder.add(isec->getCachedHashStringRef(i));
1416+
for (size_t i = 0, e = isec->pieces.size(); i != e; ++i) {
1417+
if (!isec->pieces[i].live)
1418+
continue;
1419+
auto s = isec->getCachedHashStringRef(i);
1420+
auto it = stringOffsetMap.find(s);
1421+
assert(it != stringOffsetMap.end());
1422+
StringOffset &offsetInfo = it->second;
1423+
if (offsetInfo.outSecOff == UINT64_MAX) {
1424+
offsetInfo.outSecOff = alignTo(size, 1 << offsetInfo.trailingZeros);
1425+
size = offsetInfo.outSecOff + s.size();
1426+
}
1427+
isec->pieces[i].outSecOff = offsetInfo.outSecOff;
1428+
}
13891429
isec->isFinal = true;
13901430
}
1431+
}
13911432

1392-
builder.finalizeInOrder();
1433+
void DeduplicatedCStringSection::writeTo(uint8_t *buf) const {
1434+
for (const auto &p : stringOffsetMap) {
1435+
StringRef data = p.first.val();
1436+
uint64_t off = p.second.outSecOff;
1437+
if (!data.empty())
1438+
memcpy(buf + off, data.data(), data.size());
1439+
}
13931440
}
13941441

13951442
// This section is actually emitted as __TEXT,__const by ld64, but clang may

lld/MachO/SyntheticSections.h

Lines changed: 10 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -531,13 +531,19 @@ class CStringSection : public SyntheticSection {
531531

532532
class DeduplicatedCStringSection final : public CStringSection {
533533
public:
534-
DeduplicatedCStringSection();
535-
uint64_t getSize() const override { return builder.getSize(); }
534+
uint64_t getSize() const override { return size; }
536535
void finalizeContents() override;
537-
void writeTo(uint8_t *buf) const override { builder.write(buf); }
536+
void writeTo(uint8_t *buf) const override;
538537

539538
private:
540-
llvm::StringTableBuilder builder;
539+
struct StringOffset {
540+
uint8_t trailingZeros;
541+
uint64_t outSecOff = UINT64_MAX;
542+
543+
explicit StringOffset(uint8_t zeros) : trailingZeros(zeros) {}
544+
};
545+
llvm::DenseMap<llvm::CachedHashStringRef, StringOffset> stringOffsetMap;
546+
size_t size = 0;
541547
};
542548

543549
/*

lld/MachO/ld64-vs-lld.rst

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -13,6 +13,12 @@ programs which have (incorrectly) relied on string deduplication always
1313
occurring. In particular, programs which compare string literals via pointer
1414
equality must be fixed to use value equality instead.
1515

16+
String Alignment
17+
****************
18+
LLD is slightly less conservative about aligning cstrings, allowing it to pack
19+
them more compactly. This should not result in any meaningful semantic
20+
difference.
21+
1622
``-no_deduplicate`` Flag
1723
**********************
1824
- LD64:

lld/test/MachO/cstring-align.s

Lines changed: 133 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,133 @@
1+
# REQUIRES: x86
2+
# RUN: rm -rf %t; split-file %s %t
3+
# RUN: llvm-mc -filetype=obj -triple=x86_64-apple-darwin %t/align-empty.s -o %t/align-empty.o
4+
# RUN: llvm-mc -filetype=obj -triple=x86_64-apple-darwin %t/align-4-0.s -o %t/align-4-0.o
5+
# RUN: llvm-mc -filetype=obj -triple=x86_64-apple-darwin %t/align-4-2.s -o %t/align-4-2.o
6+
# RUN: llvm-mc -filetype=obj -triple=x86_64-apple-darwin %t/align-16-0.s -o %t/align-16-0.o
7+
# RUN: llvm-mc -filetype=obj -triple=x86_64-apple-darwin %t/align-16-2.s -o %t/align-16-2.o
8+
# RUN: llvm-mc -filetype=obj -triple=x86_64-apple-darwin %t/align-16-4.s -o %t/align-16-4.o
9+
# RUN: llvm-mc -filetype=obj -triple=x86_64-apple-darwin %t/align-16-8.s -o %t/align-16-8.o
10+
11+
## Check that we preserve the alignment of cstrings. Alignment is determined
12+
## not by section alignment but by the number of trailing zeros of the cstring's
13+
## address in the input object file.
14+
15+
## The non-dedup case is not particularly interesting since the null bytes don't
16+
## get dedup'ed, meaning that the output strings get their offsets "naturally"
17+
## preserved.
18+
19+
# RUN: %lld -dylib %t/align-empty.o %t/align-4-0.o %t/align-16-0.o -o %t/align-4-0-16-0
20+
# RUN: llvm-objdump --macho --section="__TEXT,__cstring" %t/align-4-0-16-0 | \
21+
# RUN: FileCheck %s -D#OFF1=4 -D#OFF2=16
22+
# RUN: %lld -dylib %t/align-empty.o %t/align-16-0.o %t/align-4-0.o -o %t/align-16-0-4-0
23+
# RUN: llvm-objdump --macho --section="__TEXT,__cstring" %t/align-16-0-4-0 | \
24+
# RUN: FileCheck %s -D#OFF1=16 -D#OFF2=20
25+
26+
# RUN: %lld -dylib %t/align-empty.o %t/align-4-2.o %t/align-16-0.o -o %t/align-4-2-16-0
27+
# RUN: llvm-objdump --macho --section="__TEXT,__cstring" %t/align-4-2-16-0 | \
28+
# RUN: FileCheck %s -D#OFF1=6 -D#OFF2=16
29+
# RUN: %lld -dylib %t/align-empty.o %t/align-16-0.o %t/align-4-2.o -o %t/align-16-0-4-2
30+
# RUN: llvm-objdump --macho --section="__TEXT,__cstring" %t/align-16-0-4-2 | \
31+
# RUN: FileCheck %s -D#OFF1=16 -D#OFF2=22
32+
33+
# RUN: %lld -dylib %t/align-empty.o %t/align-4-0.o %t/align-16-2.o -o %t/align-4-0-16-2
34+
# RUN: llvm-objdump --macho --section="__TEXT,__cstring" %t/align-4-0-16-2 | \
35+
# RUN: FileCheck %s -D#OFF1=4 -D#OFF2=18
36+
# RUN: %lld -dylib %t/align-empty.o %t/align-16-2.o %t/align-4-0.o -o %t/align-16-2-4-0
37+
# RUN: llvm-objdump --macho --section="__TEXT,__cstring" %t/align-16-2-4-0 | \
38+
# RUN: FileCheck %s -D#OFF1=18 -D#OFF2=20
39+
40+
# CHECK: Contents of (__TEXT,__cstring) section
41+
# CHECK-NEXT: [[#%.16x,START:]] {{$}}
42+
# CHECK: [[#%.16x,START+OFF1]] a{{$}}
43+
# CHECK: [[#%.16x,START+OFF2]] a{{$}}
44+
# CHECK-EMPTY:
45+
46+
## The dedup cases are more interesting...
47+
48+
## Same offset, different alignments => pick higher alignment
49+
# RUN: %lld -dylib --deduplicate-literals %t/align-empty.o %t/align-4-0.o %t/align-16-0.o -o %t/dedup-4-0-16-0
50+
# RUN: llvm-objdump --macho --section="__TEXT,__cstring" %t/dedup-4-0-16-0 | \
51+
# RUN: FileCheck %s --check-prefix=DEDUP -D#OFF=16
52+
# RUN: %lld -dylib --deduplicate-literals %t/align-empty.o %t/align-16-0.o %t/align-4-0.o -o %t/dedup-16-0-4-0
53+
# RUN: llvm-objdump --macho --section="__TEXT,__cstring" %t/dedup-16-0-4-0 | \
54+
# RUN: FileCheck %s --check-prefix=DEDUP -D#OFF=16
55+
56+
## 16 byte alignment vs 2 byte offset => align to 16 bytes
57+
# RUN: %lld -dylib --deduplicate-literals %t/align-empty.o %t/align-4-2.o %t/align-16-0.o -o %t/dedup-4-2-16-0
58+
# RUN: llvm-objdump --macho --section="__TEXT,__cstring" %t/dedup-4-2-16-0 | \
59+
# RUN: FileCheck %s --check-prefix=DEDUP -D#OFF=16
60+
# RUN: %lld -dylib --deduplicate-literals %t/align-empty.o %t/align-16-0.o %t/align-4-2.o -o %t/dedup-16-0-4-2
61+
# RUN: llvm-objdump --macho --section="__TEXT,__cstring" %t/dedup-16-0-4-2 | \
62+
# RUN: FileCheck %s --check-prefix=DEDUP -D#OFF=16
63+
64+
## 4 byte alignment vs 2 byte offset => align to 4 bytes
65+
# RUN: %lld -dylib --deduplicate-literals %t/align-empty.o %t/align-4-0.o %t/align-16-2.o -o %t/dedup-4-0-16-2
66+
# RUN: llvm-objdump --macho --section="__TEXT,__cstring" %t/dedup-4-0-16-2 | \
67+
# RUN: FileCheck %s --check-prefix=DEDUP -D#OFF=4
68+
# RUN: %lld -dylib --deduplicate-literals %t/align-empty.o %t/align-16-2.o %t/align-4-0.o -o %t/dedup-16-2-4-0
69+
# RUN: llvm-objdump --macho --section="__TEXT,__cstring" %t/dedup-16-2-4-0 | \
70+
# RUN: FileCheck %s --check-prefix=DEDUP -D#OFF=4
71+
72+
## Both inputs are 4-byte aligned, one via offset and the other via section alignment
73+
# RUN: %lld -dylib --deduplicate-literals %t/align-empty.o %t/align-4-0.o %t/align-16-4.o -o %t/dedup-4-0-16-4
74+
# RUN: llvm-objdump --macho --section="__TEXT,__cstring" %t/dedup-4-0-16-4 | \
75+
# RUN: FileCheck %s --check-prefix=DEDUP -D#OFF=4
76+
# RUN: %lld -dylib --deduplicate-literals %t/align-empty.o %t/align-16-4.o %t/align-4-0.o -o %t/dedup-16-4-4-0
77+
# RUN: llvm-objdump --macho --section="__TEXT,__cstring" %t/dedup-16-4-4-0 | \
78+
# RUN: FileCheck %s --check-prefix=DEDUP -D#OFF=4
79+
80+
## 8-byte offset vs 4-byte section alignment => align to 8 bytes
81+
# RUN: %lld -dylib --deduplicate-literals %t/align-empty.o %t/align-4-0.o %t/align-16-8.o -o %t/dedup-4-0-16-8
82+
# RUN: llvm-objdump --macho --section="__TEXT,__cstring" %t/dedup-4-0-16-8 | \
83+
# RUN: FileCheck %s --check-prefix=DEDUP -D#OFF=8
84+
# RUN: %lld -dylib --deduplicate-literals %t/align-empty.o %t/align-16-8.o %t/align-4-0.o -o %t/dedup-16-8-4-0
85+
# RUN: llvm-objdump --macho --section="__TEXT,__cstring" %t/dedup-16-8-4-0 | \
86+
# RUN: FileCheck %s --check-prefix=DEDUP -D#OFF=8
87+
88+
# DEDUP: Contents of (__TEXT,__cstring) section
89+
# DEDUP-NEXT: [[#%.16x,START:]] {{$}}
90+
# DEDUP: [[#%.16x,START+OFF]] a{{$}}
91+
# DEDUP-EMPTY:
92+
93+
#--- align-empty.s
94+
## We use this file to create an empty string at the start of every output
95+
## file's .cstring section. This makes the test cases more interesting since LLD
96+
## can't place the string "a" at the trivially-aligned zero offset.
97+
.cstring
98+
.p2align 2
99+
.asciz ""
100+
101+
#--- align-4-0.s
102+
.cstring
103+
.p2align 2
104+
.asciz "a"
105+
106+
#--- align-4-2.s
107+
.cstring
108+
.p2align 2
109+
.zero 0x2
110+
.asciz "a"
111+
112+
#--- align-16-0.s
113+
.cstring
114+
.p2align 4
115+
.asciz "a"
116+
117+
#--- align-16-2.s
118+
.cstring
119+
.p2align 4
120+
.zero 0x2
121+
.asciz "a"
122+
123+
#--- align-16-4.s
124+
.cstring
125+
.p2align 4
126+
.zero 0x4
127+
.asciz "a"
128+
129+
#--- align-16-8.s
130+
.cstring
131+
.p2align 4
132+
.zero 0x8
133+
.asciz "a"

lld/test/MachO/cstring-dedup.s

Lines changed: 4 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -8,12 +8,11 @@
88
# RUN: llvm-objdump --macho --section="__DATA,ptrs" --syms %t/test | FileCheck %s
99
# RUN: llvm-readobj --section-headers %t/test | FileCheck %s --check-prefix=HEADER
1010

11-
## Make sure we only have 3 deduplicated strings in __cstring, and that they
12-
## are 16-byte-aligned.
11+
## Make sure we only have 3 deduplicated strings in __cstring.
1312
# STR: Contents of (__TEXT,__cstring) section
14-
# STR: {{.*}}0 foo
15-
# STR: {{.*}}0 barbaz
16-
# STR: {{.*}}0 {{$}}
13+
# STR: {{[[:xdigit:]]+}} foo
14+
# STR: {{[[:xdigit:]]+}} barbaz
15+
# STR: {{[[:xdigit:]]+}} {{$}}
1716

1817
## Make sure both symbol and section relocations point to the right thing.
1918
# CHECK: Contents of (__DATA,ptrs) section

0 commit comments

Comments
 (0)