Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
21 changes: 21 additions & 0 deletions llvm/lib/MC/MCContext.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -212,6 +212,27 @@ MCDataFragment *MCContext::allocInitialFragment(MCSection &Sec) {
MCSymbol *MCContext::getOrCreateSymbol(const Twine &Name) {
SmallString<128> NameSV;
StringRef NameRef = Name.toStringRef(NameSV);
if (NameRef.contains('\\')) {
NameSV = NameRef;
size_t S = 0;
// Support escaped \\ and \" as in GNU Assembler. GAS issues a warning for
// other characters following \\, which we do not implement due to code
// structure.
for (size_t I = 0, E = NameSV.size(); I < E; ++I) {
char C = NameSV[I];
if (C == '\\') {
switch (NameSV[I + 1]) {
case '"':
case '\\':
C = NameSV[++I];
break;
}
}
NameSV[S++] = C;
}
NameSV.resize(S);
NameRef = NameSV;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this unescaping only be done when parsing textual assembly in AsmParser, but not when programmatically creating symbols?

I'd expect the LLVM IR @"\\\22" = constant i8 0 to result in "\\\"": rather than "\"":, as it does with this patch. (Previously it resulted in "\\"":, which is also wrong, but at least only affects the case where you're emitting assembly.)

See rust-lang/rust#146065 for a regression from this change. Of course we could easily apply additional escaping on the Rust side, but I think the current behavior here isn't right.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, just saw this message. I agree that ideally this should be limited to texual assembly in MC/MCParser/AsmParser.cpp. Unfortunately this is challenging because the wide-used API AsmParser::parseIdentifier (used ~107 times) takes a StringRef instead of an owned string. It seems a lot of work to fix parseIdentifier...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is a problem. I put up #158106 to put at least some of these uses behind a common API.

A simple option would be to just keep StringRef and have the backing storage in an Allocator. This would avoid API changes and should not really be a performance or memory usage problem as long as quoted symbols are rare.

Though I'm not sure whether all uses of parseIdentifier should parse quoted strings or not.


assert(!NameRef.empty() && "Normal symbols cannot be unnamed!");

Expand Down
2 changes: 2 additions & 0 deletions llvm/lib/MC/MCSymbol.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -74,6 +74,8 @@ void MCSymbol::print(raw_ostream &OS, const MCAsmInfo *MAI) const {
OS << "\\n";
else if (C == '"')
OS << "\\\"";
else if (C == '\\')
OS << "\\\\";
else
OS << C;
}
Expand Down
3 changes: 3 additions & 0 deletions llvm/test/MC/AsmParser/quoted.s
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,9 @@
"a b":
call "a b"

# CHECK: "a b\\":
"a b\\":

#--- err.s
"a\":
# ERR: 1:2: error: unterminated string constant
Expand Down
4 changes: 2 additions & 2 deletions llvm/test/MC/COFF/safeseh.s
Original file line number Diff line number Diff line change
Expand Up @@ -2,5 +2,5 @@

// check that we quote the output of .safeseh

.safeseh "\01foo"
// CHECK: .safeseh "\01foo"
.safeseh "\\foo"
// CHECK: .safeseh "\\foo"
22 changes: 19 additions & 3 deletions llvm/test/MC/ELF/symbol-names.s
Original file line number Diff line number Diff line change
@@ -1,12 +1,28 @@
// RUN: llvm-mc -triple i686-pc-linux -filetype=obj %s -o - | llvm-readobj --symbols - | FileCheck %s
// RUN: llvm-mc -triple=x86_64 -filetype=obj %s | llvm-objdump -tdr - | FileCheck %s

// MC allows ?'s in symbol names as an extension.

// CHECK-LABEL:SYMBOL TABLE:
// CHECK-NEXT: 0000000000000001 l F .text 0000000000000000 a"b\{{$}}
// CHECK-NEXT: 0000000000000006 l .text 0000000000000000 a\{{$}}
// CHECK-NEXT: 0000000000000000 g F .text 0000000000000000 foo?bar
// CHECK-NEXT: 0000000000000000 *UND* 0000000000000000 a"b\q{{$}}
// CHECK-EMPTY:

.text
.globl foo?bar
.type foo?bar, @function
foo?bar:
ret

// CHECK: Symbol
// CHECK: Name: foo?bar
// CHECK-LABEL:<a"b\>:
// CHECK-NEXT: callq {{.*}} <a"b\>
// CHECK-LABEL:<a\>:
// CHECK-NEXT: callq {{.*}}
// CHECK-NEXT: R_X86_64_PLT32 a"b\q-0x4
.type "a\"b\\", @function
"a\"b\\":
call "a\"b\\"
"a\\":
/// GAS emits a warning for \q
call "a\"b\q"
Loading