Skip to content

Conversation

@s-barannikov
Copy link
Contributor

@s-barannikov s-barannikov commented Aug 20, 2025

We used to abuse Operands list to store instruction encoding's DecoderMethod there.
Let's store it in the InstructionEncoding class instead, where it belongs.

@s-barannikov s-barannikov force-pushed the tablegen/decoder/decoder-abuse branch 2 times, most recently from 9718c30 to 21ae6f1 Compare August 20, 2025 05:56
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This actually helped to detect two bugs in M68k. One is fixed by #154481, the other is more difficult to fix (crash on attempt to disassemble instructions reading/writing SR or CCR).

I'll fix the FIXME in a separate PR to keep this as NFC.

@s-barannikov s-barannikov changed the title [TableGen][DecoderEmitter] Add DecoderMethod to InstructionEncoding [TableGen][DecoderEmitter] Add DecoderMethod to InstructionEncoding (NFC) Aug 20, 2025
@s-barannikov s-barannikov marked this pull request as ready for review August 20, 2025 07:47
@llvmbot
Copy link
Member

llvmbot commented Aug 20, 2025

@llvm/pr-subscribers-tablegen

Author: Sergei Barannikov (s-barannikov)

Changes

We used to abuse Operands list to store instruction encoding's DecoderMethod there.
Let's store it in the InstructionEncoding class instead, where it belongs.


Full diff: https://github.com/llvm/llvm-project/pull/154477.diff

1 Files Affected:

  • (modified) llvm/utils/TableGen/DecoderEmitter.cpp (+39-9)
diff --git a/llvm/utils/TableGen/DecoderEmitter.cpp b/llvm/utils/TableGen/DecoderEmitter.cpp
index 1924cf8a3adcd..f8f84a900d0dd 100644
--- a/llvm/utils/TableGen/DecoderEmitter.cpp
+++ b/llvm/utils/TableGen/DecoderEmitter.cpp
@@ -143,6 +143,14 @@ class InstructionEncoding {
   /// The size of this encoding, in bits.
   unsigned BitWidth;
 
+  /// The name of the function to use for decoding. May be an empty string,
+  /// meaning the decoder is generated.
+  StringRef DecoderMethod;
+
+  /// Whether the custom decoding function always succeeds. Should not be used
+  /// if the decoder is generated.
+  bool HasCompleteDecoder = true;
+
   /// Information about the operands' contribution to this encoding.
   SmallVector<OperandInfo, 16> Operands;
 
@@ -171,6 +179,16 @@ class InstructionEncoding {
   /// Returns the size of this encoding, in bits.
   unsigned getBitWidth() const { return BitWidth; }
 
+  /// Returns the name of the function to use for decoding, or an empty string
+  /// if the decoder is generated.
+  StringRef getDecoderMethod() const { return DecoderMethod; }
+
+  /// Returns whether the custom decoding function always succeeds.
+  bool hasCompleteDecoder() const {
+    assert(!DecoderMethod.empty());
+    return HasCompleteDecoder;
+  }
+
   /// Returns information about the operands' contribution to this encoding.
   ArrayRef<OperandInfo> getOperands() const { return Operands; }
 
@@ -1246,10 +1264,25 @@ bool FilterChooser::emitBinaryParser(raw_ostream &OS, indent Indent,
 
 bool FilterChooser::emitDecoder(raw_ostream &OS, indent Indent,
                                 unsigned EncodingID) const {
-  bool HasCompleteDecoder = true;
+  const InstructionEncoding &Encoding = Encodings[EncodingID];
+
+  // If a custom instruction decoder was specified, use that.
+  StringRef DecoderMethod = Encoding.getDecoderMethod();
+  if (!DecoderMethod.empty()) {
+    bool HasCompleteDecoder = Encoding.hasCompleteDecoder();
+    OS << Indent << "if (!Check(S, " << DecoderMethod
+       << "(MI, insn, Address, Decoder))) { "
+       << (HasCompleteDecoder ? "" : "DecodeComplete = false; ")
+       << "return MCDisassembler::Fail; }\n";
+    return HasCompleteDecoder;
+  }
 
-  for (const OperandInfo &Op : Encodings[EncodingID].getOperands()) {
-    // If a custom instruction decoder was specified, use that.
+  bool HasCompleteDecoder = true;
+  for (const OperandInfo &Op : Encoding.getOperands()) {
+    // FIXME: This is broken. If there is an operand that doesn't contribute
+    //   to the encoding, we generate the same code as if the decoder method
+    //   was specified on the encoding. And then we stop, ignoring the rest
+    //   of the operands. M68k disassembler experiences this.
     if (Op.numFields() == 0 && !Op.Decoder.empty()) {
       HasCompleteDecoder = Op.HasCompleteDecoder;
       OS << Indent << "if (!Check(S, " << Op.Decoder
@@ -1258,7 +1291,6 @@ bool FilterChooser::emitDecoder(raw_ostream &OS, indent Indent,
          << "return MCDisassembler::Fail; }\n";
       break;
     }
-
     HasCompleteDecoder &= emitBinaryParser(OS, Indent, Op);
   }
   return HasCompleteDecoder;
@@ -1971,11 +2003,9 @@ unsigned InstructionEncoding::populateEncoding() {
 
   // If the instruction has specified a custom decoding hook, use that instead
   // of trying to auto-generate the decoder.
-  StringRef InstDecoder = EncodingDef->getValueAsString("DecoderMethod");
-  if (!InstDecoder.empty()) {
-    bool HasCompleteInstDecoder =
-        EncodingDef->getValueAsBit("hasCompleteDecoder");
-    Operands.push_back(OperandInfo(InstDecoder.str(), HasCompleteInstDecoder));
+  DecoderMethod = EncodingDef->getValueAsString("DecoderMethod");
+  if (!DecoderMethod.empty()) {
+    HasCompleteDecoder = EncodingDef->getValueAsBit("hasCompleteDecoder");
     return Bits.getNumBits();
   }
 

@s-barannikov s-barannikov force-pushed the tablegen/decoder/decoder-abuse branch 2 times, most recently from 002d434 to 335e69b Compare August 20, 2025 11:37
@s-barannikov s-barannikov force-pushed the tablegen/decoder/decoder-abuse branch from 335e69b to abaec17 Compare August 20, 2025 21:17
Copy link
Collaborator

@topperc topperc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@s-barannikov s-barannikov enabled auto-merge (squash) August 20, 2025 21:30
@s-barannikov s-barannikov disabled auto-merge August 20, 2025 21:30
@s-barannikov s-barannikov enabled auto-merge (squash) August 20, 2025 21:31
@s-barannikov s-barannikov merged commit 46343ca into llvm:main Aug 20, 2025
9 checks passed
@s-barannikov s-barannikov deleted the tablegen/decoder/decoder-abuse branch August 20, 2025 22:37
s-barannikov added a commit that referenced this pull request Aug 21, 2025
As the FIXME says, we might generate the wrong code to decode an
instruction if it had an operand with no encoding bits. An example is
M68k's `MOV16ds` that is defined as follows:

```
dag OutOperandList = (outs MxDRD16:$dst);
dag InOperandList = (ins SRC:$src);
list<Register> Uses = [SR];
string AsmString = "move.w\t$src, $dst"
dag Inst = (descend { 0, 1, 0, 0, 0, 0, 0, 0, 1, 1 },
            (descend { 0, 0, 0 }, (operand "$dst", 3)));
```

The `$src` operand is not encoded, but what we see in the decoder is:
```C++
    tmp = fieldFromInstruction(insn, 0, 3);
    if (!Check(S, DecodeDR16RegisterClass(MI, tmp, Address, Decoder)))
    { return MCDisassembler::Fail; }
    if (!Check(S, DecodeSRCRegisterClass(MI, insn, Address, Decoder)))
    { return MCDisassembler::Fail; }
    return S;
```

This calls DecodeSRCRegisterClass passing it `insn` instead of the value
of a field that doesn't exist. DecodeSRCRegisterClass has an
unconditional llvm_unreachable inside it.

New decoder looks like:
```C++
    tmp = fieldFromInstruction(insn, 0, 3);
    if (!Check(S, DecodeDR16RegisterClass(MI, tmp, Address, Decoder)))
    { return MCDisassembler::Fail; }
    return S;
```

We're still not disassembling this instruction right, but at least we no
longer have to provide a weird operand decoder method that accepts
instruction bits instead of operand bits.

See #154477 for the origins of the FIXME.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants