A batch of improvements for the '__emit' operator #421

Daniel-Cortez · 2019-05-04T18:37:12Z

What this PR does / why we need it:

Added 'universal' pseudo-opcodes, designed to be primarily used in macros.
load.u.pri/alt - load value (may be an expression) into PRI/ALT.
stor.u.pri/alt - store the value of PRI/ALT into the operand.
addr.u.pri/alt - load operand address into PRI/ALT.
push.u - push value (may be an expression) onto the stack.
push.u.adr - obtain operand address and push it onto the stack.
zero.u - write value 0 into the operand.
inc.u - increase the value stored in the operand by 1.
dec.u - decrease the value stored in the operand by 1.

Example:

#define getval(%0) __emit(load.u.pri %0)
#define addrof(%0) __emit(addr.u.pri %0)

const global_const = 0x1234;
new global_var = 0x5678;

main()
{
	const local_const = 0x4321;
	new local_var = 0x8765;
	static local_static_var = 0x9ABC;
	static local_static_array[2];

	printf("%08x", getval(global_const)); // 00001234
	printf("%08x", getval(global_var)); // 00005678
	printf("%08x", getval(local_const)); // 00004321
	printf("%08x", getval(local_var)); // 00008765
	printf("%08x", getval(local_static_var + global_const)); // 0000ACF0
	printf("%08x", addrof(global_var)); // 00000000
	printf("%08x", addrof(local_var)); // 00004094
	printf("%08x", addrof(local_static_var)); // 00000004
	printf("%08x", addrof(local_static_array[1])); // 0000000C
}

The compiler now issues an error if the stack offset/data address isn't a multiple of cell size.
NOTE: For this purpose I reused error 011 which previously wasn't used anywhere.

__emit load.s.pri 5; // error 011: stack offset/data address must be a multiple of cell size
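
The multiple-of-cell-size rule can be illustrated with a minimal C sketch. This is not the compiler's actual code: the name `is_cell_aligned` is hypothetical, and a 32-bit cell is assumed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 32-bit cells. */
typedef int32_t cell;

/* Hypothetical helper: returns true if `value` (a stack offset or a data
   address) is a multiple of the cell size. */
bool is_cell_aligned(cell value)
{
    /* In C99 and later, % truncates toward zero, so a misaligned negative
       offset leaves a nonzero (negative) remainder and is rejected too. */
    return value % (cell)sizeof(cell) == 0;
}
```

Under these assumptions the operand 5 from the example above is rejected, while offsets such as 8 or -4 pass.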

Fixed a crash when referencing a previously unused native function (#412, "Crash if unused emit native").
Removed index check for operands of lctrl and sctrl.
Opcodes stor.s.pri/alt, inc.s, dec.s, and zero.s no longer accept variables and arrays passed by reference.
Allowed negative offsets for arguments of type 'local variable' (#413, "Operator emit negative values").
Added tests.
NOTE: Since the compiler can only display up to 26 errors, I had to split the tests into 7 different *.pwn files (+ the extra one to check the generated bytecode).
Fixed a lot of minor bugs discovered with the newly added tests.

Which issue(s) this PR fixes:

Closes #346
Fixes #412
Fixes #413

What kind of pull this is:

A Bug Fix
A New Feature
Some repository meta (documentation, etc)
Other

Additional Documentation:

Daniel-Cortez · 2019-05-04T18:54:51Z

There are a few things I'm still not sure about:

Array handling is only done for pseudo-opcodes load.u.pri/alt and push.u - this is because for these opcodes I reused the standard expression parser (function expression() from sc3.c).
For the other pseudo-instructions the operand is not an expression but an lvalue, so it's required to know its address, and I couldn't find any existing functionality for this task in sc3.c.
As a temporary solution I used function expression() for those opcodes as well: if the user specifies a variable or a reference (iVARIABLE/iREFERENCE) the function returns the corresponding symbol *, and it's possible to know its address (sym->addr), but if the user specifies an array cell (iARRAYCELL; e.g. arr[0]), there seems to be no way to obtain its address.
I have no idea how to implement array handling for those opcodes in a good way, without reinventing the wheel, so any help or suggestions for this would be appreciated.
Currently instructions casetbl and case can be used in single-instruction __emit statements and in expressions, but it's easy to make the compiler generate invalid code:

__emit casetbl 2 lbl_default;
__emit case 0 lbl_case0;
new x = 0; // This line will become a part of the case table, which is obviously wrong
__emit case 1 lbl_case1;

I think it would be better to only allow the use of those instructions in block mode, like this:

__emit
{
	casetbl 2 :lbl_default
	case 0 :lbl_case0
	case 1 :lbl_case1
lbl_default:
	// ...
lbl_case0:
	// ...
lbl_case1:
	// ...
}

That way it would be impossible to put any other code in between case table entries, but this would require adding a new error, and I'm not sure how to phrase such an error (I'm not a native English speaker). Any help or suggestions on how to phrase it properly would be appreciated.

If and when this is done, it would be possible to improve control over the case entries:

Check if the number of entries matches the number specified in the casetbl instruction.
Make sure the entries in the table are sorted by value (in the Implementer's Guide it's said they must be sorted so the Abstract Machine could use binary search on them).
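
The two checks above could be sketched in C roughly as follows. This is only an illustration, not code from the compiler: `check_case_table`, the status names, and the 32-bit `cell` type are all hypothetical, and treating duplicate case values as "not sorted" is an assumption of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: 32-bit cells. */
typedef int32_t cell;

/* Hypothetical result codes for the two proposed checks. */
enum casetbl_status {
    CASETBL_OK,
    CASETBL_COUNT_MISMATCH, /* number of 'case' entries differs from 'casetbl' */
    CASETBL_NOT_SORTED      /* case values are not in strictly ascending order */
};

/* `declared` is the count given in the 'casetbl' instruction;
   `values` holds the `count` case values in the order they were written. */
enum casetbl_status check_case_table(size_t declared, const cell *values, size_t count)
{
    size_t i;

    if (count != declared)
        return CASETBL_COUNT_MISMATCH;

    /* Binary search needs ascending order; duplicates are rejected as well
       (an assumption of this sketch). */
    for (i = 1; i < count; i++) {
        if (values[i - 1] >= values[i])
            return CASETBL_NOT_SORTED;
    }
    return CASETBL_OK;
}
```

For the block-mode example above (`casetbl 2` followed by `case 0` and `case 1`), such a check would succeed. Swapping the two case entries or omitting one would be reported.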

Daniel-Cortez · 2019-05-28T18:51:07Z

OK, I reimplemented the pseudo-opcodes; now all of them can handle array access.
The only exception is that array characters (e.g. array{n}) are not allowed in push.u.adr: I made it that way because normally array characters can't be passed to functions by reference.

I still need help with formulating the error messages in order to improve control over case table entries made with __emit (see the previous post).
But if that feature isn't needed much, then I suppose this PR should be ready to be merged.

Zeex · 2019-06-15T16:26:01Z

Looks good 👍 though I'm not well versed in the new __emit stuff

Daniel-Cortez added 16 commits April 27, 2019 23:34

__emit: Issue an error if the stack offset/data address is not a multiple of cell size

4c8d56c

__emit: Don't accept numeric constants for arguments of type 'data offset'

00e3e00

__emit: Display proper argument type names in error messages for arguments of type 'local variable'

e93817b

__emit: Don't mark functions as uWRITTEN

4c79811

__emit: Properly display error messages on type mismatch for arguments of type 'label'

f6c9db9

__emit: Take in account local symbols for arguments of type 'function'

1ad1448

__emit: Allow expressions for arguments of type 'shift'

9fb9bbb

__emit: Remove index check for opcodes 'lctrl' and 'sctrl'

69a573a

__emit: Fix crash on attempt to reference a native

3c1ac1e

__emit: Code cleanup

32528e7

__emit: Do not allow to pass references (passed by reference function arguments) to 'stor.s.pri/alt', 'inc.s', 'dec.s' and 'zero.s'

86f2b78

__emit: Display the type of passed-by-reference arrays in error messages as "-reference-"

6277969

Fix potential buffer overrun in #emit and __emit

4b490e8

__emit: Change the format of errors related to invalid use of the '-' sign

40dace7

Example: L1: __emit const.pri -L1;
Before: error 001: expected token: "-any value-", but found "-L1"
After: error 001: expected token: "-any value-", but found "-(-label-)"

__emit: Fix previously undetected error case related to invalid use of the '-' sign

3c370ff

__emit: Allow negative offsets for arguments of type 'local variable'

4c260e6

Daniel-Cortez requested a review from a team as a code owner May 4, 2019 18:37

Daniel-Cortez force-pushed the emit-3 branch from 02bc9d9 to e2db7d8 Compare May 28, 2019 18:15

Daniel-Cortez added 3 commits June 3, 2019 17:04

__emit: Implement pseudo-opcodes

2cfcd21

__emit: Add tests

dfefdf3

__emit: Make sure the opcode table is sorted

8394b5a

Daniel-Cortez force-pushed the emit-3 branch from e2db7d8 to 8394b5a Compare June 3, 2019 12:16

Daniel-Cortez added 2 commits June 8, 2019 02:59

__emit: Remove excessive local block

991eed9

__emit: Fix undefined behavior

5c4e0c2

Apparently shifting a 32-bit signed value by 31 bits is UB...
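
The remark presumably refers to a left shift such as `1 << 31`: with a 32-bit `int`, the result 2^31 is not representable in the signed type, which the C standard leaves undefined. A common way to avoid this, shown here only as a sketch (the helper name `shift_left_u32` is hypothetical and not taken from the commit), is to perform the shift on an unsigned value:

```c
#include <stdint.h>

/* Hypothetical helper: shifts the bit pattern of a 32-bit signed value left
   by `count` bits (count must be less than 32). The shift is done on an
   unsigned operand, so bits moving into or past bit 31 are well-defined
   and the result is reduced modulo 2^32. */
uint32_t shift_left_u32(int32_t value, unsigned count)
{
    return (uint32_t)((uint32_t)value << count);
}
```

With this helper, `shift_left_u32(1, 31)` yields the bit pattern 0x80000000 without relying on undefined behavior.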

Daniel-Cortez force-pushed the emit-3 branch from bad16ae to 5c4e0c2 Compare June 7, 2019 20:03

Zeex approved these changes Jun 15, 2019

Zeex merged commit 8561b38 into pawn-lang:dev Jun 15, 2019

Daniel-Cortez deleted the emit-3 branch June 15, 2019 22:43

This was referenced Jun 16, 2019

Crash if unused emit native #412

Closed

Operator emit negative values #413

Closed
