Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Call ret #194

Merged
merged 2 commits into from
Jan 19, 2023
Merged

Call ret #194

merged 2 commits into from
Jan 19, 2023

Conversation

bgregoir
Copy link
Contributor

@bgregoir bgregoir commented Jul 1, 2022

Work in progress

@bgregoir
Copy link
Contributor Author

The pull request is ready for review.
We should certainly add some explanation in the change log

.gitlab-ci.yml Outdated
@@ -52,7 +52,8 @@ coq:
EXTRA_NIX_ARGUMENTS: --arg coqDeps true
extends: .common
script:
- nix-shell --arg inCI true $EXTRA_NIX_ARGUMENTS --run 'make -j$NIX_BUILD_CORES -C proofs'
- nix-shell --arg inCI true $EXTRA_NIX_ARGUMENTS --run 'make -j$NIX_BUILD_CORES -C proofs Makefile.coq'
- nix-shell --arg inCI true $EXTRA_NIX_ARGUMENTS --run 'make -j$NIX_BUILD_CORES -C proofs lang/extraction.vo'
- nix-shell --arg inCI true $EXTRA_NIX_ARGUMENTS --run 'make -j$NIX_BUILD_CORES -C compiler CIL'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please drop these changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.
If I remember well we discussed the possibility that the ci has an intermediate entry (i.e separating the proofs).

@@ -209,10 +210,15 @@ let memory_analysis pp_err ~debug tbl up =
let ra = V.mk "RA" (Stack Direct) (tu Arch.reg_size) L._dummy [] in
let extra =
let extra = to_save in
let extra = if rastack then ra :: extra else extra in
(* let extra = if rastack then ra :: extra else extra in *)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this comment needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -308,7 +318,7 @@ Proof.
case: h vt.
+ move=> i {ty} ty /eq_exprP -> vt /=.
case: i => /= [f | r]; first by apply: var_of_flagP eqm.
by apply: var_of_regP eqm.
apply: var_of_regP eqm.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is surprising.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -1595,7 +1667,7 @@ Proof.
eexists; first reflexivity.
exists lc; first exact: omap_lc.
by constructor => //; rewrite /setpc /= eqpc.
Qed.
Admitted.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not look good.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

:: MkLI ii (Lgoto lcall)
(* ++ MkLI ii (LstoreLabel ra lret) *)
(* :: lstore ii rspi z Uptr glob_ra *)
++ MkLI ii (Lcall lcall)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please clean the comments away.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -90,8 +90,9 @@ Module MemoryI : MemoryT.
- sizes are positive
- stack does not overflow
*)
(* FIXME 0 <=? frame_size f not needed *)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this be fixed before or after merging this PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one was already fixed, I forgot to remove the comment

@vbgl vbgl force-pushed the call_ret branch 2 times, most recently from abebbe5 to 3fee871 Compare July 19, 2022 11:18
@vbgl
Copy link
Member

vbgl commented Jul 19, 2022

I’m going to rebase…

@vbgl vbgl force-pushed the call_ret branch 3 times, most recently from 5ef7122 to ab7e947 Compare July 19, 2022 11:50
@vbgl
Copy link
Member

vbgl commented Jul 19, 2022

CI is happy. Let’s wait for feedback from Jasmin users.

@tfaoliveira
Copy link
Member

CI is happy. Let’s wait for feedback from Jasmin users.

@vbgl @bgregoir, I did some experiments on libjade. I explain the procedure first, and then the results.

  1. compile libjade with Jasmin main and Jasmin call ret
  2. for some number of times (I choose 32 because it was roughly the time of getting a coffee)
    2.1 for each implementation that uses non-inline functions (kyber, x25519, shake256)
    2.1.1 run the corresponding bench for libjade-jasmin-main and libjade-jasmin-call-ret; for instance, for x25519, scalarmult is run 2K times, 3 times; the bench executable reports the best median from the 3 loops;
    2.1.2 register the difference, in %

The benchmark settings (number of iterations, etc) are in libjade/bench/common. The remaining scripts that I used for this task are not. When I get some time to test and clean them, I can push them. The CPU is a Skylake.

The results of scalarmult ref4 (~ amd64-64) and ref5 (~ amd64-51) are quite stable at a negative overhead. x25519 ref4 implementation usually runs at roughly 129500 cycles on said CPU. A -0.1% difference means ~ -130 cycles. This implementation does ~25 function calls (just for the inversion).
scalarmult

For shake256, IIRC, only Keccakf is non-inline. The reported value for each run is the average of the overheads for: 32 bytes of output and 0, 32, 128, 1024, and 16384 bytes of input;
shake256

There seems to be a bit of overhead for shake. Next, you can take a look at the number of cycles (for outlen=32 and inlen=16384) from libjade-jasmin-main and libjade-jasmin-call-ret, for a second entire run where I also registered individual results and not just the average (which means that the following values were not used to produce the previous plot).

32,16384,160086,160700,0.38
32,16384,160076,160728,0.41
32,16384,160172,160682,0.32
32,16384,160328,160714,0.24
32,16384,160080,160726,0.40
32,16384,160082,160724,0.40
32,16384,160086,160754,0.42
32,16384,160308,160810,0.31
32,16384,160468,160806,0.21
32,16384,160102,160754,0.41
32,16384,160370,160754,0.24
32,16384,160118,160746,0.39
32,16384,160116,160694,0.36
32,16384,160042,160728,0.43
32,16384,160178,160696,0.32
32,16384,160096,160754,0.41
32,16384,160124,160686,0.35
32,16384,160132,160700,0.35
32,16384,160244,160762,0.32
32,16384,160070,160656,0.37
32,16384,160104,160726,0.39
32,16384,160078,160694,0.38
32,16384,160112,160788,0.42
32,16384,160066,160672,0.38
32,16384,160150,160776,0.39
32,16384,160186,160774,0.37
32,16384,160092,160706,0.38
32,16384,160106,160756,0.41
32,16384,160082,160702,0.39
32,16384,160090,160700,0.38
32,16384,160166,160710,0.34
32,16384,160480,160746,0.17

For Kyber the results are the following (a little bit noisy at the moment; I think I would need to swap the random bytes implementation/shuffle the order of Kyber executions to get slightly stable results; I also cannot provide much insight about how many function calls are done given that I only know the implementation superficially).
kyber

Overall I think this change in the compiler does not affect performance significantly so I would say that it would be nice to have it merged.

@vbgl vbgl force-pushed the call_ret branch 4 times, most recently from eaf733d to 4f3a8b5 Compare September 8, 2022 07:55
@vbgl vbgl force-pushed the call_ret branch 2 times, most recently from 331ac96 to 5741f1b Compare October 3, 2022 14:49
@vbgl vbgl force-pushed the call_ret branch 2 times, most recently from cee3390 to 3f9e17e Compare December 1, 2022 16:36
@bgregoir
Copy link
Contributor Author

On my side, the PR is ready to merge.

@vbgl vbgl force-pushed the call_ret branch 2 times, most recently from c4f93ad to f425eb5 Compare January 19, 2023 12:09
@vbgl
Copy link
Member

vbgl commented Jan 19, 2023

CI: https://gitlab.com/jasmin-lang/jasmin/-/pipelines/751942335

Do we want to run some benchmarks?

Copy link
Member

@vbgl vbgl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CI is happy. I recommend to merge now and polish later.

@vbgl vbgl merged commit f1c5cd8 into main Jan 19, 2023
@vbgl vbgl deleted the call_ret branch January 19, 2023 14:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants