
Use tensor_ext::ExtractOp for extracting an element out of a slot of a single ciphertext #1174

Open
ZenithalHourlyRate opened this issue Dec 11, 2024 · 4 comments

Comments

@ZenithalHourlyRate
Collaborator

ZenithalHourlyRate commented Dec 11, 2024

Related to #913

In the secretized IR, there are two semantics of tensor::ExtractOp

  • One that does normal tensor extract, from a tensor of ciphertext
  • Another needs to extract a slot from a simd ciphertext

Which semantic a given tensor::ExtractOp carries is decided by the vectorizer; one can also check the lowering code to see how this decision is made for each tensor::ExtractOp.

For example, in matmul, tensor.extract is treated as extracting from a tensor of ciphertexts:

  func.func @matmul(%arg0: tensor<1x2xf32> {secret.secret}, %arg1: tensor<1x2xf32> {secret.secret}) -> tensor<1x2xf32> {
    %0 = "tosa.const"() <{value = dense<[[2.0, 3.0], [4.0, 6.0]]> : tensor<2x2xf32>}> : () -> tensor<2x2xf32>
    %1 = affine.for %arg2 = 0 to 1 iter_args(%arg3 = %arg1) -> (tensor<1x2xf32>) {
      %2 = affine.for %arg4 = 0 to 2 iter_args(%arg5 = %arg3) -> (tensor<1x2xf32>) {
        %3 = affine.for %arg6 = 0 to 2 iter_args(%arg7 = %arg5) -> (tensor<1x2xf32>) {
          %extracted = tensor.extract %arg0[%arg2, %arg6] : tensor<1x2xf32>
          %extracted_0 = tensor.extract %0[%arg6, %arg4] : tensor<2x2xf32>
          %extracted_1 = tensor.extract %arg7[%arg2, %arg4] : tensor<1x2xf32>
          %4 = arith.mulf %extracted, %extracted_0 : f32
          %5 = arith.addf %extracted_1, %4 : f32
          %inserted = tensor.insert %5 into %arg7[%arg2, %arg4] : tensor<1x2xf32>
          affine.yield %inserted : tensor<1x2xf32>
        }
        affine.yield %3 : tensor<1x2xf32>
      }
      affine.yield %2 : tensor<1x2xf32>
    }
    return %1 : tensor<1x2xf32>
  }

yet in dot_product, in both the input IR and the output IR, tensor::ExtractOp is meant to extract a slot from a SIMD ciphertext. Input IR:

func.func @dot_product(%arg0: tensor<8xf16> {secret.secret}, %arg1: tensor<8xf16> {secret.secret}) -> f16 {
  %c0 = arith.constant 0 : index
  %c0_sf16 = arith.constant -0.0 : f16
  %0 = affine.for %arg2 = 0 to 8 iter_args(%iter = %c0_sf16) -> (f16) {
    %1 = tensor.extract %arg0[%arg2] : tensor<8xf16>
    %2 = tensor.extract %arg1[%arg2] : tensor<8xf16>
    %3 = arith.mulf %1, %2 : f16
    %4 = arith.addf %iter, %3 : f16
    affine.yield %4 : f16
  }
  return %0 : f16
}

Output IR:
  func.func @dot_product(%arg0: !secret.secret<tensor<8xf16>>, %arg1: !secret.secret<tensor<8xf16>>) -> !secret.secret<f16> {
    %c1 = arith.constant 1 : index
    %c2 = arith.constant 2 : index
    %c4 = arith.constant 4 : index
    %c7 = arith.constant 7 : index
    %0 = secret.generic ins(%arg0, %arg1 : !secret.secret<tensor<8xf16>>, !secret.secret<tensor<8xf16>>) {
    ^bb0(%arg2: tensor<8xf16>, %arg3: tensor<8xf16>):
      %1 = arith.mulf %arg2, %arg3 : tensor<8xf16>
      %2 = tensor_ext.rotate %1, %c4 : tensor<8xf16>, index
      %3 = arith.addf %1, %2 : tensor<8xf16>
      %4 = tensor_ext.rotate %3, %c2 : tensor<8xf16>, index
      %5 = arith.addf %3, %4 : tensor<8xf16>
      %6 = tensor_ext.rotate %5, %c1 : tensor<8xf16>, index
      %7 = arith.addf %5, %6 : tensor<8xf16>
      %extracted = tensor.extract %7[%c7] : tensor<8xf16>
      secret.yield %extracted : f16
    } -> !secret.secret<f16>
    return %0 : !secret.secret<f16>
  }

When lowering, these translate to different ops. For the former, tensor::ExtractOp is preserved, while the latter lowers to a scheme-level extract op like bgv::ExtractOp / ckks::ExtractOp, which then lowers to openfhe.mul_plain + rotation + lwe.reinterpret_underlying_type.

This makes level analysis difficult, as it cannot decide which tensor::ExtractOp is an actual slot unpacking and thus affects the level result.

This was found while migrating secret-to-ckks to the RNS scheme: many tensor.extract ops that actually extract a ciphertext from a tensor of ciphertexts were analyzed incorrectly.

Note that at the level of secret-insert-mgmt-ckks, it cannot know which semantic an individual tensor.extract bears, since all it can see is tensor<shapexf32>.
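
To make the ambiguity concrete, here is a toy Python model (not HEIR code; all names are illustrative) where a "ciphertext" is simulated as a plain list of slot values and a "tensor of ciphertexts" is a list of such lists. The two tensor.extract semantics then look structurally very different, even though both appear as a single op in the IR:

```python
# Toy model (not HEIR code): a "ciphertext" is a plain list of slot
# values; a "tensor of ciphertexts" is a list of such lists.

def extract_from_tensor_of_ciphertexts(cts, i):
    """Semantics 1: ordinary tensor indexing -- returns a whole ciphertext."""
    return cts[i]

def extract_slot(ct, i):
    """Semantics 2: SIMD slot extraction -- needs a rotation so the wanted
    slot lands at position 0 (plus masking in a real lowering)."""
    rotated = ct[i:] + ct[:i]   # models tensor_ext.rotate by i
    return rotated[0]           # the extracted scalar slot

cts = [[1.0, 2.0], [3.0, 4.0]]  # two 2-slot "ciphertexts"
assert extract_from_tensor_of_ciphertexts(cts, 1) == [3.0, 4.0]
assert extract_slot([5.0, 6.0, 7.0, 8.0], 2) == 7.0
```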

@AlexanderViand-Intel
Collaborator

AlexanderViand-Intel commented Dec 11, 2024

I'm not sure a duplicate extract op is the right way forward. While it solves this specific issue, the underlying problem is a bit more general: because our arithmetization pipeline (--mlir-to-secret-arithmetic) doesn't actually "go all the way", --secret-to-<scheme> has a bunch of hidden assumptions and complexity in it.

My suggestion would be to drastically reduce the scope of the --secret-to-<scheme> passes, to the point where they are limited to a dumb translation of "obvious pre-images" of the scheme ops (i.e., arith.addi, arith.muli, tensor_ext.rotate), making no decisions of their own on packing/approach/etc (e.g., packing should already be annotated by the arithmetization pipeline using some layout annotation, see also #913).

This would mean moving the "SIMD slot extract" pattern to the arithmetization pipeline and actually replacing such tensor.extract's with a series of rotations/add/mul/etc. This way, the management pass doesn't even need to know about the concept of a slot extraction.
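
A toy Python sketch (not HEIR code) of that rewrite, modeling slots as plain lists: a SIMD tensor.extract becomes a rotate plus a mul_plain by a one-hot mask, leaving the extracted value in slot 0 and zeros elsewhere. The function names are illustrative stand-ins for tensor_ext.rotate and the scheme-level mul_plain:

```python
# Toy model (not HEIR code) of lowering a SIMD slot extract to
# rotate + mul_plain with a one-hot plaintext mask.

def rotate(ct, k):
    """Left-rotate the slots by k (models tensor_ext.rotate)."""
    k %= len(ct)
    return ct[k:] + ct[:k]

def mul_plain(ct, mask):
    """Slot-wise multiply by a plaintext vector (models mul_plain)."""
    return [c * m for c, m in zip(ct, mask)]

def lower_slot_extract(ct, i):
    """Extract slot i: rotate it into slot 0, then zero the other slots."""
    rotated = rotate(ct, i)
    mask = [1] + [0] * (len(ct) - 1)   # one-hot plaintext mask
    return mul_plain(rotated, mask)

assert lower_slot_extract([10, 20, 30, 40], 2) == [30, 0, 0, 0]
```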

EDIT: I now think we should still introduce a tensor_ext::slot_extract op for this concept, so that one can dump the IR prior to replacing them, for better debuggability/readability and to allow other arithmetization patterns to create them more easily. But it would be purely an implementation detail of the arithmetization, invisible to the management/scheme-specific abstraction layers.

Thoughts on this? It seems sensible to me now, but so did the previous three suggestions I wrote but deleted before posting because I realized they were dumb/broken/inelegant 😉

@ZenithalHourlyRate
Collaborator Author

ZenithalHourlyRate commented Dec 11, 2024

drastically reduce the scope of the --secret-to-<scheme> passes, to the point where they are limited to a dumb translation of "obvious pre-images"

I agree. Packing should be decided way earlier.

actually replacing such tensor.extract's with a series of rotations/add/mul/etc

I'd agree that bgv.extract/ckks.extract should not be basic scheme ops, but the technical issue is that further down the pipeline there are lwe.reinterpret_underlying_type casts to handle the type difference. If we use pure mul/add early in the pipeline, we cannot get the correct type.

I'd favor tensor_ext::slot_extract for the type conversion, so that later stages of the pipeline are aware of it (i.e., it gets lowered to lwe.reinterpret_underlying_type).

EDIT: going further, I actually think replacing tensor::extract with bgv.mul_plain + bgv.rotate is costly, as we have to do modreduce / relinearize for them. Why shouldn't tensor::extract just be a type conversion, with the slot encoded in the encoding info?

@AlexanderViand-Intel
Collaborator

AlexanderViand-Intel commented Dec 12, 2024

EDIT: going further, I actually think replacing tensor::extract with bgv.mul_plain + bgv.rotate is costly, as we have to do modreduce / relinearize for them. Why shouldn't tensor::extract just be a type conversion, with the slot encoded in the encoding info?

Ohh, I think that highlights that there are actually many possible semantics for a tensor.extract in the input program:

  • Standard tensor extraction from a tensor of ciphertexts
  • "True" slot extraction, which produces an LWE ciphertext rather than an RLWE one (not currently in HEIR)
  • "packing-based" slot extraction with add/mul/rotate, which produces an RLWE ciphertext that contains the intended value in slot 0 (I.e., rotate by -i as the simplest version). However, this can get more complex if we want to zero all other slots or even replicate the extracted value to all slots.
  • "virtual" slot extraction, where we continue to use the RLWE ctxt as-is, but "make a note" to only consider slot i to be meaningful data.

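A toy Python sketch (not HEIR code) of the "more complex" packing-based variant: after masking slot i into slot 0, the value is replicated to all n slots with log2(n) rotate-and-add steps (n assumed to be a power of two; all names are illustrative):

```python
# Toy model (not HEIR code): replicate an extracted slot to all slots
# via mask + log2(n) rotate-and-add steps.

def rotate(ct, k):
    """Left-rotate the slots by k (models tensor_ext.rotate)."""
    k %= len(ct)
    return ct[k:] + ct[:k]

def replicate_slot(ct, i):
    """Put slot i everywhere: rotate it to slot 0, zero the rest with a
    one-hot mask, then double the filled region each iteration."""
    n = len(ct)
    acc = [v * m for v, m in zip(rotate(ct, i), [1] + [0] * (n - 1))]
    step = 1
    while step < n:
        acc = [a + b for a, b in zip(acc, rotate(acc, step))]
        step *= 2
    return acc

assert replicate_slot([1, 2, 3, 4], 2) == [3, 3, 3, 3]
```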

github-actions bot commented Dec 16, 2024

This issue has 6 outstanding TODOs:

This comment was autogenerated by todo-backlinks
