From f2465c654c7ec3d0d1ff22a314caf49f3bcfcc86 Mon Sep 17 00:00:00 2001
From: Anne van Kesteren
Date: Thu, 13 Dec 2018 12:46:24 +0100
Subject: [PATCH 1/8] Define encodeInto() API

This enables converting strings into byte sequences of pre-allocated buffers. Also cleanup TextEncoder a bit.

Tests: ...

Fixes #69.
---
 encoding.bs | 95 ++++++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 76 insertions(+), 19 deletions(-)

diff --git a/encoding.bs b/encoding.bs
index d348d76..3f65b51 100644
--- a/encoding.bs
+++ b/encoding.bs
@@ -1294,16 +1294,20 @@ attribute's getter, when invoked, must return "utf-8".

Interface {{TextEncoder}}

+dictionary TextEncoderEncodeIntoResult {
+  unsigned long long read;
+  unsigned long long written;
+};
+
 [Constructor,
  Exposed=(Window,Worker)]
 interface TextEncoder {
   [NewObject] Uint8Array encode(optional USVString input = "");
+  TextEncoderEncodeIntoResult encodeInto(USVString source, Uint8Array destination);
 };
 TextEncoder includes TextEncoderCommon;
 
-

A {{TextEncoder}} object has an associated encoder. -

A {{TextEncoder}} object offers no label argument as it only supports UTF-8. It also offers no stream option as no encoder requires buffering of scalar values.
@@ -1319,18 +1323,14 @@ requires buffering of scalar values.

encoder . encode([input = ""])

Returns the result of running UTF-8's encoder. + +

encoder . encodeInto(source, destination) +

Runs the UTF-8 encoder on source, stores the result on + destination, and returns the progress made.

The TextEncoder() -constructor, when invoked, must run these steps: - -

    -
  1. Let enc be a new {{TextEncoder}} object. - -

  2. Set enc's encoder to UTF-8's encoder. - -

  3. Return enc. -

+constructor, when invoked, must return a new {{TextEncoder}} object.

The encode(input) method, when invoked, must run these steps:
@@ -1347,17 +1347,74 @@ must run these steps:

  • Let token be the result of reading from input. -

  • Let result be the result of - processing token for - encoder, input, output. +

  • Let result be the result of processing token for the + UTF-8 encoder, input, output. + +

  • +

    Assert: result is not error. + +

    The UTF-8 encoder cannot return error. + +

  • If result is finished, convert output into a byte sequence, + and then return a {{Uint8Array}} object wrapping an {{ArrayBuffer}} containing output. + + + + +

    The +encodeInto(source, destination) +method, when invoked, must run these steps: + +

      +
    1. Let read be 0. + +

    2. Let written be 0. + +

    3. Let destinationLength be the number of bytes destination's + {{ArrayBuffer}} can hold. + + +

    4. +

      Let unused be a new stream. + +

      The handler algorithm invoked below requires this argument, but it is not + used by the UTF-8 encoder. + +

    5. Convert source to a stream. + +

    6. +

      While true: + +

        +
      1. Let token be the result of reading from source. + +

      2. Let result be the result of running the UTF-8 encoder's handler + on unused and token. + +

      3. If result is finished, then return a new {{TextEncoderEncodeIntoResult}} + dictionary whose {{TextEncoderEncodeIntoResult/read}} member is read and + {{TextEncoderEncodeIntoResult/written}} member is written.

      4. -

        If result is finished, convert output into a - byte sequence, and then return a {{Uint8Array}} object wrapping an - {{ArrayBuffer}} containing output. - +

        Otherwise: + +

          +
        1. +

          If destinationLengthwritten is greater than the number of + bytes in result, then: -

          UTF-8 cannot return error. +

            +
          1. If token is greater than U+FFFF, then increment read by 2. + +

          2. Otherwise, increment read by 1. + +

          3. Write the bytes in result into destination's {{ArrayBuffer}}, + starting at written. + + +

          4. Increment written by the number of bytes in result. +

          +
    From 3cde4a387d3b5e5bc35be68a94d7c7b10be8287f Mon Sep 17 00:00:00 2001
    From: Anne van Kesteren
    Date: Thu, 13 Dec 2018 14:18:32 +0100
    Subject: [PATCH 2/8] address feedback

    ---
     encoding.bs | 10 +++++++---
     1 file changed, 7 insertions(+), 3 deletions(-)

    diff --git a/encoding.bs b/encoding.bs
    index 3f65b51..a845305 100644
    --- a/encoding.bs
    +++ b/encoding.bs
    @@ -1391,9 +1391,7 @@ method, when invoked, must run these steps:
  • Let result be the result of running the UTF-8 encoder's handler on unused and token. -

  • If result is finished, then return a new {{TextEncoderEncodeIntoResult}} - dictionary whose {{TextEncoderEncodeIntoResult/read}} member is read and - {{TextEncoderEncodeIntoResult/written}} member is written. +

  • If result is finished, then break.

  • Otherwise:
@@ -1414,8 +1412,14 @@ method, when invoked, must run these steps:

  • Increment written by the number of bytes in result. + +

  • Otherwise, break. + +

  • Return a new {{TextEncoderEncodeIntoResult}} dictionary whose
 {{TextEncoderEncodeIntoResult/read}} member is read and
 {{TextEncoderEncodeIntoResult/written}} member is written

    From cff77c41526d0f6fa849de8991b0020bbfa5734a Mon Sep 17 00:00:00 2001
    From: Anne van Kesteren
    Date: Thu, 13 Dec 2018 15:08:29 +0100
    Subject: [PATCH 3/8] ack

    ---
     encoding.bs | 2 ++
     1 file changed, 2 insertions(+)

    diff --git a/encoding.bs b/encoding.bs
    index a845305..044a367 100644
    --- a/encoding.bs
    +++ b/encoding.bs
    @@ -3266,6 +3266,8 @@ Ken Whistler,
     Kenneth Russell,
     田村健人 (Kent Tamura),
     Leif Halvard Silli,
    +Luke Wagner,
    +Maciej Hirsz,
     Makoto Kato,
     Mark Callow,
     Mark Crispin,

    From b4dc096d48dc58f29d35730eaa0a249bd119f031 Mon Sep 17 00:00:00 2001
    From: Anne van Kesteren
    Date: Fri, 14 Dec 2018 11:11:05 +0100
    Subject: [PATCH 4/8] everything but the example

    ---
     encoding.bs | 22 ++++++++++++----------
     1 file changed, 12 insertions(+), 10 deletions(-)

    diff --git a/encoding.bs b/encoding.bs
    index 044a367..ab45643 100644
    --- a/encoding.bs
    +++ b/encoding.bs
    @@ -1325,8 +1325,11 @@ requires buffering of scalar values.

    Returns the result of running UTF-8's encoder.

    encoder . encodeInto(source, destination) -

    Runs the UTF-8 encoder on source, stores the result on - destination, and returns the progress made. +

    Runs the UTF-8 encoder on source, stores the result of that operation into + destination, and returns the progress made as a dictionary whereby + {{TextEncoderEncodeIntoResult/read}} is the number of converted code units of + source and {{TextEncoderEncodeIntoResult/written}} is the number of bytes modified in + destination.
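
To illustrate the description above, here is a minimal sketch of how read and written relate when the destination is too small for the whole string. It assumes a runtime that already ships encodeInto(); the variable names are made up for this illustration and are not part of the patch.

```javascript
// Assumption: TextEncoder.prototype.encodeInto is available in this runtime.
const encoder = new TextEncoder();
const destination = new Uint8Array(4);

// "a", "é" (U+00E9) and "€" (U+20AC) are three UTF-16 code units that need
// 1 + 2 + 3 = 6 UTF-8 bytes, so only "a" and "é" fit into 4 bytes.
const result = encoder.encodeInto("a\u00E9\u20AC", destination);

console.log(result.read);    // 2: code units of source that were converted
console.log(result.written); // 3: bytes of destination that were modified
```

The last byte of destination is left untouched because a partial three-byte sequence is never written.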

    The TextEncoder()
    @@ -1370,9 +1373,9 @@ method, when invoked, must run these steps:

  • Let written be 0. -

  • Let destinationLength be the number of bytes destination's - {{ArrayBuffer}} can hold. - +

  • Let destinationBytes be the result of + getting a reference to the bytes held by + destination.

  • Let unused be a new stream.
@@ -1398,17 +1401,16 @@ method, when invoked, must run these steps:

    1. -

      If destinationLengthwritten is greater than the number of - bytes in result, then: +

      If destinationBytes's length − + written is greater than the number of bytes in result, then:

      1. If token is greater than U+FFFF, then increment read by 2.

      2. Otherwise, increment read by 1. -

      3. Write the bytes in result into destination's {{ArrayBuffer}}, - starting at written. - +

      4. Write the bytes in result into destinationBytes, from byte + offset written.

      5. Increment written by the number of bytes in result.

    From 7a5c284ddda333436d82e25f1c3ef74ea3e8604c Mon Sep 17 00:00:00 2001
    From: Anne van Kesteren
    Date: Fri, 14 Dec 2018 19:05:27 +0100
    Subject: [PATCH 5/8] nit

    ---
     encoding.bs | 2 +-
     1 file changed, 1 insertion(+), 1 deletion(-)

    diff --git a/encoding.bs b/encoding.bs
    index ab45643..fffd7ce 100644
    --- a/encoding.bs
    +++ b/encoding.bs
    @@ -1421,7 +1421,7 @@ method, when invoked, must run these steps:
    2. Return a new {{TextEncoderEncodeIntoResult}} dictionary whose {{TextEncoderEncodeIntoResult/read}} member is read and - {{TextEncoderEncodeIntoResult/written}} member is written + {{TextEncoderEncodeIntoResult/written}} member is written.
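
The algorithm steps in this series can be sketched in plain JavaScript roughly as follows. This is an illustration under stated assumptions, not the spec text or any engine's implementation: utf8Bytes() and encodeIntoSketch() are hypothetical helper names, the lone-surrogate replacement stands in for the USVString conversion, and the capacity check uses the "greater than or equal to" wording from the last patch in the series.

```javascript
// Hypothetical helper: UTF-8 bytes for one scalar value.
function utf8Bytes(codePoint) {
  if (codePoint <= 0x7F) return [codePoint];
  if (codePoint <= 0x7FF) {
    return [0xC0 | (codePoint >> 6), 0x80 | (codePoint & 0x3F)];
  }
  if (codePoint <= 0xFFFF) {
    return [
      0xE0 | (codePoint >> 12),
      0x80 | ((codePoint >> 6) & 0x3F),
      0x80 | (codePoint & 0x3F),
    ];
  }
  return [
    0xF0 | (codePoint >> 18),
    0x80 | ((codePoint >> 12) & 0x3F),
    0x80 | ((codePoint >> 6) & 0x3F),
    0x80 | (codePoint & 0x3F),
  ];
}

// Hypothetical sketch of the encodeInto() steps.
function encodeIntoSketch(source, destinationBytes) {
  let read = 0;
  let written = 0;
  // Iterating a string yields one code point per step.
  for (const ch of source) {
    let token = ch.codePointAt(0);
    // USVString conversion: a lone surrogate becomes U+FFFD.
    if (token >= 0xD800 && token <= 0xDFFF) token = 0xFFFD;
    const result = utf8Bytes(token);
    if (destinationBytes.length - written >= result.length) {
      // Code points above U+FFFF occupy two UTF-16 code units in source.
      read += token > 0xFFFF ? 2 : 1;
      destinationBytes.set(result, written);
      written += result.length;
    } else {
      break; // Not enough room for the whole sequence: stop here.
    }
  }
  return { read, written };
}
```

Calling encodeIntoSketch("a\u{1F600}", new Uint8Array(8)) would report read as 3 and written as 5, since the emoji counts as two code units and four bytes.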

    From 9b5ffc1da8bd1b7350bc32a8c9fe937f67cbae6b Mon Sep 17 00:00:00 2001
    From: Anne van Kesteren
    Date: Fri, 21 Dec 2018 16:01:03 +0100
    Subject: [PATCH 6/8] add Adam's example, mistakes are mine

    ---
     encoding.bs | 28 ++++++++++++++++++++++++++++
     1 file changed, 28 insertions(+)

    diff --git a/encoding.bs b/encoding.bs
    index fffd7ce..5a2fa76 100644
    --- a/encoding.bs
    +++ b/encoding.bs
    @@ -1424,6 +1424,34 @@ method, when invoked, must run these steps:
      {{TextEncoderEncodeIntoResult/written}} member is written.
    +
    +

    The {{TextEncoder//encodeInto()}} method can be used to encode a string into an existing + {{ArrayBuffer}} object. Various details below are left as an exercise for the reader, but this + demonstrates an approach one could take to use this methodd: + +

    
    +function convertString(buffer, input, callback) {
    +  let bufferSize = 256,
    +      bufferStart = malloc(buffer, bufferSize),
    +      writeOffset = 0,
    +      readOffset = 0;
    +  while (true) {
    +    const view = new Uint8Array(buffer, bufferStart + writeOffset, bufferSize - writeOffset),
    +          {read, written} = cachedEncoder.encodeInto(input.substring(readOffset), view);
    +    readOffset += read;
    +    writeOffset += written;
    +    if (readOffset === input.length) {
    +      callback(bufferStart, writeOffset);
    +      free(buffer, bufferStart);
    +      return;
    +    }
    +    bufferSize *= 2;
    +    bufferStart = realloc(buffer, bufferStart, bufferSize);
    +  }
    +}
    +
    +
    +
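
The example in this patch leaves malloc, realloc, free and cachedEncoder undefined on purpose. For readers who want something they can run, here is a self-contained variant of the same loop. It is only a sketch: it swaps the manual heap for a growable Uint8Array, and encodeGrowing() and its initialSize parameter are hypothetical names introduced for this illustration.

```javascript
// Assumption: a runtime with TextEncoder.prototype.encodeInto.
const cachedEncoder = new TextEncoder();

// Hypothetical helper: encodes input into a buffer that doubles when full.
function encodeGrowing(input, initialSize = 8) {
  let bytes = new Uint8Array(initialSize);
  let writeOffset = 0;
  let readOffset = 0;
  while (true) {
    // Encode the unread tail of input into the unused tail of the buffer.
    const view = bytes.subarray(writeOffset);
    const { read, written } =
      cachedEncoder.encodeInto(input.substring(readOffset), view);
    readOffset += read;
    writeOffset += written;
    if (readOffset === input.length) {
      return bytes.subarray(0, writeOffset);
    }
    // Not everything fit: grow the buffer and keep what was written so far.
    const grown = new Uint8Array(Math.max(bytes.length * 2, 4));
    grown.set(bytes.subarray(0, writeOffset));
    bytes = grown;
  }
}
```

Because read counts whole code units and a surrogate pair is only ever consumed as a unit, resuming from input.substring(readOffset) does not split a character.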

    Interface mixin {{GenericTransformStream}}

    From 88615e64e94e91f3cd5f5f2193a6004013ccb9d1 Mon Sep 17 00:00:00 2001
    From: Anne van Kesteren
    Date: Fri, 21 Dec 2018 16:05:38 +0100
    Subject: [PATCH 7/8] bs

    ---
     encoding.bs | 6 +++---
     1 file changed, 3 insertions(+), 3 deletions(-)

    diff --git a/encoding.bs b/encoding.bs
    index 5a2fa76..acdba71 100644
    --- a/encoding.bs
    +++ b/encoding.bs
    @@ -1425,9 +1425,9 @@ method, when invoked, must run these steps:
    -

    The {{TextEncoder//encodeInto()}} method can be used to encode a string into an existing - {{ArrayBuffer}} object. Various details below are left as an exercise for the reader, but this - demonstrates an approach one could take to use this methodd: +

    The encodeInto() method can + be used to encode a string into an existing {{ArrayBuffer}} object. Various details below are left + as an exercise for the reader, but this demonstrates an approach one could take to use this method:

    
     function convertString(buffer, input, callback) {
    
    From 7bb77bbb5cf13a6dd2ba63423c16cd9e5f493124 Mon Sep 17 00:00:00 2001
    From: Anne van Kesteren 
    Date: Tue, 8 Jan 2019 15:04:31 +0100
    Subject: [PATCH 8/8] or equal
    
    ---
     encoding.bs | 2 +-
     1 file changed, 1 insertion(+), 1 deletion(-)
    
    diff --git a/encoding.bs b/encoding.bs
    index acdba71..758331d 100644
    --- a/encoding.bs
    +++ b/encoding.bs
    @@ -1402,7 +1402,7 @@ method, when invoked, must run these steps:
         
    1. If destinationBytes's length −
    -  written is greater than the number of bytes in result, then:
    +  written is greater than or equal to the number of bytes in result, then:

      1. If token is greater than U+FFFF, then increment read by 2.
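
The "or equal" change matters when a character fits the remaining space exactly. A minimal sketch of that boundary case, assuming a runtime whose encodeInto() follows the final wording; the variable names are made up for this illustration:

```javascript
// Assumption: TextEncoder.prototype.encodeInto is available in this runtime.
const encoder = new TextEncoder();

// "€" (U+20AC) encodes to exactly three UTF-8 bytes.
const exact = new Uint8Array(3);
const fit = encoder.encodeInto("\u20AC", exact);
console.log(fit.read, fit.written); // 1 3: the exact fit is accepted

// One byte short: nothing is written, because partial sequences are not emitted.
const short = new Uint8Array(2);
const miss = encoder.encodeInto("\u20AC", short);
console.log(miss.read, miss.written); // 0 0
```

Under the earlier strict "greater than" comparison, the first call would have reported no progress even though the destination had enough room.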