Feature: Add initial descriptions to Task options #286

TanviHacks · 2023-07-10T15:52:35Z

Even if they are full of technical jargon, we should take a first step at adding descriptions to the various options when creating a task. Perhaps there is some language we can pull from docs or the spec itself for now.

jbr · 2023-07-10T16:47:06Z

Agreed. I would love help from someone more fluent in the domain to come up with a one-sentence description for each field in the task form, and also a longer-form description (1-2 paragraphs) that would be hidden behind a help/info indicator.

inahga · 2023-07-10T21:37:27Z

Related #148

divergentdave · 2023-07-11T17:12:24Z

Here's my pass on descriptions for task parameters.

Leader Aggregator
- Short description: Select an aggregator server to process this metrics task.
- Long description:
  The leader aggregator is one of the two non-colluding servers that process metrics tasks. Its role is more resource-intensive than the helper's. One of the two aggregators must be run by Divvi Up, and the other must be run by a different organization. To use a self-hosted aggregator, you must first add it to your account; it will then appear in this list.
Helper Aggregator
- Short description: Select an aggregator server to process this metrics task.
- Long description:
  The helper aggregator is one of the two non-colluding servers that process metrics tasks. Its role is less resource-intensive than the leader's. One of the two aggregators must be run by Divvi Up, and the other must be run by a different organization. To use a self-hosted aggregator, you must first add it to your account; it will then appear in this list.
Function:
- Short description: Determines the kind of client measurements accepted, and how they are summarized.
- Long description:
  Selects the aggregation function used by this metrics task. The following functions are supported:
  - Count: Each client measurement is either "true" or "false". The aggregate result is the number of "true" measurements.
  - Sum: Each client measurement is an integer number. The aggregate result is a sum of the measurements.
  - Histogram: The aggregate result is a list of counters, and each client measurement chooses one counter to increment.
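A minimal sketch of what the three functions compute, assuming plain in-the-clear measurements. In the real system the measurements are split between the two aggregators and never seen in this form; the function names here are hypothetical and only illustrate the aggregate result each option produces.

```python
from typing import List, Sequence


def aggregate_count(measurements: Sequence[bool]) -> int:
    """Count: the aggregate is the number of "true" measurements."""
    return sum(1 for m in measurements if m)


def aggregate_sum(measurements: Sequence[int]) -> int:
    """Sum: the aggregate is the sum of the integer measurements."""
    return sum(measurements)


def aggregate_histogram(measurements: Sequence[int], num_buckets: int) -> List[int]:
    """Histogram: each measurement names one bucket and increments its counter."""
    counters = [0] * num_buckets
    for bucket in measurements:
        if not 0 <= bucket < num_buckets:
            raise ValueError(f"bucket index {bucket} is out of range")
        counters[bucket] += 1
    return counters
```

For example, three clients reporting buckets 0, 2, and 2 into a three-bucket histogram yield the counters `[1, 0, 2]`.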
Measurement Range (only present if Function is "sum")
- Short description: Selects the bit width and range of valid client measurements.
- Long description:
  Determines the range of integers that are accepted as client measurements. Note that this only determines the maximum value of individual measurements, and not the maximum value of the aggregate result (sum of measurements). Regardless of this choice, the aggregate result wraps around at about 3.4×10³⁸. This parameter affects the size of client reports.
Histogram buckets (only present if Function is "histogram")
- Short description: Selects the number of histogram buckets or counters.
- Long description:
  Determines how many buckets the histogram has. Each client report can only add one to a single bucket/counter. This parameter affects the size of client reports.
Query Type:
- Short description: Determines how reports are grouped into batches, and what kinds of queries the collector can make.
- Long description:
  Time Interval: Groups measurements into batches by their client timestamp. Collectors may query for aggregate results over (non-overlapping) time intervals. Good for identifying temporal patterns in data. If client reports may be received late, well after their timestamps, then the collector is forced to choose between delaying collection requests or abandoning late reports.
  Fixed Size: Groups measurements into batches arbitrarily as they arrive. Grants more control over batch sizes, because a maximum batch size can be set. Good for cases where the report upload rate is unknown or varies widely. Temporal patterns in data may be obscured by aggregating on-time and late reports together in the same batches.
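The difference between the two query types can be sketched as follows. This is a simplified illustration, not how the aggregators actually assign batches: reports are represented only by their client timestamps (assumed to be Unix seconds), and both function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def batch_by_time_interval(timestamps: Sequence[int], interval: int) -> Dict[int, List[int]]:
    """Time Interval: group reports by the interval their client timestamp falls in."""
    batches: Dict[int, List[int]] = defaultdict(list)
    for ts in timestamps:
        interval_start = ts - (ts % interval)
        batches[interval_start].append(ts)
    return dict(batches)


def batch_by_fixed_size(arrivals: Sequence[int], max_batch_size: int) -> List[List[int]]:
    """Fixed Size: fill batches in arrival order, regardless of timestamp."""
    return [
        list(arrivals[i:i + max_batch_size])
        for i in range(0, len(arrivals), max_batch_size)
    ]
```

With arrivals `[10, 3605, 20]`, where the last report arrives late, hourly time-interval batching keeps `10` and `20` together, while fixed-size batching with a maximum of 2 places the on-time report `10` in the same batch as `3605` and leaves `20` for the next batch.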
Minimum Batch Size:
- Short description: Minimum number of reports per batch.
- Long description:
  Minimum number of reports per batch. This should be set high enough that the aggregate results over a batch do not violate the application's privacy goals. This is determined by a number of factors, including the aggregation function used, the population distribution of measurements, the importance/sensitivity of the underlying data, and whether client attestation is used to prevent Sybil attacks. If differential privacy noise is added, it can simplify selection of a minimum batch size.
Maximum Batch Size:
- Short description: Maximum number of reports per batch.
- Long description:
  Maximum number of reports per batch. This is only available with the "Fixed Size" query type.
DAP-encoded HPKE file:
- Short description: The collector's public key. Results will be encrypted using this key.
- Long description:
  Upload a binary public key file in DAP "HpkeConfig" format. Do not upload the corresponding private key. You will need to use the private key when collecting aggregate results, to decrypt results from the aggregators.
Time precision:
- Short description: Granularity of client report timestamps.
- Long description:
  All client report timestamps will be rounded to the previous multiple of this duration. If the query type is Time Interval, then query time intervals must start and end on multiples of this duration as well.
Expiration:
- Short description: Optional, pre-scheduled time to decommission this task.
- Long description: If set, then reports may no longer be uploaded for this task after its expiration time.

inahga · 2023-07-11T17:24:40Z

Nicely written.

Thoughts on max_batch_size.

Maximum number of reports per batch. This is only available with the "Fixed Size" query type.

Only thing missing is why someone would want to change this, or what factors go into choosing this. Looking at the description, it might be a good candidate for us just to choose a value for the subscriber, i.e. 1.25 * min_batch_size or something.

branlwyd · 2023-07-11T18:01:38Z

I think we should always set min_batch_size = max_batch_size, and allow users to configure batch_size only. There is no practical upside to having a range from min_batch_size to max_batch_size. (The history here is that it was thought that allowing a range would make it easier for implementations to "hit" the target range; but it's just as easy to hit a "range" of one allowed value.)

TanviHacks · 2023-07-24T05:12:42Z

Thanks for adding the text to the Task creator!

The Time Precision long description is a little confusing. A couple questions:

All client report timestamps will be rounded to the previous multiple of this duration.
Is this saying that times will be rounded down? If you choose 1 hour, then 10:58, for example, would round to 10:00?

If the query type is Time Interval, then query time intervals must start and end on multiples of this duration as well.
I'm not sure I follow here. Do you mean that if you choose a 1 hour time precision, then your Time interval reports should be requested for times that span multiple hours (as opposed to minutes or days)?

divergentdave · 2023-07-24T19:22:52Z

Regarding the first question, yes, that's correct.

The timespan of time interval queries must be divisible by the time precision. Thus, the time precision effectively sets a minimum duration on the timespan of time interval queries. With a one hour time precision setting, querying over a half-hour period would not be allowed, but querying over a 24 hour period would be allowed.
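The rounding and divisibility rules above can be sketched like this, assuming timestamps and durations are expressed in whole seconds. The function names are hypothetical and are not part of any Divvi Up API.

```python
def round_timestamp(timestamp: int, time_precision: int) -> int:
    """Round a client report timestamp down to the previous multiple of the time precision."""
    return timestamp - (timestamp % time_precision)


def is_valid_query_interval(start: int, duration: int, time_precision: int) -> bool:
    """A time interval query must start and end on multiples of the time precision."""
    return (
        duration > 0
        and start % time_precision == 0
        and duration % time_precision == 0
    )
```

With a one-hour precision (3600 seconds), a report stamped 10:58 (39480 seconds past midnight) rounds down to 10:00 (36000), a half-hour query is rejected, and a 24-hour query starting on the hour is accepted.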

I'll send a PR to flesh this paragraph out a bit.

jbr added the Copywriting label Jul 10, 2023

jbr added the documentation Improvements or additions to documentation label Jul 10, 2023

jbr mentioned this issue Jul 13, 2023

add an initial version of task form help text #299

Merged

jbr closed this as completed in #299 Jul 13, 2023

divergentdave mentioned this issue Jul 24, 2023

Flesh out time precision description #329

Merged

inahga mentioned this issue Sep 13, 2023

Should we always set min_batch_size == max_batch_size? #495

Open
