Feature: Add initial descriptions to Task options #286

Closed
TanviHacks opened this issue Jul 10, 2023 · 7 comments · Fixed by #299
Labels
Copywriting documentation Improvements or additions to documentation

Comments

@TanviHacks

Even if they are full of technical jargon, we should take a first step toward adding descriptions to the various options when creating a task. Perhaps there is some language we can pull from the docs or the spec itself for now.

@jbr
Contributor

jbr commented Jul 10, 2023

Agreed. I would love help from someone more fluent in the domain to come up with a one-sentence description for each field in the task form, and also a longer description (1-2 paragraphs) that would be hidden behind a help/info indicator.

@jbr jbr added the documentation Improvements or additions to documentation label Jul 10, 2023
@inahga
Contributor

inahga commented Jul 10, 2023

Related #148

@divergentdave
Contributor

Here's my pass on descriptions for task parameters.

  • Leader Aggregator
    • Short description: Select an aggregator server to process this metrics task.
    • Long description:
      The leader aggregator is one of the two non-colluding servers that process metrics tasks. Its role is more resource-intensive than the helper's. One of the two aggregators must be run by Divvi Up, and the other must be run by a different organization. To use a self-hosted aggregator, you must first add it to your account; it will then appear in this list.
  • Helper Aggregator
    • Short description: Select an aggregator server to process this metrics task.
    • Long description:
      The helper aggregator is one of the two non-colluding servers that process metrics tasks. Its role is less resource-intensive than the leader's. One of the two aggregators must be run by Divvi Up, and the other must be run by a different organization. To use a self-hosted aggregator, you must first add it to your account; it will then appear in this list.
  • Function:
    • Short description: Determines the kind of client measurements accepted and how they are summarized.
    • Long description:
      Selects the aggregation function used by this metrics task. The following functions are supported:
      • Count: Each client measurement is either "true" or "false". The aggregate result is the number of "true" measurements.
      • Sum: Each client measurement is an integer. The aggregate result is the sum of the measurements.
      • Histogram: The aggregate result is a list of counters, and each client measurement chooses one counter to increment.
  • Measurement Range (only present if Function is "sum")
    • Short description: Selects the bit width and range of valid client measurements.
    • Long description:
      Determines the range of integers that are accepted as client measurements. Note that this only determines the maximum value of individual measurements, and not the maximum value of the aggregate result (sum of measurements). Regardless of this choice, the aggregate result wraps around at about 3.4×10^38. This parameter affects the size of client reports.
  • Histogram buckets (only present if Function is "histogram")
    • Short description: Selects the number of histogram buckets or counters.
    • Long description:
      Determines how many buckets the histogram has. Each client report can only add one to a single bucket/counter. This parameter affects the size of client reports.
  • Query Type:
    • Short description: Determines how reports are grouped into batches, and what kinds of queries the collector can make.
    • Long description:
      Time Interval: Groups measurements into batches by their client timestamp. Collectors may query for aggregate results over (non-overlapping) time intervals. Good for identifying temporal patterns in data. If client reports may be received late, well after their timestamps, then the collector is forced to choose between delaying collection requests or abandoning late reports.
      Fixed Size: Groups measurements into batches arbitrarily as they arrive. Grants more control over batch sizes, because a maximum batch size can be set. Good for cases where the report upload rate is unknown or varies widely. Temporal patterns in data may be obscured by aggregating on-time and late reports together in the same batches.
  • Minimum Batch Size:
    • Short description: Minimum number of reports per batch.
    • Long description:
      Minimum number of reports per batch. This should be set high enough that the aggregate results over a batch do not violate the application's privacy goals. This is determined by a number of factors, including the aggregation function used, the population distribution of measurements, the importance/sensitivity of the underlying data, and whether client attestation is used to prevent Sybil attacks. If differential privacy noise is added, it can simplify selection of a minimum batch size.
  • Maximum Batch Size:
    • Short description: Maximum number of reports per batch.
    • Long description:
      Maximum number of reports per batch. This is only available with the "Fixed Size" query type.
  • DAP-encoded HPKE file:
    • Short description: The collector's public key. Results will be encrypted using this key.
    • Long description:
      Upload a binary public key file in DAP "HpkeConfig" format. Do not upload the corresponding private key. You will need to use the private key when collecting aggregate results, to decrypt results from the aggregators.
  • Time precision:
    • Short description: Granularity of client report timestamps.
    • Long description:
      All client report timestamps will be rounded to the previous multiple of this duration. If the query type is Time Interval, then query time intervals must start and end on multiples of this duration as well.
  • Expiration:
    • Short description: Optional, pre-scheduled time to decommission this task.
    • Long description: If set, then reports may no longer be uploaded for this task after its expiration time.
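
To make the Function and Measurement Range descriptions above concrete, here is a minimal Python sketch of what each aggregate result means, computed over plaintext measurements. It is only an illustration: in the real system measurements are secret-shared between the two aggregators, and no server sees them in the clear. The function names are hypothetical, and wrapping modulo 2**128 is an assumed stand-in for the "about 3.4×10^38" wraparound point (the real modulus is a prime close to that value, not exactly 2^128).

```python
# Plaintext illustration only; real aggregation operates on secret shares.
# Assumption: 2**128 (~3.4e38) stands in for the actual wraparound modulus.
MODULUS = 2**128


def aggregate_count(measurements: list[bool]) -> int:
    """Count: each measurement is True or False; result is the number of Trues."""
    return sum(1 for m in measurements if m) % MODULUS


def aggregate_sum(measurements: list[int], bits: int) -> int:
    """Sum: each measurement must fit in `bits` bits (the measurement range)."""
    max_value = 2**bits - 1
    for m in measurements:
        if not 0 <= m <= max_value:
            raise ValueError(f"measurement {m} outside range 0..{max_value}")
    # The range limits individual measurements, not the aggregate result.
    return sum(measurements) % MODULUS


def aggregate_histogram(measurements: list[int], buckets: int) -> list[int]:
    """Histogram: each measurement picks one bucket and adds one to it."""
    counters = [0] * buckets
    for index in measurements:
        if not 0 <= index < buckets:
            raise ValueError(f"bucket index {index} outside 0..{buckets - 1}")
        counters[index] += 1
    return counters
```

For example, `aggregate_sum([3, 250, 7], bits=8)` returns 260 even though each measurement must fit in 8 bits (0 to 255): the measurement range bounds individual measurements, not the aggregate.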

@inahga
Contributor

inahga commented Jul 11, 2023

Nicely written.

Thoughts on max_batch_size.

Maximum number of reports per batch. This is only available with the "Fixed Size" query type.

The only thing missing is why someone would want to change this, or what factors go into choosing it. Looking at the description, it might be a good candidate for us to just choose a value for the subscriber, e.g. 1.25 * min_batch_size or something.

@branlwyd
Member

I think we should always set min_batch_size = max_batch_size, and allow users to configure batch_size only. There is no practical upside to having a range from min_batch_size to max_batch_size. (The history here is that it was thought that allowing a range would make it easier for implementations to "hit" the target range; but it's just as easy to hit a "range" of one allowed value.)
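
A minimal sketch of fixed-size batching may help illustrate this suggestion. It assumes reports are assigned to batches in arrival order, that a batch becomes collectable once it holds at least `min_batch_size` reports, and that it is closed once it reaches `max_batch_size`. The class and method names are hypothetical; this is not how any particular aggregator implements batching.

```python
from dataclasses import dataclass, field


@dataclass
class FixedSizeBatcher:
    """Hypothetical fixed-size batching: reports are grouped in arrival order."""

    min_batch_size: int
    max_batch_size: int
    closed: list[list[str]] = field(default_factory=list)
    open_batch: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not 0 < self.min_batch_size <= self.max_batch_size:
            raise ValueError("require 0 < min_batch_size <= max_batch_size")

    def add_report(self, report_id: str) -> None:
        # Assign the report to the currently open batch.
        self.open_batch.append(report_id)
        # Close the batch as soon as it reaches the maximum size.
        if len(self.open_batch) == self.max_batch_size:
            self.closed.append(self.open_batch)
            self.open_batch = []

    def ready_batches(self) -> list[list[str]]:
        """Return every batch holding at least min_batch_size reports."""
        # Closed batches always hold max_batch_size >= min_batch_size reports.
        ready = [list(batch) for batch in self.closed]
        if len(self.open_batch) >= self.min_batch_size:
            ready.append(list(self.open_batch))
        return ready
```

With `min_batch_size == max_batch_size`, the open batch is closed the moment it reaches the collectable size, so every collectable batch has exactly the configured size and a single `batch_size` setting carries the same information.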

@TanviHacks
Author

Thanks for adding the text to the Task creator!

The Time Precision long description is a little confusing. A couple questions:

All client report timestamps will be rounded to the previous multiple of this duration.
Is this saying that times will be rounded down? If you choose 1 hour, then 10:58, for example, would round to 10:00?

If the query type is Time Interval, then query time intervals must start and end on multiples of this duration as well.
I'm not sure I follow here. Do you mean that if you choose a 1 hour time precision, then your Time interval reports should be requested for times that span multiple hours (as opposed to minutes or days)?

@divergentdave
Contributor

Regarding the first question, yes, that's correct.

The timespan of time interval queries must be divisible by the time precision. Thus, the time precision effectively sets a minimum duration on the timespan of time interval queries. With a one-hour time precision setting, querying over a half-hour period would not be allowed, but querying over a 24-hour period would be allowed.
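
The two rules discussed here (rounding timestamps down, and aligning time interval queries to the precision) can be sketched in Python as follows. This assumes, as in DAP, that timestamps are measured from the Unix epoch; `round_down` and `is_valid_query` are hypothetical helper names, not part of any Divvi Up API.

```python
from datetime import datetime, timedelta, timezone

# Assumption: timestamps are counted from the Unix epoch, as in DAP.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def round_down(timestamp: datetime, precision: timedelta) -> datetime:
    """Round a client timestamp down to the previous multiple of `precision`."""
    elapsed = timestamp - EPOCH
    # timedelta // timedelta yields an int count of whole precision units.
    return EPOCH + (elapsed // precision) * precision


def is_valid_query(start: datetime, duration: timedelta, precision: timedelta) -> bool:
    """A time interval query must start and end on multiples of `precision`."""
    start_aligned = (start - EPOCH) % precision == timedelta(0)
    duration_aligned = duration % precision == timedelta(0)
    return duration > timedelta(0) and start_aligned and duration_aligned
```

With a one-hour precision, `round_down` maps 10:58 to 10:00, and a 30-minute query is rejected while a 24-hour query starting at midnight is accepted.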

I'll send a PR to flesh this paragraph out a bit.


5 participants