Update doc based on design review

wchao1115 · Jan 31, 2020 · 2f698f9
1 parent 4c005e7
Showing 1 changed file with 20 additions and 15 deletions.
diff --git a/rfcs/20200113-tf-data-service.md b/rfcs/20200113-tf-data-service.md
@@ -1,11 +1,11 @@
 # Distributed tf.data service
 
-| Status        | Proposed                                                |
+| Status        | Accepted                                                |
 | :------------ | :------------------------------------------------------ |
 | **RFC #**     | [195](https://github.com/tensorflow/community/pull/195) |
 | **Author(s)** | Andrew Audibert ([email protected]) Rohan Jain ([email protected]) |
 | **Sponsor**   | Jiri Simsa ([email protected])                          |
-| **Updated**   | 2019-01-24                                              |
+| **Updated**   | 2019-01-30                                              |
 
 ## Objective
 
@@ -143,14 +143,16 @@ here to implement datasets which produce per-replica elements, enabling
 idiomatic control flow.
 
 ```python
-def tf.data.experimental.service.distribute(address):
+def tf.data.experimental.service.distribute(address_or_resolver):
   """Marks that a dataset should be processed by the tf.data service.
 
   ds = ... # dataset to distribute
-  ds = ds.apply(tf.data.experimental.service.distribute(address))
+  ds = ds.apply(
+    tf.data.experimental.service.distribute(address_or_resolver))
 
   Args:
-    address: The address of the tf.data service master.
+    address_or_resolver: The address of the tf.data service master, or a
+      cluster resolver that can be used to determine the master address.
 
   Returns:
     A function that can be passed to `dataset.apply()`.
@@ -622,22 +624,25 @@ service. We will also provide a tutorial for using the tf.data service.
 *   How should we communicate that distributing a dataset will change the order
     in which elements are processed? If users' datasets rely on elements being
     processed in a certain order, they could face unpleasant surprises.
-    -   Current plan is to address this through documentation.
+    -   Final decision: Address this through documentation.
 *   Should we support splitting `skip`, `take`, and `scan` by having them
     operate at a per-task level (e.g. skip or take the first `N` elements within
     each task)?
-    -   Leaning towards supporting these operations at a per-task level. This is
-        consistent with how skip/take/scan behave today when using distribution
-        strategies to distribute a dataset.
+    -   Final decision: Prohibit distributing these transformations, and tell
+        users to instead use these transformations *after* applying the
+        `distribute` transformation.
 *   Is there a more user-friendly way to share iteration ids across consumers?
     Distribution strategy is well-equipped with collective ops to share the
     iteration ids, but sharing the iteration id could be a heavy burden for
     some users.
-    -   Distributing iteration ids is simple in the common case where a single
-        process builds the graph. If users are advanced enough to do distributed
-        training without distribution strategies, they will likely have a
-        different mechanism available for distributing iteration ids.
+    -   Final decision: It is a reasonable expectation for users to either use
+        distribution strategies, or distribute their own iteration ids.
+        TensorFlow will soon have public APIs for collective operations that
+        would make it easy to broadcast iteration ids.
 *   Can `service.distribute` take a `ClusterResolver` so that the master
     hostname isn't baked into the dataset definition?
-    -   We can achieve this by having the `distribute` transformation take a
-        master_address_or_resolver.
+    -   Final decision: Accept `master_address_or_resolver`, and wait to resolve
+        the master address until iteration begins. The `ClusterResolver` will be
+        stored in the Python `Dataset` object. In the future, we may want C++
+        implementations of `ClusterResolver` so that we can represent the
+        resolver within the dataset graph.
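
The last decision above (accept an address or a resolver, and wait to resolve the master address until iteration begins) can be illustrated with a minimal pure-Python sketch. This is not the tf.data service implementation. `StaticResolver`, `DistributedDataset`, and this `distribute` are hypothetical stand-ins, and the only assumption about a cluster resolver is that it exposes a `master()` method returning an address string.

```python
class StaticResolver:
    """Hypothetical stand-in for a cluster resolver; only `master()` is modeled."""

    def __init__(self, address):
        self.address = address
        self.calls = 0  # counts lookups so deferred resolution is observable

    def master(self):
        self.calls += 1
        return self.address


class DistributedDataset:
    """Toy dataset wrapper that stores the address or resolver unresolved."""

    def __init__(self, elements, address_or_resolver):
        self._elements = list(elements)
        # Keep the Python object as-is; nothing is resolved at definition time.
        self._address_or_resolver = address_or_resolver

    def _resolve_master(self):
        target = self._address_or_resolver
        if isinstance(target, str):
            return target  # a plain address needs no lookup
        return target.master()  # assumed resolver interface

    def __iter__(self):
        # Resolution happens here, once per iteration, not at construction.
        master = self._resolve_master()
        for element in self._elements:
            yield master, element


def distribute(address_or_resolver):
    """Returns a function that wraps a sequence of elements (toy `apply` target)."""

    def apply_fn(elements):
        return DistributedDataset(elements, address_or_resolver)

    return apply_fn


resolver = StaticResolver("master-a:5000")
ds = distribute(resolver)([1, 2])
lookups_before_iteration = resolver.calls  # still 0: nothing resolved yet
resolver.address = "master-b:5000"  # the master moves before iteration starts
results = list(ds)  # each element is paired with the address resolved at iteration
```

Because the resolver is consulted only inside `__iter__`, the dataset definition does not bake in the master hostname, and a master that changes between definition and iteration is still picked up.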