diff --git a/rfcs/20200902-pickle-for-keras.md b/rfcs/20200902-pickle-for-keras.md
new file mode 100644
index 000000000..e2afa4db1
--- /dev/null
+++ b/rfcs/20200902-pickle-for-keras.md
@@ -0,0 +1,292 @@
+# Support for Pickle, Python's serialization protocol
+
+| Status        | Proposed       |
+:-------------- |:---------------------------------------------------- |
+| **RFC #**     | [286](https://github.com/tensorflow/community/pull/286) |
+| **Author(s)** | Adrian Garcia Badaracco ({firstname}@{firstname}gb.com), Scott Sievert (tf-rfc@stsievert.com)  |
+| **Sponsor**   | Mihai Maruseac (mihaimaruseac@google.com)                 |
+| **Updated**   | 2020-09-02                                           |
+
+## Objective
+
+Implement support for Pickle, Python's serialization protocol within Keras.
+
+## Motivation
+
+> *Why this is a valuable problem to solve? What background information is
+> needed to show how this design addresses the problem?*
+
+The specific motivation for this RFC: we want to use Keras models in Dask-ML's
+and Ray's hyperparameter optimization. More generally, support for serialization
+with the Pickle protocol will enable:
+
+* Using Keras with other parallelization libraries like Python's
+  `multiprocessing`, Dask, Ray or IPython parallel.
+* Saving Keras models to disk with custom serialization libraries like Joblib
+  or Dill. This is common when using a Keras model as part of a Scikit-Learn
+  pipeline or with their hyperparameter searches.
+* Copying Keras models with Python's built-in `copy.deepcopy`.
+
+Supporting Pickle will enable wider usage in the Python ecosystem because
+Python's ecosystems of libraries depend strongly on the presence of protocols.
+Without these protocols, it's necessary for each library to implement a custom
+serialization method for every other library. For example, Dask Distributed has
+a custom serialization method for Keras at [distributed/protocol/keras.py].
+See "[Pickle isn't slow, it's a protocol]" for more detail (notably, this post
+focuses on having an efficient Pickle implementation for PyTorch).
+
+[distributed/protocol/keras.py]:https://github.com/dask/distributed/blob/73fa9bd1bd7dcb4ceed72cdbdc6dd4b92f887521/distributed/protocol/keras.py
+
+This request is *not* advocating for use of Pickle while saving or sharing
+Keras models. We believe the efficient, secure and stable methods in TF should
+be used for that. Instead, we are proposing to add a Pickle implementation to
+support wider usage in the Python ecosystem.
+
+[Pickle isn't slow, it's a protocol]:https://blog.dask.org/2018/07/23/protocols-pickle
+
+> *Which users are affected by the problem? Why is it a problem? What data
+> supports this? What related work exists?*
+
+Users trying to use distributed systems (e.g, Ray or Dask) with Keras are
+affected. In our experience, this is common in hyperparameter optimization.  In
+general, having Pickle support means a better experience, especially when using
+Keras with other libraries. Briefly, implementation of this RFC will make the
+following possible:
+
+* Saving a Scikit-Learn pipeline to disk if it includes a Keras model
+* Using custom parallelization like Joblib or Dask.
+
+More use cases and examples are give in "User Benefit."
+
+Related work is in [SciKeras], which brings a Scikit-Learn API
+to Keras. Pickle is relevant because Scikit-Learn requires that estimators must be able to be pickled ([source][skp]).
+As such, SciKeras has an implementation of `__reduce__`, which is also in
+[tensorflow#39609].
+  
+[dask-ml#534]:https://github.com/dask/dask-ml/issues/534
+[SO#51110834]:https://stackoverflow.com/questions/51110834/cannot-pickle-dill-a-keras-object
+[SO#54070845]:https://stackoverflow.com/questions/54070845/how-to-pickle-keras-custom-layer
+[SO#59872509]:https://stackoverflow.com/questions/59872509/how-to-export-a-model-created-from-kerasclassifier-and-gridsearchcv-using-joblib
+[SO#37984304]:https://stackoverflow.com/questions/37984304/how-to-save-a-scikit-learn-pipline-with-keras-regressor-inside-to-disk
+[SO#48295661]:https://stackoverflow.com/questions/48295661/how-to-pickle-keras-model
+[skper]:https://scikit-learn.org/stable/modules/model_persistence.html#persistence-example
+[TF#33204]:https://github.com/tensorflow/tensorflow/issues/33204
+[TF#34697]:https://github.com/tensorflow/tensorflow/issues/34697
+
+[tensorflow#39609]:https://github.com/tensorflow/tensorflow/pull/39609
+[SciKeras]:https://github.com/adriangb/scikeras
+[skp]:https://github.com/scikit-learn/scikit-learn/blob/0fb307bf39bbdacd6ed713c00724f8f871d60370/sklearn/utils/estimator_checks.py#L1523-L1524
+
+<!--
+StackOverflow questions where `Model.save` would not work:
+
+* [SO#40396042](https://stackoverflow.com/questions/40396042/how-to-save-scikit-learn-keras-model-into-a-persistence-file-pickle-hd5-json-ya)
+  
+Examples that could be resolved using `Model.save` (but the user tried pickle first):
+
+* [SO #51878627](https://stackoverflow.com/questions/51878627/pickle-keras-ann)
+-->
+
+
+## User Benefit
+
+> How will users (or other contributors) benefit from this work? What would be the headline in the release notes or blog post?
+
+One blog post headline: "Keras models can be used with the advanced
+hyperparameter optimization techniques found in Dask-ML and Ray Tune." This has
+already been mentioned in "Framework support" of [a Dask blog post][dbp]
+comparing Dask-ML's hyperparameter optimization with Ray's tune-sklearn.
+
+[dbp]:https://blog.dask.org/2020/08/06/ray-tune#framework-support
+
+Users will also benefit with easier usage; they won't run into any of these
+errors:
+
+* People try to save Scikit-Learn meta-estimators with Keras components using
+  the serialization libraries Joblib or Dill. 
+  This fails because Keras models can not be serialized without a custom
+  method. Examples include [SO#59872509], [SO#37984304] and
+  [SO#48295661], and [SO#51110834].
+* Using custom parallelization strategies requires serialization support through
+  Pickle; however, many parallelization libraries don't
+  special case Keras models (e.g, Joblib). Relevant errors are most common in hyperparameter
+  optimization with Scikit-Learn's parallelization through Joblib
+  ([TF#33204] and [TF#34697]) or parallelization through Dask ([dask-ml#534]).
+* Lack of Pickle support can complicate saving training history like in
+  (the poorly asked) [SO#54070845].
+
+ 
+This RFC would resolve these issues.
+
+## Design Proposal
+
+We propose implementing the Pickle protocol using the existing Keras
+saving functionality as a backend. For example, adding pickle support to TF Metrics
+is as simple as the following:
+
+``` python
+# tensorflow/python/keras/metrics.py
+
+@keras_export('keras.metrics.Metric')  # line 80
+@six.add_metaclass(abc.ABCMeta)
+class Metric(base_layer.Layer):
+    ...
+
+  def __reduce__(self, protocol):
+    # the deserialized/serialize functions are defined in this file
+    return deserialize, (serialize(self),)
+```
+
+This implementation adds support for the Pickle protocol, which supports serialization
+to arbitrary IO, either memory or disk. The `__reduce__` special method can return
+the string that would have been written to disk and the function to load that string into memory ([docs][reduce_docs]).
+
+[reduce_docs]:https://docs.python.org/3/library/pickle.html#object.__reduce__
+
+For `tf.keras.Model`, we can use `SaveModel` as the backend for `__reduce__`:
+
+``` python
+# tensorflow/python/keras/engine/training.py
+...
+from tesorflow.python.keras.models import load_model
+
+class Model(base_layer.Layer, version_utils.ModelVersionSelector):  # line 131
+  ...
+
+  def __reduce__(self, protocol):
+    temp_ram_location = f"ram://tmp/saving/{id(self)}"
+    self.save(temp_ram_location)
+    b = tf.io.gfile.read_folder(temp_ram_location)
+    return self._reconstruct_pickle, (np.asarray(memoryview(b)), )
+
+  @classmethod
+  def _reconstruct_pickle(cls, obj):
+    temp_ram_location = f"ram://tmp/saving/{id(obj)}"
+    tf.io.gfile.write_folder(temp_ram_location, b)
+    return load_model(temp_ram_location)
+```
+
+
+This almost exactly mirrors the PyTorch
+implementation of Pickle support in [pytorch#9184]
+as mentioned in "[Pickle isn't slow, it's a protocol]."
+In addition, small augmentations to TensorFlow's IO module will be required (as discussed in [tensorflow#39609]).
+
+By wrapping the pickled object within a Numpy array, pickling will support
+pickle protocol 5 for zero-copy pickling. This provides an immediate
+performance improvement for many use cases. 
+
+[pytorch#9184]:https://github.com/pytorch/pytorch/pull/9184
+
+### Alternatives Considered
+
+Of course, one method is to ask users to monkey-patch Keras models themselves.
+This would hold for libraries too. Clearly, this is unreasonable. Regardless,
+some libraries like Dask Distributed have already implemented custom serialization
+protocols ([distributed/protocol/keras.py]).
+
+#### Other pickle implementations
+
+The Pickle protocol supports two features:
+
+1. In-memory copying of live objects: via Python's `copy` module. This falls back to (2) below.
+2. Serialization to arbitrary IO (memory or disk): via Python's `pickle` module.
+
+This proposal seeks to take the conservative approach at least initially and
+only implement (2) above since (1) can always fall back to (2) and using only
+(2) alleviates any concerns around references to freed memory in the C++
+portions of TF and other such bugs.
+
+This said, for situations where the user is making an in-memory copy of an object and it might
+even be okay to keep around references to non-Python objects, a separate approach that optimizes
+(1) would be warranted. This RFC does not seek to address this problem. Hence
+this RFC is generally not concerned with:
+
+* Issues arising from C++ references. These cannot be kept around when
+  serializing to a binary file stream.
+* Performance of the serialization/deserialization.
+
+### Performance Implications
+
+* The performance should be the same as the underlying backend that is already
+  implemented in TF.
+* For cases where the user was going to pickle anyway, this will be faster
+  because it uses TF's methods instead of letting Python deal with it naively.
+* Tests will consist of running `new_model = pickle.loads(pickle.dumps(model))`
+  and then doing checks on `new_model`.
+
+### Dependencies
+
+* Dependencies: does this proposal add any new dependencies to TensorFlow?
+    * No
+* Dependent projects: are there other areas of TensorFlow or things that use
+  TensorFlow (TFX/pipelines, TensorBoard, etc.) that this affects?
+    * This should not affect those libraries. It will affect libraries
+      further downstream like Dask-ML and Ray Tune.
+
+### Engineering Impact
+
+* Do you expect changes to binary size / startup time / build time / test
+  times?
+    * No
+* Who will maintain this code? Is this code in its own buildable unit? Can this
+  code be tested in its own? Is visibility suitably restricted to only a small
+  API surface for others to use?
+    * This code depends on existing Keras/TF methods. This code will not break
+      presuming they are maintained (the new API surface area is very small).
+
+### Platforms and Environments
+
+* Platforms: does this work on all platforms supported by TensorFlow? If not,
+  why is that ok? Will it work on embedded/mobile? Does it impact automatic
+  code generation or mobile stripping tooling? Will it work with transformation
+  tools?
+    * Yes, as long as a Python backend is available.
+* Execution environments (Cloud services, accelerator hardware): what impact do
+  you expect and how will you confirm?
+    * We don't see any impact.
+
+### Best Practices
+
+* Does this proposal change best practices for some aspect of using/developing
+  TensorFlow? How will these changes be communicated/enforced?
+    * No
+
+### Tutorials and Examples
+
+There are plenty of examples of how this can and would be used within all of the issues above, in addition to the linked notebook
+([link again](https://colab.research.google.com/drive/14ECRN8ZQDa1McKri2dctlV_CaPkE574I?authuser=1#scrollTo=qlXDfJObNXVf)) which has
+end to end implementations and tests for all of this.
+
+### Compatibility
+
+* Does the design conform to the backwards & forwards compatibility
+  [requirements](https://www.tensorflow.org/programmers_guide/version_compat)?
+    * Yes
+
+> *How will this proposal interact with other parts of the TensorFlow Ecosystem?*
+
+* How will it work with TFLite?
+    * N/A
+* How will it work with distribution strategies?
+  * This enables use of other serialization libraries, which might enable support for other distribution strategies.
+* How will it interact with tf.function?
+    * N/A
+* Will this work on GPU/TPU?
+    * N/A
+* How will it serialize to a SavedModel?
+    * Not applicable, and almost a circular question.
+
+### User Impact
+
+* What are the user-facing changes? How will this feature be rolled out?
+
+There are no user-facing changes: this is a backend change to private methods.
+
+Rolling out only involves testing. It will not require any documentation
+changes to advertise this features: `Model.save` should still be used for users
+simply trying to save their model to disk.
+
+## Questions and Discussion Topics
+
+Seed this with open questions you require feedback on from the RFC process.
diff --git a/rfcs/20200902-pickle-for-keras/README.md b/rfcs/20200902-pickle-for-keras/README.md
deleted file mode 100644
index 847ce227e..000000000
--- a/rfcs/20200902-pickle-for-keras/README.md
+++ /dev/null
@@ -1,176 +0,0 @@
-# Title of RFC
-
-| Status        | (Proposed / Accepted / Implemented / Obsolete)       |
-:-------------- |:---------------------------------------------------- |
-| **RFC #**     | [286](https://github.com/tensorflow/community/pull/286)
-| **Author(s)** | Adrian Garcia Badaracco ({firstname}@{firstname}gb.com)
-| **Sponsor**   | Mihai Maruseac (mihaimaruseac@google.com)                 |
-| **Updated**   | 2020-09-02                                           |
-
-## Objective
-
-Implement support for Python's Pickle protocol within Keras.
-
-## Motivation
-
-The pickle protocol is used extensively
-within the Python ecosystem, including by [Dask](https://github.com/dask/dask),
-[Scikit-Learn](https://github.com/scikit-learn/scikit-learn) and several other
-popular machine learning libraries. These libraries rely on the pickle protocol and
-cannot work without it. This hinders what would otherwise be great uses of Keras.
-
-Pickle and `copy` (referring to the specific Python module, it can use
-the pickle protocol as it's backend) are also the _only_ universal way to
-save Python objects. This means that this is what most users try first.
-As evidenced by several of the StackOverflow/TensorFlow issues below, even if there are
-TensorFlow specific ways to copy things in memory or disk, users will probably
-try pickle or copy first and be confused by cryptic errors.
-
-Here are several examples
-of Keras users running into issues because pickle is not supported.
-
-GH issues where `Model.save` would not work:
-
-* [TF#34697](https://github.com/tensorflow/tensorflow/issues/34697)
-* [TF#33204](https://github.com/tensorflow/tensorflow/issues/33204)
-* TODO: issues from Dask
-
-StackOverflow questions where `Model.save` would not work:
-
-* [SO#59872509](https://stackoverflow.com/questions/59872509/how-to-export-a-model-created-from-kerasclassifier-and-gridsearchcv-using-joblib)
-* [SO#37984304](https://stackoverflow.com/questions/37984304/how-to-save-a-scikit-learn-pipline-with-keras-regressor-inside-to-disk)
-* [SO#51110834](https://stackoverflow.com/questions/51110834/cannot-pickle-dill-a-keras-object)
-* [SO#54070845](https://stackoverflow.com/questions/54070845/how-to-pickle-keras-custom-layer)
-* [SO#40396042](https://stackoverflow.com/questions/40396042/how-to-save-scikit-learn-keras-model-into-a-persistence-file-pickle-hd5-json-ya)
-* [SO#48295661](https://stackoverflow.com/questions/48295661/how-to-pickle-keras-model)
-  
-Examples that could be resolved using `Model.save` (but the user tried pickle first):
-
-* [SO #51878627](https://stackoverflow.com/questions/51878627/pickle-keras-ann)
-
-Here is an post describing how PyTorch ran into similar issues and how they resolved them: [link](https://matthewrocklin.com/blog/work/2018/07/23/protocols-pickle)
-
-## User Benefit
-
-* Lessen the learning curve for new Keras/TF users since they will be able to
-use entry points they already know.
-* Improve compatibility with libraries like Scikit-Learn and Dask.
-
-## Design Proposal
-
-The pickle protocol supports two distinct functions:
-
-1. In-memory copying of live objects: via Python's `copy` module. This falls back to (2) below.
-2. Serialization to arbitrary IO (memory or disk): via Python's `pickle` module.
-
-This proposal seeks to take the conservative approach at least initially and only
-implement (2) above since (1) can always fall back to (2) and using only (2) alleviates
-any concerns around references to freed memory in the C++ portions of TF and other such bugs.
-
-This said, for situations where the user is making an in-memory copy of an object and it might
-even be okay to keep around references to non-Python objects, a separate approach that optimizes
-(1) would be warranted. This RFC does not seek to address this problem. Hence this RFC is generally
-not concerned with:
-
-* Issues arising from C++ references. These cannot be kept around when serializing to a binary file stream.
-* Performance of the serialization/deserialization.
-
-The general proposal is to implement the pickle protocol using existing Keras saving functionality
-as a backend. For example, adding pickle/copy support to Metrics is as simple as:
-
-```python3
-class Metric:  # in tf.keras.metrics
-
-    def __reduce_ex__(self, protocol):
-        return deserialize, (serialize(metric),)  # where deserialize is tf.keras.metrics.deserialize
-```
-
-Documentation for how to use `__reduce__ex__` and other alternatives that allow implementing of
-the pickle protocol can be found [here](https://docs.python.org/3/library/pickle.html) and
-[here](https://docs.python.org/3/library/copyreg.html) (official Python docs).
-
-For more complex objects (namely `tf.keras.Model`) we can either:
-
-1. Implement a similar approach, but we would need to save the weights and config separately. See [this notebook](https://colab.research.google.com/drive/14ECRN8ZQDa1McKri2dctlV_CaPkE574I?authuser=1#scrollTo=qlXDfJObNXVf) for an example.
-2. Use `Model.save` as the backend. This would require implementing support for serializing to memory in `Model.save`, but is overall a better solution (since `Model.save` is the official way to save models in `tf.keras`)
-
-Solution (2) would look something like this (assuming `Model.save` worked with `io.BytesIO()`):
-
-```python3
-class Model:
-
-    def __reduce_ex__(self, protocol):
-        b = io.BytesIO()  # python built in library
-        self.save(b)  # i.e. tf.keras.Model.save
-        return load_model, (b.get_value()),)  # where load_model is tf.keras.models.load_model
-```
-
-Note how this exactly mirrors the aforementioned blog post regarding PyTorch ([link](https://matthewrocklin.com/blog/work/2018/07/23/protocols-pickle)).
-
-By implementing this in all of Keras' base classes, things will automatically work
-with custom metrics and subclassed models.
-
-### Alternatives Considered
-
-The only real alternative is to:
-
-1. Ask all libraries that currently use pickle to make a special case for each Keras object and figure out how each Keras object prefers to be serialized (see use of `serialize` vs `Model.save` above).
-2. Ask all users to learn the above as well.
-3. Override `__reduce_ex__` to give a user friendly warning instead of failing cryptically.
-
-### Performance Implications
-
-* The performance should be the same as the underlying backend that is already implemented in TF.
-* For cases where the user was going to pickle anyway, this will be faster because it uses TF's methods instead of letting Python deal with it naively.
-* Tests will consist of running `new_model = pickle.loads(pickle.loads(model))` and then doing checks on `new_model`.
-
-### Dependencies
-
-* Dependencies: does this proposal add any new dependencies to TensorFlow? **NO**
-* Dependent projects: are there other areas of TensorFlow or things that use TensorFlow (TFX/pipelines, TensorBoard, etc.) that this affects? **This should not affect anything**
-
-### Engineering Impact
-
-* Do you expect changes to binary size / startup time / build time / test times? **NO**
-* Who will maintain this code? Is this code in its own buildable unit? Can this code be tested in its own? Is visibility suitably restricted to only a small API surface for others to use?
-
-This code depends on existing Keras/TF methods. As long as those are maintained and don't break, this code will not break. The new API surface area is very small.
-
-### Platforms and Environments
-
-* Platforms: does this work on all platforms supported by TensorFlow? If not, why is that ok? Will it work on embedded/mobile? Does it impact automatic code generation or mobile stripping tooling? Will it work with transformation tools?
-* Execution environments (Cloud services, accelerator hardware): what impact do you expect and how will you confirm?
-
-This will work on anything that is running Python >= 2.7 (as far as I can tell, the pickle protocol has not changed since then).
-
-### Best Practices
-
-* Does this proposal change best practices for some aspect of using/developing TensorFlow? How will these changes be communicated/enforced? **NO**
-
-### Tutorials and Examples
-
-There are plenty of examples of how this can and would be used within all of the issues above, in addition to the linked notebook
-([link again](https://colab.research.google.com/drive/14ECRN8ZQDa1McKri2dctlV_CaPkE574I?authuser=1#scrollTo=qlXDfJObNXVf)) which has
-end to end implementations and tests for all of this.
-
-### Compatibility
-
-* Does the design conform to the backwards & forwards compatibility [requirements](https://www.tensorflow.org/programmers_guide/version_compat)? **YES**
-  
-* How will this proposal interact with other parts of the TensorFlow Ecosystem?
-      - How will it work with TFLite?  *N/A*
-      - How will it work with distribution strategies?  *N/A*
-      - How will it interact with tf.function?  *N/A*
-      - Will this work on GPU/TPU?  *N/A*
-      - How will it serialize to a SavedModel? *Circular question...*
-
-### User Impact
-
-* What are the user-facing changes? How will this feature be rolled out?
-
-We implement it and are done. I do not think there is any need to edit the docs to advertise this feature. We still want
-users to use `Model.save` when they are trying to save their model.
-
-## Questions and Discussion Topics
-
-Seed this with open questions you require feedback on from the RFC process.
\ No newline at end of file