---
title: on-prem-service-load-balancers
authors:
  - "@russellb"
reviewers:
  - "@markmc"
  - "@smarterclayton"
  - "@derekwaynecarr"
  - "@squeed"
  - "@aojea"
  - "@celebdor"
  - "@abhinavdahiya"
  - "@yboaron"
  - "@cybertron"
approvers:
  - "@knobunc"
  - "@danwinship"
  - "@danehans"
creation-date: 2020-05-14
last-updated: 2020-05-14
status: proposed
---

# Service Load Balancers for On Premise Infrastructure

## Release Signoff Checklist

- [ ] Enhancement is `implementable`
- [ ] Design details are appropriately documented from clear requirements
- [ ] Test plan is defined
- [ ] Graduation criteria for dev preview, tech preview, GA
- [ ] User-facing documentation is created in [openshift-docs](https://github.com/openshift/openshift-docs/)

## Summary

We do not currently support full automation for [Services of
type=LoadBalancer](https://kubernetes.io/docs/concepts/services-networking/#loadbalancer)
(Service Load Balancers, or SLBs) for OpenShift in bare metal environments.
While bare metal clusters are of primary interest, we hope to find a solution
that would apply to other on-premise environments that don't have native load
balancer capabilities available (a cluster on VMware, RHV, or OpenStack without
Octavia, for example).

## Motivation

Service Load Balancers are a common way to expose applications on a cluster. We
highly value clusters using on-premise infrastructure, but do not support this
feature in that context. We aim to fill this gap with an optional range of
capabilities.

We do have a related feature, `AutoAssignCIDRs`, where OpenShift will
automatically assign an ExternalIP for SLBs. However, routing traffic to these
IPs is still left as an exercise for the administrator. This enhancement offers
an improvement by automating the work of making these IP addresses reachable.
The existing method would remain an option for any clusters where the
administrator would like to manage routing for external IPs in a completely
custom manner. For more information on the existing `IngressIPs` feature:

* https://github.com/openshift/api/blob/master/config/v1/types_network.go#L99
* https://github.com/openshift/openshift-docs/pull/21388

### Goals

Some more context is helpful before specifying the goals of this enhancement.
When a Service has an external IP address, the OpenShift network plugin in use
must already prepare networking on Nodes to be able to receive traffic with
that IP address as a destination. The network plugin does not know or care
about how that traffic reaches the Node, because the mechanism differs
depending on which platform the cluster is running on. Once that traffic
reaches a Node, the existing Service proxy functionality handles forwarding
that traffic to a Service backend, including some degree of load balancing.

With this context in mind, the goal of this enhancement is less about load
balancing itself, and more about providing mechanisms for routing traffic to
Nodes for the external IP addresses used by Service Load Balancers.
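For illustration, this is roughly what the user-facing workflow looks like,
using the standard Kubernetes Service API (a minimal sketch; the name, ports,
and address are examples only). The user creates a Service of
`type=LoadBalancer`, an SLB implementation allocates an external IP from a
configured pool and records it in the Service status, and this enhancement is
about making that IP reachable on cluster Nodes:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: example-app
spec:
  type: LoadBalancer
  selector:
    app: example-app
  ports:
  - port: 80
    targetPort: 8080
    protocol: TCP
status:
  loadBalancer:
    ingress:
    # Populated by the Service Load Balancer implementation once an
    # external IP has been allocated from a configured pool.
    - ip: 192.0.2.10
```

Everything under `status` is filled in automatically; routing traffic destined
for `192.0.2.10` to an appropriate Node is the part this enhancement needs to
solve.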
A SLB solution must provide these high level features:

* Management of one or more pools of IP addresses to be allocated for SLBs.
* High Availability (HA) management of these IP addresses once allocated. We
  must be able to fail over addresses in less than 5 seconds for an unplanned
  Node outage. We must be able to perform graceful failover without downtime
  for a planned outage, such as during upgrades.
* The solution must provide automation for making IP addresses available on
  the correct Node(s). It must support a scalable L3 method for doing this
  (likely BGP), but should also be usable in smaller, simpler environments
  using L2 protocols. Tradeoffs include:
  * Layer 2 (gratuitous ARP for IPv4, NDP for IPv6) - good for compatibility
    with a wide range of environments, but limiting for larger clusters. All
    traffic for a single Service Load Balancer IP address must go through one
    Node.
  * Layer 3 (BGP) - good for integration with the networks of larger clusters
    and opens up the possibility of a greater degree of load balancing, using
    ECMP to send traffic to multiple Nodes for a single Service Load Balancer.
* Suitable for large scale clusters (target up to 2000 nodes).
* Must be compatible with at least the following cluster network types:
  [OpenShift-SDN](https://github.com/openshift/sdn) and
  [OVN-Kubernetes](https://github.com/ovn-org/ovn-kubernetes).

### Non-Goals

* We can also support this functionality through the use of partner add-ons,
  but discussion of those solutions is out of scope for this document.

## Proposal

Adopt [MetalLB](https://metallb.universe.tf/) as an out-of-the-box solution for
most on-premise SLB use cases.

MetalLB is commonly referenced when people discuss service load balancers for
bare metal. The [concepts page](https://metallb.universe.tf/concepts/) gives a
good overview of how it works. It manages pools of IP addresses to allocate
for SLBs. Once an IP is allocated to a SLB, it is assigned to a Node and the
location of that IP address must be announced externally. It has two modes to
announce IPs: [layer 2](https://metallb.universe.tf/concepts/layer2/) (ARP for
IPv4, NDP for IPv6) or [BGP](https://metallb.universe.tf/concepts/bgp/). The
layer 2 mode is sufficient for smaller scale clusters, while the BGP mode can
work at much larger scale.

While the BGP option is attractive for scaling reasons, it is also more
complicated and will not work in all environments. It will not work in an
environment that does not allow BGP advertisements from the cluster Nodes. If
a cluster uses a Network addon that also makes use of BGP, MetalLB integration
will be more challenging. For example, see the [MetalLB page about Calico
support](https://metallb.universe.tf/configuration/calico/).

The layer 2 mode has the advantage of working in more environments. We could
also consider a MetalLB enhancement that makes it understand different L2
domains and manage different IP address pools for each domain where SLBs may
reside.
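As an illustration of how little configuration the layer 2 mode requires, the
following sketch is based on MetalLB's current upstream `ConfigMap` format (the
namespace, pool name, and address range are examples only; as discussed in the
Design Details section, this `ConfigMap` is not a stable API and would
eventually be hidden behind an operator):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    address-pools:
    - name: default
      protocol: layer2
      addresses:
      # Range reserved for Service Load Balancers; it must not be
      # used elsewhere on the local network.
      - 192.168.10.240-192.168.10.250
```

The only cluster-side input is the address range itself, which matches the
expectation in Story 1 below that no extra network infrastructure
configuration is required.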
### How Load Balancing Works with MetalLB

As mentioned in the Goals section, MetalLB does not have to implement load
balancing itself. It only ensures that load balancer IP addresses are reachable
on appropriate Nodes. The way a cluster uses MetalLB does have an impact on how
load balancing works, though.

When the Layer 2 mode is in use, all traffic for a single external IP address
must go through a single Node in the cluster. MetalLB is responsible for
choosing which Node this should be. From that Node, the Service proxy will
distribute load across the Endpoints for that Service. This provides a degree
of load balancing, as long as the Service traffic does not exceed what can go
through a single Node.

The BGP mode of MetalLB offers some improved capabilities. It is possible for
the router(s) for the cluster to send traffic for a single external IP address
to multiple Nodes. This removes the single Node bottleneck. The number of
Nodes which can be used as targets for the traffic depends on the configuration
of a given Service. There is a field on Services called
[`externalTrafficPolicy` that can be `cluster` or
`local`](https://kubernetes.io/docs/tasks/access-application-cluster/create-external-load-balancer/#preserving-the-client-source-ip).

* `local` -- In this mode, the pod expects to receive traffic with the original
  source IP address still intact. To achieve this, the traffic must go directly
  to a Node where one of the Endpoints for that Service is running. Only those
  Nodes advertise the Service IP in this case.

* `cluster` -- In this mode, traffic may arrive on any Node and will be
  redirected to another Node if necessary to reach a Service Endpoint. The
  source IP address will be changed to the Node's IP to ensure traffic returns
  via the same path it arrived on. The Service IP is advertised for all Nodes
  for these Services.

### User Stories

#### Match Cloud Load Balancer Functionality Where Possible

For each story below that describes an environment we intend to support, we aim
to provide functionality that matches what you would see with a cluster on a
cloud. This includes making Service load balancers accessible both inside and
outside of the cluster, verifiable using existing e2e tests that make use of
service load balancers.

#### Story 1 - Easy Use with a Small Cluster

As an administrator of a small cluster that resides entirely on a single layer
2 domain, I would like to configure one or more ranges of IP addresses from my
network that the cluster is free to use for Service Load Balancers. I do not
want to do any extra configuration in my network infrastructure beyond just
ensuring that the configured ranges of addresses are not used elsewhere.

MetalLB can do this today.

#### Story 2 - BGP Integration

As an administrator of a larger cluster with Nodes that reside on multiple L2
network segments, I would like to configure one or more ranges of IP addresses
from my network that the cluster is free to use for Service Load Balancers. I
would like my cluster to peer with my BGP infrastructure to advertise the
current location of IP addresses allocated to Service Load Balancers.

MetalLB can do this today.
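To make the BGP story more concrete, the following sketch uses the same
upstream `ConfigMap` format shown earlier; the peer address, ASNs, and address
range are illustrative assumptions, not recommendations:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    peers:
    # One entry per router that the cluster Nodes should peer with.
    - peer-address: 10.0.0.1
      peer-asn: 64501
      my-asn: 64500
    address-pools:
    - name: default
      protocol: bgp
      addresses:
      - 198.51.100.0/24
```

With this configuration, the `speaker` instances advertise each allocated load
balancer IP to the configured peer(s), and the routers decide how traffic is
spread across Nodes (for example, via ECMP).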
#### Story 3 - Larger Clusters without BGP

As an administrator of a larger cluster with Nodes that reside on multiple L2
network segments, I would like to configure one or more ranges of IP addresses
from my network that the cluster is free to use for Service Load Balancers. I
do not have BGP infrastructure available, or I am not willing to have my
cluster peer with my BGP infrastructure. I would like to configure awareness of
which subsets of my Nodes can use which pools of IP addresses, since not every
Node has physical connectivity to the same L2 networks.

Note that MetalLB does not offer this today. There has been some discussion
about related functionality:

* https://github.com/metallb/metallb/issues/605
* https://github.com/metallb/metallb/pull/502

### Implementation Details/Notes/Constraints

#### Upstream Engagement

The first step of implementation is to invest in the upstream project. We
should have one or more engineers engage with the project to handle issues,
review pull requests, and contribute bug fixes or enhancements. As part of
this process we should continue our technical due diligence with testing and
reviewing code to increase our confidence in choosing this solution.

One area where we could contribute immediately is setting up upstream CI. The
project does not appear to run any CI today.

[kind](https://github.com/kubernetes-sigs/kind/) is a good basis for CI of
community kubernetes-ecosystem projects. It is usable with the built-in free
GitHub CI support rather than requiring someone to pay for test infrastructure
elsewhere. It would even allow testing both the L2 and L3 modes. It does not
matter what protocols the actual underlying network supports if you are doing
all of your testing in a virtual network built on top of it.

#### Operator

This is the first of two alternatives for how we might integrate MetalLB in
OpenShift.

We must also create an operator for MetalLB. We should develop an operator
that is generally useful to the MetalLB community. We should also have an
OpenShift version of this operator for our use.

It is assumed that the MetalLB operator would be managed by OLM as an optional
additional component to be installed on on-premise clusters. However, in the
[ROADMAP.md
document](https://github.com/openshift/enhancements/blob/master/ROADMAP.md),
there is an item to "Front the API servers and other master services with
service load balancers". If this functionality is required at install time,
the details on management of this operator may be revisited.

There is a start of a
[metallb-operator](https://github.com/cybertron/metallb-operator) available and
a [video demo](https://www.youtube.com/watch?v=WgOZno0D7nw).
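If the OLM-managed approach above is taken, installation might look roughly
like the following `Subscription` (a sketch only; the package name, channel,
and catalog source are assumptions, since no such operator has been published
yet):

```yaml
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: metallb-operator
  namespace: metallb-system
spec:
  # Hypothetical package name and channel; the real values would be
  # decided when the operator is published.
  name: metallb-operator
  channel: alpha
  source: community-operators
  sourceNamespace: openshift-marketplace
```

An `OperatorGroup` targeting the namespace would also be required; details like
this belong in the more detailed integration proposal mentioned below.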
#### Alternative Integration: Cloud Controller Manager

An alternative integration approach would be via a cloud controller manager
(CCM). An example of this is the [packet.net
CCM](https://github.com/packethost/packet-ccm), which ensures MetalLB is
deployed and also configures it properly to work in packet.net's BGP
environment.

These integration options must be explored in more detail as part of a more
detailed integration proposal.

### Risks and Mitigations

#### Maturity and API Stability

While MetalLB appears to be [used in production by
some](https://github.com/metallb/metallb/issues/5), the project itself claims
it is in [beta](https://metallb.universe.tf/concepts/maturity/) and that its
users are early adopters. We will mitigate this risk through our own technical
due diligence: reviewing and contributing to the code and via extensive
testing.

Given the pre-1.0 beta state of the project, we must pay particularly close
attention to any interfaces that need to be stabilized before we can ship
MetalLB. We want to get ahead of potential future upgrade challenges as soon
as possible.

#### Size of the Test Matrix

MetalLB includes two major modes of operation: layer 2 and BGP. Both have
strengths and weaknesses. Supporting multiple modes also means an increase in
our test matrix. If this proves to be a challenge, we should consider a phased
roll-out where we start with only the layer 2 mode (simpler, works in more
environments) and roll out BGP support at a later stage.

#### Security

Like any network facing application, MetalLB should be reviewed for any
security concerns. This must be part of our ongoing technical due diligence.
So far, the following areas should receive a close look:

* The [memberlist](https://github.com/hashicorp/memberlist) protocol and
  implementation, used by MetalLB's layer 2 mode for cluster membership and
  fast Node failure detection.
* MetalLB's [custom implementation of
  BGP](https://github.com/metallb/metallb/tree/main/internal/bgp)
* MetalLB's
  [implementation](https://github.com/metallb/metallb/tree/main/internal/layer2)
  of [ARP](https://github.com/mdlayher/arp) and
  [NDP](https://github.com/mdlayher/ndp) for its layer 2 mode.

#### Logging, Debugging, Visibility

MetalLB has fairly limited debugging capabilities at this stage. Events are
created for Services, which provide some information. Otherwise, you must read
the logs of the running components and hope to find some hints about what may
be going on.

Debugging is often a big challenge for networking components. We should invest
early in enhancements to make understanding and debugging the behavior of
MetalLB as easy as possible. This can be mitigated with a combination of good
documentation and improved tooling.

## Design Details

### Test Plan

Separate test plans are required for the layer 2 and BGP modes of MetalLB.

Testing MetalLB's layer 2 mode will work with our existing `e2e-metal-ipi` job.
In that CI job, we have full control of the networks used by the installed
cluster, so we can allocate a range of IP addresses for use by MetalLB. More
investigation is needed, but it is likely that we cannot test this on our
`e2e-metal` UPI-based jobs because we rely on the network provided by
packet.net between the cluster hosts.

Testing of the BGP mode is more complicated. It will require setting up a BGP
network environment for the cluster nodes to peer with. We don't have anything
like this today, so it will take some work. The upstream MetalLB project needs
this as well. It currently lacks any automated testing of the BGP integration.

### Upgrade / Downgrade Strategy

The operator will include any required logic to handle upgrades or downgrades
and changes between versions of MetalLB.

MetalLB is currently configured via a `ConfigMap` that is not a stable API. We
will start by building an operator that provides a stable API for
configuration, and the `ConfigMap` will become an internal implementation
detail fully owned by the operator.

We must also mitigate these risks through engagement and contributions to the
upstream community to help make sure that changes made to the software and its
configuration interfaces can be managed through an upgrade or downgrade
process.

### Version Skew Strategy

MetalLB has two major components: a single `controller` pod and a `speaker` pod
that runs as a `DaemonSet`.

In layer 2 mode, the `speaker` needs to run on every `Node`.

In BGP mode, there is more flexibility. The behavior depends on the
[ExternalTrafficPolicy](https://kubernetes.io/docs/tasks/access-application-cluster/create-external-load-balancer/#preserving-the-client-source-ip)
type on the service load balancers.

* If the type is `cluster`, then `speaker` can run on any subset of Nodes that
  you want peering with BGP routers. All of those Nodes will advertise that
  they can be used to reach all of the Service load balancer IPs.

* If the type is `local`, then `speaker` must run on every node where an
  Endpoint may exist locally. Each node will only advertise reachability for
  the load balancer IPs that map to a local Endpoint. Put another way,
  `speaker` must run on every node where workloads behind a service load
  balancer with an `ExternalTrafficPolicy` of `local` may run.
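For reference, the `local` case is expressed on the Service itself through the
standard `externalTrafficPolicy` field, which the Kubernetes API spells
`Local`/`Cluster`; the name and ports below are illustrative:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: example-app
spec:
  type: LoadBalancer
  # Traffic is only sent to Nodes that host a ready Endpoint of this
  # Service, preserving the client source IP address.
  externalTrafficPolicy: Local
  selector:
    app: example-app
  ports:
  - port: 443
    targetPort: 8443
```

Only Nodes with a ready Endpoint for this Service would advertise its load
balancer IP, so `speaker` must be scheduled on all such Nodes.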
The primary version skew concern would be when there are `speaker` instances
running from different versions of MetalLB. For example, in the layer 2 mode,
the `speaker` implementation includes an algorithm for it to independently
determine whether it should be the leader, or announcer, for a given `Service`.
A change to this algorithm in a new version could cause more than one `speaker`
to think it owns a `Service`.

These version skew risks must be mitigated through upstream community
engagement and contributions, as well as informed management of upgrades in the
MetalLB operator.

## Implementation History

* (May, 2020) - Technical due diligence and upstream engagement beginning

## Drawbacks

TBD

## Alternatives

### Custom Solution using Keepalived

[Bare Metal IPI Networking
Infrastructure](https://github.com/openshift/installer/blob/master/docs/design/baremetal/networking-infrastructure.md)
is a document in the OpenShift Installer repository that discusses some of the
networking integration done for the bare metal IPI platform.

Bare Metal IPI clusters include keepalived + haproxy running on OpenShift
masters to manage a Virtual IP (VIP) for the API and to load balance API
requests. This has been reused for other on-premise environments (VMware,
OpenStack, RHV). It is implemented by having the machine-config-operator (MCO)
lay down static pod manifests. See the document linked above for more details.

One keepalived-based option would be to build on and extend this existing
integration. Configuration and management of a pool of IP addresses for SLBs
would be new code.

Another keepalived-based starting point is the
[keepalived-operator](https://github.com/redhat-cop/keepalived-operator) which
is discussed in this [blog
post](https://www.openshift.com/blog/self-hosted-load-balancer-for-openshift-an-operator-based-approach).

Keepalived only supports L2 advertisement of IP address location (ARP / NDP).
To support larger scale clusters, we must do one of the following:

* Require all SLBs to be hosted on Nodes within a single L2 domain within the
  cluster.
* Make our SLB controller smart enough to understand different IP address
  pools, their associated L2 domains, and which Nodes are on which L2 domain.
* Extend keepalived (either directly or via some integration) to support an L3
  based address location advertisement (likely BGP).

Something based on keepalived is probably our simplest solution. However,
downsides include:

* This would be entirely built by us. It's possible we could build some
  community usage around a simple solution like this, but that would take time.
  Given more featureful alternatives that already exist, I wouldn't expect much
  traction.
* There is not a lot of opportunity for future functionality growth here unless
  we start swapping out the pieces (keepalived and/or haproxy), which would
  also sacrifice some of the simplicity.
* Keepalived uses the VRRP protocol, and it would be nice to move away from
  this. VRRP IDs are just 0-255, and the keepalived + haproxy integration for
  bare metal IPI generates VRRP IDs based on the cluster name. Even with
  different names, it's possible to have a collision, and that can cause
  problems in lab environments with a lot of test clusters on shared networks.
  VRRP uses multicast by default, which is not allowed in all environments,
  though it's also possible to configure keepalived to use unicast instead.

### kube-vip

* [Web site](https://kube-vip.io/)
* [Kube-vip docs for SLBs](https://kube-vip.io/kubernetes/)
* [Code](https://github.com/plunder-app/kube-vip)

I came across `kube-vip` when the author shared it in the
`#cluster-api-provider` channel on the Kubernetes Slack. It's new and likely
not mature, but some of the implementation is clever.

Instead of using VRRP to provide IP address HA, it uses Raft (from
[hashicorp/raft](https://github.com/hashicorp/raft)). That would help avoid
potential VRRP ID collisions between multiple clusters.

Kube-vip implements its own custom load balancer, which is concerning from a
security, feature, and performance perspective.

`kube-vip` uses a couple of other supporting components:

* [starboard](https://github.com/plunder-app/starboard) - a DaemonSet that
  manages iptables rules based on the current IP address location
* [plndr-cloud-provider](https://github.com/plunder-app/plndr-cloud-provider) -
  a Kubernetes cloud provider

Kube-vip only supports layer 2 based address advertisement, and it doesn't look
like it supports IPv6 yet.

Despite the project's young age, someone has already [integrated it with
OpenShift 3.11](https://github.com/megian/openshift-kube-vip-ansible).

Some of the key downsides to this option:

* Depends on yet another Raft implementation --
  https://github.com/hashicorp/raft
* New, developed as a hobby project by one person, likely PoC level maturity
* Lacking any layer 3 based address advertisement options

### OVN-Kubernetes Native Solution

A primary downside of this approach is that it is specific to OVN-Kubernetes,
where ideally we'd use something a bit more reusable. Since OVN has much of
the required functionality built-in, it's at least worth considering.

OVN has a native load balancing implementation which OVN-Kubernetes uses to
implement Services within a cluster. OVN also includes L3 HA support, where
the IP address for a SLB would automatically fail over to another Node if one
Node fails. OVN-Kubernetes could be expanded to support SLBs using these
features.

OVN only supports L2 based (ARP / NDP) address location advertisement. To
address larger scale clusters, we would have to do one of the following:

* Require all SLBs to be hosted on Nodes within a single L2 domain within the
  cluster.
  * This is very limiting for scale, so it's either only applicable to
    smaller clusters, or only a subset of the cluster can host SLB IP
    addresses.
* Make our SLB controller smart enough to understand different IP address
  pools, their associated L2 domains, and which Nodes are on which L2 domain.
  * This helps scale, but increases the complexity of our implementation.
* Extend OVN / OVN-Kubernetes (either directly or via some integration) to
  support an L3 based address location advertisement (likely BGP).
  * This doesn't work for all environments, but the use of BGP is common and
    understood in the Kubernetes ecosystem.

## Infrastructure Needed

As noted in the `Test Plan` section of this document, the existing
`e2e-metal-ipi` job is a sufficient environment to run e2e tests with MetalLB
enabled in its layer 2 mode. More work is needed to design a test environment
to test the BGP mode. That work has not been done and may present new
requirements for test infrastructure.

## References

* [MetalLB web page](https://metallb.universe.tf/)
* [MetalLB on GitHub](https://github.com/metallb/metallb/)

Upstream issues that have come up in enhancement discussion:

* [metallb/metallb#168](https://github.com/metallb/metallb/issues/168) -
  Discussing using MetalLB to front the Kubernetes API server
* [metallb/metallb#621](https://github.com/metallb/metallb/issues/621) -
  Discussing a graceful, no-downtime failover method for layer 2 mode