[Extensibility] CoAP CoRE #203

jiayihu · 2021-01-12T22:31:57Z

What this PR does / why we need it:

This is a WIP PR which adds support to the CoAP CoRE protocol. It has at least reachable a POC point that I feel worth sharing. I've added more information in the proposal doc. Sorry for any grammar error or type 😄 I'm going to check with Grammarly before merging.

Special notes for your reviewer:

If applicable:

this PR contains documentation
this PR contains unit tests
added code adheres to standard Rust formatting (cargo fmt)
code builds properly (cargo build)
code is free of common mistakes (cargo clippy)
all Akri tests succeed (cargo test)
inline documentation builds (cargo doc)
version has been updated appropriately (./version.sh)

agent/Cargo.toml

agent/src/protocols/coap_core/discovery_handler.rs

agent/src/protocols/coap_core/discovery_impl.rs

deployment/helm/values.yaml

samples/brokers/coap-core-broker/Dockerfile.coap-core

kate-goldenring · 2021-01-13T14:59:36Z

docs/proposals/coap-core.md

+
+- Is there a better way to store the discovered resources as Configuration in the cluster?
+
+I think the current implementation would need a controller to accept queries about available resources and return the broker which can communicate to the device. The device is listed as generic `akri.sh/coap-core-021dd7` resource on the node, which is pretty much useless. A better label would be `akri.sh/oic.r.temperature-021dd7`, that is the discovered resource. This would allow using the K8s controller for scheduling pods which need the resource.


Currently, the Agent makes a device plugin for each DiscoveryResult, naming that resource -. That is the the resource that can be requested by the Akri Controller (when deploying brokers) or the K8s scheduler (if someone personally schedules Pods requesting those resources). The exposed resource representing the DiscoveryResult

COAP_IP=192.168.1.126 oic.r.temperature=/sensors/temp oic.r.light.brightness=/sensors/light COAP_RESOURCE_TYPES=oic.r.temperature,oic.r.light.brightness

is a brightness and temperature sensor, a joint resource of sorts, which is an interesting point, and one that both OPC UA and Zeroconf probably experience. In all three protocols, a service is being exposed which could represent multiple underlying resources/capabilities. I think there could definitely be cases when Instances of a Configuration purposefully represent multiple resource types and all brokers use both. Otherwise, currently, if someone wanted to schedule to only temperature sensors, they would need to specify filtering for only temperature sensor resource types in Configuration -- are you thinking of adding support for filter in/out devices based on resource type, similar to Zeroconf? -- and then name of the resource could be changed to mimic an experience you mention by declaring --set coapcore.name=oic.r.temperature. There is the fact that the device with akri.sh/oic.r.temperature-021dd7 resource also has a brightness sensor.

It seems to me a good convention to have each coap Configuration represent a resource type. Or for an operator to know that the "resource" exposed as an instance is joint. Maybe this can be aided by encouraging folks to rename the configuration with a descriptive name, when installing Akri.

are you thinking of adding support for filter in/out devices based on resource type, similar to Zeroconf

I can surely do it if there some use cases for it 🤔

About the rest of the point, I think that filtering to allow only a low fixed numbers of resource types could be limiting. If I were a platform like Azure, who wants to offer edge clusters with IoT devices as resources, I'd prefer supporting a large diverisity of resource types because that's the nature of IoT.
I think filtering makes sense if the use case is a company specialized in one kind of devices.

One of the ideas of my thesis, is building a platform which can offer local devices as services in edge computing. Just like today you can provision databases and other software as service in the cloud. But even in a small areas like a neighbourhood, you could have a large number of resource types. Also by filtering, that would mean that you would have two separate Configurations for temperature and brightness and thus two brokers. Having two Configurations doesn't look bad per se, whereas two brokers for each resource in each device feels more wrong. That number brokers x types x devices would be explosive I believe.

I think the main question I have is whether each resource type should be returned as a separate DiscoveryResult here and therefore gets a separate Instance (and could also get a separate broker). Currently, we have one broker deployment strategy of one broker per instance, but this could change in the future to where someone could request a core_coap temperature and light sensor per broker, but this does get weird, since that could result in getting a temperature sensor from one endpoint and a light sensor from another instead of the temperature and light sensors from the same endpoint. It would also be hard to manage device usage if multiple resources of the device are being exposed as instances.

I am not familiar with the use cases for coap_core, so I differ to you. The more we discuss it, I am falling in line with your thoughts of each service being a DiscoveryResult rather than breaking it down further by resource type. Thanks for being patient with me and helping me type through my thoughts.

As for filtering, I still think it could be useful. People might only want to discover services that expose a certain resource type, since their brokers may specifically be interested in that resource type. Say you have a broker that grabs the temperature reading from temperature sensors and serves it to some anomaly detection app, you'd want to ensure that the service/device it is grabbing the temperature from does in fact have a temperature resource. Then, in the Configuration, you could specify the filter and discover would simply only return devices that have that resource type.

Also, your thesis sounds cool. Do you intend to use existing tech, such as CoAP to expose the local devices as services? Do you intend to just use CoAP or discover the devices via multiple protocols?

someone could request a core_coap temperature and light sensor per broker, but this does get weird, since that could result in getting a temperature sensor from one endpoint and a light sensor from another instead of the temperature and light sensors from the same endpoint

Actually I don't think that becomes necessary weird. It would allow to virtualize the IoT resources. If you are asking for temperature and brightness, one could not care from which device it comes from. That would allow some virtualization benefits. For instance, we could use the brightness from another device if it fails in the current one. Since we're talking about edge, one may probably want to restrict the location of the resources though (e.g. "temperature and brightness in this room"). And if you restrict enough, the limit case would having the two resources from the same device. It's just an idea, maybe silly.

What I mean is that I think it depends on the use case. In other cases, I agree it does get weirds, e.g. if I need both camera recording and infrared recording from the same device.

The interesting point that has come to my mind while typing is that maybe we should discuss about what virtualization means for akri in the context of edge devices. Currently, akri allows to add device instances as resources and it's thus a virtualization of device instances. However, virtualizing resources instead of devices seems to be more natural to me if we think about the cloud.

For instance, let's be guided by how we expect users to use Akri instances. An application would never request a CoAP CoRE device, because that doesn't make sense. It would more likely happier to request a resource type, like temperature. How do we implement that with Akri? If we have an instance/broker per device, then we need to write a new controller which scans the list of device instances to find a result with the temperature sensor as metadata. This doesn't feel very natural to me.

Is an instance per resource better? Maybe. Is also a broker per resource instance better? I don't think so because it would be an explosive overhead.

I think "the main question I have is whether each resource type should be returned as a separate DiscoveryResult" can be reduced further to what virtualization means for Akri and, even before that, what's the expected user experience for application developers. How will developers use CoRE resources in your opinion?

Do these questions make sense for the other protocols? Unfortunately I have no experience with the other protocols, although I saw OPC UA mentioned in some papers. My feeling is that maybe these questions haven't occurred before because many protocols assume what the underlying resource type is. E.g. an onvif instance is the same as a camera instance and thus requesting an onvif instance to Akri is fine. But if we think about it, the discovery protocol should be an implementation detail. If you need a camera recording, the cluster should be able to provision it independently from the protocol. If I have an OVIF camera, a udev camera and a CoAP CoRE camera, the three of them should be usable by the application. And this is also why my CoAP broker acts as a proxy converting between HTTP to CoAP and viceversa. Any application can request the CoAP resources without knowing about CoAP.

Also, your thesis sounds cool. Do you intend to use existing tech, such as CoAP to expose the local devices as services? Do you intend to just use CoAP or discover the devices via multiple protocols?

Thanks :) Yes, I intend to expose devices as services and I thought that it's also the goal of Akri. I read a nice paper the other day, which explains the same vision called "Sensing and Actuation as a Service (SAaaS)": Combining Cloud and sensors in a smart city environment. It was written in 2012 when Cloud was becoming popular. I have a more modern vision which merges cloud, fog, edge computing and IoT but the main ideas for IoT are the same. If you have some time, give it a read. I think it's worth it, but I'm biased of course 😄

I think you are right that these questions are more unique to CoRE.

How will developers use CoRE resources in your opinion?

I am certainly not well versed in CoRE. What are some of the main use cases you've seen?

If you are asking for temperature and brightness, one could not care from which device it comes from. That would allow some virtualization benefits. For instance, we could use the brightness from another device if it fails in the current one.

You make a good point here about trade offs.

If we have an instance/broker per device, then we need to write a new controller which scans the list of device instances to find a result with the temperature sensor as metadata.

This could be an issue even if we have instance per resource. For example, say the Configuration was to find and expose all brightness and temperature resources (maybe a filter of brightness || temperature). Instances would be created for each brightness and each temperature resource, but the name of those instances would be the same: -digest. So how would Pods request a specific resource? There would need to be a convention of creating separate Configurations for each resource type.

Maybe we should make the way the Agent names Instances more customizable, such that every instance of a Configuration does not have the same name (besides a digest id). The device plugin name == instance name, so if the instance name is more specific then a resource could be requested with a more specific name. Maybe in the properties map of a DiscoveryResult, there could be a "INSTANCE_NAME" key the Agent could check for. If it exists, the Agent names the instance with the value. Otherwise it gets the default name.

I am certainly not well versed in CoRE. What are some of the main use cases you've seen?

In practise not very much since most works in embedded are close-sourced. But the literature use cases are of any IoT use case since the purpose of CoAP is to be a low-overhead, bandwidth and energy communication protocol yet allowing meaningful RESTful communication. E.g. RFID sensors and healthcare devices are two quite different use-cases I've read about.

One important feature of CoRE is being thought for M2M communication (compared e.g. to udev). That's why it's important for me to support provisioning by resource type, because it requires the least human works. I believe that keeping in mind M2M communication is fundamental when developing for IoT, but it's just my opinion.

In my use case, the cluster must be able to automatically discover resources, assign them to applications or handle availability in case of failure. The applications themselves must be able to work in any cluster with heterogeneous resources so they can only have expectations about the available resource types, nothing about a specific instance. Ideally people develop IoT applications locally and then deploy on a huge variety of edge clusters where Akri instances are running. That will truly bring the cloud to the edge.

If you think about it, in the current industry, a lot of manual repetitive work is done when dealing with IoT. For instance, flashing each embedded device is a nightmare and that's why deployment is always a pain. In the cloud, we can nowadays just push an image and it's done.

I hope this makes sense. I'm open to any criticism since all of you work on Azure I guess and know better than me.

Instances would be created for each brightness and each temperature resource, but the name of those instances would be the same: -digest. So how would Pods request a specific resource? There would need to be a convention of creating separate Configurations for each resource type.

Did you mean that the name of those instance would be the same despite being of two different types? And thus how would a Pod request a specific resource type. Just clarifying since it's quite easy to get confused with all these similar terms.

There would need to be a convention of creating separate Configurations for each resource type.

Yup, I thought a bit about it in #198 (comment). The issue is the potentially huge number of resource types. But maybe in the end it's the best compromise if we suppose that a cluster owner will limit the resource types to a restricted small group.

So far the choices for Configuration are: 1 Configuration x discovery protocol, 1 Configuration x device, 1 Configuration x resource, 1 Configuration x resource type.

Maybe we should make the way the Agent names Instances more customizable, such that every instance of a Configuration does not have the same name (besides a digest id).

I think I understood what you mean, but I'll ask you to provide an example just to be sure 😅

The device plugin name == instance name, so if the instance name is more specific then a resource could be requested with a more specific name.

That depends on what you mean with "more specific name". Because of the aforementioned M2M goal, a Pod should always just request by resource type like IANA oic.r.temperature and never written to need a specific instance like temperature-digest IMO.

Thanks for sharing the use cases.

Did you mean that the name of those instance would be the same despite being of two different types? And thus how would a Pod request a specific resource type.

Yup.

Maybe we should make the way the Agent names Instances more customizable, such that every instance of a Configuration does not have the same name (besides a digest id).

A DiscoveryHandler (or a call to discover) could return the name that instance should get. So if each resource type is a discovery result, oic.r.temperature could be the returned instance name. That would still have a digest put after it, so ...

Because of the aforementioned M2M goal, a Pod should always just request by resource type like IANA oic.r.temperature and never written to need a specific instance like temperature-digest IMO.

With Akri, we definitely want to support requesting resources on the Configuration level. so you could request udev-video instead of udev-video-digest. This brings up an interesting point though: if it isnt 1 Configuration x resource type, that solution would not solve your need to request oic.r.temperature

If you think about it, in the current industry, a lot of manual repetitive work is done when dealing with IoT. For instance, flashing each embedded device is a nightmare and that's why deployment is always a pain. In the cloud, we can nowadays just push an image and it's done.

Definitely agree that there needs to be better device management and hopefully Akri could provide a path to that

This brings up an interesting point though: if it isnt 1 Configuration x resource type, that solution would not solve your need to request oic.r.temperature

Exactly, this is the reason I started to question about the current 1 Configuration x discovery protocol and the consequent additional controller which looks for instances with requested metadata if things are kept as they are.

docs/proposals/coap-core.md

kate-goldenring · 2021-01-13T15:07:08Z

deployment/helm/crds/akri-configuration-crd.yaml

+                      type: object
+                      properties:
+                        ipAddresses: # {{DeviceIPs}}
+                          type: array


Maybe add some filtering options so people can specify that they only want to discover devices that support certain resource types -- say "I only want to find printers".

I've just added support to query filtering https://tools.ietf.org/html/rfc6690#section-4.1. This will allow to filter discovered resources by setting:

queryFilter: name: rt value: oic.r.temperature

From my understanding of the standard, only 1 filter is allowed and the value itself is extremely limited. That's the rational for having a filter as fixed struct instead of list of filters. Devices are also not required to support filtering because of the constrained environment, so some filtering is done by the agent.

Where in the Configuration would people pass their requested filters? Instead of using queryFilter and having to handle the case that devices might not support it, why not just filter the results in the DiscoveryHandler? So if the DiscoveryHandler discovers 3 devices [device1: temperature, brightness, and moisture sensors/resources; device2: moisture; device3: brightness], and the configuration requests moisture resources, then the DiscoveryHandler would filter out device3 and return the other 2.

I'm currently passing queryFilters in the properties:

properties: multicast: true multicastIpAddress: 224.0.1.187 staticIpAddresses: [] queryFilter: name: rt value: oic.r.temperature discoveryTimeoutSeconds: 5

Instead of using queryFilter and having to handle the case that devices might not support it, why not just filter the results in the DiscoveryHandler

I'm already filtering the results nevertheless in the DiscoveryHandler, exactly because devices are not required to support filtering :) However, I preferred to support only the standard query filtering instead of some custom Akri filtering. This limits filtering because, as far as I understand the RFC, you can filter only one property and only one value. So you cannot write a filter like rt=temperature,brightness, only rt=temperature or rt=brightness or rt=temp*. Actually I haven't added support to wildcart yet because that would require an additional dependency to a regex crate, which I discovered is not in std in Rust. But I don't have anything against it.

The rational for supporting the standard filtering, although limited, is that it feels more suitable for a first iteration to keep things simple. When more uses will emerge, more custom configuration can be added. Otherwise, for the time being, it could feel counter-intuitive having custom filter rules because one might expect that they are transformed to a proper CoRE query but that cannot actually happen. Currently, I can instead happily see that /well-known/core?rt=oic.r.temperature is passed to my CoAP servers and the DiscoveryHandler filters the results anyway if the server ignores it. This is standard expected behaviour.

I can't deny I'm also biased because my first attempt was allowing a custom rule like rt=temperature,brightness but things can become quickly weird, both from user and Akri perspective. From the user developer perspective, one wonders how this is implemented since it's non-standard. The same doubt happens with any other kind of query structure. From Akri perspective, it become ugly to do query parsing and to do conversion between CoAP query and our custom query. For instance, if we allow a custom filter that allows only temperature and brightness, what would the CoRE discovery request look like?

/well-known/core?rt=oic.r.temperature is wrong because it's missing the brightness type

/well-known/core?rt=oic.r.brightness has the reverse issue

/well-known/core?rt=oic.r.temperature,oic.r.brightness is not standard-compliant

/well-known/core?rt=oic.r.temperature and /well-known/core?rt=oic.r.brightness? This incurs in the cost of duplicating each discovery request and increasing the overhead also on the edge device.

/well-known/core and then filtering in the DiscoveryHandler seems the best choice, but that means not following the standard and taking advantage of the query filtering even if the devices support it.

Therefore, in order to avoid complicating stuff without a clear use case for it, I think supporting the limited standard query filtering is better for the time being. There could be also the case that a future version of the CoAP standard allows slightly more complex queries and by sticking to the standard, we save a lot of headaches. What do you think?

Starting with standard query filtering and then potentially doing custom filtering as a vnext sounds good. Maybe add that information to the proposal so we can keep track of it.

deployment/helm/templates/coap-core.yaml

# Conflicts: # Cargo.lock # agent/Cargo.toml # agent/src/main.rs # agent/src/protocols/mod.rs

agent/src/protocols/mod.rs

docs/proposals/coap-core.md

# Conflicts: # Cargo.lock # shared/Cargo.toml

# Conflicts: # Cargo.lock # Cargo.toml # agent/Cargo.toml # controller/Cargo.toml # deployment/helm/Chart.yaml # samples/brokers/udev-video-broker/Cargo.toml # shared/Cargo.toml # version.txt

jiayihu · 2021-01-21T22:15:33Z

agent/Cargo.toml

@@ -12,6 +12,8 @@ async-trait = "0.1.0"
 blake2 = "0.8.0"
 chrono = "0.4.10"
 cfg-if = "0.1"
+coap = { git = "https://github.com/jiayihu/coap-rs", branch = "receive_from", optional = true }


I'll remove my own fork when the PR Covertness/coap-rs#60 is merged

jiayihu · 2021-01-21T22:32:04Z

agent/src/protocols/coap_core/discovery_handler.rs

+
+    #[tokio::test]
+    async fn test_discover_resources_via_ip_addresses() {
+        // TODO: find better way than setting env variable, which could be shared to other tests


The issue lies in https://github.com/deislabs/akri/blob/1c81b6ca44cccb596f0b19a01a220d96154b7658/agent/src/protocols/mod.rs#L24

which needs to read the env variable. It could receive a trait as parameter, but it feels a bit hacky because it should be passed down all the way from the tested method so that we can pass a mock. I'm open to suggestions.

Hi @jiayihu, we set up a mock EnvVarQuery struct that can be used in tests. Here is an example of a test that uses it and of breaking a function into fn_name and inner_fn_name to create a testable inner_fn_name that tests in the mock struct: https://sourcegraph.com/github.com/deislabs/akri/-/blob/agent/src/protocols/mod.rs#L121

Thanks for the tip! I have been using EnvVarQuery already in some tests, but this case is different. I've shown the problematic line of code in my previous comment, which is not written by me actually. So I would need to pass EnvVarQuery down to all the functions until the one actually reading the env variable, since it's an internal method of akri. That would make the API a bit awkward I guess.

I believe there's maybe a better way to write this Akri method, thus why I asked for input from you:
https://github.com/deislabs/akri/blob/1c81b6ca44cccb596f0b19a01a220d96154b7658/agent/src/protocols/mod.rs#L16-L37

jiayihu · 2021-01-21T22:37:07Z

So @kate-goldenring @bfjelds , I think this is ready for some serious review for the first iteration. This PR currently supports:

CoAP Core basic use cases
Discovery via multicast and static IP addresses
Query filtering to limit accepted devices
Broker which acts as proxy translating HTTP-CoAP GET requests and caching responses

Maybe what's missing is:

Moar tests 😄
Some documentation about how you can test this stuff locally, e.g. setting the CoAP servers, maybe in a section of the proposal markdown.

Let me know if there's anything else

# Conflicts: # Cargo.lock # agent/src/protocols/mod.rs

kate-goldenring · 2021-03-12T21:22:47Z

Hi @jiayihu! The new DiscoveryHandler model has been merged. Its a big change, so I put together a document with the steps to get this PR into the new mold. Let me know how I can help. Once you are rebased -- this is going to be hard and I wonder if creating a new PR would be better -- I could put PR's into yours with the nitty stuff like a Helm template for the Configuration.

jiayihu · 2021-03-12T22:19:17Z

Mmh I'll give it a try during the weekend maybe. I don't see many conflicts so I hope rebasing won't be a big issue. I'll ask for help in case :)

jiayihu added 4 commits January 10, 2021 18:06

Allow conditional compilation of agent protocols

499ef0c

Add support for CoAP CoRE protocol

4b161bb

Add proposal document

c467feb

Fix typo

5795283

jiayihu requested review from bfjelds, Britel, jiria and kate-goldenring as code owners January 12, 2021 22:31