Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Attaching data to messages #93

Closed
wants to merge 3 commits into from
Closed

Attaching data to messages #93

wants to merge 3 commits into from

Conversation

kegsay
Copy link
Member

@kegsay kegsay commented Oct 12, 2015

The discussion on this has progressed and this document is now superseded by this Google Doc


This is a straw-man proposal detailing how data can be attached to m.room.messages to provide a richer experience for knowledgeable clients. I expect this will change a lot, but is being published now to get some feedback.

Rendered: https://github.com/matrix-org/matrix-doc/blob/data-messages/drafts/m-room-message-data.rst

type: "m.room.message",
content: {
msgtype: "m.text",
body: "[matrix-org/matrix-ios-sdk] manuroe pushed 4 commits to develop",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this is expressive enough to be generally useful. For sure I can see uses, but I think restricting to one entity for the message is limiting.

Slack's API I think has this down pretty well; anything which has more context will be in a <> tag, and the context is on the left of a pipe, e.g.

"Just FYI, <@U1234|manuroe> pushed some commits, yo"

would turn manuroe into a link to the user with ID @U1234, but if your client doesn't understand, it would just print:

"Just FYI, manuroe pushed some commits, yo"

I could imagine having a list of entities in the data or context key, and being able to refer to them by index, e.g.

content: {
  body: "Just FYI, manuroe pushed 4 commits to develop",
  interpolated_body: "Just FYI, <#1|manuroe> pushed 4 commits to <#2|develop>",
  data: {
    entities: [
      {
        "domain": "github.com",
        "user": "manuroe",
        "link": "https://github.com/manuroe",
      },
      {
        "domain": "github.com",
        "repo": "matrix-org/matrix-ios-sdk",
        "branch": "develop",
        "link": "https://github.com/matrix-org/matrix-ios-sdk/tree/develop",
      },
    ]
  }
}

But I think limiting to a top-level context is slightly limiting.

I am not sure how I feel about by default having something which is always interpretable in body, or requiring a separate interpolated_body or something...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like your idea.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1
Allowing multiple entities and allowing them to associate with specific parts of the message looks very nice - e.g. a Jenkins failure report relating to a git commit would probably have three or four different linkable entities.

@ara4n
Copy link
Member

ara4n commented Oct 13, 2015

I'm a bit confused by this one. Isn't it more idiomatic to just send a custom event type, but with a human-readable body field (and mandate that clients should always display the 'body' param of events they don't recognise)? So the original example here would be something more like:

type: "org.matrix.git.commits",
content: {
body: "[matrix-org/matrix-ios-sdk] manuroe pushed 4 commits to develop",
"entity": "manuroe",
"commits": ["fe34764", "4cdd8ae", "528da705", "56bfc717"]
"link": "matrix-org/matrix-ios-sdk@56bfc717",
"uri": "https://github.com/matrix-org/matrix-ios-sdk.git"
}

Either way, it feels particularly werid that the 'domain' of the context wouldn't be an m.* or org.matrix.* style namespace.

Dan's alternative slack-inspired suggestion is interesting, in that I guess it's effectively a richer hyperlink representation. Again, don't see why "domain" param is "github.com" rather than an com.github.* style namespace for the entity. The only thing that feels a little weird is inventing an entirely new hypertext markup language with <foo|bar> style representation for these links (and cargoculting random antics from Slack). What's wrong with HTML? manuroe etc?

Otherwise LGTM, although given nothing is screaming out for this i'd rather we prioritised features & bugs on the critical path. I'm also not sure what our policy should really be on landing vapourware spec stuff; I quite like the idea of keeping the spec fairly strictly tracking what we actually have implemented in the reference implementations rather than diverging off into scifi. (This is obviously fine to keep in drafts tho, which is generally scifi HQ! :)

@ara4n ara4n mentioned this pull request Oct 13, 2015
@kegsay
Copy link
Member Author

kegsay commented Oct 13, 2015

Isn't it more idiomatic to just send a custom event type, but with a human-readable body field (and mandate that clients should always display the 'body' param of events they don't recognise)?

It is not more idiomatic to do this. Clients everywhere just switch on the event type. If they don't recognise it, they drop it on the floor. m.room.message mandates that you should display content.body if you don't recognise the msgtype. It would be disastrous if you wanted every client to display a body for every event and enforced such a key on the HS because it just doesn't make any sense. For example, I don't do VoIP, so I don't know any m.call.* events, so you want me to display a body for things like candidates? What about read receipts? State events?

Either way, it feels particularly werid that the 'domain' of the context wouldn't be an m.* or org.matrix.* style namespace.

We do org.matrix.*-style namings to provide namespaces to avoid conflicts between events. The intention with domain is not to provide namespacing but to provide a hint as to what the data relates to e.g. my client knows how to talk to the Github API, so if I see domain: github.com I can potentially use this information to get the owner/repo, Github username, etc.

What's wrong with HTML?

Parsing freeform HTML is at best a pain and extremely error-prone and at worst impossible to do. Dan's list of entities makes parsing much easier.

nothing is screaming out for this

The moment we want to do anything more complicated than displaying text and images we're going to bump into this problem. Given how quick it was to write, I don't see the problem in asking for feedback from people.

I'm also not sure what our policy should really be on landing vapourware spec stuff; I quite like the idea of keeping the spec fairly strictly tracking what we actually have implemented in the reference implementations rather than diverging off into scifi.

This is precisely why this is in /drafts :)

@ara4n
Copy link
Member

ara4n commented Oct 13, 2015

It is not more idiomatic to do this. Clients everywhere just switch on the event type. If they don't recognise it, they drop it on the floor. m.room.message mandates that you should display content.body if you don't recognise the msgtype.

I thought the plan was to support content.body fallback on all events for this purpose.

It would be disastrous if you wanted every client to display a body for every event and enforced such a key on the HS because it just doesn't make any sense. For example, I don't do VoIP, so I don't know any m.call.* events, so you want me to display a body for things like candidates? What about read receipts? State events?

Obviously you wouldn't put a content.body on events which make no sense to display as text messages in a timeline.

Either way, it feels particularly werid that the 'domain' of the context wouldn't be an m.* or org.matrix.* style namespace.

We do org.matrix.*-style namings to provide namespaces to avoid conflicts between events. The intention with domain is not to provide namespacing but to provide a hint as to what the data relates to e.g. my client knows how to talk to the Github API, so if I see domain: github.com I can potentially use this information to get the owner/repo, Github username, etc.

Why wouldn't you namespace these hints?

What's wrong with HTML?

Parsing freeform HTML is at best a pain and extremely error-prone and at worst impossible to do. Dan's list of entities makes parsing much easier.

Fair point.

nothing is screaming out for this

The moment we want to do anything more complicated than displaying text and images we're going to bump into this problem. Given how quick it was to write, I don't see the problem in asking for feedback from people.

Not complaining about it being written at all :) just trying to give review.

I'm also not sure what our policy should really be on landing vapourware spec stuff; I quite like the idea of keeping the spec fairly strictly tracking what we actually have implemented in the reference implementations rather than diverging off into scifi.

This is precisely why this is in /drafts :)

yay!

@kegsay
Copy link
Member Author

kegsay commented Oct 13, 2015

I thought the plan was to support content.body fallback on all events for this purpose.

I don't recall anyone suggesting this (just m.room.message events).

Obviously you wouldn't put a content.body on events which make no sense to display as text messages in a timeline.

I 100% agree. But that is the crux of the problem. You don't know it is a dumb idea to try to display m.call.candidates but it does make sense to display org.matrix.git.commits. That is why it only makes sense to apply body fallbacks on "things which make sense to be displayed in a timeline" which is m.room.message events.

Why wouldn't you namespace these hints?

I feel namespacing them as com.github would be acting as a hint to the developer to say "make this unique to your client to avoid clashes" just like they are already doing with event types. We do not want to convey this hint because we want people to be able to do if (domain == "github.com") then ... which obviously doesn't work if everyone has namespaced them to be myapp.com.github or com.github.owner.repo

@erikjohnston
Copy link
Member

I feel namespacing them as com.github would be acting as a hint to the developer to say "make this unique to your client to avoid clashes" just like they are already doing with event types. We do not want to convey this hint because we want people to be able to do if (domain == "github.com") then ... which obviously doesn't work if everyone has namespaced them to be myapp.com.github or com.github.owner.repo

How do I, as a client, interpret the metadata? Do I have to go if domain == github.com and then know what keys that are likely to be there? How do I differentiate between metadata about a commit and metadata about a PR? Do I have to look at what keys are there?

The reason for namespacing event types and msgtypes is to allow clients to know what keys to expect and what they mean. It also ensures that if two different developers try and implement the same thing (e.g. HTML msgtype) they won't both clash and confuse clients. Take your github.com example, what happens if two different developers try to come up with it at the same time, but using different key names?

How do you expect clients to use the metadata? How will it be displayed? How much does it conflict with different msgtype?

@illicitonion illicitonion assigned kegsay and unassigned illicitonion Oct 19, 2015
@eternaleye
Copy link

One thing I wonder is whether some variety of RDF - be it JSON-LD or something else - might be better for metadata. In particular, the namespacing concern falls out quite naturally by way of subjects, objects, and predicates being, well, URIs (or blank nodes, but whatever).

@kegsay
Copy link
Member Author

kegsay commented Jun 7, 2016

The discussion on this has progressed and this document is now superseded by https://docs.google.com/document/d/1l08DL_F_CHo1pIORXzcqWLlZQpztogMFUTYmxd4Gr5s

@kegsay kegsay closed this Jun 7, 2016
@eternaleye
Copy link

That document seems to cover the "formatting" side of this ticket, but not the "auxiliary semantic data" side. Should this be reopened, a new ticket opened for that alone, ...?

@kegsay
Copy link
Member Author

kegsay commented Jun 10, 2016

@eternaleye : It is covered via:

We add a data key to m.room.message’s content to store arbitrary metadata about a displayable event, to clearly separate it from the rest of the keys in content.

We do not specify standard keys which will be present in content.data yet if that's what you mean. Without being clear on what set of interop we want on the data side, I feel it won't progress any more than this existing issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants