Closed
Describe the bug
When implying the value property for a nested microformat (e.g., h-adr inside h-entry) from the HTML textContent, multiple successive whitespace characters should be collapsed to a single space character.
To Reproduce
HTML input:
<div class="h-entry">
<span class="p-location h-adr">
<span class="p-locality">Berlin</span>,
<span class="p-region">Berlin</span>,
<span class="p-country-name">DE</span>
<data class="p-latitude" value="52.518606"></data>
<data class="p-longitude" value="13.376127"></data>
</span>
</div>
Expected behavior
Correct JSON output:
{
"items": [
{
"type": [
"h-entry"
],
"properties": {
"location": [
{
"type": [
"h-adr"
],
"properties": {
"locality": [
"Berlin"
],
"region": [
"Berlin"
],
"country-name": [
"DE"
],
"latitude": [
"52.518606"
],
"longitude": [
"13.376127"
]
},
"value": "Berlin, Berlin, DE"
}
]
}
}
],
"rels": {},
"rel-urls": {}
}
Actual JSON output:
{
"rels": {},
"rel-urls": {},
"items": [
{
"type": [
"h-entry"
],
"properties": {
"location": [
{
"type": [
"h-adr"
],
"properties": {
"locality": [
"Berlin"
],
"region": [
"Berlin"
],
"country-name": [
"DE"
],
"latitude": [
"52.518606"
],
"longitude": [
"13.376127"
]
},
"value": "Berlin,\n Berlin,\n DE"
}
]
}
}
]
}
Note the difference: "Berlin, Berlin, DE" vs. "Berlin,\n Berlin,\n DE".
Additional context
From what I can tell, this is not actually part of the specification. It seems to be commonly accepted, though, as both the PHP parser and the Python parser do this.
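For illustration, the requested normalization could look roughly like the Python sketch below. It is not taken from either parser mentioned above; the helper name collapse_whitespace is hypothetical, and trimming leading and trailing whitespace is an assumption on my part, made because the expected value has none.

```python
import re


def collapse_whitespace(text: str) -> str:
    """Collapse each run of whitespace into one space and trim both ends.

    Hypothetical helper: the name and the trimming step are assumptions,
    not part of the microformats specification.
    """
    return re.sub(r"\s+", " ", text).strip()


# textContent of the nested h-adr as reported in the actual output,
# plus trailing whitespace left behind by the empty <data> elements
# (the exact amount of trailing whitespace is an assumption).
raw_value = "Berlin,\n Berlin,\n DE\n \n "

print(collapse_whitespace(raw_value))  # prints: Berlin, Berlin, DE
```

Note that \s in a Python str pattern also matches Unicode whitespace such as non-breaking spaces. Whether those should be collapsed as well is a separate question that this sketch does not try to answer.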