Collapse space when implying value from textContents #51

Closed
@njkleiner

Describe the bug

When the value property of a nested microformat (e.g., h-adr inside h-entry) is implied from the element's textContent, each run of successive whitespace characters should be collapsed to a single space character.

To Reproduce

HTML input:

<div class="h-entry">
    <span class="p-location h-adr">
        <span class="p-locality">Berlin</span>,
        <span class="p-region">Berlin</span>,
        <span class="p-country-name">DE</span>
        <data class="p-latitude" value="52.518606"></data>
        <data class="p-longitude" value="13.376127"></data>
    </span>
</div>

Expected behavior

Correct JSON output:

{
    "items": [
        {
            "type": [
                "h-entry"
            ],
            "properties": {
                "location": [
                    {
                        "type": [
                            "h-adr"
                        ],
                        "properties": {
                            "locality": [
                                "Berlin"
                            ],
                            "region": [
                                "Berlin"
                            ],
                            "country-name": [
                                "DE"
                            ],
                            "latitude": [
                                "52.518606"
                            ],
                            "longitude": [
                                "13.376127"
                            ]
                        },
                        "value": "Berlin, Berlin, DE"
                    }
                ]
            }
        }
    ],
    "rels": {},
    "rel-urls": {}
}

Actual JSON output:

{
  "rels": {},
  "rel-urls": {},
  "items": [
    {
      "type": [
        "h-entry"
      ],
      "properties": {
        "location": [
          {
            "type": [
              "h-adr"
            ],
            "properties": {
              "locality": [
                "Berlin"
              ],
              "region": [
                "Berlin"
              ],
              "country-name": [
                "DE"
              ],
              "latitude": [
                "52.518606"
              ],
              "longitude": [
                "13.376127"
              ]
            },
            "value": "Berlin,\n        Berlin,\n        DE"
          }
        ]
      }
    }
  ]
}

Note the difference in the value property: "Berlin, Berlin, DE" (expected) vs. "Berlin,\n        Berlin,\n        DE" (actual).
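
As a rough illustration of the normalization being asked for, here is a minimal Python sketch. It is not taken from any existing parser: `collapse_whitespace` is a hypothetical helper name, and trimming leading and trailing whitespace is an assumption based on the expected output above.

```python
import re


def collapse_whitespace(text: str) -> str:
    """Collapse each run of whitespace to one space and trim both ends.

    Hypothetical helper for illustration only. Trimming the ends is an
    assumption; the issue itself only asks for runs to be collapsed.
    """
    # \s+ matches one or more whitespace characters (spaces, tabs, newlines).
    return re.sub(r"\s+", " ", text).strip()


# The implied value from the "Actual JSON output" above.
actual_value = "Berlin,\n        Berlin,\n        DE"
print(collapse_whitespace(actual_value))  # Berlin, Berlin, DE
```

Note that Python's `\s` also matches non-ASCII whitespace such as non-breaking spaces on `str` input, which may or may not be what a parser wants here.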

Additional context

From what I can tell, this is not actually part of the specification. It seems to be commonly accepted, though, as both the PHP parser and the Python parser do this.

Labels

experimental: Relates to experimental microformat parsing
