Can't get images from enclosure element #57

Open
Dynamix72 opened this issue Dec 22, 2021 · 5 comments · Fixed by #81
Labels
question Further information is requested
Milestone

Comments

@Dynamix72

Dynamix72 commented Dec 22, 2021

I use the feed: https://www.nu.nl/rss/Algemeen

The images are in an enclosure element. Can this be solved in a new release?

<atom:link href="https://www.nu.nl/rss/Algemeen" rel="self"/>
<language>nl-nl</language>
<copyright>Copyright (c) 2021, NU</copyright>
<lastBuildDate>Mon, 20 Dec 2021 16:39:45 +0100</lastBuildDate>
<ttl>60</ttl>
<atom:logo>https://www.nu.nl/algemeenstatic/img/atoms/images/logos/rss-logo-250x40.png</atom:logo>
<item>
  <title>Novavax-vaccin kan volgens CBG sommige prikweigeraars alsnog overtuigen</title>
  <link>https://www.nu.nl/coronavirus/6174197/novavax-vaccin-kan-volgens-cbg-sommige-prikweigeraars-alsnog-overtuigen.html</link>
  <description>Het nieuwe coronavaccin van Novavax, dat &lt;a href="https://www.nu.nl/coronavirus/6174179/coronavaccin-van-novavax-als-vijfde-goedgekeurd-voor-gebruik-in-europa.html" target="_blank"&gt;maandag&lt;/a&gt; een positieve beoordeling van het Europese geneesmiddelenbureau (EMA) kreeg, kan helpen om ongevaccineerden alsnog over de streep te trekken. Het vaccin is namelijk gebaseerd op een oude techniek, met eiwitten. Dat zegt Ton de Boer, voorzitter van het College ter Beoordeling van Geneesmiddelen (CBG).</description>
  <pubDate>Mon, 20 Dec 2021 15:51:55 +0100</pubDate>
  <guid isPermaLink="false">https://www.nu.nl/-/6174197/</guid>
  **<enclosure length="0" type="image/jpeg" url="https://media.nu.nl/m/rnlxvbaa9mz9_sqr256.jpg/novavax-vaccin-kan-volgens-cbg-sommige-prikweigeraars-alsnog-overtuigen.jpg"/>**
  <category>Algemeen</category>
  <category>Binnenland</category>
  <category>Coronavirus</category>
  <dc:creator>NU.nl/ANP</dc:creator>
  <dc:rights>copyright photo: EPA</dc:rights>
  <atom:link href="https://nu.nl/coronavirus/6174179/coronavaccin-van-novavax-als-vijfde-goedgekeurd-voor-gebruik-in-europa.html" rel="related" title="Coronavaccin van Novavax als vijfde goedgekeurd voor gebruik in Europa" type="text/html"/>
  <atom:link href="https://nu.nl/coronavirus/6174077/speciale-omikronvaccins-komen-eraan-maar-onzeker-of-ze-worden-ingezet.html" rel="related" title="Speciale omikronvaccins komen eraan, maar onzeker of ze worden ingezet" type="text/html"/>
  <atom:link href="https://nu.nl/coronavirus/6173729/mensen-bellen-massaal-telefoonlijn-voor-vaccintwijfelaars-limiet-bijna-bereikt.html" rel="related" title="Mensen bellen massaal telefoonlijn voor vaccintwijfelaars, limiet bijna bereikt" type="text/html"/>
</item>
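
For reference, the enclosure URL can be read directly from the item XML. This is a minimal sketch using Python's standard-library `xml.etree.ElementTree` rather than the integration's own code. The function name `enclosure_image_urls` and the trimmed sample feed are illustrative assumptions, not part of the project.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Trimmed sample modeled on the item quoted above (illustrative only).
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <item>
      <title>Novavax-vaccin kan volgens CBG sommige prikweigeraars alsnog overtuigen</title>
      <enclosure length="0" type="image/jpeg" url="https://media.nu.nl/m/rnlxvbaa9mz9_sqr256.jpg/novavax-vaccin-kan-volgens-cbg-sommige-prikweigeraars-alsnog-overtuigen.jpg"/>
    </item>
  </channel>
</rss>"""


def enclosure_image_urls(rss_text: str) -> list[str]:
    """Return the url of every image-typed <enclosure> in an RSS document."""
    root = ET.fromstring(rss_text)
    urls = []
    for enclosure in root.iter("enclosure"):
        url = enclosure.get("url")
        # Keep only enclosures whose MIME type marks them as images.
        if url and enclosure.get("type", "").startswith("image/"):
            urls.append(url)
    return urls


print(enclosure_image_urls(SAMPLE_RSS))
```

Filtering on the `type` attribute matters because an enclosure can also carry audio or video, for example in podcast feeds.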

@Dynamix72 Dynamix72 changed the title from "Can't get images from enclore element" to "Can't get images from enclosure element" Dec 22, 2021
@BastienM

Bump on this.
I'm currently setting up my first RSS feed, and sadly the feed uses an <enclosure> tag rather than an image field.

@Gamepsyched

Gamepsyched commented Apr 21, 2022

Hi Dynamix72,

I adjusted the sensor.py file under the custom component, and it seems to be working for me. I also checked it against your RSS feed, and it seems to work.

"""Feedparser sensor"""
from __future__ import annotations

import asyncio
import re
from datetime import timedelta

import homeassistant.helpers.config_validation as cv
import voluptuous as vol
from dateutil import parser
from homeassistant.components.sensor import PLATFORM_SCHEMA, SensorEntity
from homeassistant.const import CONF_NAME, CONF_SCAN_INTERVAL
from homeassistant.core import HomeAssistant
from homeassistant.helpers.entity_platform import AddEntitiesCallback
from homeassistant.helpers.typing import ConfigType, DiscoveryInfoType

import feedparser

__version__ = "0.1.6"

COMPONENT_REPO = "https://github.com/custom-components/sensor.feedparser/"

REQUIREMENTS = ["feedparser"]

CONF_FEED_URL = "feed_url"
CONF_DATE_FORMAT = "date_format"
CONF_INCLUSIONS = "inclusions"
CONF_EXCLUSIONS = "exclusions"
CONF_SHOW_TOPN = "show_topn"

DEFAULT_SCAN_INTERVAL = timedelta(hours=1)

PLATFORM_SCHEMA = PLATFORM_SCHEMA.extend(
    {
        vol.Required(CONF_NAME): cv.string,
        vol.Required(CONF_FEED_URL): cv.string,
        vol.Required(CONF_DATE_FORMAT, default="%a, %b %d %I:%M %p"): cv.string,
        vol.Optional(CONF_SHOW_TOPN, default=9999): cv.positive_int,
        vol.Optional(CONF_INCLUSIONS, default=[]): vol.All(cv.ensure_list, [cv.string]),
        vol.Optional(CONF_EXCLUSIONS, default=[]): vol.All(cv.ensure_list, [cv.string]),
        vol.Optional(CONF_SCAN_INTERVAL, default=DEFAULT_SCAN_INTERVAL): cv.time_period,
    }
)


# asyncio.coroutine was removed in Python 3.11; use a native coroutine instead.
async def async_setup_platform(
    hass: HomeAssistant,
    config: ConfigType,
    async_add_devices: AddEntitiesCallback,
    discovery_info: DiscoveryInfoType | None = None,
) -> None:
    async_add_devices(
        [
            FeedParserSensor(
                feed=config[CONF_FEED_URL],
                name=config[CONF_NAME],
                date_format=config[CONF_DATE_FORMAT],
                show_topn=config[CONF_SHOW_TOPN],
                inclusions=config[CONF_INCLUSIONS],
                exclusions=config[CONF_EXCLUSIONS],
                scan_interval=config[CONF_SCAN_INTERVAL],
            )
        ],
        True,
    )


class FeedParserSensor(SensorEntity):
    def __init__(
        self,
        feed: str,
        name: str,
        date_format: str,
        show_topn: int,
        exclusions: list[str],
        inclusions: list[str],
        scan_interval: timedelta,
    ) -> None:
        self._feed = feed
        self._attr_name = name
        self._attr_icon = "mdi:rss"
        self._date_format = date_format
        self._show_topn = show_topn
        self._inclusions = inclusions
        self._exclusions = exclusions
        self._scan_interval = scan_interval
        self._attr_state = None
        self._entries = []
        self._attr_extra_state_attributes = {"entries": self._entries}

    def update(self):
        parsed_feed = feedparser.parse(self._feed)

        if not parsed_feed:
            return False
        else:
            self._attr_state = (
                self._show_topn
                if len(parsed_feed.entries) > self._show_topn
                else len(parsed_feed.entries)
            )
            self._entries = []

            for entry in parsed_feed.entries[: self._attr_state]:
                entry_value = {}

                for key, value in entry.items():
                    if (
                        (self._inclusions and key not in self._inclusions)
                        or ("parsed" in key)
                        or (key in self._exclusions)
                    ):
                        continue
                    if key in ["published", "updated", "created", "expired"]:
                        value = parser.parse(value).strftime(self._date_format)

                    entry_value[key] = value

                if "image" in self._inclusions and "image" not in entry_value.keys():
                    images = []
                    if "summary" in entry.keys():
                        images = re.findall(
                            r"<img.+?src=\"(.+?)\".+?>", entry["summary"]
                        )
                    if images:
                        entry_value["image"] = images[0]
                    else:
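                        # The second link is assumed to be the enclosure; the
                        # length check avoids an IndexError on shorter lists.
                        if len(entry.get("links", [])) > 1:
                            images = re.findall(
                                r"(?:(?:https?|ftp)://)?[\w/\-?=%.]+\.[\w/\-&?=%.]+",
                                str(entry["links"][1]),
                            )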
                        if images:
                            entry_value["image"] = images[0]
                        else:
                            entry_value[
                                "image"
                            ] = "https://www.home-assistant.io/images/favicon-192x192-full.png"

                self._entries.append(entry_value)

    @property
    def state(self):
        """Return the state of the sensor."""
        return self._attr_state

    @property
    def extra_state_attributes(self):
        return {"entries": self._entries}
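
The fallback above picks the second link by position and runs a regex over its string form. A less position-dependent variant could select the link by its MIME type. This is a minimal sketch, assuming each entry link is a dict-like object with `rel`, `type` and `href` keys, which is how feedparser is understood to expose enclosures. `pick_image` and `FALLBACK_IMAGE` are hypothetical names, not part of the integration.

```python
from __future__ import annotations

# Same default icon used in the sensor code above.
FALLBACK_IMAGE = "https://www.home-assistant.io/images/favicon-192x192-full.png"


def pick_image(links: list[dict], fallback: str = FALLBACK_IMAGE) -> str:
    """Return the href of the first image-typed link, else the fallback."""
    for link in links:
        # Assumed keys: "type" holds the MIME type, "href" holds the URL.
        if str(link.get("type", "")).startswith("image/") and link.get("href"):
            return link["href"]
    return fallback


# Hand-written link dicts modeled on the nu.nl item from this issue.
links = [
    {
        "rel": "alternate",
        "type": "text/html",
        "href": "https://www.nu.nl/coronavirus/6174197/novavax-vaccin-kan-volgens-cbg-sommige-prikweigeraars-alsnog-overtuigen.html",
    },
    {
        "rel": "enclosure",
        "type": "image/jpeg",
        "href": "https://media.nu.nl/m/rnlxvbaa9mz9_sqr256.jpg/novavax-vaccin-kan-volgens-cbg-sommige-prikweigeraars-alsnog-overtuigen.jpg",
    },
]

print(pick_image(links))
```

Selecting by type keeps working when a feed orders its links differently or provides only one.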

@ogajduse
Collaborator

#81 should fix this issue.
Could you please try the beta release I published and tell me whether images show up for you?
https://github.com/custom-components/feedparser/releases/tag/0.2.0b0

If they do not show up, could you please provide the feed URL, so I can investigate?

Note: #78, #57 and #64 should be addressed and fixed by #81.

@Paktas

Paktas commented Aug 2, 2023

Same here: I'm trying to get the image URL from a media:content element inside an RSS item:

<media:content url="https://g.delfi.lt/images/pix/518x0/BvXDZaXkhRU/rusijos-karas-pries-ukraina-94097143.jpg"/>
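
To illustrate what needs to be extracted here, the `url` attribute of a `media:content` element can be read with the standard library. This is a minimal sketch and not the integration's code. It assumes the feed declares the usual Media RSS namespace (`http://search.yahoo.com/mrss/`), which the snippet above does not show, and `media_content_urls` is a hypothetical helper name.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Media RSS namespace; assumed to be the one the feed declares.
MEDIA_NS = {"media": "http://search.yahoo.com/mrss/"}

# Single item wrapped with the namespace declaration so it parses on its own.
SAMPLE_ITEM = """<item xmlns:media="http://search.yahoo.com/mrss/">
  <media:content url="https://g.delfi.lt/images/pix/518x0/BvXDZaXkhRU/rusijos-karas-pries-ukraina-94097143.jpg"/>
</item>"""


def media_content_urls(item_xml: str) -> list[str]:
    """Return the url attribute of each direct media:content child."""
    item = ET.fromstring(item_xml)
    return [
        element.get("url")
        for element in item.findall("media:content", MEDIA_NS)
        if element.get("url")
    ]


print(media_content_urls(SAMPLE_ITEM))
```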

@ogajduse
Collaborator

ogajduse commented Aug 3, 2023

@Paktas Which version of the integration are you running?

@ogajduse ogajduse linked a pull request Aug 9, 2023 that will close this issue
@ogajduse ogajduse added the question Further information is requested label Aug 10, 2023
@ogajduse ogajduse added this to the Release 0.2.0 milestone Aug 17, 2023

5 participants