
[FR]: Automatic control on maximal amount of feed articles #1270

Closed · Ac314 opened this issue Jan 10, 2024 · 36 comments

Labels: Status-Fixed (Ticket is resolved.), Type-Enhancement (This is request for brand new feature.)
Milestone: 4.6.4
@Ac314 commented Jan 10, 2024

Brief description of the feature request

[This is a revamp of an old FR I added years ago.]
An option "Keep only N recent articles in feeds" (global and per feed) would let the user control how many old articles to keep. With it enabled, the database would not bloat with old, unneeded articles, because they would be deleted automatically.

The important part here is to have some "modifiers" for this option so the user can decide what to keep and what to delete:

  • Ignore unread messages when counting/deleting
  • Ignore important messages when counting/deleting
  • Ignore labeled messages when counting/deleting

For example, if I set the amount of articles to keep but did not have time to read the latest articles, I can enable "Ignore unread messages when counting/deleting" and unread messages will not be deleted. So I can be sure they will not disappear before I read them.

The same goes for "Ignore important messages when counting/deleting": if I want to keep some old messages and be sure they will not disappear, I will mark them "important" and enable this option. In this case they will be kept in the database and not deleted.

The automatic cleaning itself can be done by various means, for example:

  1. Clean feed when it is updated
  2. Clean the database when application is idle
  3. Clean the database on application exit
Ac314 added the Type-Enhancement label on Jan 10, 2024
@martinrotter (Owner)

Implementing...

martinrotter added this to the 4.6.4 milestone on Jan 11, 2024
@martinrotter (Owner)

[screenshot attached: rssguard_PqSb3n1Xht]

@Ac314 (Author) commented Jan 12, 2024

That's cool!

@martinrotter (Owner)

@slalaurette

@martinrotter (Owner) commented Jan 16, 2024

So, it is now implemented.

450d01e

It can be tweaked app-wide in settings or in the feed settings. It can also be tweaked in a batched manner when multiple feeds are selected.

Automatic cleanups now run AFTER a feed is fetched. So if you, for example, set that you only want to keep 15 articles, then all excess OLDER articles are removed or moved to the recycle bin, depending on settings.

Excess articles are ALSO purged from the recycle bin (this behavior might be tweaked based on user feedback).

For example, if you have set "keep 15 articles, don't remove unread or starred" and there are 100 unread articles in your feed, none of the articles will be deleted (because they are unread). If you then mark them read and re-fetch the feed, all articles meeting the criteria except for the 15 newest ones are deleted.

There might be some potentially weird behaviors when you set, for example, to keep only 10 articles while your live RSS feed contains 80 articles etc. Naturally, the feature is meant to be used with a higher number of kept articles than is available in your live feed file, like 300 articles.

The relevant dev build is now building; PLEASE test once it is done. This feature is relatively complex and might have some undesired side effects which we need to iron out before release.

martinrotter added the Status-Fixed label on Jan 16, 2024
@Ac314 (Author) commented Jan 16, 2024

Thanks! I will do the testing, it will take some time...

@martinrotter (Owner)

I will close it so that I know it's "Done", but please post your findings here.

@Ac314 (Author) commented Jan 17, 2024

Found a bug: if we set the option for a specific feed and afterwards disable it again, it does not revert to the global setting. For example: we have the global setting "Keep only 10 articles", then for some feed we set it to "Keep only 20 articles". After it has worked for some time we disable "Customize article limits", so it should revert to the global setting (10-article limit). But it keeps using the 20-article limit instead of cutting it down to 10.

@martinrotter (Owner)

Will test.

@Ac314 (Author) commented Jan 17, 2024

I am not 100% sure, but maybe after you decrease the limit for a specific feed (e.g. it was 30 and you now set it to 20), it also keeps the old setting (the 30-article limit).

@martinrotter (Owner)

607439e: found the problem and fixed it, please re-test.

@Ac314 (Author) commented Jan 17, 2024

Thanks, it seems to work OK now.
The cleaning mechanics themselves work well, which is very nice.
I will keep watching it for several days; I hope everything will be OK.

@martinrotter (Owner)

Yes, there are some related bugs I found (like the notification dialog informing about new articles which were actually already deleted by the cleaning mechanism). So, I will have to think about this more deeply.

@slalaurette

If I go to a folder with subfolders, choose "Edit child feeds (recursive)" and try to set options in the "Limiting amount of article in feeds" section (you missed an "s" there, by the way), I get this cryptic error: "Cannot save changes: Parameter count mismatch".

There are also two mysterious checkboxes with no text whose function I don't understand.

@martinrotter (Owner)

@slalaurette Make sure to have the latest devbuild downloaded, and also note that INDIVIDUAL devbuilds are NOT interchangeable, as the database structure differs.

So please test with a clean user data folder. Let me know if that helped.

As for those mysterious checkboxes: they only appear when you multi-edit several feeds, and their purpose is to turn applying of changes on or off. In other words, if you want to change some attribute, simply check that box, then change the attribute.

@slalaurette

OK! Will test with a clean folder.

Those checkboxes need some description, I think. ツ

Once more, thanks for your awesome work!

@martinrotter (Owner)

Those checkboxes have tooltips. :) Anyway, please test and report back; I will need to iron out many things.

@slalaurette

In an installation with just the default feeds, it seems to work fine. Will keep testing!

@martinrotter (Owner)

Sure, test everything you suspect is related. I tested extensively, but it might be that my test cases are too shallow and someone else may find something.

@slalaurette

BTW, when trying to come up with a way to generate a clean profile, I found that I can only uninstall "RSS Guard" from my computer, but not the development version. I had to install the dev build in a new folder, so now I have three versions of the program: the main release and two dev builds. I don't know how to install the dev builds without affecting the main program.

An option for a standalone install would help greatly for testing.

@martinrotter (Owner)

Well, I have not tested the installers in years.

It is very simple: just unpack the devbuild "zip" into a separate folder. That's all.

@slalaurette

Ah, got it! I was mindlessly downloading the .exe version. Thanks!

@martinrotter (Owner)

I updated to the latest devbuild and activated a 500-article global limit for my feeds. So far it works nicely.

Just a side note: even if articles are removed from the DB by this feature, the database .db file size does not decrease automatically. This is because SQLite does not "shrink" the file on its own when data is deleted. To shrink the file, use the DB cleanup feature, set it up like this, then clean.

[screenshot of the database cleanup dialog settings]
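To add a bit of background (not RSS Guard-specific): SQLite keeps freed pages inside the .db file until an explicit VACUUM rewrites and compacts it, which is presumably what the cleanup above does for you. A minimal stand-alone sketch, assuming the third-party better-sqlite3 Node package, a hypothetical database path, and that RSS Guard is closed so the file is not locked:

    // Illustration only, not RSS Guard code: SQLite keeps freed pages inside
    // the .db file until an explicit VACUUM rewrites and compacts it.
    // Assumes the third-party "better-sqlite3" package and that RSS Guard is
    // not running (the file must not be locked); the path is hypothetical.
    const Database = require('better-sqlite3');

    const db = new Database('/path/to/rssguard/data/database.db');
    db.exec('VACUUM'); // rebuilds the file, reclaiming space from deleted articles
    db.close();

The in-app "Database cleanup" dialog remains the supported way to do this; the sketch only illustrates why deleting rows alone does not make the file smaller.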

@Ac314 (Author) commented Jan 19, 2024

It seems the app ignores messages marked "important" when counting before cleaning if they are not the oldest in the feed (yes, that sounds vague).
Example: I have a feed with a 20-article limit. There are currently 25 articles overall, including 5 important ones - 2 are the oldest in the feed and 3 have various dates (spread across the feed). "Do not remove important articles" is enabled. When I update the feed, I have 22 articles left. None of the 5 important ones are deleted, which is good, but it is still unclear why it deleted 3 articles.
I am not sure how it works currently: either 20 messages is an overall limit for all types of messages in the feed, or 20 messages is the limit for all messages except the important ones (and unread ones, if the corresponding modifier is enabled). In the first case 15+5 should remain, in the second case 20+5. But 17+5 is a bit strange.

@Ac314 (Author) commented Jan 19, 2024

Forgot to mention (it may be important): I have a filter for this feed that ignores new messages with the same URL:

function filterMessage() {
  // Ignore incoming articles whose URL is already present in the database.
  if (msg.isAlreadyInDatabase(MessageObject.SameUrl)) {
    return MessageObject.Ignore;
  } else {
    return MessageObject.Accept;
  }
}

And this feed often has articles whose published date/time changes as time passes, while the article itself (including its URL) stays the same.

@RetroAbstract

There is unexpected behavior with CSS2RSS feeds when limiting them to, for example, 2 articles: on every fetch, RSS Guard cycles through a pool of articles larger than the set limit.

Steps to reproduce:

  1. Add this bandcamp feed: https://maryhalvorson.bandcamp.com/music
  2. Use CSS2RSS with this selector: ".music-grid-item"
  3. Either globally or in the feed's settings, set RSS Guard to: keep 2 articles, remove important & unread articles
  4. Fetch the feed once; you should now have the 2 latest items on the bandcamp page matching the selector, those being: "Cloudward" & "Amaryllis"
  5. Fetch the feed again (multiple times). At every fetch, "Cloudward" & "Amaryllis" are replaced at random by two of the 4 latest items on the page, cycling through: "Cloudward", "Amaryllis", "Belladonna", "Artlessly Falling Mary Halvorson's Code Girl".

What happens in step 5 is the unexpected behavior. When re-fetching after the first time, the articles in the CSS2RSS bandcamp feed should not change and should remain "Cloudward" & "Amaryllis".

(The 2 latest items of a CSS2RSS feed will always be fetched, as timestamps of articles from these feeds are always based on when the feed is fetched.)

This unexpected behavior happens for every feed made with CSS2RSS (in my experience).

@martinrotter (Owner)

[quoting @RetroAbstract's CSS2RSS report above]

Will test.

@martinrotter (Owner)

[quoting @Ac314's comment about important articles above]

Will test.

@martinrotter (Owner)

[quoting @Ac314's comment about important articles above]

OK, tested. Will try to explain.

In the feature settings, there is a setting called "Keep X newest articles". Exactly that is meant. So if you set it to 20, you can be sure that the 20 newest articles will be there.

Plus, if you select "do not remove unread/starred", then what is meant is: "The 20 newest articles will be there AND every article which is unread or starred will also be there."

The feature works like this:

  1. Sort all articles from newest to oldest and find the date/time of the 20th article (the last one which will not be deleted).
  2. Delete all articles which are OLDER than the found date/time and which meet the read/unstarred criteria (if set).

Now apply the logic to your situation:
Settings: keep 10 messages, keep all starred.
Feed: currently contains 20 messages, the 5 oldest are starred and the 3 newest are starred, and there are no new articles in the online feed source.

Now, the feed is fetched (and the cleaning logic runs):

  1. The logic finds the date/time of the 10th article (there are 3 starred + 7 regular articles in the first 10).
  2. The logic deletes everything after the 10th article, except the 5 oldest starred ones.
  3. So there should remain 3 starred (newest) articles + 7 regular articles + 5 starred (oldest) articles = 15.
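
To make the two-step rule above concrete, here is a rough sketch of it in plain JavaScript. This is an illustration only, not RSS Guard's actual implementation, and the article fields (date, isRead, isImportant) are hypothetical:

    // Rough sketch of the cleanup rule described above - an illustration, not
    // RSS Guard's actual code. "articles" is a hypothetical array of objects
    // shaped like { title, date, isRead, isImportant }.
    function cleanupFeed(articles, keepCount, keepUnread, keepStarred) {
      // 1. Sort from newest to oldest and find the date of the keepCount-th article.
      const sorted = [...articles].sort((a, b) => b.date - a.date);
      if (sorted.length <= keepCount) {
        return sorted; // nothing exceeds the limit, keep everything
      }
      const cutoff = sorted[keepCount - 1].date;

      // 2. Drop only articles strictly OLDER than the cutoff, unless they are
      //    protected by the "do not remove unread/starred" modifiers.
      return sorted.filter(a =>
        a.date >= cutoff ||
        (keepUnread && !a.isRead) ||
        (keepStarred && a.isImportant));
    }

Running it on the example above (keep 10, keep starred; 20 articles with the 5 oldest and the 3 newest starred) keeps the 10 newest plus the 5 old starred ones, i.e. 15 articles.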

Below are 3 screenshots: initial state with 20 articles -> settings turned on -> after re-fetching (cleaning).

[three screenshots attached]

@martinrotter (Owner)

Just to add some info: yes, in fact the "do not delete starred/unread" setting only affects articles which are "older" than the limited count. So if you have set 20 articles to keep, it is possible that some read articles will be kept, and only those older than the "20" limit will be deleted (if you have the setting to delete read articles enabled).

@martinrotter (Owner)

The reason I took this approach was my mindset of: "Well, I want to see X of my newest articles no matter what state they are in, just keep those, but make sure to delete older stuff to avoid having a huge DB file."

@martinrotter (Owner)

[quoting @RetroAbstract's CSS2RSS report above]

No. Let me explain in detail what happens:

  1. You fetch the feed for the first time; the DB contains all articles. As the source feed does not provide date/time stamps for articles, RSS Guard automatically assigns the current time, BUT INCREMENTS THE TIME BY 1 SECOND FOR EACH ARTICLE (so that they do not all have exactly the same time).
  2. You apply the cleaning logic (only keep the 2 NEWEST articles - NEWEST is the keyword here).
  3. You re-fetch for the first time -> all articles are already in the DB, no new article is fetched -> the 2 newest articles are kept: Cloudward and Amaryllis. The DB at this point contains ONLY these 2 articles; everything else is purged.
  4. You re-fetch your feed again; 2 articles are already in the DB but 18 articles are not, and they are now inserted into the DB WITH CURRENT autogenerated date/time stamps. SO THESE 18 articles are now NEWER than those 2 (Cloudward and Amaryllis).
  5. The cleaning logic now automatically runs and removes all but the 2 newest articles, which are now Belladonna and Artlessly Falling.

This behavior is the result of several things:

  1. Your feed does not provide stable date/time stamps for articles, which is a major PITA and really happens very rarely with real-world feeds. You have to solve this issue by assigning an artificial STABLE time/date to each article in your feed (or via an article filter, see the sketch below).
  2. Your cleaning logic is too strict and purges articles which are still available in the remote feed source (a script, in your case). The mindset here is that this feature is meant for purging OLD ENOUGH articles which are not even present in the online feed file anymore. Of course, if your cleaning logic purges some article, it will be re-added on the next feed fetch if it is still present in the remote source and does not suddenly fall into the "too old" state (in this case that was caused by point 1, but it will likely happen VERY rarely in real-world use-cases).
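
For point 1, an article filter along these lines could assign such a stable date. This is only a sketch: msg.isAlreadyInDatabase comes from the filter shown earlier in this thread, while msg.contents and msg.created are assumed property names, so verify them against the article-filters documentation before relying on this:

    // Sketch of the "assign a stable artificial date" idea from point 1.
    // ASSUMPTION: the filter API exposes the article body and a writable
    // creation date as msg.contents and msg.created - verify the exact
    // property names in the article-filters documentation.
    function filterMessage() {
      // Keep ignoring re-fetched copies of articles we already store (same URL),
      // so their originally stored timestamp is preserved.
      if (msg.isAlreadyInDatabase(MessageObject.SameUrl)) {
        return MessageObject.Ignore;
      }

      // Derive a deterministic date from the article itself instead of relying
      // on "time of fetch", so a re-added article never looks newer than it is.
      // Here a YYYY-MM-DD string is (hypothetically) pulled out of the body.
      var match = /\b(\d{4}-\d{2}-\d{2})\b/.exec(msg.contents);
      if (match !== null) {
        msg.created = new Date(match[1]);
      }

      return MessageObject.Accept;
    }

Any stable source works (a date printed on the page, part of the URL, etc.), as long as the same article always ends up with the same date.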

@Ac314 (Author) commented Jan 23, 2024

Thanks for the explanation, it is quite clear.
When I added this FR I actually had somewhat different logic in mind. It was the following: we have "needed" and "unneeded" articles - the needed ones must be kept, the unneeded ones should be kept for some time and cleaned up gradually. If I have read a message and it is not marked important, then it is already unneeded. But sometimes I need to return to it after some time (like tomorrow) to refresh my memory - so some kind of "archive" of unneeded messages would be nice. And this archive should be cleaned up little by little.
For example: we have a 20-article limit, and currently 15 important, 15 unread and 10 read messages. In this case nothing would be deleted, because 20 > 10. The other 30 are still "needed", so they are simply ignored.
I cannot say that this logic is better or worse than the one currently implemented - it is just different. What is important is to describe the implemented behavior clearly in the documentation, explaining thoroughly how it works. Otherwise people could be a bit confused (like me).

@martinrotter (Owner)

@Ac314 I believe we (or just you) are overthinking it. This is exactly the kind of feature which would need like 10 sub-features to suit everyone's use-cases.

Also, I chose this approach precisely because it is "softer". It keeps the specified amount of articles (no matter what) and deletes only the excess.

Your worry about deleting some articles "too early" will very likely be solved by setting a higher number of kept articles, and that is kind of the expected way of using the feature.

If your feed accumulates thousands of articles, it is perhaps wise to set the number of kept ones to something like 100 or 200. Also, if DB size is not a problem, just check the "recycle instead of purging" checkbox and your articles will be moved to the recycle bin instead. You can then purge the recycle bin via the "database cleanup" dialog from time to time. That way all your "recycled" articles are still available in the recycle bin.

@Ac314 (Author) commented Jan 23, 2024

I have no objections. The current logic does its job.

@RetroAbstract commented Jan 23, 2024

[quoting the CSS2RSS exchange above]

Thanks for taking the time to test & explain this.

Indeed, it is a specific use-case: limiting the article count of CSS2RSS-based feeds that have no date elements for the new Date selector to grab when the feed is fetched after feed creation or after database clearing & optimization.
