Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Delete records from views table with millions of records #300

Open
Rattone opened this issue Jun 7, 2024 · 3 comments
Open

Delete records from views table with millions of records #300

Rattone opened this issue Jun 7, 2024 · 3 comments

Comments

@Rattone
Copy link

Rattone commented Jun 7, 2024

In one project, +6 million views were reached in less than 2 years, resulting in a table over 1GB.
What is the best way to store, for example in some table field, visits older than a year and then delete them?

@Rattone Rattone changed the title Delete records from table with millions of records Delete records from views table with millions of records Jun 7, 2024
@cyrildewit
Copy link
Owner

Hey @Rattone,

Wow that's an impressive amount of views. Approximately 8.000 views per day (if not mistaken).

When I created this package I didn't had real use-case and therefore never experienced performance issues on a large scale. I did think about this problem for some time, but never persuaded with implementing strategies to decrease the views table after a period of time. As you already suggest, after a period of time we could remove the views and store the total number for the desired resolutions: per day, per month etc. This would of course decrease the precision of the views counts, but that's the accepted trade off.

This package does not provide a solution out of the box, but you could defintely implement a pruning strategy yourself. The complexity will probably lay in making sure that the views count still make sense after combining the archived views count and the actual count.

What ideas do you have on your mind? You know your application context better than I do. If you are not using the views count query options extensively, then the solution to your problem could be easier than what I'm thinking 😄

@Rattone
Copy link
Author

Rattone commented Jun 10, 2024

@cyrildewit I did some checks today, many of the accesses are bots, about half I think. In the last period, the number of records tracked compared to those marked on Google Analytics has increased.
I haven't investigated these bots in depth but there are days when in a few minutes I receive hundreds of visits on different posts from the same user, with even 10 posts opened in the same second.

As for the prune strategy, I solved with a UPDATE SELECT query to update an "archived_views" field with COUNT(DISTINCT(visitor)) and then a DELETE query.
I only keep the details of the last 6 months, so I might need some statistics. For total_unique_views I have a scheduled task that adds archived_views and the number of unique visits from the views table. This way the table remains under 1GB :-D

@Rattone
Copy link
Author

Rattone commented Jun 10, 2024

an example
image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants