Multiple logs related to the reverse DNS map #542

Closed
jovimon opened this issue Aug 20, 2024 · 3 comments

Comments

@jovimon

jovimon commented Aug 20, 2024

I upgraded to version 8.12.0 and, when importing a large number of reports, I'm seeing many log entries like this:

INFO - [utils.py:340] - Loading included reverse DNS map...

With debug logging turned on, they always appear next to these log entries:

DEBUG:utils.py:422:IP address AAA.BBB.CCC.DDD reverse_dns not found
...
DEBUG:utils.py:420:IP address XXX.YYY.ZZZ.VVV added to cache

From the code, it seems that the "base_reverse_dns_map.csv" file is loaded every time a lookup is made.

Wouldn't it be better to load the file once and run all the lookups against the same instance?

I'm sorry but my programming knowledge isn't enough for me to know how to solve this.

Thank you very much.

@N4v41
Contributor

N4v41 commented Aug 29, 2024

Indeed, I've noticed that the program loads the reverse DNS map every time it runs the function `get_service_from_reverse_dns_base_domain`. Additionally, when the log level is set to info, this log entry is generated with each query. This has caused a slight slowdown when processing aggregate reports with 70k to 80k entries, predominantly from Google and Microsoft, extending the parsing time of such a report to approximately 12 to 15 minutes.

As @seanthegeek mentioned in this comment on issue 500, the purpose is to keep the map updated. However, I believe that isolating this function and loading the map just once, then using a timedelta to refresh the DNS map if it's older than one day, could significantly improve parsing performance. This change would also reduce the frequency of file loading, thus decreasing disk IO if the map is stored locally, and lessen the number of times the map is retrieved from GitHub.
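
A minimal sketch of that idea, not parsedmarc's actual code: the class `CachedReverseDNSMap`, its `loader` callable, the `load_count` counter, and the sample CSV row are all hypothetical, and the column names `base_reverse_dns`, `name`, and `type` are an assumption about the map file's layout. The map is parsed on the first lookup and reloaded only once it is older than `max_age`.

```python
import csv
import io
from datetime import datetime, timedelta, timezone


class CachedReverseDNSMap:
    """Hypothetical helper: load the map once, reload only when it is stale."""

    def __init__(self, loader, max_age=timedelta(days=1)):
        self._loader = loader      # callable returning the CSV text
        self._max_age = max_age
        self._map = None
        self._loaded_at = None
        self.load_count = 0        # exposed only to make the caching visible

    def _refresh(self, now):
        # Assumed column layout: base_reverse_dns,name,type
        reader = csv.DictReader(io.StringIO(self._loader()))
        self._map = {
            row["base_reverse_dns"].strip().lower(): {
                "name": row["name"],
                "type": row["type"],
            }
            for row in reader
        }
        self._loaded_at = now
        self.load_count += 1

    def lookup(self, base_domain, now=None):
        """Return the entry for base_domain, or a fallback with no type."""
        now = now or datetime.now(timezone.utc)
        if self._map is None or now - self._loaded_at > self._max_age:
            self._refresh(now)
        key = base_domain.strip().lower()
        return self._map.get(key, {"name": base_domain, "type": None})


# Made-up sample data standing in for the real map file.
SAMPLE_CSV = "base_reverse_dns,name,type\ngoogle.com,Google,Email Provider\n"

dns_map = CachedReverseDNSMap(loader=lambda: SAMPLE_CSV)
start = datetime(2024, 8, 29, tzinfo=timezone.utc)

for _ in range(1000):
    dns_map.lookup("google.com", now=start)
print(dns_map.load_count)  # 1: a thousand lookups, a single load

dns_map.lookup("google.com", now=start + timedelta(days=2))
print(dns_map.load_count)  # 2: the map was older than one day, so it reloaded
```

Passing the loader in as a callable keeps the sketch independent of where the map comes from, whether a local file or a download, so the same wrapper would limit both disk IO and network requests.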

@seanthegeek
Contributor

My intent was to only have the file processed at startup. I'll revisit the code on Monday.

seanthegeek added a commit that referenced this issue Sep 3, 2024
- Fix processing of SMTP-TLS reports (#549)
- Skip invalid aggregate report rows without calling the whole report invalid
  - Some providers such as GoDaddy will send reports with some rows missing a source IP address, while other rows are fine
- Fix Dovecot support by using the separator provided by the IMAP namespace when possible (PR #552 closes #551)
- Only download `base_reverse_dns_map.csv` once (fixes #542)
- Update included `base_reverse_dns_map.csv`
  - Replace University category with Education to be more inclusive
- Update included `dbip-country-lite.mmdb`
@jovimon
Author

jovimon commented Sep 9, 2024

Thank you very much @N4v41 for the additional research and @seanthegeek for the patch.
