Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Question about using "guessed_protocol_id" #1957

Closed
vel21ripn opened this issue Apr 28, 2023 · 4 comments
Closed

Question about using "guessed_protocol_id" #1957

vel21ripn opened this issue Apr 28, 2023 · 4 comments
Labels

Comments

@vel21ripn
Copy link
Contributor

Consider a fairly typical option: a tcp connection to port 443.
(This is a very frequent situation for the Telegram protocol)

In this case "guessed_protocol" would be "TLS". Based on the ip address was determined "guessed_protocol_id_by_ip=Telegram".

On the first packet with payload, we determine that it is not "TLS" and it is excluded from further checks.
If the parsing of the tcp stream data failed, then the final result will be "confidence: match by ip" and "proto: TLS.Telegram", which is actually not correct, since "TLS" is missing in this case.

As a solution, you can offer this option:
do not result in "guessed_protocol" if it was excluded (flow->excluded_protocol_bitmask bitmap).

And another question: is the ndpi_detection_giveup() function in the ndpi_main.c file

  if((flow->guessed_protocol_id_by_ip != NDPI_PROTOCOL_UNKNOWN)
     && ((ret.app_protocol == NDPI_PROTOCOL_UNKNOWN) || (ret.master_protocol == NDPI_PROTOCOL_UNKNOWN))) {

    if(ret.app_protocol == NDPI_PROTOCOL_UNKNOWN)
      ndpi_int_change_protocol(ndpi_str, flow,
                               flow->guessed_protocol_id_by_ip, ret.master_protocol,
                               NDPI_CONFIDENCE_MATCH_BY_IP);
    else
      /* master_protocol == NDPI_PROTOCOL_UNKNOWN) */
      ndpi_int_change_protocol(ndpi_str, flow,
                               flow->guessed_protocol_id_by_ip, ret.app_protocol,
                               NDPI_CONFIDENCE_DPI_PARTIAL);

Why is NDPI_CONFIDENCE_DPI_PARTIAL used instead of NDPI_CONFIDENCE_MATCH_BY_IP?

@vel21ripn
Copy link
Contributor Author

@IvanNardi Please look at the traffic examples from the directory "utils/tcp_check_seq/samples_guessed_protocol" in the repository "vel21ripn/nDPI" branch "tcp_ack_padding". There are full debug output dumps (-V 4).

Note that the TLS protocol is excluded after the first packet containing data has been processed.

It is possible that there is a typo in ndpi_detection_giveup(). NDPI_CONFIDENCE_DPI_PARTIAL is specified instead of NDPI_CONFIDENCE_MATCH_BY_IP.

@IvanNardi
Copy link
Collaborator

@vel21ripn , I will try to take a look next week

IvanNardi added a commit to IvanNardi/nDPI that referenced this issue May 16, 2023
Return the "classification-by-ip" as protocol results only if no other
results are available.
In particular, never return something like
"protocol_by_port/protocol_by_ip" (i.e. `NTP/Apple`,
BitTorrent/GoogleCloud`, `Zoom/AWS`) because this kind of classification
is quite confusing, if not plainly wrong.

Notes:
* the information about "classification-by-ip" is always available, so
no information is lost with this change;
* in the unit tests, the previous classifications with confidence
`NDPI_CONFIDENCE_DPI_PARTIAL` were wrong, as noted in ntop#1957
@IvanNardi
Copy link
Collaborator

IvanNardi commented May 16, 2023

@vel21ripn, sorry for the delay.
I am going to push a PR that will change that code. In particular, the part about NDPI_CONFIDENCE_MATCH_BY_IP issue will be completely removed; that value of confidence was wrong anyway, just as you noticed.

About the first issue, let me use the same example that you reported: a Telegram connection using TCP over 443. If we are not able to properly classify it, the result TLS with confidence NDPI_CONFIDENCE_MATCH_BY_PORT seems right (with TELEGRAM in the guessed_protocol_id_by_ip field) [after my PR has been merged]

The logic that you are suggesting make sense only when nDPI is always able to classify a protocol, even if only mid-flow is provided: that is possible only for some UDP protocols (SNMP, NETFLOW, maybe DNS...). Please note that this logic is already implemented: see is_udp_not_guessable_protocol().
But we can not safely extend it to TCP/TLS: even if we excluded it, the session might still be TLS, because the dissector recognize it only with the handshake or with packets starting at the TLS Application Block boundary, and that is not always the case.

Please, take a look at #1981

lucaderi pushed a commit that referenced this issue May 17, 2023
Return the "classification-by-ip" as protocol results only if no other
results are available.
In particular, never return something like
"protocol_by_port/protocol_by_ip" (i.e. `NTP/Apple`,
BitTorrent/GoogleCloud`, `Zoom/AWS`) because this kind of classification
is quite confusing, if not plainly wrong.

Notes:
* the information about "classification-by-ip" is always available, so
no information is lost with this change;
* in the unit tests, the previous classifications with confidence
`NDPI_CONFIDENCE_DPI_PARTIAL` were wrong, as noted in #1957
@vel21ripn
Copy link
Contributor Author

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants