Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed libmagic bug #490

Merged
merged 2 commits into from
Jun 14, 2023
Merged

Fixed libmagic bug #490

merged 2 commits into from
Jun 14, 2023

Conversation

sinha-toyeesh
Copy link
Contributor

Referring to issue #206 , Engrampa uses Libmagic only if extension is not known, otherwise it determines mime type from the file extension, even though the --enable-magic flag is used, along with libmagic support.
This pull request adds an if condition to use libmagic if --enable-magic if used.

Copy link
Member

@zhuyaliang zhuyaliang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree to merge

Copy link
Member

@lukefromdc lukefromdc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This BROKE opening of one of my Changelog.Debian.gz files (which I edit a lot in packing changelogs into .debs) which contain a single ChangeLog.Debian text file

Test file included, this one had some minor issues as a Debian changelog but engrampa never had trouble with this file or any similar file without this PR applied

Changelog.Debian.gz

@sinha-toyeesh
Copy link
Contributor Author

sinha-toyeesh commented Jun 12, 2023

Could this be a local issue ? it seems to work on my machine.
2023-06-13_04-35

Verification would be appreciated

@lukefromdc
Copy link
Member

lukefromdc commented Jun 12, 2023 via email

@zhuyaliang
Copy link
Member

@lukefromdc Could you show me the error message

@lukefromdc
Copy link
Member

This is what I got on clicking on any of those Changelog.Debian.gz files, or trying to open them from terminal in engrampa:

Engrampa_Error

@zhuyaliang
Copy link
Member

@lukefromdc Can you help me run commands from the terminal command line and view the output

mkdir /tmp/test-engrampa 
cp -rf Changelog.Debian.gz /tmp/test-engrampa/
cd /tmp/test-engrampa
engrampa -h  Changelog.Debian.gz

src/fr-archive.c Outdated Show resolved Hide resolved
@cwendling
Copy link
Member

Actually I can reproduce @lukefromdc's problem. And looking into it is that libmagic returns application/gzip, whereas the builtin lookup returns application/x-gzip.

IMO, there's 2 things to do here:

  1. do as @zhuyaliang suggests in the patch comments, his changes work (and make sense: use the fallback but the other way around)
  2. as a real solution, stop using strcmp/sncasecmp to match MIME types, but either do it manually and accept CATEGORY/["x-"]NAME, or probably smarter, use g_content_type_equals().

I am not sure of all the consequences, but these changes fixes it for me:

diff --git a/src/file-utils.c b/src/file-utils.c
index 4983a7b..6d4f328 100644
--- a/src/file-utils.c
+++ b/src/file-utils.c
@@ -555,7 +555,7 @@ gboolean
 is_mime_type (const char *mime_type,
              const char *pattern)
 {
-       return (strcasecmp (mime_type, pattern) == 0);
+       return g_content_type_equals (mime_type, pattern);
 }
 
 const char*
diff --git a/src/fr-init.c b/src/fr-init.c
index c018bfa..c7a876d 100644
--- a/src/fr-init.c
+++ b/src/fr-init.c
@@ -292,7 +292,7 @@ fr_registered_command_get_capabilities (FrRegisteredCommand *reg_com,
                FrMimeTypeCap *cap;
 
                cap = g_ptr_array_index (reg_com->caps, i);
-               if (strcmp (mime_type, cap->mime_type) == 0)
+               if (is_mime_type (mime_type, cap->mime_type))
                        return cap->current_capabilities;
        }
 

Additionally, I'd think we'd need this (but it's not required to open the file)

diff --git a/src/fr-init.c b/src/fr-init.c
index c018bfa..c7a876d 100644
--- a/src/fr-init.c
+++ b/src/fr-init.c
@@ -312,7 +312,7 @@ fr_registered_command_get_potential_capabilities (FrRegisteredCommand *reg_com,
                FrMimeTypeCap *cap;
 
                cap = g_ptr_array_index (reg_com->caps, i);
-               if ((cap->mime_type != NULL) && (strcmp (mime_type, cap->mime_type) == 0))
+               if ((cap->mime_type != NULL) && (is_mime_type (mime_type, cap->mime_type)))
                        return cap->potential_capabilities;
        }
 
@@ -508,7 +508,7 @@ get_mime_type_index (const char *mime_type)
        int i;
 
        for (i = 0; mime_type_desc[i].mime_type != NULL; i++)
-               if (strcmp (mime_type_desc[i].mime_type, mime_type) == 0)
+               if (is_mime_type (mime_type_desc[i].mime_type, mime_type))
                        return i;
        return -1;
 }

@sinha-toyeesh
Copy link
Contributor Author

Thanks @cwendling will take a look at this and update the PR

@lukefromdc lukefromdc self-requested a review June 14, 2023 03:35
Copy link
Member

@lukefromdc lukefromdc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This now works fine on my end after the latest changes, no more problem with gzipped Debian changelog files.

@lukefromdc lukefromdc merged commit 8e33e60 into mate-desktop:master Jun 14, 2023
@cwendling
Copy link
Member

@lukefromdc thanks for testing, but I think merging was a tad too fast here :) No need to revert though, but I think we might need to look at this a tad more carefully.

@sinha-toyeesh did you check all the implications of my suggested changes? As I mentioned, I'm not sure if this has side effects, like how it handles MIME aliases. I'm not sure it poses any problem, but as there is special handling for them, it might be worth checking a bit what's supposed to happen, and whether my suggestions interferes or not.

Also, we should really integrate @zhuyaliang's suggested change, because right now when using libmagic the logic is like if (!A && !B && !A) fail(), which doesn't make sense, A has no chance of working the second time. OTOH, the non-libmagic path does if (!A && !B && !C) fail(), which makes sense, and @zhuyaliang suggested just swapping A and C in the libmagic path, making it if (!C && !B && !A) fail().

@lukefromdc
Copy link
Member

lukefromdc commented Jun 14, 2023

If this was a screwup, let's create a second PR with the cleanups in it and get this done right. How the binaries put out by distros work for end users is more important than how neat the changelogs look.

We could really use more testers here, as all too often PR's don't get reviewed or there are not enough testers to find the corner cases. Years ago, GNOME's development versions were distributed with the alpha versions of Ubuntu, I don't know if rolling releases like Debian Sid also distributed them. Ideally that would be resumed as it would make bugs getting all the way to releases in final versions of distros much less likely.

@sinha-toyeesh
Copy link
Contributor Author

@cwendling I made the recommended changes as suggested by @zhuyaliang, as I too realised the fallacy of checking for the same thing twice in a single function, which seemed a tad bit wierd to me.
Regarding testing, as I mentioned beforehand, I couldn't really recreate the bug mentioned by @lukefromdc, so it was out of the question on my end.

@sinha-toyeesh
Copy link
Contributor Author

I wholeheartedly agree with @lukefromdc regarding both of his points, especially so on the topic of additional testers ;)

cwendling added a commit that referenced this pull request Jun 14, 2023
A blooper has been made there:

* if ENABLE_MIME is set, the intention was to try, in order: magic,
  content, filename; but it was made filename, content, magic (which
  was the same as before the changes);
* if ENABLE_MIME is not set, the intention was to try, in order:
  filename, content, magic; but it has been made magic, content, magic
  (notice the duplicate, and the missing "filename").

This probably doesn't change much in the wild as magic is gonna work
most of the time, but it's especially problematic that the non-libmagic
case doesn't have the filename test.

Anyway, fix this so the code is consistent, and we retain the behavior
for the non-libmagic case, and have the new expected one for the
libmagic case.
@cwendling
Copy link
Member

@cwendling I made the recommended changes as suggested by @zhuyaliang, as I too realised the fallacy of checking for the same thing twice in a single function, which seemed a tad bit wierd to me.

Unfortunately no, you didn't. Look at the changes again, you actually made a blunder and only changed the non-libmagic code path making it never try filename, rather than making the libmagic one prefer magics. See #491 for what I believe is the intended behavior.

Regarding testing, as I mentioned beforehand, I couldn't really recreate the bug mentioned by @lukefromdc, so it was out of the question on my end.

The libmagic case maybe (although it probably ultimately depends on the version of the magic database, for which @lukefromdc and I probably have a more recent version); but the question I raised is the other implications of the g_content_type_equals() usage and is_mime_type() generalization. To evaluate this, it "only" requires carefully looking at the code and seeing what a non-match leads to, and from there try and test those case, and see if they are relevant -- or made obsolete by the fact they we already matched the MIME type.

Anyhow, the changes I suggested seemed fine to me, but then again I didn't spend enough time evaluating the consequences. If we have enough testing, maybe it's enough? we'll see :)

@lukefromdc
Copy link
Member

The followup PR didn't create any problems on my end with a variety of files including the previous offenders

zhuyaliang pushed a commit that referenced this pull request Jun 15, 2023
A blooper has been made there:

* if ENABLE_MIME is set, the intention was to try, in order: magic,
  content, filename; but it was made filename, content, magic (which
  was the same as before the changes);
* if ENABLE_MIME is not set, the intention was to try, in order:
  filename, content, magic; but it has been made magic, content, magic
  (notice the duplicate, and the missing "filename").

This probably doesn't change much in the wild as magic is gonna work
most of the time, but it's especially problematic that the non-libmagic
case doesn't have the filename test.

Anyway, fix this so the code is consistent, and we retain the behavior
for the non-libmagic case, and have the new expected one for the
libmagic case.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants