Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Comic volume removal for same series name. #2573

Closed
Hugo-Matias opened this issue Jan 7, 2024 · 14 comments
Closed

Comic volume removal for same series name. #2573

Hugo-Matias opened this issue Jan 7, 2024 · 14 comments

Comments

@Hugo-Matias
Copy link

I noticed that one of my series had it's volume removed even though I have a properly populated Comicinfo.xml file for every issue.

There are 2 distinct volumes in this case 2021 and 2023 and on library creation scan it would only display the series with volume 2023 as it was the latest to be picked up.

Re-scanning would insert back the 2021 volume issues but would remove 2023's, this would switch back and forth with each scan.

I debugged the code locally to see in-depth what was going on and reached the following logic on the ProcessSeries task.

https://github.com/Kareadita/Kavita/blob/51406ddbd28bdc3b393e9b89cec6f3e55801292a/API/Services/Tasks/Scanner/ProcessSeries.cs#L563C23-L582

I'm still not sure what is the goal for this logic but I'm guessing that it makes more sense for manga, not so much for comics as there are several series with recurring names throughout the years.

Can we perhaps set a condition where this runs only if the library is of a certain type? An alternative would be to toggle this by a setting switch on the user profile.

For now I'll comment it out and see how the application behaves for comics.

@DieselTech
Copy link
Collaborator

Sounds like it's related to #2542

@Hugo-Matias
Copy link
Author

On a general level I would say that it is, although this problem mainly affects Series with existing Volumes, not special issues and I'm not sure if it's exclusively related to Comics.

Having more insight on why this logic was created would clarify the scan flow and possibly allow for a more customized implementation.

@majora2007
Copy link
Member

Can you give me some filenames and folder layout so I can get a better understanding of what exactly is happening and the underlying issue?

@Hugo-Matias
Copy link
Author

Hugo-Matias commented Jan 7, 2024

To provide a bit more context into the topic. I created a test library consisting of just 2 files, the first issues for Superman (1939) and Superman (2023). The goal is to create a single series (Superman) with 2 Volumes (1939 and 2023).

In the current state, the code will do the following.

  • Scan "Superman (1939)" folder, adding the "Superman" Series, the "1939" Volume and the # 1 Issue with file: Superman Vol.1939 # 01 (June, 1939).cbz
  • Scan "Superman (2023)" folder, adding "2023" Volume and the # 1 issue with file: Superman Vol.2023 # 01 (April, 2023).cbz
  • Notices that there is an existing Volume ("1939") that doesn't match the one currently being scanned ("2023"), it gets deleted.
    This is what the piece of code I linked above does and it will log out something like the following:

[Debug] API.Services.Tasks.Scanner.ProcessSeries [ScannerService] Removed 1 volumes from Superman where parsed infos were not mapping with volume name
[Information] API.Services.Tasks.Scanner.ProcessSeries [ScannerService] Volume cleanup code was trying to remove a volume with a file still existing on disk. File: \path\to\file\Superman Vol.2023 #1 (April, 2023).cbz
[Debug] API.Services.Tasks.Scanner.ProcessSeries [ScannerService] Removed Superman - Volume 2023: \path\to\file\Superman Vol.2023 #1 (April, 2023).cbz

The UI will display this, notice a single volume is present:

1

If I trigger a Library Scan, it will do the same steps but since the existing library is 2023 from before, it will do the same but the opposite, keeping Volume 1939:

Captura de ecrã 2024-01-07 191404

Commenting out those lines of code on my local dev setup and it seems to be behaving as expected now.

Captura de ecrã 2024-01-07 191706

I don't know what implications this might have in the context of a Manga library for instance but for Comics it's a problem that might be pushing users away from this application.

Since we have access to the library object in the method that calls UpdateVolumes(), perhaps we could add another argument for the library type and bypass that logic if it's a 1 (Comics)? Adding a bool to the library settings could also work.

UpdateVolumes(series, parsedInfos, forceUpdate);

For copyright reasons I can't provide the cbz files but I can share the comicinfo.xml, I scrape my library on ComicRack with the ComicVine plugin.

Test.zip

@charles7668
Copy link
Contributor

Superman 
│
├─Superman (1939)
│      ComicInfo.xml
│      Superman Vol.1939 # 01 (June, 1939).cbz
│
└─Superman (2023)
        ComicInfo.xml
        Superman Vol.2023 #1 (April, 2023).cbz  

This folder structure will yield the same result as you expect

@Hugo-Matias
Copy link
Author

Another approach would be to add the Volume to the Series name by default, like "Superman (1939), Superman (2023), etc.". This would create a distinct Series for each volume with it's own metadata. The benefit is that Issues wouldn't get merged together in the Issues tab as it's happening now as well.

This would render the series Volumes tab useless as there would always be a single one but the pros outweigh the cons in my opinion.

What about creating a setting to enable this logic? I think it would be a great addition for users that use Kavita mainly for comics, as that's a common way to display libraries on other applications.

@Hugo-Matias
Copy link
Author

This folder structure will yield the same result as you expect

That's essentially the folder structure of the Test library I'm working with and it doesn't seem to work.

@charles7668
Copy link
Contributor

This folder structure will yield the same result as you expect

That's essentially the folder structure of the Test library I'm working with and it doesn't seem to work.

Scanfolder
└─Superman
    ├─Superman (1939)
    │      ComicInfo.xml
    │      Superman Vol.1939 # 01 (June, 1939).cbz
    │
    └─Superman (2023)
            ComicInfo.xml
            Superman Vol.2023 #1 (April, 2023).cbz

i think the problem is your scan folder is 'Superman' ? , i try this structure 'Superman' folder is under 'Scanfolder'

@Hugo-Matias
Copy link
Author

I actually noticed the issue on a structure like that as that's how my Library is set up.

Library Root/Publisher/Series (Volume)/Issue Vol.----- #-- (Month, Year).cbz

I'll keep trying more files and see if some work as expected but stepping through the code with breakpoints I don't see how it would behave differently. I just setup the repo locally so I'm not familiar with the codebase yet but I don't think filename and folder structure matters much in this case as the metadata is being parsed from the ComicInfo.xml anyway.

@Turtlepoker
Copy link

I've has similar issue in past with series. I found the simplest workaround for this was using Series Names> Volume Name>Issues
Scanfolder
└─Superman
├─Volume 1939
│ ComicInfo.xml
│ Superman Vol.1939 # 01 (June, 1939).cbz

└─Volume 2023
ComicInfo.xml
Superman Vol.2023 #1 (April, 2023).cbz

@halszkaraptor
Copy link

I had a very similar issue related to this that felt most appropriate to add here instead of opening a new GitHub issue.

This same block of code is causing me some issues with a few Star Wars comics. In the series High Republic Adventures, there is a 2021 volume published by IDW. Then Dark Horse published 2022 and 2023 volumes of this series. The 2022 and 2023 volumes are being removed and the message "Removed 2 volumes from Star Wars: The High Republic Adventures where parsed infos were not mapping with volume name" is output from the block of code shared above.

To solve this, I am going to add the year to the series name as I don't really want to split the comics up by publishers into separate libraries.

Mildly frustrating because my library structure is the way it is based on the Kavita docs Mylar page (https://wiki.kavitareader.com/en/guides/misc/mylar) and it was unclear why these were just disappearing from the library after the initial scan. I had to dive into the log file. Everything is also matched correctly on ComicVine. This is just the way things were published unfortunately. It would be nice if the parser could handle this as it is a legitimate publishing case. Apologies of course if I am missing anything, anything is unclear or this is not the most appropriate place to share this. I'm decently new to the comic ecosystem and Kavita. I'd also be delighted to share any additional information that would be helpful.

@DieselTech
Copy link
Collaborator

@halszkaraptor

It would be nice if the parser could handle this

Work is being done to handle the parsing of comic series, that is what the initial link is pointing to: #2542 and #2714 has been opened to track the changes.

Unfortunately, it requires a lot of the core scanning code and the way Kavita handles series to be re-written. The entire data model of Kavita needs to be re-written to handle this.

To solve this, I am going to add the year to the series name as I don't really want to split the comics up by publishers into separate libraries.

Honestly I know it sucks but I would wait before making changes to your library. We are marching closer and closer to having western comics be a 1st class citizen in Kavita. If you rename your files then you would break the naming convention used in the (external) reading lists. That would stop you from using the community made lists because your series name would be different.

@projectje
Copy link

projectje commented May 19, 2024

  • changing the series name to include volume or year breaks some things
  • doing not so only shows 1 volume

So... requires a python script that:

  • goes through every file in your collection,
  • updates the series name in the ComicInfo.xml with the extension "(space)(-)Volume N" where N is from the field Volume (hmm maybe something more explicit)

Then when this is fixed.

  • Run the same script but remove all (space)(-)Volume N in all ComicInfo.Xml files to bring the ComicInfo.XML again to its original state.

Following works for me:


from ComicInfo import ComicInfo

import shutil
import xml.dom.minidom
import xml.etree.ElementTree as ET
import zipfile
from tempfile import TemporaryFile

class _ComicFile:



    def __init__(self,has_existing_metadata=False,
                 metadatafilename="ComicInfo.xml",
                  archive_path="c:\temp",
                   metadata="test", *args, **kwargs):

            self.has_existing_metadata = has_existing_metadata
            self.metadatafilename = metadatafilename
            self.archive_path = archive_path
            self.metadata = metadata

    def archive_contains_comicinfo(self):
        with zipfile.ZipFile(self.archive_path, 'r') as zip_ref:
            if self.metadatafilename in zip_ref.namelist():
                self.has_existing_metadata = True
            else:
                self.has_existing_metadata = False

    def process_comic_metadata(self):
        # TEMPORARY WORKAROUND TO ADD Volume to Series Name:
        if (self.metadata.volume is not None):
          self.metadata.series = self.metadata.series + " - Volume " + str(self.metadata.volume)

        if self.has_existing_metadata:
            # Create a temporary file to store the new zip file
            with TemporaryFile() as tmpfile:
                with zipfile.ZipFile(tmpfile, 'w') as new_zip:
                    with zipfile.ZipFile(self.archive_path, 'r') as old_zip:
                        for item in old_zip.infolist():
                            data = old_zip.read(item.filename)
                            if item.filename == self.metadatafilename:
                                root = ET.fromstring(data.decode('utf-8'))
                                # Series
                                if root.find('Series') is not None:
                                    # Element exists, update its text
                                    root.find('Series').text = self.metadata.series
                                else:
                                    # Element does not exist, create a new one
                                    elem = ET.SubElement(root, 'Series')
                                    elem.text = self.metadata.series

                                # Volume
                                if root.find('Volume') is not None:
                                    # Element exists, update its text
                                    root.find('Volume').text = str(self.metadata.volume)
                                else:
                                    # Element does not exist, create a new one
                                    elem = ET.SubElement(root, 'Volume')
                                    elem.text = str(self.metadata.volume)

                                # Number
                                if root.find('Number') is not None:
                                    # Element exists, update its text
                                    root.find('Number').text = str(self.metadata.number)
                                else:
                                    # Element does not exist, create a new one
                                    elem = ET.SubElement(root, 'Number')
                                    elem.text = str(self.metadata.number)
                                data = ET.tostring(root, encoding='utf-8')
                                dom = xml.dom.minidom.parseString(data)
                                data = dom.toprettyxml()
                            new_zip.writestr(item, data)
                # Replace the old zip file with the new one
                print('updating comicinfo in ' + self.archive_path)
                shutil.move(tmpfile.name, self.archive_path)
        else:
            root = ET.Element('ComicInfo')
            series = ET.SubElement(root, 'Series')
            series.text = self.metadata.series
            volume = ET.SubElement(root, 'Volume')
            volume.text = str(self.metadata.volume)
            number = ET.SubElement(root, 'Number')
            number.text = str(self.metadata.number)

            # Convert the XML to bytes
            xml_string = ET.tostring(root, encoding='utf-8').decode('utf-8')
            with zipfile.ZipFile(self.archive_path, 'a') as zip_ref:
                # Write the new XML file to the archive
                print('writing new comicinfo to ' + self.archive_path)
                zip_ref.writestr(self.metadatafilename, xml_string)`

@majora2007
Copy link
Member

I believe the delivered Comic Vine library type solves this issue released with v0.8.0. Please let me know if not.

@majora2007 majora2007 moved this to Done in Backlog Aug 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

No branches or pull requests

7 participants