Comic volume removal for same series name. #2573

Hugo-Matias · 2024-01-07T18:29:23Z

I noticed that one of my series had it's volume removed even though I have a properly populated Comicinfo.xml file for every issue.

There are 2 distinct volumes in this case 2021 and 2023 and on library creation scan it would only display the series with volume 2023 as it was the latest to be picked up.

Re-scanning would insert back the 2021 volume issues but would remove 2023's, this would switch back and forth with each scan.

I debugged the code locally to see in-depth what was going on and reached the following logic on the ProcessSeries task.

https://github.com/Kareadita/Kavita/blob/51406ddbd28bdc3b393e9b89cec6f3e55801292a/API/Services/Tasks/Scanner/ProcessSeries.cs#L563C23-L582

I'm still not sure what is the goal for this logic but I'm guessing that it makes more sense for manga, not so much for comics as there are several series with recurring names throughout the years.

Can we perhaps set a condition where this runs only if the library is of a certain type? An alternative would be to toggle this by a setting switch on the user profile.

For now I'll comment it out and see how the application behaves for comics.

DieselTech · 2024-01-07T18:33:03Z

Sounds like it's related to #2542

Hugo-Matias · 2024-01-07T18:50:30Z

On a general level I would say that it is, although this problem mainly affects Series with existing Volumes, not special issues and I'm not sure if it's exclusively related to Comics.

Having more insight on why this logic was created would clarify the scan flow and possibly allow for a more customized implementation.

majora2007 · 2024-01-07T19:32:09Z

Can you give me some filenames and folder layout so I can get a better understanding of what exactly is happening and the underlying issue?

Hugo-Matias · 2024-01-07T19:50:43Z

To provide a bit more context into the topic. I created a test library consisting of just 2 files, the first issues for Superman (1939) and Superman (2023). The goal is to create a single series (Superman) with 2 Volumes (1939 and 2023).

In the current state, the code will do the following.

Scan "Superman (1939)" folder, adding the "Superman" Series, the "1939" Volume and the # 1 Issue with file: Superman Vol.1939 # 01 (June, 1939).cbz
Scan "Superman (2023)" folder, adding "2023" Volume and the # 1 issue with file: Superman Vol.2023 # 01 (April, 2023).cbz
Notices that there is an existing Volume ("1939") that doesn't match the one currently being scanned ("2023"), it gets deleted.
This is what the piece of code I linked above does and it will log out something like the following:

[Debug] API.Services.Tasks.Scanner.ProcessSeries [ScannerService] Removed 1 volumes from Superman where parsed infos were not mapping with volume name
[Information] API.Services.Tasks.Scanner.ProcessSeries [ScannerService] Volume cleanup code was trying to remove a volume with a file still existing on disk. File: \path\to\file\Superman Vol.2023 #1 (April, 2023).cbz
[Debug] API.Services.Tasks.Scanner.ProcessSeries [ScannerService] Removed Superman - Volume 2023: \path\to\file\Superman Vol.2023 #1 (April, 2023).cbz

The UI will display this, notice a single volume is present:

If I trigger a Library Scan, it will do the same steps but since the existing library is 2023 from before, it will do the same but the opposite, keeping Volume 1939:

Commenting out those lines of code on my local dev setup and it seems to be behaving as expected now.

I don't know what implications this might have in the context of a Manga library for instance but for Comics it's a problem that might be pushing users away from this application.

Since we have access to the library object in the method that calls UpdateVolumes(), perhaps we could add another argument for the library type and bypass that logic if it's a 1 (Comics)? Adding a bool to the library settings could also work.

Kavita/API/Services/Tasks/Scanner/ProcessSeries.cs

Line 150 in 51406dd

UpdateVolumes(series, parsedInfos, forceUpdate);

For copyright reasons I can't provide the cbz files but I can share the comicinfo.xml, I scrape my library on ComicRack with the ComicVine plugin.

Test.zip

charles7668 · 2024-01-07T20:04:06Z

Superman 
│
├─Superman (1939)
│      ComicInfo.xml
│      Superman Vol.1939 # 01 (June, 1939).cbz
│
└─Superman (2023)
        ComicInfo.xml
        Superman Vol.2023 #1 (April, 2023).cbz

This folder structure will yield the same result as you expect

Hugo-Matias · 2024-01-07T20:07:40Z

Another approach would be to add the Volume to the Series name by default, like "Superman (1939), Superman (2023), etc.". This would create a distinct Series for each volume with it's own metadata. The benefit is that Issues wouldn't get merged together in the Issues tab as it's happening now as well.

This would render the series Volumes tab useless as there would always be a single one but the pros outweigh the cons in my opinion.

What about creating a setting to enable this logic? I think it would be a great addition for users that use Kavita mainly for comics, as that's a common way to display libraries on other applications.

Hugo-Matias · 2024-01-07T20:09:49Z

This folder structure will yield the same result as you expect

That's essentially the folder structure of the Test library I'm working with and it doesn't seem to work.

charles7668 · 2024-01-07T20:15:05Z

This folder structure will yield the same result as you expect

That's essentially the folder structure of the Test library I'm working with and it doesn't seem to work.

Scanfolder
└─Superman
    ├─Superman (1939)
    │      ComicInfo.xml
    │      Superman Vol.1939 # 01 (June, 1939).cbz
    │
    └─Superman (2023)
            ComicInfo.xml
            Superman Vol.2023 #1 (April, 2023).cbz

i think the problem is your scan folder is 'Superman' ? , i try this structure 'Superman' folder is under 'Scanfolder'

Hugo-Matias · 2024-01-07T20:54:38Z

I actually noticed the issue on a structure like that as that's how my Library is set up.

Library Root/Publisher/Series (Volume)/Issue Vol.----- #-- (Month, Year).cbz

I'll keep trying more files and see if some work as expected but stepping through the code with breakpoints I don't see how it would behave differently. I just setup the repo locally so I'm not familiar with the codebase yet but I don't think filename and folder structure matters much in this case as the metadata is being parsed from the ComicInfo.xml anyway.

Turtlepoker · 2024-01-12T04:12:43Z

I've has similar issue in past with series. I found the simplest workaround for this was using Series Names> Volume Name>Issues
Scanfolder
└─Superman
├─Volume 1939
│ ComicInfo.xml
│ Superman Vol.1939 # 01 (June, 1939).cbz
│
└─Volume 2023
ComicInfo.xml
Superman Vol.2023 #1 (April, 2023).cbz

halszkaraptor · 2024-02-17T05:17:52Z

I had a very similar issue related to this that felt most appropriate to add here instead of opening a new GitHub issue.

This same block of code is causing me some issues with a few Star Wars comics. In the series High Republic Adventures, there is a 2021 volume published by IDW. Then Dark Horse published 2022 and 2023 volumes of this series. The 2022 and 2023 volumes are being removed and the message "Removed 2 volumes from Star Wars: The High Republic Adventures where parsed infos were not mapping with volume name" is output from the block of code shared above.

To solve this, I am going to add the year to the series name as I don't really want to split the comics up by publishers into separate libraries.

Mildly frustrating because my library structure is the way it is based on the Kavita docs Mylar page (https://wiki.kavitareader.com/en/guides/misc/mylar) and it was unclear why these were just disappearing from the library after the initial scan. I had to dive into the log file. Everything is also matched correctly on ComicVine. This is just the way things were published unfortunately. It would be nice if the parser could handle this as it is a legitimate publishing case. Apologies of course if I am missing anything, anything is unclear or this is not the most appropriate place to share this. I'm decently new to the comic ecosystem and Kavita. I'd also be delighted to share any additional information that would be helpful.

DieselTech · 2024-02-19T20:46:04Z

@halszkaraptor

It would be nice if the parser could handle this

Work is being done to handle the parsing of comic series, that is what the initial link is pointing to: #2542 and #2714 has been opened to track the changes.

Unfortunately, it requires a lot of the core scanning code and the way Kavita handles series to be re-written. The entire data model of Kavita needs to be re-written to handle this.

To solve this, I am going to add the year to the series name as I don't really want to split the comics up by publishers into separate libraries.

Honestly I know it sucks but I would wait before making changes to your library. We are marching closer and closer to having western comics be a 1st class citizen in Kavita. If you rename your files then you would break the naming convention used in the (external) reading lists. That would stop you from using the community made lists because your series name would be different.

projectje · 2024-05-19T02:05:39Z

changing the series name to include volume or year breaks some things
doing not so only shows 1 volume

So... requires a python script that:

goes through every file in your collection,
updates the series name in the ComicInfo.xml with the extension "(space)(-)Volume N" where N is from the field Volume (hmm maybe something more explicit)

Then when this is fixed.

Run the same script but remove all (space)(-)Volume N in all ComicInfo.Xml files to bring the ComicInfo.XML again to its original state.

Following works for me:

from ComicInfo import ComicInfo

import shutil
import xml.dom.minidom
import xml.etree.ElementTree as ET
import zipfile
from tempfile import TemporaryFile

class _ComicFile:



    def __init__(self,has_existing_metadata=False,
                 metadatafilename="ComicInfo.xml",
                  archive_path="c:\temp",
                   metadata="test", *args, **kwargs):

            self.has_existing_metadata = has_existing_metadata
            self.metadatafilename = metadatafilename
            self.archive_path = archive_path
            self.metadata = metadata

    def archive_contains_comicinfo(self):
        with zipfile.ZipFile(self.archive_path, 'r') as zip_ref:
            if self.metadatafilename in zip_ref.namelist():
                self.has_existing_metadata = True
            else:
                self.has_existing_metadata = False

    def process_comic_metadata(self):
        # TEMPORARY WORKAROUND TO ADD Volume to Series Name:
        if (self.metadata.volume is not None):
          self.metadata.series = self.metadata.series + " - Volume " + str(self.metadata.volume)

        if self.has_existing_metadata:
            # Create a temporary file to store the new zip file
            with TemporaryFile() as tmpfile:
                with zipfile.ZipFile(tmpfile, 'w') as new_zip:
                    with zipfile.ZipFile(self.archive_path, 'r') as old_zip:
                        for item in old_zip.infolist():
                            data = old_zip.read(item.filename)
                            if item.filename == self.metadatafilename:
                                root = ET.fromstring(data.decode('utf-8'))
                                # Series
                                if root.find('Series') is not None:
                                    # Element exists, update its text
                                    root.find('Series').text = self.metadata.series
                                else:
                                    # Element does not exist, create a new one
                                    elem = ET.SubElement(root, 'Series')
                                    elem.text = self.metadata.series

                                # Volume
                                if root.find('Volume') is not None:
                                    # Element exists, update its text
                                    root.find('Volume').text = str(self.metadata.volume)
                                else:
                                    # Element does not exist, create a new one
                                    elem = ET.SubElement(root, 'Volume')
                                    elem.text = str(self.metadata.volume)

                                # Number
                                if root.find('Number') is not None:
                                    # Element exists, update its text
                                    root.find('Number').text = str(self.metadata.number)
                                else:
                                    # Element does not exist, create a new one
                                    elem = ET.SubElement(root, 'Number')
                                    elem.text = str(self.metadata.number)
                                data = ET.tostring(root, encoding='utf-8')
                                dom = xml.dom.minidom.parseString(data)
                                data = dom.toprettyxml()
                            new_zip.writestr(item, data)
                # Replace the old zip file with the new one
                print('updating comicinfo in ' + self.archive_path)
                shutil.move(tmpfile.name, self.archive_path)
        else:
            root = ET.Element('ComicInfo')
            series = ET.SubElement(root, 'Series')
            series.text = self.metadata.series
            volume = ET.SubElement(root, 'Volume')
            volume.text = str(self.metadata.volume)
            number = ET.SubElement(root, 'Number')
            number.text = str(self.metadata.number)

            # Convert the XML to bytes
            xml_string = ET.tostring(root, encoding='utf-8').decode('utf-8')
            with zipfile.ZipFile(self.archive_path, 'a') as zip_ref:
                # Write the new XML file to the archive
                print('writing new comicinfo to ' + self.archive_path)
                zip_ref.writestr(self.metadatafilename, xml_string)`

majora2007 · 2024-07-09T19:07:19Z

I believe the delivered Comic Vine library type solves this issue released with v0.8.0. Please let me know if not.

majora2007 closed this as completed Jul 9, 2024

majora2007 added this to Backlog Aug 23, 2024

majora2007 moved this to Done in Backlog Aug 23, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Comic volume removal for same series name. #2573

Comic volume removal for same series name. #2573

Hugo-Matias commented Jan 7, 2024

DieselTech commented Jan 7, 2024

Hugo-Matias commented Jan 7, 2024

majora2007 commented Jan 7, 2024

Hugo-Matias commented Jan 7, 2024 •

edited

Loading

charles7668 commented Jan 7, 2024

Hugo-Matias commented Jan 7, 2024

Hugo-Matias commented Jan 7, 2024

charles7668 commented Jan 7, 2024

Hugo-Matias commented Jan 7, 2024

Turtlepoker commented Jan 12, 2024

halszkaraptor commented Feb 17, 2024

DieselTech commented Feb 19, 2024

projectje commented May 19, 2024 •

edited

Loading

majora2007 commented Jul 9, 2024

Comic volume removal for same series name. #2573

Comic volume removal for same series name. #2573

Comments

Hugo-Matias commented Jan 7, 2024

DieselTech commented Jan 7, 2024

Hugo-Matias commented Jan 7, 2024

majora2007 commented Jan 7, 2024

Hugo-Matias commented Jan 7, 2024 • edited Loading

charles7668 commented Jan 7, 2024

Hugo-Matias commented Jan 7, 2024

Hugo-Matias commented Jan 7, 2024

charles7668 commented Jan 7, 2024

Hugo-Matias commented Jan 7, 2024

Turtlepoker commented Jan 12, 2024

halszkaraptor commented Feb 17, 2024

DieselTech commented Feb 19, 2024

projectje commented May 19, 2024 • edited Loading

majora2007 commented Jul 9, 2024

Hugo-Matias commented Jan 7, 2024 •

edited

Loading

projectje commented May 19, 2024 •

edited

Loading