diff --git a/README.md b/README.md index 592332e31..a8e8fe902 100644 --- a/README.md +++ b/README.md @@ -49,8 +49,9 @@ TagStudio is a photo & file organization application with an underlying system t - Description, Notes (Multiline Text Fields) - Tags, Meta Tags, Content Tags (Tag Boxes) - Create rich tags composed of a name, a list of aliases, and a list of “subtags” - being tags in which these tags inherit values from. -- Search for entries based on tags, ~~metadata~~ (TBA), or filenames/filetypes (using `filename: `) -- Special search conditions for entries that are: `untagged`/`no tags` and `empty`/`no fields`. +- Search for entries based on tags, ~~metadata~~ (TBA), or filenames/filetypes (using `filename:`) +- Special search conditions for entries that are: `untagged`/`no_tags`, `empty`/`no_fields`, `no_author`/`no_artist`, and `missing`/`no_file`. +- Search for entries using Boolean expressions. (See the [search cheat-sheet](#search-cheat-sheet) section for more) > [!NOTE] > For more information on the project itself, please see the [FAQ](#faq) section as well as the [documentation](/doc/index.md). @@ -133,6 +134,65 @@ Inevitably, some of the files inside your library will be renamed, moved, or del Libraries are saved upon exiting the program. To manually save, select File -> Save Library from the menu bar. To save a backup of your library, select File -> Save Library Backup from the menu bar. +### Search Cheat-Sheet + +#### The Basics + +After loading a tagged library, enter your search into the bar that says `Search Entries` at the top of the window. Every tag needs a space after it. If your tag contains spaces between words, substitute underscores _ for the spaces. Capitalization doesn't affect tags and searches. After you have typed your search, press enter or click the `Search` button to the right of the search bar. It may take a moment for the search to complete. 
After you have made a few searches, you can use the arrows `<` `>` on the left-hand side of the search bar to bring back previous searches. +- **dog favorites** searches favorites for any entry tagged "dog" +- **fat_cat dress_up** searches entries for both the "fat cat" and "dress up" tags + +#### Search Modes + +On the right side of the window, just below the search bar, there is a dropdown to choose the search mode. The options are `And (Includes All Tags)` and `Or (Includes Any Tag)`. In And mode, if you have a list of search terms in the search bar, your search will try to match entries fitting all the search terms in the list. In Or mode, your search will try to match entries fitting even just one of the search terms. + +#### Optional Terms and Partial Match Terms + +If you have a search list with many search terms, you may use tilde ~ before a term to mark it as an optional term in And mode, or as a partial match term in Or mode. +- In And mode, an entry with even just one of the tilde ~ marked optional terms can match a search list that uses them, so long as the entry matches all other terms in the list. + - **costume ~witch ~skeleton** matches all entries tagged "costume" and "witch", and all entries tagged "costume" and "skeleton". +- In Or mode, an entry with all of the tilde ~ marked partial match terms will match a search list that uses them, even if the entry matches no other terms in the list. + - **lake river ~indoors ~life_vest** matches all entries tagged "lake", tagged "river", or tagged with both "indoors" and "life vest". + +> [!NOTE] +> For simplicity, the remaining search examples are written for And mode unless otherwise specified. + +#### Exclude Terms + +Putting minus -, exclamation mark !, or "not" before a term inverts it: the search then matches only when the term doesn't. "not" needs a space after it. Tilde ~ will have no effect unless it comes before the exclude indicator. +- **-golf** matches any entry that doesn't have the "golf" tag. 
+- **party -birthday** matches any entry that has the "party" tag, but that does not have the "birthday" tag. + +#### Parentheses + +Surrounding a list of search terms with parentheses () makes it act like a single term, and allows for more complicated nesting of tags. Every parenthesis needs a space after it, or it will be interpreted as being part of a tag. Square brackets \[\] and curly braces {} can be used too. +- **woods -( scary ~weapon ~animal )** matches any entry with the "woods" tag, unless it is also tagged "scary" and "weapon", or "scary" and "animal". + +#### Other Operators + +It is also possible to use various Boolean operators directly when combining tags instead of relying on tags to be combined automatically. These operators are evaluated from left to right, after parentheses and exclusion operators, but before the implicit operations in lists of search terms. Every one of these Boolean operators needs a space after it, or it will be interpreted as being part of a tag. Many operators are supported, each with several accepted spellings. Here is the list: "and", "^", "&", "&&", "or", "v", "|", "||", "nor", "nand", "xor", "!=", "!=\=", "xnor", "=", "=\=", or "=\=\=". +- **sad == dark_clothes and photograph** matches any entry tagged "photograph" that is tagged with both "sad" and "dark clothes", or with neither. + +#### Metatags + +There are some terms that can be matched even without the need to be specifically tagged ahead of time. The following are the supported metatags: +- **untagged**/**no_tags** whether the entry has no tags at all yet. +- **empty**/**no_fields** whether the entry has no fields at all, tag or otherwise. This covers text lines, text boxes, dates, and more. +- **no_author**/**no_artist** whether the "Author" or "Artist" fields aren't present in the entry. +- **filename:** matches text anywhere in the file's subdirectory and name, relative to the library's path. 
Here are some usage examples: + - **filename:copy.png** matches any file whose path contains "copy.png" (case-insensitive), regardless of directory, such as "shared img \- Copy.png" + - **filename:subdir1\subdir2** matches any file in subdir2, such as "subdir1\subdir2\subdir3\hidden.gif" + - **filename:subdir3\photo.jpg** matches any file named photo.jpg in a folder called "subdir3", even if the path to it from the library directory is something like "subdir1\subdir2\subdir3\photo.jpg". +- **tag_id:** matches any entry with a tag whose internal id matches the number. Click on a tag in the preview pane on the right to replace the search with a tag_id expression for that tag. + - **tag_id:1001** matches any entry tagged with the first custom tag created for the library. +- Hopefully more coming in the future! + +#### Escape Characters + +If the name of one of your tags overlaps with the search syntax, then put a backslash \\ or a forward slash / just before the tag to ensure it's treated like a tag. +- **~clip_art ~stock_photo \\\~cute~** matches entries tagged "clip art" and "\~cute~" or "stock photo" and "\~cute~". Without the backslash, the tilde ~ at the start of the "\~cute~" tag would mistakenly result in the mandatory "\~cute~" tag being interpreted as an optional "cute~" tag, missing the first tilde. +- **transparent -\\empty -roses** matches entries tagged "transparent", but not tagged "empty" or "roses". Without the backslash, the "empty" tag would be interpreted as a metatag instead of a tag, so the results would mistakenly include entries tagged "empty", since those entries do have fields and therefore pass the exclusion. 
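The Optional/Partial Match rules above are subtle, so here is a precise restatement as a short runnable sketch. These are hypothetical helper functions written for this cheat-sheet, not TagStudio code; each term is modeled as a `(matches, is_tilde)` pair, and the loop logic is checked against the expanded Boolean forms:

```python
from itertools import product

def match_and(terms: list[tuple[bool, bool]]) -> bool:
    """And mode: every plain term must match, and if any ~terms exist,
    at least one of the ~terms must match too."""
    uses_optional = fulfils_optional = False
    for matches, is_tilde in terms:
        if is_tilde:
            uses_optional = True
            if matches:
                fulfils_optional = True
        elif not matches:
            return False
    return not uses_optional or fulfils_optional

def match_or(terms: list[tuple[bool, bool]]) -> bool:
    """Or mode: any plain term matching is enough; failing that,
    all ~terms (if any exist) must match together."""
    uses_partial, fulfils_partial = False, True
    for matches, is_tilde in terms:
        if is_tilde:
            uses_partial = True
            if not matches:
                fulfils_partial = False
        elif matches:
            return True
    return uses_partial and fulfils_partial

# Check against the expanded forms for a search like "t1 ~t2 t3 ~t4":
for t1, t2, t3, t4 in product([False, True], repeat=4):
    terms = [(t1, False), (t2, True), (t3, False), (t4, True)]
    assert match_and(terms) == (t1 and t3 and (t2 or t4))
    assert match_or(terms) == (t1 or t3 or (t2 and t4))
```

For example, `costume ~witch ~skeleton` in And mode corresponds to `costume and (witch or skeleton)`.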
+ ### Half-Implemented Features #### Fix Duplicate Files @@ -184,7 +244,6 @@ Of the several features I have planned for the project, these are broken up into - Improved search - Sortable Search - - Boolean Search - Coexisting Text + Tag Search - Searchable File Metadata - Comprehensive Tag management tab diff --git a/doc/updates/planned_features.md b/doc/updates/planned_features.md index 12f892029..14b8d4c32 100644 --- a/doc/updates/planned_features.md +++ b/doc/updates/planned_features.md @@ -19,7 +19,6 @@ The following lists outline the planned major and minor features for TagStudio, - Settings Menu - Custom User Colors - Search Engine Rework - - Boolean Search - Tag Objects In Search - Search For Fields - Sortable Search Results diff --git a/tagstudio/src/core/library.py b/tagstudio/src/core/library.py index 7eedb8b82..3a207cd74 100644 --- a/tagstudio/src/core/library.py +++ b/tagstudio/src/core/library.py @@ -19,9 +19,10 @@ from src.core.enums import FieldID from src.core.json_typing import JsonCollation, JsonEntry, JsonLibary, JsonTag -from src.core.utils.str import strip_punctuation +from src.core.utils.str import replace_whitespace from src.core.utils.web import strip_web_protocol from src.core.enums import SearchMode +from src.core.search import SearchQuery from src.core.constants import ( BACKUP_FOLDER_NAME, COLLAGE_FOLDER_NAME, @@ -1327,7 +1328,7 @@ def get_entry_id_from_filepath(self, filename: Path): def search_library( self, - query: str = None, + query_string: str = None, entries=True, collations=True, tag_groups=True, @@ -1338,209 +1339,81 @@ def search_library( Returns a list of (str, int) tuples consisting of a result type and ID. 
""" - # self.filtered_entries.clear() results: list[tuple[ItemType, int]] = [] collations_added = [] - # print(f"Searching Library with query: {query} search_mode: {search_mode}") - if query: - # start_time = time.time() - query = query.strip().lower() - query_words: list[str] = query.split(" ") - all_tag_terms: list[str] = [] - only_untagged: bool = "untagged" in query or "no tags" in query - only_empty: bool = "empty" in query or "no fields" in query - only_missing: bool = "missing" in query or "no file" in query - allow_adv: bool = "filename:" in query_words - tag_only: bool = "tag_id:" in query_words - if allow_adv: - query_words.remove("filename:") - if tag_only: - query_words.remove("tag_id:") - # TODO: Expand this to allow for dynamic fields to work. - only_no_author: bool = "no author" in query or "no artist" in query - - # Preprocess the Tag terms. - if query_words: - # print(query_words, self._tag_strings_to_id_map) - for i, term in enumerate(query_words): - for j, term in enumerate(query_words): - if ( - query_words[i : j + 1] - and " ".join(query_words[i : j + 1]) - in self._tag_strings_to_id_map - ): - all_tag_terms.append(" ".join(query_words[i : j + 1])) - # print(all_tag_terms) - - # This gets rid of any accidental term inclusions because they were words - # in another term. Ex. "3d" getting added in "3d art" - for i, term in enumerate(all_tag_terms): - for j, term2 in enumerate(all_tag_terms): - if i != j and all_tag_terms[i] in all_tag_terms[j]: - # print( - # f'removing {all_tag_terms[i]} because {all_tag_terms[i]} was in {all_tag_terms[j]}') - all_tag_terms.remove(all_tag_terms[i]) - break - - # print(all_tag_terms) + if query_string: + # SearchQuery is not intended to have any direct access to + # this library instance or to its entries. That's in order + # to minimize refactoring if it is reprogrammed to return an + # SQL query instead of evaluating entries itself. 
+ search_query = SearchQuery(query_string, search_mode) + + # By default, SearchQuery does not know the ID of any of its + # tags, or the IDs of any of their child tags. + # share_tag_requests() returns a list of potential tag + # strings that the SearchQuery may need to know the IDs of + # in order to evaluate an entry. + tags_to_identify: list[str] = search_query.share_tag_requests() + tag_text_to_id_clusters: dict[str, list[int]] = {} + for tag_text in tags_to_identify: + cluster: set[int] = set() + + # Add the immediate associated Tags to the set (ex. Name, Alias hits) + # Since this term could technically map to multiple IDs, iterate over it + # (You're 99.9999999% likely to just get 1 item) + if tag_text in self._tag_strings_to_id_map: + for id in self._tag_strings_to_id_map[tag_text]: + cluster.add(id) + cluster = cluster.union(set(self.get_tag_cluster(id))) + + tag_text_to_id_clusters[tag_text] = list(cluster) + + fields_to_identify: set[str] = search_query.share_field_requests() + used_fields: set[str] = set() + for field in self.default_fields: + field_name = field["name"] + field_name = replace_whitespace(field_name) + field_name = field_name.lower() + if field_name in fields_to_identify: + used_fields.add(field_name) + + search_query.receive_requested_lib_info( + tag_text_to_id_clusters, used_fields + ) - # non_entry_count = 0 - # Iterate over all Entries ============================================================= + # This loop evaluates the search query against each entry + # and adds the entry to results if it matches the search. for entry in self.entries: - allowed_ext: bool = entry.filename.suffix.lower() not in self.ext_list - # try: - # entry: Entry = self.entries[self.file_to_library_index_map[self._source_filenames[i]]] - # print(f'{entry}') + if self.is_exclude_list == (entry.filename.suffix.lower() in self.ext_list): + # The current entry's file extension is excluded by (or, for + # an include list, missing from) this Library's extension + # list, so skip the entry. 
+ continue - if allowed_ext == self.is_exclude_list: - # If the entry has tags of any kind, append them to this main tag list. - entry_tags: list[int] = [] - entry_authors: list[str] = [] - if entry.fields: - for field in entry.fields: - field_id = list(field.keys())[0] - if self.get_field_obj(field_id)["type"] == "tag_box": - entry_tags.extend(field[field_id]) - if self.get_field_obj(field_id)["name"] == "Author": - entry_authors.extend(field[field_id]) - if self.get_field_obj(field_id)["name"] == "Artist": - entry_authors.extend(field[field_id]) - - # print(f'Entry Tags: {entry_tags}') - - # Add Entries from special flags ------------------------------- - # TODO: Come up with a more user-resistent way to 'archived' and 'favorite' tags. - if only_untagged: - if not entry_tags: - results.append((ItemType.ENTRY, entry.id)) - elif only_no_author: - if not entry_authors: - results.append((ItemType.ENTRY, entry.id)) - elif only_empty: - if not entry.fields: - results.append((ItemType.ENTRY, entry.id)) - elif only_missing: - if ( - self.library_dir / entry.path / entry.filename - ).resolve() in self.missing_files: - results.append((ItemType.ENTRY, entry.id)) - - # elif query == "archived": - # if entry.tags and self._tag_names_to_tag_id_map[self.archived_word.lower()][0] in entry.tags: - # self.filtered_file_list.append(file) - # pb.value = len(self.filtered_file_list) - # elif query in entry.path.lower(): - - # NOTE: This searches path and filenames. 
- - if allow_adv: - if [q for q in query_words if (q in str(entry.path).lower())]: - results.append((ItemType.ENTRY, entry.id)) - elif [ - q for q in query_words if (q in str(entry.filename).lower()) - ]: - results.append((ItemType.ENTRY, entry.id)) - elif tag_only: - if entry.has_tag(self, int(query_words[0])): - results.append((ItemType.ENTRY, entry.id)) - - # elif query in entry.filename.lower(): - # self.filtered_entries.append(index) - elif entry_tags: - # function to add entry to results - def add_entry(entry: Entry): - # self.filter_entries.append() - # self.filtered_file_list.append(file) - # results.append((SearchItemType.ENTRY, entry.id)) - added = False - for f in entry.fields: - if self.get_field_attr(f, "type") == "collation": - if ( - self.get_field_attr(f, "content") - not in collations_added - ): - results.append( - ( - ItemType.COLLATION, - self.get_field_attr(f, "content"), - ) - ) - collations_added.append( - self.get_field_attr(f, "content") - ) - added = True - - if not added: - results.append((ItemType.ENTRY, entry.id)) - - if search_mode == SearchMode.AND: # Include all terms - # For each verified, extracted Tag term. - failure_to_union_terms = False - for term in all_tag_terms: - # If the term from the previous loop was already verified: - if not failure_to_union_terms: - cluster: set = set() - # Add the immediate associated Tags to the set (ex. Name, Alias hits) - # Since this term could technically map to multiple IDs, iterate over it - # (You're 99.9999999% likely to just get 1 item) - for id in self._tag_strings_to_id_map[term]: - cluster.add(id) - cluster = cluster.union( - set(self.get_tag_cluster(id)) - ) - # print(f'Full Cluster: {cluster}') - # For each of the Tag IDs in the term's ID cluster: - for t in cluster: - # Assume that this ID from the cluster is not in the Entry. - # Wait to see if proven wrong. 
- failure_to_union_terms = True - # If the ID actually is in the Entry, - if t in entry_tags: - # There wasn't a failure to find one of the term's cluster IDs in the Entry. - # There is also no more need to keep checking the rest of the terms in the cluster. - failure_to_union_terms = False - # print(f"FOUND MATCH: {t}") - break - # print(f'\tFailure to Match: {t}') - # # failure_to_union_terms is used to determine if all terms in the query were found in the entry. - # # If there even were tag terms to search through AND they all match an entry - if all_tag_terms and not failure_to_union_terms: - add_entry(entry) - - if search_mode == SearchMode.OR: # Include any terms - # For each verified, extracted Tag term. - for term in all_tag_terms: - # Add the immediate associated Tags to the set (ex. Name, Alias hits) - # Since this term could technically map to multiple IDs, iterate over it - # (You're 99.9999999% likely to just get 1 item) - for id in self._tag_strings_to_id_map[term]: - # If the ID actually is in the Entry, - if id in entry_tags: - # check if result already contains the entry - if (ItemType.ENTRY, entry.id) not in results: - add_entry(entry) - break - - # sys.stdout.write( - # f'\r[INFO][FILTER]: {len(self.filtered_file_list)} matches found') - # sys.stdout.flush() - - # except: - # # # Put this here to have new non-registered images show up - # # if query == "untagged" or query == "no author" or query == "no artist": - # # self.filtered_file_list.append(file) - # # non_entry_count = non_entry_count + 1 - # pass - - # end_time = time.time() - # print( - # f'[INFO][FILTER]: {len(self.filtered_entries)} matches found ({(end_time - start_time):.3f} seconds)') + entry_tag_ids: list[int] = [] + entry_fields_text: dict[str, str] = {} + + for field in entry.fields: + field_id = list(field.keys())[0] + field_obj = self.get_field_obj(field_id) + + # If the entry has tags of any kind, append their ids to entry_tag_ids. 
+ if field_obj["type"] == "tag_box": + entry_tag_ids.extend(field[field_id]) + field_name = field_obj["name"] + field_name = replace_whitespace(field_name) + field_name = field_name.lower() + if field_obj["type"] in ["text_line", "text_box"]: + entry_fields_text[field_name] = field[field_id] + else: + entry_fields_text[field_name] = None - # if non_entry_count: - # print( - # f'[INFO][FILTER]: There are {non_entry_count} new files in {self.source_dir} that do not have entries. These will not appear in most filtered results.') - # if not self.filtered_entries: - # print("[INFO][FILTER]: Filter returned no results.") + if search_query.match_entry( + path=entry.path, + filename=entry.filename, + tag_ids=entry_tag_ids, + fields_text=entry_fields_text, + ): + results.append((ItemType.ENTRY, entry.id)) else: for entry in self.entries: added = False @@ -1629,8 +1502,8 @@ def search_tags( for string in self._tag_strings_to_id_map: # O(n), n = tags exact_match: bool = False partial_match: bool = False - query = strip_punctuation(query).lower() - string = strip_punctuation(string).lower() + query = replace_whitespace(query).lower() + string = replace_whitespace(string).lower() if query == string: exact_match = True @@ -1796,13 +1669,13 @@ def update_tag(self, tag: Tag) -> None: # Remember that _tag_names_to_tag_id_map maps strings to a LIST of ids. # print( # f'Removing connection from "{old_tag.name.lower()}" to {old_tag.id} in {self._tag_names_to_tag_id_map[old_tag.name.lower()]}') - old_name: str = strip_punctuation(old_tag.name).lower() + old_name: str = replace_whitespace(old_tag.name).lower() self._tag_strings_to_id_map[old_name].remove(old_tag.id) # Delete the map key if it doesn't point to any other IDs. 
if not self._tag_strings_to_id_map[old_name]: del self._tag_strings_to_id_map[old_name] if old_tag.shorthand: - old_sh: str = strip_punctuation(old_tag.shorthand).lower() + old_sh: str = replace_whitespace(old_tag.shorthand).lower() # print( # f'Removing connection from "{old_tag.shorthand.lower()}" to {old_tag.id} in {self._tag_names_to_tag_id_map[old_tag.shorthand.lower()]}') self._tag_strings_to_id_map[old_sh].remove(old_tag.id) @@ -1811,7 +1684,7 @@ def update_tag(self, tag: Tag) -> None: del self._tag_strings_to_id_map[old_sh] if old_tag.aliases: for alias in old_tag.aliases: - old_a: str = strip_punctuation(alias).lower() + old_a: str = replace_whitespace(alias).lower() # print( # f'Removing connection from "{alias.lower()}" to {old_tag.id} in {self._tag_names_to_tag_id_map[alias.lower()]}') self._tag_strings_to_id_map[old_a].remove(old_tag.id) @@ -2209,18 +2082,18 @@ def _map_tag_strings_to_tag_id(self, tag: Tag) -> None: Uses name_and_alias_to_tag_id_map. """ # tag_id: int, tag_name: str, tag_aliases: list[str] = [] - name: str = strip_punctuation(tag.name).lower() + name: str = replace_whitespace(tag.name).lower() if name not in self._tag_strings_to_id_map: self._tag_strings_to_id_map[name] = [] self._tag_strings_to_id_map[name].append(tag.id) - shorthand: str = strip_punctuation(tag.shorthand).lower() + shorthand: str = replace_whitespace(tag.shorthand).lower() if shorthand not in self._tag_strings_to_id_map: self._tag_strings_to_id_map[shorthand] = [] self._tag_strings_to_id_map[shorthand].append(tag.id) for alias in tag.aliases: - alias = strip_punctuation(alias).lower() + alias = replace_whitespace(alias).lower() if alias not in self._tag_strings_to_id_map: self._tag_strings_to_id_map[alias] = [] self._tag_strings_to_id_map[alias].append(tag.id) diff --git a/tagstudio/src/core/search.py b/tagstudio/src/core/search.py new file mode 100644 index 000000000..4d9fc48b8 --- /dev/null +++ b/tagstudio/src/core/search.py @@ -0,0 +1,724 @@ +"""Search query 
parsing functionality for use by the src.core.library.Library object in TagStudio""" + +import re +import os # for os.path.sep + +from abc import ABC, abstractmethod +from collections import deque +from pathlib import Path + +from src.core.enums import SearchMode +from src.core.utils.str import replace_whitespace + + +class _EntrySearchableData: + """Container for search-relevant entry data received by SearchQuery instances.""" + + def __init__( + self, + path: Path, + filename: Path, + tag_ids: list[int], + fields_text: dict[str, str | None], + ): + self.path = path + self.filename = filename + self.tag_ids = tag_ids + self.fields_text = fields_text + + # Path.__str__() is almost 10x slower than anything else you + # will find in this file combined. If _TagNode.match() ever ends + # up evaluating (str(path) + os.path.sep + str(filename)).lower(), + # then the result is cached here in case the query needs to use + # it multiple times per entry. + self.filestring = "" + + +class ParseError(Exception): + """Raised when the user formats a search incorrectly. + + Usually this is due to a bad Boolean operator or an extra closing + parenthesis. + + Currently there is no handling logic to inform the user that + they made a mistake. Instead, ParseErrors are caught, and + problematic Boolean operators are simply ignored during parsing. + """ + + +class _Token: + """A raw token from a search query. + + kind should be "oparen", "cparen", "binary", "unary", or "tag". + """ + + def __init__(self, kind: str, text: str): + self.kind = kind + self.text = text + + def __str__(self): + return f"({self.kind}, {self.text})" + + +class _SynNode(ABC): + """A single node in a SearchQuery's syntax tree.""" + + # _ListNodes need to be able to tell when one of their children is a + # tilde in _ListNode.match(), so _SynNode has an is_tilde parameter. 
+ def __init__(self, is_tilde: bool = False) -> None: + self.is_tilde = is_tilde + + # Recursively tests and returns whether the passed entry matches + # this _SynNode along with any necessary children. + # + # The entry parameter must contain any information needed for any + # _TagNode in the syntax tree to evaluate whether it matches the + # entry. + @abstractmethod + def match(self, entry: _EntrySearchableData) -> bool: + pass + + # # Implement this at a later date to replace _SynNode.match(). + # # + # # Recursively compile and return an SQL query to retrieve entries + # # that match this _SynNode along with any necessary children. + # @abstractmethod + # def compile_SQL(self): + # pass + + # Recursively returns a string representing this _SynNode and all + # its children. Useful for debugging. + @abstractmethod + def __str__(self) -> str: + pass + + +class _ListNode(_SynNode): + """A list-type node in a SearchQuery's syntax tree. + + The default root node of a SearchQuery is an instance of this class. + The _SynNodes that hold the terms after an open parenthesis ( in a + search query string are also instances of this class. + + Instances of this class can have an arbitrary number of child nodes + (including zero). + """ + + # _ListNode instances change how they match entries based on the + # search_mode. + def __init__(self, search_mode: SearchMode, children: list[_SynNode]) -> None: + super().__init__() + self.search_mode = search_mode + self.children = children + + # The tilde ~ unary operator acts as a flag to indicate whether a + # _ListNode's child node should be treated as optional or partial + # instead of being treated as a normal term. 
+ # + # If search mode is AND, then a search query like + # ( t1 ~t2 t3 ~t4 t5 ~t6 ) is evaluated like + # ( t1 and t3 and t5 and ( t2 or t4 or t6 ) ) + # If search mode is OR, then + # ( t1 ~t2 t3 ~t4 t5 ~t6 ) is evaluated like + # ( t1 or t3 or t5 or ( t2 and t4 and t6 ) ) + # + # Note: _ListNode.match() has to keep track of whether tilde ~ is + # used, otherwise having at least one "optional term" becomes + # mandatory in AND mode, and otherwise having no "partial match + # terms" would automatically count as a full match in OR mode. + def match(self, entry: _EntrySearchableData) -> bool: + if self.search_mode is SearchMode.AND: + uses_optional = False + fulfils_optional = False + for child in self.children: + if child.is_tilde: + uses_optional = True + if child.match(entry): + fulfils_optional = True + elif not child.match(entry): + return False + return not uses_optional or fulfils_optional + elif self.search_mode is SearchMode.OR: + uses_partial = False + fulfils_partial = True + for child in self.children: + if child.is_tilde: + uses_partial = True + if not child.match(entry): + fulfils_partial = False + elif child.match(entry): + return True + return uses_partial and fulfils_partial + + def __str__(self) -> str: + s = "L(" + for child_node in self.children: + s += str(child_node) + s += " " + s = s.removesuffix(" ") + s += ")" + return s + + +class _BinaryNode(_SynNode): + """A two-input operator node in a SearchQuery's syntax tree. + + Instances of this class represent any explicit Boolean operations on + two inputs in the search query string. + + This class has two child nodes to act as the two inputs of this + Boolean operator. + """ + + # operator_text is the operator's raw representation in the search + # query string. This should be one of "and", "^", "&", "&&", "or", + # "v", "|", "||", "nor", "nand", "xor", "!=", "!==", "xnor", "=", + # "==", or "===". 
+ def __init__(self, operator_text, left_in: _SynNode, right_in: _SynNode) -> None: + super().__init__() + self.operator_text = operator_text + self.left_in = left_in + self.right_in = right_in + + def match(self, entry: _EntrySearchableData) -> bool: + match self.operator_text: + # Note: "^" is logic notation for AND (∧), not XOR, and + # "v" is logic notation for OR (∨). + case "and" | "^" | "&" | "&&": + return self.left_in.match(entry) and self.right_in.match(entry) + case "or" | "v" | "|" | "||": + return self.left_in.match(entry) or self.right_in.match(entry) + case "nor": + return not self.left_in.match(entry) and not self.right_in.match(entry) + case "nand": + return not self.left_in.match(entry) or not self.right_in.match(entry) + case "xor" | "!=" | "!==": + return self.left_in.match(entry) != self.right_in.match(entry) + case "xnor" | "=" | "==" | "===": + return self.left_in.match(entry) == self.right_in.match(entry) + case _: + raise ValueError( + "self.operator_text must be a valid binary operator. self.operator_text was" + f" '{self.operator_text}'" + ) + + def __str__(self) -> str: + return f"B({self.left_in} {self.operator_text} {self.right_in})" + + +class _UnaryNode(_SynNode): + """A one-input operator node in a SearchQuery's syntax tree. + + Instances of this class represent either the tilde flag, or + exclusion operations in the search query string. + + This class has one child node. + """ + + # operator_text is the operator's raw representation in the search + # query string. This should be one of "not", "-", "!", or "~". + def __init__(self, operator_text, input: _SynNode) -> None: + super().__init__(is_tilde=(operator_text == "~")) + self.operator_text = operator_text + self.input = input + + def match(self, entry: _EntrySearchableData) -> bool: + match self.operator_text: + case "not" | "-" | "!": + return not self.input.match(entry) + # Tilde ~ acts like the identity operator. ~ does not + # directly affect the output. Tilde ~ acts as a flag for + # _ListNode.match() and should not have any other effect. 
+ case "~": + return self.input.match(entry) + case other: + raise ValueError( + "self.operator_text must be a valid unary operator. self.operator_text was" + f" '{self.operator_text}'" + ) + + def __str__(self) -> str: + return f"U({self.operator_text} {self.input})" + + +class _TagNode(_SynNode): + """A tag or metatag leaf node in a SearchQuery's syntax tree. + + Instances of this class represent tags and metatags in search query + strings. + + Instances of his class don't have child nodes, but currently if the + _TagNode represents a tag, then the id_cluster attribute must be set + by SearchQuery before their match() method can be called. id_cluster + should contain any ids associated with the tag and with any child + tags it may have. + + And if an instance of this class represents a field, then the + used_fields attribute must be set by SearchQuery before the + instance's match() method can be called. used_fields should contain + at the names of any relevant fields that entries could be using. + """ + + # token_text should store this tag's raw representation in the + # search query string. (With any escape character included.) + def __init__(self, token_text) -> None: + super().__init__() + self.token_text = token_text + + # search.py is not meant to have direct access to src.core.library, + # so all relevant data has to be passed into this _TagNode manually. + # In the future, a compile_SQL method can be used to return an SQL + # query without any library data being passed at all. 
+ id_cluster: list[int] + used_fields: set[str] + + def match(self, entry: _EntrySearchableData) -> bool: + metatag = self.token_text.replace("-", "_") + match metatag.replace("no_", "no"): + # empty + case "empty" | "nofields": + return not entry.fields_text + # no-artist + case "noauthor" | "noartist": + return ( + "author" not in entry.fields_text + and "artist" not in entry.fields_text + ) + # untagged + case "untagged" | "notags": + return not entry.tag_ids + + # filename:example.png + if ( + self.token_text.startswith("file_name:") + or self.token_text.startswith("file-name:") + or self.token_text.startswith("filename:") + ): + filename = self.token_text + filename = filename.removeprefix("file") + filename = filename.removeprefix("-") + filename = filename.removeprefix("_") + filename = filename.removeprefix("name:") + # str(path) has a noticeable runtime cost, so cache the + # result in case the query needs it multiple times per entry + if not entry.filestring: + entry.filestring = ( + str(entry.path) + os.path.sep + str(entry.filename) + ).lower() + return filename in entry.filestring + + # tag_id:1005 + if ( + self.token_text.startswith("tag_id:") + or self.token_text.startswith("tag-id:") + or self.token_text.startswith("tagid:") + ): + tag_id_text = self.token_text + tag_id_text = tag_id_text.removeprefix("tag") + tag_id_text = tag_id_text.removeprefix("_") + tag_id_text = tag_id_text.removeprefix("-") + tag_id_text = tag_id_text.removeprefix("id:") + return tag_id_text.isdecimal() and int(tag_id_text) in entry.tag_ids + + if ":" in self.token_text: + field_name_text, _, field_content_goal_text = self.token_text.partition(":") + field_name_text = field_name_text.lower() + field_content_goal_text = field_content_goal_text.lower() + + # hastitle:True + goal_to_have_field = field_content_goal_text in [ + "true", + "t", + "yes", + "y", + "1", + ] + goal_not_to_have_field = field_content_goal_text in [ + "false", + "f", + "no", + "n", + "0", + ] + if ( + 
goal_to_have_field or goal_not_to_have_field + ) and field_name_text.startswith("has"): + # hastitle:True + if field_name_text.removeprefix("has") in self.used_fields: + return ( + field_name_text.removeprefix("has") in entry.fields_text + ) == goal_to_have_field + # has_title:True + if field_name_text.removeprefix("has_") in self.used_fields: + return ( + field_name_text.removeprefix("has_") in entry.fields_text + ) == goal_to_have_field + # has-title:True + if field_name_text.removeprefix("has-") in self.used_fields: + return ( + field_name_text.removeprefix("has-") in entry.fields_text + ) == goal_to_have_field + # hastitle:True + + # URL:artstation + field_name = field_name_text + if field_name in self.used_fields: + if field_name not in entry.fields_text: + return False + field_content = entry.fields_text[field_name] + if field_content is None: + return False + field_content = replace_whitespace(field_content) + field_content = field_content.lower() + return field_content_goal_text in field_content + + if self.token_text.startswith("has"): + # hasdescription + field_name = self.token_text + if field_name.removeprefix("has") in self.used_fields: + return field_name.removeprefix("has") in entry.fields_text + # has_description + if field_name.removeprefix("has_") in self.used_fields: + return field_name.removeprefix("has_") in entry.fields_text + # has-description + if field_name.removeprefix("has-") in self.used_fields: + return field_name.removeprefix("has-") in entry.fields_text + + # token_text is a tag in the entry + for tag_id in self.id_cluster: + # If the ID actually is in the src.core.library.Entry, + if tag_id in entry.tag_ids: + return True + # token_text is still a tag, even though it's not present in the entry + if self.id_cluster: + return False + + # unknown tag, unknown syntax + return False + + def __str__(self) -> str: + return f"T({self.token_text})" + + +# The named regex capturing groups each correspond with one kind of +# token, these are 
oparen, cparen, binary, unary, and tag.
+#
+# This regex is designed so that exactly one named capturing group will
+# capture a token every time the regex matches.
+#
+# Only intended to be used by SearchQuery._tokenize().
+_token_regex = re.compile(
+    r"(?P<oparen>[([{])(?:\s|$)"
+    r"|(?P<cparen>[)\]}])(?:\s|$)"
+    r"|(?P<binary>[&^|v=]|or|\|\||and|&&|nor|nand|xor|!=|!==|xnor|==|===)(?:\s|$)"
+    r"|(?P<unary>[-~!]|not(?=\s|$))"
+    r"|(?P<tag>\S+)(?:\s|$)"
+)
+
+
+class SearchQuery:
+    """This class parses, manages, and interprets search queries.
+
+    search.py is not meant to have direct access to src.core.library,
+    so all relevant data has to be passed through to this SearchQuery's
+    _TagNodes to be stored while this SearchQuery is evaluated against
+    each and every entry in the library.
+
+    In the future, a compile_SQL method can be used to return an SQL
+    query without having to manage any library data at all. No
+    persistence would be needed for that use case and this class could
+    be converted entirely into a function.
+    """
+
+    def __init__(self, query_string, search_mode: SearchMode):
+        query_tokens: list[_Token] = self._tokenize(query_string.lower())
+
+        # The _tag_name_to_tag_nodes attribute keeps track of the
+        # potential tags whose id_clusters the _TagNodes need, and which
+        # tag nodes requested the information. That way this SearchQuery
+        # can share requests for the information, and that way this
+        # SearchQuery can pass received information to the proper
+        # _TagNodes.
+        self._tag_name_to_tag_nodes: dict[str, list[_TagNode]] = {}
+        # The _requested_fields_names attribute keeps track of the
+        # potential fields that the _TagNodes want to check are in use.
+        # That way this SearchQuery can share requests for the
+        # information.
+        self._requested_fields_names: set[str] = set()
+        # The _tag_nodes_requesting_fields attribute keeps track of all
+        # _TagNodes that requested to learn whether a field is in use.
+        # That way this SearchQuery can pass all received information on
+        # which fields are in use to any requesting _TagNodes.
+        self._tag_nodes_requesting_fields: set[_TagNode] = set()
+
+        self._syntax_root: _ListNode = self._parse_list_node(
+            deque(query_tokens), search_mode
+        )
+
+    def _tokenize(self, query_string: str) -> list[_Token]:
+        regex_matches = _token_regex.finditer(query_string)
+
+        tokens: list[_Token] = []
+        for match in regex_matches:
+            # Each re.Match contains a dictionary for the named
+            # capturing groups in the regex. The keys of the dictionary
+            # are the names of the groups. If a particular named group
+            # in a re.Match captured a string of text, then the value of
+            # the group's dictionary entry is the text that it captured.
+            # If the named group captured nothing, then its entry's
+            # value is None.
+            for match_key, match_value in match.groupdict().items():
+                if match_value is not None:
+                    tokens.append(_Token(kind=match_key, text=match_value))
+                    break
+
+        return tokens
+
+    # This operation can be done because of an open parenthesis in the
+    # token list or in order to parse the root node of the syntax tree.
+    # oparens is True to indicate the former case, and False to indicate
+    # the latter.
+    #
+    # The only reason this is an instance method is because
+    # self._save_lib_info_request() needs to be called whenever a
+    # _TagNode is created.
+    def _parse_list_node(
+        self, tokens: deque[_Token], search_mode: SearchMode, oparens=False
+    ) -> _ListNode:
+        children: list[_SynNode] = []
+        while tokens:
+            token = tokens.popleft()
+            match token.kind:
+                case "oparen":
+                    list_node = self._parse_list_node(tokens, search_mode, oparens=True)
+                    children.append(list_node)
+                case "cparen":
+                    if oparens:
+                        return _ListNode(search_mode, children)
+                    # Ignore the erroneous token.
+                    # Do not raise exception.
+                    # else:
+                    #     cparen_text = token.text
+                    #     raise ParseError(
+                    #         f"'{cparen_text}' has no corresponding open parenthesis."
+                    #     )
+                case "binary":
+                    if not children:
+                        continue
+                    last_child = children.pop()
+                    try:
+                        binary_node = self._parse_binary_node(
+                            tokens,
+                            search_mode,
+                            oparens,
+                            left_in=last_child,
+                            operator_text=token.text,
+                        )
+                    # Ignore the error. Do not inform the user.
+                    except ParseError:
+                        children.append(last_child)
+                    else:
+                        children.append(binary_node)
+                case "unary":
+                    try:
+                        unary_node = self._parse_unary_node(
+                            tokens, search_mode, oparens, operator_text=token.text
+                        )
+                    # Ignore the error. Do not inform the user.
+                    except ParseError:
+                        pass
+                    else:
+                        children.append(unary_node)
+                case "tag":
+                    tag_node = _TagNode(token.text)
+
+                    self._save_lib_info_request(tag_node)
+
+                    children.append(tag_node)
+        return _ListNode(search_mode, children)
+
+    # This parse operation should only ever be called with a
+    # _parse_list_node operation higher on the call stack. If that
+    # operation is waiting for a closed parenthesis, then this operation
+    # is erroneous and the closed parenthesis should be returned to the
+    # queue. If that operation is not waiting for a closed parenthesis,
+    # then this operation can safely consume and ignore closed
+    # parentheses. oparens is True to indicate the former case, and
+    # False to indicate the latter.
+    #
+    # The only reason this is an instance method is because
+    # self._save_lib_info_request() needs to be called whenever a
+    # _TagNode is created.
+    def _parse_binary_node(
+        self,
+        tokens: deque[_Token],
+        search_mode: SearchMode,
+        oparens: bool,
+        operator_text,
+        left_in: _SynNode,
+    ) -> _BinaryNode:
+        while tokens:
+            token = tokens.popleft()
+            match token.kind:
+                case "oparen":
+                    list_node = self._parse_list_node(tokens, search_mode, oparens=True)
+                    return _BinaryNode(operator_text, left_in, list_node)
+                case "cparen":
+                    if oparens:
+                        tokens.appendleft(token)
+                        cparen_text = token.text
+                        raise ParseError(
+                            f"'{operator_text}' cannot be followed by '{cparen_text}'."
+                        )
+                    # Ignore the erroneous token. Do not raise exception.
+                    # else:
+                    #     cparen_text = token.text
+                    #     raise ParseError(
+                    #         f"'{cparen_text}' has no corresponding open parenthesis."
+                    #     )
+                # Ignore the erroneous token. Do not raise exception.
+                case "binary":
+                    # second_operator_text = token.text
+                    # raise ParseError(
+                    #     f"'{operator_text}' cannot be followed by '{second_operator_text}'."
+                    # )
+                    pass
+                case "unary":
+                    unary_node = self._parse_unary_node(
+                        tokens, search_mode, oparens, operator_text=token.text
+                    )
+                    return _BinaryNode(operator_text, left_in, unary_node)
+                case "tag":
+                    tag_node = _TagNode(token.text)
+
+                    self._save_lib_info_request(tag_node)
+
+                    return _BinaryNode(operator_text, left_in, tag_node)
+        raise ParseError(f"'{operator_text}' is not followed by a second term.")
+
+    # This parse operation should only ever be called with a
+    # _parse_list_node operation higher on the call stack. If that
+    # operation is waiting for a closed parenthesis, then this operation
+    # is erroneous and the closed parenthesis should be returned to the
+    # queue. If that operation is not waiting for a closed parenthesis,
+    # then this operation can safely consume and ignore closed
+    # parentheses. oparens is True to indicate the former case, and
+    # False to indicate the latter.
+    #
+    # The only reason this is an instance method is because
+    # self._save_lib_info_request() needs to be called whenever a
+    # _TagNode is created.
+    def _parse_unary_node(
+        self,
+        tokens: deque[_Token],
+        search_mode: SearchMode,
+        oparens: bool,
+        operator_text: str,
+    ) -> _UnaryNode:
+        while tokens:
+            token = tokens.popleft()
+            match token.kind:
+                case "oparen":
+                    list_node = self._parse_list_node(tokens, search_mode, oparens=True)
+                    return _UnaryNode(operator_text, list_node)
+                case "cparen":
+                    if oparens:
+                        tokens.appendleft(token)
+                        cparen_text = token.text
+                        raise ParseError(
+                            f"'{operator_text}' cannot be followed by '{cparen_text}'."
+                        )
+                    # Ignore the erroneous token. Do not raise exception.
+                    # else:
+                    #     cparen_text = token.text
+                    #     raise ParseError(
+                    #         f"'{cparen_text}' has no corresponding open parenthesis."
+                    #     )
+                case "binary":
+                    tokens.appendleft(token)
+                    second_operator_text = token.text
+                    raise ParseError(
+                        f"'{operator_text}' cannot be followed by '{second_operator_text}'."
+                    )
+                case "unary":
+                    unary_node = self._parse_unary_node(
+                        tokens, search_mode, oparens, operator_text=token.text
+                    )
+                    return _UnaryNode(operator_text=operator_text, input=unary_node)
+                case "tag":
+                    tag_node = _TagNode(token.text)
+
+                    self._save_lib_info_request(tag_node)
+
+                    return _UnaryNode(operator_text, tag_node)
+        raise ParseError(f"'{operator_text}' is not followed by a term.")
+
+    # Assumes the token_text is for a regular tag and not a metatag.
+    # (The reply will be ignored anyway if it is a metatag.)
+    def _save_lib_info_request(self, tag_node: _TagNode) -> None:
+        # These escape characters prevent the syntax node from being
+        # interpreted as an open parenthesis, a closed parenthesis, a
+        # unary operator, a binary operator, or a metatag, but this code
+        # ensures that an escape character will not interfere with the
+        # token's interpretation as a tag.
+        if tag_node.token_text.startswith("/"):
+            tag_name = tag_node.token_text.removeprefix("/")
+        else:
+            tag_name = tag_node.token_text.removeprefix("\\")
+        # Multiple tag nodes can be associated with the same tag text.
+ if tag_name not in self._tag_name_to_tag_nodes: + self._tag_name_to_tag_nodes[tag_name] = [] + self._tag_name_to_tag_nodes[tag_name].append(tag_node) + + if ":" in tag_node.token_text: + self._tag_nodes_requesting_fields.add(tag_node) + + field_name = tag_node.token_text.partition(":")[0] + field_name = field_name.lower() + + self._requested_fields_names.add(field_name) + + if field_name.startswith("has"): + self._requested_fields_names.add(field_name.removeprefix("has")) + if field_name.startswith("has_"): + self._requested_fields_names.add(field_name.removeprefix("has_")) + if field_name.startswith("has-"): + self._requested_fields_names.add(field_name.removeprefix("has-")) + + if tag_node.token_text.startswith("has"): + self._tag_nodes_requesting_fields.add(tag_node) + + field_name = tag_node.token_text.removeprefix("has") + self._requested_fields_names.add(field_name) + + if field_name.startswith("_"): + self._requested_fields_names.add(field_name.removeprefix("_")) + if field_name.startswith("-"): + self._requested_fields_names.add(field_name.removeprefix("-")) + + def share_tag_requests(self) -> list[str]: + return list(self._tag_name_to_tag_nodes.keys()) + + def share_field_requests(self) -> set[str]: + return self._requested_fields_names + + def receive_requested_lib_info( + self, tag_name_to_id_clusters: dict[str, list[int]], used_fields: set[str] + ): + for tag_name, id_cluster in tag_name_to_id_clusters.items(): + for tag_node in self._tag_name_to_tag_nodes[tag_name]: + tag_node.id_cluster = id_cluster + + for tag_node in self._tag_nodes_requesting_fields: + tag_node.used_fields = used_fields + + def match_entry( + self, + path: Path, + filename: Path, + tag_ids: list[int], + fields_text: dict[str, str], + ): + return self._syntax_root.match( + _EntrySearchableData(path, filename, tag_ids, fields_text) + ) + + def __str__(self): + return str(self._syntax_root) diff --git a/tagstudio/src/core/utils/str.py b/tagstudio/src/core/utils/str.py index 
11c0105ce..096b727e9 100644 --- a/tagstudio/src/core/utils/str.py +++ b/tagstudio/src/core/utils/str.py @@ -2,25 +2,11 @@ # Licensed under the GPL-3.0 License. # Created for TagStudio: https://github.com/CyanVoxel/TagStudio +import re -def strip_punctuation(string: str) -> str: - """Returns a given string stripped of all punctuation characters.""" - return ( - string.replace("(", "") - .replace(")", "") - .replace("[", "") - .replace("]", "") - .replace("{", "") - .replace("}", "") - .replace("'", "") - .replace("`", "") - .replace("’", "") - .replace("‘", "") - .replace('"', "") - .replace("“", "") - .replace("”", "") - .replace("_", "") - .replace("-", "") - .replace(" ", "") - .replace(" ", "") - ) +_space_regex = re.compile("\\s+") + + +def replace_whitespace(string: str) -> str: + """Returns a given string replacing all runs of whitespace characters with underscore _.""" + return re.sub(_space_regex, "_", string) diff --git a/tagstudio/src/qt/widgets/tag_box.py b/tagstudio/src/qt/widgets/tag_box.py index 06b8b1fe5..a626521e2 100644 --- a/tagstudio/src/qt/widgets/tag_box.py +++ b/tagstudio/src/qt/widgets/tag_box.py @@ -98,7 +98,6 @@ def set_tags(self, tags: list[int]): self.base_layout.takeAt(0).widget().deleteLater() is_recycled = True for tag in tags: - # TODO: Remove space from the special search here (tag_id:x) once that system is finalized. 
# tw = TagWidget(self.lib, self.lib.get_tag(tag), True, True, # on_remove_callback=lambda checked=False, t=tag: (self.lib.get_entry(self.item.id).remove_tag(self.lib, t, self.field_index), self.updated.emit()), # on_click_callback=lambda checked=False, q=f'tag_id: {tag}': (self.driver.main_window.searchField.setText(q), self.driver.filter_items(q)), @@ -106,7 +105,7 @@ def set_tags(self, tags: list[int]): # ) tw = TagWidget(self.lib, self.lib.get_tag(tag), True, True) tw.on_click.connect( - lambda checked=False, q=f"tag_id: {tag}": ( + lambda checked=False, q=f"tag_id:{tag}": ( self.driver.main_window.searchField.setText(q), self.driver.filter_items(q), ) diff --git a/tagstudio/tests/core/test_search.py b/tagstudio/tests/core/test_search.py new file mode 100644 index 000000000..13bacf4a5 --- /dev/null +++ b/tagstudio/tests/core/test_search.py @@ -0,0 +1,1195 @@ +from pathlib import Path + +from src.core.search import SearchQuery +from src.core.enums import SearchMode + + +def sqmtch( + search_query: SearchQuery, + tag_ids: list[int] = [], + fields_text: dict[str, str] = None, + path="subfolder", + filename="entry.png", +) -> bool: + if fields_text is None: + if tag_ids: + fields_text = {"tags": None} + else: + fields_text = {} + return search_query.match_entry( + path=Path(path), + filename=Path(filename), + tag_ids=tag_ids, + fields_text=fields_text, + ) + + +def test_empty_AND_construction(): + search_query = SearchQuery("", SearchMode.AND) + assert search_query + + +def test_empty_OR_construction(): + search_query = SearchQuery("", SearchMode.OR) + assert search_query + + +def test_tokenize_empty_list(): + search_query = SearchQuery("", SearchMode.AND) + assert str(search_query) == "L()" + + +def test_tokenize_ws_list(): + search_query = SearchQuery(" \t\n\r", SearchMode.AND) + assert str(search_query) == "L()" + + +def test_tokenize_ascii_tag(): + search_query = SearchQuery( + "abcdefghijklmnopqrstuvwxyz0123456789!\"#$%&'()*+,-./:;<=>?@[\\]^_`|{}~", + 
SearchMode.AND, + ) + assert ( + str(search_query) + == "L(T(abcdefghijklmnopqrstuvwxyz0123456789!\"#$%&'()*+,-./:;<=>?@[\\]^_`|{}~))" + ) + + +def test_tokenize_leading_ws_tag(): + search_query = SearchQuery(" tag", SearchMode.AND) + assert str(search_query) == "L(T(tag))" + + +def test_tokenize_trailing_ws_tag(): + search_query = SearchQuery("tag ", SearchMode.AND) + assert str(search_query) == "L(T(tag))" + + +def test_tokenize_unary_minus(): + search_query = SearchQuery("-tag", SearchMode.AND) + assert str(search_query) == "L(U(- T(tag)))" + + +def test_tokenize_all_unary(): + search_query = SearchQuery("~-not !tag", SearchMode.AND) + assert str(search_query) == "L(U(~ U(- U(not U(! T(tag))))))" + + +def test_tokenize_all_unary_with_ws(): + search_query = SearchQuery(" ~ - not ! tag ", SearchMode.AND) + assert str(search_query) == "L(U(~ U(- U(not U(! T(tag))))))" + + +def test_tokenize_exc_before_equ_tag(): + search_query = SearchQuery("!=^_^=", SearchMode.AND) + assert str(search_query) == "L(U(! T(=^_^=)))" + + +def test_tokenize_exc_before_equ_equ_tag(): + search_query = SearchQuery("!==tag==", SearchMode.AND) + assert str(search_query) == "L(U(! T(==tag==)))" + + +def test_tokenize_exc_before_equ_equ_equ_equ(): + search_query = SearchQuery("!====", SearchMode.AND) + assert str(search_query) == "L(U(! T(====)))" + + +def test_tokenize_escaped_unary_tags(): + search_query = SearchQuery("/~ /- /! /not", SearchMode.AND) + assert str(search_query) == "L(T(/~) T(/-) T(/!) 
T(/not))"
+
+
+def test_tokenize_binary_and():
+    search_query = SearchQuery("tag1 and tag2", SearchMode.AND)
+    assert str(search_query) == "L(B(T(tag1) and T(tag2)))"
+
+
+def test_tokenize_all_binary():
+    search_query = SearchQuery(
+        "t01 & t02 ^ t03 | t04 v t05 or t06 || t07 and t08 && t09 nor t10 nand t11 xor t12 != t13 !== t14 xnor t15 == t16 === t17",
+        SearchMode.AND,
+    )
+    assert (
+        str(search_query)
+        == "L(B(B(B(B(B(B(B(B(B(B(B(B(B(B(B(B(T(t01) & T(t02)) ^ T(t03)) | T(t04)) v T(t05)) or T(t06)) || T(t07)) and T(t08)) && T(t09)) nor T(t10)) nand T(t11)) xor T(t12)) != T(t13)) !== T(t14)) xnor T(t15)) == T(t16)) === T(t17)))"
+    )
+
+
+def test_tokenize_leading_binary_tag():
+    search_query = SearchQuery("tag1 ^_^ tag3", SearchMode.AND)
+    assert str(search_query) == "L(T(tag1) T(^_^) T(tag3))"
+
+
+def test_tokenize_binary_oparen_tag():
+    search_query = SearchQuery("tag1 |( tag3", SearchMode.AND)
+    assert str(search_query) == "L(T(tag1) T(|() T(tag3))"
+
+
+def test_tokenize_binary_cparen_tag():
+    search_query = SearchQuery("tag1 |) tag3", SearchMode.AND)
+    assert str(search_query) == "L(T(tag1) T(|)) T(tag3))"
+
+
+def test_tokenize_oparen_binary_tag():
+    search_query = SearchQuery("tag1 (| tag3", SearchMode.AND)
+    assert str(search_query) == "L(T(tag1) T((|) T(tag3))"
+
+
+def test_tokenize_cparen_binary_tag():
+    search_query = SearchQuery("tag1 )| tag3", SearchMode.AND)
+    assert str(search_query) == "L(T(tag1) T()|) T(tag3))"
+
+
+def test_tokenize_escaped_binary_tag():
+    search_query = SearchQuery("tag1 /and tag3", SearchMode.AND)
+    assert str(search_query) == "L(T(tag1) T(/and) T(tag3))"
+
+
+def test_tokenize_nested_lists():
+    search_query = SearchQuery("( )", SearchMode.AND)
+    assert str(search_query) == "L(L())"
+
+
+def test_tokenize_mixed_paren_nested_lists():
+    search_query = SearchQuery("{ [ ( ] } )", SearchMode.AND)
+    assert str(search_query) == "L(L(L(L())))"
+
+
+def test_tokenize_list_omit_cparen():
+    search_query = SearchQuery("{ [ (",
SearchMode.AND) + assert str(search_query) == "L(L(L(L())))" + + +def test_tokenize_list_omit_oparen(): + search_query = SearchQuery("] } )", SearchMode.AND) + assert str(search_query) == "L()" + + +def test_tokenize_leading_ws_list(): + search_query = SearchQuery(" ( )", SearchMode.AND) + assert str(search_query) == "L(L())" + + +def test_tokenize_trailing_ws_list(): + search_query = SearchQuery("( ) ", SearchMode.AND) + assert str(search_query) == "L(L())" + + +def test_tokenize_leading_ws_omit_cparen(): + search_query = SearchQuery(" (", SearchMode.AND) + assert str(search_query) == "L(L())" + + +def test_tokenize_trailing_ws_omit_cparen(): + search_query = SearchQuery("( ", SearchMode.AND) + assert str(search_query) == "L(L())" + + +def test_tokenize_leading_ws_omit_oparen(): + search_query = SearchQuery(" )", SearchMode.AND) + assert str(search_query) == "L()" + + +def test_tokenize_trailing_ws_omit_oparen(): + search_query = SearchQuery(") ", SearchMode.AND) + assert str(search_query) == "L()" + + +def test_tokenize_starting_oparen_tag(): + search_query = SearchQuery("(:", SearchMode.AND) + assert str(search_query) == "L(T((:))" + + +def test_tokenize_ending_oparen_tag(): + search_query = SearchQuery(":(", SearchMode.AND) + assert str(search_query) == "L(T(:())" + + +def test_tokenize_starting_cparen_tag(): + search_query = SearchQuery("):", SearchMode.AND) + assert str(search_query) == "L(T():))" + + +def test_tokenize_ending_cparen_tag(): + search_query = SearchQuery(":)", SearchMode.AND) + assert str(search_query) == "L(T(:)))" + + +def test_tokenize_unary_paren(): + search_query = SearchQuery("-( tag", SearchMode.AND) + assert str(search_query) == "L(U(- L(T(tag))))" + + +def test_parse_cparen(): + search_query = SearchQuery("( ) ( ) ( )", SearchMode.AND) + assert str(search_query) == "L(L() L() L())" + + +def test_parse_nested_cparen(): + search_query = SearchQuery("( ( ) ( ) ) ( ( ) ( ) )", SearchMode.AND) + assert str(search_query) == "L(L(L() L()) 
L(L() L()))" + + +def test_parse_unary_in_nested_list(): + search_query = SearchQuery("( -tag )", SearchMode.AND) + assert str(search_query) == "L(L(U(- T(tag))))" + + +def test_parse_binary_in_nested_list(): + search_query = SearchQuery("( tag1 and tag2 )", SearchMode.AND) + assert str(search_query) == "L(L(B(T(tag1) and T(tag2))))" + + +def test_parse_tag_in_nested_list(): + search_query = SearchQuery("( tag )", SearchMode.AND) + assert str(search_query) == "L(L(T(tag)))" + + +def test_parse_ignore_sole_unary(): + search_query = SearchQuery("not", SearchMode.AND) + assert str(search_query) == "L()" + + +def test_parse_ignore_unary_after_tag(): + search_query = SearchQuery("tag -", SearchMode.AND) + assert str(search_query) == "L(T(tag))" + + +def test_parse_ignore_unary_before_cparen(): + search_query = SearchQuery("( tag1 -) tag2", SearchMode.AND) + assert str(search_query) == "L(L(T(tag1)) T(tag2))" + + +def test_parse_ignore_cparen_after_unary(): + search_query = SearchQuery("tag1 not ) tag2", SearchMode.AND) + assert str(search_query) == "L(T(tag1) U(not T(tag2)))" + + +def test_parse_ignore_nested_unary(): + search_query = SearchQuery("tag - - ", SearchMode.AND) + assert str(search_query) == "L(T(tag))" + + +def test_parse_ignore_unary_before_binary(): + search_query = SearchQuery("tag1 -and tag2", SearchMode.AND) + assert str(search_query) == "L(B(T(tag1) and T(tag2)))" + + +def test_parse_ignore_unary_before_ignored_binary(): + search_query = SearchQuery("( tag1 -and ) tag2", SearchMode.AND) + assert str(search_query) == "L(L(T(tag1)) T(tag2))" + + +def test_parse_ignore_exc_before_xnor2(): + search_query = SearchQuery("tag1 !=== tag2", SearchMode.AND) + assert str(search_query) == "L(B(T(tag1) === T(tag2)))" + + +def test_parse_list_in_unary(): + search_query = SearchQuery("-( )", SearchMode.AND) + assert str(search_query) == "L(U(- L()))" + + +def test_parse_ignore_nested_unary_before_cparen(): + search_query = SearchQuery("-( - ) tag", SearchMode.AND) + 
assert str(search_query) == "L(U(- L()) T(tag))" + + +def test_parse_ignore_cparen_with_nested_unary(): + search_query = SearchQuery(" ) - ) - ) tag ) ", SearchMode.AND) + assert str(search_query) == "L(U(- U(- T(tag))))" + + +def test_parse_ignore_sole_binary(): + search_query = SearchQuery("and", SearchMode.AND) + assert str(search_query) == "L()" + + +def test_parse_ignore_binary_after_tag(): + search_query = SearchQuery("tag and", SearchMode.AND) + assert str(search_query) == "L(T(tag))" + + +def test_parse_ignore_binary_before_tag(): + search_query = SearchQuery("and tag", SearchMode.AND) + assert str(search_query) == "L(T(tag))" + + +def test_parse_ignore_binary_before_cparen(): + search_query = SearchQuery("( tag1 and ) tag2", SearchMode.AND) + assert str(search_query) == "L(L(T(tag1)) T(tag2))" + + +def test_parse_ignore_cparen_after_binary(): + search_query = SearchQuery("tag1 and ) tag2", SearchMode.AND) + assert str(search_query) == "L(B(T(tag1) and T(tag2)))" + + +def test_parse_ignore_binary_after_binary(): + search_query = SearchQuery("tag1 and nand tag2", SearchMode.AND) + assert str(search_query) == "L(B(T(tag1) and T(tag2)))" + + +def test_parse_binary_before_unary(): + search_query = SearchQuery("tag1 and not tag2", SearchMode.AND) + assert str(search_query) == "L(B(T(tag1) and U(not T(tag2))))" + + +def test_parse_ignore_binary_before_failed_unary(): + search_query = SearchQuery("( tag1 and not ) tag2", SearchMode.AND) + assert str(search_query) == "L(L(T(tag1)) T(tag2))" + + +def test_share_tag_requests_list(): + search_query = SearchQuery(r"/\tag1 \/tag2 tag3 tag3", SearchMode.AND) + assert str(search_query) == r"L(T(/\tag1) T(\/tag2) T(tag3) T(tag3))" + assert search_query.share_tag_requests() == [r"\tag1", "/tag2", "tag3"] + assert search_query.share_field_requests() == set() + + +def test_share_tag_requests_unary(): + search_query = SearchQuery(r"~/\tag1 ~\/tag2 ~tag3 ~tag3", SearchMode.AND) + assert ( + str(search_query) + == r"L(U(~ 
T(/\tag1)) U(~ T(\/tag2)) U(~ T(tag3)) U(~ T(tag3)))" + ) + assert search_query.share_tag_requests() == [r"\tag1", "/tag2", "tag3"] + assert search_query.share_field_requests() == set() + + +def test_share_tag_requests_binary(): + search_query = SearchQuery( + r"( ) and /\tag1 and \/tag2 and tag3 and tag3", SearchMode.AND + ) + assert ( + str(search_query) + == r"L(B(B(B(B(L() and T(/\tag1)) and T(\/tag2)) and T(tag3)) and T(tag3)))" + ) + assert search_query.share_tag_requests() == [r"\tag1", "/tag2", "tag3"] + assert search_query.share_field_requests() == set() + + +def test_share_field_requests_list(): + search_query = SearchQuery( + r"hasfield1 has_field2 has-field3 hasfield4:true has_field5:true has-field6:true field7:text", + SearchMode.AND, + ) + assert ( + str(search_query) + == r"L(T(hasfield1) T(has_field2) T(has-field3) T(hasfield4:true) T(has_field5:true) T(has-field6:true) T(field7:text))" + ) + assert search_query.share_tag_requests() == [ + "hasfield1", + "has_field2", + "has-field3", + "hasfield4:true", + "has_field5:true", + "has-field6:true", + "field7:text", + ] + assert search_query.share_field_requests() == set( + [ + "field1", + "field2", + "field3", + "field4", + "field5", + "field6", + "field7", + "_field2", + "_field5", + "-field3", + "-field6", + "hasfield4", + "has_field5", + "has-field6", + "field4:true", + "field5:true", + "field6:true", + "_field5:true", + "-field6:true", + ] + ) + + +def test_share_field_requests_unary(): + search_query = SearchQuery( + r"~hasfield1 ~has_field2 ~has-field3 ~hasfield4:true ~has_field5:true ~has-field6:true ~field7:text", + SearchMode.AND, + ) + assert ( + str(search_query) + == r"L(U(~ T(hasfield1)) U(~ T(has_field2)) U(~ T(has-field3)) U(~ T(hasfield4:true)) U(~ T(has_field5:true)) U(~ T(has-field6:true)) U(~ T(field7:text)))" + ) + assert search_query.share_tag_requests() == [ + "hasfield1", + "has_field2", + "has-field3", + "hasfield4:true", + "has_field5:true", + "has-field6:true", + "field7:text", 
+ ] + assert search_query.share_field_requests() == set( + [ + "field1", + "field2", + "field3", + "field4", + "field5", + "field6", + "field7", + "_field2", + "_field5", + "-field3", + "-field6", + "hasfield4", + "has_field5", + "has-field6", + "field4:true", + "field5:true", + "field6:true", + "_field5:true", + "-field6:true", + ] + ) + + +def test_share_field_requests_binary(): + search_query = SearchQuery( + r"hasfield1 and has_field2 and has-field3 and hasfield4:true and has_field5:true and has-field6:true and field7:text", + SearchMode.AND, + ) + assert ( + str(search_query) + == r"L(B(B(B(B(B(B(T(hasfield1) and T(has_field2)) and T(has-field3)) and T(hasfield4:true)) and T(has_field5:true)) and T(has-field6:true)) and T(field7:text)))" + ) + assert search_query.share_tag_requests() == [ + "hasfield1", + "has_field2", + "has-field3", + "hasfield4:true", + "has_field5:true", + "has-field6:true", + "field7:text", + ] + assert search_query.share_field_requests() == set( + [ + "field1", + "field2", + "field3", + "field4", + "field5", + "field6", + "field7", + "_field2", + "_field5", + "-field3", + "-field6", + "hasfield4", + "has_field5", + "has-field6", + "field4:true", + "field5:true", + "field6:true", + "_field5:true", + "-field6:true", + ] + ) + + +def test_receive_requested_lib_info_true(): + search_query = SearchQuery("tag1 tag2 tag2", SearchMode.AND) + assert str(search_query) == "L(T(tag1) T(tag2) T(tag2))" + assert search_query.share_tag_requests() == ["tag1", "tag2"] + assert search_query.share_field_requests() == set() + search_query.receive_requested_lib_info({"tag1": [2, 4], "tag2": [3, 5]}, set()) + assert sqmtch(search_query, tag_ids=[2, 3]) + assert sqmtch(search_query, tag_ids=[2, 5]) + assert sqmtch(search_query, tag_ids=[4, 3]) + assert sqmtch(search_query, tag_ids=[4, 5]) + assert not sqmtch(search_query, tag_ids=[2, 4]) + assert not sqmtch(search_query, tag_ids=[3, 5]) + assert not sqmtch(search_query, tag_ids=[2, 6]) + assert not 
sqmtch(search_query, tag_ids=[1, 3])
+
+
+def test_eval_and_mode_list():
+    search_query = SearchQuery("tag1 tag2 tag3", SearchMode.AND)
+    assert str(search_query) == "L(T(tag1) T(tag2) T(tag3))"
+    assert search_query.share_tag_requests() == ["tag1", "tag2", "tag3"]
+    assert search_query.share_field_requests() == set()
+    search_query.receive_requested_lib_info(
+        {"tag1": [1], "tag2": [2], "tag3": [3]}, set()
+    )
+    assert sqmtch(search_query, tag_ids=[1, 2, 3])
+    assert not sqmtch(search_query, tag_ids=[2, 3])
+    assert not sqmtch(search_query, tag_ids=[1, 3])
+    assert not sqmtch(search_query, tag_ids=[3])
+    assert not sqmtch(search_query, tag_ids=[1, 2])
+    assert not sqmtch(search_query, tag_ids=[2])
+    assert not sqmtch(search_query, tag_ids=[1])
+    assert not sqmtch(search_query, tag_ids=[])
+
+
+def test_eval_or_mode_list():
+    search_query = SearchQuery("tag1 tag2 tag3", SearchMode.OR)
+    assert str(search_query) == "L(T(tag1) T(tag2) T(tag3))"
+    assert search_query.share_tag_requests() == ["tag1", "tag2", "tag3"]
+    assert search_query.share_field_requests() == set()
+    search_query.receive_requested_lib_info(
+        {"tag1": [1], "tag2": [2], "tag3": [3]}, set()
+    )
+    assert sqmtch(search_query, tag_ids=[1, 2, 3])
+    assert sqmtch(search_query, tag_ids=[2, 3])
+    assert sqmtch(search_query, tag_ids=[1, 3])
+    assert sqmtch(search_query, tag_ids=[3])
+    assert sqmtch(search_query, tag_ids=[1, 2])
+    assert sqmtch(search_query, tag_ids=[2])
+    assert sqmtch(search_query, tag_ids=[1])
+    assert not sqmtch(search_query, tag_ids=[])
+
+
+def test_eval_optional_tags_list():
+    search_query = SearchQuery("tag1 ~tag2 ~tag3", SearchMode.AND)
+    assert str(search_query) == "L(T(tag1) U(~ T(tag2)) U(~ T(tag3)))"
+    assert search_query.share_tag_requests() == ["tag1", "tag2", "tag3"]
+    assert search_query.share_field_requests() == set()
+    search_query.receive_requested_lib_info(
+        {"tag1": [1], "tag2": [2], "tag3": [3]}, set()
+    )
+    assert sqmtch(search_query, tag_ids=[1, 2, 3])
+    assert not sqmtch(search_query, tag_ids=[2, 3])
+    assert sqmtch(search_query, tag_ids=[1, 3])
+    assert not sqmtch(search_query, tag_ids=[3])
+    assert sqmtch(search_query, tag_ids=[1, 2])
+    assert not sqmtch(search_query, tag_ids=[2])
+    assert not sqmtch(search_query, tag_ids=[1])
+    assert not sqmtch(search_query, tag_ids=[])
+
+
+def test_eval_partial_tags_list():
+    search_query = SearchQuery("tag1 ~tag2 ~tag3", SearchMode.OR)
+    assert str(search_query) == "L(T(tag1) U(~ T(tag2)) U(~ T(tag3)))"
+    assert search_query.share_tag_requests() == ["tag1", "tag2", "tag3"]
+    assert search_query.share_field_requests() == set()
+    search_query.receive_requested_lib_info(
+        {"tag1": [1], "tag2": [2], "tag3": [3]}, set()
+    )
+    assert sqmtch(search_query, tag_ids=[1, 2, 3])
+    assert sqmtch(search_query, tag_ids=[2, 3])
+    assert sqmtch(search_query, tag_ids=[1, 3])
+    assert not sqmtch(search_query, tag_ids=[3])
+    assert sqmtch(search_query, tag_ids=[1, 2])
+    assert not sqmtch(search_query, tag_ids=[2])
+    assert sqmtch(search_query, tag_ids=[1])
+    assert not sqmtch(search_query, tag_ids=[])
+
+
+def test_eval_all_unary():
+    search_query = SearchQuery("-tag1 !tag2 not tag3 -~tag4", SearchMode.AND)
+    assert (
+        str(search_query)
+        == "L(U(- T(tag1)) U(! T(tag2)) U(not T(tag3)) U(- U(~ T(tag4))))"
+    )
+    assert search_query.share_tag_requests() == ["tag1", "tag2", "tag3", "tag4"]
+    assert search_query.share_field_requests() == set()
+    search_query.receive_requested_lib_info(
+        {"tag1": [1], "tag2": [2], "tag3": [3], "tag4": [4]}, set()
+    )
+    assert sqmtch(search_query, tag_ids=[])
+
+
+def test_eval_binary_and_false():
+    search_query = SearchQuery("t1 and t2 t1 ^ t2 t1 & t2 t1 && t2", SearchMode.OR)
+    assert (
+        str(search_query)
+        == "L(B(T(t1) and T(t2)) B(T(t1) ^ T(t2)) B(T(t1) & T(t2)) B(T(t1) && T(t2)))"
+    )
+    assert search_query.share_tag_requests() == ["t1", "t2"]
+    assert search_query.share_field_requests() == set()
+    search_query.receive_requested_lib_info({"t1": [1], "t2": [2]}, set())
+    assert not sqmtch(search_query, tag_ids=[])
+    assert not sqmtch(search_query, tag_ids=[1])
+    assert not sqmtch(search_query, tag_ids=[2])
+
+
+def test_eval_binary_and_true():
+    search_query = SearchQuery("t1 and t2 t1 ^ t2 t1 & t2 t1 && t2", SearchMode.AND)
+    assert (
+        str(search_query)
+        == "L(B(T(t1) and T(t2)) B(T(t1) ^ T(t2)) B(T(t1) & T(t2)) B(T(t1) && T(t2)))"
+    )
+    assert search_query.share_tag_requests() == ["t1", "t2"]
+    assert search_query.share_field_requests() == set()
+    search_query.receive_requested_lib_info({"t1": [1], "t2": [2]}, set())
+    assert sqmtch(search_query, tag_ids=[1, 2])
+
+
+def test_eval_binary_or_false():
+    search_query = SearchQuery("t1 or t2 t1 v t2 t1 | t2 t1 || t2", SearchMode.OR)
+    assert (
+        str(search_query)
+        == "L(B(T(t1) or T(t2)) B(T(t1) v T(t2)) B(T(t1) | T(t2)) B(T(t1) || T(t2)))"
+    )
+    assert search_query.share_tag_requests() == ["t1", "t2"]
+    assert search_query.share_field_requests() == set()
+    search_query.receive_requested_lib_info({"t1": [1], "t2": [2]}, set())
+    assert not sqmtch(search_query, tag_ids=[])
+
+
+def test_eval_binary_or_true():
+    search_query = SearchQuery("t1 or t2 t1 v t2 t1 | t2 t1 || t2", SearchMode.AND)
+    assert (
+        str(search_query)
+        == "L(B(T(t1) or T(t2)) B(T(t1) v T(t2)) B(T(t1) | T(t2)) B(T(t1) || T(t2)))"
+    )
+    assert search_query.share_tag_requests() == ["t1", "t2"]
+    assert search_query.share_field_requests() == set()
+    search_query.receive_requested_lib_info({"t1": [1], "t2": [2]}, set())
+    assert sqmtch(search_query, tag_ids=[1])
+    assert sqmtch(search_query, tag_ids=[2])
+    assert sqmtch(search_query, tag_ids=[1, 2])
+
+
+def test_eval_binary_nor_false():
+    search_query = SearchQuery("t1 nor t2", SearchMode.OR)
+    assert str(search_query) == "L(B(T(t1) nor T(t2)))"
+    assert search_query.share_tag_requests() == ["t1", "t2"]
+    assert search_query.share_field_requests() == set()
+    search_query.receive_requested_lib_info({"t1": [1], "t2": [2]}, set())
+    assert not sqmtch(search_query, tag_ids=[1])
+    assert not sqmtch(search_query, tag_ids=[2])
+    assert not sqmtch(search_query, tag_ids=[1, 2])
+
+
+def test_eval_binary_nor_true():
+    search_query = SearchQuery("t1 nor t2", SearchMode.AND)
+    assert str(search_query) == "L(B(T(t1) nor T(t2)))"
+    assert search_query.share_tag_requests() == ["t1", "t2"]
+    assert search_query.share_field_requests() == set()
+    search_query.receive_requested_lib_info({"t1": [1], "t2": [2]}, set())
+    assert sqmtch(search_query, tag_ids=[])
+
+
+def test_eval_binary_nand_false():
+    search_query = SearchQuery("t1 nand t2", SearchMode.OR)
+    assert str(search_query) == "L(B(T(t1) nand T(t2)))"
+    assert search_query.share_tag_requests() == ["t1", "t2"]
+    search_query.receive_requested_lib_info({"t1": [1], "t2": [2]}, set())
+    assert not sqmtch(search_query, tag_ids=[1, 2])
+
+
+def test_eval_binary_nand_true():
+    search_query = SearchQuery("t1 nand t2", SearchMode.AND)
+    assert str(search_query) == "L(B(T(t1) nand T(t2)))"
+    assert search_query.share_tag_requests() == ["t1", "t2"]
+    assert search_query.share_field_requests() == set()
+    search_query.receive_requested_lib_info({"t1": [1], "t2": [2]}, set())
+    assert sqmtch(search_query, tag_ids=[])
+    assert sqmtch(search_query, tag_ids=[1])
+    assert sqmtch(search_query, tag_ids=[2])
+
+
+def test_eval_binary_xor_false():
+    search_query = SearchQuery("t1 xor t2 t1 != t2 t1 !== t2", SearchMode.OR)
+    assert (
+        str(search_query)
+        == "L(B(T(t1) xor T(t2)) B(T(t1) != T(t2)) B(T(t1) !== T(t2)))"
+    )
+    assert search_query.share_tag_requests() == ["t1", "t2"]
+    assert search_query.share_field_requests() == set()
+    search_query.receive_requested_lib_info({"t1": [1], "t2": [2]}, set())
+    assert not sqmtch(search_query, tag_ids=[1, 2])
+    assert not sqmtch(search_query, tag_ids=[])
+
+
+def test_eval_binary_xor_true():
+    search_query = SearchQuery("t1 xor t2 t1 != t2 t1 !== t2", SearchMode.AND)
+    assert (
+        str(search_query)
+        == "L(B(T(t1) xor T(t2)) B(T(t1) != T(t2)) B(T(t1) !== T(t2)))"
+    )
+    assert search_query.share_tag_requests() == ["t1", "t2"]
+    assert search_query.share_field_requests() == set()
+    search_query.receive_requested_lib_info({"t1": [1], "t2": [2]}, set())
+    assert sqmtch(search_query, tag_ids=[1])
+    assert sqmtch(search_query, tag_ids=[2])
+
+
+def test_eval_binary_xnor_false():
+    search_query = SearchQuery("t1 xnor t2 t1 = t2 t1 == t2 t1 === t2", SearchMode.OR)
+    assert (
+        str(search_query)
+        == "L(B(T(t1) xnor T(t2)) B(T(t1) = T(t2)) B(T(t1) == T(t2)) B(T(t1) === T(t2)))"
+    )
+    assert search_query.share_tag_requests() == ["t1", "t2"]
+    assert search_query.share_field_requests() == set()
+    search_query.receive_requested_lib_info({"t1": [1], "t2": [2]}, set())
+    assert not sqmtch(search_query, tag_ids=[1])
+    assert not sqmtch(search_query, tag_ids=[2])
+
+
+def test_eval_binary_xnor_true():
+    search_query = SearchQuery("t1 xnor t2 t1 = t2 t1 == t2 t1 === t2", SearchMode.AND)
+    assert (
+        str(search_query)
+        == "L(B(T(t1) xnor T(t2)) B(T(t1) = T(t2)) B(T(t1) == T(t2)) B(T(t1) === T(t2)))"
+    )
+    assert search_query.share_tag_requests() == ["t1", "t2"]
+    assert search_query.share_field_requests() == set()
+    search_query.receive_requested_lib_info({"t1": [1], "t2": [2]}, set())
+    assert sqmtch(search_query, tag_ids=[1, 2])
+    assert sqmtch(search_query, tag_ids=[])
+
+
+def test_eval_tag_empty_tags_false():
+    search_query = SearchQuery("empty no_fields no-fields nofields", SearchMode.OR)
+    assert str(search_query) == "L(T(empty) T(no_fields) T(no-fields) T(nofields))"
+    assert search_query.share_tag_requests() == [
+        "empty",
+        "no_fields",
+        "no-fields",
+        "nofields",
+    ]
+    assert search_query.share_field_requests() == set()
+    search_query.receive_requested_lib_info(
+        {"empty": [], "no_fields": [], "no-fields": [], "nofields": []}, set()
+    )
+    assert not sqmtch(search_query, fields_text={"tags": None})
+
+
+def test_eval_tag_empty_text_false():
+    search_query = SearchQuery("empty no_fields no-fields nofields", SearchMode.OR)
+    assert str(search_query) == "L(T(empty) T(no_fields) T(no-fields) T(nofields))"
+    assert search_query.share_tag_requests() == [
+        "empty",
+        "no_fields",
+        "no-fields",
+        "nofields",
+    ]
+    assert search_query.share_field_requests() == set()
+    search_query.receive_requested_lib_info(
+        {"empty": [], "no_fields": [], "no-fields": [], "nofields": []}, set()
+    )
+    assert not sqmtch(search_query, fields_text={"description": "desc"})
+
+
+def test_eval_tag_empty_true():
+    search_query = SearchQuery("empty no_fields no-fields nofields", SearchMode.AND)
+    assert str(search_query) == "L(T(empty) T(no_fields) T(no-fields) T(nofields))"
+    assert search_query.share_tag_requests() == [
+        "empty",
+        "no_fields",
+        "no-fields",
+        "nofields",
+    ]
+    assert search_query.share_field_requests() == set()
+    search_query.receive_requested_lib_info(
+        {"empty": [], "no_fields": [], "no-fields": [], "nofields": []}, set()
+    )
+    assert sqmtch(search_query)
+
+
+def test_eval_tag_no_author_author_false():
+    search_query = SearchQuery(
+        "no_author no-author noauthor no_artist no-artist noartist", SearchMode.OR
+    )
+    assert (
+        str(search_query)
+        == "L(T(no_author) T(no-author) T(noauthor) T(no_artist) T(no-artist) T(noartist))"
+    )
+    assert search_query.share_tag_requests() == [
+        "no_author",
+        "no-author",
+        "noauthor",
+        "no_artist",
+        "no-artist",
+        "noartist",
+    ]
+    assert search_query.share_field_requests() == set()
+    search_query.receive_requested_lib_info(
+        {
+            "no_author": [],
+            "no-author": [],
+            "noauthor": [],
+            "no_artist": [],
+            "no-artist": [],
+            "noartist": [],
+        },
+        set(),
+    )
+    assert not sqmtch(search_query, fields_text={"author": "william_shakespeare"})
+
+
+def test_eval_tag_no_author_artist_false():
+    search_query = SearchQuery(
+        "no_author no-author noauthor no_artist no-artist noartist", SearchMode.OR
+    )
+    assert (
+        str(search_query)
+        == "L(T(no_author) T(no-author) T(noauthor) T(no_artist) T(no-artist) T(noartist))"
+    )
+    assert search_query.share_tag_requests() == [
+        "no_author",
+        "no-author",
+        "noauthor",
+        "no_artist",
+        "no-artist",
+        "noartist",
+    ]
+    assert search_query.share_field_requests() == set()
+    search_query.receive_requested_lib_info(
+        {
+            "no_author": [],
+            "no-author": [],
+            "noauthor": [],
+            "no_artist": [],
+            "no-artist": [],
+            "noartist": [],
+        },
+        set(),
+    )
+    assert not sqmtch(search_query, fields_text={"artist": "leonardo_da_vinci"})
+
+
+def test_eval_tag_no_author_true():
+    search_query = SearchQuery(
+        "no_author no-author noauthor no_artist no-artist noartist", SearchMode.AND
+    )
+    assert (
+        str(search_query)
+        == "L(T(no_author) T(no-author) T(noauthor) T(no_artist) T(no-artist) T(noartist))"
+    )
+    assert search_query.share_tag_requests() == [
+        "no_author",
+        "no-author",
+        "noauthor",
+        "no_artist",
+        "no-artist",
+        "noartist",
+    ]
+    assert search_query.share_field_requests() == set()
+    search_query.receive_requested_lib_info(
+        {
+            "no_author": [],
+            "no-author": [],
+            "noauthor": [],
+            "no_artist": [],
+            "no-artist": [],
+            "noartist": [],
+        },
+        set(),
+    )
+    assert sqmtch(
+        search_query,
+        fields_text={"tags": None, "title": "title", "description": "desc"},
+    )
+    assert sqmtch(search_query)
+
+
+def test_eval_tag_untagged_false():
+    search_query = SearchQuery("untagged no_tags no-tags notags", SearchMode.OR)
+    assert str(search_query) == "L(T(untagged) T(no_tags) T(no-tags) T(notags))"
+    assert search_query.share_tag_requests() == [
+        "untagged",
+        "no_tags",
+        "no-tags",
+        "notags",
+    ]
+    assert search_query.share_field_requests() == set()
+    search_query.receive_requested_lib_info(
+        {"untagged": [], "no_tags": [], "no-tags": [], "notags": []}, set()
+    )
+    assert not sqmtch(
+        search_query,
+        tag_ids=[1],
+        fields_text={"tags": None, "title": "title", "description": "desc"},
+    )
+    assert not sqmtch(search_query, tag_ids=[1], fields_text={"tags": None})
+
+
+def test_eval_tag_untagged_true():
+    search_query = SearchQuery("untagged no_tags no-tags notags", SearchMode.AND)
+    assert str(search_query) == "L(T(untagged) T(no_tags) T(no-tags) T(notags))"
+    assert search_query.share_tag_requests() == [
+        "untagged",
+        "no_tags",
+        "no-tags",
+        "notags",
+    ]
+    assert search_query.share_field_requests() == set()
+    search_query.receive_requested_lib_info(
+        {"untagged": [], "no_tags": [], "no-tags": [], "notags": []}, set()
+    )
+    assert sqmtch(
+        search_query,
+        tag_ids=[],
+        fields_text={"tags": None, "title": "title", "description": "desc"},
+    )
+    assert sqmtch(search_query, tag_ids=[], fields_text={"tags": None})
+    assert sqmtch(
+        search_query, tag_ids=[], fields_text={"title": "title", "description": "desc"}
+    )
+
+
+def test_eval_tag_filename():
+    search_query = SearchQuery(
+        "filename:subfolder1 file_name:subfolder1 file-name:subfolder1 filename:entry1.png file_name:entry1.png file-name:entry1.png",
+        SearchMode.AND,
+    )
+    assert (
+        str(search_query)
+        == "L(T(filename:subfolder1) T(file_name:subfolder1) T(file-name:subfolder1) T(filename:entry1.png) T(file_name:entry1.png) T(file-name:entry1.png))"
+    )
+    assert search_query.share_tag_requests() == [
+        "filename:subfolder1",
+        "file_name:subfolder1",
+        "file-name:subfolder1",
+        "filename:entry1.png",
+        "file_name:entry1.png",
+        "file-name:entry1.png",
+    ]
+    assert search_query.share_field_requests() == set(
+        ["filename", "file_name", "file-name"]
+    )
+    search_query.receive_requested_lib_info(
+        {
+            "filename:subfolder1": [],
+            "file_name:subfolder1": [],
+            "file-name:subfolder1": [],
+            "filename:entry1.png": [],
+            "file_name:entry1.png": [],
+            "file-name:entry1.png": [],
+        },
+        set(),
+    )
+    assert sqmtch(search_query, path="subfolder1", filename="entry1.png")
+    assert not sqmtch(search_query, path="subfolder2", filename="entry1.png")
+    assert not sqmtch(search_query, path="subfolder1", filename="entry2.png")
+    assert not sqmtch(search_query, path="subfolder2", filename="entry2.png")
+
+
+def test_eval_tag_tag_id():
+    search_query = SearchQuery("tag_id:1 tag-id:2 tagid:3", SearchMode.AND)
+    assert str(search_query) == "L(T(tag_id:1) T(tag-id:2) T(tagid:3))"
+    assert search_query.share_tag_requests() == ["tag_id:1", "tag-id:2", "tagid:3"]
+    assert search_query.share_field_requests() == set(["tag_id", "tag-id", "tagid"])
+    search_query.receive_requested_lib_info(
+        {"tag_id:1": [], "tag-id:2": [], "tagid:3": []}, set()
+    )
+    assert sqmtch(search_query, tag_ids=[1, 2, 3])
+    assert not sqmtch(search_query, tag_ids=[1, 2])
+    assert not sqmtch(search_query, tag_ids=[2, 3])
+    assert not sqmtch(search_query, tag_ids=[1, 3])
+
+
+def test_eval_tag_fields_true():
+    search_query = SearchQuery(
+        r"hasfield1 has_field2 has-field3 hasfield4:false has_field5:false has-field6:false field7:text",
+        SearchMode.AND,
+    )
+    assert (
+        str(search_query)
+        == r"L(T(hasfield1) T(has_field2) T(has-field3) T(hasfield4:false) T(has_field5:false) T(has-field6:false) T(field7:text))"
+    )
+    assert search_query.share_tag_requests() == [
+        "hasfield1",
+        "has_field2",
+        "has-field3",
+        "hasfield4:false",
+        "has_field5:false",
+        "has-field6:false",
+        "field7:text",
+    ]
+    assert search_query.share_field_requests() == set(
+        [
+            "field1",
+            "field2",
+            "field3",
+            "field4",
+            "field5",
+            "field6",
+            "field7",
+            "_field2",
+            "_field5",
+            "-field3",
+            "-field6",
+            "hasfield4",
+            "has_field5",
+            "has-field6",
+            "field4:false",
+            "field5:false",
+            "field6:false",
+            "_field5:false",
+            "-field6:false",
+        ]
+    )
+    search_query.receive_requested_lib_info(
+        {
+            "hasfield1": [],
+            "has_field2": [],
+            "has-field3": [],
+            "hasfield4:false": [],
+            "has_field5:false": [],
+            "has-field6:false": [],
+            "field7:text": [],
+        },
+        set(
+            [
+                "field1",
+                "field2",
+                "field3",
+                "field4",
+                "field5",
+                "field6",
+                "field7",
+            ]
+        ),
+    )
+    assert sqmtch(
+        search_query,
+        fields_text={
+            "tags": None,
+            "description": "desc",
+            "field1": "",
+            "field2": "",
+            "field3": "",
+            "field7": "text",
+        },
+    )
+    assert sqmtch(
+        search_query,
+        fields_text={
+            "field1": "",
+            "field2": "",
+            "field3": "",
+            "field7": "text",
+        },
+    )
+    assert sqmtch(
+        search_query,
+        fields_text={
+            "tags": None,
+            "description": "desc",
+            "field1": None,
+            "field2": None,
+            "field3": None,
+            "field7": "text",
+        },
+    )
+    assert sqmtch(
+        search_query,
+        fields_text={
+            "field1": None,
+            "field2": None,
+            "field3": None,
+            "field7": "text",
+        },
+    )
+
+
+def test_eval_tag_fields_false():
+    search_query = SearchQuery(
+        r"hasfield1 has_field2 has-field3 hasfield4:false has_field5:false has-field6:false field7:text",
+        SearchMode.OR,
+    )
+    assert (
+        str(search_query)
+        == r"L(T(hasfield1) T(has_field2) T(has-field3) T(hasfield4:false) T(has_field5:false) T(has-field6:false) T(field7:text))"
+    )
+    assert search_query.share_tag_requests() == [
+        "hasfield1",
+        "has_field2",
+        "has-field3",
+        "hasfield4:false",
+        "has_field5:false",
+        "has-field6:false",
+        "field7:text",
+    ]
+    assert search_query.share_field_requests() == set(
+        [
+            "field1",
+            "field2",
+            "field3",
+            "field4",
+            "field5",
+            "field6",
+            "field7",
+            "_field2",
+            "_field5",
+            "-field3",
+            "-field6",
+            "hasfield4",
+            "has_field5",
+            "has-field6",
+            "field4:false",
+            "field5:false",
+            "field6:false",
+            "_field5:false",
+            "-field6:false",
+        ]
+    )
+    search_query.receive_requested_lib_info(
+        {
+            "hasfield1": [],
+            "has_field2": [],
+            "has-field3": [],
+            "hasfield4:false": [],
+            "has_field5:false": [],
+            "has-field6:false": [],
+            "field7:text": [],
+        },
+        set(
+            [
+                "field1",
+                "field2",
+                "field3",
+                "field4",
+                "field5",
+                "field6",
+                "field7",
+            ]
+        ),
+    )
+    assert not sqmtch(
+        search_query,
+        fields_text={
+            "tags": None,
+            "description": "desc",
+            "field4": "",
+            "field5": "",
+            "field6": "",
+            "field7": "te_xt_txet",
+        },
+    )
+    assert not sqmtch(
+        search_query,
+        fields_text={
+            "field4": "",
+            "field5": "",
+            "field6": "",
+            "field7": "te_xt_txet",
+        },
+    )
+    assert not sqmtch(
+        search_query,
+        fields_text={
+            "tags": None,
+            "description": "desc",
+            "field4": None,
+            "field5": None,
+            "field6": None,
+            "field7": None,
+        },
+    )
+    assert not sqmtch(
+        search_query,
+        fields_text={
+            "field4": None,
+            "field5": None,
+            "field6": None,
+            "field7": None,
+        },
+    )