I have been getting false positives for some ASCII text files that do not contain source code. For example, here are the contents of one such file; guesslang's Guess().language_name() reports these two lines as "C++":
'These are the contents of file1\nAnd this is the second line of file1'
Here is the snippet of code I use to open and read the contents of all ASCII-type files in Python:
import logging

from guesslang import Guess

logger = logging.getLogger(__name__)


def getSourceLanguage(filename):
    """
    Use guesslang to test whether a text/plain file is written
    in a programming language.
    """
    langName = "ASCII text"
    # 1. Open the file and read its contents into a string
    # 2. Run guesslang on the string
    try:
        with open(filename, 'r') as theFile:
            fileContents = theFile.read()
        langName = Guess().language_name(fileContents)
    except (OSError, UnicodeDecodeError):
        # Keep the default label if the file cannot be opened or decoded
        print(f"Error: could not open {filename}")
        logger.error(f"Error: could not open {filename}")
    return langName
Is there a way to tighten up the language type checking?
OK, I just read this statement in the "How does GuessLang guess" section of the docs:
"Other from that, very small files can be misclassified." Maybe my test file is misidentified as source code because it is so small.
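One possible way to tighten the check, given that small files are the weak spot, is to skip very short inputs and to accept a guess only when the top probability is high. The sketch below is not part of guesslang: `pick_language`, `guess_or_plain`, and both thresholds are hypothetical and would need tuning on real files. It also assumes that `Guess().probabilities(text)` returns a list of `(language, probability)` pairs.

```python
from typing import List, Tuple

PLAIN_TEXT = "ASCII text"


def pick_language(
    text: str,
    probabilities: List[Tuple[str, float]],
    min_chars: int = 200,          # assumed cutoff, not a guesslang setting
    min_probability: float = 0.9,  # assumed cutoff, not a guesslang setting
) -> str:
    """Return the top language only for long enough, confident guesses."""
    # Very small inputs can be misclassified, so do not trust them at all.
    if len(text.strip()) < min_chars or not probabilities:
        return PLAIN_TEXT
    # Pick the pair with the highest probability.
    language, probability = max(probabilities, key=lambda pair: pair[1])
    return language if probability >= min_probability else PLAIN_TEXT


def guess_or_plain(text: str, guess) -> str:
    """`guess` is any object exposing probabilities(text), e.g. Guess()."""
    return pick_language(text, guess.probabilities(text))


# Possible usage, assuming guesslang is installed:
#   from guesslang import Guess
#   langName = guess_or_plain(fileContents, Guess())
```

With these cutoffs, the two-line sample file above would be labelled "ASCII text" regardless of what the model returns, because it is shorter than `min_chars`.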