skip any non-ASCII characters in the files #28

Open: wants to merge 2 commits into base: master

@thread13 thread13 commented Jun 4, 2016

… since these seem to break the source code parser. The preferred fix would certainly be to make the parser accept non-ASCII characters (Python itself does, for one thing), but as a quick fix:

--- __init__.py.orig    2016-06-04 20:24:26.507343246 +1000
+++ __init__.py 2016-06-04 20:39:29.206238307 +1000
@@ -23,6 +23,19 @@
 import getopt, sys, os, string, re
 import keyword, parser, symbol, token

+_re_ascii_filter = '[^%s]' % (re.escape(string.printable), )
+
+def ascii_dammit( sourcecode, _re_expr = re.compile( _re_ascii_filter ) ):
+    """
+        just ignore all non-ascii characters 
+        since any identifiers should be ASCII anyway ;
+        nb: this will work for utf-8 as well
+        
+    """
+
+    result = _re_expr.sub( '', sourcecode )
+    return result
+

 class Mark(object):
     """ Marks, as defined by Cscope, that are implemented.
@@ -234,6 +247,7 @@
     # Add path info to any syntax errors in the source files
     if filecontents:
         try:
+            filecontents = ascii_dammit( filecontents )
             indexbuff_len = parseSource(filecontents, indexbuff, indexbuff_len, dump)
         except (SyntaxError, AssertionError) as e:
             e.filename = fullpath
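
For illustration, here is a minimal standalone sketch of what the filter in the patch does to a line of source. It builds the pattern the same way as the patch; the name `strip_non_ascii` and the sample string are made up for this example and are not part of pycscope. Note that `'\n'` is in `string.printable`, so line numbers are preserved even though column positions on an affected line shift, and that `'\0'` is not in `string.printable`, so the same filter also drops NUL characters.

```python
import re
import string

# Same construction as in the patch: match any single character that is
# NOT in string.printable (digits, ASCII letters, punctuation, whitespace).
_NON_PRINTABLE = re.compile('[^%s]' % re.escape(string.printable))


def strip_non_ascii(source):
    """Return `source` with every non-printable-ASCII character removed.

    Hypothetical helper for this example only; it mirrors ascii_dammit().
    """
    return _NON_PRINTABLE.sub('', source)


# A line with an accented letter in a string literal and a symbol in a comment.
sample = u'name = "caf\u00e9"  # \u2713 done\n'
cleaned = strip_non_ascii(sample)

print(repr(cleaned))
# The number of lines is unchanged, only the offending characters are gone.
print(cleaned.count('\n') == sample.count('\n'))
```

The trade-off is that string literals and comments silently lose characters, which should be acceptable for building a cross-reference index but not for anything that needs the exact source text.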

thread13 commented Jun 4, 2016

pycscope also does not like embedded '\0' characters (by the way, it should probably add the filename to the printed exception):

Traceback (most recent call last):
  File "/usr/local/bin/pycscope", line 9, in <module>
    load_entry_point('pycscope==1.2.1', 'console_scripts', 'pycscope')()
  File "build/bdist.linux-x86_64/egg/pycscope/__init__.py", line 128, in main
  File "build/bdist.linux-x86_64/egg/pycscope/__init__.py", line 171, in work
  File "build/bdist.linux-x86_64/egg/pycscope/__init__.py", line 237, in parseFile
  File "build/bdist.linux-x86_64/egg/pycscope/__init__.py", line 938, in parseSource
TypeError: suite() argument 1 must be string without null bytes, not str
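
To find out where such characters sit before handing a file to the parser, a small check like the one below may help. This is only a sketch: `find_null_lines` is a hypothetical helper written for this comment, not something pycscope provides.

```python
def find_null_lines(source):
    """Return the 1-based numbers of the lines that contain a NUL character.

    Hypothetical helper for this example only.
    """
    return [
        lineno
        for lineno, line in enumerate(source.split('\n'), 1)
        if '\0' in line
    ]


# Second line carries an embedded NUL inside a string literal.
sample = 'a = 1\nb = "x\0y"\nc = 3\n'
print(find_null_lines(sample))
```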

thread13 commented Jun 4, 2016

Printing the name of the file that triggers the crash (commit 50e42f9 in the fork):

$ diff -u __init__.py.new __init__.py 
--- __init__.py.orig    2016-06-04 22:31:10.027610098 +1000
+++ __init__.py 2016-06-04 22:53:03.805282697 +1000
@@ -247,11 +247,16 @@
     # Add path info to any syntax errors in the source files
     if filecontents:
         try:
             indexbuff_len = parseSource(filecontents, indexbuff, indexbuff_len, dump)
         except (SyntaxError, AssertionError) as e:
             e.filename = fullpath
             raise e
+        except Exception as e:
+            # debug a fatal exception: 
+            e.filename = fullpath
+            print("pycscope.py: %s in %s" % (e, repr(fullpath)))
+            raise e

     return indexbuff_len
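
The same idea as a standalone sketch, in case it is useful outside the patch. The wrapper name `parse_with_context` and the stand-in parser `fake_parser` are invented for this example; only the attach-filename, print, and re-raise pattern comes from the diff above. One deliberate difference: it uses a bare `raise` instead of `raise e`, which re-raises the active exception as-is, and it writes the message to stderr rather than stdout.

```python
import sys


def parse_with_context(parse, filecontents, fullpath):
    """Call parse(filecontents); on any failure, report which file caused it.

    Hypothetical wrapper for this example only. `parse` stands in for
    whatever function actually parses the source.
    """
    try:
        return parse(filecontents)
    except Exception as e:
        # Attach the path so callers further up can see which file failed.
        e.filename = fullpath
        sys.stderr.write("pycscope.py: %s in %r\n" % (e, fullpath))
        # Bare raise: propagate the same exception with its original traceback.
        raise


def fake_parser(source):
    """Stand-in parser that rejects NUL characters, for demonstration."""
    if '\0' in source:
        raise TypeError("source must not contain null bytes")
    return len(source)


try:
    parse_with_context(fake_parser, 'x = "\0"\n', '/tmp/example.py')
except TypeError as err:
    print("failed on:", err.filename)
```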

@portante portante added the bug label Oct 18, 2019