feature_request(debug): detailed debug information #63

Closed
5 tasks
Kristinita opened this issue Jun 9, 2020 · 5 comments

Comments

@Kristinita

Kristinita commented Jun 9, 2020

1. Summary

It would be nice if the debug information were more detailed.

Below I list the features that would be nice to see in the debug logs.

2. Debugging features

  1. Output information for each file
  2. Adapter command; see docs(adapters): third-party adapters commands #53 for details
  3. Processing time for a specific file
  4. Whether the cache was used
  5. Detect incorrect and/or poor quality text; see feature_request(books): detect incorrect and poor quality text #62 for details

3. Argumentation

3.1. Information for each file

It would be nice if the output included processing information for each file.

This will allow users to quickly determine:

  1. Which specific file was being processed when a bug or other problem occurred. Currently, when a user runs ripgrep-all on a directory and its subdirectories, this cannot be determined.
  2. The time it takes to process a specific file (once Add support for arbitrary adapters via a config file #60 is implemented).

3.1.1. Example

KiraFirst.epub
Some output

KiraSecond.rtf
Another output

and so on.

3.2. Time

It would be nice to get the time spent processing each file. Users would be able to identify slow adapters and then switch to another adapter command or replace the adapter (after #60 is implemented).

3.2.1. Example

The output shows minutes and seconds. For example, 0:14 means 14 seconds and 1:7 means 1 minute 7 seconds.
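
A minimal sketch of this format in Python. The function name `format_elapsed` is hypothetical and not part of ripgrep-all; seconds are deliberately left unpadded to match the 1:7 example.

```python
def format_elapsed(total_seconds: float) -> str:
    """Render a duration as minutes:seconds, e.g. 67 -> "1:7"."""
    whole = int(total_seconds)  # drop fractional seconds
    minutes, seconds = divmod(whole, 60)
    # Seconds are intentionally not zero-padded, matching the "1:7" example.
    return f"{minutes}:{seconds}"


print(format_elapsed(14))  # 0:14
print(format_elapsed(67))  # 1:7
```

Zero-padding the seconds (`f"{minutes}:{seconds:02d}"`) would give the more conventional 1:07, if that form is preferred.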

3.3. Cache

It would be nice to show whether the cache was used when processing a specific file. This information is needed to determine why ripgrep-all is slow for some files.

3.3.1. Example

Cache: true if the cache was used to process the file; Cache: false if not.

4. Example of expected behavior

4.1. Data

For example, I have a directory with the files KiraFirst.epub, KiraSecond.rtf, KiraThird.djvu, KiraFourth.doc and KiraFifth.pdf. I run the command rga "Kira Goddess!" in this directory.

4.2. Behavior

# [INFO] There is an entry
KiraFirst.epub
ebook-convert KiraFirst.epub KiraFirst.txt
0:14
Cache: true
Page 4147: Kira Goddess!

# [INFO] No matches for “Kira Goddess!” in the book
KiraSecond.rtf
ebook-convert KiraSecond.rtf KiraSecond.txt
1:4
Cache: false
No matches

# [INFO] Scanned book that has no OCR layer
KiraThird.djvu
ebook-convert KiraThird.djvu KiraThird.txt
0:1
Cache: false
This file has no searchable text

# [INFO] Bug
KiraFourth.doc
ebook-convert KiraFourth.doc KiraFourth.txt
1:14
Cache: false
# [INFO] I added output from my real bug
Syntax Error: Document stream is empty
Error: subprocess failed: ExitStatus(ExitStatus(1))
   0: <unknown>
   1: <unknown>
   2: <unknown>
   3: <unknown>
   4: <unknown>
   5: <unknown>
   6: <unknown>
   7: <unknown>
   8: <unknown>
   9: <unknown>
  10: <unknown>
  11: <unknown>
  12: <unknown>
  13: <unknown>
  14: BaseThreadInitThunk
  15: RtlUserThreadStart
-------------------------------------------------------------------------------

# [INFO] Incorrect or poor quality text, see issue #62 for details
KiraFifth.pdf
ebook-convert KiraFifth.pdf KiraFifth.txt
0:4
Cache: true
WARNING! File “KiraFifth.pdf” possibly contains text that is not written in a natural language. The reason may be an incorrect or poor-quality OCR layer. Please check “KiraFifth.pdf”.
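
The per-file entries above could be modeled roughly as follows. This is only a Python sketch of the proposed layout: `FileReport` and its field names are hypothetical and do not correspond to anything in ripgrep-all.

```python
from dataclasses import dataclass


@dataclass
class FileReport:
    """One entry of the proposed per-file debug output (hypothetical type)."""

    path: str
    adapter_command: str
    elapsed_seconds: int
    cache_used: bool
    result: str

    def render(self) -> str:
        # Time is shown as minutes:seconds, unpadded, as in the examples above.
        minutes, seconds = divmod(self.elapsed_seconds, 60)
        lines = [
            self.path,
            self.adapter_command,
            f"{minutes}:{seconds}",
            f"Cache: {'true' if self.cache_used else 'false'}",
            self.result,
        ]
        return "\n".join(lines)


report = FileReport(
    path="KiraFirst.epub",
    adapter_command="ebook-convert KiraFirst.epub KiraFirst.txt",
    elapsed_seconds=14,
    cache_used=True,
    result="Page 4147: Kira Goddess!",
)
print(report.render())
```

Rendering this entry reproduces the five lines of the KiraFirst.epub example.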

Thanks.

@phiresky
Owner

phiresky commented Jun 9, 2020

I've opened BurntSushi/ripgrep#1610 to allow outputting more info about preprocessors via the --debug flag. Without that, I can only show output for adapters when you directly invoke rga-preproc on a single file, not when running rga.

I've also added lots of debug statements and timings in 7b70188. Here's an example of the output in the next version (a very complex example; usually you don't have an archive that deep):

❯ rga --rga-cache-max-blob-len=10M --debug foobar exampledir/test.zip
[2020-06-09T11:20:51Z DEBUG ripgrep_all::args] rga (our) args: ["rga", "--rga-cache-max-blob-len=10M"]
[2020-06-09T11:20:51Z DEBUG ripgrep_all::args] Configs:
    ~/.config/ripgrep-all/config.json: {
      "$schema": "./config.schema.json"
    }
    RGA_CONFIG: {}
    Args: {
      "cache_max_blob_len": 10000000
    }
    Merged: {
      "$schema": "./config.schema.json",
      "cache_max_blob_len": 10000000
    }
[2020-06-09T11:20:51Z DEBUG ripgrep_all::args] rga (passthrough) args: ["--debug", "foobar", "exampledir/test.zip"]
[2020-06-09T11:20:51Z DEBUG ripgrep_all::adapters] Chosen available adapters: ffmpeg,pandoc,poppler,zip,decompress,tar,sqlite
[2020-06-09T11:20:51Z DEBUG rga] rg command to run: "rg" "--no-line-number" "--smart-case" "--pre" "~/.cargo/bin/rga-preproc" "--pre-glob" "*.{mkv,MKV,mp4,MP4,avi,AVI,epub,EPUB,odt,ODT,docx,DOCX,fb2,FB2,ipynb,IPYNB,pdf,PDF,zip,ZIP,tgz,TGZ,tbz,TBZ,tbz2,TBZ2,gz,GZ,bz2,BZ2,xz,XZ,zst,ZST,tar,TAR,db,DB,db3,DB3,sqlite,SQLITE,sqlite3,SQLITE3}" "--debug" "foobar" "exampledir/test.zip"
DEBUG|rg::config|crates/core/config.rs:40: ~/.config/ripgreprc: arguments loaded from config file: ["--smart-case", "--max-columns=500", "--max-columns-preview"]
DEBUG|rg::args|crates/core/args.rs:543: final argv: ["rg", "--smart-case", "--max-columns=500", "--max-columns-preview", "--no-line-number", "--smart-case", "--pre", "~/.cargo/bin/rga-preproc", "--pre-glob", "*.{mkv,MKV,mp4,MP4,avi,AVI,epub,EPUB,odt,ODT,docx,DOCX,fb2,FB2,ipynb,IPYNB,pdf,PDF,zip,ZIP,tgz,TGZ,tbz,TBZ,tbz2,TBZ2,gz,GZ,bz2,BZ2,xz,XZ,zst,ZST,tar,TAR,db,DB,db3,DB3,sqlite,SQLITE,sqlite3,SQLITE3}", "--debug", "foobar", "exampledir/test.zip"]
DEBUG|globset|crates/globset/src/lib.rs:431: built glob set; 0 literals, 0 basenames, 0 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 1 regexes
DEBUG|rg::search|crates/core/search.rs:396: running preprocessor: "rga-preproc" "exampledir/test.zip"
DEBUG|grep_cli::process|crates/cli/src/process.rs:219: preprocesser command stderr: 
-------------------------------------------------------------------------------
[2020-06-09T11:20:51Z DEBUG ripgrep_all::args] Config: {"$schema":"./config.schema.json","cache_max_blob_len":10000000}
[2020-06-09T11:20:51Z DEBUG ripgrep_all::preproc] path (hint) to preprocess: "~/data/dev/2019/ripgrep-all/exampledir/test.zip"
[2020-06-09T11:20:51Z DEBUG ripgrep_all::adapters] Chosen available adapters: ffmpeg,pandoc,poppler,zip,decompress,tar,sqlite
[2020-06-09T11:20:51Z DEBUG ripgrep_all::preproc] Archive recursion depth: 0
[2020-06-09T11:20:51Z DEBUG ripgrep_all::preproc] Chose adapter 'zip' because of matcher Fast(FileExtension("zip"))
~/data/dev/2019/ripgrep-all/exampledir/test.zip adapter: zip
[2020-06-09T11:20:51Z DEBUG ripgrep_all::preproc] Cache key (with recursion): ([("ffmpeg", 1), ("pandoc", 3), ("poppler", 1), ("zip", 1), ("decompress", 1), ("tar", 1), ("sqlite", 1)], "~/data/dev/2019/ripgrep-all/exampledir/test.zip", SystemTime { tv_sec: 1586342432, tv_nsec: 111076901 })
[2020-06-09T11:20:51Z DEBUG ripgrep_all::preproc_cache] cache MISS, running adapter
[2020-06-09T11:20:51Z DEBUG ripgrep_all::preproc] adapting with caching...
[2020-06-09T11:20:51Z DEBUG ripgrep_all::adapters::zip] ~/data/dev/2019/ripgrep-all/exampledir/test.zip|test/inner.zip: 162990 bytes (162990 bytes packed)
[2020-06-09T11:20:51Z DEBUG ripgrep_all::preproc] path (hint) to preprocess: "test/inner.zip"
[2020-06-09T11:20:51Z DEBUG ripgrep_all::adapters] Chosen available adapters: ffmpeg,pandoc,poppler,zip,decompress,tar,sqlite
[2020-06-09T11:20:51Z DEBUG ripgrep_all::preproc] Archive recursion depth: 1
[2020-06-09T11:20:51Z DEBUG ripgrep_all::preproc] Chose adapter 'zip' because of matcher Fast(FileExtension("zip"))
test/inner.zip adapter: zip
[2020-06-09T11:20:51Z DEBUG ripgrep_all::preproc] adapting without caching...
[2020-06-09T11:20:51Z DEBUG ripgrep_all::adapters::zip] test/inner.zip: test/inner.zip|short.pdf: 53687 bytes (53283 bytes packed)
[2020-06-09T11:20:51Z DEBUG ripgrep_all::preproc] path (hint) to preprocess: "short.pdf"
[2020-06-09T11:20:51Z DEBUG ripgrep_all::adapters] Chosen available adapters: ffmpeg,pandoc,poppler,zip,decompress,tar,sqlite
[2020-06-09T11:20:51Z DEBUG ripgrep_all::preproc] Archive recursion depth: 2
[2020-06-09T11:20:51Z DEBUG ripgrep_all::preproc] Chose adapter 'poppler' because of matcher Fast(FileExtension("pdf"))
short.pdf adapter: poppler
[2020-06-09T11:20:51Z DEBUG ripgrep_all::preproc] adapting without caching...
[2020-06-09T11:20:51Z DEBUG ripgrep_all::adapters::spawning] executing "pdftotext" "-" "-"
[2020-06-09T11:20:51Z DEBUG ripgrep_all::preproc] running adapter poppler took 11.5ms
[2020-06-09T11:20:51Z DEBUG ripgrep_all::adapters::zip] test/inner.zip: test/inner.zip|wasteland.docx: 110311 bytes (109415 bytes packed)
[2020-06-09T11:20:51Z DEBUG ripgrep_all::preproc] path (hint) to preprocess: "wasteland.docx"
[2020-06-09T11:20:51Z DEBUG ripgrep_all::adapters] Chosen available adapters: ffmpeg,pandoc,poppler,zip,decompress,tar,sqlite
[2020-06-09T11:20:51Z DEBUG ripgrep_all::preproc] Archive recursion depth: 2
[2020-06-09T11:20:51Z DEBUG ripgrep_all::preproc] Chose adapter 'pandoc' because of matcher Fast(FileExtension("docx"))
wasteland.docx adapter: pandoc
[2020-06-09T11:20:51Z DEBUG ripgrep_all::preproc] adapting without caching...
[2020-06-09T11:20:51Z DEBUG ripgrep_all::adapters::spawning] executing "pandoc" "--from" "docx" "--to=plain" "--wrap=none" "--atx-headers"
[2020-06-09T11:20:51Z DEBUG ripgrep_all::preproc] running adapter pandoc took 0.247s
[2020-06-09T11:20:51Z DEBUG ripgrep_all::preproc] running adapter zip took 0.262s
[2020-06-09T11:20:51Z DEBUG ripgrep_all::adapters::zip] ~/data/dev/2019/ripgrep-all/exampledir/test.zip|test/subdir/short.pdf: 53687 bytes (53283 bytes packed)
[2020-06-09T11:20:51Z DEBUG ripgrep_all::preproc] path (hint) to preprocess: "test/subdir/short.pdf"
[2020-06-09T11:20:51Z DEBUG ripgrep_all::adapters] Chosen available adapters: ffmpeg,pandoc,poppler,zip,decompress,tar,sqlite
[2020-06-09T11:20:51Z DEBUG ripgrep_all::preproc] Archive recursion depth: 1
[2020-06-09T11:20:51Z DEBUG ripgrep_all::preproc] Chose adapter 'poppler' because of matcher Fast(FileExtension("pdf"))
test/subdir/short.pdf adapter: poppler
[2020-06-09T11:20:51Z DEBUG ripgrep_all::preproc] adapting without caching...
[2020-06-09T11:20:51Z DEBUG ripgrep_all::adapters::spawning] executing "pdftotext" "-" "-"
[2020-06-09T11:20:51Z DEBUG ripgrep_all::preproc] running adapter poppler took 11.5ms
[2020-06-09T11:20:51Z DEBUG ripgrep_all::adapters::zip] ~/data/dev/2019/ripgrep-all/exampledir/test.zip|test/subdir/wasteland.pdf: 285191 bytes (248015 bytes packed)
[2020-06-09T11:20:51Z DEBUG ripgrep_all::preproc] path (hint) to preprocess: "test/subdir/wasteland.pdf"
[2020-06-09T11:20:51Z DEBUG ripgrep_all::adapters] Chosen available adapters: ffmpeg,pandoc,poppler,zip,decompress,tar,sqlite
[2020-06-09T11:20:51Z DEBUG ripgrep_all::preproc] Archive recursion depth: 1
[2020-06-09T11:20:51Z DEBUG ripgrep_all::preproc] Chose adapter 'poppler' because of matcher Fast(FileExtension("pdf"))
test/subdir/wasteland.pdf adapter: poppler
[2020-06-09T11:20:51Z DEBUG ripgrep_all::preproc] adapting without caching...
[2020-06-09T11:20:51Z DEBUG ripgrep_all::adapters::spawning] executing "pdftotext" "-" "-"
[2020-06-09T11:20:51Z DEBUG ripgrep_all::preproc] running adapter poppler took 66.3ms
[2020-06-09T11:20:51Z DEBUG ripgrep_all::preproc] uncompressed output: 119.92 kB
[2020-06-09T11:20:51Z DEBUG ripgrep_all::preproc] compressed output: 17.88 kB
[2020-06-09T11:20:51Z DEBUG ripgrep_all::preproc_cache] running adapter zip took 0.357s
[2020-06-09T11:20:51Z DEBUG ripgrep_all::preproc_cache] writing 17.88 kB to cache
[2020-06-09T11:20:51Z DEBUG ripgrep_all::preproc_cache] writing to cache took 6.10ms
-------------------------------------------------------------------------------
[2020-06-09T11:20:51Z DEBUG rga] running rg took 0.398s
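
To see which adapters are slow, the timing lines in a log like the one above can be summarized with a short script. The Python sketch below is not part of ripgrep-all; the function names are hypothetical, and the regular expression assumes the "running adapter <name> took <value><unit>" and "cache MISS" wording shown in this sample, which other versions may format differently.

```python
import re
from collections import defaultdict

# Pattern based on the "running adapter <name> took <value><unit>" lines in
# the sample log; this wording is an assumption about other rga versions.
TOOK_RE = re.compile(r"running adapter (\w+) took ([\d.]+)(ms|s)\b")


def adapter_times(log_text: str) -> dict[str, float]:
    """Sum the reported time per adapter, in seconds."""
    totals: dict[str, float] = defaultdict(float)
    for name, value, unit in TOOK_RE.findall(log_text):
        seconds = float(value) / 1000 if unit == "ms" else float(value)
        totals[name] += seconds
    return dict(totals)


def cache_misses(log_text: str) -> int:
    """Count 'cache MISS' occurrences as they appear in the sample log."""
    return log_text.count("cache MISS")


sample = """\
[2020-06-09T11:20:51Z DEBUG ripgrep_all::preproc_cache] cache MISS, running adapter
[2020-06-09T11:20:51Z DEBUG ripgrep_all::preproc] running adapter poppler took 11.5ms
[2020-06-09T11:20:51Z DEBUG ripgrep_all::preproc] running adapter pandoc took 0.247s
[2020-06-09T11:20:51Z DEBUG ripgrep_all::preproc] running adapter poppler took 66.3ms
"""
for name, seconds in sorted(adapter_times(sample).items()):
    print(f"{name}: {seconds:.4f}s")
print("cache misses:", cache_misses(sample))
```

In the sample, the timings for the zip adapter appear to include the adapters run on the archive's contents, so summing nested entries may double-count time.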

@Kristinita
Author

Type: Question

1. New release

Is there any information on roughly when the next version of ripgrep-all will be released? ripgrep-all has new features, but the latest release came out almost 2 months ago. I am waiting for a release so that I can review the new ripgrep-all features.

2. Tip

Chocolatey supports pre-releases, in case the developers are not ready to release a stable version.

Thanks.

@phiresky
Owner

I started a pretty large refactor of the internals to make #52, #64, and #60 work, but I haven't finished it yet since it makes some stuff a lot more complicated.

@Kristinita
Author

@phiresky, is there now any rough information on when the new version will be released?

Thanks for the great tool.

@phiresky
Owner

phiresky commented May 26, 2023

I think this is fixed in 1.0.0-alpha.4

edit: also 0.10
