Skip to content

Emacs package to annotate and sort incoming PDFs into your org files.

License

Notifications You must be signed in to change notification settings

tinloaf/org-incoming

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

38 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

MELPA

This is org-incoming, a package to ingest PDF files into your org or org-roam files.

This package is intended to help you if you have a large number of "incoming" PDF files, e.g. scanned handwritten notes, and you want to somehow capture these PDFs in your org files. "Capturing" here can mean anything from completely transcribing them (or taking OCRed text in the PDF) to just creating an org file with a title, a date and maybe some tags, which links to the archived PDF.

Personally, I use this to ingest the PDFs resulting from my Rocketbook into my org files.

Installation

This package is available on MELPA. So, if you have MELPA you can do M-x package-install org-incoming RET. Alternativerly, to install it manually, clone the repository or get the org-incoming.el file by some other means and put it in your Emacs' load-path. After that, all you need to do is

(require 'org-incoming)

Usage

After configuring org-incoming you can start a new org-incoming session by invoking org-incoming-start. An org-incoming session will process all files in your incoming folders sequentially. Each file passes through two phases:

  1. The "query" phase
  2. The "annotation" phase

Each phase can be completed (going to the next phase or to the next incoming file) by pressing C-c C-c or invoking org-incoming-complete. When you complete the annotation phase for a file, the PDF file will be moved to the correct location and the annotation file will be created.

You can quit your org-incoming session at any point by invoking org-incoming-quit (bound to C-c C-k by default).

Query

In the query phase, your emacs frame should look like this:

Screenshot of the org-incoming query phase

The PDF in displayed in one window, and the other window contains the query buffer. The query buffer contains an form in which you should assign the PDF a title and a date. Note that you can have the date parsed automatically from the filename of the incoming file. You can use Tab and S-Tab to jump between the form fields, and pressing Return while the date field is focussed will bring up a calendar for date selection.

Press C-c C-c (or M-x org-incoming-complete RET) to complete the query phase. If you want to skip the file for now, press C-c C-s (M-x org-incoming-skip RET). If you want to quit your org-incoming session, press C-c C-k (M-x org-incoming-quit RET).

Annotation

In the annotation phase, your emacs frame should look like this:

Screenshot of the org-incoming annotation phase

Here you still see the PDF on the one side, and the annotation file to be created on the other side. Note that the annotation file is pre-filled with the title and date you gave and contains a link to the PDF file, resp. where the PDF file will be moved.

Depending on your configuration, the annotation file will also contain any automatically extracted text, and may be a plain org file, or an org-roam node.

Press C-c C-c (M-x org-incoming-complete RET)to complete the annotation phase and complete processing this file. org-incoming will then automatically proceed with the next file. If you want to skip the file for now, press C-c C-s (M-x org-incoming-skip RET). If you want to quit your org-incoming session, press C-c C-k (M-x org-incoming-quit RET).

Important functions

org-incoming-start

Start a new org-incoming session.

org-incoming-complete

Complete the current phase, advancing to the next. If the current phase is the annotation phase, the PDF file will be moved to its destination, the annotation file will be created at its destination, and the next PDF will be loaded (if any remain).

org-incoming-quit

Quit the current org-incoming session. Any input from the current query or annotation phase will be discarded, and the file currently being processed will not be moved.

org-incoming-skip

Skip the incoming file currently being processed. The file is skipped for the current org-incoming session. If you quit org-incoming and cal org-incoming-start again, the file will be processed again.

Configuration

There is one mandatory configuration setting:

  • org-incoming-dirs: A list of plists describing the source/target pairs and any settings overrides for them.

Each plist must at least contain :source <from-directory> and :target <to-directory>. For each such pair, from-directory is treated as a path to a directory that contains incoming PDF files, and to-directory is the target directory. org-incoming will place its annotation files in the to-directory, and move the PDF files into the org-incoming-pdf-subdir directory inside the to-directory.

Additionally, the plist for each folder pair can contain overrides for almost all of org-incoming's settings, in the form of :<setting-name> <value>. See the respective settings for details.

See this example:

(setq org-incoming-dirs '((:source "/home/user/incoming/folder1" :target "/home/user/org/archive")
                          (:source "/home/user/incoming/folder2" :target "/home/user/org/archive" :use-roam 't)
                          (:source "/home/user/incoming/folder3" :target "/home/user/org/todos" :pdf-subdir "originals")))

With this configuration, all PDF files in ~/incoming/folder1 and ~/incoming/folder2 will have their annotation files in ~/org/archive and (with a default org-incoming-pdf-subdir) their PDFs in ~/org/archive/pdfs. However, PDFs from ~/incloming/folder2 will be annotated with org-roam node files instead of "plain" org files. PDF files from ~/incoming/folder3 will have their annotations in ~/org/todos and their PDFs in ~/org/todos/originals.

Optional configuration

Optionally configurable variables are:

  • org-incoming-parse-date-pattern (or :parse-date-pattern)
  • org-incoming-parse-date-re (or :parse-date-re)
  • org-incoming-pdf-subdir (or :pdf-subdir)
  • org-incoming-use-roam (or :use-roam)
  • org-incoming-annotation-template (or :annotation-template)

Template configuration is explained below. For everything else, please see their respective variable documentation (M-x describe-variable <variablename> RET) for documentation. Each of these variables can be overridden for individual folder pairs by removing the org-incoming- prefix from the variable name and using the remainder as a symbol in the folder pair's plist (see the example above).

Template Configuration

The variable org-incoming-annotation-template (resp. the :annotation-template property) expects a strings that acts as a template for the annotation files. This template will be formatted using s.el's s-format, so see the documentation for details. The available fields are:

  • ${title} - The title assigned during query
  • ${date} - The date assigned during query
  • ${link} - The link to the PDF file (after moving)
  • ${extracted} - Any text extracted from the PDF file"

The default template looks like this:

#+TITLE: ${title}
#+DATE: ${date}

Link: [[${link}]]

* Extracted Text

${extracted}

License

This software is released under the MIT license, also knows as the "Expat License". See License.txt for details.

About

Emacs package to annotate and sort incoming PDFs into your org files.

Topics

Resources

License

Stars

Watchers

Forks