Skip to content

Latest commit

 

History

History
515 lines (432 loc) · 21.3 KB

HACKING.org

File metadata and controls

515 lines (432 loc) · 21.3 KB

Hacking on SHCL

So you want to contribute to SHCL? Here’s everything you need to know!

File Organization

The root folder of the SHCL source tree contains a few loose files and several sub-directories. The root folder should not contain any SHCL source files. Let’s explore the sub-directories!

/core/

This directory contains the “core” of SHCL. The core contains just the common functionality that both the SHCL library and the SHCL shell need. Think of it as a toolbox that you can use to build your own shell. It also represents the portion of SHCL that is required for non-interactive use cases (e.g. applications that just wish to use SHCL’s reader macro).

/core/support/

This folder contains the C functionality that cannot be written in Common Lisp (even with the help of CFFI). Currently, it contains two things:

  1. function wrappers around POSIX APIs which are defined as macros, and
  2. the shcl_spawn function, which provides a bit more flexibility than posix_spawn.

You might be thinking “sure, I see why you need #1, but can’t you implement #2 in Common Lisp?”. Well, yes. Sort of. Some Common Lisp compilers (namely, SBCL) won’t let you fork when there are multiple threads. SHCL uses threads. We can go behind the compiler’s back and call fork directly, but that puts us into an unsupported configuration. Its easier and more portable to just write the fork exec logic in straight C. The lisp compiler should (hopefully) never notice that we forked. That’s the theory, at least.

/shell/

This directory contains things that are only relevant to SHCL when it is used as an interactive shell. For example, shell builtins.

Note: some builtins are defined in /core/.

/test/

This directory contains tests that ensure SHCL is working correctly. The tests are run by the test.lisp script in the root directory.

/linters/

This directory contains scripts which detect stylistic issues with SHCL’s source.

SHCL Doesn’t Fork

SHCL is not a typical shell. In a typical shell, forking isn’t a problem. SHCL doesn’t fork unless it is going to exec. That design decision has far-reaching implications, and it substantially complicates many tasks that a shell must perform.

Subshells, for example, are significantly more difficult. Consider the following shell commands.

(
  exec > tmp.txt;
  VAR=value;
  cd somePath;
  exec echo some content;
  echo not echoed;
)
echo other content

Imagine how a normal forking shell would handle this. When it tries to evaluate the subshell it would fork off a new shell process. It would then evaluate the body of the subshell inside the new process. That body involves making irreversible changes to the shell process. Meanwhile, the main shell is unaffected by any of the the subshell’s changes. They are neatly and automatically contained within the subshell.

Since SHCL doesn’t fork, subshells aren’t quite so simple. Subshells are evaluated inside the same process as the main shell, but changes to the process environment must not leak outside of the subshell. With careful use of UNWIND-PROTECT, you can imagine emulating the effect of a sunshell. Just carefully put everything back to the way it was as the stack unwinds. Things get even more complicated when you consider that subshells must be able to execute in parallel with the main shell. After all, they might have been run in the background with &. You could even have multiple subshells with different opinions on what file descriptor 1 refers to!

All problems in computer science can be solved by another level of indirection – David Wheeler

To make subshells work, SHCL avoids making destructive changing the process at all. All changes go through a layer of indirection. For example, suppose you ask SHCL to redirect standard output to a file.

exec > tmp.txt

SHCL opens the desired file. Let’s say that the operating system returns file descriptor 3 as the handle for tmp.txt. Instead of using dup2 to modify file descriptor 1, SHCL instead just makes a note in its own data structures that virtual file descriptor 1 should take on the value of physical file descriptor 3. When spawning a subprocess, SHCL will consult that table and make the appropriate changes prior to calling exec.

Evaluation

SHCL took a lot of cues from Common Lisp itself. You can see this in SHCL’s main loop. It divides responsibilities just like the classic Common Lisp READ, EVAL, PRINT, LOOP~ would.

;; shell/main.lisp circa June 2019
(loop
  (let* ((form (read-with-handlers)) ; Read
         (result (multiple-value-list (eval-with-restarts form)))) ; Eval
    (debug-log status "RESULT: ~A" result) ; "Print"
    (setf last-result result)))

Note that in the above snippet, EVAL-WITH-RESTARTS actually calls COMMON-LISP:EVAL. SHCL doesn’t ever interpret shell expressions. It merely translates shell expressions into Common Lisp forms. It defers the task of evaluation to the Common Lisp environment. In an interactive shell, we can’t get around using EVAL. We can’t compile the user’s input before they finish typing it!

When SHCL is used to embed shell expressions within Common Lisp source, SHCL doesn’t need to use EVAL. For example, consider SHCL’s #$ reader macro. This macro returns a form that completely tokenizes, parses, and translates the shell expression to Common Lisp before macroexpansion completes.

#+BEING_EXAMPLE SHCL/CORE/LISP-INTERPOLATION> (macroexpand-1 ‘#$ if true; then echo woo; fi #$) (SHCL/CORE/SHELL-FORM:SHELL-IF (SHCL/CORE/SHELL-FORM:SHELL-RUN (WITH-FD-STREAMS NIL (EXPANSION-FOR-WORDS (LIST #<NAME “true”>) :EXPAND-ALIASES T :EXPAND-PATHNAME-WORDS T :SPLIT-FIELDS NIL)) :ENVIRONMENT-CHANGES NIL :FD-CHANGES NIL) (SHCL/CORE/SHELL-FORM:SHELL-RUN (WITH-FD-STREAMS NIL (EXPANSION-FOR-WORDS (LIST #<NAME “echo”> #<NAME “woo”>) :EXPAND-ALIASES T :EXPAND-PATHNAME-WORDS T :SPLIT-FIELDS NIL)) :ENVIRONMENT-CHANGES NIL :FD-CHANGES NIL)) T

As an interesting observation, note that SHCL and the host Common Lisp implementation now form a complete compiler for the POSIX Shell language. SHCL acts as the front end and the host environment acts as the back end. SHCL may very well be the first shell capable of fully compiling a shell script.

The Input Pipeline

The input pipeline is the how SHCL converts a sequence of characters into an executable lisp form. Each stage is a lazy transformation from one sequence to another sequence. You can see this in shcl/shell/main, where the pipeline is created by composing the transformation functions.

;; shell/main.lisp circa June 2019
(as-> wrapped-stream x
  (logging-token-sequence x)
  (logging-command-sequence x)
  (logging-evaluation-form-sequence x))

This results in a sequence that contains Common Lisp forms that, upon evaluation, execute the POSIX Shell expressions contained within the character stream.

Lexer

The lexer takes a character stream and produces a lazy sequence containing the tokens found in the stream.

Much like Common Lisp supports adding reader macros, the lexer can be extended using a generalization of readtables: DISPATCH-TABLE. Some lexer rules cannot be encoded in the DISPATCH-TABLE, but most rules are. Unlike Common Lisp’s readtables, DISPATCH-TABLE supports arbitrarily long dispatch rules.

Parser

The parsing phase consumes a token sequence and produces a lazy syntax tree sequence. Note: syntax trees are not simply lists. They are CLOS objects with named slots.

shcl/core/parser.lisp defines a set of PEG-style parser combinators. The parser combinators are heavily inspired by the popular Parsec library for Haskell. parser.lisp is a general purpose parser generator. In shcl/core/shell-grammar.lisp the parser generator is used to create a parser for the POSIX Shell language.

Just like the lexer supports extensions, SHCL’s parser is designed to accept extensions. SHCL uses a PEG-style parser specifically to make syntactic extensions easier. Unlike the more common LL or LR parsing schemes, PEG parsers are composable and you don’t need to reprocess the entire grammar when a change is made. Those properties made PEGs a natural fit for SHCL.

To actually modify the grammar, SHCL uses a system modeled off of Emacs Lisp’s advice functionality. Every nonterminal and terminal in SHCL’s grammar is an advisable function (see shcl/core/advice.lisp). By defining :AROUND advice on the grammar node of interest, you can easily add new parser rules into the grammar.

Translator

The translation phase consumes a sequence of syntax tree objects and produces a lazy sequence of equivalent Common Lisp forms. Typically, the post-translation form is one of the macros provided by the shcl/core/shell-form package. The macros in that package provide a DSL of sorts that has shell-like semantics.

Unlike the parse trees produced during the parsing phase, the shell-form DSL is meant to be human-friendly. It is intended to be comfortable to read and write directly.

Its worth noting that before any shell builtin, binary, or user-defined shell function is run, the arguments of the command are “expanded”. Expansion deals with things like aliases, shell variable access, and evaluation of lisp splices. Expansion isn’t conceptually part of the pipeline or the translation phase, but it is complex enough to be worth mentioning. The form produced by the translation phase includes the necessary calls to expand arguments, but expansion doesn’t take place until evaluation takes place.

Sequences

When it comes to sequences, SHCL takes a page from Clojure’s book. SHCL has a collection of generic functions (defined in shcl/core/sequence.lisp) that facilitate working with arbitrary sequence types in a uniform way. You can traverse a sequence using HEAD and TAIL, or you can extend it using ATTACH. None of the sequence functions actually modify an object in-place. There are convenience macros that modify places (e.g. POPF), but no sequences are harmed in the execution of those macros.

In this past, SHCL used iterators as the common currency instead of immutable sequences. This led to a number of problems.

  1. Traversing an iterator twice was clunky.
  2. The iterators weren’t thread-safe.
  3. Debugging code that used iterators was clunky (observing the iterator would modify it!)

A little bit more GC traffic was deemed an acceptable trade-of to avoid those issues.

Data

Since subshells are simply threads, there is a high risk of one subshell stomping on another subshell’s state. Using immutable objects and a generally functional model greatly simplifies the task of maintaining thread-based subshells.

Common Lisp doesn’t have great tools off-the-shelf for defining and working with immutable objects. Sure, you can simply avoid defining writer methods for slots, but that is just the first step. What happens when you need to make a tweak in some deeply nested data structure of immutable objects? Rebuilding the entire data structure can be fairly cumbersome. Without some macro assistance, there is a lot of boilerplate. More importantly, that boilerplate needs to be updated whenever the data structure is redefined. Consider the following example of a read-only class. If we later decide to add a new slot we’ll have to update all the with-altered-slot functions to preserve that new slot. When you consider inheritance, the code below is outright broken – there might be slots we don’t know about!

(defclass foo ()
  ((slot1 :initarg :slot1 :reader foo-slot1)
   (slot2 :initarg :slot2 :reader foo-slot2)))

(defun with-altered-slot1 (foo new-value)
  (make-instance 'foo :slot1 new-value :slot2 (foo-slot2 foo)))

(defun with-altered-slot2 (foo new-value)
  (make-instance 'foo :slot1 (foo-slot1 foo) :slot2 new-value))

(with-altered-slot1 (with-altered-slot2 (get-foo) 123) 456)

shcl/core/data.lisp provides macros to make managing immutable objects easier. It takes inspiration from Haskell, Haskell’s Lens library, and FSet’s setf expanders. There are three key ingredients in data.lisp.

CLONE

The CLONE generic function provides a way to make a shallow copy of an object. It intentionally does not handle all objects. Some STANDARD-OBJECT subclasses may behave strangely when naively copied.

CLONE goes a step beyond just copying. For class types, CLONE accepts initargs, too. CLONE allocates a fresh object, copies all the slot values of the original object into the new object, and then uses SHARED-INITIALIZE to process any initargs. The end result is that CLONE behaves more like a variant of MAKE-INSTANCE than a simple copier function.

Returning to our FOO example above, CLONE makes rewriting slots in a new instance much easier. The only price is that we must ensure that our FOO class can be copied correctly. In this instance, FOO is a “dumb struct” and can be copied naively.

(defclass foo ()
  ((slot1 :initarg :slot1 :reader foo-slot1)
   (slot2 :initarg :slot2 :reader foo-slot2)))

(define-clone-method foo) ;; convenience macro for trivially clonable types

(clone (get-foo) :slot1 456 :slot2 123)

DEFINE-CLONING-SETF-EXPANDER

CLONE greatly simplifies the task of updating a complex read-only data structure, but it does little to help with nested data structures. Suppose our FOO class holds an instance of another immutable class and we want to update one of the inner class’s slots.

(defclass bar ()
  ((slot1 :initarg :slot1 :reader bar-slot1)))

(define-clone-method bar)

(defclass foo ()
  ((slot1 :initarg :slot1 :reader foo-slot1)
   (slot2 :initarg :slot2 :reader foo-slot2)
   (bar :initarg :bar :reader foo-bar)))

(define-clone-method foo)

(let ((foo (get-foo)))
  (clone foo :bar (clone (foo-bar foo) :slot1 123)))

This is workable, but it lacks the pizzazz that we had when the structure was flat. We’re repeating ourselves here. We specified that we wanted to operate on the BAR slot in two different ways: the initarg and the call to FOO-BAR. It also just feels heavy. Our goal is to make read-only objects feel just as ergonomic as mutable objects. We’re not living up to that standard with just CLONE.

SHCL’s answer to that is to define SETF expanders that do the heavy lifting for you. DEFINE-CLONING-SETF-EXPANDER defines a SETF expander that clones the target object before making any changes to it. If you’re familiar with FSet’s SETF behaviors, this should be familiar.

(defclass bar ()
  ((slot1
    :reader bar-slot1
    :writer set-bar-slot1
    :initform 'abc)))

(define-clone-method bar)
(define-cloning-setf-expander bar-slot1 set-bar-slot1)

(defclass foo ()
  ((bar
    :reader foo-bar
    :writer set-foo-bar
    :initform (make-instance 'bar))))

(define-clone-method foo)
(define-cloning-setf-expander foo-bar set-foo-bar)

(let* ((foo (make-instance 'foo))
       (old-foo foo))
  (setf (bar-slot1 (foo-bar foo)) 123)
  (assert (eql (bar-slot1 (foo-bar foo)) 123))
  (assert (eql (bar-slot1 (foo-bar old-foo)) 'abc))
  foo)

There are two things that are unaesthetic about this.

  1. We’re using procedural constructs and statefully modifying the place holding the immutable object.
  2. Under the hood, the “immutable” object is modified before the SETF operation completes.

Issue #1 is intrinsic to this approach. The fact of the matter is that its easier to bring a bit of functional goodness to SETF than it is to build a rock-solid replacement for SETF that returns a new value instead of changing one. Using SETF as the backbone is just practical.

Issue #2 is much more complex than it seems. First, notice that it is essential that mutation is allowed during the initialization of an object. With the way CLOS works, you can write an :AROUND method that sees the object pre-initialization and post-initialization. As a result, methods on CLOS’s initialization functions must be tolerant of mutation. In addition, methods on the initialization functions should not assume the object won’t be further modified after they return. A method can check if there are deeper layers to the method onion, but it cannot check if there are layers further out.[fn::Technically, an :AROUND method with exclusively EQL specializers could confidently assume that it is the outer-most method. Even then, you’d need to have the specialization on one of the outer-most functions of the object creation flow (e.g. MAKE-INSTANCE or CLONE). It seems unlikely that anyone would write such a method on CLONE specifically so that they can observe a fully-post-initialization end state, so we’re going to just ignore that possibility in this argument.] As a result, initialization methods must be tolerant of changes that occur after they return.

Thus, the object cannot really be considered immutable until after control returns to the caller that initiated the creation of the object. At that point, it is actually quite helpful if the creator of the object is allowed to mutate it in some way before handing it off to someone else. Otherwise, all interesting state will need to be modifiable via initargs! The owner of a class type may not wish to expose the slot’s initarg, or the state may not be known to the class’s owner.

Suppose we have a FAVORITE-COLOR accessor generic function. Some objects may store a favorite color in a slot, but other types may store their favorite color elsewhere.

(defclass person ()
  ((favorite-color :reader favorite-color :writer set-favorite-color)))

(defmethod favorite-color ((list list))
  (cdr (assoc 'color list)))

(defmethod set-favorite-color (new-value (list list))
  (let ((entry (assoc 'color list)))
    (if entry
        (setf (cdr entry) new-value)
        (error "This list doesn't have a favorite color!"))))

What if we wanted to support cloning a list while changing its favorite color? Sure, we could define a method on CLONE that knows how to handle a :FAVORITE-COLOR initarg, but methods on a generic function are a limited resource. You can’t have two different libraries both try to add initargs to the same type! For common types such as LIST, this is a real problem.

At the end of the day, we can’t assume that all interesting state can be set correctly at initialization time. Allowing ourselves to modify a freshly allocated “immutable” object is a reasonable compromise. We can still pretend we have immutable objects, and the secret mutation is very unlikely to be noticed by anyone.

DEFINE-DATA

shcl/core/data.lisp has one final trick up its sleeve. DEFINE-DATA is a fairly simple wrapper around DEFCLASS. When you define a class with DEFINE-DATA, you are promising that the class (and all future subclasses) will behave like “plain old data”. The exact meaning of “plain old data” isn’t specified, but the intuitive meaning is that it should be safe to treat the object like a C struct. By making the promise to keep the class well behaved, your class will automatically support some common operations (e.g. CLONE, FSET:COMPARE, and MAKE-LOAD-FORM).

You don’t need to use DEFINE-DATA to mark a class as plain old data. You merely need to include DATA in the class’s precedence list. Note that all future subclasses and all superclasses must be compatible with the plain old data promise, too. To help make that requirement a bit more clear, DEFINE-DATA also defaults the metaclass of the class to DATA-CLASS. The DATA-CLASS metaclass and the STANDARD-CLASS metaclass are like oil and water. You cannot inherit a STANDARD-CLASS from a DATA-CLASS or vis versa. This is to help ensure that all subclasses and superclasses have committed to the plain old data promise.

Logging

SHCL keeps an in-memory logging buffer that can be dumped on-demand using the -shcl-dump-logs builtin command. You can add a new log line to the log buffer using shcl/core/utility:debug-log.

Assorted Style Guidelines

  • Every exported symbol should have documentation. Documenting internal functions is also a Good Thing. Document methods at your discretion.
  • Only tests are allowed to access unexported symbols.
  • Treat all exported symbols as public API. No packages are private.
  • Long lines should be avoided.
  • Prefer immutable data structures (e.g. fset or define-data).
  • Prefer a functional style.
  • Prefer explicitly importing symbols with :IMPORT-FROM rather than :USE.

Highly Desired Contributions

Not sure where to begin? How about you take on one of these open problems!

  • Tab complete
  • Signal handling (this is especially thorny given the way subshells work!)
  • Job control
  • Prompt customization
  • More unit tests