Skip to content
This repository has been archived by the owner on Jan 1, 2022. It is now read-only.

Lib Blitz Style Summary and Overhaul #72

Open
21 of 62 tasks
epage opened this issue Dec 6, 2021 · 0 comments
Open
21 of 62 tasks

Lib Blitz Style Summary and Overhaul #72

epage opened this issue Dec 6, 2021 · 0 comments

Comments

@epage
Copy link
Owner

epage commented Dec 6, 2021

Issue by kbknapp
Tuesday May 09, 2017 at 19:50 GMT
Originally opened as clap-rs/clap#950


In an effort to get 3.x underway this will be the summary issue linking to all tracking issues. Many of the issues that will come out of this will be excellent "first time" issues that I'd be willing to mentor!

I will be updating this summary periodically (in addition to working on all the other issues in the queue) as well as linking to tracking issues for individual tracking issues as I create them.


From Rust API Guidelines

Rust API guidelines

Crate conformance checklist

  • Organization (crate is structured in an intelligible way)
    • Crate root re-exports common functionality (C-REEXPORT)
    • Modules provide a sensible API hierarchy (C-HIERARCHY)
      • Move completions and (eventual) manpage generation to separate module or crate (#951)
  • Naming (crate aligns with Rust naming conventions)
    • Casing conforms to RFC 430 (C-CASE)
    • Ad-hoc conversions follow as_, to_, into_ conventions (C-CONV)
    • Methods on collections that produce iterators follow iter, iter_mut, into_iter (C-ITER)
    • Iterator type names match the methods that produce them (C-ITER-TY)
    • Ownership suffixes use _mut and _ref (C-OWN-SUFFIX)
    • Single-element containers implement appropriate getters (C-GETTERS)
  • Interoperability (crate interacts nicely with other library functionality)
    • Types eagerly implement common traits (C-COMMON-TRAITS)
      • Copy, Clone, Eq, PartialEq, Ord, PartialOrd, Hash Debug,
        Display, Default
      • #952
    • Conversions use the standard traits From, AsRef, AsMut (C-CONV-TRAITS)
    • Collections implement FromIterator and Extend (C-COLLECT)
    • Data structures implement Serde's Serialize, Deserialize (C-SERDE)
    • Crate has a "serde" cfg option that enables Serde (C-SERDE-CFG)
    • Types are Send and Sync where possible (C-SEND-SYNC)
    • Error types are Send and Sync (C-SEND-SYNC-ERR)
    • Error types are meaningful, not () (C-MEANINGFUL-ERR)
    • Binary number types provide Hex, Octal, Binary formatting (C-NUM-FMT)
  • Macros (crate presents well-behaved macros)
  • Documentation (crate is abundantly documented)
  • Predictability (crate enables legible code that acts how it looks)
    • Smart pointers do not add inherent methods (C-SMART-PTR)
    • Conversions live on the most specific type involved (C-CONV-SPECIFIC)
    • Functions with a clear receiver are methods (C-METHOD)
    • Functions do not take out-parameters (C-NO-OUT)
    • Operator overloads are unsurprising (C-OVERLOAD)
    • Only smart pointers implement Deref and DerefMut (C-DEREF)
    • Deref and DerefMut never fail (C-DEREF-FAIL)
    • Constructors are static, inherent methods (C-CTOR)
  • Flexibility (crate supports diverse real-world use cases)
    • Functions expose intermediate results to avoid duplicate work (C-INTERMEDIATE)
    • Caller decides where to copy and place data (C-CALLER-CONTROL)
    • Functions minimize assumptions about parameters by using generics (C-GENERIC)
    • Traits are object-safe if they may be useful as a trait object (C-OBJECT)
  • Type safety (crate leverages the type system effectively)
    • Newtypes provide static distinctions (C-NEWTYPE)
    • Arguments convey meaning through types, not bool or Option (C-CUSTOM-TYPE)
    • Types for a set of flags are bitflags, not enums (C-BITFLAG)
    • Builders enable construction of complex values (C-BUILDER)
  • Dependability (crate is unlikely to do the wrong thing)
  • Debuggability (crate is conducive to easy debugging)
  • Future proofing (crate is free to improve without breaking users' code)
  • Necessities (to whom they matter, they really matter)
    • Public dependencies of a stable crate are stable (C-STABLE)
    • Crate and its dependencies have a permissive license (C-PERMISSIVE)

Organization

Crate root re-exports common functionality (C-REEXPORT)

Crates pub use the most common types for convenience, so that clients do not
have to remember or write the crate's module hierarchy to use these types.

Re-exporting is covered in more detail in the The Rust Programming Language
under Crates and Modules.

Examples from serde_json

The serde_json::Value type is the most commonly used type from serde_json.
It is a re-export of a type that lives elsewhere in the module hierarchy, at
serde_json::value::Value. The serde_json::value module defines
other JSON-value-related things that are not re-exported. For example
serde_json::value::Index is the trait that defines types that can be used to
index into a Value using square bracket indexing notation. The Index trait
is not re-exported at the crate root because it would be comparatively rare for
a client crate to need to refer to it.

In addition to types, functions can be re-exported as well. In serde_json the
serde_json::from_str function is a re-export of a function from the
serde_json::de deserialization module, which contains other less common
deserialization-related functionality that is not re-exported.

Modules provide a sensible API hierarchy (C-HIERARCHY)

Examples from Serde

The serde crate is two independent frameworks in one crate - a serialization
half and a deserialization half. The crate is divided accordingly into
serde::ser and serde::de. Part of the deserialization framework is
isolated under serde::de::value because it is a relatively large API surface
that is relatively unimportant, and it would crowd the more common, more
important functionlity located in serde::de if it were to share the same
namespace.

Naming

Casing conforms to RFC 430 (C-CASE)

Basic Rust naming conventions are described in RFC 430.

In general, Rust tends to use CamelCase for "type-level" constructs (types and
traits) and snake_case for "value-level" constructs. More precisely:

Item Convention
Crates unclear
Modules snake_case
Types CamelCase
Traits CamelCase
Enum variants CamelCase
Functions snake_case
Methods snake_case
General constructors new or with_more_details
Conversion constructors from_some_other_type
Local variables snake_case
Static variables SCREAMING_SNAKE_CASE
Constant variables SCREAMING_SNAKE_CASE
Type parameters concise CamelCase, usually single uppercase letter: T
Lifetimes short lowercase, usually a single letter: 'a, 'de, 'src

In CamelCase, acronyms count as one word: use Uuid rather than UUID. In
snake_case, acronyms are lower-cased: is_xid_start.

In snake_case or SCREAMING_SNAKE_CASE, a "word" should never consist of a
single letter unless it is the last "word". So, we have btree_map rather than
b_tree_map, but PI_2 rather than PI2.

Examples from the standard library

The whole standard library. This guideline should be easy!

Ad-hoc conversions follow as_, to_, into_ conventions (C-CONV)

Conversions should be provided as methods, with names prefixed as follows:

Prefix Cost Ownership
as_ Free borrowed -> borrowed
to_ Expensive borrowed -> owned
into_ Variable owned -> owned

For example:

  • str::as_bytes() gives a &[u8] view into a &str, which is free.
  • str::to_owned() copies a &str to a new String, which may require memory
    allocation.
  • String::into_bytes() takes ownership a String and yields the underlying
    Vec<u8>, which is free.
  • BufReader::into_inner() takes ownership of a buffered reader and extracts
    out the underlying reader, which is free. Data in the buffer is
    discarded.
  • BufWriter::into_inner() takes ownership of a buffered writer and extracts
    out the underlying writer, which requires a potentially expensive flush of any
    buffered data.

Conversions prefixed as_ and into_ typically decrease abstraction, either
exposing a view into the underlying representation (as) or deconstructing data
into its underlying representation (into). Conversions prefixed to_, on the
other hand, typically stay at the same level of abstraction but do some work to
change one representation into another.

More examples from the standard library

Methods on collections that produce iterators follow iter, iter_mut, into_iter (C-ITER)

Per RFC 199.

For a container with elements of type U, iterator methods should be named:

fn iter(&self) -> Iter             // Iter implements Iterator<Item = &U>
fn iter_mut(&mut self) -> IterMut  // IterMut implements Iterator<Item = &mut U>
fn into_iter(self) -> IntoIter     // IntoIter implements Iterator<Item = U>

This guideline applies to data structures that are conceptually homogeneous
collections. As a counterexample, the str type is slice of bytes that are
guaranteed to be valid UTF-8. This is conceptually more nuanced than a
homogeneous collection so rather than providing the
iter/iter_mut/into_iter group of iterator methods, it provides
str::bytes to iterate as bytes and str::chars to iterate as chars.

This guideline applies to methods only, not functions. For example
percent_encode from the url crate returns an iterator over percent-encoded
string fragments. There would be no clarity to be had by using an
iter/iter_mut/into_iter convention.

Examples from the standard library

Iterator type names match the methods that produce them (C-ITER-TY)

A method called into_iter() should return a type called IntoIter and
similarly for all other methods that return iterators.

This guideline applies chiefly to methods, but often makes sense for functions
as well. For example the percent_encode function from the url crate
returns an iterator type called PercentEncode.

These type names make the most sense when prefixed with their owning module, for
example vec::IntoIter.

Examples from the standard library

Ownership suffixes use _mut, _ref (C-OWN-SUFFIX)

Functions often come in multiple variants: immutably borrowed, mutably borrowed,
and owned.

The right default depends on the function in question. Variants should be marked
through suffixes.

Exceptions

In the case of iterators, the moving variant can also be understood as an into
conversion, into_iter, and for x in v.into_iter() reads arguably better than
for x in v.iter_move(), so the convention is into_iter.

For mutably borrowed variants, if the mut qualifier is part of a type name,
it should appear as it would appear in the type. For example
Vec::as_mut_slice returns a mut slice; it does what it says.

Immutably borrowed by default

If foo uses/produces an immutable borrow by default, use:

  • The _mut suffix (e.g. foo_mut) for the mutably borrowed variant.
  • The _move suffix (e.g. foo_move) for the owned variant.
Examples from the standard library

TODO rust-api-guidelines#37

Owned by default

If foo uses/produces owned data by default, use:

  • The _ref suffix (e.g. foo_ref) for the immutably borrowed variant.
  • The _mut suffix (e.g. foo_mut) for the mutably borrowed variant.
Examples from the standard library

Single-element containers implement appropriate getters (C-GETTERS)

Single-element contains where accessing the element cannot fail should implement
get and get_mut, with the following signatures.

fn get(&self) -> &V;
fn get_mut(&mut self) -> &mut V;

Single-element containers where the element is Copy (e.g. Cell-like
containers) should instead return the value directly, and not implement a
mutable accessor. TODO rust-api-guidelines#44

fn get(&self) -> V;

For getters that do runtime validation, consider adding unsafe _unchecked
variants.

unsafe fn get_unchecked(&self, index) -> &V;
Examples from the standard library

Interoperability

Types eagerly implement common traits (C-COMMON-TRAITS)

Rust's trait system does not allow orphans: roughly, every impl must live
either in the crate that defines the trait or the implementing type.
Consequently, crates that define new types should eagerly implement all
applicable, common traits.

To see why, consider the following situation:

  • Crate std defines trait Display.
  • Crate url defines type Url, without implementing Display.
  • Crate webapp imports from both std and url,

There is no way for webapp to add Display to url, since it defines
neither. (Note: the newtype pattern can provide an efficient, but inconvenient
workaround.

The most important common traits to implement from std are:

Conversions use the standard traits From, AsRef, AsMut (C-CONV-TRAITS)

The following conversion traits should be implemented where it makes sense:

The following conversion traits should never be implemented:

These traits have a blanket impl based on From and TryFrom. Implement those
instead.

Examples from the standard library
  • From<u16> is implemented for u32 because a smaller integer can always be
    converted to a bigger integer.
  • From<u32> is not implemented for u16 because the conversion may not be
    possible if the integer is too big.
  • TryFrom<u32> is implemented for u16 and returns an error if the integer is
    too big to fit in u16.
  • From<Ipv6Addr> is implemented for IpAddr, which is a type that can
    represent both v4 and v6 IP addresses.

Collections implement FromIterator and Extend (C-COLLECT)

FromIterator and Extend enable collections to be used conveniently with
the following iterator methods:

FromIterator is for creating a new collection containing items from an
iterator, and Extend is for adding items from an iterator onto an existing
collection.

Examples from the standard library
  • Vec<T> implements both FromIterator<T> and Extend<T>.

Data structures implement Serde's Serialize, Deserialize (C-SERDE)

Types that play the role of a data structure should implement Serialize and
Deserialize.

An example of a type that plays the role of a data structure is
linked_hash_map::LinkedHashMap.

An example of a type that does not play the role of a data structure is
byteorder::LittleEndian.

Crate has a "serde" cfg option that enables Serde (C-SERDE-CFG)

If the crate relies on serde_derive to provide Serde impls, the name of the
cfg can still be simply "serde" by using this workaround. Do not use a
different name for the cfg like "serde_impls" or "serde_serialization".

Types are Send and Sync where possible (C-SEND-SYNC)

Send and Sync are automatically implemented when the compiler determines
it is appropriate.

In types that manipulate raw pointers, be vigilant that the Send and Sync
status of your type accurately reflects its thread safety characteristics. Tests
like the following can help catch unintentional regressions in whether the type
implements Send or Sync.

#[test]
fn test_send() {
    fn assert_send<T: Send>() {}
    assert_send::<MyStrangeType>();
}

#[test]
fn test_sync() {
    fn assert_sync<T: Sync>() {}
    assert_sync::<MyStrangeType>();
}

Error types are Send and Sync (C-SEND-SYNC-ERR)

An error that is not Send cannot be returned by a thread run with
thread::spawn. An error that is not Sync cannot be passed across threads
using an Arc. These are common requirements for basic error handling in a
multithreaded application.

Binary number types provide Hex, Octal, Binary formatting (C-NUM-FMT)

These traits control the representation of a type under the {:X}, {:x},
{:o}, and {:b} format specifiers.

Implement these traits for any number type on which you would consider doing
bitwise manipulations like | or &. This is especially appropriate for
bitflag types. Numeric quantity types like struct Nanoseconds(u64) probably do
not need these.

Error types are meaningful, not () (C-MEANINGFUL-ERR)

When defining functions that return Result, and the error carries no
useful additional information, do not use () as the error type. ()
does not implement std::error::Error, and this causes problems for
callers that expect to be able to convert errors to Error. Common
error handling libraries like error-chain expect errors to implement
Error.

Instead, define a meaningful error type specific to your crate.

Examples from the standard library
  • ParseBoolError is returned when failing to parse a bool from a string.

Macros

Input syntax is evocative of the output (C-EVOCATIVE)

Rust macros let you dream up practically whatever input syntax you want. Aim to
keep input syntax familiar and cohesive with the rest of your users' code by
mirroring existing Rust syntax where possible. Pay attention to the choice and
placement of keywords and punctuation.

A good guide is to use syntax, especially keywords and punctuation, that is
similar to what will be produced in the output of the macro.

For example if your macro declares a struct with a particular name given in the
input, preface the name with the keyword struct to signal to readers that a
struct is being declared with the given name.

// Prefer this...
bitflags! {
    struct S: u32 { /* ... */ }
}

// ...over no keyword...
bitflags! {
    S: u32 { /* ... */ }
}

// ...or some ad-hoc word.
bitflags! {
    flags S: u32 { /* ... */ }
}

Another example is semicolons vs commas. Constants in Rust are followed by
semicolons so if your macro declares a chain of constants, they should likely be
followed by semicolons even if the syntax is otherwise slightly different from
Rust's.

// Ordinary constants use semicolons.
const A: u32 = 0b000001;
const B: u32 = 0b000010;

// So prefer this...
bitflags! {
    struct S: u32 {
        const C = 0b000100;
        const D = 0b001000;
    }
}

// ...over this.
bitflags! {
    struct S: u32 {
        const E = 0b010000,
        const F = 0b100000,
    }
}

Macros are so diverse that these specific examples won't be relevant, but think
about how to apply the same principles to your situation.

Item macros compose well with attributes (C-MACRO-ATTR)

Macros that produce more than one output item should support adding attributes
to any one of those items. One common use case would be putting individual items
behind a cfg.

bitflags! {
    struct Flags: u8 {
        #[cfg(windows)]
        const ControlCenter = 0b001;
        #[cfg(unix)]
        const Terminal = 0b010;
    }
}

Macros that produce a struct or enum as output should support attributes so that
the output can be used with derive.

bitflags! {
    #[derive(Default, Serialize)]
    struct Flags: u8 {
        const ControlCenter = 0b001;
        const Terminal = 0b010;
    }
}

Item macros work anywhere that items are allowed (C-ANYWHERE)

Rust allows items to be placed at the module level or within a tighter scope
like a function. Item macros should work equally well as ordinary items in all
of these places. The test suite should include invocations of the macro in at
least the module scope and function scope.

#[cfg(test)]
mod tests {
    test_your_macro_in_a!(module);

    #[test]
    fn anywhere() {
        test_your_macro_in_a!(function);
    }
}

As a simple example of how things can go wrong, this macro works great in a
module scope but fails in a function scope.

macro_rules! broken {
    ($m:ident :: $t:ident) => {
        pub struct $t;
        pub mod $m {
            pub use super::$t;
        }
    }
}

broken!(m::T); // okay, expands to T and m::T

fn g() {
    broken!(m::U); // fails to compile, super::U refers to the containing module not g
}

Item macros support visibility specifiers (C-MACRO-VIS)

Follow Rust syntax for visibility of items produced by a macro. Private by
default, public if pub is specified.

bitflags! {
    struct PrivateFlags: u8 {
        const A = 0b0001;
        const B = 0b0010;
    }
}

bitflags! {
    pub struct PublicFlags: u8 {
        const C = 0b0100;
        const D = 0b1000;
    }
}

Type fragments are flexible (C-MACRO-TY)

If your macro accepts a type fragment like $t:ty in the input, it should be
usable with all of the following:

  • Primitives: u8, &str
  • Relative paths: m::Data
  • Absolute paths: ::base::Data
  • Upward relative paths: super::Data
  • Generics: Vec<String>

As a simple example of how things can go wrong, this macro works great with
primitives and absolute paths but fails with relative paths.

macro_rules! broken {
    ($m:ident => $t:ty) => {
        pub mod $m {
            pub struct Wrapper($t);
        }
    }
}

broken!(a => u8); // okay

broken!(b => ::std::marker::PhantomData<()>); // okay

struct S;
broken!(c => S); // fails to compile

Documentation

Crate level docs are thorough and include examples (C-CRATE-DOC)

See RFC 1687.

All items have a rustdoc example (C-EXAMPLE)

Every public module, trait, struct, enum, function, method, macro, and type
definition should have an example that exercises the functionality.

The purpose of an example is not always to show how to use the item. For
example users can be expected to know how to instantiate and match on an enum
like enum E { A, B }. Rather, an example is often intended to show why
someone would want to use
the item.

This guideline should be applied within reason.

A link to an applicable example on another item may be sufficient. For example
if exactly one function uses a particular type, it may be appropriate to write a
single example on either the function or the type and link to it from the other.

Examples use ?, not try!, not unwrap (C-QUESTION-MARK)

Like it or not, example code is often copied verbatim by users. Unwrapping an
error should be a conscious decision that the user needs to make.

A common way of structuring fallible example code is the following. The lines
beginning with # are compiled by cargo test when building the example but
will not appear in user-visible rustdoc.

/// ```rust
/// # use std::error::Error;
/// #
/// # fn try_main() -> Result<(), Box<Error>> {
/// your;
/// example?;
/// code;
/// #
/// #     Ok(())
/// # }
/// #
/// # fn main() {
/// #     try_main().unwrap();
/// # }
/// ```

Function docs include error conditions in "Errors" section (C-ERROR-DOC)

Per RFC 1574.

This applies to trait methods as well. Trait methods for which the
implementation is allowed or expected to return an error should be documented
with an "Errors" section.

Examples from the standard library

Some implementations of the std::io::Read::read trait method may return an
error.

/// Pull some bytes from this source into the specified buffer, returning
/// how many bytes were read.
///
/// ... lots more info ...
///
/// # Errors
///
/// If this function encounters any form of I/O or other error, an error
/// variant will be returned. If an error is returned then it must be
/// guaranteed that no bytes were read.

Function docs include panic conditions in "Panics" section (C-PANIC-DOC)

Per RFC 1574.

This applies to trait methods as well. Traits methods for which the
implementation is allowed or expected to panic should be documented with a
"Panics" section.

Examples from the standard library

The Vec::insert method may panic.

/// Inserts an element at position `index` within the vector, shifting all
/// elements after it to the right.
///
/// # Panics
///
/// Panics if `index` is out of bounds.

Prose contains hyperlinks to relevant things (C-LINK)

Links to methods within the same type usually look like this:

[`serialize_struct`]: #method.serialize_struct

Links to other types usually look like this:

[`Deserialize`]: trait.Deserialize.html

Links may also point to a parent or child module:

[`Value`]: ../enum.Value.html
[`DeserializeOwned`]: de/trait.DeserializeOwned.html

This guideline is officially recommended by RFC 1574 under the heading "Link
all the things"
.

Cargo.toml publishes CI badges for tier 1 platforms (C-CI)

The Rust compiler regards tier 1 platforms as "guaranteed to work."
Specifically they will each satisfy the following requirements:

  • Official binary releases are provided for the platform.
  • Automated testing is set up to run tests for the platform.
  • Landing changes to the rust-lang/rust repository's master branch is gated on
    tests passing.
  • Documentation for how to use and how to build the platform is available.

Stable, high-profile crates should meet the same level of rigor when it comes to
tier 1. To prove it, Cargo.toml should publish CI badges.

[badges]
travis-ci = { repository = "..." }
appveyor = { repository = "..." }

Cargo.toml includes all common metadata (C-METADATA)

  • authors
  • description
  • license
  • homepage (though see rust-api-guidelines#26)
  • documentation
  • repository
  • readme
  • keywords
  • categories

Crate sets html_root_url attribute (C-HTML-ROOT)

It should point to "https://docs.rs/$crate/$version".

Cargo.toml should contain a note next to the version to remember to bump the
html_root_url when bumping the crate version.

Cargo.toml documentation key points to docs.rs (C-DOCS-RS)

It should point to "https://docs.rs/$crate".

Predictability

Smart pointers do not add inherent methods (C-SMART-PTR)

For example, this is why the Box::into_raw function is defined the way it
is.

impl<T> Box<T> where T: ?Sized {
    fn into_raw(b: Box<T>) -> *mut T { /* ... */ }
}

let boxed_str: Box<str> = /* ... */;
let ptr = Box::into_raw(boxed_str);

If this were defined as an inherent method instead, it would be confusing at the
call site whether the method being called is a method on Box<T> or a method on
T.

impl<T> Box<T> where T: ?Sized {
    // Do not do this.
    fn into_raw(self) -> *mut T { /* ... */ }
}

let boxed_str: Box<str> = /* ... */;

// This is a method on str accessed through the smart pointer Deref impl.
boxed_str.chars()

// This is a method on Box<str>...?
boxed_str.into_raw()

Conversions live on the most specific type involved (C-CONV-SPECIFIC)

When in doubt, prefer to_/as_/into_ to from_, because they are more
ergonomic to use (and can be chained with other methods).

For many conversions between two types, one of the types is clearly more
"specific": it provides some additional invariant or interpretation that is not
present in the other type. For example, str is more specific than &[u8],
since it is a UTF-8 encoded sequence of bytes.

Conversions should live with the more specific of the involved types. Thus,
str provides both the as_bytes method and the from_utf8 constructor
for converting to and from &[u8] values. Besides being intuitive, this
convention avoids polluting concrete types like &[u8] with endless conversion
methods.

Functions with a clear receiver are methods (C-METHOD)

Prefer

impl Foo {
    pub fn frob(&self, w: widget) { /* ... */ }
}

over

pub fn frob(foo: &Foo, w: widget) { /* ... */ }

for any operation that is clearly associated with a particular type.

Methods have numerous advantages over functions:

  • They do not need to be imported or qualified to be used: all you need is a
    value of the appropriate type.
  • Their invocation performs autoborrowing (including mutable borrows).
  • They make it easy to answer the question "what can I do with a value of type
    T" (especially when using rustdoc).
  • They provide self notation, which is more concise and often more clearly
    conveys ownership distinctions.

Functions do not take out-parameters (C-NO-OUT)

Prefer

fn foo() -> (Bar, Bar)

over

fn foo(output: &mut Bar) -> Bar

for returning multiple Bar values.

Compound return types like tuples and structs are efficiently compiled and do
not require heap allocation. If a function needs to return multiple values, it
should do so via one of these types.

The primary exception: sometimes a function is meant to modify data that the
caller already owns, for example to re-use a buffer:

fn read(&mut self, buf: &mut [u8]) -> io::Result<usize>

Operator overloads are unsurprising (C-OVERLOAD)

Operators with built in syntax (*, |, and so on) can be provided for a type
by implementing the traits in std::ops. These operators come with strong
expectations: implement Mul only for an operation that bears some resemblance
to multiplication (and shares the expected properties, e.g. associativity), and
so on for the other traits.

Only smart pointers implement Deref and DerefMut (C-DEREF)

The Deref traits are used implicitly by the compiler in many circumstances,
and interact with method resolution. The relevant rules are designed
specifically to accommodate smart pointers, and so the traits should be used
only for that purpose.

Examples from the standard library

Deref and DerefMut never fail (C-DEREF-FAIL)

Because the Deref traits are invoked implicitly by the compiler in sometimes
subtle ways, failure during dereferencing can be extremely confusing.

Constructors are static, inherent methods (C-CTOR)

In Rust, "constructors" are just a convention:

impl<T> Example<T> {
    pub fn new() -> Example<T> { /* ... */ }
}

Constructors are static (no self) inherent methods for the type that they
construct. Combined with the practice of fully importing type names, this
convention leads to informative but concise construction:

use example::Example;

// Construct a new Example.
let ex = Example::new();

This convention also applied to conversion constructors (prefix from rather
than new).

Constructors for structs with sensible defaults allow clients to concisely
override using the struct update syntax.

pub struct Config {
    pub color: Color,
    pub size: Size,
    pub shape: Shape,
}

impl Config {
    pub fn new() -> Config {
        Config {
            color: Brown,
            size: Medium,
            shape: Square,
        }
    }
}

// In user's code.
let config = Config { color: Red, .. Config::new() };
Examples from the standard library

Flexibility

Functions expose intermediate results to avoid duplicate work (C-INTERMEDIATE)

Many functions that answer a question also compute interesting related data. If
this data is potentially of interest to the client, consider exposing it in the
API.

Examples from the standard library
  • Vec::binary_search does not return a bool of whether the value was
    found, nor an Option<usize> of the index at which the value was maybe found.
    Instead it returns information about the index if found, and also the index at
    which the value would need to be inserted if not found.

  • String::from_utf8 may fail if the input bytes are not UTF-8. In the error
    case it returns an intermediate result that exposes the byte offset up to
    which the input was valid UTF-8, as well as handing back ownership of the
    input bytes.

Caller decides where to copy and place data (C-CALLER-CONTROL)

If a function requires ownership of an argument, it should take ownership of the
argument rather than borrowing and cloning the argument.

// Prefer this:
fn foo(b: Bar) {
    /* use b as owned, directly */
}

// Over this:
fn foo(b: &Bar) {
    let b = b.clone();
    /* use b as owned after cloning */
}

If a function does not require ownership of an argument, it should take a
shared or exclusive borrow of the argument rather than taking ownership and
dropping the argument.

// Prefer this:
fn foo(b: &Bar) {
    /* use b as borrowed */
}

// Over this:
fn foo(b: Bar) {
    /* use b as borrowed, it is implicitly dropped before function returns */
}

The Copy trait should only be used as a bound when absolutely needed, not as a
way of signaling that copies should be cheap to make.

Functions minimize assumptions about parameters by using generics (C-GENERIC)

The fewer assumptions a function makes about its inputs, the more widely usable
it becomes.

Prefer

fn foo<I: Iterator<Item = i64>>(iter: I) { /* ... */ }

over any of

fn foo(c: &[i64]) { /* ... */ }
fn foo(c: &Vec<i64>) { /* ... */ }
fn foo(c: &SomeOtherCollection<i64>) { /* ... */ }

if the function only needs to iterate over the data.

More generally, consider using generics to pinpoint the assumptions a function
needs to make about its arguments.

Advantages of generics
  • Reusability. Generic functions can be applied to an open-ended collection of
    types, while giving a clear contract for the functionality those types must
    provide.

  • Static dispatch and optimization. Each use of a generic function is
    specialized ("monomorphized") to the particular types implementing the trait
    bounds, which means that (1) invocations of trait methods are static, direct
    calls to the implementation and (2) the compiler can inline and otherwise
    optimize these calls.

  • Inline layout. If a struct and enum type is generic over some type
    parameter T, values of type T will be laid out inline in the
    struct/enum, without any indirection.

  • Inference. Since the type parameters to generic functions can usually be
    inferred, generic functions can help cut down on verbosity in code where
    explicit conversions or other method calls would usually be necessary.

  • Precise types. Because generic give a name to the specific type
    implementing a trait, it is possible to be precise about places where that
    exact type is required or produced. For example, a function

    fn binary<T: Trait>(x: T, y: T) -> T

    is guaranteed to consume and produce elements of exactly the same type T; it
    cannot be invoked with parameters of different types that both implement
    Trait.

Disadvantages of generics
  • Code size. Specializing generic functions means that the function body is
    duplicated. The increase in code size must be weighed against the performance
    benefits of static dispatch.

  • Homogeneous types. This is the other side of the "precise types" coin: if
    T is a type parameter, it stands for a single actual type. So for example
    a Vec<T> contains elements of a single concrete type (and, indeed, the
    vector representation is specialized to lay these out in line). Sometimes
    heterogeneous collections are useful; see trait objects.

  • Signature verbosity. Heavy use of generics can make it more difficult to
    read and understand a function's signature.

Examples from the standard library
  • std::fs::File::open takes an argument of generic type AsRef<Path>. This
    allows files to be opened conveniently from a string literal "f.txt", a
    Path, an OsString, and a few other types.

Traits are object-safe if they may be useful as a trait object (C-OBJECT)

Trait objects have some significant limitations: methods invoked through a trait
object cannot use generics, and cannot use Self except in receiver position.

When designing a trait, decide early on whether the trait will be used as an
object or as a bound on generics.

If a trait is meant to be used as an object, its methods should take and return
trait objects rather than use generics.

A where clause of Self: Sized may be used to exclude specific methods from
the trait's object. The following trait is not object-safe due to the generic
method.

trait MyTrait {
    fn object_safe(&self, i: i32);

    fn not_object_safe<T>(&self, t: T);
}

Adding a requirement of Self: Sized to the generic method excludes it from the
trait object and makes the trait object-safe.

trait MyTrait {
    fn object_safe(&self, i: i32);

    fn not_object_safe<T>(&self, t: T) where Self: Sized;
}
Advantages of trait objects
  • Heterogeneity. When you need it, you really need it.
  • Code size. Unlike generics, trait objects do not generate specialized
    (monomorphized) versions of code, which can greatly reduce code size.
Disadvantages of trait objects
  • No generic methods. Trait objects cannot currently provide generic methods.
  • Dynamic dispatch and fat pointers. Trait objects inherently involve
    indirection and vtable dispatch, which can carry a performance penalty.
  • No Self. Except for the method receiver argument, methods on trait objects
    cannot use the Self type.
Examples from the standard library
  • The io::Read and io::Write traits are often used as objects.
  • The Iterator trait has several generic methods marked with where Self: Sized to retain the ability to use Iterator as an object.

Type safety

Newtypes provide static distinctions (C-NEWTYPE)

Newtypes can statically distinguish between different interpretations of an
underlying type.

For example, a f64 value might be used to represent a quantity in miles or in
kilometers. Using newtypes, we can keep track of the intended interpretation:

struct Miles(pub f64);
struct Kilometers(pub f64);

impl Miles {
    fn as_kilometers(&self) -> Kilometers { /* ... */ }
}
impl Kilometers {
    fn as_miles(&self) -> Miles { /* ... */ }
}

Once we have separated these two types, we can statically ensure that we do not
confuse them. For example, the function

fn are_we_there_yet(distance_travelled: Miles) -> bool { /* ... */ }

cannot accidentally be called with a Kilometers value. The compiler will
remind us to perform the conversion, thus averting certain catastrophic bugs.

Arguments convey meaning through types, not bool or Option (C-CUSTOM-TYPE)

Prefer

let w = Widget::new(Small, Round)

over

let w = Widget::new(true, false)

Core types like bool, u8 and Option have many possible interpretations.

Use custom types (whether enums, struct, or tuples) to convey interpretation
and invariants. In the above example, it is not immediately clear what true
and false are conveying without looking up the argument names, but Small and
Round are more suggestive.

Using custom types makes it easier to expand the options later on, for example
by adding an ExtraLarge variant.

See the newtype pattern for a no-cost way to wrap existing types
with a distinguished name.

Types for a set of flags are bitflags, not enums (C-BITFLAG)

Rust supports enum types with explicitly specified discriminants:

enum Color {
    Red = 0xff0000,
    Green = 0x00ff00,
    Blue = 0x0000ff,
}

Custom discriminants are useful when an enum type needs to be serialized to an
integer value compatibly with some other system/language. They support
"typesafe" APIs: by taking a Color, rather than an integer, a function is
guaranteed to get well-formed inputs, even if it later views those inputs as
integers.

An enum allows an API to request exactly one choice from among many. Sometimes
an API's input is instead the presence or absence of a set of flags. In C code,
this is often done by having each flag correspond to a particular bit, allowing
a single integer to represent, say, 32 or 64 flags. Rust's bitflags crate
provides a typesafe representation of this pattern.

#[macro_use]
extern crate bitflags;

bitflags! {
    flags Flags: u32 {
        const FLAG_A = 0b00000001,
        const FLAG_B = 0b00000010,
        const FLAG_C = 0b00000100,
    }
}

fn f(settings: Flags) {
    if settings.contains(FLAG_A) {
        println!("doing thing A");
    }
    if settings.contains(FLAG_B) {
        println!("doing thing B");
    }
    if settings.contains(FLAG_C) {
        println!("doing thing C");
    }
}

fn main() {
    f(FLAG_A | FLAG_C);
}

Builders enable construction of complex values (C-BUILDER)

Some data structures are complicated to construct, due to their construction
needing:

  • a large number of inputs
  • compound data (e.g. slices)
  • optional configuration data
  • choice between several flavors

which can easily lead to a large number of distinct constructors with many
arguments each.

If T is such a data structure, consider introducing a T builder:

  1. Introduce a separate data type TBuilder for incrementally configuring a T
    value. When possible, choose a better name: e.g. Command is the builder
    for a child process, Url can be created from a ParseOptions.
  2. The builder constructor should take as parameters only the data required to
    make a T.
  3. The builder should offer a suite of convenient methods for configuration,
    including setting up compound inputs (like slices) incrementally. These
    methods should return self to allow chaining.
  4. The builder should provide one or more "terminal" methods for actually
    building a T.

The builder pattern is especially appropriate when building a T involves side
effects, such as spawning a task or launching a process.

In Rust, there are two variants of the builder pattern, differing in the
treatment of ownership, as described below.

Non-consuming builders (preferred):

In some cases, constructing the final T does not require the builder itself to
be consumed. The follow variant on std::process::Command is one example:

// NOTE: the actual Command API does not use owned Strings;
// this is a simplified version.

pub struct Command {
    program: String,
    args: Vec<String>,
    cwd: Option<String>,
    // etc
}

impl Command {
    pub fn new(program: String) -> Command {
        Command {
            program: program,
            args: Vec::new(),
            cwd: None,
        }
    }

    /// Add an argument to pass to the program.
    pub fn arg(&mut self, arg: String) -> &mut Command {
        self.args.push(arg);
        self
    }

    /// Add multiple arguments to pass to the program.
    pub fn args(&mut self, args: &[String]) -> &mut Command {
        self.args.extend_from_slice(args);
        self
    }

    /// Set the working directory for the child process.
    pub fn current_dir(&mut self, dir: String) -> &mut Command {
        self.cwd = Some(dir);
        self
    }

    /// Executes the command as a child process, which is returned.
    pub fn spawn(&self) -> io::Result<Child> {
        /* ... */
    }
}

Note that the spawn method, which actually uses the builder configuration to
spawn a process, takes the builder by immutable reference. This is possible
because spawning the process does not require ownership of the configuration
data.

Because the terminal spawn method only needs a reference, the configuration
methods take and return a mutable borrow of self.

The benefit

By using borrows throughout, Command can be used conveniently for both
one-liner and more complex constructions:

// One-liners
Command::new("/bin/cat").arg("file.txt").spawn();

// Complex configuration
let mut cmd = Command::new("/bin/ls");
cmd.arg(".");
if size_sorted {
    cmd.arg("-S");
}
cmd.spawn();

Consuming builders:

Sometimes builders must transfer ownership when constructing the final type T,
meaning that the terminal methods must take self rather than &self.

impl TaskBuilder {
    /// Name the task-to-be.
    pub fn named(mut self, name: String) -> TaskBuilder {
        self.name = Some(name);
        self
    }

    /// Redirect task-local stdout.
    pub fn stdout(mut self, stdout: Box<io::Write + Send>) -> TaskBuilder {
        self.stdout = Some(stdout);
        self
    }

    /// Creates and executes a new child task.
    pub fn spawn<F>(self, f: F) where F: FnOnce() + Send {
        /* ... */
    }
}

Here, the stdout configuration involves passing ownership of an io::Write,
which must be transferred to the task upon construction (in spawn).

When the terminal methods of the builder require ownership, there is a basic
tradeoff:

  • If the other builder methods take/return a mutable borrow, the complex
    configuration case will work well, but one-liner configuration becomes
    impossible.

  • If the other builder methods take/return an owned self, one-liners continue
    to work well but complex configuration is less convenient.

Under the rubric of making easy things easy and hard things possible, all
builder methods for a consuming builder should take and returned an owned
self. Then client code works as follows:

// One-liners
TaskBuilder::new("my_task").spawn(|| { /* ... */ });

// Complex configuration
let mut task = TaskBuilder::new();
task = task.named("my_task_2"); // must re-assign to retain ownership
if reroute {
    task = task.stdout(mywriter);
}
task.spawn(|| { /* ... */ });

One-liners work as before, because ownership is threaded through each of the
builder methods until being consumed by spawn. Complex configuration, however,
is more verbose: it requires re-assigning the builder at each step.

Dependability

Functions validate their arguments (C-VALIDATE)

Rust APIs do not generally follow the robustness principle: "be conservative
in what you send; be liberal in what you accept".

Instead, Rust code should enforce the validity of input whenever practical.

Enforcement can be achieved through the following mechanisms (listed in order of
preference).

Static enforcement:

Choose an argument type that rules out bad inputs.

For example, prefer

fn foo(a: Ascii) { /* ... */ }

over

fn foo(a: u8) { /* ... */ }

where Ascii is a wrapper around u8 that guarantees the highest bit is
zero; see newtype patterns for more details on creating typesafe
wrappers.

Static enforcement usually comes at little run-time cost: it pushes the costs to
the boundaries (e.g. when a u8 is first converted into an Ascii). It also
catches bugs early, during compilation, rather than through run-time failures.

On the other hand, some properties are difficult or impossible to express using
types.

Dynamic enforcement:

Validate the input as it is processed (or ahead of time, if necessary). Dynamic
checking is often easier to implement than static checking, but has several
downsides:

  1. Runtime overhead (unless checking can be done as part of processing the
    input).
  2. Delayed detection of bugs.
  3. Introduces failure cases, either via fail! or Result/Option types,
    which must then be dealt with by client code.

Dynamic enforcement with debug_assert!:

Same as dynamic enforcement, but with the possibility of easily turning off
expensive checks for production builds.

Dynamic enforcement with opt-out:

Same as dynamic enforcement, but adds sibling functions that opt out of the
checking.

The convention is to mark these opt-out functions with a suffix like
_unchecked or by placing them in a raw submodule.

The unchecked functions can be used judiciously in cases where (1) performance
dictates avoiding checks and (2) the client is otherwise confident that the
inputs are valid.

Destructors never fail (C-DTOR-FAIL)

Destructors are executed on task failure, and in that context a failing
destructor causes the program to abort.

Instead of failing in a destructor, provide a separate method for checking for
clean teardown, e.g. a close method, that returns a Result to signal
problems.

Destructors that may block have alternatives (C-DTOR-BLOCK)

Similarly, destructors should not invoke blocking operations, which can make
debugging much more difficult. Again, consider providing a separate method for
preparing for an infallible, nonblocking teardown.

Debuggability

All public types implement Debug (C-DEBUG)

If there are exceptions, they are rare.

Debug representation is never empty (C-DEBUG-NONEMPTY)

Even for conceptually empty values, the Debug representation should never be
empty.

let empty_str = "";
assert_eq!(format!("{:?}", empty_str), "\"\"");

let empty_vec = Vec::<bool>::new();
assert_eq!(format!("{:?}", empty_vec), "[]");

Future proofing

Structs have private fields (C-STRUCT-PRIVATE)

Making a field public is a strong commitment: it pins down a representation
choice, and prevents the type from providing any validation or maintaining any
invariants on the contents of the field, since clients can mutate it arbitrarily.

Public fields are most appropriate for struct types in the C spirit: compound,
passive data structures. Otherwise, consider providing getter/setter methods and
hiding fields instead.

Newtypes encapsulate implementation details (C-NEWTYPE-HIDE)

A newtype can be used to hide representation details while making precise
promises to the client.

For example, consider a function my_transform that returns a compound iterator
type.

use std::iter::{Enumerate, Skip};

pub fn my_transform<I: Iterator>(input: I) -> Enumerate<Skip<I>> {
    input.skip(3).enumerate()
}

We wish to hide this type from the client, so that the client's view of the
return type is roughly Iterator<Item = (usize, T)>. We can do so using the
newtype pattern:

use std::iter::{Enumerate, Skip};

pub struct MyTransformResult<I>(Enumerate<Skip<I>>);

impl<I: Iterator> Iterator for MyTransformResult<I> {
    type Item = (usize, I::Item);

    fn next(&mut self) -> Option<Self::Item> {
        self.0.next()
    }
}

pub fn my_transform<I: Iterator>(input: I) -> MyTransformResult<I> {
    MyTransformResult(input.skip(3).enumerate())
}

Aside from simplifying the signature, this use of newtypes allows us to promise
less to the client. The client does not know how the result iterator is
constructed or represented, which means the representation can change in the
future without breaking client code.

In the future the same thing can be accomplished more concisely with the impl Trait feature but this is currently unstable.

#![feature(conservative_impl_trait)]

pub fn my_transform<I: Iterator>(input: I) -> impl Iterator<Item = (usize, I::Item)> {
    input.skip(3).enumerate()
}

Necessities

Public dependencies of a stable crate are stable (C-STABLE)

A crate cannot be stable (>=1.0.0) without all of its public dependencies being
stable.

Public dependencies are crates from which types are used in the public API of
the current crate.

pub fn do_my_thing(arg: other_crate::TheirThing) { /* ... */ }

A crate containing this function cannot be stable unless other_crate is also
stable.

Be careful because public dependencies can sneak in at unexpected places.

pub struct Error {
    private: ErrorImpl,
}

enum ErrorImpl {
    Io(io::Error),
    // Should be okay even if other_crate isn't
    // stable, because ErrorImpl is private.
    Dep(other_crate::Error),
}

// Oh no! This puts other_crate into the public API
// of the current crate.
impl From<other_crate::Error> for Error {
    fn from(err: other_crate::Error) -> Self {
        Error { private: ErrorImpl::Dep(err) }
    }
}

Crate and its dependencies have a permissive license (C-PERMISSIVE)

The software produced by the Rust project is dual-licensed, under
either the MIT or Apache 2.0 licenses. Crates that simply need the
maximum compatibility with the Rust ecosystem are recommended to do
the same, in the manner described herein. Other options are described
below.

These API guidelines do not provide a detailed explanation of Rust's
license, but there is a small amount said in the Rust FAQ. These
guidelines are concerned with matters of interoperability with Rust,
and are not comprehensive over licensing options.

To apply the Rust license to your project, define the license field
in your Cargo.toml as:

[package]
name = "..."
version = "..."
authors = ["..."]
license = "MIT/Apache-2.0"

And toward the end of your README.md:

## License

Licensed under either of

 * Apache License, Version 2.0
   ([LICENSE-APACHE](LICENSE-APACHE) or http://www.apache.org/licenses/LICENSE-2.0)
 * MIT license
   ([LICENSE-MIT](LICENSE-MIT) or http://opensource.org/licenses/MIT)

at your option.

### Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted
for inclusion in the work by you, as defined in the Apache-2.0 license, shall be
dual licensed as above, without any additional terms or conditions.

Besides the dual MIT/Apache-2.0 license, another common licensing approach
used by Rust crate authors is to apply a single permissive license such as
MIT or BSD. This license scheme is also entirely compatible with Rust's,
because it imposes the minimal restrictions of Rust's MIT license.

Crates that desire perfect license compatibility with Rust are not
recommended to choose only the Apache license. The Apache license,
though it is a permissive license, imposes restrictions beyond the MIT
and BSD licenses that can discourage or prevent their use in some
scenarios, so Apache-only software cannot be used in some situations
where most of the Rust runtime stack can.

The license of a crate's dependencies can affect the restrictions on
distribution of the crate itself, so a permissively-licensed crate
should generally only depend on permissively-licensed crates.

External Links

License

This guidelines document is licensed under either of

at your option.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted
for inclusion in this document by you, as defined in the Apache-2.0 license,
shall be dual licensed as above, without any additional terms or conditions.

@epage epage added this to the 4.0 milestone Dec 6, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

1 participant