diff --git a/src/SUMMARY.md b/src/SUMMARY.md index 8e4ad4a8b1..a77b9d6df9 100644 --- a/src/SUMMARY.md +++ b/src/SUMMARY.md @@ -31,10 +31,10 @@ - [Controlling visibility with `pub`](ch07-02-controlling-visibility-with-pub.md) - [Importing names with `use`](ch07-03-importing-names-with-use.md) -- [Basic Collections]() - - [Vectors]() - - [Strings]() - - [`HashMap`]() +- [Fundamental Collections](ch08-00-fundamental-collections.md) + - [Vectors](ch08-01-vectors.md) + - [Strings](ch08-02-strings.md) + - [Hash Maps](ch08-03-hash-maps.md) - [Error Handling]() diff --git a/src/ch08-00-fundamental-collections.md b/src/ch08-00-fundamental-collections.md new file mode 100644 index 0000000000..3ab024a211 --- /dev/null +++ b/src/ch08-00-fundamental-collections.md @@ -0,0 +1,18 @@ +# Fundamental Collections + +Rust's standard library includes a number of really useful data structures +called *collections*. Most other types represent one specific value, but +collections can contain multiple values inside of them. Each collection has +different capabilities and costs, and choosing an appropriate one for the +situation you're in is a skill you'll develop over time. In this chapter, we'll +go over three collections which are used very often in Rust programs: + +* A *vector* allows us to store a variable number of values next to each other. +* A *string* is a collection of characters. We've seen the `String` type + before, but we'll talk about it in depth now. +* A *hash map* allows us to associate a value with a particular key. + +There are more specialized variants of each of these data structures for +particular situations, but these are the most fundamental and common. We're +going to discuss how to create and update each of the collections, as well as +what makes each special. 
diff --git a/src/ch08-01-vectors.md b/src/ch08-01-vectors.md new file mode 100644 index 0000000000..6b5daac141 --- /dev/null +++ b/src/ch08-01-vectors.md @@ -0,0 +1,215 @@ +## Vectors + +The first type we'll look at is `Vec<T>`, also known as a *vector*. Vectors +allow us to store more than one value in a single data structure that puts all +the values next to each other in memory. + +### Creating a New Vector + +To create a new vector, we can call the `new` function: + +```rust +let v: Vec<i32> = Vec::new(); +``` + +Note that we added a type annotation here. Since we don't actually do +anything with the vector, Rust doesn't know what kind of elements we intend to +store. This is an important point. Vectors are homogeneous: they may store many +values, but those values must all be the same type. Vectors are generic over +the type stored inside them (we'll talk about generics more thoroughly in +Chapter 10), and the angle brackets here tell Rust that this vector will hold +elements of the `i32` type. + +That said, in real code, we very rarely need to do this type annotation since +Rust can infer the type of value we want to store once we insert values. Let's +look at how to modify a vector next. + +### Updating a Vector + +To put elements in the vector, we can use the `push` method: + +```rust +let mut v = Vec::new(); + +v.push(5); +v.push(6); +v.push(7); +v.push(8); +``` + +Since these numbers are `i32`s, Rust infers the type of data we want to store +in the vector, so we don't need the `<i32>` annotation. + +We can improve this code even further. Creating a vector with some initial +values like this is very common, so there's a macro to do it for us: + +```rust +let v = vec![5, 6, 7, 8]; +``` + +This macro does a similar thing to our previous example, but it's much more +convenient. 
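As a quick check of that claim, the following sketch builds the same vector both ways and compares the results. The function names `with_push` and `with_macro` are ours, chosen only for illustration:

```rust
// Build a vector one `push` at a time.
fn with_push() -> Vec<i32> {
    let mut v = Vec::new();
    v.push(5);
    v.push(6);
    v.push(7);
    v.push(8);
    v
}

// Build the same vector with the `vec!` macro.
fn with_macro() -> Vec<i32> {
    vec![5, 6, 7, 8]
}

fn main() {
    // Both approaches produce vectors that compare equal.
    assert_eq!(with_push(), with_macro());
    println!("{:?}", with_macro());
}
```

Vectors compare equal when they have the same elements in the same order, so the assertion passes regardless of how each vector was built.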
+ +### Dropping a Vector Drops its Elements + +Like any other `struct`, a vector will be freed when it goes out of scope: + +```rust +{ + let v = vec![1, 2, 3, 4]; + + // do stuff with v + +} // <- v goes out of scope and is freed here +``` + +When the vector gets dropped, it will also drop all of its contents, so those +integers are going to be cleaned up as well. This may seem like a +straightforward point, but can get a little more complicated once we start to +introduce references to the elements of the vector. Let's tackle that next! + +### Reading Elements of Vectors + +Now that we know how creating and destroying vectors works, knowing how to read +their contents is a good next step. There are two ways to reference a value +stored in a vector. In the following examples of these two ways, we've +annotated the types of the values that are returned from these functions for +extra clarity: + +```rust +let v = vec![1, 2, 3, 4, 5]; + +let third: &i32 = &v[2]; +let third: Option<&i32> = v.get(2); +``` + +First, note that we use the index value of `2` to get the third element: +vectors are indexed by number, starting at zero. Secondly, the two different +ways to get the third element are using `&` and `[]`s and using the `get` +method. The square brackets give us a reference, and `get` gives us an +`Option<&T>`. The reason we have two ways to reference an element is so that we +can choose the behavior we'd like to have if we try to use an index value that +the vector doesn't have an element for: + +```rust,should_panic +let v = vec![1, 2, 3, 4, 5]; + +let does_not_exist = &v[100]; +let does_not_exist = v.get(100); +``` + +With the `[]`s, Rust will cause a `panic!`. With the `get` method, it will +instead return `None` without `panic!`ing. 
Deciding which way to access +elements in a vector depends on whether we consider an attempted access past +the end of the vector to be an error, in which case we'd want the `panic!` +behavior, or whether this will happen occasionally under normal circumstances +and our code will have logic to handle getting `Some(&element)` or `None`. + +Once we have a valid reference, the borrow checker will enforce the ownership +and borrowing rules we covered in Chapter 4 in order to ensure this and other +references to the contents of the vector stay valid. This means in a function +that owns a `Vec`, we can't return a reference to an element since the `Vec` +will be cleaned up at the end of the function: + +```rust,ignore +fn element() -> String { + let list = vec![String::from("hi"), String::from("bye")]; + list[1] +} +``` + +Trying to compile this will result in the following error: + +```bash +error: cannot move out of indexed content [--explain E0507] + |> +4 |> list[1] + |> ^^^^^^^ cannot move out of indexed content +``` + +Since `list` goes out of scope and gets cleaned up at the end of the function, +the reference `list[1]` cannot be returned because it would outlive `list`. + +Here's another example of code that looks like it should be allowed, but it +won't compile because the references actually aren't valid anymore: + +```rust,ignore +let mut v = vec![1, 2, 3, 4, 5]; + +let first = &v[0]; + +v.push(6); +``` + +Compiling this will give us this error: + +```bash +error: cannot borrow `v` as mutable because it is also borrowed as immutable +[--explain E0502] + |> +5 |> let first = &v[0]; + |> - immutable borrow occurs here +7 |> v.push(6); + |> ^ mutable borrow occurs here +9 |> } + |> - immutable borrow ends here +``` + +This violates one of the ownership rules we covered in Chapter 4: the `push` +method needs to have a mutable borrow to the `Vec`, and we aren't allowed to +have any immutable borrows while we have a mutable borrow. 
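As a hedged sketch of how we might restructure the two failing examples so that they compile (one option among several, and `first_then_push` is a name we made up for illustration): we can clone the element instead of moving it out of the vector, and we can copy the first value out so that no borrow is outstanding when we call `push`:

```rust
// Return an owned copy of the element rather than moving it out of the
// vector through the index.
fn element() -> String {
    let list = vec![String::from("hi"), String::from("bye")];
    list[1].clone()
}

// Copy the first value out before pushing. `i32` is `Copy`, so `first`
// is an independent value, not a reference into `v`.
fn first_then_push() -> (i32, Vec<i32>) {
    let mut v = vec![1, 2, 3, 4, 5];
    let first = v[0];
    v.push(6); // no immutable borrow of `v` is alive here
    (first, v)
}

fn main() {
    println!("{}", element());
    let (first, v) = first_then_push();
    println!("{} {:?}", first, v);
}
```

Cloning costs an allocation, so whether this is the right fix depends on the situation. The point is only that owning the value sidesteps the borrow.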
+ +Why is it an error to have a reference to the first element in a vector while +we try to add a new item to the end, though? Due to the way vectors work, +adding a new element onto the end might require allocating new memory and +copying the old elements over to the new space if there wasn't enough room to +put all the elements next to each other where the vector was. If this happened, +our reference would be pointing to deallocated memory. For more on this, see +[The Nomicon](https://doc.rust-lang.org/stable/nomicon/vec.html). + +### Using an Enum to Store Multiple Types + +Let's put vectors together with what we learned about enums in Chapter 6. At +the beginning of this section, we said that vectors will only store values that +are all the same type. This can be inconvenient; there are definitely use cases +for needing to store a list of things that might be different types. Luckily, +the variants of an enum are all the same type as each other, so when we're in +this scenario, we can define and use an enum! + +For example, let's say we're going to be getting values for a row in a +spreadsheet. Some of the columns contain integers, some floating point numbers, +and some strings. We can define an enum whose variants will hold the different +value types. All of the enum variants will then be the same type, that of the +enum. Then we can create a vector that, ultimately, holds different types: + +```rust +enum SpreadsheetCell { + Int(i32), + Float(f64), + Text(String), +} + +let row = vec![ + SpreadsheetCell::Int(3), + SpreadsheetCell::Text(String::from("blue")), + SpreadsheetCell::Float(10.12), +]; +``` + +This has the advantage of being explicit about what types are allowed in this +vector. If we allowed any type to be in a vector, there would be a chance that +the vector would hold a type that would cause errors with the operations we +performed on the vector. 
Using an enum plus a `match` where we access elements +in a vector like this means that Rust will ensure at compile time that we +always handle every possible case. + +Using an enum for storing different types in a vector does imply that we need +to know the set of types we'll want to store at compile time. If that's not the +case, instead of an enum, we can use a trait object. We'll learn about those in +Chapter XX. + +Now that we've gone over some of the most common ways to use vectors, be sure +to take a look at the API documentation for other useful methods defined on +`Vec` by the standard library. For example, in addition to `push` there's a +`pop` method that will remove and return the last element. Let's move on to the +next collection type: `String`! diff --git a/src/ch08-02-strings.md b/src/ch08-02-strings.md new file mode 100644 index 0000000000..801cda8ecb --- /dev/null +++ b/src/ch08-02-strings.md @@ -0,0 +1,370 @@ +## Strings + +We've already talked about strings a bunch in Chapter 4, but let's take a more +in-depth look at them now. + +### Many Kinds of Strings + +Strings are a common place for new Rustaceans to get stuck. This is due to a +combination of three things: Rust's propensity for making sure to expose +possible errors, strings being a more complicated data structure than many +programmers give them credit for, and UTF-8. These things combine in a way that +can seem difficult coming from other languages. + +Before we can dig into those aspects, we need to talk about what exactly we +even mean by the word 'string'. Rust actually only has one string type in the +core language itself: `&str`. We talked about *string slices* in Chapter 4: +they're a reference to some UTF-8 encoded string data stored somewhere else. +String literals, for example, are stored in the binary output of the program, +and are therefore string slices. + +Rust's standard library is what provides the type called `String`. 
This is a +growable, mutable, owned, UTF-8 encoded string type. When Rustaceans talk about +'strings' in Rust, they usually mean "`String` and `&str`". This chapter is +largely about `String`, and these two types are used heavily in Rust's standard +library. Both `String` and string slices are UTF-8 encoded. + +Rust's standard library also includes a number of other string types, such as +`OsString`, `OsStr`, `CString`, and `CStr`. Library crates may provide even +more options for storing string data. Similarly to the `*String`/`*Str` naming, +they often provide an owned and borrowed variant, just like `String`/`&str`. +These string types may store different encodings or be represented in memory in +a different way, for example. We won't be talking about these other string +types in this chapter; see their API documentation for more about how to use +them and when each is appropriate. + +### Creating a New String + +Let's look at how to do the same operations on `String` as we did with `Vec`, +starting with creating one. Similarly, `String` has `new`: + +```rust +let s = String::new(); +``` + +Often, we'll have some initial data that we'd like to start the string off with. +For that, there's the `to_string` method: + +```rust +let data = "initial contents"; + +let s = data.to_string(); + +// the method also works on a literal directly: +let s = "initial contents".to_string(); +``` + +This form is equivalent to using `to_string`: + +```rust +let s = String::from("Initial contents"); +``` + +Since strings are used for so many things, there are many different generic +APIs that make sense for strings. There are a lot of options, and some of them +can feel redundant because of this, but they all have their place! In this +case, `String::from` and `.to_string` end up doing the exact same thing, so +which you choose is a matter of style. Some people use `String::from` for +literals, and `.to_string` for variable bindings. 
Most Rust style is pretty +uniform, but this specific question is one of the most debated. + +Remember that strings are UTF-8 encoded, so we can include any properly encoded +data in them: + +```rust +let hello = "السلام عليكم"; +let hello = "Dobrý den"; +let hello = "Hello"; +let hello = "שָׁלוֹם"; +let hello = "नमस्ते"; +let hello = "こんにちは"; +let hello = "안녕하세요"; +let hello = "你好"; +let hello = "Olá"; +let hello = "Здравствуйте"; +let hello = "Hola"; +``` + +### Updating a String + +A `String` can be changed and can grow in size, just like a `Vec` can. + +#### Push + +We can grow a `String` by using the `push_str` method to append another +string: + +```rust +let mut s = String::from("foo"); +s.push_str("bar"); +``` + +`s` will contain "foobar" after these two lines. + +The `push` method will add a `char`: + +```rust +let mut s = String::from("lo"); +s.push('l'); +``` + +`s` will contain "lol" after this point. + +We can make any `String` contain the empty string with the `clear` method: + +```rust +let mut s = String::from("Noooooooooooooooooooooo!"); +s.clear(); +``` + +Now `s` will be the empty string, "". + +#### Concatenation + +Often, we'll want to combine two strings together. One way is to use the `+` +operator: + +```rust +let s1 = String::from("Hello, "); +let s2 = String::from("world!"); +let s3 = s1 + &s2; +``` + +This code will make `s3` contain "Hello, world!". There are some tricky bits +here, though, that come from the type signature of `+` for `String`. The +signature for the `add` method that the `+` operator uses looks something like +this: + +```rust,ignore +fn add(self, s: &str) -> String { +``` + +This isn't exactly what the actual signature is in the standard library because +`add` is defined using generics there. Here, we're just looking at what the +signature of the method would be if `add` were defined specifically for +`String`. This signature gives us the clues we need in order to understand the +tricky bits of `+`. 
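Before unpacking that signature, here's a small sketch of what it implies for the earlier example. The `greet` function is a name of our own, used only to make the ownership visible in the return value:

```rust
// Concatenate with `+`: the left operand is taken by value,
// the right operand is only borrowed.
fn greet() -> (String, String) {
    let s1 = String::from("Hello, ");
    let s2 = String::from("world!");
    let s3 = s1 + &s2; // `s1` is moved into `add`; `&s2` is passed as `&str`
    // `s1` can't be used from here on, but `s2` is still valid,
    // so we can return it alongside the result.
    (s3, s2)
}

fn main() {
    let (s3, s2) = greet();
    println!("{} / {}", s3, s2);
}
```

If we tried to return `s1` instead of `s2`, the compiler would reject the code because `s1` has been moved.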
+ +First of all, `s2` has an `&`. This is because of the `s` argument in the `add` +function: we can only add a `&str` to a `String`, we can't add two `String`s +together. Remember back in Chapter 4 when we talked about how `&String` will +coerce to `&str`: we write `&s2` so that the `String` will coerce to the proper +type, `&str`. + +Secondly, `add` takes ownership of `self`, which we can tell because `self` +does *not* have an `&` in the signature. This means `s1` in the above example +will be moved into the `add` call and no longer be a valid binding after that. +So while `let s3 = s1 + &s2;` looks like it will copy both strings and create a +new one, this statement actually takes ownership of `s1`, appends a copy of +`s2`'s contents, then returns ownership of the result. In other words, it looks +like it's making a lot of copies, but isn't: the implementation is more +efficient than copying. + +If we need to concatenate multiple strings, this behavior of `+` gets +unwieldy: + +```rust +let s1 = String::from("tic"); +let s2 = String::from("tac"); +let s3 = String::from("toe"); + +let s = s1 + "-" + &s2 + "-" + &s3; +``` + +`s` will be "tic-tac-toe" at this point. With all of the `+` and `"` +characters, it gets hard to see what's going on. For more complicated string +combining, we can use the `format!` macro: + +```rust +let s1 = String::from("tic"); +let s2 = String::from("tac"); +let s3 = String::from("toe"); + +let s = format!("{}-{}-{}", s1, s2, s3); +``` + +This code will also set `s` to "tic-tac-toe". The `format!` macro works in the +same way as `println!`, but instead of printing the output to the screen, it +returns a `String` with the contents. This version is much easier to read than +all of the `+`s. + +### Indexing into Strings + +In many other languages, accessing individual characters in a string by +referencing the characters by index is a valid and common operation. 
In Rust, +however, if we try to access parts of a `String` using indexing syntax, we'll +get an error. That is, this code: + +```rust,ignore +let s1 = String::from("hello"); +let h = s1[0]; +``` + +will result in this error: + +```text +error: the trait bound `std::string::String: std::ops::Index<_>` is not +satisfied [--explain E0277] + |> + |> let h = s1[0]; + |> ^^^^^ +note: the type `std::string::String` cannot be indexed by `_` +``` + +The error and the note tell the story: Rust strings don't support indexing. So +the follow-up question is, why not? In order to answer that, we have to talk a +bit about how Rust stores strings in memory. + +#### Internal Representation + +A `String` is a wrapper over a `Vec<u8>`. Let's take a look at some of our +properly-encoded UTF-8 example strings from before. First, this one: + +```rust +let len = "Hola".len(); +``` + +In this case, `len` will be four, which means the `Vec<u8>` storing the string +"Hola" is four bytes long: each of these letters takes one byte when encoded in +UTF-8. What about this example, though? + +```rust +let len = "Здравствуйте".len(); +``` + +There are two answers that potentially make sense here: the first is 12, which +is the number of letters that a person would count if we asked someone how long +this string was. The second, though, is what Rust's answer is: 24. This is the +number of bytes that it takes to encode "Здравствуйте" in UTF-8, because each +character takes two bytes of storage. + +By the same token, imagine this invalid Rust code: + +```rust,ignore +let hello = "Здравствуйте"; +let answer = &hello[0]; +``` + +What should the value of `answer` be? Should it be `З`, the first letter? When +encoded in UTF-8, the first byte of `З` is `208`, and the second is `151`. So +should `answer` be `208`? `208` is not a valid character on its own, though. +Plus, for Latin letters, this would not return the answer most people would +expect: `&"hello"[0]` would then return `104`, not `h`. 
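We can check these numbers with methods that do exist on string slices: `len`, `chars`, and `bytes`. In this minimal sketch, the helper names `byte_and_char_len` and `first_byte` are ours, not part of the standard library:

```rust
// Return (number of bytes, number of `char`s) for a string slice.
fn byte_and_char_len(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

// Return the first raw byte of the UTF-8 encoding, if there is one.
fn first_byte(s: &str) -> Option<u8> {
    s.bytes().next()
}

fn main() {
    let hello = "Здравствуйте";
    let (bytes, chars) = byte_and_char_len(hello);
    println!("{} bytes, {} chars", bytes, chars);
    println!("first byte of {:?}: {:?}", hello, first_byte(hello));
    println!("first byte of {:?}: {:?}", "hello", first_byte("hello"));
}
```

For "Здравствуйте" this reports 24 bytes and 12 `char`s, and the first bytes are `208` and `104` respectively, matching the discussion above.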
+ +#### Bytes and Scalar Values and Grapheme Clusters! Oh my! + +This leads to another point about UTF-8: there are really three relevant ways +to look at strings, from Rust's perspective: bytes, scalar values, and grapheme +clusters. If we look at the string "नमस्ते", it is ultimately stored as a `Vec` +of `u8` values that looks like this: + +```text +[224, 164, 168, 224, 164, 174, 224, 164, 184, 224, 165, 141, 224, 164, 164, 224, 165, 135] +``` + +That's 18 bytes. But if we look at them as Unicode scalar values, which are +what Rust's `char` type is, those bytes look like this: + +```text +['न', 'म', 'स', '्', 'त', 'े'] +``` + +There are six `char` values here. Finally, if we look at them as grapheme +clusters, which is the closest thing to what humans would call 'letters', we'd +get this: + +```text +["न", "म", "स्", "ते"] +``` + +Four elements! It turns out that even within 'grapheme cluster', there are +multiple ways of grouping things. Convinced that strings are actually really +complicated yet? + +Another reason that indexing into a `String` to get a character is not available +is that indexing operations are expected to always be fast. This isn't possible +with a `String`, since Rust would have to walk through the contents from the +beginning to the index to determine how many valid characters there were, no +matter how we define "character". + +All of these problems mean that Rust does not implement `[]` for `String`, so +we cannot directly do this. + +### Slicing Strings + +However, indexing the bytes of a string is very useful, and is not expected to +be fast. While we can't use `[]` with a single number, we _can_ use `[]` with +a range to create a string slice from particular bytes: + +```rust +let hello = "Здравствуйте"; + +let s = &hello[0..4]; +``` + +Here, `s` will be a `&str` that contains the first four bytes of the string. +Earlier, we mentioned that each of these characters was two bytes, so that means +that `s` will be "Зд". 
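Since a byte range like `0..4` depends on how many bytes each character occupies, one way to avoid hard-coding it is to compute the byte offset from the characters themselves. This is a minimal sketch; `first_n_chars` is a hypothetical helper of ours, not a standard library method, and it counts `char`s (Unicode scalar values), not grapheme clusters:

```rust
// Return the prefix of `s` that contains at most `n` `char`s.
fn first_n_chars(s: &str, n: usize) -> &str {
    match s.char_indices().nth(n) {
        // `idx` is the byte offset at which the char after the prefix
        // starts, so it always lies on a character boundary.
        Some((idx, _)) => &s[..idx],
        // The string has `n` or fewer chars: return all of it.
        None => s,
    }
}

fn main() {
    let hello = "Здравствуйте";
    // Same result as `&hello[0..4]`, without counting bytes by hand.
    println!("{}", first_n_chars(hello, 2));
}
```

The `char_indices` method yields each `char` together with the byte position where it starts, which is what makes the range safe to slice with.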
+ +What would happen if we did `&hello[0..1]`? The answer: it will panic at +runtime, in the same way that accessing an invalid index in a vector does: + +```bash +thread 'main' panicked at 'index 0 and/or 1 in `Здравствуйте` do not lie on +character boundary', ../src/libcore/str/mod.rs:1694 +``` + +### Methods for Iterating Over Strings + +If we do need to perform operations on individual characters, the best way to +do that is using the `chars` method. Calling `chars` on "नमस्ते" gives us the six +Rust `char` values: + +```rust +for c in "नमस्ते".chars() { + println!("{}", c); +} +``` + +This code will print: + +```bash +न +म +स +् +त +े +``` + +The `bytes` method returns each raw byte, which might be appropriate for your +domain, but remember that valid UTF-8 characters may be made up of more than +one byte: + +```rust +for b in "नमस्ते".bytes() { + println!("{}", b); +} +``` + +This code will print the 18 bytes that make up this `String`, starting with: + +```bash +224 +164 +168 +224 +// ... etc +``` + +There are crates available on crates.io to get grapheme clusters from `String`s. + +To summarize, strings are complicated. Different programming languages make +different choices about how to present this complexity to the programmer. Rust +has chosen to attempt to make correct handling of `String` data be the default +for all Rust programs, which does mean programmers have to put more thought +into handling UTF-8 data upfront. This tradeoff exposes us to more of the +complexity of strings than we have to handle in other languages, but will +prevent us from having to handle errors involving non-ASCII characters later in +our development lifecycle. + +Let's switch to something a bit less complex: Hash Map! diff --git a/src/ch08-03-hash-maps.md b/src/ch08-03-hash-maps.md new file mode 100644 index 0000000000..397b0f5b30 --- /dev/null +++ b/src/ch08-03-hash-maps.md @@ -0,0 +1,261 @@ +## Hash Maps + +The last of our fundamental collections is the *hash map*. 
The type `HashMap<K, V>` stores a mapping of keys of type `K` to values of type `V`. It does this +via a *hashing function*, which determines how it places these keys and values +into memory. Many different programming languages support this kind of data +structure, but often with a different name: hash, map, object, hash table, or +associative array, just to name a few. + +We'll go over the basic API in this chapter, but there are many more goodies +hiding in the functions defined on `HashMap` by the standard library. As always, +check the standard library documentation for more information. + +### Creating a New Hash Map + +We can create an empty `HashMap` with `new`, and add elements with `insert`: + +```rust +use std::collections::HashMap; + +let mut map = HashMap::new(); + +map.insert(1, "hello"); +map.insert(2, "world"); +``` + +Note that we need to `use` the `HashMap` from the collections portion of the +standard library. Of our three fundamental collections, this one is the least +often used, so it has a bit less support from the language. There's no built-in +macro to construct them, for example, and they're not in the prelude, so we +need to add a `use` statement for them. + +Just like vectors, hash maps store their data on the heap. This `HashMap` has +keys of type `i32` and values of type `&str`. Like vectors, hash maps are +homogeneous: all of the keys must have the same type, and all of the values must +have the same type. + +If we have a vector of tuples, we can convert it into a hash map with the +`collect` method. The first element in each tuple will be the key, and the +second element will be the value: + +```rust +use std::collections::HashMap; + +let data = vec![(1, "hello"), (2, "world")]; + +let map: HashMap<_, _> = data.into_iter().collect(); +``` + +The type annotation `HashMap<_, _>` is needed here because it's possible to +`collect` into many different data structures, so Rust doesn't know which we +want. 
For the type parameters for the key and value types, however, we can use +underscores and Rust can infer the types that the hash map contains based on the +types of the data in our vector. + +For types that implement the `Copy` trait like `i32` does, the values are +copied into the hash map. If we insert owned values like `String`, the values +will be moved and the hash map will be the owner of those values: + +```rust +use std::collections::HashMap; + +let field_name = String::from("Favorite color"); +let field_value = String::from("Blue"); + +let mut map = HashMap::new(); +map.insert(field_name, field_value); +// field_name and field_value are invalid at this point +``` + +We would not be able to use the bindings `field_name` and `field_value` after +they have been moved into the hash map with the call to `insert`. + +If we insert references to values, the values themselves will not be moved into +the hash map. The values that the references point to must be valid for at least +as long as the hash map is valid, though. We will talk more about these issues +in the Lifetimes section of Chapter 10. + +### Accessing Values in a Hash Map + +We can get a value out of the hash map by providing its key to the `get` method: + +```rust +use std::collections::HashMap; + +let mut map = HashMap::new(); + +map.insert(1, "hello"); +map.insert(2, "world"); + +let value = map.get(&2); +``` + +Here, `value` will have the value `Some("world")`, since that's the value +associated with the `2` key. "world" is wrapped in `Some` because `get` returns +an `Option`. If there's no value for that key in the hash map, `get` will +return `None`. 
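Because `get` returns an `Option`, code that uses the result has to say what happens when the key is missing. Here's a minimal sketch using `match`; the `lookup` function and the `"<no entry>"` placeholder are our own inventions for illustration:

```rust
use std::collections::HashMap;

// Look up `key`, falling back to a placeholder when it's absent.
fn lookup(map: &HashMap<i32, &'static str>, key: i32) -> &'static str {
    match map.get(&key) {
        // `get` hands back a reference to the stored value,
        // so we dereference it to get the `&str` itself.
        Some(value) => *value,
        None => "<no entry>",
    }
}

fn main() {
    let mut map = HashMap::new();
    map.insert(1, "hello");
    map.insert(2, "world");

    println!("{}", lookup(&map, 2));
    println!("{}", lookup(&map, 3));
}
```

Whether a placeholder, a default, or an error is the right response to a missing key depends on the program. The `match` simply makes us choose.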
+ +We can iterate over each key/value pair in a hash map in a similar manner as we +do with vectors, using a `for` loop: + +```rust +use std::collections::HashMap; + +let mut map = HashMap::new(); + +map.insert(1, "hello"); +map.insert(2, "world"); + +for (key, value) in &map { + println!("{}: {}", key, value); +} +``` + +This will print the following, though possibly in a different order, since a +hash map doesn't guarantee any particular iteration order: + +```bash +1: hello +2: world +``` + +### Updating a Hash Map + +Since each key can only have one value, when we want to change the data in a +hash map, we have to decide how to handle the case when a key already has a +value assigned. We could choose to replace the old value with the new value. We +could choose to keep the old value and ignore the new value, and only add the +new value if the key *doesn't* already have a value. Or we could change the +existing value. Let's look at how to do each of these! + +#### Overwriting a Value + +If we insert a key and a value, then insert that key with a different value, +the value associated with that key will be replaced. Even though this code +calls `insert` twice, the hash map will only contain one key/value pair, since +we're inserting with the key `1` both times: + +```rust +use std::collections::HashMap; + +let mut map = HashMap::new(); + +map.insert(1, "hello"); +map.insert(1, "Hi There"); + +println!("{:?}", map); +``` + +This will print `{1: "Hi There"}`. + +#### Only Insert If the Key Has No Value + +It's common to want to see if there's some sort of value already stored in the +hash map for a particular key, and if not, insert a value. Hash maps have a +special API for this, called `entry`, that takes the key we want to check as an +argument: + +```rust +use std::collections::HashMap; + +let mut map = HashMap::new(); +map.insert(1, "hello"); + +let e = map.entry(2); +``` + +Here, the value bound to `e` is a special enum, `Entry`. An `Entry` represents a +value that might or might not exist. Let's say that we want to see if the key +`2` has a value associated with it. 
If it doesn't, we want to insert the value +"world". In both cases, we want to return the resulting value that now goes +with `2`. With the entry API, it looks like this: + +```rust +use std::collections::HashMap; + +let mut map = HashMap::new(); + +map.insert(1, "hello"); + +map.entry(2).or_insert("world"); +map.entry(1).or_insert("Hi There"); + +println!("{:?}", map); +``` + +The `or_insert` method on `Entry` does exactly this: returns the value for the +`Entry`'s key if it exists, and if not, inserts its argument as the new value +for the `Entry`'s key and returns that. This is much cleaner than writing the +logic ourselves, and in addition, plays more nicely with the borrow checker. + +This code will print `{1: "hello", 2: "world"}`. The first call to `entry` will +insert the key `2` with the value "world", since `2` doesn't have a value +already. The second call to `entry` will not change the hash map since `1` +already has the value "hello". + +#### Update a Value Based on the Old Value + +Another common use case for hash maps is to look up a key's value then update +it, using the old value. For instance, if we wanted to count how many times +each word appeared in some text, we could use a hash map with the words as keys +and increment the value to keep track of how many times we've seen that word. +If this is the first time we've seen a word, we'll first insert the value `0`. + +```rust +use std::collections::HashMap; + +let text = "hello world wonderful world"; + +let mut map = HashMap::new(); + +for word in text.split_whitespace() { + let count = map.entry(word).or_insert(0); + *count += 1; +} + +println!("{:?}", map); +``` + +This will print `{"world": 2, "hello": 1, "wonderful": 1}`. The `or_insert` +method actually returns a mutable reference (`&mut V`) to the value in the +hash map for this key. 
Here we store that mutable reference in the `count` +variable binding, so in order to assign to that value we must first dereference +`count` using the asterisk (`*`). The mutable reference goes out of scope at +the end of the `for` loop, so all of these changes are safe and allowed by the +borrowing rules. + +### Hashing Function + +By default, `HashMap` uses a cryptographically secure hashing function that can +provide resistance to Denial of Service (DoS) attacks. This is not the fastest +hashing algorithm out there, but the tradeoff for better security that comes +with the drop in performance is a good default tradeoff to make. If you profile +your code and find that the default hash function is too slow for your +purposes, you can switch to another function by specifying a different +*hasher*. A hasher is an object that implements the `BuildHasher` trait. We'll +be talking about traits and how to implement them in Chapter 10. + +## Summary + +Vectors, strings, and hash maps will take you far in programs where you need to +store, access, and modify data. Some programs you are now equipped to write and +might want to try include: + +* Given a list of integers, use a vector and return their mean (average), + median (when sorted, the value in the middle position), and mode (the value + that occurs most often; a hash map will be helpful here). +* Convert strings to Pig Latin, where the first consonant of each word gets + moved to the end with an added "ay", so "first" becomes "irst-fay". Words that + start with a vowel get an h instead ("apple" becomes "apple-hay"). Remember + about UTF-8 encoding! +* Using a hash map and vectors, create a text interface to allow a user to add + employee names to a department in the company. For example, "Add Sally to + Engineering" or "Add Ron to Sales". Then let the user retrieve a list of all + people in a department or all people in the company by department, sorted + alphabetically. 
+ +The standard library API documentation describes methods these types have that +will be helpful for these exercises! + +We're getting into more complex programs where operations can fail, which means +it's a perfect time to go over error handling next!