datatable-intro-fr.pot

#
msgid ""
msgstr ""

#: d.Rmd:block 1 (header)
msgid ""
"title: \"Introduction to data.table\"\n"
"date: \"`r Sys.Date()`\"\n"
"output:\n"
"markdown::html_format\n"
"vignette: >\n"
"%\\VignetteIndexEntry{Introduction to data.table}\n"
"%\\VignetteEngine{knitr::knitr}\n"
"\\usepackage[utf8]{inputenc}"
msgstr ""

#: d.Rmd:block 3 (paragraph)
msgid ""
"This vignette introduces the `data.table` syntax, its general form, how to "
"*subset* rows, *select and compute* on columns, and perform aggregations *by"
" group*. Familiarity with the `data.frame` data structure from base R is "
"useful, but not essential to follow this vignette."
msgstr ""

#: d.Rmd:block 4 (header)
msgid "Data analysis using `data.table`"
msgstr ""

#: d.Rmd:block 5 (paragraph)
msgid ""
"Data manipulation operations such as *subset*, *group*, *update*, *join*, "
"etc. are all inherently related. Keeping these *related operations together*"
" allows for:"
msgstr ""

#: d.Rmd:block 6 (unordered list)
msgid ""
"*concise* and *consistent* syntax irrespective of the set of operations you "
"would like to perform to achieve your end goal."
msgstr ""

#: d.Rmd:block 6 (unordered list)
msgid ""
"performing analysis *fluidly* without the cognitive burden of having to map "
"each operation to a particular function from a potentially huge set of "
"functions available before performing the analysis."
msgstr ""

#: d.Rmd:block 6 (unordered list)
msgid ""
"*automatically* optimising operations internally and very effectively by "
"knowing precisely the data required for each operation, leading to very fast"
" and memory-efficient code."
msgstr ""

#: d.Rmd:block 7 (paragraph)
msgid ""
"Briefly, if you are interested in reducing *programming* and *compute* time "
"tremendously, then this package is for you. The philosophy that `data.table`"
" adheres to makes this possible. Our goal is to illustrate it through this "
"series of vignettes."
msgstr ""

#: d.Rmd:block 8 (header)
msgid "Data {#data}"
msgstr ""

#: d.Rmd:block 9 (paragraph)
msgid ""
"In this vignette, we will use [NYC-"
"flights14](https://raw.githubusercontent.com/Rdatatable/data.table/master/vignettes/flights14.csv)"
" data obtained from the [flights](https://github.com/arunsrinivasan/flights)"
" package (available on GitHub only). It contains On-Time flights data from "
"the Bureau of Transportation Statistics for all the flights that departed "
"from New York City airports in 2014 (inspired by "
"[nycflights13](https://github.com/tidyverse/nycflights13)). The data is "
"available only for Jan-Oct'14."
msgstr ""

#: d.Rmd:block 10 (paragraph)
msgid ""
"We can use `data.table`'s fast-and-friendly file reader `fread` to load "
"`flights` directly as follows:"
msgstr ""

#: d.Rmd:block 13 (paragraph)
msgid ""
"Aside: `fread` accepts `http` and `https` URLs directly, as well as "
"operating system commands such as `sed` and `awk` output. See `?fread` for "
"examples."
msgstr ""

#: d.Rmd:block 14 (header)
msgid "Introduction"
msgstr ""

#: d.Rmd:block 15 (paragraph)
msgid "In this vignette, we will"
msgstr ""

#: d.Rmd:block 16 (ordered list)
msgid ""
"Start with the basics - what is a `data.table`, its general form, how to "
"*subset* rows, how to *select and compute* on columns;"
msgstr ""

#: d.Rmd:block 16 (ordered list)
msgid "Then we will look at performing data aggregations by group"
msgstr ""

#: d.Rmd:block 17 (header)
msgid "1. Basics {#basics-1}"
msgstr ""

#: d.Rmd:block 18 (header)
msgid "a) What is `data.table`? {#what-is-datatable-1a}"
msgstr ""

#: d.Rmd:block 19 (paragraph)
msgid ""
"`data.table` is an R package that provides **an enhanced version** of a "
"`data.frame`, the standard data structure for storing data in `base` R. In "
"the [Data](#data) section above, we saw how to create a `data.table` using "
"`fread()`, but alternatively we can also create one using the `data.table()`"
" function. Here is an example:"
msgstr ""

#: d.Rmd:block 21 (paragraph)
msgid ""
"You can also convert existing objects to a `data.table` using `setDT()` (for"
" `data.frame` and `list` structures) or `as.data.table()` (for other "
"structures). For more details pertaining to the difference (goes beyond the "
"scope of this vignette), please see `?setDT` and `?as.data.table`."
msgstr ""

#: d.Rmd:block 22 (header)
msgid "Note that:"
msgstr ""

#: d.Rmd:block 23 (unordered list)
msgid ""
"Row numbers are printed with a `:` in order to visually separate the row "
"number from the first column."
msgstr ""

#: d.Rmd:block 23 (unordered list)
msgid ""
"When the number of rows to print exceeds the global option "
"`datatable.print.nrows` (default = `r "
"getOption(\"datatable.print.nrows\")`), it automatically prints only the top"
" 5 and bottom 5 rows (as can be seen in the [Data](#data) section). For a "
"large `data.frame`, you may have found yourself waiting around while larger "
"tables print-and-page, sometimes seemingly endlessly. This restriction helps"
" with that, and you can query the default number like so:"
msgstr ""

#: d.Rmd:block 23 (unordered list)
msgid ""
"`data.table` doesn't set or use *row names*, ever. We will see why in the "
"*\"Keys and fast binary search based subset\"* vignette."
msgstr ""

#: d.Rmd:block 24 (header)
msgid ""
"b) General form - in what way is a `data.table` *enhanced*? {#enhanced-1b}"
msgstr ""

#: d.Rmd:block 25 (paragraph)
msgid ""
"In contrast to a `data.frame`, you can do *a lot more* than just subsetting "
"rows and selecting columns within the frame of a `data.table`, i.e., within "
"`[ ... ]` (NB: we might also refer to writing things inside `DT[...]` as "
"\"querying `DT`\", as an analogy or in relevance to SQL). To understand it "
"we will have to first look at the *general form* of the `data.table` syntax,"
" as shown below:"
msgstr ""

#: d.Rmd:block 27 (paragraph)
msgid ""
"Users with an SQL background might perhaps immediately relate to this "
"syntax."
msgstr ""

#: d.Rmd:block 28 (header)
msgid "The way to read it (out loud) is:"
msgstr ""

#: d.Rmd:block 29 (paragraph)
msgid ""
"Take `DT`, subset/reorder rows using `i`, then calculate `j`, grouped by "
"`by`."
msgstr ""

#: d.Rmd:block 30 (paragraph)
msgid ""
"Let's begin by looking at `i` and `j` first - subsetting rows and operating "
"on columns."
msgstr ""

#: d.Rmd:block 31 (header)
msgid "c) Subset rows in `i` {#subset-i-1c}"
msgstr ""

#: d.Rmd:block 32 (header)
msgid ""
"-- Get all the flights with \"JFK\" as the origin airport in the month of "
"June."
msgstr ""

#: d.Rmd:block 34 (unordered list)
msgid ""
"Within the frame of a `data.table`, columns can be referred to *as if they "
"are variables*, much like in SQL or Stata. Therefore, we simply refer to "
"`origin` and `month` as if they are variables. We do not need to add the "
"prefix `flights$` each time. Nevertheless, using `flights$origin` and "
"`flights$month` would work just fine."
msgstr ""

#: d.Rmd:block 34 (unordered list)
msgid ""
"The *row indices* that satisfy the condition `origin == \"JFK\" & month == "
"6L` are computed, and since there is nothing else left to do, all columns "
"from `flights` at rows corresponding to those *row indices* are simply "
"returned as a `data.table`."
msgstr ""

#: d.Rmd:block 34 (unordered list)
msgid ""
"A comma after the condition in `i` is not required. But `flights[origin == "
"\"JFK\" & month == 6L, ]` would work just fine. In a `data.frame`, however, "
"the comma is necessary."
msgstr ""

#: d.Rmd:block 35 (header)
msgid "-- Get the first two rows from `flights`. {#subset-rows-integer}"
msgstr ""

#: d.Rmd:block 37 (unordered list)
msgid ""
"In this case, there is no condition. The row indices are already provided in"
" `i`. We therefore return a `data.table` with all columns from `flights` at "
"rows for those *row indices*."
msgstr ""

#: d.Rmd:block 38 (header)
msgid ""
"-- Sort `flights` first by column `origin` in *ascending* order, and then by"
" `dest` in *descending* order:"
msgstr ""

#: d.Rmd:block 39 (paragraph)
msgid "We can use the R function `order()` to accomplish this."
msgstr ""

#: d.Rmd:block 41 (header)
msgid "`order()` is internally optimised"
msgstr ""

#: d.Rmd:block 42 (unordered list)
msgid ""
"We can use \"-\" on `character` columns within the frame of a `data.table` "
"to sort in decreasing order."
msgstr ""

#: d.Rmd:block 42 (unordered list)
msgid ""
"In addition, `order(...)` within the frame of a `data.table` uses "
"`data.table`'s internal fast radix order `forder()`. This sort provided such"
" a compelling improvement over R's `base::order` that the R project adopted "
"the `data.table` algorithm as its default sort in 2016 for R 3.3.0 (for "
"reference, check `?sort` and the [R Release "
"NEWS](https://cran.r-project.org/doc/manuals/r-release/NEWS.pdf))."
msgstr ""

#: d.Rmd:block 43 (paragraph)
msgid ""
"We will discuss `data.table`'s fast order in more detail in the "
"*`data.table` internals* vignette."
msgstr ""

#: d.Rmd:block 44 (header)
msgid "d) Select column(s) in `j` {#select-j-1d}"
msgstr ""

#: d.Rmd:block 45 (header)
msgid "-- Select `arr_delay` column, but return it as a *vector*."
msgstr ""

#: d.Rmd:block 47 (unordered list)
msgid ""
"Since columns can be referred to as if they are variables within the frame "
"of a `data.table`, we directly refer to the *variable* we want to subset. "
"Since we want *all the rows*, we simply skip `i`."
msgstr ""

#: d.Rmd:block 47 (unordered list)
msgid "It returns *all* the rows for the column `arr_delay`."
msgstr ""

#: d.Rmd:block 48 (header)
msgid "-- Select `arr_delay` column, but return as a `data.table` instead."
msgstr ""

#: d.Rmd:block 50 (unordered list)
msgid ""
"We wrap the *variables* (column names) within `list()`, which ensures that a"
" `data.table` is returned. In the case of a single column name, not wrapping"
" with `list()` returns a vector instead, as seen in the [previous "
"example](#select-j-1d)."
msgstr ""

#: d.Rmd:block 50 (unordered list)
msgid ""
"`data.table` also allows wrapping columns with `.()` instead of `list()`. It"
" is an *alias* to `list()`; they both mean the same. Feel free to use "
"whichever you prefer; we have noticed most users seem to prefer `.()` for "
"conciseness, so we will continue to use `.()` hereafter."
msgstr ""

#: d.Rmd:block 51 (paragraph)
msgid ""
"A `data.table` (and a `data.frame` too) is internally a `list` as well, with"
" the stipulation that each element has the same length and the `list` has a "
"`class` attribute. Allowing `j` to return a `list` enables converting and "
"returning `data.table` very efficiently."
msgstr ""

#: d.Rmd:block 52 (header)
msgid "Tip: {#tip-1}"
msgstr ""

#: d.Rmd:block 53 (paragraph)
msgid ""
"As long as `j-expression` returns a `list`, each element of the list will be"
" converted to a column in the resulting `data.table`. This makes `j` quite "
"powerful, as we will see shortly. It is also very important to understand "
"this for when you'd like to make more complicated queries!!"
msgstr ""

#: d.Rmd:block 54 (header)
msgid "-- Select both `arr_delay` and `dep_delay` columns."
msgstr ""

#: d.Rmd:block 56 (unordered list)
msgid "Wrap both columns within `.()`, or `list()`. That's it."
msgstr ""

#: d.Rmd:block 57 (header)
msgid ""
"-- Select both `arr_delay` and `dep_delay` columns *and* rename them to "
"`delay_arr` and `delay_dep`."
msgstr ""

#: d.Rmd:block 58 (paragraph)
msgid ""
"Since `.()` is just an alias for `list()`, we can name columns as we would "
"while creating a `list`."
msgstr ""

#: d.Rmd:block 60 (header)
msgid "e) Compute or *do* in `j`"
msgstr ""

#: d.Rmd:block 61 (header)
msgid "-- How many trips have had total delay < 0?"
msgstr ""

#: d.Rmd:block 63 (header)
msgid "What's happening here?"
msgstr ""

#: d.Rmd:block 64 (unordered list)
msgid ""
"`data.table`'s `j` can handle more than just *selecting columns* - it can "
"handle *expressions*, i.e., *computing on columns*. This shouldn't be "
"surprising, as columns can be referred to as if they are variables. Then we "
"should be able to *compute* by calling functions on those variables. And "
"that's what precisely happens here."
msgstr ""

#: d.Rmd:block 65 (header)
msgid "f) Subset in `i` *and* do in `j`"
msgstr ""

#: d.Rmd:block 66 (header)
msgid ""
"-- Calculate the average arrival and departure delay for all flights with "
"\"JFK\" as the origin airport in the month of June."
msgstr ""

#: d.Rmd:block 68 (unordered list)
msgid ""
"We first subset in `i` to find matching *row indices* where `origin` airport"
" equals `\"JFK\"`, and `month` equals `6L`. We *do not* subset the *entire* "
"`data.table` corresponding to those rows *yet*."
msgstr ""

#: d.Rmd:block 68 (unordered list)
msgid ""
"Now, we look at `j` and find that it uses only *two columns*. And what we "
"have to do is to compute their `mean()`. Therefore, we subset just those "
"columns corresponding to the matching rows, and compute their `mean()`."
msgstr ""

#: d.Rmd:block 69 (paragraph)
msgid ""
"Because the three main components of the query (`i`, `j` and `by`) are "
"*together* inside `[...]`, `data.table` can see all three and optimise the "
"query altogether *before evaluation*, rather than optimizing each "
"separately. We are able to therefore avoid the entire subset (i.e., "
"subsetting the columns *besides* `arr_delay` and `dep_delay`), for both "
"speed and memory efficiency."
msgstr ""

#: d.Rmd:block 70 (header)
msgid ""
"-- How many trips have been made in 2014 from \"JFK\" airport in the month "
"of June?"
msgstr ""

#: d.Rmd:block 72 (paragraph)
msgid ""
"The function `length()` requires an input argument. We just need to compute "
"the number of rows in the subset. We could have used any other column as the"
" input argument to `length()`. This approach is reminiscent of `SELECT "
"COUNT(dest) FROM flights WHERE origin = 'JFK' AND month = 6` in SQL."
msgstr ""

#: d.Rmd:block 73 (paragraph)
msgid ""
"This type of operation occurs quite frequently, especially while grouping "
"(as we will see in the next section), to the point where `data.table` "
"provides a *special symbol* `.N` for it."
msgstr ""

#: d.Rmd:block 74 (header)
msgid "g) Handle non-existing elements in `i`"
msgstr ""

#: d.Rmd:block 75 (header)
msgid "-- What happens when querying for non-existing elements?"
msgstr ""

#: d.Rmd:block 76 (paragraph)
msgid ""
"When querying a `data.table` for elements that do not exist, the behavior "
"differs based on the method used."
msgstr ""

#: d.Rmd:block 78 (unordered list)
msgid "**Key-based subsetting: `dt[\"d\"]`**"
msgstr ""

#: d.Rmd:block 78 (unordered list)
msgid ""
"This performs a right join on the key column `x`, resulting in a row with "
"`d` and `NA` for columns not found. When using `setkeyv`, the table is "
"sorted by the specified keys and an internal index is created, enabling "
"binary search for efficient subsetting."
msgstr ""

#: d.Rmd:block 78 (unordered list)
msgid "**Logical subsetting: `dt[x == \"d\"]`**"
msgstr ""

#: d.Rmd:block 78 (unordered list)
msgid ""
"This performs a standard subset operation that does not find any matching "
"rows and thus returns an empty `data.table`."
msgstr ""

#: d.Rmd:block 78 (unordered list)
msgid "**Exact match using `nomatch=NULL`**"
msgstr ""

#: d.Rmd:block 78 (unordered list)
msgid ""
"For exact matches without `NA` for non-existing elements, use "
"`nomatch=NULL`:"
msgstr ""

#: d.Rmd:block 79 (paragraph)
msgid ""
"Understanding these behaviors can help prevent confusion when dealing with "
"non-existing elements in your data."
msgstr ""

#: d.Rmd:block 80 (header)
msgid "Special symbol `.N`: {#special-N}"
msgstr ""

#: d.Rmd:block 81 (paragraph)
msgid ""
"`.N` is a special built-in variable that holds the number of observations "
"*in the current group*. It is particularly useful when combined with `by` as"
" we'll see in the next section. In the absence of group by operations, it "
"simply returns the number of rows in the subset."
msgstr ""

#: d.Rmd:block 82 (paragraph)
msgid ""
"Now that we now, we can now accomplish the same task by using `.N` as "
"follows:"
msgstr ""

#: d.Rmd:block 84 (unordered list)
msgid ""
"Once again, we subset in `i` to get the *row indices* where `origin` airport"
" equals *\"JFK\"*, and `month` equals *6*."
msgstr ""

#: d.Rmd:block 84 (unordered list)
msgid ""
"We see that `j` uses only `.N` and no other columns. Therefore, the entire "
"subset is not materialised. We simply return the number of rows in the "
"subset (which is just the length of row indices)."
msgstr ""

#: d.Rmd:block 84 (unordered list)
msgid ""
"Note that we did not wrap `.N` with `list()` or `.()`. Therefore, a vector "
"is returned."
msgstr ""

#: d.Rmd:block 85 (paragraph)
msgid ""
"We could have accomplished the same operation by doing `nrow(flights[origin "
"== \"JFK\" & month == 6L])`. However, it would have to subset the entire "
"`data.table` first corresponding to the *row indices* in `i` *and then* "
"return the rows using `nrow()`, which is unnecessary and inefficient. We "
"will cover this and other optimisation aspects in detail under the "
"*`data.table` design* vignette."
msgstr ""

#: d.Rmd:block 86 (header)
msgid ""
"h) Great! But how can I refer to columns by names in `j` (like in a "
"`data.frame`)? {#refer_j}"
msgstr ""

#: d.Rmd:block 87 (paragraph)
msgid ""
"If you're writing out the column names explicitly, there's no difference "
"compared to a `data.frame` (since v1.9.8)."
msgstr ""

#: d.Rmd:block 88 (header)
msgid ""
"-- Select both `arr_delay` and `dep_delay` columns the `data.frame` way."
msgstr ""

#: d.Rmd:block 90 (paragraph)
msgid ""
"If you've stored the desired columns in a character vector, there are two "
"options: Using the `..` prefix, or using the `with` argument."
msgstr ""

#: d.Rmd:block 91 (header)
msgid "-- Select columns named in a variable using the `..` prefix"
msgstr ""

#: d.Rmd:block 93 (paragraph)
msgid ""
"For those familiar with the Unix terminal, the `..` prefix should be "
"reminiscent of the \"up-one-level\" command, which is analogous to what's "
"happening here -- the `..` signals to `data.table` to look for the "
"`select_cols` variable \"up-one-level\", i.e., within the global environment"
" in this case."
msgstr ""

#: d.Rmd:block 94 (header)
msgid "-- Select columns named in a variable using `with = FALSE`"
msgstr ""

#: d.Rmd:block 96 (paragraph)
msgid ""
"The argument is named `with` after the R function `with()` because of "
"similar functionality. Suppose you have a `data.frame` `DF` and you'd like "
"to subset all rows where `x > 1`. In `base` R you can do the following:"
msgstr ""

#: d.Rmd:block 98 (unordered list)
msgid ""
"Using `with()` in (2) allows using `DF`'s column `x` as if it were a "
"variable."
msgstr ""

#: d.Rmd:block 98 (unordered list)
msgid ""
"Hence, the argument name `with` in `data.table`. Setting `with = FALSE` "
"disables the ability to refer to columns as if they are variables, thereby "
"restoring the \"`data.frame` mode\"."
msgstr ""

#: d.Rmd:block 98 (unordered list)
msgid "We can also *deselect* columns using `-` or `!`. For example:"
msgstr ""

#: d.Rmd:block 98 (unordered list)
msgid ""
"From `v1.9.5+`, we can also select by specifying start and end column names,"
" e.g., `year:day` to select the first three columns."
msgstr ""

#: d.Rmd:block 98 (unordered list)
msgid "This is particularly handy while working interactively."
msgstr ""

#: d.Rmd:block 99 (paragraph)
msgid ""
"`with = TRUE` is the default in `data.table` because we can do much more by "
"allowing `j` to handle expressions - especially when combined with `by`, as "
"we'll see in a moment."
msgstr ""

#: d.Rmd:block 100 (header)
msgid "2. Aggregations"
msgstr ""

#: d.Rmd:block 101 (paragraph)
msgid ""
"We've already seen `i` and `j` from `data.table`'s general form in the "
"previous section. In this section, we'll see how they can be combined "
"together with `by` to perform operations *by group*. Let's look at some "
"examples."
msgstr ""

#: d.Rmd:block 102 (header)
msgid "a) Grouping using `by`"
msgstr ""

#: d.Rmd:block 103 (header)
msgid ""
"-- How can we get the number of trips corresponding to each origin airport?"
msgstr ""

#: d.Rmd:block 105 (unordered list)
msgid ""
"We know `.N` [is a special variable](#special-N) that holds the number of "
"rows in the current group. Grouping by `origin` obtains the number of rows, "
"`.N`, for each group."
msgstr ""

#: d.Rmd:block 105 (unordered list)
msgid ""
"By doing `head(flights)` you can see that the origin airports occur in the "
"order *\"JFK\"*, *\"LGA\"*, and *\"EWR\"*. The original order of grouping "
"variables is preserved in the result. *This is important to keep in mind!*"
msgstr ""

#: d.Rmd:block 105 (unordered list)
msgid ""
"Since we did not provide a name for the column returned in `j`, it was named"
" `N` automatically by recognising the special symbol `.N`."
msgstr ""

#: d.Rmd:block 105 (unordered list)
msgid ""
"`by` also accepts a character vector of column names. This is particularly "
"useful for coding programmatically, e.g., designing a function with the "
"grouping columns (in the form of a `character` vector) as a function "
"argument."
msgstr ""

#: d.Rmd:block 105 (unordered list)
msgid ""
"When there's only one column or expression to refer to in `j` and `by`, we "
"can drop the `.()` notation. This is purely for convenience. We could "
"instead do:"
msgstr ""

#: d.Rmd:block 105 (unordered list)
msgid "We'll use this convenient form wherever applicable hereafter."
msgstr ""

#: d.Rmd:block 106 (header)
msgid ""
"-- How can we calculate the number of trips for each origin airport for "
"carrier code `\"AA\"`? {#origin-.N}"
msgstr ""

#: d.Rmd:block 107 (paragraph)
msgid "The unique carrier code `\"AA\"` corresponds to *American Airlines Inc.*"
msgstr ""

#: d.Rmd:block 109 (unordered list)
msgid ""
"We first obtain the row indices for the expression `carrier == \"AA\"` from "
"`i`."
msgstr ""

#: d.Rmd:block 109 (unordered list)
msgid ""
"Using those *row indices*, we obtain the number of rows while grouped by "
"`origin`. Once again no columns are actually materialised here, because the "
"`j-expression` does not require any columns to be actually subsetted and is "
"therefore fast and memory efficient."
msgstr ""

#: d.Rmd:block 110 (header)
msgid ""
"-- How can we get the total number of trips for each `origin, dest` pair for"
" carrier code `\"AA\"`? {#origin-dest-.N}"
msgstr ""

#: d.Rmd:block 112 (unordered list)
msgid ""
"`by` accepts multiple columns. We just provide all the columns by which to "
"group by. Note the use of `.()` again in `by` -- again, this is just "
"shorthand for `list()`, and `list()` can be used here as well. Again, we'll "
"stick with `.()` in this vignette."
msgstr ""

#: d.Rmd:block 113 (header)
msgid ""
"-- How can we get the average arrival and departure delay for each "
"`orig,dest` pair for each month for carrier code `\"AA\"`? {#origin-dest-"
"month}"
msgstr ""

#: d.Rmd:block 115 (unordered list)
msgid ""
"Since we did not provide column names for the expressions in `j`, they were "
"automatically generated as `V1` and `V2`."
msgstr ""

#: d.Rmd:block 115 (unordered list)
msgid ""
"Once again, note that the input order of grouping columns is preserved in "
"the result."
msgstr ""

#: d.Rmd:block 116 (paragraph)
msgid ""
"Now what if we would like to order the result by those grouping columns "
"`origin`, `dest` and `month`?"
msgstr ""

#: d.Rmd:block 117 (header)
msgid "b) Sorted `by`: `keyby`"
msgstr ""

#: d.Rmd:block 118 (paragraph)
msgid ""
"`data.table` retaining the original order of groups is intentional and by "
"design. There are cases when preserving the original order is essential. But"
" at times we would like to automatically sort by the variables in our "
"grouping."
msgstr ""

#: d.Rmd:block 119 (header)
msgid "-- So how can we directly order by all the grouping variables?"
msgstr ""

#: d.Rmd:block 121 (unordered list)
msgid ""
"All we did was change `by` to `keyby`. This automatically orders the result "
"by the grouping variables in increasing order. In fact, due to the internal "
"implementation of `by` first requiring a sort before recovering the original"
" table's order, `keyby` is typically faster than `by` because it doesn't "
"require this second step."
msgstr ""

#: d.Rmd:block 122 (paragraph)
msgid ""
"**Keys:** Actually `keyby` does a little more than *just ordering*. It also "
"*sets a key* after ordering by setting an `attribute` called `sorted`."
msgstr ""

#: d.Rmd:block 123 (paragraph)
msgid ""
"We'll learn more about `keys` in the *Keys and fast binary search based "
"subset* vignette; for now, all you have to know is that you can use `keyby` "
"to automatically order the result by the columns specified in `by`."
msgstr ""

#: d.Rmd:block 124 (header)
msgid "c) Chaining"
msgstr ""

#: d.Rmd:block 125 (paragraph)
msgid ""
"Let's reconsider the task of [getting the total number of trips for each "
"`origin, dest` pair for carrier *\"AA\"*](#origin-dest-.N)."
msgstr ""

#: d.Rmd:block 127 (header)
msgid ""
"-- How can we order `ans` using the columns `origin` in ascending order, and"
" `dest` in descending order?"
msgstr ""

#: d.Rmd:block 128 (paragraph)
msgid ""
"We can store the intermediate result in a variable, and then use "
"`order(origin, -dest)` on that variable. It seems fairly straightforward."
msgstr ""

#: d.Rmd:block 130 (unordered list)
msgid ""
"Recall that we can use `-` on a `character` column in `order()` within the "
"frame of a `data.table`. This is possible due to `data.table`'s internal "
"query optimisation."
msgstr ""

#: d.Rmd:block 130 (unordered list)
msgid ""
"Also recall that `order(...)` with the frame of a `data.table` is "
"*automatically optimised* to use `data.table`'s internal fast radix order "
"`forder()` for speed."
msgstr ""

#: d.Rmd:block 131 (paragraph)
msgid ""
"But this requires having to assign the intermediate result and then "
"overwriting that result. We can do one better and avoid this intermediate "
"assignment to a temporary variable altogether by *chaining* expressions."
msgstr ""

#: d.Rmd:block 133 (unordered list)
msgid ""
"We can tack expressions one after another, *forming a chain* of operations, "
"i.e., `DT[ ... ][ ... ][ ... ]`."
msgstr ""

#: d.Rmd:block 133 (unordered list)
msgid "Or you can also chain them vertically:"
msgstr ""

#: d.Rmd:block 134 (header)
msgid "d) Expressions in `by`"
msgstr ""

#: d.Rmd:block 135 (header)
msgid "-- Can `by` accept *expressions* as well or does it just take columns?"
msgstr ""

#: d.Rmd:block 136 (paragraph)
msgid ""
"Yes it does. As an example, if we would like to find out how many flights "
"started late but arrived early (or on time), started and arrived late etc..."
msgstr ""

#: d.Rmd:block 138 (unordered list)
msgid ""
"The last row corresponds to `dep_delay > 0 = TRUE` and `arr_delay > 0 = "
"FALSE`. We can see that `r flights[!is.na(arr_delay) & !is.na(dep_delay), "
".N, .(dep_delay>0, arr_delay>0)][, N[4L]]` flights started late but arrived "
"early (or on time)."
msgstr ""

#: d.Rmd:block 138 (unordered list)
msgid ""
"Note that we did not provide any names to `by-expression`. Therefore, names "
"have been automatically assigned in the result. As with `j`, you can name "
"these expressions as you would for elements of any `list`, like for e.g. "
"`DT[, .N, .(dep_delayed = dep_delay>0, arr_delayed = arr_delay>0)]`."
msgstr ""

#: d.Rmd:block 138 (unordered list)
msgid ""
"You can provide other columns along with expressions, for example: `DT[, .N,"
" by = .(a, b>0)]`."
msgstr ""

#: d.Rmd:block 139 (header)
msgid "e) Multiple columns in `j` - `.SD`"
msgstr ""

#: d.Rmd:block 140 (header)
msgid "-- Do we have to compute `mean()` for each column individually?"
msgstr ""

#: d.Rmd:block 141 (paragraph)
msgid ""
"It is of course not practical to have to type `mean(myCol)` for every column"
" one by one. What if you had 100 columns to average `mean()`?"
msgstr ""

#: d.Rmd:block 142 (paragraph)
msgid ""
"How can we do this efficiently and concisely? To get there, refresh on [this"
" tip](#tip-1) - *\"As long as the `j`-expression returns a `list`, each "
"element of the `list` will be converted to a column in the resulting "
"`data.table`\"*. If we can refer to the *data subset for each group* as a "
"variable *while grouping*, we can then loop through all the columns of that "
"variable using the already- or soon-to-be-familiar base function `lapply()`."
" No new names to learn specific to `data.table`."
msgstr ""

#: d.Rmd:block 143 (header)
msgid "Special symbol `.SD`: {#special-SD}"
msgstr ""

#: d.Rmd:block 144 (paragraph)
msgid ""
"`data.table` provides a *special* symbol called `.SD`. It stands for "
"**S**ubset of **D**ata. It by itself is a `data.table` that holds the data "
"for *the current group* defined using `by`."
msgstr ""

#: d.Rmd:block 145 (paragraph)
msgid ""
"Recall that a `data.table` is internally a `list` as well with all its "
"columns of equal length."
msgstr ""

#: d.Rmd:block 146 (paragraph)
msgid ""
"Let's use the [`data.table` `DT` from before](#what-is-datatable-1a) to get "
"a glimpse of what `.SD` looks like."
msgstr ""

#: d.Rmd:block 148 (unordered list)
msgid ""
"`.SD` contains all the columns *except the grouping columns* by default."
msgstr ""

#: d.Rmd:block 148 (unordered list)
msgid ""
"It is also generated by preserving the original order - data corresponding "
"to `ID = \"b\"`, then `ID = \"a\"`, and then `ID = \"c\"`."
msgstr ""

#: d.Rmd:block 149 (paragraph)
msgid ""
"To compute on (multiple) columns, we can then simply use the base R function"
" `lapply()`."
msgstr ""

#: d.Rmd:block 151 (unordered list)
msgid ""
"`.SD` holds the rows corresponding to columns `a`, `b` and `c` for that "
"group. We compute the `mean()` on each of these columns using the already-"
"familiar base function `lapply()`."
msgstr ""

#: d.Rmd:block 151 (unordered list)
msgid ""
"Each group returns a list of three elements containing the mean value which "
"will become the columns of the resulting `data.table`."
msgstr ""

#: d.Rmd:block 151 (unordered list)
msgid ""
"Since `lapply()` returns a `list`, so there is no need to wrap it with an "
"additional `.()` (if necessary, refer to [this tip](#tip-1))."
msgstr ""

#: d.Rmd:block 152 (paragraph)
msgid ""
"We are almost there. There is one little thing left to address. In our "
"`flights` `data.table`, we only wanted to calculate the `mean()` of the two "
"columns `arr_delay` and `dep_delay`. But `.SD` would contain all the columns"
" other than the grouping variables by default."
msgstr ""

#: d.Rmd:block 153 (header)
msgid ""
"-- How can we specify just the columns we would like to compute the `mean()`"
" on?"
msgstr ""

#: d.Rmd:block 154 (header)
msgid ".SDcols"
msgstr ""

#: d.Rmd:block 155 (paragraph)
msgid ""
"Using the argument `.SDcols`. It accepts either column names or column "
"indices. For example, `.SDcols = c(\"arr_delay\", \"dep_delay\")` ensures "
"that `.SD` contains only these two columns for each group."
msgstr ""

#: d.Rmd:block 156 (paragraph)
msgid ""
"Similar to [part g)](#refer_j), you can also specify the columns to remove "
"instead of columns to keep using `-` or `!`. Additionally, you can select "
"consecutive columns as `colA:colB` and deselect them as `!(colA:colB)` or "
"`-(colA:colB)`."
msgstr ""

#: d.Rmd:block 157 (paragraph)
msgid ""
"Now let us try to use `.SD` along with `.SDcols` to get the `mean()` of "
"`arr_delay` and `dep_delay` columns grouped by `origin`, `dest` and `month`."
msgstr ""

#: d.Rmd:block 159 (header)
msgid "f) Subset `.SD` for each group:"
msgstr ""

#: d.Rmd:block 160 (header)
msgid "-- How can we return the first two rows for each `month`?"
msgstr ""

#: d.Rmd:block 162 (unordered list)
msgid ""
"`.SD` is a `data.table` that holds all the rows for *that group*. We simply "
"subset the first two rows as we have seen [here](#subset-rows-integer) "
"already."
msgstr ""

#: d.Rmd:block 162 (unordered list)
msgid ""
"For each group, `head(.SD, 2)` returns the first two rows as a `data.table`,"
" which is also a `list`, so we do not have to wrap it with `.()`."
msgstr ""

#: d.Rmd:block 163 (header)
msgid "g) Why keep `j` so flexible?"
msgstr ""

#: d.Rmd:block 164 (paragraph)
msgid ""
"So that we have a consistent syntax and keep using already existing (and "
"familiar) base functions instead of learning new functions. To illustrate, "
"let us use the `data.table` `DT` that we created at the very beginning under"
" the section [What is a data.table?](#what-is-datatable-1a)."
msgstr ""

#: d.Rmd:block 165 (header)
msgid "-- How can we concatenate columns `a` and `b` for each group in `ID`?"
msgstr ""

#: d.Rmd:block 167 (unordered list)
msgid ""
"That's it. There is no special syntax required. All we need to know is the "
"base function `c()` which concatenates vectors and [the tip from "
"before](#tip-1)."
msgstr ""

#: d.Rmd:block 168 (header)
msgid ""
"-- What if we would like to have all the values of column `a` and `b` "
"concatenated, but returned as a list column?"
msgstr ""

#: d.Rmd:block 170 (unordered list)
msgid ""
"Here, we first concatenate the values with `c(a,b)` for each group, and wrap"
" that with `list()`. So for each group, we return a list of all concatenated"
" values."
msgstr ""

#: d.Rmd:block 170 (unordered list)
msgid ""
"Note that those commas are for display only. A list column can contain any "
"object in each cell, and in this example, each cell is itself a vector and "
"some cells contain longer vectors than others."
msgstr ""

#: d.Rmd:block 171 (paragraph)
msgid ""
"Once you start internalising usage in `j`, you will realise how powerful the"
" syntax can be. A very useful way to understand it is by playing around, "
"with the help of `print()`."
msgstr ""

#: d.Rmd:block 172 (paragraph)
msgid "For example:"
msgstr ""

#: d.Rmd:block 174 (paragraph)
msgid ""
"In (1), for each group, a vector is returned, with length = 6,4,2 here. "
"However, (2) returns a list of length 1 for each group, with its first "
"element holding vectors of length 6,4,2. Therefore, (1) results in a length "
"of `6+4+2 =`r 6+4+2``, whereas (2) returns `1+1+1=`r 1+1+1``."
msgstr ""

#: d.Rmd:block 175 (header)
msgid "Summary"
msgstr ""

#: d.Rmd:block 176 (paragraph)
msgid "The general form of `data.table` syntax is:"
msgstr ""

#: d.Rmd:block 178 (paragraph)
msgid "We have seen so far that,"
msgstr ""

#: d.Rmd:block 179 (header)
msgid "Using `i`:"
msgstr ""

#: d.Rmd:block 180 (unordered list)
msgid ""
"We can subset rows similar to a `data.frame`- except you don't have to use "
"`DT$` repetitively since columns within the frame of a `data.table` are seen"
" as if they are *variables*."
msgstr ""

#: d.Rmd:block 180 (unordered list)
msgid ""
"We can also sort a `data.table` using `order()`, which internally uses "
"data.table's fast order for better performance."
msgstr ""

#: d.Rmd:block 181 (paragraph)
msgid ""
"We can do much more in `i` by keying a `data.table`, which allows for "
"blazing fast subsets and joins. We will see this in the *\"Keys and fast "
"binary search based subsets\"* and *\"Joins and rolling joins\"* vignette."
msgstr ""

#: d.Rmd:block 182 (header)
msgid "Using `j`:"
msgstr ""

#: d.Rmd:block 183 (ordered list)
msgid "Select columns the `data.table` way: `DT[, .(colA, colB)]`."
msgstr ""

#: d.Rmd:block 183 (ordered list)
msgid "Select columns the `data.frame` way: `DT[, c(\"colA\", \"colB\")]`."
msgstr ""

#: d.Rmd:block 183 (ordered list)
msgid "Compute on columns: `DT[, .(sum(colA), mean(colB))]`."
msgstr ""

#: d.Rmd:block 183 (ordered list)
msgid "Provide names if necessary: `DT[, .(sA =sum(colA), mB = mean(colB))]`."
msgstr ""

#: d.Rmd:block 183 (ordered list)
msgid "Combine with `i`: `DT[colA > value, sum(colB)]`."
msgstr ""

#: d.Rmd:block 184 (header)
msgid "Using `by`:"
msgstr ""

#: d.Rmd:block 185 (unordered list)
msgid ""
"Using `by`, we can group by columns by specifying a *list of columns* or a "
"*character vector of column names* or even *expressions*. The flexibility of"
" `j`, combined with `by` and `i`, makes for a very powerful syntax."
msgstr ""

#: d.Rmd:block 185 (unordered list)
msgid "`by` can handle multiple columns and also *expressions*."
msgstr ""

#: d.Rmd:block 185 (unordered list)
msgid ""
"We can `keyby` grouping columns to automatically sort the grouped result."
msgstr ""

#: d.Rmd:block 185 (unordered list)
msgid ""
"We can use `.SD` and `.SDcols` in `j` to operate on multiple columns using "
"already familiar base functions. Here are some examples:"
msgstr ""

#: d.Rmd:block 185 (unordered list)
msgid ""
"`DT[, lapply(.SD, fun), by = ..., .SDcols = ...]` - applies `fun` to all "
"columns specified in `.SDcols` while grouping by the columns specified in "
"`by`."
msgstr ""

#: d.Rmd:block 185 (unordered list)
msgid ""
"`DT[, head(.SD, 2), by = ...]` - return the first two rows for each group."
msgstr ""

#: d.Rmd:block 185 (unordered list)
msgid ""
"`DT[col > val, head(.SD, 1), by = ...]` - combine `i` along with `j` and "
"`by`."
msgstr ""

#: d.Rmd:block 186 (header)
msgid "And remember the tip:"
msgstr ""

#: d.Rmd:block 187 (paragraph)
msgid ""
"As long as `j` returns a `list`, each element of the list will become a "
"column in the resulting `data.table`."
msgstr ""

#: d.Rmd:block 188 (paragraph)
msgid ""
"We will see how to *add/update/delete* columns *by reference* and how to "
"combine them with `i` and `by` in the next vignette."
msgstr ""