|
29 | 29 | "is the same thing as a monthly failure rate of 1.0 (100%), and is the same thing as a daily\n",
|
30 | 30 | "failure rate of 0.0333 (3.33%).\n",
|
31 | 31 | "\n",
|
| 32 | + "If you're running a shop that requires 10 widgets, and the annual failure rate is 12,\n", |
| 33 | + "you'll go through a lot of widgets, and you'll have to keep getting replacements.\n", |
| 34 | + "Over the span of a year, you can expect to buy 120 new widgets as replacements so you\n", |
| 35 | + "can always have 10 running.\n", |
| 36 | + "\n", |
32 | 37 | "## Probability of Failure\n",
|
33 | 38 | "\n",
|
34 | 39 | "Assuming that the probability of separate failures is independent, the probability of failure \n",
|
|
57 | 62 | "\n",
|
58 | 63 | "If you add up the infinite sequence of probabilities, it will add up to 1.0.\n",
|
59 | 64 | "\n",
|
| 65 | + "In practice, looking at one of the widgets in your shop, this means that there's a 13.5% chance\n", |
| 66 | + "you won't have to replace it in the year. There's a 27% chance you'll replace it once, a\n", |
| 67 | + "27% chance you'll replace it twice, an 18% chance you'll replace it three times, and so on.\n", |
| 68 | + "\n", |
60 | 69 | "To calculate the probability of zero failures, you can simplify the formula:\n",
|
61 | 70 | "\n",
|
62 | 71 | "$$\\text{probability of 0 failures} = e^{-\\lambda} \\: \\frac{\\lambda^0}{0!} \\: = \\: \\: e^{-\\lambda}$$\n",
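
As a quick check on the percentages quoted above, here is a minimal Python sketch of the Poisson formula. It assumes $\lambda = 2$, which is the value that reproduces the 13.5% / 27% / 27% / 18% figures; the function name `poisson_pmf` is illustrative and not taken from the notebook.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k failures when lam failures are expected."""
    return math.exp(-lam) * lam**k / math.factorial(k)


# Assumption: lam = 2 matches the per-widget percentages quoted in the text.
lam = 2.0

# Probability of 0, 1, 2 and 3 replacements for a single widget.
for k in range(4):
    print(f"{k} failures: {poisson_pmf(k, lam):.1%}")

# The k = 0 case collapses to e^(-lam), as in the simplified formula.
assert math.isclose(poisson_pmf(0, lam), math.exp(-lam))
```

Summing `poisson_pmf(k, lam)` over a long enough range of `k` gets arbitrarily close to 1.0, which is the "infinite sequence" property mentioned above.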
|
|
144 | 153 | "* $P$ is the probability of a shard failing at least once in $R$ days\n",
|
145 | 154 | "* $D$ is the durability of data over $R$ days: not too many shards are lost\n",
|
146 | 155 | "\n",
|
| 156 | + "With erasure coding, your data remains intact as long as you don't lose \n", |
| 157 | + "more shards than there are parity shards. If you do lose more, there\n", |
| 158 | + "is no way to recover the data.\n", |
| 159 | + "\n", |
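
The "no more lost shards than parity shards" rule can be sketched in Python as a binomial sum. This is a hedged illustration, not necessarily the formula the notebook derives: it assumes every shard fails independently with the same probability $P$ over the window, and the names `durability`, `total_shards`, and `parity_shards` are hypothetical.

```python
from math import comb


def durability(total_shards: int, parity_shards: int, p_fail: float) -> float:
    """Probability that at most parity_shards of total_shards fail.

    Assumes independent failures with identical probability p_fail
    (an assumption of this sketch, not a claim about the notebook).
    """
    return sum(
        comb(total_shards, k) * p_fail**k * (1 - p_fail) ** (total_shards - k)
        for k in range(parity_shards + 1)
    )


# Hypothetical layout: 20 shards in total, 3 of them parity, P = 0.1%.
print(durability(total_shards=20, parity_shards=3, p_fail=0.001))
```

With zero parity shards the sum reduces to $(1 - P)^n$, meaning every shard has to survive; each extra parity shard adds one more tolerated-failure term.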
147 | 160 | "One of the assumptions we make is that it takes $R$ days to repair a failed\n",
|
148 | 161 | "shard. Let's start with a simpler problem and look at the data durability\n",
|
149 | 162 | "over a period of $R$ days. For a data loss to happen in this time period,\n",
|
|