Probability.md

  • Probability is not always intuitive

  • Probability is a mathematical description of randomness and uncertainty. It is a way to measure or quantify uncertainty.

  • Probability is the underlying foundation for the methods of statistical inference. Probability can be used to quantify how much we expect random samples (collected as part of statistics) to vary.

  • Probability can answer questions like "How likely is it that our sample estimate is no more than 3% from the true percentage of all U.S. adults who are in favor of the death penalty?"

  • Volunteer sample - is biased

    • Goal: determine the musical preferences of all students at your university, treating the student body as the entire population.
    • We cannot generalize to any larger group at all.
    • Volunteer samples tend to be comprised of individuals who have a particularly strong opinion about an issue, and are looking for an opportunity to voice it.
  • Convenience sample - is biased

    • Stand outside the Student Union, across from the Fine Arts Building, and ask students passing by to respond to your question about musical preference.
    • Students passing near the Fine Arts Building are likely to be music and arts students, so they do not represent the entire population.
  • Sampling frame - biased

    • Ask your professors for email rosters of all the students in your classes. Randomly sample some addresses, and email those students with your question about musical preference.
    • The sampling frame (the list of potential individuals to be sampled) does not match the population of interest.
  • Systematic sampling - Not subject to any clear bias, though less safe than a random sample

    • Obtain a student directory with email addresses of all the university's students, and send the music poll to every 50th name on the list.
    • For comparison, if individuals are sampled completely at random, and without replacement, then each group of a given size is just as likely to be selected as all the other groups of that size. This is called a simple random sample (SRS).
  • Probability sampling plan (or technique)

    • Simple Random Sampling
    • Cluster Sampling
      • Suppose that the city has 10 hospitals. Choose one of the 10 hospitals at random and interview all the nurses in that hospital regarding their job satisfaction. This is an example of cluster sampling, in which the hospitals are the clusters.
    • Stratified Sampling
      • Choose a random sample of 50 nurses from each of the 10 hospitals and interview these 50 × 10 = 500 nurses regarding their job satisfaction. This is an example of stratified sampling, in which each hospital is a stratum.
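The three probability sampling plans above can be sketched in a few lines of Python. This is only an illustration: the hospital and nurse names, the figure of 120 nurses per hospital, and the helper function names are all made up for the example, not taken from the notes.

```python
import random

def simple_random_sample(population, n, rng):
    """Draw n individuals at random, without replacement (SRS)."""
    return rng.sample(population, n)

def cluster_sample(clusters, rng):
    """Pick one cluster at random and take everyone in it."""
    chosen = rng.choice(sorted(clusters))
    return chosen, list(clusters[chosen])

def stratified_sample(strata, n_per_stratum, rng):
    """Draw a separate random sample from every stratum."""
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

rng = random.Random(0)

# Hypothetical city: 10 hospitals with 120 nurses each (assumed numbers).
hospitals = {
    f"hospital_{h}": [f"nurse_{h}_{i}" for i in range(120)]
    for h in range(1, 11)
}
all_nurses = [nurse for members in hospitals.values() for nurse in members]

srs = simple_random_sample(all_nurses, 500, rng)
chosen_hospital, cluster = cluster_sample(hospitals, rng)
stratified = stratified_sample(hospitals, 50, rng)
total_stratified = sum(len(sample) for sample in stratified.values())

print("SRS size:", len(srs))
print("Cluster:", chosen_hospital, "size:", len(cluster))
print("Stratified total:", total_stratified)
```

Note the difference in what is randomized: the cluster plan randomizes which hospital is chosen, while the stratified plan randomizes which nurses are chosen inside every hospital.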
  • Law of Large Numbers - The relative frequency of an event approaches the theoretical probability of that event as the number of repetitions increases.

  • In other words, as the number of trials increases, the empirical probability gets closer and closer to the theoretical probability. It does not become exactly equal to it after any fixed number of trials.

  • "How many times do I need to repeat the random experiment in order for the relative frequency to be, say, within .001 of the actual probability of the event?"

  • Relative Frequency - (definition) The probability of an event (A) is the relative frequency with which the event occurs in a long series of trials.

  • For a "fair" coin (one that is not unevenly weighted, and does not have identical images on both sides), the probability of heads on each toss is 0.5.

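  • Aleatory vs Epistemic uncertainty: aleatory uncertainty comes from inherent randomness, epistemic uncertainty comes from a lack of knowledge.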

  • Two ways to determine a probability: Theoretical (Classical) and Empirical (Observational).
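The Law of Large Numbers and the theoretical vs. empirical distinction above can be illustrated with a simulated fair coin. This is a rough sketch: the function name, the seed, and the trial counts are arbitrary choices for the example. It shows the relative frequency drifting toward 0.5, but it does not guarantee a particular error after a particular number of tosses.

```python
import random

def relative_frequency(n_trials, p=0.5, seed=0):
    """Empirical probability of heads over n_trials simulated tosses."""
    rng = random.Random(seed)
    heads = sum(rng.random() < p for _ in range(n_trials))
    return heads / n_trials

theoretical = 0.5  # fair coin
for n in (10, 100, 1_000, 10_000, 100_000):
    empirical = relative_frequency(n)
    print(f"n={n:>7}  relative frequency={empirical:.4f}  "
          f"gap={abs(empirical - theoretical):.4f}")
```

The gap tends to shrink as n grows, although it can go up and down along the way because each run is random.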

Discrete and Continuous

  • For example, the variable “number of times a college student changes major” is a discrete random variable. The (exact) weight of a person is a continuous random variable.
  • Probability distribution = Probability model
  • The outcomes described by the model are random. This means that individual outcomes are uncertain, but there is a regular, predictable distribution of outcomes in a large number of repetitions.

Standard deviation

  • A number that describes how far values typically fall from the mean.
  • The higher the standard deviation, the more of the data lies far from the mean.
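As a small illustration of the idea above, the sketch below compares two made-up data sets that share the same mean but differ in spread. It uses the population standard deviation from Python's `statistics` module; the data values are invented for the example.

```python
import statistics

# Two hypothetical data sets with the same mean (10) but different spread.
tight = [9, 10, 10, 10, 11]
spread = [2, 6, 10, 14, 18]

for name, data in (("tight", tight), ("spread", spread)):
    mean = statistics.mean(data)
    sd = statistics.pstdev(data)  # population standard deviation
    print(f"{name:>6}: mean={mean}  standard deviation={sd:.3f}")
```

The second data set has values much farther from the mean, so its standard deviation is larger.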

Histogram

  • In a probability histogram, the heights of all the rectangles must sum to 1. When each rectangle has width 1, this means the total area is also 1.
  • As the number of intervals increases, the bars become narrower and narrower, and the graph approaches a smooth curve. For many variables this curve looks like the normal curve.
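The following sketch illustrates the "area equals 1" idea for bars that are not one unit wide. It assumes a density-style histogram, where each bar's height is its relative frequency divided by the bar width, so the total area stays 1 however many intervals are used. The simulated data and the helper name are assumptions made for the example.

```python
import random

def density_histogram(values, n_bins):
    """Return (bar width, bar heights) scaled so the total area is 1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value would land one past the end, so clamp it.
        index = min(int((v - lo) / width), n_bins - 1)
        counts[index] += 1
    heights = [c / (len(values) * width) for c in counts]
    return width, heights

rng = random.Random(0)
values = [rng.gauss(0, 1) for _ in range(10_000)]  # simulated data

for n_bins in (5, 20, 80):
    width, heights = density_histogram(values, n_bins)
    area = sum(h * width for h in heights)
    print(f"bins={n_bins:>3}  bar width={width:.3f}  total area={area:.6f}")
```

As the number of bins grows the bars get narrower, while the total area stays at 1 up to floating-point rounding.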

Probability density curve

  • Probability distribution of a continuous random variable is represented by a probability density curve.
  • Area under Probability density curve is 1
  • P(X < 9) = P(X ≤ 9), because P(X = x) = 0 for a continuous random variable
  • P(a < X < b) = ∫ f(x) dx taken from a to b, which is the area under the density curve f between a and b
  • Sample distributions
    • Many variables, such as pregnancy lengths, shoe sizes, foot lengths, and other human physical characteristics, have symmetric, bell-shaped distributions. Symmetry means that the variable is just as likely to take a value a certain distance below its mean as it is to take a value that same distance above its mean
  • Bell-shaped - Values closer to the mean are more likely
  • Normal curve ~ Common curve ~ Natural curve
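  • A normal distribution is described by its mean (μ) and standard deviation (σ)
  • Standard Deviation Rule (or the 68-95-99.7 rule): about 68% of values fall within 1σ of the mean, about 95% within 2σ, and about 99.7% within 3σ
  • A normal random variable follows a normal distribution (so it is a continuous random variable)
  • Quartiles and other cutoffs, measured in standard deviations from the mean
    • The middle 50% lies between about -0.67 and +0.67 standard deviations
    • The lowest 10% and the highest 10% are cut off at about a = -1.28 and b = +1.28 standard deviations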
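The rules in this section can be checked numerically with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). The sketch below is one way to do it, not the method used in the notes; the helper names `prob_between` and `riemann_area` are invented here. It computes P(a < X < b) both from the cumulative distribution and by approximating the area under the density curve with thin rectangles.

```python
from statistics import NormalDist

Z = NormalDist(mu=0, sigma=1)  # standard normal distribution

def prob_between(dist, a, b):
    """P(a < X < b) from the cumulative distribution function."""
    return dist.cdf(b) - dist.cdf(a)

def riemann_area(dist, a, b, steps=10_000):
    """Approximate the integral of the density from a to b (midpoint rule)."""
    width = (b - a) / steps
    return sum(dist.pdf(a + (i + 0.5) * width) for i in range(steps)) * width

# Standard Deviation Rule: area within 1, 2 and 3 standard deviations.
for k in (1, 2, 3):
    print(f"within {k} sd: cdf={prob_between(Z, -k, k):.4f}  "
          f"area={riemann_area(Z, -k, k):.4f}")

# Cutoffs for the quartiles and for the lowest/highest 10%.
print("quartiles:", round(Z.inv_cdf(0.25), 2), round(Z.inv_cdf(0.75), 2))
print("10% tails:", round(Z.inv_cdf(0.10), 2), round(Z.inv_cdf(0.90), 2))
```

Because P(X = x) = 0 for a continuous variable, it makes no difference whether the endpoints a and b are included.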

Standard normal variable

  • z-score = (x−μ)/σ or (value - mean)/standard deviation
  • z-scores allow us to compare values of different normal random variables
  • The normal table provides probabilities that a standardized normal random variable Z would take a value less than or equal to a particular value z*.

Some samples

  • Length (in days) of human pregnancies is a normal random variable (X) with mean 266, standard deviation 16.
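    • Standard Deviation Rule: 266 ± 16 gives 250 to 282 days (about 68%), 266 ± 32 gives 234 to 298 days (about 95%), 266 ± 48 gives 218 to 314 days (about 99.7%)
    • May 15 to Feb 4 is 265 days, about one day short of the mean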
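The pregnancy example can be worked through with the z-score formula from the previous section. The sketch below assumes the stated model (mean 266, standard deviation 16) and takes the 265-day figure from the example; the function and variable names are chosen here for illustration.

```python
from statistics import NormalDist

MEAN_DAYS = 266
SD_DAYS = 16
pregnancy = NormalDist(mu=MEAN_DAYS, sigma=SD_DAYS)

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma

# Intervals from the Standard Deviation Rule: mean +/- k standard deviations.
intervals = {k: (MEAN_DAYS - k * SD_DAYS, MEAN_DAYS + k * SD_DAYS)
             for k in (1, 2, 3)}
for k, (low, high) in intervals.items():
    print(f"within {k} sd: {low} to {high} days")

# The 265-day pregnancy from the example.
z = z_score(265, MEAN_DAYS, SD_DAYS)
p_shorter = pregnancy.cdf(265)  # P(X <= 265) under the assumed model
print(f"z = {z}")
print(f"P(X <= 265) = {p_shorter:.3f}")
```

A z-score this close to 0 says that a 265-day pregnancy is almost exactly average under this model.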