Use cases for various tasks

agarie edited this page Mar 24, 2013

Correlation of sets of random numbers

  1. Generate two sets of 1000 random numbers each, uniformly distributed between 0.0 and 1.0.
  2. Compute Pearson's correlation coefficient (r), r^2, and the slope and y-intercept of the least-squares regression line.
  3. Plot an x-y scatterplot and include the regression line. Include r^2 and the line equation (slope & y-intercept) somewhere on the plot.
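For reference, the quantities in step 2 follow the standard least-squares formulas for n paired values (x_i, y_i). Here s_x and s_y are the sample standard deviations, b is the slope and a is the y-intercept of the fitted line y = a + b x. For a simple linear fit like this one, r^2 equals the coefficient of determination of that line.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i

r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}
         {\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}}

b = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{n}(x_i-\bar{x})^2} = r\,\frac{s_y}{s_x},
\qquad a = \bar{y} - b\,\bar{x}
```

Because the two sets are drawn independently, r should come out close to zero. As a rough guide, for n = 1000 it falls within about 2/\sqrt{n} \approx 0.063 of zero most of the time.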

R

# modified from:
x = runif(1000, 0.0, 1.0)  # random uniform distribution
y = runif(1000, 0.0, 1.0)
pearsons_r = cor(x,y)
r_squared = pearsons_r^2
fit = lm(y ~ x)    # notice the order: response (y) ~ predictor (x)
y_intercept = as.numeric(coef(fit)[1])
slope = as.numeric(coef(fit)[2])
plot(x, y)         # scatterplot; abline() draws onto an existing plot
abline(fit)        # regression line from the fitted model above
function_string = paste("f(x) = ", round(slope, 4), "x + ", round(y_intercept, 4), sep="")
r_sq_string = paste("r^2 =", round(r_squared, 4))
display_string = paste(function_string, r_sq_string, sep="\n")
mtext(display_string, side=3, adj=1)  # right-aligned in the top margin, above the plot region

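MATLAB/Octave

% Get two sets of 1000 uniform random numbers on (0, 1)
n = 1000;
x = rand(n,1);
y = rand(n,1);

% corr needs the Statistics Toolbox in MATLAB; corrcoef(x,y) (element (1,2)) is a base alternative
pearsons_r = corr(x,y);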
r_squared = pearsons_r^2;

vals = polyfit(x,y,1);
slope = vals(1);
intercept = vals(2);


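clf % clear figure
hold on
plot(x, y, '.')                    % scatterplot
xs = [0 1];
plot(xs, polyval(vals, xs), 'r-')  % regression line across the data range
text(0,.8,sprintf('r^2=%.10f\n%.4f x + %.4f',r_squared,slope,intercept),'FontSize',14,'FontWeight','bold');
hold off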

sciruby (take 1)

Other thoughts on how this should or could look are most welcome. This is just imagine-ware right now.

require 'sciruby'  # which requires at least 'sciruby/narray'?? and 'sciruby/stats'??
x, y = Array.new(2) { NArray.float(1000).random!(1.0) }  # how would this look in NArray v0.7? What about with a GSL backend?
# how does Statsample do this?
pearsons_r = Stats.pearsons_r(x,y)  # should this be combined with following line into one call?
slope, intercept = Stats.slope_intercept(x,y)
# fill in with rubyvis-like plotting???
# ... lots of nifty plotting and labeling here ...
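One way the two imagined `Stats` calls above could behave, sketched in plain Ruby with core Arrays instead of NArray. `Stats`, `pearsons_r` and `slope_intercept` are the hypothetical names from the imagine-ware, and `sum_cross_deviations` is a hypothetical helper. None of this is an existing SciRuby API. It needs Ruby 2.4+ for `Array#sum`.

```ruby
# Hypothetical Stats module: a sketch of the imagined API, not a SciRuby library.
module Stats
  module_function

  # Sum of products of deviations from the means (n times the covariance).
  def sum_cross_deviations(xs, ys)
    mx = xs.sum / xs.size.to_f
    my = ys.sum / ys.size.to_f
    xs.zip(ys).sum { |x, y| (x - mx) * (y - my) }
  end

  # Pearson's r = Sxy / sqrt(Sxx * Syy).
  def pearsons_r(xs, ys)
    raise ArgumentError, "x and y must have the same length" unless xs.size == ys.size
    sxy = sum_cross_deviations(xs, ys)
    sxx = sum_cross_deviations(xs, xs)
    syy = sum_cross_deviations(ys, ys)
    sxy / Math.sqrt(sxx * syy)
  end

  # Least-squares line y = slope * x + intercept; returns [slope, intercept].
  def slope_intercept(xs, ys)
    slope = sum_cross_deviations(xs, ys) / sum_cross_deviations(xs, xs)
    intercept = ys.sum / ys.size.to_f - slope * xs.sum / xs.size.to_f
    [slope, intercept]
  end
end

# Two independent sets of 1000 uniform random numbers in [0, 1), seeded for repeatability.
rng = Random.new(42)
x, y = Array.new(2) { Array.new(1000) { rng.rand } }

r = Stats.pearsons_r(x, y)
slope, intercept = Stats.slope_intercept(x, y)
puts format("r = %.4f, r^2 = %.4f, f(x) = %.4fx + %.4f", r, r**2, slope, intercept)
```

Both functions reuse the same deviation sums. For the question in the comment, one design option is therefore a single call that returns r, slope and intercept from one pass over the data.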

Statsample

require 'statsample'
include Statsample::Shorthand
a=Statsample.new_scale(1000) {rand}
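b=Statsample.new_scale(1000) {rand}
r=Statsample::Bivariate::Pearson.new(a,b) # assumed: correlation object that reports r, t and p
sr=Statsample::Regression.simple(a,b)     # assumed: simple regression of b on a (sr.a = intercept, sr.b = slope)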
puts r.summary # Retrieves r, t and p
puts "r:#{sr.r}, r^2:#{sr.r2}, a:#{sr.a}, b:#{sr.b}" # I have to implement summary on simple regression
scatterplot(a,b,:show_regression=>true, :label=>"r^2:#{sr.r2}\ny=#{sr.a}+#{sr.b}x") # :show_regression and :label not implemented yet

sciruby (take 2)

# other thoughts?

Multiline plot

This use case is included because drawing several lines on one set of axes should be simple, yet it is surprisingly awkward in R: the shared axis ranges and the legend both have to be set up by hand.

R

d1x <- c(1,2,3,4,5)
d1y <- c(4,5,2,3,3)

d2x <- c(1.5,3,7,9,12.5)
d2y <- c(8,11,8,9,15)

d3x <- c(3,3,5,5,9)
d3y <- c(-2,3,10,9,-1)

xrange <- range(d1x,d2x,d3x)
yrange <- range(d1y,d2y,d3y)

plotcolors = c("pink", "orangered", "blue")

plot(d1x, d1y, ylim=yrange, xlim=xrange, type="o", col=plotcolors[1])
lines(d2x, d2y, type="o", col=plotcolors[2])
lines(d3x, d3y, type="o", col=plotcolors[3])

# the legend position has to be chosen by hand;
# for automatic placement, look into labcurve (Hmisc package) or emptyspace (plotrix package)
# legend() only draws colored keys when given a line type (lty) and/or point symbol (pch),
# so pass the ones the lines use (type="o" draws lty=1 lines with pch=1 points)
legend("topleft", c("dataset 1", "dataset 2", "dataset 3"), col=plotcolors, lty=1, pch=1)
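For comparison, here is a sketch of how this could look in plain Ruby. The `multiline_svg` helper is hypothetical and is not part of SciRuby or any gem. It takes all series at once, computes the shared axis ranges itself and keys the legend to the same colors, so the caller never repeats the range() or legend bookkeeping. It uses only core Ruby and emits SVG text. Axes and tick labels are omitted to keep it short.

```ruby
# Hypothetical helper (not an existing SciRuby or gem API): draw several named
# (x, y) series on one shared pair of axes and return the plot as an SVG string.
def multiline_svg(series, width: 400, height: 300, margin: 40)
  # Shared ranges come from all series at once (the range() step in the R version).
  xmin, xmax = series.values.flat_map { |s| s[:x] }.minmax
  ymin, ymax = series.values.flat_map { |s| s[:y] }.minmax
  xspan = (xmax - xmin).to_f.nonzero? || 1.0  # avoid dividing by zero for a flat series
  yspan = (ymax - ymin).to_f.nonzero? || 1.0

  # Map data coordinates to pixels; SVG's y axis points down, so y is flipped.
  px = ->(v) { margin + (v - xmin) * (width - 2 * margin) / xspan }
  py = ->(v) { height - margin - (v - ymin) * (height - 2 * margin) / yspan }

  parts = [%(<svg xmlns="http://www.w3.org/2000/svg" width="#{width}" height="#{height}">)]
  series.each_with_index do |(name, s), i|
    pairs = s[:x].zip(s[:y])
    points = pairs.map { |x, y| format("%.1f,%.1f", px.(x), py.(y)) }.join(" ")
    # A line plus point markers, similar to type="o" in R.
    parts << %(<polyline points="#{points}" fill="none" stroke="#{s[:color]}"/>)
    pairs.each do |x, y|
      parts << format(%(<circle cx="%.1f" cy="%.1f" r="2.5" fill="%s"/>), px.(x), py.(y), s[:color])
    end
    # Legend entry in the top-left corner, drawn in the same color as its series.
    ly = margin + 15 * i
    parts << %(<line x1="#{margin}" y1="#{ly}" x2="#{margin + 20}" y2="#{ly}" stroke="#{s[:color]}"/>)
    parts << %(<text x="#{margin + 25}" y="#{ly + 4}" font-size="10">#{name}</text>)
  end
  parts << "</svg>"
  parts.join("\n")
end

# The same three datasets and colors as the R example.
data = {
  "dataset 1" => { x: [1, 2, 3, 4, 5],      y: [4, 5, 2, 3, 3],    color: "pink" },
  "dataset 2" => { x: [1.5, 3, 7, 9, 12.5], y: [8, 11, 8, 9, 15],  color: "orangered" },
  "dataset 3" => { x: [3, 3, 5, 5, 9],      y: [-2, 3, 10, 9, -1], color: "blue" }
}

svg = multiline_svg(data)
# File.write("multiline.svg", svg)  # uncomment to save a file viewable in any browser
```

Passing the series as one hash is the design choice that removes the bookkeeping. The helper sees every dataset before drawing, so it can compute the ranges and the legend from the same data.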
