Use cases for various tasks
agarie edited this page Mar 24, 2013 · 1 revision
- Generate two sets of 1000 random numbers each, uniformly distributed between 0.0 and 1.0.
- Get Pearson's correlation (r), r^2, slope, and the y-intercept.
- Plot an x-y scatterplot and include the regression line. Include r^2 and the line equation (slope & y-intercept) somewhere on the plot.
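The numeric part of the steps above (everything except the plot) can be sketched in plain Ruby with only the standard library. This is not the SciRuby or Statsample API: `linear_fit` is a hypothetical helper name, and the fixed seed is an assumption added only to make runs repeatable. It uses the textbook least-squares formulas, so treat it as a reference point for comparing the library versions rather than a proposed interface.

```ruby
# Plain-Ruby sketch (standard library only, Ruby 2.4+ for Array#sum):
# Pearson's r, r^2, and the least-squares line y = slope * x + intercept.
n = 1000
rng = Random.new(42)            # fixed seed for repeatable runs (assumption)
x = Array.new(n) { rng.rand }   # uniform on [0.0, 1.0)
y = Array.new(n) { rng.rand }

# Hypothetical helper: returns r, r^2, slope and intercept in a Hash.
def linear_fit(x, y)
  n = x.size.to_f
  mean_x = x.sum / n
  mean_y = y.sum / n
  # Sums of squared deviations and of cross products
  sxy = x.zip(y).sum { |xi, yi| (xi - mean_x) * (yi - mean_y) }
  sxx = x.sum { |xi| (xi - mean_x)**2 }
  syy = y.sum { |yi| (yi - mean_y)**2 }
  r = sxy / Math.sqrt(sxx * syy)
  slope = sxy / sxx                      # regression of y on x
  intercept = mean_y - slope * mean_x
  { r: r, r_squared: r**2, slope: slope, intercept: intercept }
end

fit = linear_fit(x, y)
puts format("f(x) = %.4fx + %.4f", fit[:slope], fit[:intercept])
puts format("r^2 = %.6f", fit[:r_squared])
```

As in the R and MATLAB versions, the order of the arguments matters: the slope and intercept describe y as a function of x, while r is symmetric in the two.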
# modified from: http://sudoit.blogspot.com/2010/08/regression-line-r2-pearsons-correlation.html
x = runif(1000, 0.0, 1.0) # random uniform distribution
y = runif(1000, 0.0, 1.0)
pearsons_r = cor(x,y)
r_squared = pearsons_r^2
fit = lm(y~x) # notice the order of variables: response ~ predictor
y_intercept = as.numeric(coef(fit)[1])
slope = as.numeric(coef(fit)[2])
plot(x,y)
abline(fit) # reuse the fitted model instead of refitting
function_string = paste("f(x) = ", slope, "x + ", y_intercept, sep="")
r_sq_string = paste("r^2 =", r_squared)
display_string = paste(function_string, r_sq_string, sep="\n")
mtext(display_string, side=3, adj=1) # top right outside of the margin
%Get two sets of random numbers
n = 1000;
x = rand(n,1);
y = rand(n,1);
pearsons_r = corr(x,y);
r_squared = pearsons_r^2;
vals = polyfit(x,y,1);
slope = vals(1);
intercept = vals(2);
x2=0:.01:1;
y2=slope*x2+intercept;
clf %clear figure
hold on
scatter(x,y);
plot(x2,y2);
text(0,.8,sprintf('r^2=%.10f\n%.4f x + %.4f',r_squared,slope,intercept),'FontSize',14,'FontWeight','bold');
hold off
Other thoughts on how this should/could look are most welcome. This is just imagine-ware right now.
require 'sciruby' # which requires at least 'sciruby/narray'?? and 'sciruby/stats'??
(x,y) = 2.times.map { NArray.float(1000).random!(1.0) } # how would this look in NArray v0.7? What about with GSL backend?
# how does Statsample do this?
pearsons_r = Stats.pearsons_r(x,y) # should this be combined with following line into one call?
slope, intercept = Stats.slope_intercept(x,y)
# fill in with rubyvis-like plotting???
# ... lots of nifty plotting and labeling here ...
require 'statsample'
include Statsample::Shorthand
a=Statsample.new_scale(1000) {rand}
b=Statsample.new_scale(1000) {rand}
r=Statsample::Bivariate::Pearson.new(a,b)
puts r.summary # Retrieves r, t and p
sr=Statsample::Regression::Simple.new_from_vectors(a,b)
puts "r:#{sr.r}, r^2:#{sr.r2}, a:#{sr.a}, b:#{sr.b}" # I have to implement summary on simple regression
scatterplot(a,b,:show_regression=>true, :label=>"r^2:#{sr.r2}\ny=#{sr.a}+#{sr.b}x") # :show_regression and label not implemented yet
# other thoughts?
This use case (several data series on one set of axes, with a legend) is here because it should be simple, yet it is incredibly awkward to accomplish in R.
d1x <- c(1,2,3,4,5)
d1y <- c(4,5,2,3,3)
d2x <- c(1.5,3,7,9,12.5)
d2y <- c(8,11,8,9,15)
d3x <- c(3,3,5,5,9)
d3y <- c(-2,3,10,9,-1)
xrange <- range(d1x,d2x,d3x)
yrange <- range(d1y,d2y,d3y)
plotcolors = c("pink", "orangered", "blue")
plot(d1x, d1y, ylim=yrange, xlim=xrange, type="o", col=plotcolors[1])
lines(d2x, d2y, type="o", col=plotcolors[2])
lines(d3x, d3y, type="o", col=plotcolors[3])
# note that you must choose optimal placement
# or dig into the labcurve (Hmisc package) or emptyspace (plotrix package)
# for automatic placement
legend("topleft", c("dataset 1", "dataset 2", "dataset 3"), col=plotcolors, lty=1, pch=1)
# (col only shows up when the legend also draws something: lty, pch, or fill;
# lty=1 and pch=1 match the type="o" lines and points above)
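The step that makes the R version awkward, working out axis limits that cover every series before the first `plot` call, is easy to express in plain Ruby. The sketch below is only an illustration with the standard library: the `SERIES` Hash layout and the `shared_range` helper are assumptions, not part of any existing SciRuby plotting API.

```ruby
# Plain-Ruby sketch: find x and y ranges that cover every series,
# mirroring range(d1x, d2x, d3x) and range(d1y, d2y, d3y) in the R code.
# The Hash layout and helper name are illustrative assumptions.
SERIES = {
  "dataset 1" => { x: [1, 2, 3, 4, 5],      y: [4, 5, 2, 3, 3] },
  "dataset 2" => { x: [1.5, 3, 7, 9, 12.5], y: [8, 11, 8, 9, 15] },
  "dataset 3" => { x: [3, 3, 5, 5, 9],      y: [-2, 3, 10, 9, -1] }
}.freeze

# Returns [min, max] over one axis (:x or :y) across all series.
def shared_range(series, axis)
  series.values.flat_map { |s| s[axis] }.minmax
end

xrange = shared_range(SERIES, :x)
yrange = shared_range(SERIES, :y)
puts "x: #{xrange.inspect}, y: #{yrange.inspect}"
```

Keeping each label next to its data also means a legend could be built from `SERIES.keys` instead of a separately maintained list of names.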