Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Example showing how to write a least-squares polynomial fit in Julia #3325

Closed
wants to merge 1 commit into from

Conversation

brorson
Copy link

@brorson brorson commented Jun 7, 2013

I have written a simple program which performs least squares fitting. This is meant to be a stand-alone example of how to use Juila to solve a simple, but not trivial mathematical problem. it is aimed at Julia beginners. I envision this could become a classroom example of how to use Julia.

This example follows the methods discussed on Wikipedia:
http://en.wikipedia.org/wiki/Linear_least_squares_%28mathematics%29

enough that it may be used as a pedogogical example
of how to write a Julia program.  This example follows
the discussion on Wikipedia:
http://en.wikipedia.org/wiki/Linear_least_squares_%28mathematics%29
@johnmyleswhite
Copy link
Member

This seems a little elaborate. I'd also like to make clear to newcomers that least-squares fitting in Julia should be done using the GLM package.

@brorson
Copy link
Author

brorson commented Jun 7, 2013

Hi John,

Fair enough, but my goal is to create free-standing examples which can
be used by undergrads who have never seen a math language before. I
agree using an industrial-strength package is the right thing for the
pros to do. However, as learning aids I thought a reasonably simple
least-squares fitter would be illustrative to students.

I created a few of these examples last year after talking to Alan E.
For a variety of reasons I haven't been in a position to get my
examples into the code base until very recently. Maybe there could be
a directory called "pedagogy" parallel to "examples" where such
examples can go? Or maybe under examples?

You guys can choose. But please let me know if you want more of these
(or don't) since I have a few of them stacked up.

Cheers,

Stuart

On Fri, 7 Jun 2013, John Myles White wrote:

This seems a little elaborate. I'd also like to make clear to newcomers that least-squares fitting in Julia should be done using the GLM package.


Reply to this email directly or view it on GitHub:
#3325 (comment)

@dmbates
Copy link
Member

dmbates commented Jun 7, 2013

I started out writing a comment on all the things that are bad about this example but decided not to post it because it sounded too confrontational. One thing I particularly value about the Julia community is the cordiality.

I appreciate your desire to contribute an example. However, Julia is a programming language for technical computing and doing technical computing well involves choosing good algorithms and being aware of the stability, accuracy and speed of different approaches. Nowhere is this more important than in numerical linear algebra. Simply creating the model matrix, X, and getting the coefficients as

b = X\y

will choose a much better algorithm (QR decomposition with column pivoting) than what you have chosen (LU decomposition of a symmetric, positive-definite matrix). And it is much simpler.

The second point is that polynomial regression using powers of x as columns of the model matrix is a notoriously ill-conditioned problem. If you really want to do polynomial regression it would be better to form an orthogonal polynomial basis.

The example does not illustrate good programming practice nor does it illustrate good algorithm choice. I would recommend against including it.

@brorson
Copy link
Author

brorson commented Jun 7, 2013

Fair enough. I'll see if I can polish up the example and the
re-submit the pull request.

Thanks

Stuart

On Fri, 7 Jun 2013, dmbates wrote:

I started out writing a comment on all the things that are bad about this example but decided not to post it because it sounded too confrontational. One thing I particularly value about the Julia community is the cordiality.

I appreciate your desire to contribute an example. However, Julia is a programming language for technical computing and doing technical computing well involves choosing good algorithms and being aware of the stability, accuracy and speed of different approaches. Nowhere is this more important than in numerical linear algebra. Simply creating the model matrix, X, and getting the coefficients as

b = X\y

will choose a much better algorithm (QR decomposition with column pivoting) than what you have chosen (LU decomposition of a symmetric, positive-definite matrix). And it is much simpler.

The second point is that polynomial regression using powers of x as columns of the model matrix is a notoriously ill-conditioned problem. If you really want to do polynomial regression it would be better to form an orthogonal polynomial basis.

The example does not illustrate good programming practice nor does it illustrate good algorithm choice. I would recommend against including it.


Reply to this email directly or view it on GitHub:
#3325 (comment)

@JeffBezanson
Copy link
Member

Thanks Stuart, this is nicely presented.

I can't comment on the math, but from my perspective there is only one problem, which is the program structure. This code should define at least one function that encapsulates the thing being computed. That way we can see what the core algorithm is and what its inputs are, separated from setting up the example problem being solved and the printing code. One of the biggest problems you see all the time in (mostly) matlab code is this style of doing everything at the top level in scripts.

@brorson
Copy link
Author

brorson commented Jun 7, 2013

Thanks. I can change the example to encapsulate two fcns:

  1. A fitter which takes the (x, y) pairs and emits a coefficient
    vector for the fit polynomial.
  2. A predictor which takes the coefficient vector and an x, and
    returns the coresponding y based on the fit.

Depending upon how you like the examples presented, I can then either
put a driver under the tests directory, or just put some comments in
the code explaining how to use it. Comments are probably easier, and
make the example free-standing.

I'll take a look at the numerical issues raised by dmbates and do
something to address them too. Particularly the point about simply
using b = X\y is relevant.

Cheers,

Stuart

On Fri, 7 Jun 2013, Jeff Bezanson wrote:

Thanks Stuart, this is nicely presented.

I can't comment on the math, but from my perspective there is only one problem, which is the program structure. This code should define at least one function that encapsulates the thing being computed. That way we can see what the core algorithm is and what its inputs are, separated from setting up the example problem being solved and the printing code. One of the biggest problems you see all the time in (mostly) matlab code is this style of doing everything at the top level in scripts.


Reply to this email directly or view it on GitHub:
#3325 (comment)

@dmbates
Copy link
Member

dmbates commented Jun 7, 2013

If you want to show polynomial regression using powers you could show some of the in-built mechanisms for creating matrices.

julia> x = [-15.:15.];

julia> println(x)
[-15.0, -14.0, -13.0, -12.0, -11.0, -10.0, -9.0  …  8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0]

julia> srand(1234)

julia> y = 0.4randn(31) + 4 + x.*(3 + x.*(2 + x));

julia> println(y)
[-2965.65, -2390.36, -1894.2, -1472.36, -1117.65  …  1610.06, 2056.31, 2577.88, 3181.76, 3873.49]

julia> hcat(ones(length(x)), x, x.*x, x.*x.*x)  # one way of generating the model matrix
31x4 Float64 Array:
 1.0  -15.0  225.0  -3375.0
 1.0  -14.0  196.0  -2744.0
 1.0  -13.0  169.0  -2197.0
 1.0  -12.0  144.0  -1728.0
 1.0  -11.0  121.0  -1331.0
 1.0  -10.0  100.0  -1000.0
 1.0   -9.0   81.0   -729.0
 1.0   -8.0   64.0   -512.0
 1.0   -7.0   49.0   -343.0
 1.0   -6.0   36.0   -216.0
 ⋮                         
 1.0    6.0   36.0    216.0
 1.0    7.0   49.0    343.0
 1.0    8.0   64.0    512.0
 1.0    9.0   81.0    729.0
 1.0   10.0  100.0   1000.0
 1.0   11.0  121.0   1331.0
 1.0   12.0  144.0   1728.0
 1.0   13.0  169.0   2197.0
 1.0   14.0  196.0   2744.0
 1.0   15.0  225.0   3375.0

julia> [xx^j for xx in x, j in 0:3]   # using a comprehension
31x4 Any Array:
 1.0  -15.0  225.0  -3375.0
 1.0  -14.0  196.0  -2744.0
 1.0  -13.0  169.0  -2197.0
 1.0  -12.0  144.0  -1728.0
 1.0  -11.0  121.0  -1331.0
 1.0  -10.0  100.0  -1000.0
 1.0   -9.0   81.0   -729.0
 1.0   -8.0   64.0   -512.0
 1.0   -7.0   49.0   -343.0
 1.0   -6.0   36.0   -216.0
 ⋮                         
 1.0    6.0   36.0    216.0
 1.0    7.0   49.0    343.0
 1.0    8.0   64.0    512.0
 1.0    9.0   81.0    729.0
 1.0   10.0  100.0   1000.0
 1.0   11.0  121.0   1331.0
 1.0   12.0  144.0   1728.0
 1.0   13.0  169.0   2197.0
 1.0   14.0  196.0   2744.0
 1.0   15.0  225.0   3375.0

julia> polyreg(x::Vector,y::Vector,k::Integer) = [xx^j for xx in x, j in 0:k]\y
polyreg(x::Array{T,1},y::Array{T,1},k::Integer) at none:1

julia> polyreg(x,y,3)
4-element Float64 Array:
 4.00831
 2.9788 
 1.99939
 1.00006

@ViralBShah
Copy link
Member

On a slightly different note, we have a lot of cleanup to do on our existing examples. Some are perhaps not good examples, some no longer work, etc. Issue #2598 is tracking the cleaning up of examples.

@brorson It would be great if you can help out with that effort. One specific task is also to add all the examples to the tests, so that we ensure they are always working. The second thing that would be useful would be to document the code better, and perhaps even include the examples in the manual.

The shootout benchmark in perf/benchmark also needs to be integrated in such a manner with the rest of the codebase.

@brorson
Copy link
Author

brorson commented Jun 8, 2013

Hi Viral,

Yup, getting a nice set of examples together is part of the plan, as
is hacking the documentation. As for putting examples in the docs,
yes, I think the best documentation is example code. Example code
should span the range from totally trivial to fairly complicated. The
examples in the example directory are very good. However, IMO, they
would be even better if they had lots of comments in them explaining
what is going on. I don't mind adding comments to existing
examples, as well as adding new examples. Finally, embedding the
right examples from the examples directory into the docs is part of
the plan.

Another thing I envision doing is writing up some "case studies" or
"application notes". Those would be documents which take a particular
problem (or computation) and then present code solving the problem (or
performing the computation) along with a detailed explanation about
how the code works and why various techniques were used. The least
squares example was meant to be a start in that direction, which is
why I didn't call a library fcn. Perhaps the way to fix that is to
have several implementations of the algorithm, the first a simple
implementation from the ground up, and then another one which uses
something from a Julia package, then finally perhaps an example in
which an external library is called. The goal is not to provide the
most optimal implementation of the algorithm, but rather to illustrate
how to use Julia. (And of course, point out the pros/cons of each
approach, such as which is the preferred Julian way to do it.)

Let me see if I can figure out how to adopt issue 2598 in GitHub. I
am traveling for the next few days, but over time I think I'll be able
to push a few interesting things to Julia.

Cheers,

Stuart

On Fri, 7 Jun 2013, Viral B. Shah wrote:

On a slightly different note, we have a lot of cleanup to do on our existing examples. Some are perhaps not good examples, some no longer work, etc. Issue #2598 is tracking the cleaning up of examples.

@brorson It would be great if you can help out with that effort. One specific task is also to add all the examples to the tests, so that we ensure they are always working. The second thing that would be useful would be to document the code better, and perhaps even include the examples in the manual.


Reply to this email directly or view it on GitHub:
#3325 (comment)

@ViralBShah
Copy link
Member

Thanks Stuart. This sounds really exciting - really looking forward to this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants