diff --git a/README.md b/README.md
index ae6088d..0500b6d 100644
--- a/README.md
+++ b/README.md
@@ -1,273 +1,80 @@
-CIS 565 Project3 : CUDA Pathtracer
-===================
-
-Fall 2014
-
-Due Wed, 10/8 (submit without penalty until Sun, 10/12)
-
-## INTRODUCTION
-In this project, you will implement a CUDA based pathtracer capable of
-generating pathtraced rendered images extremely quickly. Building a pathtracer can be viewed as a generalization of building a raytracer, so for those of you who have taken 460/560, the basic concept should not be very new to you. For those of you that have not taken
-CIS460/560, raytracing is a technique for generating images by tracing rays of
-light through pixels in an image plane out into a scene and following the way
-the rays of light bounce and interact with objects in the scene. More
-information can be found here:
-http://en.wikipedia.org/wiki/Ray_tracing_(graphics). Pathtracing is a generalization of this technique by considering more than just the contribution of direct lighting to a surface.
-
-Since in this class we are concerned with working in generating actual images
-and less so with mundane tasks like file I/O, this project includes basecode
-for loading a scene description file format, described below, and various other
-things that generally make up the render "harness" that takes care of
-everything up to the rendering itself. The core renderer is left for you to
-implement.  Finally, note that while this basecode is meant to serve as a
-strong starting point for a CUDA pathtracer, you are not required to use this
-basecode if you wish, and you may also change any part of the basecode
-specification as you please, so long as the final rendered result is correct.
-
-## CONTENTS
-The Project3 root directory contains the following subdirectories:
-	
-* src/ contains the source code for the project. Both the Windows Visual Studio
-  solution and the OSX and Linux makefiles reference this folder for all 
-  source; the base source code compiles on Linux, OSX and Windows without 
-  modification.  If you are building on OSX, be sure to uncomment lines 4 & 5 of
-  the CMakeLists.txt in order to make sure CMake builds against clang.
-* data/scenes/ contains an example scene description file.
-* renders/ contains an example render of the given example scene file. 
-* windows/ contains a Windows Visual Studio 2010 project and all dependencies
-  needed for building and running on Windows 7. If you would like to create a
-  Visual Studio 2012 or 2013 projects, there are static libraries that you can
-  use for GLFW that are in external/bin/GLFW (Visual Studio 2012 uses msvc110, 
-  and Visual Studio 2013 uses msvc120)
-* external/ contains all the header, static libraries and built binaries for
-  3rd party libraries (i.e. glm, GLEW, GLFW) that we use for windowing and OpenGL
-  extensions
-
-## RUNNING THE CODE
-The main function requires a scene description file (that is provided in data/scenes). 
-The main function reads in the scene file by an argument as such :
-'scene=[sceneFileName]'
-
-If you are using Visual Studio, you can set this in the Debugging > Command Arguments section
-in the Project properties.
-
-## REQUIREMENTS
-In this project, you are given code for:
-
-* Loading, reading, and storing the scene scene description format
-* Example functions that can run on both the CPU and GPU for generating random
-  numbers, spherical intersection testing, and surface point sampling on cubes
-* A class for handling image operations and saving images
-* Working code for CUDA-GL interop
-
-You will need to implement the following features:
-
-* Raycasting from a camera into a scene through a pixel grid
-* Diffuse surfaces
-* Perfect specular reflective surfaces
-* Cube intersection testing
-* Sphere surface point sampling
-* Stream compaction optimization
-
-You are also required to implement at least 2 of the following features:
-
-* Texture mapping 
-* Bump mapping
-* Depth of field
-* Refraction, i.e. glass
-* OBJ Mesh loading and rendering
-* Interactive camera
-* Motion blur
-* Subsurface scattering
-
-The 'extra features' list is not comprehensive.  If you have a particular feature
-you would like to implement (e.g. acceleration structures, etc.) please contact us 
-first!
-
-For each 'extra feature' you must provide the following analysis :
-* overview write up of the feature
-* performance impact of the feature
-* if you did something to accelerate the feature, why did you do what you did
-* compare your GPU version to a CPU version of this feature (you do NOT need to 
-  implement a CPU version)
-* how can this feature be further optimized (again, not necessary to implement it, but
-  should give a roadmap of how to further optimize and why you believe this is the next
-  step)
-
-## BASE CODE TOUR
-You will be working in three files: raytraceKernel.cu, intersections.h, and
-interactions.h. Within these files, areas that you need to complete are marked
-with a TODO comment. Areas that are useful to and serve as hints for optional
-features are marked with TODO (Optional). Functions that are useful for
-reference are marked with the comment LOOK.
-
-* raytraceKernel.cu contains the core raytracing CUDA kernel. You will need to
-  complete:
-    * cudaRaytraceCore() handles kernel launches and memory management; this
-      function already contains example code for launching kernels,
-      transferring geometry and cameras from the host to the device, and transferring
-      image buffers from the host to the device and back. You will have to complete
-      this function to support passing materials and lights to CUDA.
-    * raycastFromCameraKernel() is a function that you need to implement. This
-      function once correctly implemented should handle camera raycasting. 
-    * raytraceRay() is the core raytracing CUDA kernel; all of your pathtracing
-      logic should be implemented in this CUDA kernel. raytraceRay() should
-      take in a camera, image buffer, geometry, materials, and lights, and should
-      trace a ray through the scene and write the resultant color to a pixel in the
-      image buffer.
-
-* intersections.h contains functions for geometry intersection testing and
-  point generation. You will need to complete:
-    * boxIntersectionTest(), which takes in a box and a ray and performs an
-      intersection test. This function should work in the same way as
-      sphereIntersectionTest().
-    * getRandomPointOnSphere(), which takes in a sphere and returns a random
-      point on the surface of the sphere with an even probability distribution.
-      This function should work in the same way as getRandomPointOnCube(). You can
-      (although do not necessarily have to) use this to generate points on a sphere
-      to use a point lights, or can use this for area lighting.
-
-* interactions.h contains functions for ray-object interactions that define how
-  rays behave upon hitting materials and objects. You will need to complete:
-    * getRandomDirectionInSphere(), which generates a random direction in a
-      sphere with a uniform probability. This function works in a fashion
-      similar to that of calculateRandomDirectionInHemisphere(), which generates a
-      random cosine-weighted direction in a hemisphere.
-    * calculateBSDF(), which takes in an incoming ray, normal, material, and
-      other information, and returns an outgoing ray. You can either implement
-      this function for ray-surface interactions, or you can replace it with your own
-      function(s).
-
-You will also want to familiarize yourself with:
-
-* sceneStructs.h, which contains definitions for how geometry, materials,
-  lights, cameras, and animation frames are stored in the renderer. 
-* utilities.h, which serves as a kitchen-sink of useful functions
-
-## NOTES ON GLM
-This project uses GLM, the GL Math library, for linear algebra. You need to
-know two important points on how GLM is used in this project:
-
-* In this project, indices in GLM vectors (such as vec3, vec4), are accessed
-  via swizzling. So, instead of v[0], v.x is used, and instead of v[1], v.y is
-  used, and so on and so forth.
-* GLM Matrix operations work fine on NVIDIA Fermi cards and later, but
-  pre-Fermi cards do not play nice with GLM matrices. As such, in this project,
-  GLM matrices are replaced with a custom matrix struct, called a cudaMat4, found
-  in cudaMat4.h. A custom function for multiplying glm::vec4s and cudaMat4s is
-  provided as multiplyMV() in intersections.h.
-
-## SCENE FORMAT
-This project uses a custom scene description format.
-Scene files are flat text files that describe all geometry, materials,
-lights, cameras, render settings, and animation frames inside of the scene.
-Items in the format are delimited by new lines, and comments can be added at
-the end of each line preceded with a double-slash.
-
-Materials are defined in the following fashion:
-
-* MATERIAL (material ID)								//material header
-* RGB (float r) (float g) (float b)					//diffuse color
-* SPECX (float specx)									//specular exponent
-* SPECRGB (float r) (float g) (float b)				//specular color
-* REFL (bool refl)									//reflectivity flag, 0 for
-  no, 1 for yes
-* REFR (bool refr)									//refractivity flag, 0 for
-  no, 1 for yes
-* REFRIOR (float ior)									//index of refraction
-  for Fresnel effects
-* SCATTER (float scatter)								//scatter flag, 0 for
-  no, 1 for yes
-* ABSCOEFF (float r) (float b) (float g)				//absorption
-  coefficient for scattering
-* RSCTCOEFF (float rsctcoeff)							//reduced scattering
-  coefficient
-* EMITTANCE (float emittance)							//the emittance of the
-  material. Anything >0 makes the material a light source.
-
-Cameras are defined in the following fashion:
-
-* CAMERA 												//camera header
-* RES (float x) (float y)								//resolution
-* FOVY (float fovy)										//vertical field of
-  view half-angle. the horizonal angle is calculated from this and the
-  reslution
-* ITERATIONS (float interations)							//how many
-  iterations to refine the image, only relevant for supersampled antialiasing,
-  depth of field, area lights, and other distributed raytracing applications
-* FILE (string filename)									//file to output
-  render to upon completion
-* frame (frame number)									//start of a frame
-* EYE (float x) (float y) (float z)						//camera's position in
-  worldspace
-* VIEW (float x) (float y) (float z)						//camera's view
-  direction
-* UP (float x) (float y) (float z)						//camera's up vector
-
-Objects are defined in the following fashion:
-* OBJECT (object ID)										//object header
-* (cube OR sphere OR mesh)								//type of object, can
-  be either "cube", "sphere", or "mesh". Note that cubes and spheres are unit
-  sized and centered at the origin.
-* material (material ID)									//material to
-  assign this object
-* frame (frame number)									//start of a frame
-* TRANS (float transx) (float transy) (float transz)		//translation
-* ROTAT (float rotationx) (float rotationy) (float rotationz)		//rotation
-* SCALE (float scalex) (float scaley) (float scalez)		//scale
-
-An example scene file setting up two frames inside of a Cornell Box can be
-found in the scenes/ directory.
-
-For meshes, note that the base code will only read in .obj files. For more 
-information on the .obj specification see http://en.wikipedia.org/wiki/Wavefront_.obj_file.
-
-An example of a mesh object is as follows:
-
-OBJECT 0
-mesh tetra.obj
-material 0
-frame 0
-TRANS       0 5 -5
-ROTAT       0 90 0
-SCALE       .01 10 10 
-
-Check the Google group for some sample .obj files of varying complexity.
-
-## THIRD PARTY CODE POLICY
-* Use of any third-party code must be approved by asking on our Google Group.  
-  If it is approved, all students are welcome to use it.  Generally, we approve 
-  use of third-party code that is not a core part of the project.  For example, 
-  for the ray tracer, we would approve using a third-party library for loading 
-  models, but would not approve copying and pasting a CUDA function for doing 
-  refraction.
-* Third-party code must be credited in README.md.
-* Using third-party code without its approval, including using another
-  student's code, is an academic integrity violation, and will result in you
-  receiving an F for the semester.
-
-## SELF-GRADING
-* On the submission date, email your grade, on a scale of 0 to 100, to Harmony,
-  harmoli+cis565@seas.upenn.com, with a one paragraph explanation.  Be concise and
-  realistic.  Recall that we reserve 30 points as a sanity check to adjust your
-  grade.  Your actual grade will be (0.7 * your grade) + (0.3 * our grade).  We
-  hope to only use this in extreme cases when your grade does not realistically
-  reflect your work - it is either too high or too low.  In most cases, we plan
-  to give you the exact grade you suggest.
-* Projects are not weighted evenly, e.g., Project 0 doesn't count as much as
-  the path tracer.  We will determine the weighting at the end of the semester
-  based on the size of each project.
-
-## SUBMISSION
-Please change the README to reflect the answers to the questions we have posed
-above.  Remember:
-* this is a renderer, so include images that you've made!
-* be sure to back your claims for optimization with numbers and comparisons
-* if you reference any other material, please provide a link to it
-* you wil not e graded on how fast your path tracer runs, but getting close to
-  real-time is always nice
-* if you have a fast GPU renderer, it is good to show case this with a video to
-  show interactivity.  If you do so, please include a link.
-
-Be sure to open a pull request and to send Harmony your grade and why you
-believe this is the grade you should get.
+Vulcan
+============
+
+Vulcan is a GPU accelerated, physically-based path tracer written in CUDA and C++.  You can look [here] for a more complete breakdown and development blog. 
+
+Features
+--------
+
+Here's a quick down break down of the features:
+  - Supports triangle soup meshes, as well as cube and sphere primitives.
+  - Supports a variety of simple BRDFs - Lambertian diffuse, Blinn-Phong specularity, Fresnel reflection / refraction, Oren-Nayar rough diffuse, Cook-Torrance rough specular
+  - Supports a few super sampling techniques - motion blur, depth of field, and anti-aliasing.
+
+![Base PT Image][base pt image]
+
+This image shows 3 spheres floating in a Cornell box.  All 3 have simple diffuse BRDFs.  The red sphere is exhibiting slight motion blur.
+
+
+![Base PT Image][ct fres]
+
+This scene has the same geometry, but the spheres now have more interesting BRDFs. The red sphere now exhibits Cook-Torrance specularity, which gives a rougher appearance than the more common Blinn-Phong model.  The other 2 spheres are exhibiting Fresnel reflection/refraction.
+
+![Base PT Image][fres]
+
+This image is a visual explanation of Fresnel reflection/refraction.  The left sphere is pure refraction, while the right sphere is pure reflection.  The center sphere combines the two.  The combination is somewhat mathematical, but it involves analysing the incoming and outgoing ray direction.
+
+![Base PT Image][obj]
+
+Here's an image to show triangular meshes being rendered in the system.  The white spheres are lights.
+
+Overall, the functionality of the renderer is quite limited, but I think it's a good start for further study, and I learned a lot about GPU programming through working on this project.
+
+##Performance
+
+Now, I'm going to talk a bit about the performance of the renderer, as well as the impact of some of the features on that performance.
+
+###Stream Compaction
+
+I used a technique called stream compaction to speed up the renderer.  Without going into unnecessary detail, stream compaction involves thinking about data in a slightly different way.  If you're not familiar with how a GPU works, you'll need to know that a GPU has a large number of lightweight threads.  That means that you can do a lot of (simple) operations at the same time.  That's why things that run on the GPU have the reputation of being very fast.
+
+An easy way to think about rendering on the GPU is to think of each pixel as a single thread.  That works pretty well, because at the end of the day, each pixel can be rendered individually, without any information from other pixels.  That's good, because inter-thread communication can be slow.  There is one problem with this methodology, however.  Imagine 2 threads running, and 1 doesn't hit anything, while the other gets trapped between 2 mirrors.  The first thread has finished its work, while the second will have to do a lot of work before finishing.  This is not an efficient use of your resources!  A smart thing to do would be to recognize that the first thread is done, and put it to work helping the second.  But how do you do that?  One way is to use a stream compaction!
+
+A better way is to think of each thread as a single bounce of a ray through the scene.  That way, when you run into the case described above, you can use your resources in a more complete way.
+
+#### Stream compaction performace impact : 5.5x speedup!
+
+That's great, but I could still push the performance of this technique further.  There are better stream compaction algorithm out there, and I also imagine that there are implementations that are much faster than mine.
+
+---------
+
+###Super Sampling (DOF, Anti-Aliasing, Motion Blur)
+
+Depth of Field, Anti-Aliasing, and Motion Blur can be accomplished in a similar fashion.  Instead of taking a single sample per frame, you take multiple samples!  For DOF, you sample the camera's position.  This attempts to simulate how a real camera would achieve depth of field (although it is a rather crude attempt).  For anti-aliasing, you sample the pixel position.  And for motion blur, you sample an object position in space.  You may be asking, won't all those extra samples be really expensive?  Luckily for us, we're already taking multiple samples every frame!  To eliminate the noise inherent to naive path tracers, we have to run the algorithm many times per frame!
+
+#### Super Sampling performace impact : Free!
+
+That's great!  You get these things for free!  These would also be free in a CPU implementation.
+
+###Triangle Mesh Primitives
+
+The nice thing about triangle meshes is that you can render more interesting objects than cubes and spheres!  However, that interest comes with a price.  Each triangle intersection test takes about as long as a test for a sphere or cube.  And your meshes will usually have many triangles!  That means that your renderer has a lot more work to do.  One simple speed up to try to combat this problem is to use boudning boxes, which prevent your renderer from testing against all of those triangle when it doesn't have to.
+
+#### Triangle Mesh Primitives performace impact : Depends, scene complexity is generally much higher.
+
+One glaring shortcoming of this renderer is the lack of a sophisticated acceleration structure, like a KD-Tree of BVH.  On the CPU, you run into the exact same problem of scene complexity.
+
+###Cook-Torrance, Oren-Nayar, Fresnel
+
+The first 2 BRDFs are fancier progressions of the more common Lambertian diffuse and Blinn-Phong specular.  Fresnel is a combination of reflection and refractino.  All 3 are a bit slower, as they are more complicated to compute.  However, given the other major bottlenecks present, the minor slowdowns presented by these are not a major concern.
+
+
+#### Cook-Torrance, Oren-Nayar, Fresnel performace impact : Slight slowdown.
+
+These BRDFs would be slightly faster on the CPU, as they all have branches that would benefit from the CPUs branch predictors.
+
+[base pt image]:http://3.bp.blogspot.com/-WENKFpXCcew/Uj4BJwYMTFI/AAAAAAAABeM/6RbhNzNDNfI/s320/pt.bmp
+[ct fres]:http://4.bp.blogspot.com/-spe4bKN8_Og/Uj38S9UgVeI/AAAAAAAABdw/P3HqOcest9s/s400/pathTrace.bmp
+[fres]:http://1.bp.blogspot.com/-4llLdG19UOQ/Uj4BJzeYdfI/AAAAAAAABeI/DSnifT4AbhQ/s400/comp.bmp
+[obj]:http://3.bp.blogspot.com/-sN4J1YlqzBk/UlnmSjjTEbI/AAAAAAAABjw/DrQV4XTGyNs/s640/a_raw.1.bmp
+[here]:http://blog.jeremynewlin.info/search/label/vulcan
\ No newline at end of file
diff --git a/src/glslUtility.cpp b/src/glslUtility.cpp
new file mode 100644
index 0000000..a88bc4b
--- /dev/null
+++ b/src/glslUtility.cpp
@@ -0,0 +1,154 @@
+// GLSL Utility: A utility class for loading GLSL shaders, for Patrick Cozzi's CIS565: GPU Computing at the University of Pennsylvania
+// Written by Varun Sampath and Patrick Cozzi, Copyright (c) 2012 University of Pennsylvania
+
+#include "glslUtility.h"
+
+#include <iostream>
+#include <fstream>
+#include <string>
+#include <stdlib.h>
+
+using std::ios;
+
+namespace glslUtility {
+
+	typedef struct {
+		GLuint vertex;
+		GLuint fragment;
+	} shaders_t;
+
+	char* loadFile(const char *fname, GLint &fSize)
+	{
+		// file read based on example in cplusplus.com tutorial
+		std::ifstream file (fname, ios::in|ios::binary|ios::ate);
+		if (file.is_open())
+		{
+			unsigned int size = (unsigned int)file.tellg();
+			fSize = size;
+			char *memblock = new char [size];
+			file.seekg (0, ios::beg);
+			file.read (memblock, size);
+			file.close();
+			//std::cout << "file " << fname << " loaded" << std::endl;
+    		return memblock;
+		}
+
+		std::cout << "Unable to open file " << fname << std::endl;
+		exit(1);
+	}
+
+	// printShaderInfoLog
+	// From OpenGL Shading Language 3rd Edition, p215-216
+	// Display (hopefully) useful error messages if shader fails to compile
+	void printShaderInfoLog(GLint shader)
+	{
+		int infoLogLen = 0;
+		int charsWritten = 0;
+		GLchar *infoLog;
+
+		glGetShaderiv(shader, GL_INFO_LOG_LENGTH, &infoLogLen);
+
+		if (infoLogLen > 1)
+		{
+			infoLog = new GLchar[infoLogLen];
+			// error check for fail to allocate memory omitted
+			glGetShaderInfoLog(shader,infoLogLen, &charsWritten, infoLog);
+			//std::cout << "InfoLog:" << std::endl << infoLog << std::endl;
+			delete [] infoLog;
+		}
+	}
+
+	void printLinkInfoLog(GLint prog) 
+	{
+		int infoLogLen = 0;
+		int charsWritten = 0;
+		GLchar *infoLog;
+
+		glGetProgramiv(prog, GL_INFO_LOG_LENGTH, &infoLogLen);
+
+		if (infoLogLen > 1)
+		{
+			infoLog = new GLchar[infoLogLen];
+			// error check for fail to allocate memory omitted
+			glGetProgramInfoLog(prog,infoLogLen, &charsWritten, infoLog);
+			//std::cout << "InfoLog:" << std::endl << infoLog << std::endl;
+			delete [] infoLog;
+		}
+	}
+
+	shaders_t loadShaders(const char * vert_path, const char * frag_path) {
+		GLuint f, v;
+
+		char *vs,*fs;
+
+		v = glCreateShader(GL_VERTEX_SHADER);
+		f = glCreateShader(GL_FRAGMENT_SHADER);	
+
+		// load shaders & get length of each
+		GLint vlen;
+		GLint flen;
+		vs = loadFile(vert_path,vlen);
+		fs = loadFile(frag_path,flen);
+
+		const char * vv = vs;
+		const char * ff = fs;
+
+		glShaderSource(v, 1, &vv,&vlen);
+		glShaderSource(f, 1, &ff,&flen);
+
+		GLint compiled;
+
+		glCompileShader(v);
+		glGetShaderiv(v, GL_COMPILE_STATUS, &compiled);
+		if (!compiled)
+		{
+			std::cout << "Vertex shader not compiled." << std::endl;
+		} 
+		printShaderInfoLog(v);
+
+		glCompileShader(f);
+		glGetShaderiv(f, GL_COMPILE_STATUS, &compiled);
+		if (!compiled)
+		{
+			std::cout << "Fragment shader not compiled." << std::endl;
+		} 
+		printShaderInfoLog(f);
+
+		shaders_t out; out.vertex = v; out.fragment = f;
+
+		delete [] vs; // dont forget to free allocated memory, or else really bad things start happening
+		delete [] fs; // we allocated this in the loadFile function...
+
+		return out;
+	}
+
+	void attachAndLinkProgram( GLuint program, shaders_t shaders) {
+		glAttachShader(program, shaders.vertex);
+		glAttachShader(program, shaders.fragment);
+
+		glLinkProgram(program);
+		GLint linked;
+		glGetProgramiv(program,GL_LINK_STATUS, &linked);
+		if (!linked) 
+		{
+			std::cout << "Program did not link." << std::endl;
+		}
+		printLinkInfoLog(program);
+	}
+
+    GLuint createProgram(const char *vertexShaderPath, const char *fragmentShaderPath, const char *attributeLocations[], GLuint numberOfLocations)
+    {
+	    glslUtility::shaders_t shaders = glslUtility::loadShaders(vertexShaderPath, fragmentShaderPath);
+	
+	    GLuint program = glCreateProgram();
+
+		for (GLuint i = 0; i < numberOfLocations; ++i)
+		{
+            glBindAttribLocation(program, i, attributeLocations[i]);
+		}
+
+	    glslUtility::attachAndLinkProgram(program, shaders);
+
+        return program;
+    }
+}
diff --git a/src/glslUtility.h b/src/glslUtility.h
new file mode 100644
index 0000000..3308659
--- /dev/null
+++ b/src/glslUtility.h
@@ -0,0 +1,20 @@
+// GLSL Utility: A utility class for loading GLSL shaders, for Patrick Cozzi's CIS565: GPU Computing at the University of Pennsylvania
+// Written by Varun Sampath and Patrick Cozzi, Copyright (c) 2012 University of Pennsylvania
+
+#ifndef GLSLUTILITY_H_
+#define GLSLUTILITY_H_
+
+#ifdef __APPLE__
+	#include <GL/glfw.h>
+#else
+	#include <GL/glew.h>
+#endif
+
+namespace glslUtility
+{
+
+GLuint createProgram(const char *vertexShaderPath, const char *fragmentShaderPath, const char *attributeLocations[], GLuint numberOfLocations);
+
+}
+ 
+#endif
\ No newline at end of file
diff --git a/src/image.cpp b/src/image.cpp
index 1e0e072..d4089ce 100644
--- a/src/image.cpp
+++ b/src/image.cpp
@@ -4,9 +4,8 @@
 //       Yining Karl Li's TAKUA Render, a massively parallel pathtracing renderer: http://www.yiningkarlli.com
 
 #include <iostream>
-#include <stb_image/stb_image_write.h>
-
 #include "image.h"
+#include "stb_image/stb_image_write.h"
 #include "utilities.h"
 
 image::image(int x, int y){
@@ -39,7 +38,7 @@ image::~image(){
 //------------------------
 
 float image::applyGamma(float f){
-    //apply gamma correction, use simple power law gamma for now.
+    //apply gamma correction, use simple power law gamma for now. TODO: sRGB
     return pow(f/float(gamma.divisor), gamma.gamma);
 }
 
diff --git a/src/interactions.h b/src/interactions.h
index 7bf6fab..bbca9f3 100644
--- a/src/interactions.h
+++ b/src/interactions.h
@@ -18,7 +18,7 @@ struct AbsorptionAndScatteringProperties{
     float reducedScatteringCoefficient;
 };
 
-// Forward declaration
+//forward declaration
 __host__ __device__ bool calculateScatterAndAbsorption(ray& r, float& depth, AbsorptionAndScatteringProperties& currentAbsorptionAndScattering, glm::vec3& unabsorbedColor, material m, float randomFloatForScatteringDistance, float randomFloat2, float randomFloat3);
 __host__ __device__ glm::vec3 getRandomDirectionInSphere(float xi1, float xi2);
 __host__ __device__ glm::vec3 calculateTransmission(glm::vec3 absorptionCoefficient, float distance);
@@ -27,29 +27,29 @@ __host__ __device__ glm::vec3 calculateReflectionDirection(glm::vec3 normal, glm
 __host__ __device__ Fresnel calculateFresnel(glm::vec3 normal, glm::vec3 incident, float incidentIOR, float transmittedIOR, glm::vec3 reflectionDirection, glm::vec3 transmissionDirection);
 __host__ __device__ glm::vec3 calculateRandomDirectionInHemisphere(glm::vec3 normal, float xi1, float xi2);
 
-// TODO (OPTIONAL): IMPLEMENT THIS FUNCTION
+//TODO (OPTIONAL): IMPLEMENT THIS FUNCTION
 __host__ __device__ glm::vec3 calculateTransmission(glm::vec3 absorptionCoefficient, float distance) {
   return glm::vec3(0,0,0);
 }
 
-// TODO (OPTIONAL): IMPLEMENT THIS FUNCTION
+//TODO (OPTIONAL): IMPLEMENT THIS FUNCTION
 __host__ __device__ bool calculateScatterAndAbsorption(ray& r, float& depth, AbsorptionAndScatteringProperties& currentAbsorptionAndScattering,
                                                         glm::vec3& unabsorbedColor, material m, float randomFloatForScatteringDistance, float randomFloat2, float randomFloat3){
   return false;
 }
 
-// TODO (OPTIONAL): IMPLEMENT THIS FUNCTION
+//TODO (OPTIONAL): IMPLEMENT THIS FUNCTION
 __host__ __device__ glm::vec3 calculateTransmissionDirection(glm::vec3 normal, glm::vec3 incident, float incidentIOR, float transmittedIOR) {
   return glm::vec3(0,0,0);
 }
 
-// TODO (OPTIONAL): IMPLEMENT THIS FUNCTION
+//TODO (OPTIONAL): IMPLEMENT THIS FUNCTION
 __host__ __device__ glm::vec3 calculateReflectionDirection(glm::vec3 normal, glm::vec3 incident) {
   //nothing fancy here
   return glm::vec3(0,0,0);
 }
 
-// TODO (OPTIONAL): IMPLEMENT THIS FUNCTION
+//TODO (OPTIONAL): IMPLEMENT THIS FUNCTION
 __host__ __device__ Fresnel calculateFresnel(glm::vec3 normal, glm::vec3 incident, float incidentIOR, float transmittedIOR, glm::vec3 reflectionDirection, glm::vec3 transmissionDirection) {
   Fresnel fresnel;
 
@@ -58,16 +58,16 @@ __host__ __device__ Fresnel calculateFresnel(glm::vec3 normal, glm::vec3 inciden
   return fresnel;
 }
 
-// LOOK: This function demonstrates cosine weighted random direction generation in a sphere!
+//LOOK: This function demonstrates cosine weighted random direction generation in a sphere!
 __host__ __device__ glm::vec3 calculateRandomDirectionInHemisphere(glm::vec3 normal, float xi1, float xi2) {
     
-    // Crucial difference between this and calculateRandomDirectionInSphere: THIS IS COSINE WEIGHTED!
+    //crucial difference between this and calculateRandomDirectionInSphere: THIS IS COSINE WEIGHTED!
     
     float up = sqrt(xi1); // cos(theta)
     float over = sqrt(1 - up * up); // sin(theta)
     float around = xi2 * TWO_PI;
     
-    // Find a direction that is not the normal based off of whether or not the normal's components are all equal to sqrt(1/3) or whether or not at least one component is less than sqrt(1/3). Learned this trick from Peter Kutz.
+    //Find a direction that is not the normal based off of whether or not the normal's components are all equal to sqrt(1/3) or whether or not at least one component is less than sqrt(1/3). Learned this trick from Peter Kutz.
     
     glm::vec3 directionNotNormal;
     if (abs(normal.x) < SQRT_OF_ONE_THIRD) {
@@ -78,7 +78,7 @@ __host__ __device__ glm::vec3 calculateRandomDirectionInHemisphere(glm::vec3 nor
       directionNotNormal = glm::vec3(0, 0, 1);
     }
     
-    // Use not-normal direction to generate two perpendicular directions
+    //Use not-normal direction to generate two perpendicular directions
     glm::vec3 perpendicularDirection1 = glm::normalize(glm::cross(normal, directionNotNormal));
     glm::vec3 perpendicularDirection2 = glm::normalize(glm::cross(normal, perpendicularDirection1));
     
@@ -86,16 +86,20 @@ __host__ __device__ glm::vec3 calculateRandomDirectionInHemisphere(glm::vec3 nor
     
 }
 
-// TODO: IMPLEMENT THIS FUNCTION
-// Now that you know how cosine weighted direction generation works, try implementing 
-// non-cosine (uniform) weighted random direction generation.
-// This should be much easier than if you had to implement calculateRandomDirectionInHemisphere.
+//TODO: IMPLEMENT THIS FUNCTION
+//Now that you know how cosine weighted direction generation works, try implementing non-cosine (uniform) weighted random direction generation.
+//This should be much easier than if you had to implement calculateRandomDirectionInHemisphere.
+//TODO: IMPLEMENT THIS FUNCTION
+//Generates a random point on a given sphere
 __host__ __device__ glm::vec3 getRandomDirectionInSphere(float xi1, float xi2) {
-  return glm::vec3(0,0,0);
-}
+  float x = 2*3.14159*xi1;
+  float y = 2*xi2-1;
+  float z = sqrt(1-y*y);
 
-// TODO (PARTIALLY OPTIONAL): IMPLEMENT THIS FUNCTION
-// Returns 0 if diffuse scatter, 1 if reflected, 2 if transmitted.
+  return glm::vec3(y,z*cos(x), z*sin(x));
+}
+//TODO (PARTIALLY OPTIONAL): IMPLEMENT THIS FUNCTION
+//returns 0 if diffuse scatter, 1 if reflected, 2 if transmitted.
 __host__ __device__ int calculateBSDF(ray& r, glm::vec3 intersect, glm::vec3 normal, glm::vec3 emittedColor,
                                        AbsorptionAndScatteringProperties& currentAbsorptionAndScattering,
                                        glm::vec3& color, glm::vec3& unabsorbedColor, material m){
diff --git a/src/intersections.h b/src/intersections.h
index c9eafb6..134a62c 100644
--- a/src/intersections.h
+++ b/src/intersections.h
@@ -6,14 +6,13 @@
 #ifndef INTERSECTIONS_H
 #define INTERSECTIONS_H
 
-#include <glm/glm.hpp>
-#include <thrust/random.h>
-
 #include "sceneStructs.h"
 #include "cudaMat4.h"
+#include "glm/glm.hpp"
 #include "utilities.h"
+#include <thrust/random.h>
 
-// Some forward declarations
+//Some forward declarations
 __host__ __device__ glm::vec3 getPointOnRay(ray r, float t);
 __host__ __device__ glm::vec3 multiplyMV(cudaMat4 m, glm::vec4 v);
 __host__ __device__ glm::vec3 getSignOfRay(ray r);
@@ -22,7 +21,7 @@ __host__ __device__ float boxIntersectionTest(staticGeom sphere, ray r, glm::vec
 __host__ __device__ float sphereIntersectionTest(staticGeom sphere, ray r, glm::vec3& intersectionPoint, glm::vec3& normal);
 __host__ __device__ glm::vec3 getRandomPointOnCube(staticGeom cube, float randomSeed);
 
-// Handy dandy little hashing function that provides seeds for random number generation
+//Handy dandy little hashing function that provides seeds for random number generation
 __host__ __device__ unsigned int hash(unsigned int a){
     a = (a+0x7ed55d16) + (a<<12);
     a = (a^0xc761c23c) ^ (a>>19);
@@ -33,23 +32,23 @@ __host__ __device__ unsigned int hash(unsigned int a){
     return a;
 }
 
-// Quick and dirty epsilon check
+//Quick and dirty epsilon check
 __host__ __device__ bool epsilonCheck(float a, float b){
-    if(fabs(fabs(a)-fabs(b)) < EPSILON){
+    if(fabs(fabs(a)-fabs(b))<EPSILON){
         return true;
     }else{
         return false;
     }
 }
 
-// Self explanatory
+//Self explanatory
 __host__ __device__ glm::vec3 getPointOnRay(ray r, float t){
-  return r.origin + float(t - .0001f) * glm::normalize(r.direction);
+  return r.origin + float(t-.0001)*glm::normalize(r.direction);
 }
 
-// LOOK: This is a custom function for multiplying cudaMat4 4x4 matrixes with vectors.
-// This is a workaround for GLM matrix multiplication not working properly on pre-Fermi NVIDIA GPUs.
-// Multiplies a cudaMat4 matrix and a vec4 and returns a vec3 clipped from the vec4
+//LOOK: This is a custom function for multiplying cudaMat4 4x4 matrixes with vectors.
+//This is a workaround for GLM matrix multiplication not working properly on pre-Fermi NVIDIA GPUs.
+//Multiplies a cudaMat4 matrix and a vec4 and returns a vec3 clipped from the vec4
 __host__ __device__ glm::vec3 multiplyMV(cudaMat4 m, glm::vec4 v){
   glm::vec3 r(1,1,1);
   r.x = (m.x.x*v.x)+(m.x.y*v.y)+(m.x.z*v.z)+(m.x.w*v.w);
@@ -58,27 +57,116 @@ __host__ __device__ glm::vec3 multiplyMV(cudaMat4 m, glm::vec4 v){
   return r;
 }
 
-// Gets 1/direction for a ray
+//Gets 1/direction for a ray
 __host__ __device__ glm::vec3 getInverseDirectionOfRay(ray r){
   return glm::vec3(1.0/r.direction.x, 1.0/r.direction.y, 1.0/r.direction.z);
 }
 
-// Gets sign of each component of a ray's inverse direction
+//Gets sign of each component of a ray's inverse direction
 __host__ __device__ glm::vec3 getSignOfRay(ray r){
   glm::vec3 inv_direction = getInverseDirectionOfRay(r);
   return glm::vec3((int)(inv_direction.x < 0), (int)(inv_direction.y < 0), (int)(inv_direction.z < 0));
 }
 
-// TODO: IMPLEMENT THIS FUNCTION
-// Cube intersection test, return -1 if no intersection, otherwise, distance to intersection
-__host__ __device__ float boxIntersectionTest(staticGeom box, ray r, glm::vec3& intersectionPoint, glm::vec3& normal){
+__host__ __device__ float boxIntersectionTest(staticGeom box, ray r, glm::vec3& intersectionPoint, glm::vec3& normal,
+                                              bool& inside){
+	ray objectSpaceR;
+	objectSpaceR.origin = multiplyMV(box.inverseTransform, glm::vec4(r.origin,1.0f));
+	objectSpaceR.direction = glm::normalize(multiplyMV(box.inverseTransform, glm::vec4(r.direction,0.0f)));
+	
+	glm::vec3 lll = -glm::vec3(.5,.5,.5);
+	glm::vec3 hhh =  glm::vec3(.5,.5,.5);
+	
+	glm::vec4 l = glm::vec4(lll,0.0f);
+	glm::vec4 h = glm::vec4(hhh,0.0f);
 
-    return -1;
+	float t1,t2;
+	float tNear=-10000000;
+	float tFar = 10000000;
+	float temp;
+
+  if ((objectSpaceR.direction.x)==0){
+    if (objectSpaceR.origin.x<l.x || objectSpaceR.origin.x>h.x){
+      return -1;
+    }
+  }
+
+  t1 = (l.x-objectSpaceR.origin.x)/objectSpaceR.direction.x;
+  t2 = (h.x-objectSpaceR.origin.x)/objectSpaceR.direction.x;
+
+  if (t1>t2){
+   temp=t1; t1=t2; t2=temp;
+  }
+  if (t1>tNear) tNear=t1;
+  if (t2<tFar) tFar=t2;
+  if (tNear>tFar || tFar<0) return -1;
+
+//////////////////////////////////////////////////////////////////////////
+
+  if ((objectSpaceR.direction.y)==0){
+    if (objectSpaceR.origin.y<l.y || objectSpaceR.origin.y>h.y){
+      return -1;
+    }
+  }
+
+  t1 = (l.y-objectSpaceR.origin.y)/objectSpaceR.direction.y;
+  t2 = (h.y-objectSpaceR.origin.y)/objectSpaceR.direction.y;
+
+  if (t1>t2){
+   temp=t1; t1=t2; t2=temp;
+  }
+  if (t1>tNear) tNear=t1;
+  if (t2<tFar) tFar=t2;
+  if (tNear>tFar || tFar<0) return -1;
+
+//////////////////////////////////////////////////////////////////////////
+
+  if ((objectSpaceR.direction.z)==0){
+    if (objectSpaceR.origin.z<l.z || objectSpaceR.origin.z>h.z){
+      return -1;
+    }
+  }
+
+  t1 = (l.z-objectSpaceR.origin.z)/objectSpaceR.direction.z;
+  t2 = (h.z-objectSpaceR.origin.z)/objectSpaceR.direction.z;
+
+  if (t1>t2){
+   temp=t1; t1=t2; t2=temp;
+  }
+  if (t1>tNear) tNear=t1;
+  if (t2<tFar) tFar=t2;
+  if (tNear>tFar || tFar<0) return -1;
+
+	intersectionPoint = objectSpaceR.origin + tNear*objectSpaceR.direction;
+
+  // inside = abs(intersectionPoint.x)<=0.5001f &&abs(intersectionPoint.y)<=0.5001f &&abs(intersectionPoint.z)<=0.5001f;
+  inside = tNear<0 || tFar<0;
+
+	float EPS = 0.01;
+
+	if(abs(intersectionPoint.x - l.x) < EPS) 
+		   normal = glm::vec3(-1,0,0);
+	else if(abs(intersectionPoint.x - h.x) < EPS) 
+		  normal = glm::vec3(1,0,0);
+	else if(abs(intersectionPoint.y - l.y) < EPS) 
+		  normal = glm::vec3(0,-1,0);
+	else if(abs(intersectionPoint.y - h.y) < EPS) 
+		  normal = glm::vec3(0,1,0);
+	else if(abs(intersectionPoint.z - l.z) < EPS) 
+		  normal = glm::vec3(0,0,1);
+	else if(abs(intersectionPoint.z - h.z) < EPS) 
+		  normal = glm::vec3(0,0,-1);
+
+	intersectionPoint = multiplyMV(box.transform, glm::vec4(intersectionPoint,1));
+	normal = glm::normalize(multiplyMV(box.transform, glm::vec4(normal,0)));
+	
+	return glm::length(intersectionPoint-r.origin);
 }
 
-// LOOK: Here's an intersection test example from a sphere. Now you just need to figure out cube and, optionally, triangle.
-// Sphere intersection test, return -1 if no intersection, otherwise, distance to intersection
-__host__ __device__ float sphereIntersectionTest(staticGeom sphere, ray r, glm::vec3& intersectionPoint, glm::vec3& normal){
+//LOOK: Here's an intersection test example from a sphere. Now you just need to figure out cube and, optionally, triangle.
+//Sphere intersection test, return -1 if no intersection, otherwise, distance to intersection
+__host__ __device__ float sphereIntersectionTest(staticGeom sphere, ray r, glm::vec3& intersectionPoint, glm::vec3& normal,
+                                                 bool& inside){
   
   float radius = .5;
         
@@ -107,6 +195,8 @@ __host__ __device__ float sphereIntersectionTest(staticGeom sphere, ray r, glm::
       t = max(t1, t2);
   }
 
+  inside = t1<0 || t2<0;
+
   glm::vec3 realIntersectionPoint = multiplyMV(sphere.transform, glm::vec4(getPointOnRay(rt, t), 1.0));
   glm::vec3 realOrigin = multiplyMV(sphere.transform, glm::vec4(0,0,0,1));
 
@@ -116,7 +206,85 @@ __host__ __device__ float sphereIntersectionTest(staticGeom sphere, ray r, glm::
   return glm::length(r.origin - realIntersectionPoint);
 }
 
-// Returns x,y,z half-dimensions of tightest bounding box
+__host__ __device__ float meshIntersectionTest(const staticGeom& mesh, glm::vec3* meshTris, glm::vec3* meshVerts, ray r, glm::vec3& intersectionPoint, glm::vec3& normal,
+                                                 bool& inside){
+ 
+  float minDist = 1000000;
+  float result = -1;
+
+  glm::vec3 ro = multiplyMV(mesh.inverseTransform, glm::vec4(r.origin,1.0f));
+  glm::vec3 rd = glm::normalize(multiplyMV(mesh.inverseTransform, glm::vec4(r.direction,0.0f)));
+
+  ray rt; rt.origin = ro; rt.direction = rd;
+
+  glm::vec3 l = mesh.obj.bl;
+  glm::vec3 h = mesh.obj.bh;
+  float t1,t2;
+  float tNear=-10000000;
+  float tFar = 10000000;
+  float temp;
+
+  if ((rt.direction.x)==0){
+    if (rt.origin.x<l.x || rt.origin.x>h.x){
+      return -1;
+    }
+  }
+
+  t1 = (l.x-rt.origin.x)/rt.direction.x;
+  t2 = (h.x-rt.origin.x)/rt.direction.x;
+
+  if (t1>t2){
+   temp=t1; t1=t2; t2=temp;
+  }
+  if (t1>tNear) tNear=t1;
+  if (t2<tFar) tFar=t2;
+  if (tNear>tFar || tFar<0) return -1;
+
+ for (int i=0; i<mesh.obj.numTris; i++){
+    glm::vec3 face = meshTris[i];
+    int v0I = face.x;
+    int v1I = face.y;
+    int v2I = face.z;
+
+    glm::vec3 v0 = meshVerts[int(v0I)];
+    glm::vec3 v1 = meshVerts[int(v1I)];
+    glm::vec3 v2 = meshVerts[int(v2I)];
+
+    glm::vec3 u = glm::normalize(v1 - v0);
+    glm::vec3 v = glm::normalize(v2 - v0);
+
+    glm::vec3 n = glm::normalize(glm::cross(u,v));
+    if (abs(glm::length(n))<0.0001){
+      continue;
+    }
+
+    float t = -glm::dot(rt.origin-v1, n)/glm::dot(rt.direction,n);
+    glm::vec3 x = rt.origin + t*rt.direction;
+
+    bool test1 = glm::dot(glm::cross((v1-v0),(x-v0)),n) >= 0;
+    if (!test1) continue;
+    bool test2 = glm::dot(glm::cross((v2-v1),(x-v1)),n) >= 0;
+    if (!test2) continue;
+    bool test3 = glm::dot(glm::cross((v0-v2),(x-v2)),n) >= 0;
+    if (!test3) continue;
+
+    if (t<minDist && t>0){
+      minDist = t;
+
+      glm::vec3 realIntersectionPoint = multiplyMV(mesh.transform, glm::vec4(getPointOnRay(rt, t), 1.0));
+      intersectionPoint = realIntersectionPoint;
+
+      normal = multiplyMV(mesh.transform, glm::vec4(n,0));
+ 
+      result = glm::length(intersectionPoint - r.origin);
+
+      inside = glm::dot(rt.origin, intersectionPoint)>0.0f;
+    }
+  }
+  return result;
+}
+
+//returns x,y,z half-dimensions of tightest bounding box
 __host__ __device__ glm::vec3 getRadiuses(staticGeom geom){
     glm::vec3 origin = multiplyMV(geom.transform, glm::vec4(0,0,0,1));
     glm::vec3 xmax = multiplyMV(geom.transform, glm::vec4(.5,0,0,1));
@@ -128,43 +296,43 @@ __host__ __device__ glm::vec3 getRadiuses(staticGeom geom){
     return glm::vec3(xradius, yradius, zradius);
 }
 
-// LOOK: Example for generating a random point on an object using thrust.
-// Generates a random point on a given cube
+//LOOK: Example for generating a random point on an object using thrust.
+//Generates a random point on a given cube
 __host__ __device__ glm::vec3 getRandomPointOnCube(staticGeom cube, float randomSeed){
 
     thrust::default_random_engine rng(hash(randomSeed));
     thrust::uniform_real_distribution<float> u01(0,1);
     thrust::uniform_real_distribution<float> u02(-0.5,0.5);
 
-    // Get surface areas of sides
+    //get surface areas of sides
     glm::vec3 radii = getRadiuses(cube);
     float side1 = radii.x * radii.y * 4.0f; //x-y face
     float side2 = radii.z * radii.y * 4.0f; //y-z face
     float side3 = radii.x * radii.z* 4.0f; //x-z face
     float totalarea = 2.0f * (side1+side2+side3);
     
-    // Pick random face, weighted by surface area
+    //pick random face, weighted by surface area
     float russianRoulette = (float)u01(rng);
     
     glm::vec3 point = glm::vec3(.5,.5,.5);
     
     if(russianRoulette<(side1/totalarea)){
-        // x-y face
+        //x-y face
         point = glm::vec3((float)u02(rng), (float)u02(rng), .5);
     }else if(russianRoulette<((side1*2)/totalarea)){
-        // x-y-back face
+        //x-y-back face
         point = glm::vec3((float)u02(rng), (float)u02(rng), -.5);
     }else if(russianRoulette<(((side1*2)+(side2))/totalarea)){
-        // y-z face
+        //y-z face
         point = glm::vec3(.5, (float)u02(rng), (float)u02(rng));
     }else if(russianRoulette<(((side1*2)+(side2*2))/totalarea)){
-        // y-z-back face
+        //y-z-back face
         point = glm::vec3(-.5, (float)u02(rng), (float)u02(rng));
     }else if(russianRoulette<(((side1*2)+(side2*2)+(side3))/totalarea)){
-        // x-z face
+        //x-z face
         point = glm::vec3((float)u02(rng), .5, (float)u02(rng));
     }else{
-        // x-z-back face
+        //x-z-back face
         point = glm::vec3((float)u02(rng), -.5, (float)u02(rng));
     }
     
@@ -174,13 +342,6 @@ __host__ __device__ glm::vec3 getRandomPointOnCube(staticGeom cube, float random
        
 }
 
-// TODO: IMPLEMENT THIS FUNCTION
-// Generates a random point on a given sphere
-__host__ __device__ glm::vec3 getRandomPointOnSphere(staticGeom sphere, float randomSeed){
-
-  return glm::vec3(0,0,0);
-}
-
 #endif
 
 
diff --git a/src/main.cpp b/src/main.cpp
index 05eae28..9cece93 100755
--- a/src/main.cpp
+++ b/src/main.cpp
@@ -6,14 +6,21 @@
 //       Yining Karl Li's TAKUA Render, a massively parallel pathtracing renderer: http://www.yiningkarlli.com
 
 #include "main.h"
-#include <cstring>
-#define GLEW_STATIC
 
 //-------------------------------
 //-------------MAIN--------------
 //-------------------------------
 
 int main(int argc, char** argv){
+
+  #ifdef __APPLE__
+	  // Needed in OSX to force use of OpenGL3.2 
+	  glfwOpenWindowHint(GLFW_OPENGL_VERSION_MAJOR, 3);
+	  glfwOpenWindowHint(GLFW_OPENGL_VERSION_MINOR, 2);
+	  glfwOpenWindowHint(GLFW_OPENGL_FORWARD_COMPAT, GL_TRUE);
+	  glfwOpenWindowHint(GLFW_OPENGL_PROFILE, GLFW_OPENGL_CORE_PROFILE);
+  #endif
+
   // Set up pathtracer stuff
   bool loadedScene = false;
   finishedRender = false;
@@ -21,6 +28,9 @@ int main(int argc, char** argv){
   targetFrame = 0;
   singleFrameMode = false;
 
+  srand(time(NULL));
+  total_time = 0;
+
   // Load scene file
   for(int i=1; i<argc; i++){
     string header; string data;
@@ -46,39 +56,47 @@ int main(int argc, char** argv){
   width = renderCam->resolution[0];
   height = renderCam->resolution[1];
 
-  if(targetFrame >= renderCam->frames){
+  if(targetFrame>=renderCam->frames){
     cout << "Warning: Specified target frame is out of range, defaulting to frame 0." << endl;
     targetFrame = 0;
   }
 
-  // Initialize CUDA and GL components
-  if (init(argc, argv)) {
-    // GLFW main loop
-    mainLoop();
-  }
+  // Launch CUDA/GL
 
-  return 0;
-}
+  #ifdef __APPLE__
+	init();
+  #else
+	init(argc, argv);
+  #endif
 
-void mainLoop() {
-  while(!glfwWindowShouldClose(window)){
-    glfwPollEvents();
-    runCuda();
+  initCuda();
 
-    string title = "CIS565 Render | " + utilityCore::convertIntToString(iterations) + " Iterations";
-		glfwSetWindowTitle(window, title.c_str());
-    
-    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo);
-    glBindTexture(GL_TEXTURE_2D, displayImage);
-    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, width, height, GL_RGBA, GL_UNSIGNED_BYTE, NULL);
-    glClear(GL_COLOR_BUFFER_BIT);   
+  initVAO();
+  initTextures();
 
-    // VAO, shader program, and texture already bound
-    glDrawElements(GL_TRIANGLES, 6,  GL_UNSIGNED_SHORT, 0);
-    glfwSwapBuffers(window);
-  }
-  glfwDestroyWindow(window);
-  glfwTerminate();
+  GLuint passthroughProgram;
+  passthroughProgram = initShader("shaders/passthroughVS.glsl", "shaders/passthroughFS.glsl");
+
+  glUseProgram(passthroughProgram);
+  glActiveTexture(GL_TEXTURE0);
+
+  #ifdef __APPLE__
+	  // send into GLFW main loop
+	  while(1){
+		display();
+		if (glfwGetKey(GLFW_KEY_ESC) == GLFW_PRESS || !glfwGetWindowParam( GLFW_OPENED )){
+				shut_down(0);
+		}
+	  }
+
+	  glfwTerminate();
+  #else
+	  glutDisplayFunc(display);
+	  glutKeyboardFunc(keyboard);
+
+	  glutMainLoop();
+  #endif
+  return 0;
 }
 
 //-------------------------------
@@ -89,45 +107,73 @@ void runCuda(){
 
   // Map OpenGL buffer object for writing from CUDA on a single GPU
   // No data is moved (Win & Linux). When mapped to CUDA, OpenGL should not use this buffer
-  
-  if(iterations < renderCam->iterations){
+
+  if(iterations<renderCam->iterations){
+
     uchar4 *dptr=NULL;
     iterations++;
     cudaGLMapBufferObject((void**)&dptr, pbo);
   
-    // pack geom and material arrays
+    //pack geom and material arrays
     geom* geoms = new geom[renderScene->objects.size()];
     material* materials = new material[renderScene->materials.size()];
-    
-    for (int i=0; i < renderScene->objects.size(); i++) {
+    mesh meshTest;
+
+    for(int i=0; i<renderScene->objects.size(); i++){
       geoms[i] = renderScene->objects[i];
+      if (geoms[i].type==MESH){
+        meshTest = geoms[i].obj;
+      }
     }
-    for (int i=0; i < renderScene->materials.size(); i++) {
+    for(int i=0; i<renderScene->materials.size(); i++){
       materials[i] = renderScene->materials[i];
     }
-  
+
     // execute the kernel
-    cudaRaytraceCore(dptr, renderCam, targetFrame, iterations, materials, renderScene->materials.size(), geoms, renderScene->objects.size() );
+    if (iterations==1){
+      cout<<endl<<"FRAME "<<targetFrame<<endl;
+      if (targetFrame>0){
+        cudaFreeCPU();
+      }
+        cudaInit(dptr, renderCam, targetFrame, iterations, materials, renderScene->materials.size(), geoms, renderScene->objects.size());
+    }
+    clock_t timer;
+    timer = clock();
+    // testStream(dptr, renderCam, targetFrame, iterations, materials, renderScene->materials.size(), geoms, renderScene->objects.size());
+    cudaRaytraceCoreStream(dptr, renderCam, targetFrame, iterations, renderScene->materials.size(), geoms, renderScene->objects.size());
+    // cudaRaytraceCore(dptr, renderCam, targetFrame, iterations, materials, renderScene->materials.size(), geoms, renderScene->objects.size());
+    timer = clock()-timer;
+    
+    total_time = total_time + float(timer)/CLOCKS_PER_SEC;
+
+    // cout<<float(timer)/CLOCKS_PER_SEC<<endl;
     
+    delete [] geoms;
+    delete [] materials;
+
     // unmap buffer object
     cudaGLUnmapBufferObject(pbo);
-  } else {
 
-    if (!finishedRender) {
-      // output image file
+  }else{
+
+    if(!finishedRender){
+      //output image file
       image outputImage(renderCam->resolution.x, renderCam->resolution.y);
 
-      for (int x=0; x < renderCam->resolution.x; x++) {
-        for (int y=0; y < renderCam->resolution.y; y++) {
+      for(int x=0; x<renderCam->resolution.x; x++){
+        for(int y=0; y<renderCam->resolution.y; y++){
           int index = x + (y * renderCam->resolution.x);
           outputImage.writePixelRGB(renderCam->resolution.x-1-x,y,renderCam->image[index]);
         }
       }
-      
+
+      cout<<"total time ("<<renderCam->iterations<<" iterations): "<<total_time<<endl;
+      cout<<"average time per iteration: "<<total_time/renderCam->iterations<<endl;
+
       gammaSettings gamma;
       gamma.applyGamma = true;
-      gamma.gamma = 1.0;
-      gamma.divisor = 1.0; 
+      gamma.gamma = 1.0/2.2;
+      gamma.divisor = renderCam->iterations;
       outputImage.setGammaSettings(gamma);
       string filename = renderCam->imageName;
       string s;
@@ -138,99 +184,172 @@ void runCuda(){
       utilityCore::replaceString(filename, ".png", "."+s+".png");
       outputImage.saveImageRGB(filename);
       cout << "Saved frame " << s << " to " << filename << endl;
+
+      gamma.gamma = 1.0/1.0;
+      outputImage.setGammaSettings(gamma);
+      filename = renderCam->imageName;
+      utilityCore::replaceString(filename, ".bmp", "_raw."+s+".bmp");
+      utilityCore::replaceString(filename, ".png", "_raw."+s+".png");
+      outputImage.saveImageRGB(filename);
+      cout << "Saved frame " << s << " to " << filename << endl;
+
       finishedRender = true;
-      if (singleFrameMode==true) {
+      if(singleFrameMode==true){
         cudaDeviceReset(); 
         exit(0);
       }
     }
-    if (targetFrame < renderCam->frames-1) {
-
-      // clear image buffer and move onto next frame
+    if(targetFrame<renderCam->frames-1){
+      //clear image buffer and move onto next frame
       targetFrame++;
       iterations = 0;
       for(int i=0; i<renderCam->resolution.x*renderCam->resolution.y; i++){
         renderCam->image[i] = glm::vec3(0,0,0);
       }
-      cudaDeviceReset(); 
+      // cudaDeviceReset(); 
       finishedRender = false;
     }
   }
+  
 }
 
-//-------------------------------
-//----------SETUP STUFF----------
-//-------------------------------
+#ifdef __APPLE__
 
-bool init(int argc, char* argv[]) {
-  glfwSetErrorCallback(errorCallback);
+	void display(){
+		runCuda();
 
-  if (!glfwInit()) {
-      return false;
-  }
+		string title = "CIS565 Render | " + utilityCore::convertIntToString(iterations) + " Iterations";
+		glfwSetWindowTitle(title.c_str());
 
-  width = 800;
-  height = 800;
-  window = glfwCreateWindow(width, height, "CIS 565 Pathtracer", NULL, NULL);
-  if (!window){
-      glfwTerminate();
-      return false;
-  }
-  glfwMakeContextCurrent(window);
-  glfwSetKeyCallback(window, keyCallback);
+		glBindBuffer( GL_PIXEL_UNPACK_BUFFER, pbo);
+		glBindTexture(GL_TEXTURE_2D, displayImage);
+		glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, width, height, 
+			  GL_RGBA, GL_UNSIGNED_BYTE, NULL);
 
-  // Set up GL context
-  glewExperimental = GL_TRUE;
-  if(glewInit()!=GLEW_OK){
-    return false;
-  }
+		glClear(GL_COLOR_BUFFER_BIT);   
 
-  // Initialize other stuff
-  initVAO();
-  initTextures();
-  initCuda();
-  initPBO();
-  
-  GLuint passthroughProgram;
-  passthroughProgram = initShader();
+		// VAO, shader program, and texture already bound
+		glDrawElements(GL_TRIANGLES, 6,  GL_UNSIGNED_SHORT, 0);
 
-  glUseProgram(passthroughProgram);
-  glActiveTexture(GL_TEXTURE0);
+		glfwSwapBuffers();
+	}
 
-  return true;
-}
+#else
 
-void initPBO(){
-  // set up vertex data parameter
-  int num_texels = width*height;
-  int num_values = num_texels * 4;
-  int size_tex_data = sizeof(GLubyte) * num_values;
-    
-  // Generate a buffer ID called a PBO (Pixel Buffer Object)
-  glGenBuffers(1, &pbo);
+	void display(){
+		runCuda();
+
+		string title = "565Raytracer | " + utilityCore::convertIntToString(iterations) + " Iterations";
+		glutSetWindowTitle(title.c_str());
+
+		glBindBuffer( GL_PIXEL_UNPACK_BUFFER, pbo);
+		glBindTexture(GL_TEXTURE_2D, displayImage);
+		glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, width, height, 
+			  GL_RGBA, GL_UNSIGNED_BYTE, NULL);
+
+		glClear(GL_COLOR_BUFFER_BIT);   
+
+		// VAO, shader program, and texture already bound
+		glDrawElements(GL_TRIANGLES, 6,  GL_UNSIGNED_SHORT, 0);
+
+		glutPostRedisplay();
+		glutSwapBuffers();
+	}
+
+	void keyboard(unsigned char key, int x, int y)
+	{
+		std::cout << key << std::endl;
+		switch (key) 
+		{
+		   case(27):
+			   exit(1);
+			   break;
+		}
+	}
+
+#endif
 
-  // Make this the current UNPACK buffer (OpenGL is state-based)
-  glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo);
 
-  // Allocate data for the buffer. 4-channel 8-bit image
-  glBufferData(GL_PIXEL_UNPACK_BUFFER, size_tex_data, NULL, GL_DYNAMIC_COPY);
-  cudaGLRegisterBufferObject(pbo);
 
+
+//-------------------------------
+//----------SETUP STUFF----------
+//-------------------------------
+
+#ifdef __APPLE__
+	void init(){
+
+		if (glfwInit() != GL_TRUE){
+			shut_down(1);      
+		}
+
+		// 16 bit color, no depth, alpha or stencil buffers, windowed
+		if (glfwOpenWindow(width, height, 5, 6, 5, 0, 0, 0, GLFW_WINDOW) != GL_TRUE){
+			shut_down(1);
+		}
+
+		// Set up vertex array object, texture stuff
+		initVAO();
+		initTextures();
+	}
+#else
+	void init(int argc, char* argv[]){
+		glutInit(&argc, argv);
+		glutInitDisplayMode(GLUT_DOUBLE | GLUT_RGBA);
+		glutInitWindowSize(width, height);
+		glutCreateWindow("565Raytracer");
+
+		// Init GLEW
+		glewInit();
+		GLenum err = glewInit();
+		if (GLEW_OK != err)
+		{
+			/* Problem: glewInit failed, something is seriously wrong. */
+			std::cout << "glewInit failed, aborting." << std::endl;
+			exit (1);
+		}
+
+		initVAO();
+		initTextures();
+	}
+#endif
+
+void initPBO(GLuint* pbo){
+  if (pbo) {
+    // set up vertex data parameter
+    int num_texels = width*height;
+    int num_values = num_texels * 4;
+    int size_tex_data = sizeof(GLubyte) * num_values;
+    
+    // Generate a buffer ID called a PBO (Pixel Buffer Object)
+    glGenBuffers(1,pbo);
+    // Make this the current UNPACK buffer (OpenGL is state-based)
+    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, *pbo);
+    // Allocate data for the buffer. 4-channel 8-bit image
+    glBufferData(GL_PIXEL_UNPACK_BUFFER, size_tex_data, NULL, GL_DYNAMIC_COPY);
+    cudaGLRegisterBufferObject( *pbo );
+  }
 }
 
 void initCuda(){
-  cudaGLSetGLDevice(0);
+  // Use device with highest Gflops/s
+  cudaGLSetGLDevice( compat_getMaxGflopsDeviceId() );
+
+  initPBO(&pbo);
 
   // Clean up on program exit
   atexit(cleanupCuda);
+
+  runCuda();
 }
 
 void initTextures(){
-    glGenTextures(1, &displayImage);
+    glGenTextures(1,&displayImage);
     glBindTexture(GL_TEXTURE_2D, displayImage);
     glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
     glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
-    glTexImage2D( GL_TEXTURE_2D, 0, GL_RGBA8, width, height, 0, GL_BGRA, GL_UNSIGNED_BYTE, NULL);
+    glTexImage2D( GL_TEXTURE_2D, 0, GL_RGBA8, width, height, 0, GL_BGRA,
+        GL_UNSIGNED_BYTE, NULL);
 }
 
 void initVAO(void){
@@ -269,18 +388,18 @@ void initVAO(void){
     glBufferData(GL_ELEMENT_ARRAY_BUFFER, sizeof(indices), indices, GL_STATIC_DRAW);
 }
 
-GLuint initShader() {
-  const char *attribLocations[] = { "Position", "Texcoords" };
-  GLuint program = glslUtility::createDefaultProgram(attribLocations, 2);
-  GLint location;
-  
-  //glUseProgram(program);
-  if ((location = glGetUniformLocation(program, "u_image")) != -1)
-  {
-    glUniform1i(location, 0);
-  }
+GLuint initShader(const char *vertexShaderPath, const char *fragmentShaderPath){
+    GLuint program = glslUtility::createProgram(vertexShaderPath, fragmentShaderPath, attributeLocations, 2);
+    GLint location;
+
+    glUseProgram(program);
+    
+    if ((location = glGetUniformLocation(program, "u_image")) != -1)
+    {
+        glUniform1i(location, 0);
+    }
 
-  return program;
+    return program;
 }
 
 //-------------------------------
@@ -308,17 +427,17 @@ void deleteTexture(GLuint* tex){
     glDeleteTextures(1, tex);
     *tex = (GLuint)NULL;
 }
-
-//------------------------------
-//-------GLFW CALLBACKS---------
-//------------------------------
-
-void errorCallback(int error, const char* description){
-    fputs(description, stderr);
-}
-
-void keyCallback(GLFWwindow* window, int key, int scancode, int action, int mods){
-    if(key == GLFW_KEY_ESCAPE && action == GLFW_PRESS){
-        glfwSetWindowShouldClose(window, GL_TRUE);
+ 
+void shut_down(int return_code){
+  #ifdef __APPLE__
+	glfwTerminate();
+  #endif
+  for (int i=0; i<renderScene->objects.size(); i++){
+    if (renderScene->objects[i].type == MESH){
+      delete [] renderScene->objects[i].obj.tris;
+      delete [] renderScene->objects[i].obj.verts;
     }
+  }
+  cudaFreeCPU();
+  exit(return_code);
 }
diff --git a/src/main.h b/src/main.h
index 8bd2aed..f85141f 100644
--- a/src/main.h
+++ b/src/main.h
@@ -8,24 +8,32 @@
 #ifndef MAIN_H
 #define MAIN_H
 
-#include <GL/glew.h>
-#include <GLFW/glfw3.h>
+#ifdef __APPLE__
+	#include <GL/glfw.h>
+#else
+	#include <GL/glew.h>
+	#include <GL/glut.h>
+#endif
 
+#include <stdlib.h>
 #include <cuda_runtime.h>
 #include <cuda_gl_interop.h>
-#include <fstream>
-#include <glm/glm.hpp>
-#include <glslUtil/glslUtility.hpp>
+#include <string>
 #include <iostream>
 #include <sstream>
-#include <stdlib.h>
-#include <string>
-
+#include <fstream>
+#include "glslUtility.h"
 #include "sceneStructs.h"
+#include "glm/glm.hpp"
 #include "image.h"
 #include "raytraceKernel.h"
 #include "utilities.h"
 #include "scene.h"
+#include <time.h>
+
+ #include <cutil_inline.h>
+ #include <cutil_gl_inline.h>
+#define compat_getMaxGflopsDeviceId() cutGetMaxGflopsDeviceId()
 
 using namespace std;
 
@@ -39,6 +47,7 @@ int targetFrame;
 int iterations;
 bool finishedRender;
 bool singleFrameMode;
+float total_time;
 
 //-------------------------------
 //------------GL STUFF-----------
@@ -46,17 +55,15 @@ bool singleFrameMode;
 
 GLuint positionLocation = 0;
 GLuint texcoordsLocation = 1;
-GLuint pbo;
+const char *attributeLocations[] = { "Position", "Tex" };
+GLuint pbo = (GLuint)NULL;
 GLuint displayImage;
 
-GLFWwindow *window;
-
 //-------------------------------
 //----------CUDA STUFF-----------
 //-------------------------------
 
-int width; 
-int height;
+int width=800; int height=800;
 
 //-------------------------------
 //-------------MAIN--------------
@@ -80,12 +87,18 @@ void runCuda();
 //-------------------------------
 //----------SETUP STUFF----------
 //-------------------------------
-bool init(int argc, char* argv[]);
-void initPBO();
+
+#ifdef __APPLE__
+	void init();
+#else
+	void init(int argc, char* argv[]);
+#endif
+
+void initPBO(GLuint* pbo);
 void initCuda();
 void initTextures();
 void initVAO();
-GLuint initShader();
+GLuint initShader(const char *vertexShaderPath, const char *fragmentShaderPath);
 
 //-------------------------------
 //---------CLEANUP STUFF---------
@@ -94,12 +107,6 @@ GLuint initShader();
 void cleanupCuda();
 void deletePBO(GLuint* pbo);
 void deleteTexture(GLuint* tex);
-
-//------------------------------
-//-------GLFW CALLBACKS---------
-//------------------------------
-void mainLoop();
-void errorCallback(int error, const char *description);
-void keyCallback(GLFWwindow *window, int key, int scancode, int action, int mods);
+void shut_down(int return_code);
 
 #endif
diff --git a/src/raytraceKernel.cu b/src/raytraceKernel.cu
index 9c7bc7d..fb92bc5 100644
--- a/src/raytraceKernel.cu
+++ b/src/raytraceKernel.cu
@@ -8,13 +8,65 @@
 #include <stdio.h>
 #include <cuda.h>
 #include <cmath>
-
 #include "sceneStructs.h"
 #include "glm/glm.hpp"
 #include "utilities.h"
 #include "raytraceKernel.h"
 #include "intersections.h"
 #include "interactions.h"
+#include <vector>
+#include <iostream>
+
+struct intersection{
+  int objectID; //staticGeom object;
+  glm::vec3 intersectionPoint;
+  glm::vec3 normal;
+  float distance;
+  bool inside;
+};
+
+struct rayBounce{
+  bool alive;
+  ray r;
+  int index;
+  glm::vec3 color;
+  intersection hit;
+
+  rayBounce(){
+    alive=true;
+    index=0;
+    color=glm::vec3(1,1,1);
+  }
+
+  rayBounce(int i){
+    alive=true;
+    index=i;
+    color=glm::vec3(1,1,1);
+  }
+};
+
+cameraData cam;
+glm::vec3* cudaimage;
+staticGeom* geomList;
+int nLights;
+staticGeom* cudageoms;
+material* cudaMats;
+staticGeom* lightList;
+staticGeom* cudaLights;
+glm::vec3** cudamesh;
+glm::vec3* cudameshverts;
+glm::vec3* cudameshtris;
+glm::vec3** meshList;
+int nGeoms;
+
+rayBounce *raysA, *raysB, *cpuRays;
+int *indicesA, *indicesB, *cpuIndices;
+
+#if CUDA_VERSION >= 5000
+    #include <helper_math.h>
+#else
+    #include <cutil_math.h>
+#endif
 
 void checkCUDAError(const char *msg) {
   cudaError_t err = cudaGetLastError();
@@ -24,26 +76,6 @@ void checkCUDAError(const char *msg) {
   }
 } 
 
-// LOOK: This function demonstrates how to use thrust for random number generation on the GPU!
-// Function that generates static.
-__host__ __device__ glm::vec3 generateRandomNumberFromThread(glm::vec2 resolution, float time, int x, int y){
-  int index = x + (y * resolution.x);
-   
-  thrust::default_random_engine rng(hash(index*time));
-  thrust::uniform_real_distribution<float> u01(0,1);
-
-  return glm::vec3((float) u01(rng), (float) u01(rng), (float) u01(rng));
-}
-
-// TODO: IMPLEMENT THIS FUNCTION
-// Function that does the initial raycast from the camera
-__host__ __device__ ray raycastFromCameraKernel(glm::vec2 resolution, float time, int x, int y, glm::vec3 eye, glm::vec3 view, glm::vec3 up, glm::vec2 fov){
-  ray r;
-  r.origin = glm::vec3(0,0,0);
-  r.direction = glm::vec3(0,0,-1);
-  return r;
-}
-
 //Kernel that blacks out a given image buffer
 __global__ void clearImage(glm::vec2 resolution, glm::vec3* image){
     int x = (blockIdx.x * blockDim.x) + threadIdx.x;
@@ -55,7 +87,7 @@ __global__ void clearImage(glm::vec2 resolution, glm::vec3* image){
 }
 
 //Kernel that writes the image to the OpenGL PBO directly.
-__global__ void sendImageToPBO(uchar4* PBOpos, glm::vec2 resolution, glm::vec3* image){
+__global__ void sendImageToPBO(uchar4* PBOpos, glm::vec2 resolution, glm::vec3* image, float time){
   
   int x = (blockIdx.x * blockDim.x) + threadIdx.x;
   int y = (blockIdx.y * blockDim.y) + threadIdx.y;
@@ -64,9 +96,9 @@ __global__ void sendImageToPBO(uchar4* PBOpos, glm::vec2 resolution, glm::vec3*
   if(x<=resolution.x && y<=resolution.y){
 
       glm::vec3 color;
-      color.x = image[index].x*255.0;
-      color.y = image[index].y*255.0;
-      color.z = image[index].z*255.0;
+      color.x = image[index].x*255.0/time;
+      color.y = image[index].y*255.0/time;
+      color.z = image[index].z*255.0/time;
 
       if(color.x>255){
         color.x = 255;
@@ -88,78 +120,1170 @@ __global__ void sendImageToPBO(uchar4* PBOpos, glm::vec2 resolution, glm::vec3*
   }
 }
 
-// TODO: IMPLEMENT THIS FUNCTION
-// Core raytracer kernel
+//Function that generates static.
+__host__ __device__ glm::vec3 generateRandomNumberFromThread(glm::vec2 resolution, float time, int x, int y){
+  int index = x + (y * resolution.x);
+   
+  thrust::default_random_engine rng(hash(index*time));
+  thrust::uniform_real_distribution<float> u01(0,1);
+
+  return glm::vec3((float) u01(rng), (float) u01(rng), (float) u01(rng));
+}
+
+//Function that generates static.
+__host__ __device__ glm::vec3 generateRandomNumberFromThread(float seed){
+  thrust::default_random_engine rng(hash(seed));
+  thrust::uniform_real_distribution<float> u01(0,1);
+
+  return glm::vec3((float) u01(rng), (float) u01(rng), (float) u01(rng));
+}
+
+//Function that does the initial raycast from the camera
+__host__ __device__ ray raycastFromCameraKernel(glm::vec2 resolution, float time, int x, int y, glm::vec3 eye, glm::vec3 view, glm::vec3 up, glm::vec2 fov,
+                        float focusDistance, float dof){
+  ray r;
+  r.origin = eye;
+
+  glm::vec3 A = glm::cross(view,up);
+  glm::vec3 B = glm::cross(A,view);
+  float tempH = glm::length(view)*glm::tan(glm::radians(fov.x))/glm::length(A);
+  glm::vec3 H = A*tempH;
+  glm::vec3 V = B*(glm::length(view)*glm::tan(glm::radians(fov.y))/glm::length(B));
+
+  glm::vec3 d = eye+view + float(2*(x/(resolution.x-1))-1)*H + float(1-2*(y/(resolution.y-1)))*V;
+  r.direction = glm::normalize(d-eye);
+
+  //dof
+  glm::vec3 aimed = eye + focusDistance*r.direction;
+  glm::vec3 random = generateRandomNumberFromThread(resolution, time, x,y);
+  glm::vec3 start = eye + random*dof;
+  r.direction = glm::normalize(aimed-start);
+  r.origin    = start;
+
+  //anti
+  glm::vec3 jitter = generateRandomNumberFromThread(resolution, time, x,y);
+  r.direction.x+=(jitter.x-1)*.0025;
+  r.direction.y+=(jitter.y-1)*.0025;
+  r.direction.z+=(jitter.z-1)*.0025;
+  return r;
+}
+
+__host__ __device__ ray reflect(const ray& in, const intersection& hit){
+  ray out;
+  out.origin = in.origin;
+
+  out.direction = glm::normalize(in.direction - 2.0f*glm::dot(hit.normal,in.direction)*hit.normal);
+  out.origin = hit.intersectionPoint + 0.001f*out.direction;
+
+  return out;
+}
+
+__host__ __device__ ray refract(const ray& in, const intersection& hit, float n1, float n2){
+  
+  ray out;
+  
+  if (hit.inside){
+    float temp = n1;
+    n1 = n2;
+    n2 = temp;
+  }
+
+  float n = n1/n2;
+  float cosThetaI = glm::dot((in.direction),(hit.normal));
+  float sinThetaT2 = n*n*(1-cosThetaI*cosThetaI);
+  float cosThetaT = sqrt(1-sinThetaT2);
+  
+  out.origin = hit.intersectionPoint;
+  out.direction = n*in.direction + (n*cosThetaI - cosThetaT)*hit.normal;
+  out.direction = glm::normalize(out.direction);
+  out.origin += 0.001f*out.direction;
+
+  return out;
+}
+
+__host__ __device__ ray diffuseReflection(const intersection& hit, glm::vec2 resolution, float time, int x, int y){
+  ray out;
+  //reflect ray in random hemisphere direction
+  glm::vec3 seed = generateRandomNumberFromThread(resolution, time, x,y);
+  out.direction = calculateRandomDirectionInHemisphere(hit.normal, seed.x, seed.y);
+  out.origin = hit.intersectionPoint + 0.001f*out.direction;
+  return out;
+}
+
+__host__ __device__ bool intersect(ray& r, staticGeom* geoms, int numberOfGeoms, intersection& hit, glm::vec3** cudaMesh){
+  float distance = 100000000, check = -1;
+  glm::vec3 ip, n;
+  bool inside = false;
+  for (int i=0; i<numberOfGeoms; i++){
+    if (geoms[i].type==SPHERE){
+      check = sphereIntersectionTest(geoms[i], r, ip, n, inside);
+      if (check!=-1 && check<distance){
+        distance = check;
+
+        hit.objectID = i;
+        hit.intersectionPoint = ip;
+        hit.normal = n;
+        hit.distance = distance;
+        hit.inside = inside;
+      }
+    }
+    else if (geoms[i].type==CUBE){
+      check = boxIntersectionTest(geoms[i], r, ip, n, inside);
+      if (check!=-1 && check<distance){
+        distance = check;
+
+        hit.objectID = i;
+        hit.intersectionPoint = ip;
+        hit.normal = n;
+        hit.distance = distance;
+        hit.inside = inside;
+      }
+    }
+    else if (geoms[i].type==MESH){
+      check = meshIntersectionTest(geoms[i], cudaMesh[i*2], cudaMesh[i*2+1], r, ip, n, inside);
+      if (check!=-1 && check<distance){
+        distance = check;
+
+        hit.objectID = i;
+        hit.intersectionPoint = ip;
+        hit.normal = n;
+        hit.distance = distance;
+        hit.inside = inside;
+      }
+    }
+  }
+  if (distance<100000000){
+    return true;
+  }
+  else{
+    hit.distance = -1;
+    hit.intersectionPoint = glm::vec3(0,0,0);
+    hit.normal = glm::vec3(0,0,0);
+    hit.objectID = -1;
+    return false;
+  }
+}
+
+__host__ __device__ float schlick(const intersection& hit, const ray& in, float n1, float n2){
+  if (hit.inside){
+    float temp = n1;
+    n1 = n2;
+    n2 = temp;
+  }
+
+  float n = n1/n2;
+  float cosThetaI = -glm::dot((in.direction),(hit.normal));
+  float sinThetaT2 = n*n*(1-cosThetaI*cosThetaI);
+  float cosThetaT = sqrt(1-sinThetaT2);
+
+  bool tir = sinThetaT2>=1;
+
+  float R;
+  float R0 = ((n1-n2)/(n1+n2))*((n1-n2)/(n1+n2));
+
+  if (n1<=n2){
+    float tempVal = 1-cosThetaI;
+    R = R0 + (1-R0)*tempVal*tempVal*tempVal*tempVal*tempVal;
+  }
+  else if (!tir){
+    float tempVal = 1-cosThetaT;
+    R = R0 + (1-R0)*tempVal*tempVal*tempVal*tempVal*tempVal;
+  }
+  else{
+    R = 1;
+  }
+
+  return R;
+}
+
+__host__ __device__ float cookTorrance(const glm::vec3& eye, const intersection& hit, const glm::vec3& lightPos,
+                                       const material& mat, const ray& in){
+  //cook torrance
+  glm::vec3 EYE = glm::normalize(eye-hit.intersectionPoint);
+  glm::vec3 L = glm::normalize(lightPos-hit.intersectionPoint);
+  glm::vec3 H = glm::normalize(L + EYE);
+  glm::vec3 N = hit.normal;
+
+  //Beckmann distribution
+  float alpha = acos(glm::dot(N, H));
+  float m = mat.specularExponent;
+  float cosAlpha = cos(alpha);
+
+  float D = 100*exp(-(alpha*alpha)/(m*m));
+
+  float F = schlick(hit, in, 1.000293, mat.indexOfRefraction);
+
+  float G = min(1.0f, 2*glm::dot(H,N)*glm::dot(EYE,N)/glm::dot(EYE,H));
+  G = min(G, 2*glm::dot(H,N)*glm::dot(L,N)/glm::dot(EYE,H));
+
+  float result = D*F*G/(4*PI*glm::dot(EYE,N));
+  result = glm::clamp(result, 0.0f, 1.0f);
+  return result;
+}
+
+__host__ __device__ glm::vec3 diffuseWithShadow(const intersection& hit, staticGeom* geoms, int nGeoms, material* mats, int nMats, staticGeom* lights, int nLight,
+                    const glm::vec2& resolution, float time, int x, int y, const glm::vec3& view, const ray& in, const glm::vec3& eye,
+                    glm::vec3** cudaMesh){
+  glm::vec3 out(0,0,0);
+  ray r;
+  intersection shadowHit;
+  for (int i=0; i<nLight; i++){
+    r.origin = hit.intersectionPoint;
+    glm::vec3 endPoint = getRandomPointOnCube(lights[i], time*(x + (y * resolution.x)));
+    r.direction = glm::normalize(endPoint-hit.intersectionPoint);
+    r.origin += 0.001f*r.direction;
+    if (intersect(r, geoms, nGeoms, shadowHit, cudaMesh) && mats[geoms[shadowHit.objectID].materialid].emittance>0){
+      float kd = glm::dot(hit.normal, glm::normalize(lights[i].translation-hit.intersectionPoint));
+      kd = glm::clamp(kd,0.0f,1.0f);
+
+      //roughness for Oren-Nayar and Cook-Torrance
+      float s = mats[geoms[hit.objectID].materialid].roughness;
+
+      float A = 1-0.5f*s*s/(s*s+0.33f);
+      float B = 0.45f*s*s/(s*s+0.09f);
+
+      glm::vec3 toLight = glm::normalize(lights[i].translation-hit.intersectionPoint);
+      glm::vec3 toEye   = glm::normalize(eye-hit.intersectionPoint);
+
+      float thetaI = acos(kd);
+      float thetaR = acos(glm::dot(toEye,hit.normal));
+
+      float phiI = atan(toLight.y/toLight.x);
+      float phiR = atan(toEye.y/toEye.x);
+
+      float a = max(thetaR,thetaI);
+      float b = min(thetaR,thetaI);
+
+      kd = kd * (A + (B*max(0.0f, cos(phiI-phiR))*sin(a)*tan(b)));
+
+      out += kd*mats[geoms[hit.objectID].materialid].color;
+
+      // float dotBP = glm::dot(hit.normal, glm::normalize(r.direction+glm::normalize(-view)));
+      // dotBP = glm::pow(glm::clamp(dotBP,0.0f,1.0f),40.0f);
+
+      if (s>0){
+        float kSpec = cookTorrance(eye, hit, endPoint, mats[geoms[hit.objectID].materialid], in);
+        kSpec = glm::clamp(kSpec,0.0f,1.0f);
+        glm::vec3 s = mats[geoms[hit.objectID].materialid].specularColor*kSpec;
+        out += s;
+      }
+    }
+    else{
+      // return (mats[shadowHit.object.materialid].color);
+    }
+  }
+  return out;
+}
+
+__host__ __device__ glm::vec3 diffuse(const intersection& hit, staticGeom* geoms, int nGeoms, material* mats, int nMats, staticGeom* lights, int nLight,
+                    const glm::vec2& resolution, float time, int x, int y, const glm::vec3& view, const ray& in, const glm::vec3& eye){
+  glm::vec3 out(0,0,0);
+  ray r;
+  intersection shadowHit;
+  for (int i=0; i<nLight; i++){
+    r.origin = hit.intersectionPoint;
+    glm::vec3 endPoint = getRandomPointOnCube(lights[i], time*(x + (y * resolution.x)));
+    r.direction = glm::normalize(endPoint-hit.intersectionPoint);
+    r.origin += 0.001f*r.direction;
+    float kd = glm::dot(hit.normal, glm::normalize(lights[i].translation-hit.intersectionPoint));
+    kd = glm::clamp(kd,0.0f,1.0f);
+    
+    //roughness for Oren-Nayar and Cook-Torrance
+    float s = mats[geoms[hit.objectID].materialid].roughness;
+
+    float A = 1-0.5f*s*s/(s*s+0.33f);
+    float B = 0.45f*s*s/(s*s+0.09f);
+
+    glm::vec3 toLight = glm::normalize(lights[i].translation-hit.intersectionPoint);
+    glm::vec3 toEye   = glm::normalize(eye-hit.intersectionPoint);
+
+    float thetaI = acos(kd);
+    float thetaR = acos(glm::dot(toEye,hit.normal));
+
+    float phiI = atan(toLight.y/toLight.x);
+    float phiR = atan(toEye.y/toEye.x);
+
+    float a = max(thetaR,thetaI);
+    float b = min(thetaR,thetaI);
+
+    kd = kd * (A + (B*max(0.0f, cos(phiI-phiR))*sin(a)*tan(b)));
+
+    out += kd*mats[geoms[hit.objectID].materialid].color;
+
+    // float dotBP = glm::dot(hit.normal, glm::normalize(r.direction+glm::normalize(-view)));
+    // dotBP = glm::pow(glm::clamp(dotBP,0.0f,1.0f),40.0f);
+
+    if (s>0){
+      float kSpec = cookTorrance(eye, hit, endPoint, mats[geoms[hit.objectID].materialid], in);
+      kSpec = glm::clamp(kSpec,0.0f,1.0f);
+      glm::vec3 s = mats[geoms[hit.objectID].materialid].specularColor*kSpec;
+      out += s;
+    }
+  }
+  return out;
+}
+
+__host__ __device__ ray fresnel(const ray& r, const material& mat, const intersection& hit,
+                                float seed){
+  ray fresRay = r;
+
+  float n1 = 1.000293;
+  float n2 = mat.indexOfRefraction;
+
+  float R = schlick(hit, r, n1,n2);
+  float randomCheck = generateRandomNumberFromThread(seed).x;
+
+  if (randomCheck<R){
+    fresRay = reflect(fresRay,hit);
+  }
+  else{
+    fresRay = refract(fresRay,hit,n1,n2);
+  }
+
+  return fresRay;
+}
+
+__global__ void lighttraceRay(glm::vec2 resolution, float time, cameraData cam, int rayDepth, glm::vec3* colors,
+                            staticGeom* geoms, int numberOfGeoms, material* mats, int nMats, 
+                            staticGeom* lights, int nLights){
+  
+  int x = (blockIdx.x * blockDim.x) + threadIdx.x;
+  int y = (blockIdx.y * blockDim.y) + threadIdx.y;
+  int index = x + (y * resolution.x);
+
+  staticGeom source = lights[0];
+  glm::vec3 startPoint = getRandomPointOnCube(source, time*(x + (y * resolution.x)));
+  glm::vec3 seed = generateRandomNumberFromThread(resolution, time, x,y);
+  glm::vec3 startDir = glm::normalize(getRandomDirectionInSphere(seed.x, seed.y));
+
+  ray r;
+  r.origin = startPoint + 0.001f*startDir;
+  r.direction = startDir;
+
+  glm::vec3 A = glm::cross(cam.view,cam.up);
+  glm::vec3 B = glm::cross(A,cam.view);
+  float tempH = glm::length(cam.view)*glm::tan(glm::radians(cam.fov.x))/glm::length(A);
+  glm::vec3 H = A*tempH;
+  glm::vec3 V = B*(glm::length(cam.view)*glm::tan(glm::radians(cam.fov.y))/glm::length(B));
+
+  glm::vec3 testDir = glm::normalize(startPoint - cam.position);
+  glm::vec3 p = cam.position + testDir;
+
+  glm::vec3 m = cam.position + cam.view;
+  glm::vec3 mv = m+V;
+  glm::vec3 mh = m+H;
+
+  glm::vec3 ip;
+  ip.y = mv.y-p.y;
+  ip.x = resolution.x - (mh.x-p.x);
+
+  // if ((x<=resolution.x && y<=resolution.y)){
+  //   // if (ip.x<=resolution.x && ip.y<=resolution.y){
+  //   //   colors[index]+=glm::vec3(1,1,0);
+  //   // }
+  //   // else{
+  //   //   colors[index]+=glm::vec3(0,0,1);
+  //   // }
+  //   if (abs(glm::length(p-cam.position)-1)>0.00001f || abs(glm::length(m-cam.position)-1)>0.00001f){
+  //     colors[index]+=glm::vec3(1,1,1);
+  //   }
+  // }
+  if (true){
+    index = (ip.y*resolution.x);
+    colors[index] += glm::vec3(1,0,1);
+  }
+}
+
+__global__ void pathtraceRay(glm::vec2 resolution, float time, cameraData cam, int rayDepth, glm::vec3* colors,
+                            staticGeom* geoms, int numberOfGeoms, material* mats, int nMats, 
+                            staticGeom* lights, int nLights, glm::vec3** cudaMesh){
+
+    int x = (blockIdx.x * blockDim.x) + threadIdx.x;
+    int y = (blockIdx.y * blockDim.y) + threadIdx.y;
+    int index = x + (y * resolution.x);
+
+    if((x<=resolution.x && y<=resolution.y)){
+     
+      ray r = raycastFromCameraKernel(resolution, time, x,y, cam.position, cam.view, cam.up, cam.fov, cam.focusDistance, cam.dof);
+      intersection hit;
+      int depth=0;
+      glm::vec3 pixelColor = glm::vec3(1,1,1);
+
+      while (depth<100){
+        if (!intersect(r, geoms, numberOfGeoms, hit,cudaMesh)){
+          pixelColor = glm::vec3(0,0,0);
+          break;
+        }
+        material mat = mats[geoms[hit.objectID].materialid];
+        depth++;
+
+        if (mat.emittance>0){ //Stop tracing when you hit a light
+          pixelColor = pixelColor*mat.color*mat.emittance;
+          break;
+        }
+        else{
+          pixelColor = pixelColor*mat.color; //Non lights attenuate the color
+
+          if (mat.hasReflective && mat.hasRefractive){  //Fresnel
+            r=fresnel(r, mat, hit, index*time*(rayDepth)/2.0f);
+          }
+          else if (mat.hasReflective){ //mirror
+            r = reflect(r,hit); 
+          }
+          else if (mat.hasRefractive){ //glass
+            float n1 = 1.000293;
+            float n2 = mat.indexOfRefraction;
+
+            r = refract(r,hit,n1,n2);
+            // pixelColor = pixelColor*mat.color;
+          }
+          else{ //diffuse
+            //Add direct lighting contribution for specularity (Cook-Torrance microfacet model)
+            if(mat.specularExponent>0.0){
+              for (int i=0; i<nLights; i++){
+                glm::vec3 lightPos = getRandomPointOnCube(lights[i], time*index*(rayDepth)/2.0f);
+                float ct = cookTorrance(cam.position, hit, lightPos, mat, r);
+                pixelColor+=ct*mat.specularColor;
+              }
+            }
+            // r = diffuseReflection(hit, resolution, time, x,y);
+            ray out;
+            // reflect ray in random hemisphere direction
+            glm::vec3 seed = generateRandomNumberFromThread(time*index*(rayDepth)/2.0f);
+            out.direction = calculateRandomDirectionInHemisphere(hit.normal, seed.x, seed.y);
+            out.origin = hit.intersectionPoint + 0.001f*out.direction;
+            r = out;
+          }
+        }
+      }
+
+      colors[index] += pixelColor;
+     }
+}
+
+//Core raytracer kernel
 __global__ void raytraceRay(glm::vec2 resolution, float time, cameraData cam, int rayDepth, glm::vec3* colors,
-                            staticGeom* geoms, int numberOfGeoms){
+                            staticGeom* geoms, int numberOfGeoms, material* mats, int nMats, 
+                            staticGeom* lights, int nLights, glm::vec3** cudaMesh){
 
   int x = (blockIdx.x * blockDim.x) + threadIdx.x;
   int y = (blockIdx.y * blockDim.y) + threadIdx.y;
   int index = x + (y * resolution.x);
 
   if((x<=resolution.x && y<=resolution.y)){
+    ray r = raycastFromCameraKernel(resolution, time, x,y, cam.position, cam.view, cam.up, cam.fov, cam.focusDistance, cam.dof);
+    intersection hit;
+
+    int depth=0;
+
+    float multColor[3];
+    multColor[0]=1; multColor[1]=1; multColor[2]=1;
+    while (intersect(r, geoms, numberOfGeoms, hit, cudaMesh) && depth<10){
+      material mat = mats[geoms[hit.objectID].materialid];
+      depth++;
+
+      if (mat.emittance>0){
+        colors[index] += mat.color*mat.emittance;
+        break;
+      }
+      else if (mat.hasReflective && mat.hasRefractive){
+        r=fresnel(r, mat, hit, time*index*(rayDepth+1));
+        multColor[0] = multColor[0]*mat.color.x;
+        multColor[1] = multColor[1]*mat.color.y;
+        multColor[2] = multColor[2]*mat.color.z;
+      }
+      else if (mat.hasReflective){
+        r = reflect(r,hit);
+        multColor[0] = multColor[0]*mat.color.x;
+        multColor[1] = multColor[1]*mat.color.y;
+        multColor[2] = multColor[2]*mat.color.z;
+      }
+      else if (mat.hasRefractive){
+        float n1 = 1.000293;
+        float n2 = mat.indexOfRefraction;
 
-    colors[index] = generateRandomNumberFromThread(resolution, time, x, y);
+        r = refract(r,hit,n1,n2);
+        multColor[0] = multColor[0]*mat.color.x;
+        multColor[1] = multColor[1]*mat.color.y;
+        multColor[2] = multColor[2]*mat.color.z;
+      }
+      else{ //diffuse
+        glm::vec3 diffColor = diffuseWithShadow(hit, geoms, numberOfGeoms, mats, nMats, lights, nLights, resolution, time, x, y, cam.view, r, cam.position, cudaMesh);
+        // glm::vec3 diffColor = diffuse(hit, geoms, numberOfGeoms, mats, nMats, lights, nLights, resolution, time, x, y, cam.view, r, cam.position);
+        
+        //reflected attentuation
+        diffColor.x*=multColor[0];
+        diffColor.y*=multColor[1];
+        diffColor.z*=multColor[2];
+
+        colors[index]+=diffColor;
+        
+        //ambient
+        colors[index] += 0.1f*mat.color;
+        break;
+      }
+    }
    }
 }
 
-// TODO: FINISH THIS FUNCTION
-// Wrapper for the __global__ call that sets up the kernel calls and does a ton of memory management
-void cudaRaytraceCore(uchar4* PBOpos, camera* renderCam, int frame, int iterations, material* materials, int numberOfMaterials, geom* geoms, int numberOfGeoms){
+//Core raytracer kernel
+__global__ void matID(glm::vec2 resolution, float time, cameraData cam, int rayDepth, glm::vec3* colors,
+                            staticGeom* geoms, int numberOfGeoms, material* mats, int nMats, 
+                            staticGeom* lights, int nLights, glm::vec3** cudaMesh){
+
+  int x = (blockIdx.x * blockDim.x) + threadIdx.x;
+  int y = (blockIdx.y * blockDim.y) + threadIdx.y;
+  int index = x + (y * resolution.x);
+
+  if((x<=resolution.x && y<=resolution.y)){
+    ray r = raycastFromCameraKernel(resolution, time, x,y, cam.position, cam.view, cam.up, cam.fov, cam.focusDistance, cam.dof);
+    intersection hit;
+
+    int depth=0;
+
+    float multColor[3];
+    multColor[0]=1; multColor[1]=1; multColor[2]=1;
+    while (intersect(r, geoms, numberOfGeoms, hit, cudaMesh) && depth<100){
+      material mat = mats[geoms[hit.objectID].materialid];
+      depth++;
+
+      if (mat.emittance>0){
+        colors[index] += mat.color*mat.emittance;
+        break;
+      }
+      else if (mat.hasReflective){
+        r = reflect(r,hit);
+      }
+      else{ //diffuse
+        glm::vec3 diffColor = mat.color;
+        // glm::vec3 diffColor = diffuse(hit, geoms, numberOfGeoms, mats, nMats, lights, nLights, resolution, time, x, y, cam.view, r, cam.position);
+        
+        //reflected attentuation
+        diffColor.x*=multColor[0];
+        diffColor.y*=multColor[1];
+        diffColor.z*=multColor[2];
+
+        colors[index]+=diffColor;
+        break;
+      }
+    }
+   }
+}
+
+__global__ void sum(int* in, int* out, int n, int d1){
+  int k = (blockIdx.x * blockDim.x) + threadIdx.x;
   
-  int traceDepth = 1; //determines how many bounces the raytracer traces
+  if (k<n){
+    int ink = in[k];
+    if (k>=d1){
+      out[k] = in[k-d1] + ink;
+    }
+    else{
+      out[k] = ink;
+    }
+  }
+}
 
-  // set up crucial magic
-  int tileSize = 8;
-  dim3 threadsPerBlock(tileSize, tileSize);
-  dim3 fullBlocksPerGrid((int)ceil(float(renderCam->resolution.x)/float(tileSize)), (int)ceil(float(renderCam->resolution.y)/float(tileSize)));
+__global__ void shift(int* in, int* out, int n){
+  int k = (blockIdx.x * blockDim.x) + threadIdx.x;
+  
+  out[0] = 0;
+  if (k<n && k>0){
+    out[k] = in[k-1];
+  }
+}
+
+__global__ void streamCompaction(rayBounce* inRays, int* indices, rayBounce* outRays, int numRays){
+  int k = (blockIdx.x * blockDim.x) + threadIdx.x;
+
+  if (k<numRays){
+    rayBounce inRay = inRays[k];
+    if (inRay.alive){
+      outRays[indices[k]-1] = inRay;
+    }
+  }
+}
+
+__global__ void firstBounce(glm::vec2 resolution, float time, cameraData cam, int rayDepth, glm::vec3* colors,
+                            staticGeom* geoms, int numberOfGeoms, material* mats, int nMats, 
+                            staticGeom* lights, int nLights, glm::vec3** cudaMesh, rayBounce* rays, int* in
+                            ){
+  
+  int x = (blockIdx.x * blockDim.x) + threadIdx.x;
+  int y = (blockIdx.y * blockDim.y) + threadIdx.y;
+  int index = x + (y * resolution.x);
+
+  if((x<=resolution.x && y<=resolution.y)){
+    ray r = raycastFromCameraKernel(resolution, time, x,y, cam.position, cam.view, cam.up, cam.fov, cam.focusDistance, cam.dof);
+    intersection hit;
+
+    rays[index].r=r;
+    rays[index].alive=true;
+    rays[index].index = index;
+    in[index]=1;
+
+    if (intersect(r, geoms, numberOfGeoms, hit, cudaMesh)){
+      material mat = mats[geoms[hit.objectID].materialid];
+      if (mat.emittance>0){
+        rays[index].alive=false;
+        in[index]=0;
+        colors[index]+=mat.color*mat.emittance;
+      }
+      else{
+
+        rays[index].color=mat.color;
+
+        if (mat.hasReflective && mat.hasRefractive){
+          rays[index].r = fresnel(r, mat, hit, time*index*float(rayDepth)/2.0f);
+        }
+        else if (mat.hasReflective){
+          rays[index].r = reflect(r,hit);
+        }
+        else if (mat.hasRefractive){
+          float n1 = 1.000293;
+          float n2 = mat.indexOfRefraction;
+
+          rays[index].r = refract(r,hit,n1,n2);
+          // pixelColor = pixelColor*mat.color;
+        }
+        else{ 
+
+          //Add direct lighting contribution for specularity (Cook-Torrance microfacet model)
+          if(mat.specularExponent>0.0){
+            for (int i=0; i<nLights; i++){
+              glm::vec3 lightPos = getRandomPointOnCube(lights[i], time*index*(rayDepth)/2.0f);
+              float ct = cookTorrance(cam.position, hit, lightPos, mat, r);
+              rays[index].color+=ct*mat.specularColor;
+            }
+          }
+
+          ray out;
+          // reflect ray in random hemisphere direction
+          glm::vec3 seed = generateRandomNumberFromThread(time*index*float(rayDepth)/2.0f);
+          out.direction = calculateRandomDirectionInHemisphere(hit.normal, seed.x, seed.y);
+          // out.direction = getRandomDirectionInSphere(seed.x, seed.y);
+          // if (glm::dot(out.direction, hit.normal)<0){
+            // out.direction*=-1.0f;
+          // }
+          out.origin = hit.intersectionPoint + 0.001f*out.direction;
+          rays[index].r = out;
+        }
+      }
+    }
+    else{
+      rays[index].alive=false;
+      in[index]=0;
+    }
+}
+}
+__global__ void bounce(glm::vec2 resolution, float time, cameraData cam, int rayDepth, glm::vec3* colors,
+                            staticGeom* geoms, int numberOfGeoms, material* mats, int nMats, 
+                            staticGeom* lights, int nLights, glm::vec3** cudaMesh, rayBounce* rays, int* in,
+                            int numRays){
+  
+  int id = (blockIdx.x * blockDim.x) + threadIdx.x;
+
+  if(id<numRays){
+    ray r = rays[id].r;
+    intersection hit;
+
+    rays[id].alive=true;
+    in[id]=1;
+
+    if (intersect(r, geoms, numberOfGeoms, hit, cudaMesh)){
+      material mat = mats[geoms[hit.objectID].materialid];
+      if (mat.emittance>0){
+        rays[id].alive=false;
+        in[id]=0;
+        colors[rays[id].index]+=(rays[id].color*mat.color*mat.emittance);
+      }
+      else{
+
+        rays[id].color*=mat.color;
+
+        if (mat.hasReflective && mat.hasRefractive){
+          rays[id].r = fresnel(r, mat, hit, time*id*float(rayDepth)/2.0f);
+        }
+        else if (mat.hasReflective){
+          rays[id].r = reflect(r,hit);
+        }
+        else if (mat.hasRefractive){
+          float n1 = 1.000293;
+          float n2 = mat.indexOfRefraction;
+
+          rays[id].r = refract(r,hit,n1,n2);
+          // pixelColor = pixelColor*mat.color;
+        }
+        else{ 
+
+          //Add direct lighting contribution for specularity (Cook-Torrance microfacet model)
+          if(mat.specularExponent>0.0){
+            for (int i=0; i<nLights; i++){
+              glm::vec3 lightPos = getRandomPointOnCube(lights[i], time*id*(rayDepth)/2.0f);
+              float ct = cookTorrance(cam.position, hit, lightPos, mat, r);
+              rays[id].color+=ct*mat.specularColor;
+            }
+          }
+
+          ray out;
+          // reflect ray in random hemisphere direction
+          glm::vec3 seed = generateRandomNumberFromThread(time*id*float(rayDepth)/2.0f);
+          out.direction = calculateRandomDirectionInHemisphere(hit.normal, seed.x, seed.y);
+          // out.direction = getRandomDirectionInSphere(seed.x, seed.y);
+          // if (glm::dot(out.direction, hit.normal)<0){
+            // out.direction*=-1.0f;
+          // }
+          out.origin = hit.intersectionPoint + 0.001f*out.direction;
+          rays[id].r = out;
+        }
+      }
+    }
+    else{
+      rays[id].alive=false;
+      in[id]=0;
+      rays[id].color=glm::vec3(0,0,0);
+    }
+}
+}
+
+void cudaInit(uchar4* pos, camera* renderCam, int frame, int iterations, material* materials, int numberOfMaterials, geom* geoms,
+                    int numberOfGeoms){
   
-  // send image to GPU
-  glm::vec3* cudaimage = NULL;
+  //send image to GPU
+  // = NULL;
+
+  std::cout<<"Initializing memory"<<std::endl;
+
+  cudaimage=NULL;
   cudaMalloc((void**)&cudaimage, (int)renderCam->resolution.x*(int)renderCam->resolution.y*sizeof(glm::vec3));
   cudaMemcpy( cudaimage, renderCam->image, (int)renderCam->resolution.x*(int)renderCam->resolution.y*sizeof(glm::vec3), cudaMemcpyHostToDevice);
-  
-  // package geometry and materials and sent to GPU
-  staticGeom* geomList = new staticGeom[numberOfGeoms];
+
+  //package geometry and materials and sent to GPU
+  geomList = new staticGeom[numberOfGeoms];
+  nLights=0;
+
+  nGeoms = 2*numberOfGeoms;
+  meshList = new glm::vec3*[nGeoms];
+
   for(int i=0; i<numberOfGeoms; i++){
+    if (materials[geoms[i].materialid].emittance>0){
+      nLights++;
+    }
     staticGeom newStaticGeom;
     newStaticGeom.type = geoms[i].type;
     newStaticGeom.materialid = geoms[i].materialid;
-    newStaticGeom.translation = geoms[i].translations[frame];
-    newStaticGeom.rotation = geoms[i].rotations[frame];
-    newStaticGeom.scale = geoms[i].scales[frame];
-    newStaticGeom.transform = geoms[i].transforms[frame];
-    newStaticGeom.inverseTransform = geoms[i].inverseTransforms[frame];
+
+    glm::vec3 translation = geoms[i].translations[frame];
+    glm::vec3 rotation = geoms[i].rotations[frame];
+    glm::vec3 scale = geoms[i].scales[frame];
+    cudaMat4 transform = geoms[i].transforms[frame];
+    cudaMat4 inverseTransform = geoms[i].inverseTransforms[frame];
+
+    if (frame>0 && frame<renderCam->frames){ //middle frame
+      float t1 = float(rand())/RAND_MAX;
+      float t2 = float(rand())/RAND_MAX;
+      float t3 = float(rand())/RAND_MAX;
+
+      glm::vec3 translation1 = t1*geoms[i].translations[frame-1] + (1-t1)*geoms[i].translations[frame];
+      glm::vec3 translation2 = t2*geoms[i].translations[frame] + (1-t2)*geoms[i].translations[frame+1];
+
+      translation = t3*translation1 + (1-t3)*translation2;
+
+      glm::vec3 scale1 = t1*geoms[i].scales[frame-1] + (1-t1)*geoms[i].scales[frame];
+      glm::vec3 scale2 = t2*geoms[i].scales[frame] + (1-t2)*geoms[i].scales[frame+1];
+
+      scale = t3*scale1 + (1-t3)*scale2;
+
+      glm::vec3 rotation1 = t1*geoms[i].rotations[frame-1] + (1-t1)*geoms[i].rotations[frame];
+      glm::vec3 rotation2 = t2*geoms[i].rotations[frame] + (1-t2)*geoms[i].rotations[frame+1];
+
+      rotation = t3*rotation1 + (1-t3)*rotation2;
+
+      glm::mat4 temp = utilityCore::buildTransformationMatrix(translation, rotation, scale);
+      transform = utilityCore::glmMat4ToCudaMat4(temp);
+      inverseTransform = utilityCore::glmMat4ToCudaMat4(glm::inverse(temp));
+
+    }
+
+    newStaticGeom.translation = translation;
+    newStaticGeom.rotation = rotation;
+    newStaticGeom.scale = scale;
+    newStaticGeom.transform = transform;
+    newStaticGeom.inverseTransform = inverseTransform;
+    
+    if (newStaticGeom.type==MESH){
+      newStaticGeom.obj.numTris = geoms[i].obj.numTris;
+      newStaticGeom.obj.numVerts = geoms[i].obj.numVerts;
+      newStaticGeom.obj.tris = geoms[i].obj.tris;
+      newStaticGeom.obj.verts = geoms[i].obj.verts;
+
+      //package meshes
+      cudaMalloc((void**)&cudameshtris, newStaticGeom.obj.numTris*sizeof(glm::vec3));
+      cudaMemcpy( cudameshtris, newStaticGeom.obj.tris, newStaticGeom.obj.numTris*sizeof(glm::vec3), cudaMemcpyHostToDevice);
+
+      cudaMalloc((void**)&cudameshverts, newStaticGeom.obj.numVerts*sizeof(glm::vec3));
+      cudaMemcpy( cudameshverts, newStaticGeom.obj.verts, newStaticGeom.obj.numVerts*sizeof(glm::vec3), cudaMemcpyHostToDevice);
+
+      meshList[i*2] = cudameshtris;
+      meshList[i*2+1] = cudameshverts;
+    }
+    else{
+      meshList[i*2] = 0;
+      meshList[i*2+1] = 0;
+    }
+    newStaticGeom.meshid = geoms[i].meshid;
     geomList[i] = newStaticGeom;
   }
   
-  staticGeom* cudageoms = NULL;
+  //package geom
   cudaMalloc((void**)&cudageoms, numberOfGeoms*sizeof(staticGeom));
   cudaMemcpy( cudageoms, geomList, numberOfGeoms*sizeof(staticGeom), cudaMemcpyHostToDevice);
-  
-  // package camera
-  cameraData cam;
+
+  cudaMalloc((void**)&cudamesh, nGeoms*sizeof(glm::vec3*));
+  cudaMemcpy( cudamesh, meshList, nGeoms*sizeof(glm::vec3*), cudaMemcpyHostToDevice);
+
+  //package materials
+  cudaMalloc((void**)&cudaMats, numberOfMaterials*sizeof(material));
+  cudaMemcpy(cudaMats, materials, numberOfMaterials*sizeof(material), cudaMemcpyHostToDevice);
+
+  //package lights
+  lightList = new staticGeom[nLights];
+  int lightIndex = 0;
+  for (int i=0; i<numberOfGeoms; i++){
+  if (materials[geomList[i].materialid].emittance>0){
+      lightList[lightIndex] = geomList[i];
+      lightIndex++;
+    }
+  }
+
+  cudaMalloc((void**)&cudaLights, nLights*sizeof(staticGeom));
+  cudaMemcpy(cudaLights, lightList, nLights*sizeof(staticGeom), cudaMemcpyHostToDevice);
+
+  //package camera
   cam.resolution = renderCam->resolution;
   cam.position = renderCam->positions[frame];
+
   cam.view = renderCam->views[frame];
   cam.up = renderCam->ups[frame];
   cam.fov = renderCam->fov;
+  cam.dof = renderCam->dof;
+  cam.focusDistance = renderCam->focusDistance;
+
+  int numRays = renderCam->resolution.x*renderCam->resolution.y;
+  cpuRays = new rayBounce[numRays];
+  cpuIndices = new int[numRays];
+
+  for (int i=0; i<numRays; i++){
+    cpuIndices[i] = 1;
+  }
+
+  cudaMalloc((void**)&raysA, numRays*sizeof(rayBounce));
+  cudaMemcpy(raysA, cpuRays, numRays*sizeof(rayBounce), cudaMemcpyHostToDevice);
+
+  cudaMalloc((void**)&raysB, numRays*sizeof(rayBounce));
+  cudaMemcpy(raysB, cpuRays, numRays*sizeof(rayBounce), cudaMemcpyHostToDevice);
+
+  cudaMalloc((void**)&indicesA, numRays*sizeof(int));
+  cudaMemcpy(indicesA, cpuIndices, numRays*sizeof(int), cudaMemcpyHostToDevice);
+
+  cudaMalloc((void**)&indicesB, numRays*sizeof(int));
+  cudaMemcpy(indicesB, cpuIndices, numRays*sizeof(int), cudaMemcpyHostToDevice);
+
+  std::cout<<"Done allocationg"<<std::endl;
+}
+
+// Wrapper for the __global__ call that sets up the kernel calls and does a ton of memory management
+void cudaRaytraceCoreStream(uchar4* PBOpos, camera* renderCam, int frame, int iterations , int numberOfMaterials, geom* geoms, int numberOfGeoms){
+
+  int traceDepth = 1; //determines how many bounces the raytracer traces
+
+  for (int i=0; i<numberOfGeoms; i++){
+    glm::vec3 translation = geoms[i].translations[frame];
+    glm::vec3 rotation = geoms[i].rotations[frame];
+    glm::vec3 scale = geoms[i].scales[frame];
+    cudaMat4 transform = geoms[i].transforms[frame];
+    cudaMat4 inverseTransform = geoms[i].inverseTransforms[frame];
 
-  // kernel launches
-  raytraceRay<<<fullBlocksPerGrid, threadsPerBlock>>>(renderCam->resolution, (float)iterations, cam, traceDepth, cudaimage, cudageoms, numberOfGeoms);
+    if (frame>0 && frame<renderCam->frames){ //middle frame
+      float t1 = float(rand())/RAND_MAX;
+      float t2 = float(rand())/RAND_MAX;
+      float t3 = float(rand())/RAND_MAX;
 
-  sendImageToPBO<<<fullBlocksPerGrid, threadsPerBlock>>>(PBOpos, renderCam->resolution, cudaimage);
+      glm::vec3 translation1 = t1*geoms[i].translations[frame-1] + (1-t1)*geoms[i].translations[frame];
+      glm::vec3 translation2 = t2*geoms[i].translations[frame] + (1-t2)*geoms[i].translations[frame+1];
 
+      translation = t3*translation1 + (1-t3)*translation2;
+
+      glm::vec3 scale1 = t1*geoms[i].scales[frame-1] + (1-t1)*geoms[i].scales[frame];
+      glm::vec3 scale2 = t2*geoms[i].scales[frame] + (1-t2)*geoms[i].scales[frame+1];
+
+      scale = t3*scale1 + (1-t3)*scale2;
+
+      glm::vec3 rotation1 = t1*geoms[i].rotations[frame-1] + (1-t1)*geoms[i].rotations[frame];
+      glm::vec3 rotation2 = t2*geoms[i].rotations[frame] + (1-t2)*geoms[i].rotations[frame+1];
+
+      rotation = t3*rotation1 + (1-t3)*rotation2;
+
+      glm::mat4 temp = utilityCore::buildTransformationMatrix(translation, rotation, scale);
+      transform = utilityCore::glmMat4ToCudaMat4(temp);
+      inverseTransform = utilityCore::glmMat4ToCudaMat4(glm::inverse(temp));
+
+      geomList[i].translation = translation;
+      geomList[i].rotation = rotation;
+      geomList[i].scale = scale;
+      geomList[i].transform = transform;
+      geomList[i].inverseTransform = inverseTransform;
+    }
+  }
+  cudaMemcpy( cudageoms, geomList, numberOfGeoms*sizeof(staticGeom), cudaMemcpyHostToDevice);
+
+  int numRays = int(renderCam->resolution.x*renderCam->resolution.y);
+
+  int tileSize = 8;
+  dim3 threadsPerBlock(tileSize, tileSize);
+  dim3 fullBlocksPerGrid((int)ceil(float(renderCam->resolution.x)/float(tileSize)), (int)ceil(float(renderCam->resolution.y)/float(tileSize)));
+
+  // cudaMemcpy(indicesA, cpuIndices, numRays*sizeof(int), cudaMemcpyHostToDevice);
+  // cudaMemcpy(raysA, cpuRays, numRays*sizeof(rayBounce), cudaMemcpyHostToDevice);
+
+  rayBounce* streamA = raysA;
+  rayBounce* streamB = raysB;
+
+  firstBounce<<<fullBlocksPerGrid, threadsPerBlock>>>(renderCam->resolution, (float)iterations, cam, traceDepth, cudaimage, cudageoms, 
+                                               numberOfGeoms, cudaMats, numberOfMaterials, cudaLights, nLights, cudamesh,
+                                               raysA, indicesA);
+
+  dim3 threadsPerBlockL(64);
+  dim3 fullBlocksPerGridL(int(ceil(float(numRays)/64.0f)));
+
+  while(traceDepth<100){
+
+    traceDepth++;
+
+    //scan algorithm
+    for (int d=1; d<=ceil(log(float(numRays))/log(2.0f)); d++){
+      sum<<<fullBlocksPerGridL, threadsPerBlockL>>>(indicesA, indicesB, numRays, powf(2.0f, d-1));
+      int* temp = indicesA;
+      indicesA = indicesB;
+      indicesB = temp;
+    }
+
+    //Stream compation from A into B, then save back into A
+    streamCompaction<<<fullBlocksPerGridL, threadsPerBlockL>>>(streamA, indicesA, streamB, numRays);
+    rayBounce* temp = streamA;
+    streamA = streamB;
+    streamB = streamA;
+
+    //update numrays
+    cudaMemcpy(&numRays, &indicesA[numRays-1], sizeof(int), cudaMemcpyDeviceToHost);
+    fullBlocksPerGridL = dim3(int(ceil(float(numRays)/64.0f)));
+
+    //subsequent bounces...
+    if (numRays>0){
+      bounce<<<fullBlocksPerGridL, threadsPerBlockL>>>(renderCam->resolution, (float)iterations, cam, traceDepth, cudaimage, cudageoms, 
+                                               numberOfGeoms, cudaMats, numberOfMaterials, cudaLights, nLights, cudamesh,
+                                               streamA, indicesA, numRays);
+    }
+    else{
+      break;
+    }
+
+  }
+  // // std::cout<<traceDepth<<std::endl;
+
+  sendImageToPBO<<<fullBlocksPerGrid, threadsPerBlock>>>(PBOpos, renderCam->resolution, cudaimage, (float)iterations);
   // retrieve image from GPU
   cudaMemcpy( renderCam->image, cudaimage, (int)renderCam->resolution.x*(int)renderCam->resolution.y*sizeof(glm::vec3), cudaMemcpyDeviceToHost);
+  // make certain the kernel has completed
+  cudaThreadSynchronize();
+  checkCUDAError("kernel failed!");
+}
+// Wrapper for the __global__ call that sets up the kernel calls and does a ton of memory management
+void cudaRaytraceCore(uchar4* PBOpos, camera* renderCam, int frame, int iterations, material* materials, int numberOfMaterials, 
+                    geom* geoms, int numberOfGeoms){
+
+  int traceDepth = 1; //determines how many bounces the raytracer traces
 
-  // free up stuff, or else we'll leak memory like a madman
-  cudaFree( cudaimage );
-  cudaFree( cudageoms );
-  delete geomList;
+  // set up crucial magic
+  int tileSize = 8;
+  dim3 threadsPerBlock(tileSize, tileSize);
+  dim3 fullBlocksPerGrid((int)ceil(float(renderCam->resolution.x)/float(tileSize)), (int)ceil(float(renderCam->resolution.y)/float(tileSize)));
+
+  //kernel launches
+  raytraceRay<<<fullBlocksPerGrid, threadsPerBlock>>>(renderCam->resolution, (float)iterations, cam, traceDepth, cudaimage, cudageoms, numberOfGeoms, cudaMats, numberOfMaterials, cudaLights, nLights, cudamesh);
+  // pathtraceRay<<<fullBlocksPerGrid, threadsPerBlock>>>(renderCam->resolution, (float)iterations, cam, traceDepth, cudaimage, cudageoms, numberOfGeoms, cudaMats, numberOfMaterials, cudaLights, nLights, cudamesh);
+  
+  sendImageToPBO<<<fullBlocksPerGrid, threadsPerBlock>>>(PBOpos, renderCam->resolution, cudaimage, (float)iterations);
+
+  //retrieve image from GPU
+  cudaMemcpy( renderCam->image, cudaimage, (int)renderCam->resolution.x*(int)renderCam->resolution.y*sizeof(glm::vec3), cudaMemcpyDeviceToHost);
 
   // make certain the kernel has completed
   cudaThreadSynchronize();
+  checkCUDAError("kernel failed!");
+}
+
+void testStream(uchar4* PBOpos, camera* renderCam, int frame, int iterations, material* materials, int numberOfMaterials, 
+                    geom* geoms, int numberOfGeoms){
+
+  int traceDepth = 1; //determines how many bounces the raytracer traces
 
-  checkCUDAError("Kernel failed!");
+  int numRays = int(renderCam->resolution.x*renderCam->resolution.y);
+  numRays=1500;
+
+  rayBounce* testRays = new rayBounce[numRays];
+
+  for (int i=0; i<numRays; i++){
+    rayBounce tr(i);
+    testRays[i] = tr;
+  }
+
+  testRays[1].alive=false;
+  testRays[4].alive=false;
+  testRays[5].alive=false;
+  testRays[7].alive=false;
+  testRays[9].alive=false;
+  testRays[12].alive=false;
+  testRays[19].alive=false;
+  testRays[14].alive=false;
+
+  rayBounce* cudaRaysA;
+  rayBounce* cudaRaysB;
+
+  cudaMalloc((void**)&cudaRaysA, numRays*sizeof(rayBounce));
+
+  cudaMalloc((void**)&cudaRaysB, numRays*sizeof(rayBounce));
+
+  int* testin;
+  int* testout;
+  int* cputest = new int[numRays];
+
+  for (int i=0; i<numRays; i++){
+    if (testRays[i].alive){
+      cputest[i]=1;
+    }
+    else{
+      cputest[i]=0;
+    }
+  }
+
+  cudaMalloc((void**)&testin, numRays*sizeof(int));
+  cudaMalloc((void**)&testout, numRays*sizeof(int));
+
+
+  cudaMemcpy(cudaRaysA, testRays, numRays*sizeof(rayBounce), cudaMemcpyHostToDevice);  
+  cudaMemcpy(cudaRaysB, testRays, numRays*sizeof(rayBounce), cudaMemcpyHostToDevice);  
+  cudaMemcpy(testin, cputest, numRays*sizeof(int), cudaMemcpyHostToDevice);
+  cudaMemcpy(testout, cputest, numRays*sizeof(int), cudaMemcpyHostToDevice);
+
+
+  //start loop
+  while(numRays>0 && traceDepth<10000){
+    
+    for (int i=0; i<numRays; i++){
+      std::cout<<testRays[i].index<<", "<<cputest[i]<<std::endl;
+    }
+
+    dim3 threadsPerBlock(64);
+    dim3 fullBlocksPerGrid(int(ceil(float(numRays)/64.0f)));
+
+    //scan
+    for (int d=1; d<=ceil(log(float(numRays))/log(2.0f))+1; d++){
+      sum<<<fullBlocksPerGrid, threadsPerBlock>>>(testin, testout, numRays, int(pow(2.0f,d-1)));
+      cudaThreadSynchronize();
+      cudaMemcpy(cputest, testout, numRays*sizeof(int), cudaMemcpyDeviceToHost);
+
+      // std::cout<<"sum at depth: "<<d<<" (using "<<testout<<")"<<std::endl;
+      // for (int i=0; i<numRays; i++){
+      //   std::cout<<cputest[i]<<std::endl;
+      // }
+
+      int* temp = testin;
+      testin=testout;
+      testout=temp;
+    }
+    //Compact
+    streamCompaction<<<fullBlocksPerGrid, threadsPerBlock>>>(cudaRaysA, testin, cudaRaysB, numRays);
+    cudaRaysA = cudaRaysB;
+    cudaThreadSynchronize();
+
+    cudaMemcpy(&numRays, &testin[numRays-1], 1*sizeof(int), cudaMemcpyDeviceToHost);
+
+
+    std::cout<<"number of rays left: "<<numRays<<std::endl;
+
+    if (numRays==0) break;
+
+    // for (int i=0; i<numRays; i++){
+    //   std::cout<<cputest[i]<<std::endl;
+    // }    
+
+    cudaMemcpy(cputest, testin, numRays*sizeof(int), cudaMemcpyDeviceToHost);
+    cudaMemcpy(testRays, cudaRaysA, numRays*sizeof(rayBounce), cudaMemcpyDeviceToHost);
+
+
+    for (int i=0; i<numRays; i++){
+      std::cout<<testRays[i].index<<std::endl;
+    }
+    std::cout<<"___________________________________"<<std::endl;
+
+    //kill some amount
+    if (numRays>0) testRays[0].alive=false;
+    if (numRays>3) testRays[3].alive=false;
+    if (numRays>10) testRays[10].alive=false;
+    if (numRays>13) testRays[13].alive=false;
+    if (numRays>30) testRays[30].alive=false;
+    cudaMemcpy(cputest, testin, numRays*sizeof(int), cudaMemcpyDeviceToHost);
+    
+    for (int i=0; i<numRays; i++){
+      if (testRays[i].alive){
+        cputest[i]=1;
+      }
+      else{
+        cputest[i]=0;
+      }
+    }
+    cudaMemcpy(cudaRaysA, testRays, numRays*sizeof(rayBounce), cudaMemcpyHostToDevice);  
+    cudaMemcpy(testin, cputest, numRays*sizeof(int), cudaMemcpyHostToDevice);
+
+    traceDepth++;
+
+    //end loop
+  }
+
+  delete [] cputest;
+  cudaFree(testin);
+  cudaFree(testout);
+
+  delete [] testRays;
+  cudaFree(cudaRaysA);
+  cudaFree(cudaRaysB);
+  
+  // make certain the kernel has completed
+  cudaThreadSynchronize();
+  checkCUDAError("kernel failed!");
 }
+
+void cudaFreeCPU(){
+  //free up stuff, or else we'll leak memory like a madman
+  std::cout<<"Freeing cuda memory"<<std::endl;
+  cudaFree( cudaimage  );
+  cudaFree( cudageoms  );
+  cudaFree( cudaMats   );
+  cudaFree( cudaLights );
+  cudaFree( indicesA );
+  cudaFree( indicesB );
+  cudaFree( raysA );
+  cudaFree( raysB );
+  for (int i=0; i<nGeoms; i++){
+    if (meshList[i]!=0) {
+      cudaFree( meshList[i] );
+    }
+  }
+  cudaFree( cudamesh );
+  delete [] meshList;
+  delete [] geomList;
+  delete [] lightList;
+  delete [] cpuRays;
+  delete [] cpuIndices;
+}
\ No newline at end of file
diff --git a/src/raytraceKernel.h b/src/raytraceKernel.h
index 984e89f..ee552de 100755
--- a/src/raytraceKernel.h
+++ b/src/raytraceKernel.h
@@ -6,14 +6,35 @@
 //       Yining Karl Li's TAKUA Render, a massively parallel pathtracing renderer: http://www.yiningkarlli.com
 
 #ifndef RAYTRACEKERNEL_H
-#define RAYTRACEKERNEL_H
+#define PATHTRACEKERNEL_H
 
 #include <stdio.h>
 #include <thrust/random.h>
 #include <cuda.h>
 #include <cmath>
 #include "sceneStructs.h"
+#include "glm/gtc/matrix_transform.hpp"
+#include <time.h>
 
-void cudaRaytraceCore(uchar4* pos, camera* renderCam, int frame, int iterations, material* materials, int numberOfMaterials, geom* geoms, int numberOfGeoms);
+#if CUDA_VERSION >= 5000
+    #include <helper_math.h>
+#else
+    #include <cutil_math.h>
+#endif
+
+void cudaRaytraceCoreStream(uchar4* pos, camera* renderCam, int frame, int iterations, int numberOfMaterials, geom* geoms, int numberOfGeoms);
+
+void cudaRaytraceCore(uchar4* pos, camera* renderCam, int frame, int iterations, material* materials, int numberOfMaterials, geom* geoms,
+	                  int numberOfGeoms);
+
+void testStream(uchar4* pos, camera* renderCam, int frame, int iterations, material* materials, int numberOfMaterials, geom* geoms,
+	                  int numberOfGeoms);
+
+void cudaInit(uchar4* pos, camera* renderCam, int frame, int iterations, material* materials, int numberOfMaterials, geom* geoms,
+	                  int numberOfGeoms);
+
+void cudaFreeCPU();
+
+// void cudaFree();
 
 #endif
diff --git a/src/scene.cpp b/src/scene.cpp
index 4cbe216..7d81d61 100644
--- a/src/scene.cpp
+++ b/src/scene.cpp
@@ -34,8 +34,74 @@ scene::scene(string filename){
 	}
 }
 
+mesh scene::loadMesh(string fileName){
+	ifstream ifile;
+	string line;
+	ifile.open(fileName.c_str());
+
+	cout<<fileName<<endl;
+
+	vector<glm::vec3> verts, faces;
+
+	while (utilityCore::safeGetline(ifile, line)) {
+		vector<string> tokens = utilityCore::tokenizeString(line);
+
+		if (tokens.size()>0 && strcmp(tokens[0].c_str(),"v")==0){
+			verts.push_back(glm::vec3(atof(tokens[1].c_str()),atof(tokens[2].c_str()),atof(tokens[3].c_str())));
+		}
+		else if (tokens.size()>0 && strcmp(tokens[0].c_str(),"f")==0){
+			char* findex1 = strtok (const_cast<char*>(tokens[1].c_str()),"/");
+			char* findex2 = strtok (const_cast<char*>(tokens[2].c_str()),"/");
+			char* findex3 = strtok (const_cast<char*>(tokens[3].c_str()),"/");
+			faces.push_back(glm::vec3(atof(findex1)-1,atof(findex2)-1,atof(findex3)-1));
+		}
+	}
+
+	glm::vec3* vertData = new glm::vec3[verts.size()];
+	glm::vec3* faceData = new glm::vec3[faces.size()];
+
+	glm::vec3 max = glm::vec3(-10000,-10000,-10000);
+	glm::vec3 min = glm::vec3( 10000, 10000, 10000);
+
+	for (int i=0; i<verts.size(); i++){
+		vertData[i] = verts[i];
+		if (verts[i].x<min.x){
+			min.x=verts[i].x;
+		}
+		if (verts[i].y<min.y){
+			min.y=verts[i].y;
+		}
+		if (verts[i].z<min.z){
+			min.z=verts[i].z;
+		}
+		if (verts[i].x>max.x){
+			max.x=verts[i].x;
+		}
+		if (verts[i].y>max.y){
+			max.y=verts[i].y;
+		}
+		if (verts[i].z>max.z){
+			max.z=verts[i].z;
+		}
+	}
+	for (int i=0; i<faces.size(); i++){
+		faceData[i] = faces[i];
+	}
+
+	mesh m;
+
+	m.numTris = faces.size();
+	m.numVerts = verts.size();
+	m.tris = faceData;
+	m.verts = vertData;
+	m.bh = max;
+	m.bl = min;
+	return m;
+}
+
 int scene::loadObject(string objectid){
     int id = atoi(objectid.c_str());
+	int number = 0;
     if(id!=objects.size()){
         cout << "ERROR: OBJECT ID does not match expected number of objects" << endl;
         return -1;
@@ -64,6 +130,7 @@ int scene::loadObject(string objectid){
                     cout << "Creating new mesh..." << endl;
                     cout << "Reading mesh from " << line << "... " << endl;
 		    		newObject.type = MESH;
+		    		newObject.obj = loadMesh(line);
                 }else{
                     cout << "ERROR: " << line << " is not a valid object type!" << endl;
                     return -1;
@@ -118,6 +185,7 @@ int scene::loadObject(string objectid){
 	newObject.scales = new glm::vec3[frameCount];
 	newObject.transforms = new cudaMat4[frameCount];
 	newObject.inverseTransforms = new cudaMat4[frameCount];
+	newObject.meshid = -1;
 	for(int i=0; i<frameCount; i++){
 		newObject.translations[i] = translations[i];
 		newObject.rotations[i] = rotations[i];
@@ -138,7 +206,8 @@ int scene::loadCamera(){
 	cout << "Loading Camera ..." << endl;
         camera newCamera;
 	float fovy;
-	
+	newCamera.dof=0;
+	newCamera.focusDistance=1;
 	//load static properties
 	for(int i=0; i<4; i++){
 		string line;
@@ -172,7 +241,7 @@ int scene::loadCamera(){
         }
 	    
 	    //load camera properties
-	    for(int i=0; i<3; i++){
+	    for(int i=0; i<5; i++){
             //glm::vec3 translation; glm::vec3 rotation; glm::vec3 scale;
             utilityCore::safeGetline(fp_in,line);
             tokens = utilityCore::tokenizeString(line);
@@ -182,7 +251,14 @@ int scene::loadCamera(){
                 views.push_back(glm::vec3(atof(tokens[1].c_str()), atof(tokens[2].c_str()), atof(tokens[3].c_str())));
             }else if(strcmp(tokens[0].c_str(), "UP")==0){
                 ups.push_back(glm::vec3(atof(tokens[1].c_str()), atof(tokens[2].c_str()), atof(tokens[3].c_str())));
-            }
+            }else if(strcmp(tokens[0].c_str(),"DOF")==0){
+			newCamera.dof = atof(tokens[1].c_str());
+			}else if(strcmp(tokens[0].c_str(),"FOCUSDIST")==0){
+				newCamera.focusDistance = atof(tokens[1].c_str());
+				if (newCamera.focusDistance<0.0001f){
+					cout<<"Focus distance must be greater than 0"<<endl;
+				}
+			}
 	    }
 	    
 	    frameCount++;
@@ -194,7 +270,7 @@ int scene::loadCamera(){
 	newCamera.positions = new glm::vec3[frameCount];
 	newCamera.views = new glm::vec3[frameCount];
 	newCamera.ups = new glm::vec3[frameCount];
-	for(int i = 0; i < frameCount; i++){
+	for(int i=0; i<frameCount; i++){
 		newCamera.positions[i] = positions[i];
 		newCamera.views[i] = views[i];
 		newCamera.ups[i] = ups[i];
@@ -229,13 +305,15 @@ int scene::loadMaterial(string materialid){
 		material newMaterial;
 	
 		//load static properties
-		for(int i=0; i<10; i++){
+		for(int i=0; i<11; i++){
 			string line;
             utilityCore::safeGetline(fp_in,line);
 			vector<string> tokens = utilityCore::tokenizeString(line);
 			if(strcmp(tokens[0].c_str(), "RGB")==0){
 				glm::vec3 color( atof(tokens[1].c_str()), atof(tokens[2].c_str()), atof(tokens[3].c_str()) );
 				newMaterial.color = color;
+			}else if(strcmp(tokens[0].c_str(), "ROUGH")==0){
+				newMaterial.roughness = atof(tokens[1].c_str());	
 			}else if(strcmp(tokens[0].c_str(), "SPECEX")==0){
 				newMaterial.specularExponent = atof(tokens[1].c_str());				  
 			}else if(strcmp(tokens[0].c_str(), "SPECRGB")==0){
diff --git a/src/scene.h b/src/scene.h
index ea3ddaa..b245089 100644
--- a/src/scene.h
+++ b/src/scene.h
@@ -22,6 +22,7 @@ class scene{
     int loadMaterial(string materialid);
     int loadObject(string objectid);
     int loadCamera();
+    mesh loadMesh(string fileName);
 public:
     scene(string filename);
     ~scene();
diff --git a/src/sceneStructs.h b/src/sceneStructs.h
index 5e0c853..2388bd4 100644
--- a/src/sceneStructs.h
+++ b/src/sceneStructs.h
@@ -18,25 +18,37 @@ struct ray {
 	glm::vec3 direction;
 };
 
+struct mesh{
+	int numTris, numVerts;
+	glm::vec3* tris;
+	glm::vec3* verts;
+	glm::vec3 bh, bl;
+};
+
 struct geom {
 	enum GEOMTYPE type;
 	int materialid;
 	int frames;
+	int meshid;
 	glm::vec3* translations;
 	glm::vec3* rotations;
 	glm::vec3* scales;
 	cudaMat4* transforms;
 	cudaMat4* inverseTransforms;
+	mesh obj;
+	// int meshid;
 };
 
 struct staticGeom {
 	enum GEOMTYPE type;
 	int materialid;
+	int meshid;
 	glm::vec3 translation;
 	glm::vec3 rotation;
 	glm::vec3 scale;
 	cudaMat4 transform;
 	cudaMat4 inverseTransform;
+	mesh obj;
 };
 
 struct cameraData {
@@ -45,6 +57,8 @@ struct cameraData {
 	glm::vec3 view;
 	glm::vec3 up;
 	glm::vec2 fov;
+	float focusDistance;
+	float dof;
 };
 
 struct camera {
@@ -58,6 +72,7 @@ struct camera {
 	glm::vec3* image;
 	ray* rayList;
 	std::string imageName;
+	float dof,focusDistance;
 };
 
 struct material{
@@ -71,6 +86,7 @@ struct material{
 	glm::vec3 absorptionCoefficient;
 	float reducedScatterCoefficient;
 	float emittance;
+	float roughness;
 };
 
 #endif //CUDASTRUCTS_H
diff --git a/src/utilities.cpp b/src/utilities.cpp
index c1a364b..3ef0986 100755
--- a/src/utilities.cpp
+++ b/src/utilities.cpp
@@ -4,14 +4,10 @@
 //  File: utilities.cpp
 //  A collection/kitchen sink of generally useful functions
 
-#define GLM_FORCE_RADIANS
-
-#include <glm/gtc/matrix_transform.hpp>
-#include <glm/gtc/matrix_inverse.hpp>
 #include <iostream>
-#include <cstdio>
-
 #include "utilities.h"
+#include "glm/gtc/matrix_transform.hpp"
+#include "glm/gtc/matrix_inverse.hpp"
 
 float utilityCore::clamp(float f, float min, float max){
     if(f<min){
diff --git a/src/utilities.h b/src/utilities.h
index f51598f..fc628ec 100644
--- a/src/utilities.h
+++ b/src/utilities.h
@@ -20,7 +20,7 @@
 #define PI                          3.1415926535897932384626422832795028841971
 #define TWO_PI                      6.2831853071795864769252867665590057683943
 #define SQRT_OF_ONE_THIRD           0.5773502691896257645091487805019574556476
-#define NATURAL_E                   2.7182818284590452353602874713526624977572
+#define E                           2.7182818284590452353602874713526624977572
 #define EPSILON                     .000000001
 #define ZERO_ABSORPTION_EPSILON     0.00001
 #define RAY_BIAS_AMOUNT             0.0002