53 commits
7a81925
started the first try rasterizer. compiles but crashes video driver, …
Oct 7, 2015
5bf16e3
added some debugs, take a look. it's probably the part where you allo…
Oct 7, 2015
03b5bf2
ONE TRIANGLE YEAAAAA
likangning93 Oct 8, 2015
c9be451
fixed a little thing with pixel coordinates. added a frag shader out …
likangning93 Oct 10, 2015
8e41315
added camera stuff and skeleton for the proper rasterizer. also somet…
likangning93 Oct 10, 2015
499a790
fixed some stuff with scanline... still need to resolve clipping issue
likangning93 Oct 10, 2015
db5fb6b
added camera controls so we can figure out wtf is up with depth test
likangning93 Oct 10, 2015
fddf4cf
depth test is working more
likangning93 Oct 11, 2015
9a749d7
moved crappy rasterizer into its own file. now starting work on the R…
likangning93 Oct 11, 2015
f1f6d28
fixed compilation bug
likangning93 Oct 11, 2015
e6d5210
up to speed with the 'real' rasterizer. now need to do atomics
likangning93 Oct 11, 2015
a8d3255
handling race conditions using an int depth grid and atomicMin
likangning93 Oct 11, 2015
f947710
added lights and phong shading
likangning93 Oct 11, 2015
359b2ab
NOT SO ORTHOGRAPHIC NOW ARE WE
likangning93 Oct 12, 2015
25eaf4a
updated notes.
likangning93 Oct 12, 2015
60a0352
really this time
likangning93 Oct 12, 2015
a2ff93d
added some new kernels to handle instancing
Oct 14, 2015
20029c1
modifying code in rasterize to support instancing
likangning93 Oct 14, 2015
380fad3
ready to test instancing
likangning93 Oct 14, 2015
1b4300b
instancing is WORKING
likangning93 Oct 14, 2015
637076e
added components for MSAA and modularized scanline
likangning93 Oct 15, 2015
7d6d69a
added more parts
likangning93 Oct 15, 2015
2f0dca2
FSAA working, MSAA not so much
likangning93 Oct 15, 2015
4573589
ok MSAA seems to be working!
likangning93 Oct 15, 2015
684d07d
added tile pipeline rasterization notes
likangning93 Oct 16, 2015
e27c31a
updated notes on tiling again
likangning93 Oct 16, 2015
eeacfd2
started adding structs and... stuff for tiling
likangning93 Oct 16, 2015
8833fd1
more tiling setup
likangning93 Oct 17, 2015
074b5c8
fixed a little bug in MSAA where the first triangle wasn't getting sh…
likangning93 Oct 17, 2015
e970131
added primitive/tile binning kernel
likangning93 Oct 17, 2015
bf869b8
added tiled scanline. scary. please check and test.
likangning93 Oct 17, 2015
f5c198f
debugging
likangning93 Oct 17, 2015
bf88ee5
now debugging the tile scanline
likangning93 Oct 17, 2015
bae6f0f
progress! now have to figure out why it doesn't work for any other case
likangning93 Oct 17, 2015
1ffb720
now tiling works up to width 160 x 160, 100 tiles
likangning93 Oct 17, 2015
c5a1c07
small fixes. still problems.
likangning93 Oct 17, 2015
e0aaa83
tiny tweak, now seems to work with resos up to 336 336 and general tr…
likangning93 Oct 17, 2015
ab05d28
fixed tiling and clipping issue, which also seemed to fix reso drawin…
likangning93 Oct 17, 2015
3fd3b21
added tile bin wiping between draws.
likangning93 Oct 17, 2015
11008d3
tried dimming contrast in tile grid so it's less painful to look at
likangning93 Oct 17, 2015
e14c9dc
fixed the worst of the grid effect thing
likangning93 Oct 17, 2015
4c82234
slight cleanup. beginning analysis.
likangning93 Oct 18, 2015
18be319
finalized block sizes for now
likangning93 Oct 18, 2015
48f2a99
all data gathered, all charts made
likangning93 Oct 18, 2015
347d70b
started nice readme
likangning93 Oct 18, 2015
202ad49
updates to readme. on instancing
likangning93 Oct 18, 2015
753f086
small fixes
likangning93 Oct 18, 2015
a64cdf4
added antialiasing section
likangning93 Oct 18, 2015
468b8a1
small fixes
likangning93 Oct 18, 2015
46fa022
changed some images to spreads
likangning93 Oct 18, 2015
55bbc39
added tiling section
likangning93 Oct 18, 2015
3fe74d8
added chart
likangning93 Oct 18, 2015
54c611b
typos. also, can i embed?
likangning93 Oct 18, 2015
438 changes: 95 additions & 343 deletions README.md


Binary file added img/AAAAAAAAAAAAAAA.png
Binary file added img/AA_spread.png
Binary file added img/AA_spread_comparison.png
Binary file added img/FSAA.png
Binary file added img/MSAA.png
Binary file added img/aliased.png
4 changes: 4 additions & 0 deletions img/charts/antialiasing/antialiasing.csv
@@ -0,0 +1,4 @@
"";"scanline rasterization";"fragment shading";"render to frame buffer"
"no antialiasing";"2799.816";"277.787";"284.307"
"FSAA";"13887.599";"2159.613";"2985.455"
"MSAA";"13910.298";"1142.535";"2883.819"
Binary file added img/charts/antialiasing/antialiasing.pdf
Binary file added img/charts/antialiasing/antialiasing.png
7 changes: 7 additions & 0 deletions img/charts/instancing/many_cows_host.csv
@@ -0,0 +1,7 @@
"clearDepthBuffer";"408.746";"#ff6633";"0"
"computeVertexTFs";"2.750";"#99ff66";"0"
"vertexShader";"186.007";"#ccaa00";"0"
"primitiveAssembly";"1053.543";"#aa99cc";"0"
"scanlineRasterization";"1355.387";"#33aa66";"0"
"fragmentShader";"249.116";"#aaff33";"0"
"host -> device memcpy";"2899.678";"#ff9933";"0"
Binary file added img/charts/instancing/many_cows_host.pdf
Binary file added img/charts/instancing/many_cows_host.png
7 changes: 7 additions & 0 deletions img/charts/instancing/many_cows_instanced.csv
@@ -0,0 +1,7 @@
"clearDepthBuffer";"407.226";"#ff6633";"0"
"computeVertexTFs";"2.795";"#99ff66";"0"
"vertexShader";"160.891";"#ccaa00";"0"
"primitiveAssembly";"1044.682";"#aa99cc";"0"
"scanlineRasterization";"1481.845";"#33aa66";"0"
"fragmentShader";"256.672";"#aaff33";"0"
"host -> device memcpy";"50.192";"#ff9933";"0"
Binary file added img/charts/instancing/many_cows_instanced.pdf
Binary file added img/charts/instancing/many_cows_instanced.png
Binary file added img/charts/instancing/pies.png
7 changes: 7 additions & 0 deletions img/charts/instancing/single_cow.csv
@@ -0,0 +1,7 @@
"clearDepthBuffer";"407.638";"#ff6633";"0"
"computeVertexTFs";"2.751";"#99ff66";"0"
"vertexShader";"34.2";"#ccaa00";"0"
"primitiveAssembly";"49.181";"#aa99cc";"0"
"scanlineRasterization";"1847.084";"#33aa66";"0"
"fragmentShader";"236.989";"#aaff33";"0"
"host -> device memcpy";"60.225";"#ff9933";"0"
Binary file added img/charts/instancing/single_cow.pdf
Binary file added img/charts/instancing/single_cow.png
4 changes: 4 additions & 0 deletions img/charts/instancing/stack_comparison.csv
@@ -0,0 +1,4 @@
"";"clearDepthBuffer";"computeVertexTFs";"vertexShader";"primitiveAssembly";"scanlineRasterization";"fragmentShader";"host -> device memcpy"
"SIngle Cow";"407.638";"2.751";"34.2";"49.181";"1847.084";"236.989";"60.225"
"Geometry Cows";"408.746";"2.75";"186.007";"1053.543";"1355.387";"249.116";"2899.678"
"Instanced Cows";"407.226";"2.795";"160.891";"1044.682";"1481.845";"256.672";"50.192"
Binary file added img/charts/instancing/stack_comparison.pdf
Binary file added img/charts/instancing/stack_comparison.png
5 changes: 5 additions & 0 deletions img/charts/tiling/tiling.csv
@@ -0,0 +1,5 @@
"";"clear tiles/fragment buffer";"compute vertex transforms";"vertex shade";"primitive assembly";"bin primitives (tiling only)";"scanline rasterization"
"cow";"407.724";"2.790";"33.997";"48.961";"0.0";"2803.701"
"cow (tiled)";"5.793";"3.970";"33.211";"47.463";"4982.626";"1772.194"
"zoomed cube";"409.288";"2.0768";"4.691";"10.299";"0.0";"56579.231"
"zoomed cube (tiled)";"5.854";"3.952";"4.701";"10.212";"18.913";"1546.637"
Binary file added img/charts/tiling/tiling.pdf
Binary file added img/charts/tiling/tiling.png
Binary file added img/instanced_cows.png
Binary file added img/instancing_spread.png
Binary file added img/one_cow.png
Binary file added img/tiled_cow.png
Binary file added img/tiled_cube.png
Binary file added img/tiling_spread.png
Binary file added img/untiled_cow.png
Binary file added img/untiled_cube.png
147 changes: 147 additions & 0 deletions notes.txt
@@ -0,0 +1,147 @@
add:
-anti-aliasing
-tessellation shader + wireframe shading OR instancing
-tile based pipeline

TESSELLATION
-every triangle gets an "inner" value and "outer" values (GLSL's gl_TessLevelOuter has 4 entries, but triangle patches only use the first 3)
-wat

INSTANCING [working]
-we'll do it two ways
-serial instancing: multiple draws between buffer wipes
-parallel instancing: allocates more memory every time number of instances changes
-more memory in the primitives buffer
-add new kernels for parallel instancing
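The "parallel instancing" idea above can be sketched as one kernel launch over every (instance, vertex) pair, writing into a vertex buffer enlarged to numInstances * numVerts. This is a minimal sketch, assuming a row-major 4x4 affine matrix per instance and float3 positions; `Mat4`, `transformPoint`, `instancedIndex`, `instancedVertexTransform`, and `instancedPrimitiveAssembly` are hypothetical names, not the project's actual kernels.

```cuda
#include <cuda_runtime.h>

// Hypothetical row-major 4x4 transform, one per instance.
struct Mat4 {
    float m[16];
};

// Apply an affine transform to a point (w assumed to be 1, no perspective divide).
__host__ __device__ float3 transformPoint(const Mat4& t, float3 p) {
    return make_float3(t.m[0] * p.x + t.m[1] * p.y + t.m[2] * p.z + t.m[3],
                       t.m[4] * p.x + t.m[5] * p.y + t.m[6] * p.z + t.m[7],
                       t.m[8] * p.x + t.m[9] * p.y + t.m[10] * p.z + t.m[11]);
}

// Index of base vertex `baseIndex` inside instance `instance`, when each instance's
// copy of the mesh occupies its own contiguous run of numVerts vertices.
__host__ __device__ int instancedIndex(int baseIndex, int instance, int numVerts) {
    return instance * numVerts + baseIndex;
}

// One thread per (instance, vertex) pair. The base mesh is uploaded once; only the
// small per-instance transform array differs between instances.
__global__ void instancedVertexTransform(const float3* verts, int numVerts,
                                         const Mat4* instanceTFs, int numInstances,
                                         float3* outVerts) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= numVerts * numInstances) return;
    int instance = idx / numVerts;
    int v = idx % numVerts;
    outVerts[instancedIndex(v, instance, numVerts)] =
        transformPoint(instanceTFs[instance], verts[v]);
}

// Primitive assembly reuses the base index buffer, offsetting each copy of it
// into the matching instance's vertex range.
__global__ void instancedPrimitiveAssembly(const int* indices, int numIndices,
                                           int numVerts, int numInstances,
                                           int* outIndices) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= numIndices * numInstances) return;
    int instance = idx / numIndices;
    outIndices[idx] = instancedIndex(indices[idx % numIndices], instance, numVerts);
}
```

With 6 instances of the 2903-vertex cow, the output buffer would hold the 17418 vertices listed in the analysis notes, and a single launch such as `instancedVertexTransform<<<(numVerts * numInstances + 127) / 128, 128>>>(dVerts, numVerts, dTFs, numInstances, dOut)` (device pointer names hypothetical) covers every instance.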

-btw there's a problem where the fragment shader doesn't work on lab machines. wtf.

ANTIALIASING
-seems like the way MSAA works is:
-compute 5 samples per fragment
-but only compute fragment shading on the center
-use that value for each covered sample
-and then average the samples for a color
-it's cheating antialiasing!
-for frags covered by more than one triangle... bad luck, must shade once per triangle. worst case is 5x per frag (a different triangle on each sample), so worst case is like FSAA
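The MSAA steps above can be sketched per pixel: compute a coverage mask over the 5 samples, shade once, copy that color into each covered sample, then average. This is a minimal sketch under stated assumptions: the sample pattern (pixel center plus a 2x2 grid) is a guess, per-sample depth testing is omitted, and `edgeFn`, `insideTriangle`, `coverageMask`, `writeCoveredSamples`, and `resolvePixel` are hypothetical helpers.

```cuda
#include <cuda_runtime.h>

#define MSAA_SAMPLES 5

// Hypothetical helper: signed edge function, positive when p is left of a->b.
__host__ __device__ float edgeFn(float2 a, float2 b, float2 p) {
    return (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x);
}

// Point-in-triangle test that accepts either winding order.
__host__ __device__ bool insideTriangle(float2 p, float2 a, float2 b, float2 c) {
    float e0 = edgeFn(a, b, p), e1 = edgeFn(b, c, p), e2 = edgeFn(c, a, p);
    return (e0 >= 0.0f && e1 >= 0.0f && e2 >= 0.0f) ||
           (e0 <= 0.0f && e1 <= 0.0f && e2 <= 0.0f);
}

// Bit s is set when sample s of pixel (px, py) is covered by the triangle.
// Assumed pattern: sample 0 is the pixel center, samples 1-4 form a 2x2 grid.
__host__ __device__ int coverageMask(int px, int py, float2 a, float2 b, float2 c) {
    const float offsets[MSAA_SAMPLES][2] = {
        {0.5f, 0.5f}, {0.25f, 0.25f}, {0.75f, 0.25f}, {0.25f, 0.75f}, {0.75f, 0.75f}};
    int mask = 0;
    for (int s = 0; s < MSAA_SAMPLES; ++s) {
        float2 p = make_float2(px + offsets[s][0], py + offsets[s][1]);
        if (insideTriangle(p, a, b, c)) mask |= 1 << s;
    }
    return mask;
}

// MSAA: the fragment shader runs once (at the center) and its color is copied
// into every covered sample. FSAA would shade each covered sample separately.
__host__ __device__ void writeCoveredSamples(float3* samples, int mask, float3 shaded) {
    for (int s = 0; s < MSAA_SAMPLES; ++s)
        if (mask & (1 << s)) samples[s] = shaded;
}

// Resolve: the final pixel color is the average of all of its samples.
__host__ __device__ float3 resolvePixel(const float3* samples) {
    float3 sum = make_float3(0.0f, 0.0f, 0.0f);
    for (int s = 0; s < MSAA_SAMPLES; ++s) {
        sum.x += samples[s].x;
        sum.y += samples[s].y;
        sum.z += samples[s].z;
    }
    return make_float3(sum.x / MSAA_SAMPLES, sum.y / MSAA_SAMPLES, sum.z / MSAA_SAMPLES);
}
```

A second triangle touching other samples of the same pixel would get its own shading call and mask, which is where the 5x worst case in the note comes from.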

-TODO:
-add a struct for AAfrag [done]
-add a new AAdepth_buffer [done]
-add a new AArender [done]
-add a new AAscanline [done]
-add fragment shaders [done]
-add antialiasing to rasterize() [done]

TILING
-need to tile scanline and fragment shading
-so need to make shared memory versions of each
-need to use __syncthreads()
-go look at how the work efficient version was done

TODO:
-add a spatial data structure
-we'll need one for each tile
-each needs room to hold a list of every primitive (yikes)
-so spatial data structure size and allocation all needs to be done in one go
-also need a fast kernel to "dump" transformed primitives into each structure
-also will need to add a special destruction handler to destroy each list (yikes)
-add a new tiled scanline -> AA version if we can get to it. doubtful.
-basically, uses shared memory
-so needs to clip to smaller coordinate area, use lower resolution
-fractional tiles are OK - be sure to handle this
-how to parallelize?
-over prims AND tiles? -> how to even do this? b/c need to ensure each tile's threads stay together.
-serial tiles, parallelize over prims in tile? -> just run a 250x for loop?
-this seems more likely.
-but then need to memcpy tile's primitive count over to host?
-only other option is to launch kernel assuming full primitives, early-retire those that don't have a prim. <- ok. doable. slow?
-but again: problem is sharing a chunk of depth buffer.

-ditto with fragment shading

-what if...
-we allocate primitives * [tiles to tessellate screen] (yikes, but similar to above)
-for each primitive, compute its AABB coords and determine a list of tiles that it covers
-some are occupied, some are not. tiles no longer need to store a list of internal prims
-then when we scanline each primitive, we parallelize per tile
-block size must be [tiles to tessellate screen]. share a chunk of depth buffer
-similar memory requirements, in theory. improvement in that there's potentially less fetching from global memory
-this idea makes a "tile" kind of like a giant fragment
-in case of cow at 800x800, that's 5803 * 50 * 50 fragment_tiles.
-this also allows stream compaction to reduce number of threads to launch
-hmmm... but idea is to allow a block to share a chunk of the depth buffer. possibly impossible with this implementation

-FUNDAMENTAL PROBLEM: idea is to share a chunk of the depth buffer, which limits the block size
-this CAN'T BE DONE if we parallelize by primitive!
-unless block size is dynamic to number of primitives. potentially bad idea.
-otherwise complicates things a lot
-b/c suddenly if we have multiple blocks trying to write to the same chunk of screen space
-then we have to do another atomic write comparison at the writeback stage.
-which we CAN do, it's just slow.
-but I suppose... if the goal is just to access the global frame buffer LESS...
-this could be doable
-the only hope is that not all the primitives are in the same chunk of screen space!

-the polygon list implementation (fewer tiles, more data per tile) appears to be how it's done. ok.
-risk: low occupancy, unless we can figure out how to parallelize over both tiles and polygon lists. may just need stream compaction to cull lists that don't do stuff.

-scratch that: FUNDAMENTAL PROBLEM: the idea is to share a chunk of the depth buffer, which limits the block size
-assume we try to parallelize across all tiles and all primitives
-so, in case of cow at 800x800, grid with 5803 * 50 * 50 kernel instances.
-each block however can only assess one tile. that's the whole point of using shared memory.
-possible to resolve blocks that end up with primitives spread across tiles, but ruins the point.
-this requires thrashing global memory all over again for threads that aren't in the "current tile"
-buuuuuut wait. maybe this is only a problem if we do a stream compaction run!
-otherwise, primitives can be buffered apart by threads that don't do anything!
-potentially lots of wasted threads, but it should work without having to thrash global framebuffer!
-so when we allocate list for each tile, must allocate a multiple of block size.
-well. or we could handle this in indexing. probably better that way.
-theoretically similar to serializing though
-though serializing gives low occupancy when there aren't many prims
-so this should perform better in case of, say, few prims at a large screen size
-and should perform similarly in other cases, b/c effectively serialized by thread controller

REVISED TODO
-add a define for shared memory size. [done]
-add a struct for TILE [done]
-holds pointer to a list of triangle indices. I suspect copying these over isn't going to help
-holds int of number of triangles (aka point at which to "insert")
-also holds either NDC min/max or world min/max. check what rasterize does again
-add a buffer of TILEs [done]
-add a TILES setup function [done]
-add a RASTERIZE TILE PRIMITIVE function <- the biggie. the realllll biggie.
-needs to use a shared memory chunk
-needs to correctly compute index of tile and index of primitive within tile (yikes)
-needs to use atomics both in scanline portion and in writeback portion

-add tiled fragment shading (shouldn't be hard)
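The TILE bookkeeping above (a list of triangle indices plus a count per tile, all allocated in one go) could be laid out as two flat arrays filled by a binning kernel that reserves slots with atomicAdd. A sketch assuming 16x16-pixel tiles (which would give the 50 x 50 tile grid mentioned for 800x800) and a fixed per-tile capacity; `tileRange`, `binPrimitives`, `tileCounts`, and `tilePrims` are hypothetical names standing in for the TILE struct's fields.

```cuda
#include <cuda_runtime.h>
#include <cmath>

// Hypothetical helper: inclusive tile range covered by a screen-space AABB, clamped
// to the screen. Returns false when the box misses the screen entirely, so
// off-screen primitives are not binned into the edge tiles.
__host__ __device__ bool tileRange(float minX, float minY, float maxX, float maxY,
                                   int tileSize, int tilesX, int tilesY,
                                   int* tx0, int* ty0, int* tx1, int* ty1) {
    int x0 = (int)floorf(minX / tileSize), y0 = (int)floorf(minY / tileSize);
    int x1 = (int)floorf(maxX / tileSize), y1 = (int)floorf(maxY / tileSize);
    if (x1 < 0 || y1 < 0 || x0 >= tilesX || y0 >= tilesY) return false;
    *tx0 = x0 < 0 ? 0 : x0;
    *ty0 = y0 < 0 ? 0 : y0;
    *tx1 = x1 >= tilesX ? tilesX - 1 : x1;
    *ty1 = y1 >= tilesY ? tilesY - 1 : y1;
    return true;
}

// One thread per triangle. Tile t owns slots [t * maxPerTile, (t + 1) * maxPerTile)
// of one big tilePrims allocation; tileCounts[t] is the insertion point.
__global__ void binPrimitives(const float2* screenVerts, int numTris,
                              int tileSize, int tilesX, int tilesY, int maxPerTile,
                              int* tileCounts, int* tilePrims) {
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    if (t >= numTris) return;
    float2 a = screenVerts[3 * t], b = screenVerts[3 * t + 1], c = screenVerts[3 * t + 2];
    float minX = fminf(a.x, fminf(b.x, c.x)), maxX = fmaxf(a.x, fmaxf(b.x, c.x));
    float minY = fminf(a.y, fminf(b.y, c.y)), maxY = fmaxf(a.y, fmaxf(b.y, c.y));
    int tx0, ty0, tx1, ty1;
    if (!tileRange(minX, minY, maxX, maxY, tileSize, tilesX, tilesY,
                   &tx0, &ty0, &tx1, &ty1)) return;
    for (int ty = ty0; ty <= ty1; ++ty) {
        for (int tx = tx0; tx <= tx1; ++tx) {
            int tile = ty * tilesX + tx;
            int slot = atomicAdd(&tileCounts[tile], 1);  // reserve a slot in this tile's list
            // Overflowing triangles are dropped; readers should use min(count, maxPerTile).
            if (slot < maxPerTile) tilePrims[tile * maxPerTile + slot] = t;
        }
    }
}
```

In line with the "added tile bin wiping between draws" commit, tileCounts would need resetting (for example with cudaMemset) before each draw; tilePrims itself can be left stale because only the first count entries are read.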

-so: according to http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0363d/CJAEEJCF.html,
it's OKAY to serialize tiles?
-where does this get us?
-so fully parallelizing (running all tiles at once) improves occupancy but also leads to a lot of wasted threads
-but bottom line problem: a tile may be spread across multiple blocks if there's a lot of primitives
-in which case the tile's writeback is no better than normal writeback
-how to fit a single tile's work entirely into one block's shared memory? -> might not be possible
-unless the block is just really really huge and we run each tile individually
-but this constrains the number of polygons we can use

-how about hybrid parallel/serial approach with parallelization per tile? <- THIS. THIS MAKES THE MOST SENSE
-each block gets one tile
-each thread processes primitives in serial, using atomics to write to the block memory
-entire output is written out to the frame buffer; no need for atomics here
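The hybrid approach above (one block per tile, threads looping over the tile's primitives, atomics only in shared memory) can be sketched as a depth-only kernel. Assumptions: 16x16 tiles, bins stored as flat count/index arrays, depth in [0, 1] quantized to an int so `atomicMin` applies (the int depth grid idea from the commit log), and a brute-force per-pixel barycentric test swapped in for the project's scanline. `rasterizeTileDepth`, `depthToInt`, `edgeFn`, and `barycentric` are hypothetical; fragment shading and recording which primitive won each pixel are left out.

```cuda
#include <cuda_runtime.h>
#include <climits>
#include <cmath>

#define TILE_SIZE 16                          // assumed tile edge length in pixels
#define TILE_PIXELS (TILE_SIZE * TILE_SIZE)

// Hypothetical helper: quantize depth in [0, 1] to an int so atomicMin resolves races.
__host__ __device__ int depthToInt(float z) {
    z = fminf(fmaxf(z, 0.0f), 1.0f);
    return (int)(z * 16777215.0f);            // 24-bit depth; 2^24 - 1 is exact in float
}

// Signed edge function: twice the signed area of triangle (a, b, p).
__host__ __device__ float edgeFn(float2 a, float2 b, float2 p) {
    return (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x);
}

// Barycentric weights of p; false if p is outside or the triangle is degenerate.
__host__ __device__ bool barycentric(float2 p, float2 a, float2 b, float2 c, float3* w) {
    float area = edgeFn(a, b, c);
    if (fabsf(area) < 1e-12f) return false;
    float w0 = edgeFn(b, c, p) / area;
    float w1 = edgeFn(c, a, p) / area;
    float w2 = edgeFn(a, b, p) / area;
    if (w0 < 0.0f || w1 < 0.0f || w2 < 0.0f) return false;
    *w = make_float3(w0, w1, w2);
    return true;
}

// One block per tile; each thread walks its share of the tile's primitive list in
// serial. Depth races are resolved with atomicMin in shared memory only; the final
// writeback needs no atomics because no other block owns these pixels.
__global__ void rasterizeTileDepth(const float3* screenVerts,  // x, y in pixels, z in [0, 1]
                                   const int* tileCounts, const int* tilePrims,
                                   int maxPerTile, int tilesX, int width, int height,
                                   int* depthBuffer) {
    __shared__ int sDepth[TILE_PIXELS];
    int tile = blockIdx.x;
    int originX = (tile % tilesX) * TILE_SIZE;
    int originY = (tile / tilesX) * TILE_SIZE;

    for (int i = threadIdx.x; i < TILE_PIXELS; i += blockDim.x) sDepth[i] = INT_MAX;
    __syncthreads();

    int count = tileCounts[tile] < maxPerTile ? tileCounts[tile] : maxPerTile;
    for (int k = threadIdx.x; k < count; k += blockDim.x) {
        int prim = tilePrims[tile * maxPerTile + k];
        float3 a = screenVerts[3 * prim], b = screenVerts[3 * prim + 1], c = screenVerts[3 * prim + 2];
        float2 a2 = make_float2(a.x, a.y), b2 = make_float2(b.x, b.y), c2 = make_float2(c.x, c.y);
        for (int i = 0; i < TILE_PIXELS; ++i) {
            int px = originX + i % TILE_SIZE, py = originY + i / TILE_SIZE;
            if (px >= width || py >= height) continue;  // fractional tiles at the screen edge
            float3 w;
            if (!barycentric(make_float2(px + 0.5f, py + 0.5f), a2, b2, c2, &w)) continue;
            float z = w.x * a.z + w.y * b.z + w.z * c.z;
            atomicMin(&sDepth[i], depthToInt(z));
        }
    }
    __syncthreads();

    for (int i = threadIdx.x; i < TILE_PIXELS; i += blockDim.x) {
        int px = originX + i % TILE_SIZE, py = originY + i / TILE_SIZE;
        if (px < width && py < height) depthBuffer[py * width + px] = sDepth[i];
    }
}
```

The grid would be tilesX * tilesY blocks; the block size only controls how many of a tile's primitives are processed concurrently, so a tile with few primitives leaves most threads idle (the low-occupancy risk noted above).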

TODO README:
-describe features
-instancing
-MSAA, FSAA
-Tiling
-cows scene
-zoomed in cube scene

ANALYSIS - INSTANCING (no AA)
-single cow: 5804 triangles, 2903 vertices
-6 cows: 34824 triangles, 17418 verts
-single cow, host multicow, and device multicow all run at 60 fps
-cube scene: on the range of 18 fps untiled