Pr/batched gen ops #1413
Merged: lgritz merged 3 commits into AcademySoftwareFoundation:master from AlexMWells:PR/BatchedGenOps on Oct 5, 2021
Added batched llvm code gen for: llvm_gen_blackbody, llvm_gen_luminance, llvm_gen_transformc (including code gen of a loop that identifies unique "from" and "to" space combinations to call the library function with).

Made ColorSystem const correct. Pulled much of ColorSystem's implementation into src/liboslexec/opcolor_impl.h so it can be inlined in both opcolor.cpp and wide_opcolor.cpp.

To enable a SIMD fast path for ColorSystem::blackbody_rgb, added new methods to expose the underlying lookup table optimization:
    bool can_lookup_blackbody(float T /*Kelvin*/) const;
    Color3 lookup_blackbody_rgb (float T /*Kelvin*/) const;
    Color3 compute_blackbody_rgb (float T /*Kelvin*/) const;
When can_lookup_blackbody() returns true, lookup_blackbody_rgb can be safely called; otherwise the expensive compute_blackbody_rgb is required.

Updated the implementation of ColorSystem to successfully vectorize on multiple compilers, and optimized lookup table data types and indexed dereferences to generate the preferred 32-bit indexed gathers rather than 64-bit gathers, which require multiple instructions and registers.

To improve code generation and reduce the chance of aliasing issues, changed the nested M[i][j] operator access to Matrix in ImathMatrix.h to directly access the underlying 2D array M.x[i][j], and changed the Vector operator V[i] to directly access V.x, V.y, V.z in OSL/IMathx/IMath*.h.

Moved helper testIfAnyLaneIsNonZero from wide_opmatrix.cpp to OSL/wide.h.

Added sfm::min_val to provide an implementation of min that uses a ternary and returns values rather than references to the originals (returning references can cause issues with vectorization). Added Clang-specific sfm::min_val and sfm::max_val implementations to assist in vectorization.
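The lookup/compute split described in this commit message can be sketched as below. This is a minimal standalone illustration, not the OSL implementation: the `BlackbodySketch` class, the `Color3` struct, the table bounds, the marker return values, and the `blackbody_batch` helper are all assumptions made for the example. Only the method names and signatures come from the commit message.

```cpp
#include <cstddef>

struct Color3 { float x, y, z; };  // stand-in for OSL's Color3

// Assumed table range; the real bounds are not stated in the PR text.
constexpr float kTableMinK = 800.0f;
constexpr float kTableMaxK = 12000.0f;

// Hypothetical stand-in for ColorSystem.
class BlackbodySketch {
public:
    // True when T falls inside the (assumed) precomputed table range.
    bool can_lookup_blackbody(float T /*Kelvin*/) const {
        return T >= kTableMinK && T <= kTableMaxK;
    }
    // Cheap path: placeholder for a table read. Only valid when
    // can_lookup_blackbody(T) is true.
    Color3 lookup_blackbody_rgb(float T /*Kelvin*/) const {
        (void)T;
        return Color3{1.0f, 1.0f, 1.0f};  // marker value, not real data
    }
    // Expensive path: placeholder for the full spectral computation.
    Color3 compute_blackbody_rgb(float T /*Kelvin*/) const {
        (void)T;
        return Color3{2.0f, 2.0f, 2.0f};  // marker value, not real data
    }
    // Scalar entry point expressed in terms of the three methods above.
    Color3 blackbody_rgb(float T /*Kelvin*/) const {
        return can_lookup_blackbody(T) ? lookup_blackbody_rgb(T)
                                       : compute_blackbody_rgb(T);
    }
};

// Batched use (an assumed strategy): if every lane can use the table,
// run a branch-free loop that a compiler can vectorize; otherwise fall
// back to the per-lane scalar entry point.
inline void blackbody_batch(const BlackbodySketch& cs, const float* T,
                            Color3* out, std::size_t n) {
    bool all_lookup = true;
    for (std::size_t i = 0; i < n; ++i)
        all_lookup = all_lookup && cs.can_lookup_blackbody(T[i]);
    if (all_lookup) {
        for (std::size_t i = 0; i < n; ++i)
            out[i] = cs.lookup_blackbody_rgb(T[i]);
    } else {
        for (std::size_t i = 0; i < n; ++i)
            out[i] = cs.blackbody_rgb(T[i]);
    }
}
```

Keeping the range test separate from the table read is what lets a caller hoist the check out of the per-lane loop; how the batched backend actually uses it is not shown in the PR text.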
Enabled BATCHED execution of existing testsuites: blackbody, color, transformc, wavelength_color.

Added new BATCHED regression tests to the testsuite: blackbody-reg, color-reg, luminance-reg, transformc-reg, wavelength_color-reg.

Signed-off-by: Alex M. Wells <[email protected]>
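The by-value min described for sfm::min_val can be illustrated with a short sketch. The `sketch` namespace and these template bodies are assumptions for illustration; the actual sfm implementations, including the Clang-specific variants, are not shown in the PR text and may differ. The contrast is with `std::min` and `std::max`, which return `const T&` to one of their arguments.

```cpp
namespace sketch {

// Returns a copy of the smaller argument, selected with a ternary.
// Taking and returning T by value avoids handing back a reference
// to either input.
template <typename T>
inline T min_val(T a, T b) {
    return (b < a) ? b : a;
}

// Returns a copy of the larger argument.
template <typename T>
inline T max_val(T a, T b) {
    return (a < b) ? b : a;
}

}  // namespace sketch
```

For example, `sketch::min_val(3.0f, 2.0f)` yields `2.0f` as a plain `float`, so the result has no address tied to the inputs.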
Use single precision for the bb_spectrum object/operator and manually implement std::pow(wlm,-5.0f) to better enable vectorization.

Reduced masked code generation for rgb_to_hsl.

Removed duplicated code by implementing ColorSystem::blackbody_rgb(T) in terms of ColorSystem::can_lookup_blackbody(T), ColorSystem::lookup_blackbody_rgb(T), and ColorSystem::compute_blackbody_rgb(T).

Removed some unnecessary casting to uint16_t. Removed some test shaders that were not intended to be promoted.

Signed-off-by: Alex M. Wells <[email protected]>
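One way to expand std::pow(wlm,-5.0f) by hand is shown below. This is a sketch under assumptions: the function names are hypothetical and the exact expansion used in the commit is not given in the PR text. The idea is that a fixed small integer exponent reduces to a few multiplications and one division, which compilers vectorize more readily than a general pow call.

```cpp
#include <cmath>

// Reference form: a general-purpose pow call in single precision.
inline float pow_neg5_reference(float wlm) {
    return std::pow(wlm, -5.0f);
}

// Manual form: wlm^-5 computed as 1 / (wlm^4 * wlm) using three
// multiplications and one division.
inline float pow_neg5_manual(float wlm) {
    const float w2 = wlm * wlm;  // wlm^2
    const float w4 = w2 * w2;    // wlm^4
    return 1.0f / (w4 * wlm);    // wlm^-5
}
```

The two forms may differ in the last bits of the result because the rounding steps differ, so any comparison between them should use a tolerance.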
…_op, clamp, get_simple_SG_field, isconstant, select, unary_op, mix

Fixed bug in BatchedAnalysis where the complement operator was being treated as always having a boolean result. In reality, the result of a complement or other bitwise operation is boolean only if its input parameters were forced to be boolean (left TODO note for future improvement).

Fixed bug in printf where an integer that is forced_llvm_bool() and is formatted as a float was not converted to an integer first.

Enabled BATCHED execution for tests: and-or-not-synonyms, isconstant, logic, select.

Expanded testsuite/shaderglobals to exercise/access all shader global data members.

Add regression tests: andor-reg, bitwise-and-reg, bitwise-or-reg, bitwise-shl-reg, bitwise-shr-reg, bitwise-xor-reg, complement-reg, mix-reg, select-reg

Signed-off-by: Alex M. Wells <[email protected]>
lgritz approved these changes on Oct 5, 2021
LGTM
Description
Implement batched llvm code generation for ops:
NOTE: clamp may be unreachable because stdosl.h removed the builtin in 2010, but unless clamp is removed as an operation throughout oso and the scalar implementation, it seemed it should be here.
NOTE: andor may be unreachable because the "and" and "or" built-ins were removed in commit f7e8ed3 (May 2011, release 0.5.4), but unless "and" and "or" are removed as operations throughout oso and the scalar implementation, it seemed they should be here.
Fixed bug in BatchedAnalysis where the complement operator was being treated as always having a boolean result. In reality, the result of a complement or other bitwise operation is boolean only if its input parameters were forced to be boolean (left TODO note for future improvement).
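A small sketch shows why a complement result cannot be assumed boolean. The function names are hypothetical, and treating a forced-boolean complement as a logical not is an assumption for illustration; the PR text does not say how the batched backend computes it. On an ordinary 32-bit integer, complementing 0 or 1 produces a value outside {0, 1}.

```cpp
#include <cstdint>

// Bitwise complement on a full-width integer: flips all 32 bits,
// so the result of complementing 0 or 1 is not itself 0 or 1.
inline int32_t complement_as_int(int32_t v) {
    return ~v;
}

// Assumed boolean case: when the operand is known to be a boolean,
// complement reduces to logical negation and stays boolean.
inline bool complement_as_bool(bool v) {
    return !v;
}
```

With two's complement integers, `complement_as_int(1)` is `-2` and `complement_as_int(0)` is `-1`, so storing the result in a one-bit mask would lose information.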
Fixed bug in printf where an integer that is forced_llvm_bool() and is formatted as a float was not converted to an integer first.
Tests
Enabled BATCHED execution for tests: and-or-not-synonyms, isconstant, logic, select.
Expanded testsuite/shaderglobals to exercise/access all shader global data members.
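Add regression tests: andor-reg, bitwise-and-reg, bitwise-or-reg, bitwise-shl-reg, bitwise-shr-reg, bitwise-xor-reg, complement-reg, mix-reg, select-reg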