Configuration flags for Linux Release

Overview

The Linux release build lets users enable selected configuration flags. The flags become available after installing the release build according to the instructions here. This file is autogenerated from igc_flags.h.

Important notice

Configuration flags are generally used either for debug purposes or to experimentally change the compiler's behavior. Intel does not guarantee full performance and conformance when using configuration flags.

How to enable a flag

A flag is enabled by setting it as an environment variable.

The syntax is as follows:

IGC_<flag>=<value>

For example, to enable the ShaderDumpEnable flag in a shell:

$ export IGC_ShaderDumpEnable=1
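As a rough sketch of the same mechanism, a flag can also be scoped to a single command instead of being exported for the whole session, and removed again afterwards. Here `sh -c` only stands in for a real application that invokes the compiler; it is a placeholder, not part of IGC.

```shell
# Export the flag for every process started from this shell.
export IGC_ShaderDumpEnable=1

# Scope a flag to one command only; `sh -c` is a stand-in for a real application.
scoped=$(IGC_DumpToCurrentDir=1 sh -c 'echo "$IGC_DumpToCurrentDir"')

# List the IGC_ variables currently visible to child processes.
igc_vars=$(env | grep '^IGC_')
echo "$igc_vars"

# Remove the flag again so later runs use the default behavior.
unset IGC_ShaderDumpEnable
```

The command-scoped form is convenient for one-off experiments because the flag does not leak into later runs.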

VISA optimization

Flag	Description	Release builds
`AssumeUniformIndirectCall`	Assume indirect call is uniform to avoid looping code	-
`AvoidDstSrcGRFOverlap`	avoid GRF overlap for destination and source operands of an SIMD16/SIMD32 instruction	-
`AvoidSrc1Src2Overlap`	avoid src1 and src2 GRF overlap to avoid the conflict without read suppression	-
`CSSIMD16_SpillThreshold`	Percentage of instructions allowed for spilling on CS SIMD16	-
`CSSIMD32_SpillThreshold`	Percentage of instructions allowed for spilling on CS SIMD32	-
`DPASTokenReduction`	optimization to reduce the tokens used for DPAS instruction.	Available
`DisableCSEL`	disable csel peep-hole	-
`DisableFlagOpt`	Disable optimization cmp with logic op	-
`DisableGatherRSFusionSyncWA`	Disable WA for gather instruction when read suppression and EU fusion are enabled.	Available
`DisableHFMath`	Disables HF math instructions.	-
`DisableIfCvt`	Disable ifcvt	-
`DisableMixMode`	Disables mix mode in vISA BE.	-
`DisableRegDistDep`	Disable regDist dependence	Available
`DisableSendS`	Setting this to 1/true adds a compiler switch to not generate sends commands, default is to enable sends	-
`DisableThreeALUPipes`	Disable three ALU Pipelines. XeHP only	Available
`DisableWriteCombine`	Disable write combine. PVC+ only	-
`DumpASMToConsole`	Dump ASM to console and do early exit	Available
`DumpPromoteI8`	Dump useful info during promoting i8 to i16	Available
`DumpVISAASMToConsole`	Dump VISAASM to console and do early exit	Available
`Enable16DWURBWrite`	Enable 16 Dword URB Write messages	Available
`Enable16OWSLMBlockRW`	Enable 16 OWord (8 GRF) SLM block read/write message	Available
`Enable64BMediaBlockRW`	Enable 64 byte wide media block read/write message	Available
`EnableAdd3`	Enable Add3. XeHP+ only	Available
`EnableAtomicFusion`	To enable/disable atomic send fusion (simd8 shaders). Valid if EnableSendFusion is on.	-
`EnableBCR`	Enable bank conflict reduction.	Available
`EnableBfn`	Enable Bfn. XeHP+ only	Available
`EnableCallUniform`	[tmp, testing] Ignore indirect call's uniform	Available
`EnableCallWA`	Control call WA when EU fusion is on. 0: off; 1: on	Available
`EnableCoalesceScalarMoves`	Enable scalar moves to be coalesced into fewer moves	Available
`EnableForceDebugSWSB`	Enable force debugging functionality for software scoreboard generation	Available
`EnableGroupScheduleForBC`	Enable bank conflict reduction in scheduling.	Available
`EnableHWGenerateThreadID`	Enable new behavior of HW generating threadID for GPGPU pipe. XeHP and non-OCL only.	Available
`EnableHWGenerateThreadIDForTileY`	Enable HW generating threadID for GPGPU pipe for TileY mode. XeHP and non-OCL only.	Available
`EnableIGAEncoder`	Enable VISA IGA encoder	-
`EnableIGASWSB`	Use IGA for SWSB	Available
`EnableMathDPASWA`	PVC math instruction running with DPAS issue	-
`EnableNonOCLWalkOrderSel`	Enable WalkOrder selection for HW generating threadID for GPGPU pipe. XeHP and non-OCL only.	Available
`EnablePassInlineData`	1: Force pass 1st GRF of cross-thread payload as inline data; -1: Force disable passing inline data	Available
`EnablePreemption`	Enable generating preemptable code (SKL+)	-
`EnablePromoteI8`	Enable promoting i8 (char) to i16 on all ALU insts that do support i8. It's only for XeHPC+ for now.	Available
`EnablePromoteI8Vec`	Control if a certain i8 vector needs to be promoted (detail in code)	Available
`EnablePvtMemHalfToFloat`	Enable conversion from half to float for private memory.	Available
`EnableQWRotateInstructions`	Enable QW type support for rotate instructions. PVC only.	Available
`EnableQuickTokenAlloc`	Insert dependence resolve for kernel stitching	Available
`EnableSWSBInstStall`	Enable forced stall at a specific (start) instruction for software scoreboard generation	Available
`EnableSWSBInstStallEnd`	Enable forced stall at the end instruction for software scoreboard generation	Available
`EnableSWSBStitch`	Insert dependence resolve for kernel stitching	Available
`EnableSWSBTokenBarrier`	Enable force specific instruction as a barrier for software scoreboard generation	Available
`EnableSendFusion`	Enable(!=0)/disable(0)/force(2) send fusion. Valid for simd8 shader/kernel only.	-
`EnableSeparateScratchWA`	Apply the workaround in slot0 and slot1 sizes when separating scratch spaces.	Available
`EnableSpillSpaceCompression`	Enable spill space compression. 0 - off, 1 - on, 2 - platform default	-
`EnableUntypedSurfRWofSS`	Enable untyped surface RW to scratch space. XeHP A0 only.	Available
`EnableVISABinary`	Enable VISA Binary	Available
`EnableVISABoundsChecking`	Enable VISA bounds checking.	-
`EnableVISADebug`	Runs VISA in debug mode, all optimizations disabled	-
`EnableVISADotAll`	Enable VISA DotAll. Dumps dot files for intermediate stages	-
`EnableVISADumpCommonISA`	Enable VISA Dump Common ISA	Available
`EnableVISAJmpi`	Enable/Disable VISA generating jmpi (scalar jump).	-
`EnableVISANoBXMLEncoder`	Enable VISA No-BXML encoder	-
`EnableVISANoSchedule`	Enable VISA No-Schedule	Available
`EnableVISAOutput`	Enable VISA GenISA output	Available
`EnableVISAPreSched`	Enable VISA Pre-RA Scheduler	Available
`EnableVISASlowpath`	Enable VISA Slowpath. Needed to dump .visaasm	Available
`EnableVISAStructurizer`	Enable/Disable VISA structurizer. See value defs in igc_flags.hpp.	-
`ExpandPlane`	Enable pln to mad macro expansion.	-
`Force32bitConstantGEPLowering`	Go back to old version of GEP lowering for constant address space. PVC only	-
`ForceAllowSmallSpill`	Allow small spills regardless of SIMD, API, or platform. The spill amount is set below	-
`ForceBCR`	Force bank conflict reduction, no matter spill or not.	Available
`ForceHWThreadNumberPerEU`	Total HW thread number per-EU.	-
`ForceInlineDataForXeHPC`	Force InlineData for XeHPC. For testing purposes.	Available
`ForceNoMaskWA`	[tmp, testing] Force NoMaskWA on any platforms	-
`ForcePreemptionWA`	Force generating preemptable code across platforms	Available
`ForcePreserveR0`	Setting this to true makes VISA preserve r0 in r0	Available
`ForcePromoteI8`	Force promoting i8 (char) to i16 on all ALU insts (for testing).	Available
`ForceSubReturn`	If a subroutine does not have a return, generate a dummy return if this key is set (to meet visa requirement)	-
`ForceTexelMaskClear`	If set to 1 or 2, forces evaluate messages to clear the texel mask to 0 or 1, respectively.	Available
`ForceUniformBuffer`	Force buffer operand to be uniform	-
`ForceUniformSurfaceSampler`	Force surface and sampler operand to be uniform	-
`ForceVISAPreSched`	Force enabling of VISA Pre-RA Scheduler	-
`ForceVISAStructurizer`	Force VISA structurizer for testing. Used on platforms on which we turn off SCF and use UCF by default	-
`GlobalSendVarSplit`	Enable global send variable splitting when we are about to spill	-
`NewSpillCostFunction`	Use new spill cost function in VISA RA	-
`NoMaskWA`	Enable NoMask WA by using software-computed emask flag	-
`ReplaceIndirectCallWithJmpi`	Replace indirect call with jmpi instruction (HW WA)	Available
`ReservedRegisterNum`	Reserve register number for spill cost testing.	-
`SIMD16_SpillThreshold`	Percentage of instructions allowed for spilling on SIMD16	-
`SIMD32_SpillThreshold`	Percentage of instructions allowed for spilling on SIMD32	-
`SIMD8_SpillThreshold`	Percentage of instructions allowed for spilling on SIMD8	-
`SWSBMakeLocalWAR`	make WAR SBID dependence tracking BB local	Available
`SWSBTokenNum`	Total tokens used for SWSB.	Available
`ScratchSpaceSizeLimit`	Size limit of scratch space. XeHP and above only. Test only. Remove it once stabilized.	Available
`ScratchSpaceSizeReserved`	Reserved size of scratch space. XeHP and above only. Test only. Remove it once stabilized.	Available
`SeparateSpillPvtScratchSpace`	Separate scratch spaces for spill/fill and private memory. XeHP and above only. Test only. Remove it once stabilized.	Available
`SetA0toTdrForSendc`	Set A0 to tdr0 before each sendc/sendsc	Available
`SpillCompressionThresholdOverride`	Set a threshold number (1K based) to run with spill compression	-
`TotalGRFNum`	Total GRF setting for both IGC-LLVM and vISA	-
`TotalGRFNum4CS`	Total GRF setting for both IGC-LLVM and vISA, for ComputeShader-only experiment.	-
`UnifiedSendCycle`	Using unified send cycle.	-
`Use16ByteBindlessSampler`	True if 16-byte aligned bindless sampler state is used	-
`UseLinearScanRA`	use Linear Scan as default register allocation algorithm	-
`UseMathWithLUT`	Use the implementations of cos, cospi, log, sin, sincos, and sinpi with Look-Up Tables (LUT).	-
`VISALTO`	vISA LTO optimization flags. check LINKER_TYPE for more details	-
`VISAOptions`	Options to vISA. Space-separated options.	Available
`VISAPostScheduleEndBBID`	The ID of BB which will be last scheduled	-
`VISAPostScheduleStartBBID`	The ID of BB which will be first scheduled	-
`VISAPreSchedCtrl`	Configure Pre-RA Scheduler, default(0), logging(1), latency(2), pressure(4)	-
`VISAPreSchedExtraGRF`	Bump up GRF number to make pre-RA Scheduling more greedy, 0 for the default	-
`VISAPreSchedRPThreshold`	Threshold to commit a pre-RA Scheduling without spills, 0 for the default	-
`VISAScheduleEndBBID`	The ID of BB which will be last scheduled	-
`VISAScheduleStartBBID`	The ID of BB which will be first scheduled	-
`WARSWSBLocalEnd`	WAR localization end BB	Available
`WARSWSBLocalStart`	WAR localization start BB	Available
`disableCompaction`	Disables compaction.	Available
`disableIGASyntax`	Disables GEN isa text output using IGA and new syntax.	-
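Since `VISAOptions` takes a space-separated list, the value needs quoting in the shell so that it stays a single variable. The sketch below illustrates this; the option names `-optionA` and `-optionB` are placeholders invented for the example and are not real vISA options.

```shell
# Placeholder option names, not real vISA options; substitute actual ones.
visa_opts="-optionA -optionB"

# Quote the value so the shell keeps the space-separated list in one variable.
export IGC_VISAOptions="$visa_opts"

# Count the options by letting the shell split the value on spaces.
set -- $IGC_VISAOptions
opt_count=$#
echo "IGC_VISAOptions carries $opt_count options"
```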

IGC Optimization

Flag	Description	Release builds
`AllowMem2Reg`	Setting this to true makes IGC run mem2reg even when optimizations are disabled	Available
`BlockPushConstantGRFThreshold`	Set the maximum limit for block push constants i.e. UBO data pushed. Set to 0xFFFFFFFF to use the default threshold for the platform. Note that for small pixel shaders the PayloadSizeThreshold may be the limiting factor.	-
`CodeLoopSinkingMinSize`	Don't sink in the loop if the number of instructions in the kernel is less	-
`CodeSinkingLoadSchedulingInstr`	Instructions number to step to schedule loads in advance before the load use to cover latency. 1 to insert it immediately before use	-
`CodeSinkingMinSize`	Don't sink if the number of instructions in the kernel is less	-
`DisableAttributePush`	Bit mask to disable push Attribute per shader stages. bit0 = All, Bit 1 = VS, Bit 2 = HS, Bit 3 = DS, Bit 4 = GS	-
`DisableBranchSwaping`	Setting this to 1/true adds a compiler switch to disable branch swapping.	-
`DisableCodeHoisting`	Setting this to 1/true adds a compiler switch to disable code-hoisting	-
`DisableCodeSinking`	Setting this to 1/true adds a compiler switch to disable code-sinking	-
`DisableCodeSinkingInputVec`	Setting this to 1/true disable sinking inputVec inst (test)	-
`DisableConstBaseGlobalBaseArg`	Do not generate kernel implicit arguments: constBase and globalBase	-
`DisableConstantCoalescing`	Setting this to 1/true adds a compiler switch to disable constant coalescing	-
`DisableConstantCoalescingOfStatefulNonUniformLoads`	Disable merging non-uniform loads from stateful buffers. Note: does not affect merging to sampler loads	-
`DisableConstantCoalescingOutOfBoundsCheck`	Setting this to 1/true adds a compiler switch to disable constant coalescing out of bounds check	-
`DisableCustomUnsafeOpt`	Disable IGC to run custom unsafe optimizations	-
`DisableDX9LowPrecision`	Disables HF in DX9.	-
`DisableDotAddToDp4aMerge`	Disable Dot and Add ops to Dp4a merge optimization.	-
`DisableDynamicResInfoFolding`	Disable Dynamic ResInfo Instruction Folding	-
`DisableDynamicTextureFolding`	Disable Dynamic Texture Folding	-
`DisableEmptyBlockRemoval`	Setting this to 1/true adds a compiler switch to disable empty block optimization	-
`DisableFDivReassociation`	Disable reassociation for Fdiv operations to avoid precision difference	-
`DisableFlattenSmallSwitch`	Disable the flatten small switch pass	-
`DisableGatingSimilarSamples`	Disable Gating of similar sample instructions	-
`DisableIGCOptimizations`	Setting this to 1/true adds a compiler switch to disable all the above IGC optimizations	-
`DisableIPConstantPropagation`	Disable inter-procedural constant propagation	-
`DisableIRVerification`	Setting this to 1/true adds a compiler switch to disable IGC IR verification.	-
`DisableImmConstantOpt`	Disable IGC IndirectICBPropagaion optimization	-
`DisableLLVMGenericOptimizations`	Disable LLVM generic optimization passes	-
`DisableLoadSinking`	Setting this to 1/true adds a compiler switch to disable load sinking during retry	-
`DisableLoopSink`	Disable sinking in all loops	-
`DisableLoopSplitWidePHIs`	Disable splitting of loop PHI values to eliminate subvector extract operations	-
`DisableLoopUnroll`	Setting this to 1/true adds a compiler switch to disable loop unrolling.	Available
`DisableMCSOpt`	Disable IGC to run MCS optimization	-
`DisableMatchFloor`	Setting this to 1/true adds a compiler switch to disable sub-frc = floor optimization	-
`DisableMatchMad`	Setting this to 1/true adds a compiler switch to disable mul+add = mad optimization	-
`DisableMatchPow`	Setting this to 1/true adds a compiler switch to disable log2/mul/exp2 = pow optimization	-
`DisableMatchPredAdd`	Setting this to 1/true adds a compiler switch to disable pred+add = predAdd optimization	-
`DisableMatchSimpleAdd`	Setting this to 1/true adds a compiler switch to disable simple cmp+and+add optimization	-
`DisableMovingInstanceIDIndexOfVS`	Disable moving index of InstanceID in VS to last location.	-
`DisablePayloadCoalescing`	Setting this to 1/true adds a compiler switch to disable payload coalescing optimization for all types	-
`DisablePayloadCoalescing_AtomicTyped`	Setting this to 1/true adds a compiler switch to disable payload coalescing optimization for atomic typed only	-
`DisablePayloadCoalescing_RT`	Setting this to 1/true adds a compiler switch to disable payload coalescing optimization for RT only	-
`DisablePayloadCoalescing_Sample`	Setting this to 1/true adds a compiler switch to disable payload coalescing optimization for Samplers only	-
`DisablePayloadCoalescing_URB`	Setting this to 1/true adds a compiler switch to disable payload coalescing optimization for URB writes only	-
`DisablePromotePrivMem`	Setting this to 1/true adds a compiler switch to disable IGC private array promotion	-
`DisablePullConstantHeuristics`	Disable the heuristics to determine the no. push constants based on payload size.	-
`DisablePushConstant`	Bit mask to disable push constant per shader stages. bit0 = All, Bit 1 = VS, Bit 2 = HS, Bit 3 = DS, Bit 4 = GS, Bit 5 = PS	-
`DisableRectListOpt`	Disable Rect List optimization	-
`DisableReducePow`	Disable IGC to reduce pow instructions	-
`DisableSIMD32Slicing`	Setting this to 1/true adds a compiler switch to disable emitting SIMD32 VISA code in slices	-
`DisableSimplePushWithDynamicUniformBuffers`	Disable Simple Push Constants Optimization for dynamic uniform buffers.	-
`DisableSqrtOpt`	Prevent IGC from doing the optimization y*y = x if y = sqrt(x)	-
`DisableStaticCheck`	Disable static check to push constants.	-
`DisableStaticCheckForConstantFolding`	Disable static check to fold constants.	-
`DisableSynchronizationObjectCoalescingPass`	Disable SynchronizationObjectCoalescing pass	-
`DisableURBPartialWritesPass`	Disable IGC pass that converts URB partial writes to full-mask writes.	-
`DisableURBReadMerge`	Disable IGC pass that merges URB Read instructions.	-
`DisableURBWriteMerge`	Setting this to 1/true adds a compiler switch to disable URB write merge	-
`DisableUniformAnalysis`	Setting this to 1/true adds a compiler switch to disable uniform_analysis	-
`DisableUniformTypedAccess`	Setting this will disable uniform typed access handling	-
`DisableUniformURBWrite`	Disables generation of uniform URB write messages	-
`EnableAtomicBranch`	Enable Atomic branch optimization that break atomic into if/else. 1: if Val == 0 ignore iadd/sub/umax 0. 2: checks if memory is lower than Val before doing umax. 3: applies both 1 for iadd/sub and 2 for umax	-
`EnableBitcastedLoadNarrowing`	Enable narrowing of vector loads in bitcasts patterns.	-
`EnableBitcastedLoadNarrowingToScalar`	Enable narrowing of vector loads to scalar ones in bitcasts patterns.	-
`EnableBlendToDiscard`	Enable blend to discard based on blend state.	-
`EnableBlendToFill`	Enable blend to fill based on blend state.	-
`EnableCodeAssumption`	If set (> 0), generate llvm.assume to help certain optimizations. It is OCL only for now. Only 1 and 2 are valid. 2 will be 1 plus additional assumption. It also does other minor changes.	-
`EnableCustomLoopVersioning`	Enable IGC to do custom loop versioning	-
`EnableDeSSA`	Setting this to 0/false adds a compiler switch to disable De-SSA	-
`EnableDeSSAWA`	[tmp]Keep some piece of code to avoid perf regression	-
`EnableExtractCommonMultiplier`	Enable ExtractCommonMultiplier optimization in CustomUnsafeOptPass.	-
`EnableFastMath`	Enable fast math optimizations in IGC	-
`EnableFastSampleD`	Enable fast sample D opt.	-
`EnableGEPLSR`	Enables GEP Loop Strength Reduction pass	-
`EnableGEPLSRAnyIntBitWidth`	Experimental: Enables reduction of SCEV with illegal integers. Requires legalization pass to clear up expanded code.	Available
`EnableGEPLSRToPreheader`	Enables reduction to loop's preheader in GEP Loop Strength Reduction pass	-
`EnableGVN`	Enable LLVM global value numbering	-
`EnableGenUpdateCB`	Enable derived constant optimization.	-
`EnableGenUpdateCBResInfo`	Enable derived constant optimization with resinfo.	-
`EnableHighestSIMDForNoSpill`	When there is no spill choose highest SIMD (compute shader only).	-
`EnableHoistDp3`	Enable dp3 Hoisting.	-
`EnableHoistMulInLoop`	Hoist multiply with loop invariant out of loop, FP unsafe	-
`EnableIndependentSharedMemoryFenceFunctionality`	Enable treating global memory fences as shared memory fences in SynchronizationObjectCoalescing pass	-
`EnableIntegerMad`	Setting this to 1/true adds a compiler switch to enable integer mul+add = mad optimization	-
`EnableJumpThreading`	Setting this to 1/true adds a compiler switch to enable llvm jumpThreading pass.	Available
`EnableLSCFence`	Enable LSC Fence in ConvertDXIL for devices that have LSC	-
`EnableLoadChainLoopSink`	Allow sinking of load address calculation when the load was sunk to the loop, even if the needed regpressure is achieved (only single use instructions)	-
`EnableLoadsLoopSink`	Allow sinking of loads in the loop	-
`EnableLogicalAndToBranch`	Enable convert logical AND to conditional branch	-
`EnableLoopHoistConstant`	Enables pass to check for specific loop patterns where variables are constant across all but the last iteration, and hoist them out of the loop.	-
`EnableNewTileYCheck`	Enable new TileY check. 0 - off, 1 - on, 2 - platform default	-
`EnableOptReportLoadNarrowing`	Generate opt report for narrowing of vector loads.	-
`EnablePingPongTextureOpt`	Enables the Ping Pong texture optimization which is used only for Compute Shaders for back to back dispatches	-
`EnablePlatformFenceOpt`	Force fence optimization	-
`EnablePowToLogMulExp`	Enable pow to exp(log(x)*y) optimization in CustomUnsafeOptPass.	-
`EnableRobustBufferAccessPush`	Setting to 1/true will allow a single push buffer to be supported when the client requests robust buffer access (DG2+ only)	-
`EnableSLMConstProp`	Enable SLM constant propagation (compute shader only).	-
`EnableSamplerChannelReturn`	Setting this to 1/true adds a compiler switch to enable using header to return selective channels from sampler	-
`EnableSimplePushSizeBasedOpimization`	Enable the simplepush optimization to do push based on size	-
`EnableSimplifyGEP`	Enable IGC to simplify indices expr of GEP.	-
`EnableSoftwareStencil`	Enable software stencil for PS.	-
`EnableSoftwareVertexFetch`	Enable software vertex fetch for VS.	-
`EnableSplitIndirectEEtoSel`	Enable the split indirect extractelement to icmp+sel pass	-
`EnableSplitUnalignedVector`	Enable Splitting of unaligned vectors for loads and stores	-
`EnableStatefulAtomic`	Enable promoting stateless atomic to stateful atomic.	-
`EnableStatefulToken`	Enable generating patch token to indicate a ptr argument is fully converted to stateful (temporary)	-
`EnableStatelessToStateful`	Enable Stateless To Stateful transformation for global and constant address space in OpenCL kernels	-
`EnableSumFractions`	Enable SumFractions optimization in CustomUnsafeOptPass.	-
`EnableTextureLoadCoalescing`	Enable merging non-uniform loads from bindless textures	-
`EnableThreadCombiningOpt`	Enables the thread combining optimization which is used only for Compute Shaders for combining a number of software threads to dispatch smaller number of hardware threads	-
`EnableThreeWayLoadSpiltOpt`	Enable three way load split opt.	-
`EnableTrigFuncRangeReduction`	Reduce the sin and cos function domain range	Available
`EnableUnmaskedFunctions`	Enable unmasked functions SYCL feature.	Available
`EnableWaveForce32`	Force Wave to use simd32	-
`EnableWorkGroupUniformGoto`	Setting to 1 enables generating uniform goto for work group uniform [eu fusion only]	-
`FPRoundingModeCoalescingMaxDistance`	Max distance in instructions for reordering FP instructions with common rounding mode	-
`ForceAddressArithSinking`	Force sinking address arithmetic closer to the usage	-
`ForceHoistDp3`	force dp3 Hoisting.	-
`ForceLinearWalkOnLinearUAV`	Force linear walk on linear UAV buffer	-
`ForceLoadsLoopSink`	Force sinking of loads in the loop from the beginning	-
`ForceLoopSink`	Force sinking in all loops	-
`ForceSupportsAutoGRFSelection`	ForceSupportsAutoGRFSelection	Available
`ForceSupportsStaticRegSharing`	ForceSupportsStaticRegSharing	Available
`ForceTileY`	Force TileY mode on DG2	-
`GEPLSRThresholdRatio`	Ratio for register pressure threshold in GEP Loop Strength Reduction pass	-
`KeepTileYForFlattened`	Keep TileY for FlattenedThreadIdInGroup. 0 - off, 1 - on, 2 - platform default	-
`LLVMCommandLine`	applies LLVM command line	-
`LoopSinkMinSave`	If loop sink can save more 32-bit values than this minimum, do it; otherwise, skip	-
`LoopSinkMinSaveUniform`	If loop sink can save more scalar (uniform) values than this minimum, do it; otherwise, skip	-
`LoopSinkRegpressureMargin`	Sink into the loop until the pressure becomes less than #grf-margin	-
`LoopSinkRollbackThreshold`	Rollback loop sinking if the estimated regpressure after the sinking is still higher than this + #available registers, and the number of registers can be increased	-
`LoopSinkThresholdDelta`	Do loop sink if the estimated register pressure is higher than this + #available registers	-
`MaxImmConstantSizePushed`	Set the max size of immediate constant buffer pushed	-
`PSSIMD32HeuristicFP16`	enable PS SIMD32 heuristic based on fp16 characteristic	-
`PSSIMD32HeuristicLoopAndDiscard`	enable PS SIMD32 heuristic based on loop info and discard	-
`PayloadSizeThreshold`	Set the max payload size threshold for short shaders that have PSD bottleneck.	-
`PrepopulateLoadChainLoopSink`	Check the loop for loop chains before sinking to use the existing chains in a heuristic	-
`RovOpt`	Bitmask for ROV optimizations. 0 for all off, 1 for force fence flush none, 2 for setting LSC_L1UC_L3C_WB, 3 for both opt on	-
`RuntimeLoopUnrolling`	Setting this to switch on/off runtime loop unrolling. 0: default (on), 1: force on, 2: force off	-
`SelectiveHashOptions`	applies options to hash range via string	-
`SetBranchSwapThreshold`	Set the branch swapping threshold.	-
`SetDefaultTileYWalk`	Use TileY walk as default for HW generating threadID	Available
`SetLoopUnrollThreshold`	Set the loop unroll threshold. Value 0 will use the default threshold.	-
`SetLoopUnrollThresholdForHighRegPressure`	Set the loop unroll threshold for shaders with high reg pressure. Value 0 will use the default threshold.	-
`SetRegisterPressureThresholdForLoopUnroll`	Set the register pressure threshold for limiting the loop unroll to smaller loops	-
`SetURBFullWriteGranularity`	Overrides the minimum access granularity for URB full writes. Valid values are 0, 16 and 32, value 0 means use default for the platform.	Available
`SplitIndirectEEtoSelThreshold`	Split indirect extractelement cost threshold	-
`SynchronizationObjectCoalescingConfig`	Modify the default behavior of SynchronizationObjectCoalescing. The value is a bitmask: bit0 - remove fences in read barrier write scenario	Available
`UseHDCTypedReadForAllTextures`	Setting this to use HDC message rather than sampler ld for texture read	-
`UseHDCTypedReadForAllTypedBuffers`	Setting this to use HDC message rather than sampler ld for buffer read	-
`UseTiledCSThreadOrder`	Use 4x4 dispatch for CS order when it seems beneficial	-
`WaAllowMatchMadOptimizationforVS`	Setting this to 1/true adds a compiler switch to enable mul+add = mad optimization for VS	-
`WaDisableMatchMadOptimizationForCS`	Setting this to 1/true adds a compiler switch to disable mul+add = mad optimization for CS	-
`forceFullUrbWriteMask`	Set Full URB write mask.	-
`forcePushConstantMode`	set the push constant mode, 0 is default behavior, 1 is simple push, 2 is gather constant, 3 is none/pull constants	-
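Some flags in this table accept only a few values. `SetURBFullWriteGranularity`, for instance, lists 0, 16 and 32 as valid. A small wrapper can catch a mistyped value before it reaches the compiler. This is only a sketch: the helper name `set_urb_granularity` is made up for the example and is not part of IGC.

```shell
# Hypothetical helper: accept only the values the table lists as valid.
set_urb_granularity() {
    case "$1" in
        0|16|32)
            export IGC_SetURBFullWriteGranularity="$1"
            ;;
        *)
            echo "invalid granularity: $1 (expected 0, 16 or 32)" >&2
            return 1
            ;;
    esac
}

set_urb_granularity 16
echo "IGC_SetURBFullWriteGranularity=$IGC_SetURBFullWriteGranularity"
```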

Shader debugging

Flag	Description	Release builds
`CompileOneAtTime`	Compile only one kernel (out of many in llvm::module) at a time. Prints compiled kernels' names to stdout. Useful to debug compilation time and crashes - it does not produce a valid binary.	-
`CopyA0ToDBG0`	Copy a0 used for extended msg descriptor to dbg0 to help debug	-
`DPASReadSuppressionWA`	Enable read suppression WA for the send and indirect access	-
`DebugInternalSwitch`	Code pass selection, debug only	-
`DisablePassToggles`	Disable each IGC pass by setting the bit. HEXADECIMAL ONLY!. Ex: C0 is to disable pass 6 and pass 7.	-
`DisableSendSrcDstOverlapWA`	Disable Send Source/destination overlap WA which is enabled for GEN10/GEN11 and whenever Wddm2Svm is set in WATable	-
`DumpPayloadToScratch`	Setting this to 1/true dumps thread payload to scratch space. Used for workloads which don't use scratch space for other purposes	-
`EnableBitcastExtractInsertPattern`	Enable BitcastExtractInsertPattern in CustomSafeOptPass.	Available
`EnableCSSIMD32`	Enable compute shader SIMD32 mode, and fall back to lower SIMD when spill	-
`EnableDebugging`	Enable shader debugging for release internal	-
`EnableDivergentBarrierCheck`	Uses WIAnalysis to find barriers in divergent flow control. May have false positives.	-
`EnableHashMovsAtPrologue`	Rather than after EOT, insert hash code movs at shader entry	Available
`EnableLSCFenceUGMBeforeEOT`	Enable inserting fence.ugm.06.tile before EOT if a kernel has any write to UGM [XeHPC, PVC].	Available
`EnableOptionalBufferOffset`	For StatelessToStateful optimization [OCL], if true, make buffer offset optional. Valid only if buffer offset is supported.	Available
`EnableRTLSCFenceUGMBeforeEOT`	[tmp]Enable inserting fence.ugm.06.tile before EOT for RT shader [XeHPC, PVC].	-
`EnableRTmaskPso`	Enable render target mask optimization in PSO opt	-
`EnableSIPOverride`	This key forces load of SIP from a local file.	-
`EnableSupportBufferOffset`	[debugging]For StatelessToStateful optimization [OCL], support implicit buffer offset argument (same as -cl-intel-has-buffer-offset-arg).	-
`EnableTestIGCBuiltin`	Enable testing igc builtin (precompiled kernels) using OCL.	-
`EnableTrivialEmulateSinCos`	Enable Emulation for Sine and Cosine instructions	-
`EnableZeroSomeARF`	If set, insert mov inst to zero a0, acc, etc to assist HW debugging.	-
`EnablerReadSuppressionWA`	Enable read suppression WA for the send and indirect access	-
`ForceCSLeastSIMD`	Force compute shader to the lowest allowed SIMD mode	-
`ForceCSSIMD16`	Force compute shader SIMD16 mode if allowed, otherwise it will use SIMD32	-
`ForceCSSIMD32`	Force compute shader SIMD32 mode	-
`ForceDisableShaderDebugHashCodeInKernel`	Disable hash code addition to the binary after EOT	Available
`ForceEmuKind`	Force emuKind used by PreCompiledFuncImport pass. This flag takes emulation kind value that is defined in EmuKind enum in PreCompiledFuncImport.hpp [TEST ONLY]	-
`ForceFunctionsToNop`	Replace functions with immediate return to help narrow down shaders; use with Options.txt.	-
`ForceLoosenSimd32Occu`	Control loosenSimd32occu return value. 0 - off, 1 - on, 2 - platform default	-
`ForceMemoryFenceBeforeEOT`	Forces inserting SLM or global memory fence before EOT if shader writes to SLM or global memory respectively.	-
`ForcePerThreadPrivateMemorySize`	Useful for ensuring a certain amount of private memory when doing a shader override.	Available
`ForceStatelessForQueueT`	In OCL, force to use stateless memory to hold queue_t*. This is a legacy feature to be removed.	-
`MSAAClearedKernel`	Insert the discard code for MSAA_MSC_Cleared kernels. 2/4/8/16	-
`PrintVerboseGenericControlFlowLog`	Forces compiler to print detailed log about additional control flow generated due to a presence of generic memory operations	Available
`RetryManagerFirstStateId`	For debugging purposes, it can be useful to start on a particular id rather than id 0.	-
`RouteByLodHint`	An integer offset addon to route the resource to HDC on DG2	-
`SIPOverrideFilePath`	When enabled together with EnableSIPOverride, this key loads the SIP from the specified path.	-
`SToSProducesPositivePointer`	This key is for StatelessToStateful optimization if the user knows the pointer offset is positive to the kernel argument.	-
`ShaderDebugHashCode`	The driver will set a breakpoint in the first instruction of the shader which has the provided hash code. It works only when the value is different from 0 and SystemThreadEnable is set to TRUE. Ex: VS_asm2df26246434553ad_nos0000000000000000. For compatibility reasons, only the low part needs to be entered in the registry, i.e. the lower 8 hex digits of the 16-digit hash code. Ex: 0x434553ad	-
`ShaderDebugHashCodeInKernel`	Add hash code to the binary	Available
`ShaderDisableOptPassesAfter`	Will only run first N optimization passes, any further passes will be ignored. This flag can be used to bisect optimization passes.	-
`ShaderDisplayAllPassesNames`	Display to console all passes name with their ID and occurrence number.	-
`ShaderOverride`	Will override any LLVM shader with matching name in c:\Intel\IGC\ShaderOverride	-
`ShaderPassDisable`	Disable specific passes, e.g. '9;17-19;239-;Error Check;ResolveOCLAtomics:2;Dead Code Elimination:3-5;BreakConstantExprPass:7-' disables pass 9, passes 17 to 19, all passes after 238, all occurrences of pass Error Check, the second occurrence of ResolveOCLAtomics, occurrences 3 to 5 of pass Dead Code Elimination, and all occurrences of BreakConstantExprPass after its 6th. To show a list of pass names and their occurrences, set ShaderDisplayAllPassesNames. Must be used with the ShaderDumpEnableAll flag.	-
`SystemThreadEnable`	This key forces software to create a system thread. The system thread may still be created by software even if this control is set to false. The system thread is invoked if either the software requires exception handling or if kernel debugging is active and a breakpoint is hit.	-
`TestIGCPreCompiledFunctions`	Enable testing for precompiled kernels. [TEST ONLY]	-
`ld2dmsInstsClubbingThreshold`	Do not club more than these ld2dms insts into the new BB during MCSOpt	-
`manualEnableRSWA`	Enable read suppression WA for the send and indirect access	-
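Two values in this table are easy to get wrong by hand: the hexadecimal bit mask for `DisablePassToggles` and the low part of the hash for `ShaderDebugHashCode`. The sketch below derives both from the examples given in the table. It assumes the mask is written without a `0x` prefix (as in the `C0` example) and that the hash sits between `_asm` and the next underscore in the name. Both flags show `-` in the Release builds column, so this presumably applies only to builds where they are enabled.

```shell
# Build the DisablePassToggles mask: one bit per pass index, printed as hex
# without a 0x prefix, matching the table's C0 example for passes 6 and 7.
mask=0
for pass in 6 7; do
    mask=$(( mask | (1 << pass) ))
done
toggles=$(printf '%X' "$mask")
echo "IGC_DisablePassToggles=$toggles"

# Take the lower 8 hex digits of the 16-digit hash embedded in the name.
name='VS_asm2df26246434553ad_nos0000000000000000'
hash=${name#*_asm}   # drop everything up to and including "_asm"
hash=${hash%%_*}     # drop the trailing "_nos..." part
low=$(printf '%s' "$hash" | tail -c 8)
echo "IGC_ShaderDebugHashCode=0x$low"
```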

Shader dumping

Flag	Description	Release builds
`AddExtraIntfInfo`	Will add extra interference info from .extraintf files from c:\Intel\IGC\ShaderOverride	-
`DebugDumpNamePrefix`	Set a prefix to debug info dump filenames(with path) and drop hash info from them (for testing purposes)	Available
`DumpDeSSA`	dump DeSSA info into file.	Available
`DumpHasNonKernelArgLdSt`	Print if hasNonKernelArg load/store to stderr	Available
`DumpLLVMIR`	dump LLVM IR	Available
`DumpLoopSink`	Dump debug info in LoopSink	-
`DumpOCLProgramInfo`	dump OpenCL Patch Tokens, Kernel/Program Binary Header	Available
`DumpPatchTokens`	Enable dumping of patch tokens.	Available
`DumpResourceLoop`	dump resource loop detected by ResourceLoopAnalysis	Available
`DumpTimeStats`	Timing of translation, code generation, finalizer, etc	Available
`DumpTimeStatsCoarse`	Only collect/dump coarse level time stats, i.e. skip opt detail timer for now	Available
`DumpTimeStatsPerPass`	Collect Timing of IGC/LLVM passes	Available
`DumpToCurrentDir`	dump shaders to the current directory	Available
`DumpToCustomDir`	Dump shaders to custom directory. Parent directory must exist.	Available
`DumpUseShorterName`	If set, use an internal shader name(_entry_id) in dump file name	Available
`DumpVariableAlias`	Dump variable alias info, valid if EnableVariableAlias is on	Available
`DumpWIA`	dump WI (uniform) information into files in dump directory if set to true	-
`DumpZEInfoToConsole`	Dump zeinfo to console	Available
`ElfDumpEnable`	dump ELF file	Available
`ElfTempDumpEnable`	dump temporary ELF files	Available
`EnableCapsDump`	Enable hardware caps dump	Available
`EnableCisDump`	Enable cis dump	Available
`EnableCosDump`	Enable cos dump	Available
`EnableKernelNamesBasedHash`	If set, use kernels' names to calculate the hash. Doesn't work on .cl dump's hash. Will overwrite dumps if multiple modules have the same kernel names.	-
`EnableLivenessDump`	Enable dumping out liveness info on stderr.	Available
`EnableScalarizerDebugLog`	print step by step scalarizer debug info.	Available
`EnableShaderNumbering`	Number shaders in the order they are dumped based on their hashes	Available
`ForceRPE`	Force RPE (RegisterEstimator) computation if > 0. If 2, force RPE per inst.	Available
`InterleaveSourceShader`	Interleave the source shader in asm dump	Available
`PrintAfter`	Take either all or comma/semicolon-separated list of pass names. If set, enable print LLVM IR after the given pass is done (mimic llvm print-after)	Available
`PrintBefore`	Take either all or comma/semicolon-separated list of pass names. If set, enable print LLVM IR before the given pass is done (mimic llvm print-before)	Available
`PrintHexFloatInShaderDumpAsm`	print floats in hex in asm dump	Available
`PrintInstOffsetInShaderDumpAsm`	print instruction offsets as comments in asm dump	Available
`PrintMDBeforeModule`	Print metadata of the module at the beginning of the dump. Used for LIT tests.	Available
`PrintPsoDdiHash`	Print psoDDIHash in TimeStats_Shaders.csv file	Available
`PrintToConsole`	dump to console	Available
`ProgbinDumpFileName`	Specify filename to use for dumping progbin file to current dir	Available
`QualityMetricsEnable`	Enable Quality Metrics for IGC	Available
`RPEDumpLevel`	> 0 : dump info of register pressure estimate on stderr. See igc_flags.hpp level defs.	-
`ShaderDataBaseStats`	Enable gathering sends' sizes for shader statistics	-
`ShaderDataBaseStatsFilePath`	Path to a file with dumped shader stats additional data e.g. data available during compilation only	-
`ShaderDumpEnable`	dump LLVM IR, visaasm, and GenISA	Available
`ShaderDumpEnableAll`	dump all LLVM IR passes, visaasm, and GenISA	Available
`ShaderDumpEnableG4`	same as ShaderDumpEnable but adds G4 dumps (0 = off, 1 = some, 2 = all)	-
`ShaderDumpEnableIGAJSON`	adds IGA JSON output to shader dumps (0 = off, 1 = enabled, 2 = include def/use info but causes longer compile times)	-
`ShaderDumpEnableRAMetadata`	adds RA Metadata file to shader dumps	Available
`ShaderDumpFilter`	Only dump files matching the given regex	Available
`ShaderDumpInstNamer`	dump all unnamed LLVM IR instructions with the variable name 'tmp', which makes shader overriding easier	Available
`ShaderDumpPidDisable`	disable adding the PID to the name of the shader dump directory	Available
`ShowFullVectorsInShaderDumps`	print all elements of vectors in ShaderDumps, can dramatically increase ShaderDumps size	Available
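
As a rough sketch of how the dump flags above can be combined, the following enables shader dumps and redirects them to a custom directory. The path `/tmp/igc_dumps` is a hypothetical example, and passing the directory path as the value of DumpToCustomDir is an assumption based on its description.

```shell
# Hypothetical dump location. DumpToCustomDir requires the parent
# directory (/tmp here) to exist; creating the target up front is a precaution.
dump_dir=/tmp/igc_dumps
mkdir -p "$dump_dir"

# Dump LLVM IR, visaasm, and GenISA.
export IGC_ShaderDumpEnable=1

# Assumption: the flag's value is the target directory path.
export IGC_DumpToCustomDir="$dump_dir"
```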

Debugging features

Flag	Description	Release builds
`AvoidUsingR0R1`	Do not use r0 and r1 as generic usage registers	-
`BufferBoundsChecking`	Setting this to 1 (true) enables buffer bounds checking	-
`DebugInfoEnforceAmd64EM`	Enforces the ELF file with the debug information to have eMachine set to AMD64	-
`DebugInfoValidation`	Enable optional (strict) checks to detect debug information inconsistencies	-
`EnableRelocations`	Setting this to 1 (true) makes IGC emit relocatable ELF with debug info	Available
`EnableTestSplitI64`	Test legalization that splits i64 stores unnecessarily; to be deleted once the test is done [temp]	Available
`EnableWriteOldFPToStack`	Setting this to 1 (true) writes the caller frame's frame-pointer to the start of callee's frame on stack, to support stack walk	-
`ExtraOCLInternalOptions`	Extra internal options for OpenCL	Available
`ExtraOCLOptions`	Extra options for OpenCL	Available
`ForceAssignRhysicalReg`	Force assigning dclId to physical reg.	Available
`ForceSpillVariables`	Comma-separated string; each entry provides the declare ID of a variable that will be spilled	Available
`InitializeAddressRegistersBeforeUse`	Setting this to 1 (true) initializes address register to 0 before each use	-
`InitializeRegistersEnable`	Setting this to 1/true initializes all GRFs, Flag and address registers to 0 at the beginning of the shader	-
`InitializeUndefValueEnable`	Setting this to 1/true initializes all undefs in URB payload to 0	-
`MetricsDumpEnable`	Dump IGC metrics to a *.optrpt file in the current working directory. 0 - disabled, 1 - binary format, 2 - plain-text format.	Available
`MinimumValidAddress`	If it's greater than 0, it enables minimal valid address checking where the threshold is the given value (in hex).	-
`NoCatchAllDebugLine`	Don't emit special placeholder instruction to map VISA orphan instructions	-
`PrintDebugSettings`	Prints all non-default debug settings	-
`ShaderDumpTranslationOnly`	Dump LLVM IR right after translation from SPIRV to stderr and ignore all passes	-
`StackOverflowDetection`	Inserts checks for stack overflow when stack calls are used.	Available
`UseMTInLLD`	Use multi-threading when linking multiple elf files	Available
`UseVISAVarNames`	Make VISA generate names for virtual variables so they match with dbg file	Available
`UseVMaskPredicate`	Use VMask as predicate for subspan usage	-
`UseVMaskPredicateForIndirectMove`	Use VMask as predicate for subspan usage (indirect mov only)	Available
`UseVMaskPredicateForLoads`	Use VMask as predicate for subspan usage (loads only)	Available
`ZeBinCompatibleDebugging`	Setting this to 1 (true) enables embedding debug info in zeBinary	Available
`deadLoopForFloatException`	enable a dead loop if a float exception happened	-

IGC Features

Flag	Description	Release builds
`AdvCodeMotionControl`	Control bits to fine-tune advanced code motion	-
`AdvRuntimeUnrollCount`	Advanced runtime unroll count	-
`AllowedSpillRegCount`	Max allowed spill size without recompile	-
`CSSpillThreshold2xGRFRetry`	Spill Threshold for CS to trigger 2xGRFRetry	-
`CSSpillThresholdNoSLM`	Spill Threshold for CS SIMD16 without SLM	-
`CSSpillThresholdSLM`	Spill Threshold for CS SIMD16 with SLM	-
`CheckCSSLMLimit`	Check SLM or threads limit on compute shader to turn on Enable2xGRF on DG2+. 0 - off, 1 - SLM limit heuristic, 2 - platform based heuristic (XE2 - threads limit, others - SLM limit)	-
`DPEmuNeedI64Emu`	Double Emulation needs I64 emulation. Unset it to disable I64 emulation for testing.	-
`DisableCorrectlyRoundedMacros`	Tmp flag to disable correctly rounded macros for BMG+. This flag will be removed in the future.	-
`DisableDSDualPatch`	Setting it to true will enable Single and Dual Patch dispatch mode for Domain Shader	-
`DisableEarlyOutPatterns`	Disable optimization trying to create an early out after sampleC messages	-
`DisableGPGPUIndirectPayload`	Disable OCL indirect GPGPU payload	-
`DisableLSCForTypedUAV`	Forces legacy HDC messages for typed UAV read/write. Temporary knob for XE2 bringup.	Available
`DisableLSCSIMD32TGMMessages`	Forces splitting SIMD32 typed messages into 2xSIMD16. Only valid on XE2+.	Available
`DisableMemOpt`	Disable MemOpt, merging load/store	Available
`DisableMemOpt2`	Disable MemOpt2	-
`DisableMergeStore`	[temp]If EnableLdStCombine is on, disable mergestore (memopt) if this is set. Temp key for testing	Available
`DisablePrefetchToL1Cache`	Disable prefetch to L1 cache	Available
`DisablePromoteToDirectAS`	This key disables the PromoteResourceToDirectAS pass	-
`DisableRecompilation`	Disable recompilation, skip retry stage	Available
`DisableScalarAtomics`	Disable the Scalar Atomics optimization	-
`DisableSystemMemoryCachingInGPUForConstantBuffers`	Disables caching system memory in GPU for loads from constant buffers	-
`DisableWaSampleLZ`	Disable The Sample Lz workaround and generate Sample LZ	-
`DivergentBarrierUniformLoad`	Optimize loads for spill/fill generated by DivergentBarrier with uniform analysis	Available
`Enable16BitLDMCS`	Enable 16-bit ld_mcs on supported platforms	Available
`Enable2xGRF`	Enable 2x GRF for high SLM or high threads usage. 0 - off, 1 - on, 2 - platform default	-
`Enable64BitEmulation`	Enable 64-bit emulation	-
`Enable64BitEmulationOnSelectedPlatform`	Enable 64-bit emulation on selected platforms	-
`EnableAIParameterCombiningWithLODBias`	Enable AI parameter combining With LOD Bias parameter. XeHP	Available
`EnableAdvCodeMotion`	Enable advanced code motion	-
`EnableAdvMemOpt`	Enable advanced memory optimization	-
`EnableAdvRuntimeUnroll`	Enable advanced runtime unroll	-
`EnableCPSMSAAOMaskWA`	Enable WA which forces rt writes to happen at pixel rate when cps, msaa, and omask are present.	Available
`EnableCPSOmaskWA`	Enable workaround for oMask with CPS	-
`EnableConstIntDivReduction`	Enables strength reduction on integer division/remainder with constant divisors/moduli	Available
`EnableDG2LSCSIMD8WA`	Enables WA for DG2 LSC simd8 d32-v8/d64-v3/d64-v4. [temp, should be replaced with WA id]	-
`EnableDPEmulation`	Enforce double precision floating point operations emulation on platforms that do not support it natively	Available
`EnableDivergentBarrierWA`	Generate continuation code to handle shaders that place barriers in divergent control flow	-
`EnableDualSIMD8`	enable dual SIMD8 on supported platforms	Available
`EnableExplicitCopyForByVal`	Enable generating an explicit copy (alloca + memcpy) in a caller for aggregate arguments with byval attribute	Available
`EnableFallbackToBindless`	This key enables fallback to bindless mode on all shaders	-
`EnableFallbackToStateless`	This key enables fallback to stateless mode on all shaders	-
`EnableFunctionPointer`	Enables support for function pointers and indirect calls	-
`EnableGASResolver`	Enable GAS Resolver	-
`EnableGEPSimplification`	Enable GEP simplification	Available
`EnableGen11TwoStackTSG`	Enable Two stack TSG gen11 feature	-
`EnableGlobalStateBuffer`	This key allows stack calls to read implicit args from side buffer. It also emits a relocatable add in VISA.	Available
`EnableHFpacking`	Enable HF packing	-
`EnableHSSinglePatchDispatch`	Setting this to 1/true enables SIMD8 single-patch dispatch in HullShader. Default is either SIMD8 single patch/dual patch dispatch based on control point count	-
`EnableImplicitArgAsIntrinsic`	Use GenISAIntrinsic instructions for supported implicit args instead of passing them as function arguments	Available
`EnableIndirectCallOptimization`	Enables inlining indirect calls by comparing function addresses	-
`EnableInsertingPairedResourcePointer`	Enable to insert a bindless paired resource address into sampler headers in context of sampling feedback resources	Available
`EnableIntDivRemCombine`	Merge div/rem pairs with the same operands; replace rem with mul+sub on the quotient; 0x3 (set bit[1]) forces this on constant power-of-two divisors as well	Available
`EnableL3FlushForGlobal`	Enable/disable flushing L3 cache for globals	-
`EnableLSC`	Enables the new dataport encoding for LSC messages.	Available
`EnableLdStCombine`	Enable load/store combine pass if set to 1 (lsc message only) or 2; bit 3 = 1 [tmp for testing] : enabled load combine (intend to replace memopt)	Available
`EnableLowerGPCallArg`	Enable pass to lower generic pointers in function arguments	-
`EnableLscSamplerRouting`	Enables conversion of LD to LD_L instructions.	-
`EnableMadLoopSlice`	Enables the slicing of mad loops.	Available
`EnableMaxWGSizeCalculation`	Enable max work group size calculation [OCL only]	Available
`EnableMeshSLMCache`	Enables caching Mesh shader outputs in SLM. Bitmask: bit0 - cache AND flush mode, enable caching of Primitive Count and Primitive Indices; bit1 - cache AND flush mode, enable caching of per-vertex outputs; bit2 - cache AND flush mode, enable caching of per-primitive outputs; bit3 - mirror mode, if this bit is set bits 0, 1 and 2 are ignored, enable caching of outputs that are read in the shader, data is only mirrored in SLM	Available
`EnableMeshShaderSimdSize`	Set allowed simd sizes for mesh shader compilation. Bitmask: bit0 - simd8, bit1 - simd16, bit2 - simd32; e.g. 0x7 enables all simd sizes and 0x2 enables only simd16. Valid values are from 0 to 7. Ignored if it produces an invalid configuration, e.g. simd size too small for workgroup size. Ignored if ForceMeshShaderSimdSize is set	Available
`EnableOCLSIMD16`	Enable OCL SIMD16 mode	Available
`EnableOCLSIMD32`	Enable OCL SIMD32 mode	Available
`EnableOCLScratchPrivateMemory`	Enable the use of scratch space for private memory [OCL only]	Available
`EnablePartialEmuI64`	Enable the partial I64 emulation for PVC-B, Xe2	Available
`EnablePostCullPatchFIFOHP`	Enable Post-Cull Patch Decoupling FIFO. XeHP.	Available
`EnablePostCullPatchFIFOLP`	Enable Post-Cull Patch Decoupling FIFO. GEN12LP.	Available
`EnablePreRARematFlag`	Enable PreRA Rematerialization of Flag	-
`EnablePromotionToSampleMlod`	Enables promotion of sample and sample_c to sample_mlod and sample_c_mlod instructions when min lod is present	-
`EnableReadGTPinInput`	Enables setting GTPin context flags by reading the input to the compiler adapters	-
`EnableRecursionOpenCL`	Enable recursion with OpenCL user functions	-
`EnableSIMD16ForNonWaveXe2`	Enable SIMD16 for Xe2 if the shader doesn't have wave	-
`EnableSIMD16ForXe2`	Enable SIMD16 for Xe2	-
`EnableSIMDVariantCompilation`	Enables compiling kernels in variant SIMD sizes	-
`EnableSMRescheduling`	Change instruction order to enable extra Sample Multiversioning cases	-
`EnableSampleBMLODWA`	Enable workaround for sample_b messages that use the mlod parameter	-
`EnableSampleDEmulation`	Enable emulation of sample_d.	Available
`EnableSampleDEmulationForTesting`	Enable emulation of sample_d on pre-XeHP platforms.	Available
`EnableSamplerSupport`	Enables sampler messages generation for PVC.	Available
`EnableScalarTypedAtomics`	Enable the Scalar Typed Atomics optimization	-
`EnableScratchMessageD64WA`	Enables WA to legalize D64 scratch messages to D32	-
`EnableSelectiveScalarizer`	enable selective scalarizer on GPGPU path	Available
`EnableSingleVertexDispatch`	Vertex Shader Single Patch Dispatch Regkey	-
`EnableTaskShaderSimdSize`	Set allowed simd sizes for task shader compilation. Bitmask: bit0 - simd8, bit1 - simd16, bit2 - simd32; e.g. 0x7 enables all simd sizes and 0x2 enables only simd16. Valid values are from 0 to 7. Ignored if it produces an invalid configuration, e.g. simd size too small for workgroup size. Ignored if ForceMeshShaderSimdSize is set	Available
`EnableTileYForExperiments`	Enable TileY heuristics for experiments	-
`EnableTypeDemotion`	Enable Type Demotion	-
`Enable_Wa14010017096`	Enable Wa_14010017096 regardless of the platform stepping	Available
`Enable_Wa1507979211`	Enable Wa_1507979211 regardless of the platform stepping	Available
`Enable_Wa1807084924`	Enable Wa_1807084924 regardless of the platform stepping	Available
`Enable_Wa22010487853`	Enable Wa_22010487853 regardless of the platform stepping	Available
`Enable_Wa22010493955`	Enable Wa_22010493955 regardless of the platform stepping	Available
`Force32BitIntDivRemEmu`	Force 32-bit Int Div/Rem emulation using fp64, ignored if no native fp64 support	Available
`Force32BitIntDivRemEmuSP`	Force 32-bit Int Div/Rem emulation using fp32, ignored if Force32BitIntDivRemEmu is set and actually used	Available
`ForceDPEmulation`	Force double emulation for testing purpose	-
`ForceFFIDOverwrite`	Force overwriting ffid in sr0.0	-
`ForceFormatConversionDG2Plus`	Forces SW image format conversion for R10G10B10A2_UNORM, R11G11B10_FLOAT, R10G10B10A2_UINT image formats on DG2+ platforms	Available
`ForceI64DivRemEmu`	Forces specific int64 div/rem emulation: 0 = platform default, 1 = int based, 2 = SP based, 3 = DP based	-
`ForceMeshShaderSimdSize`	Force mesh shader simd size. Valid values are 0 (not set), 8, 16 and 32. Ignored if it produces an invalid configuration, e.g. simd size too small for workgroup size	Available
`ForceNoLSC`	Disables the new dataport encoding for LSC messages.	Available
`ForceOCLSIMDWidth`	Force using SIMD width specified. 0 : no forcing. This overrides driver forced SIMD value(if any) and runtime behaviour could be different if driver expects something fixed	Available
`ForcePrefetchToL1Cache`	Forces standard builtin prefetch to use L1 cache	Available
`ForceSPDivEmulation`	Force SP Div emulation for testing purpose	-
`ForceStaticToDynamic`	Force write of vertex count in GS	-
`ForceTaskShaderSimdSize`	Force task shader simd size. Valid values are 0 (not set), 8, 16 and 32. Ignored if it produces an invalid configuration, e.g. simd size too small for workgroup size	Available
`ForceXYZworkGroupWalkOrder`	Force X/Y/Z WorkGroup walk order	Available
`HoistPSConstBufferValues`	Hoists up down converts for constant buffer accesses, so they can be vectorized more easily.	-
`LICMStatThreshold`	LICM stat threshold to avoid retry SIMD16 for CS	-
`LateInlineUnmaskedFunc`	Postpone inlining of Unmasked functions till end of CG to avoid code movement inside/outside of unmasked region	-
`LscForceSpillNonStackcall`	Non-stack call kernels that spill will use LSC on DG2+	Available
`LscImmOffsMatch`	Match address patterns that have an immediate offset for the vISA LSC API (0 means off/no matching, 1 means on/match for supported platforms (Xe2+) and APIs, 2 means force on for all platforms (vISA will emulate the addition if HW lacks support) and APIs); also see LscImmOffsVisaOpts	Available
`LscImmOffsVisaOpts`	This maps to vISA_lscEnableImmOffsFor (enables/disables immediate offsets for various address types; see that option for semantics)	Available
`MaxLiveOutThreshold`	Max LiveOut Threshold in MemOpt2	-
`MaxLoadVectorSizeInBytes`	[LdStCombine] the max non-uniform vector size for the coalesced load. 0: compiler choice (default, 16(4DW)); others: 4/8/16/32	Available
`MaxStoreVectorSizeInBytes`	[LdStCombine] the max non-uniform vector size for the coalesced store. 0: compiler choice (default, 16(4DW)); others: 4/8/16/32	Available
`MemOptGEPCanon`	[test] GEP canonicalization in MemOpt. 0 : enable; 1: disable; 2: disable only for OCL;	Available
`OCLEnableReassociate`	Enable reassociation	Available
`OCLSIMD16SelectionMask`	Select SIMD 16 heuristics. Valid values are 0, 1, 2 and 3	-
`OverrideDeviceIdForWA`	Enable this to override DeviceId	-
`OverrideProductFamilyForWA`	Enable this to override the product family, get the correct enum from igfxfmid.h	-
`OverrideRevIdForWA`	Enable this to override the stepping/RevId, default is a0 = 0, b0 = 1, c0 = 2, so on...	-
`RemoveLegacyOCLStatelessPrivateMemoryCases`	Remove cases where OCL uses stateless private memory. XeHP and above only! [OCL only]	Available
`SampleMultiversioning`	Create branches around samplers which can be redundant with some values	-
`SelectiveLoopUnrollForDPEmu`	Setting this to 0/false disables selective loop unrolling for DP emu.	Available
`SendMultipleSIMDModesCS`	Send multiple SIMD modes for CS	-
`SkipPsSimdWithDualSimd`	Setting it to values def in igc.h will force SIMD mode to skip if the dual-SIMD8 kernel exists	Available
`TestGEPSimplification`	[Test] Testing GEP simplification without actually lowering GEP. Used in lit test	-
`UniformMemOpt4OW`	increase uniform memory optimization from 2 owords to 4 owords	Available
`allowLICM`	Enable LICM in IGC.	Available
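
Several flags in this table take a bitmask. The sketch below builds the value for EnableMeshShaderSimdSize from the bit positions given in its description. The variable names are illustrative, and writing the value in hexadecimal is an assumption that follows the `0x7` example in the description.

```shell
# Bit positions taken from the EnableMeshShaderSimdSize description.
SIMD8=$((1 << 0))
SIMD16=$((1 << 1))
SIMD32=$((1 << 2))

# Allow simd16 and simd32, but not simd8.
mask=$((SIMD16 | SIMD32))

# Assumption: the value is accepted in hexadecimal, as in the 0x7 example.
export IGC_EnableMeshShaderSimdSize="$(printf '0x%x' "$mask")"
echo "$IGC_EnableMeshShaderSimdSize"   # prints 0x6
```

Per the description, the setting is ignored if it produces an invalid configuration or if ForceMeshShaderSimdSize is set.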

Performance experiments

Flag	Description	Release builds
`AddNoInlineToTrimmedFunctions`	Tell late passes not to inline trimmed functions	-
`AllocaRAPressureThreshold`	The threshold for the register pressure potential	-
`AllocateZeroInitializedVarsInBss`	Allocate zero initialized global variables in .bss section in ZEBinary	Available
`AllowNonLoopConstantPromotion`	Allows promotion for constants not in loop (e.g. used once)	-
`AllowStackCallRetry`	Enable/Disable retry when stack function spill. 0 - Don't allow, 1 - Allow retry on kernel group, 2 - Allow retry per function	-
`BlockFrequencySampling`	Use block frequencies to derive a distribution	Available
`ByPassAllocaSizeHeuristic`	Force some Alloca to pass the pressure heuristic until the given size	Available
`CodePatch`	Enable Pixel Shader code patching to directly emit code after stitching	-
`CodePatchExperiments`	Experiment with code patching when != 0	-
`CodePatchFilter`	Filter out unsupported patterns	-
`CodePatchLimit`	Debug CodePatch by limiting the number of shaders being patched	-
`ConstantPromotionCmpSelSize`	Array size threshold for cmp-sel transform	-
`ConstantPromotionSize`	Threshold in number of GRFs	-
`ControlInlineImplicitArgs`	Avoid trimming functions with implicit args	Available
`ControlInlineTinySize`	Tiny function size for controlling kernel total size	Available
`ControlInlineTinySizeForSPGT`	Tiny function size for controlling kernel total size	Available
`ControlKernelTotalSize`	Control kernel total size	Available
`ControlUnitSize`	Control compilation unit size by unit trimming	Available
`DelayEmuInt64AddLimit`	Delay emulating Int64 Add operations in vISA	-
`DetectCastToGAS`	Check if the module contains local/private to GAS (Generic Address Space) cast, it also checks internal flags	Available
`DiableWaSamplerNoMask`	Disable WA DiableWaSamplerNoMask	-
`DisableAddingAlwaysAttribute`	Disable adding always attribute	Available
`DisableCSContentCheck`	Disable CS content check that can force SIMD32	Available
`DisableDualBlendSource`	Force the compiler to never use dual blend source messages	-
`DisableFDIV`	Disable fdiv support	-
`DisableFastMathConstantHandling`	Disable Fast Math Constant Handling	Available
`DisableFastRAWA`	Disable Fast RA for hanging issues on large workloads	-
`DisableFastestGopt`	Disable global optimizations for stage 1 shaders.	-
`DisableFastestLinearScan`	Disable LinearScanRA in FastestSIMD.	-
`DisableUndefAlphaOutputAsRed`	Disable output red for undefined alpha output	-
`DisableWaDisableSIMD16On3SrcInstr`	Disable C0 WA WaDisableSIMD16On3SrcInstr, may be unsafe	-
`DisableWaSendSEnableIndirectMsgDesc`	Disable a C0 WA WaSendSEnableIndirectMsgDesc, may be unsafe	-
`DisbleLocalFences`	On CNL+ we need to emit local fences. Setting this to true removes those. It may be functionally incorrect.	-
`DispatchAlongY_XY_ratio`	min threshold for thread group size x / y for dispatchAlongY	-
`DispatchAlongY_X_threshold`	min threshold for thread group size x for dispatchAlongY	-
`DispatchGPGPUWalkerAlongYFirst`	0 = No SW Y-walk, 1 = Dispatch GPGPU walker along Y first	-
`DownConvertI32Sampler`	Convert i32 sampler messages to return i16. This optimization can only be enabled for resources with 16bit integer format or if it is known that the upper 16bits of data is always 0.	-
`DumpRegPressureEstimate`	Dump RegPressureEstimate to a file	-
`DumpRegPressureEstimateFilter`	Only dump RegPressureEstimate for functions matching the given regex	-
`EmitPreDefinedForAllFunctions`	When enabled, pre-defined variables for gid, grid, lid are emitted for all functions. This causes those functions to be inlined even when stack calls are enabled.	Available
`EmulateFDIV`	Emulate fdiv instructions	-
`EmulationFunctionControl`	FunctionControl on some DP emulation functions. It has the same value as FunctionControl.	Available
`EnableA64WA`	Guarantee A64 load/store address-hi is uniform	Available
`EnableAccSub`	Enable accumulator substitution	-
`EnableByValStructArgPromotion`	If enabled, byval/sret struct arguments are promoted to pass-by-value if possible.	Available
`EnableConstantPromotion`	Enable global constant data to register promotion	-
`EnableDisableMidThreadPreemptionOpt`	Disable mid thread preemption	-
`EnableEvaluateSamplerSplit`	Split evaluate messages to sampler into either SIMD8 or SIMD1 messages	-
`EnableExtractMask`	When enabled, it is mostly for reducing response size of send messages.	-
`EnableFastestSingleCSSIMD`	Enable selecting single CS SIMD in staged compilation.	-
`EnableForceGroupSize`	Enable forcing thread Group Size ForceGroupSizeX and ForceGroupSizeY	-
`EnableForceThreadCombining`	Enable forcing Thread Combining with thread Group Size ForceGroupSizeX and ForceGroupSizeY	-
`EnableFunctionCloningControl`	If enabled, limits function cloning by converting stackcalls to indirect calls based on the FunctionCloningThreshold value.	Available
`EnableGPUFenceScopeOnSingleTileGPUs`	Allow the use of `GPU` fence scope on single-tile GPUs. By default the `TILE` scope is used instead of `GPU` scope on single-tile GPUs.	Available
`EnableGSURBEntryPadding`	Enable padding of GS URB Entry by adding extra portions of Control Data Header.	-
`EnableGSVtxCountMsgHalfCLSize`	Enable the Vertex Count msg of half CL size, instead of 1DW size.	-
`EnableGather4cpoWA`	Enable WA transforming gather4cpo/gather4po into gather4c/gather4	-
`EnableGreedyTrimming`	Find the optimal set of functions to trim	Available
`EnableHalfPromotion`	Enable pass that replaces instructions using halfs with corresponding float counterparts for pre-SKL	-
`EnableInsertElementScalarCoalescing`	Enable coalescing on the scalar operand of insertelement	-
`EnableIntelFast`	Enable intel fast, experimental flag.	-
`EnableLTO`	Enable link time optimization	-
`EnableLTODebug`	Enable debug information for LTO	Available
`EnableLeafCollapsing`	Collapse leaf functions in order to avoid trimming small leaf functions	Available
`EnableLocalIdCalculationInShader`	Enables calculation of local thread IDs in shader. Valid only in compute shaders on XeHP+. IDs are calculated only if HW generated IDs cannot be used.	Available
`EnableMixIntOperands`	Enable generating mix-sized operands for int ALU	-
`EnableOptReportPrivateMemoryToSLM`	[POC] Generate opt report file for moving private memory allocations to SLM.	-
`EnablePreRAAccSchedAndSub`	Enable accumulator substitution	-
`EnablePrivMemNewSOATranspose`	0 : disable new algo; 1 and up : enable new algo. 1 : enable new algo just for array of struct; 2 : 1 plus new algo for array of dw[xn]/qw[xn], etc.; 3 : 2 plus new algo for array of complicated struct.	Available
`EnableProgrammableOffsetsMessageBitInHeader`	Use pre-delta feature (legacy) method of passing MSB of PO messages opcode.	-
`EnableReusingLSCStoreConstPayload`	Enable reusing LSC stores const payload	-
`EnableReusingXYZWStoreConstPayload`	Enable reusing XYZW stores const payload	-
`EnableSOAPromotionDisablingHeuristic`	Enable heuristic to disable SOA promotion when it may be not beneficial	-
`EnableSamplerSplit`	Split Sampler 3d message to odd and even	-
`EnableSizeContributionOptimization`	Put more weight on a function when the potential size contribution is big	Available
`EnableStackCallFuncCall`	If enabled, the default function call mode will be set to stack call. Otherwise, subroutine call is used.	-
`EnableTCSHWBarriers`	Enable TCS pass with HW barriers support. Default TCS pass is TCS pass with multiple continuation functions.	-
`EnableTEFactorsClear`	Enable clearing of tessellation factors.	-
`EnableTEFactorsPadding`	Enable padding of the TE factors.	-
`EnableThreadCombiningWithNoSLM`	Enable thread combining opt for shader without SLM	-
`EnableTrackPtr`	Track Staging Context alloc/dealloc	-
`EnableVariableAlias`	Enable variable aliases (part of VariableReuse Pass, but separate functionality)	-
`EnableVariableReuse`	Enable local variable reuse	-
`EnableVector8LoadStore`	Enable Vectorizer to generate 8x32i and 4x64i loads and stores	Available
`ExcludeIRFromZEBinary`	Exclude IR sections from ZE binary	Available
`ExpandedUnitSizeThreshold`	Trimming target of compilation unit size	Available
`ExtraRetrySIMD16`	Enable extra simd16 with retry for STAGE1_BEST_PREF	-
`FastCompileRA`	Provide the fast compilation path for RA, fail safe at first iteration	-
`FastSpill`	fast spill code gen. This may produce worse quality code for the spilling shader	-
`FastestS1Experiments`	Select configs for fastest compilation by bits.	-
`FirstStagedSIMD`	Force Pixel shader to be 1: FastSIMD (SIMD8), 2: BestSIMD (SIMD16 or SIMD8), 3: FastestSIMD (SIMD8 opt off)	-
`ForceAddingStackcallKernelPrerequisites`	Force adding static overhead for stackcall to the kernel entry such as HWTID instructions for experiments	Available
`ForceAllPrivateMemoryToSLM`	[POC] Force moving all private memory allocations to SLM.	-
`ForceBestSIMD`	Force pixel shader to return the best SIMD, either SIMD16 or SIMD8.	-
`ForceDisableSrc0Alpha`	Force the compiler to skip sending src0 alpha. Only works if we are sure alpha to coverage and alpha test is off	-
`ForceFastestSIMD`	Force PS, CS, VS to return lowest possible SIMD as fast as possible.	-
`ForceFastestSingleCSSIMD`	Force selecting single CS SIMD in staged compilation on unsupported platforms.	-
`ForceGroupSizeShaderHash`	Shader hash for forcing thread group size or thread combining (lower 8 hex digits)	-
`ForceGroupSizeX`	force group size along X	-
`ForceGroupSizeY`	force group size along Y	-
`ForceHalfPromotion`	Force enable pass that replaces instructions using halfs with corresponding float counterparts	-
`ForceInlineExternalFunctions`	not to trim functions called from multiple kernels	Available
`ForceInlineStackCallWithImplArg`	If enabled, stack calls that use implicit args will be force inlined.	Available
`ForceLowestSIMDForStackCalls`	If enabled, compile to the lowest allowed SIMD mode when stack calls or indirect calls are present	Available
`ForceMCFBarriers`	Force TCS pass with MCF (SW) barriers support. Default TCS pass is TCS pass with multiple continuation functions.	-
`ForceMixMode`	force enable mix mode even on platforms that do not support it	-
`ForceNoFP64bRegioning`	force regioning rules for FP and 64b FPU instructions	-
`ForceNoInfiniteLoops`	Limit # of loop iterations to UINT_MAX in while/for loops. Can be used to detect infinite loops in shaders	-
`ForceNonCoherentStatelessBTI`	Enable generation of non cache coherent stateless messages	-
`ForcePixelShaderSIMDMode`	Setting it to values def in igc.h will force SIMD mode compilation for pixel shaders. Note that only SIMD8 is compiled unless other ForcePixelShaderSIMD* are also selected. 1-SIMD8, 2-SIMD16,4-SIMD32	-
`ForcePrivateMemoryToGlobalOnGeneric`	Force moving private memory allocations to global buffer when generic pointer is present	Available
`ForcePrivateMemoryToSLMOnBuffers`	[POC] Force moving private memory allocations to SLM, semicolon-separated list of buffers.	-
`ForceSWCoalescingOfAtomicCounter`	Force software coalescing of atomic counter	-
`ForceScratchSpaceSize`	Override Scratch Space Size in bytes for perf testing	-
`ForceSendsSupportOnSKLA0`	Allow sends on SKL A0, may be unsafe	-
`FunctionCloningThreshold`	Limits the number of cloned functions when called from multiple function groups. If number of cloned functions exceeds the threshold, compile the function only once and use address relocation instead. Setting this to '0' allows IGC to choose the default threshold.	Available
`FunctionControl`	Control function inlining/subroutine/stackcall. See value defs in igc_flags.hpp.	Available
`FuseResourceLoop`	Enable fusing resource loops	-
`FuseTypedWrite`	Enable fusing of simd8 typed write	-
`HPCFastCompilation`	Force to do fast compilation for HPC kernel	-
`HPCGlobalInstNumThreshold`	The threshold for the register pressure potential	-
`HPCInstNumThreshold`	The threshold for the register pressure potential	-
`HasDoubleAcc`	has doubled accumulators	-
`HybridRAWithSpill`	Did Hybrid RA with Spill	-
`InlinedEmulationThreshold`	Inlined instruction threshold for enabling subroutines	-
`JointMatrixLoadStoreOpt`	Selects subgroup (0), or block read/write (1), or optimized block read/write (2), 2d block read/write (3) implementation of Joint Matrix Load/Store built-ins	Available
`KernelTotalSizeThreshold`	Trimming target of kernel total size	Available
`LTOForStage1Compilation`	LTO for stage 1 compilation	-
`LimitConstantBuffersPushed`	Limit max number of CBs pushed when SupportIndirectConstantBuffer is true	-
`MSAA16BitPayloadEnable`	Enable support for MSAA 16 bit payload, a hardware DCN supporting this from ICL+ to improve perf on MSAA workloads	-
`MemCpyLoweringUnrollThreshold`	Min number of mem instructions that require non-unrolled loop when lowering memcpy	-
`MemOptWindowSize`	Size of the window in unit of instructions in which load/stores are allowed to be coalesced. Keep it limited in order to avoid creating long liveranges. Default value is 150	-
`MetricForKernelSizeReduction`	Set 1 to activate a normal distribution, 2 a long-tail distribution, and 4 an average%	Available
`MidThreadPreemptionDisableThreshold`	Threshold to disable mid thread preemption	-
`NewSOATransposeForOpenCL`	If true, EnablePrivMemNewSOATranspose only applies to OpenCL kernels. For testing purpose	Available
`NumGeneralAcc`	set the number [1-8] of general acc for accumulator substitution. 0 means using the platform-default value	-
`OCLInlineThreshold`	Setting OCL inline threshold	Available
`OverrideCsTileLayout`	Override compute walker tile layout. False is linear. True is TileY	Available
`OverrideCsTileLayoutEnable`	Enable overriding compute walker tile layout	Available
`OverrideCsWalkOrder`	Override compute walker walk order	Available
`OverrideCsWalkOrderEnable`	Enable overriding compute walker walk order	Available
`OverrideOCLMaxParamSize`	Override the value imposed on the kernel by CL_DEVICE_MAX_PARAMETER_SIZE. Value in bytes, if value==0 no override happens.	Available
`ParameterForColdFuncThreshold`	C/10-STD for a normal distribution / low K% for a long-tail distribution	Available
`PartitionUnit`	Partition compilation unit	Available
`PartitionWithFastHybridRA`	Enable FastRA and HybridRA when partition is enabled	Available
`PixelShaderDoNotAbortOnSpill`	Do not abort on a spill	-
`PrintControlKernelTotalSize`	Print Control kernel total size	Available
`PrintControlUnitSize`	Print information about unit trimming	Available
`PrintFunctionSizeAnalysis`	Print analysis data of function sizes	Available
`PrintPartitionUnit`	Print information about compilation unit partitioning	Available
`PrintStackCallDebugInfo`	Print all debug info to command line related to stack call debugging	Available
`PrintStaticProfileGuidedKernelSizeReduction`	Print information about static profile-guided trimming and partitioning	Available
`PrintStaticProfileGuidedSpillCostAnalysis`	Print debug messages for profile embedding	Available
`RegPressureVerbocity`	Different printing types	-
`RematAddrSpaceCastToUse`	Allow rematerialization of inttoptr that are used inside AddrSpaceCastInst	-
`RematAllowExtractElement`	Allow Extract Element to computation chain	-
`RematAllowLoads`	Remat allow to move loads, no checks, exclusively for testing purposes	-
`RematAllowOneUseLoad`	Remat allow to move loads that have one use and it's inside the chain	-
`RematCallsOperand`	Allow rematerialization of inttoptr that are used as call's operand	-
`RematChainLimit`	If number of instructions we've collected is more than this value, we bail on it	-
`RematEnable`	Enable clone address arithmetic pass not only on retry	-
`RematFlowThreshold`	Proportion of the whole rematerialization targets to cutoff remat chain	-
`RematInstCombineBefore`	Enable short sequence of passes before clone address arithmetic pass to potentially decrease the number of operations that will be rematerialized	-
`RematLog`	Dump Remat Log, useful for analyzing spills as well	-
`RematRPELimit`	Cutoff value for register estimator, lower than that, kernel won't be rematted	-
`RematReassocBefore`	Enable short sequence of passes before clone address arithmetic pass to potentially decrease the number of operations that will be rematerialized	-
`RematRespectUniformity`	Cutoff computation chain on uniform values	-
`RematSameBBScope`	Confine rematerialization only to variables within the same BB, we won't pull down values from predecessors	-
`RequestStage2`	Enable staged compilation via requesting stage 2	-
`RetryRevertExcessiveSpillingKernelCoefficient`	Sets the coefficient for Retry Manager to know whether we should revert back to a previously compiled kernel	-
`RetryRevertExcessiveSpillingKernelThreshold`	Sets the threshold for Retry Manager to know which kernel is considered as Excessive Spilling and applies different set of rules	-
`SSOShifter`	Adjust ScratchSurfaceOffset with shl(hwtid, shifter). 0 means disabling padding	-
`SaveRestoreIR`	Save/Restore IR for staged compilation to avoid duplicated compilations	-
`SelectiveFastRA`	Apply fast RA with spills selectively using heuristics	Available
`SelectiveFunctionControl`	Selectively enables FunctionControl for a list of line-separated function names in 'FunctionDebug.txt' in the IGC output dir. When set by this flag, the functions in the FunctionDebug list will override the default FunctionControl mode. 0 - Disable, 1 - Enable and read from FunctionDebug.txt, 2 - Print all callable functions to FunctionDebug.txt See comments in ProcessFuncAttributes.cpp for how to use this flag.	Available
`SelectiveTrimming`	Choose a specific function to trim	Available
`SkipPaddingScratchSpaceSize`	Skip adding padding when estimated scratch space size is smaller than or equal to this value	-
`SkipTREarlyExitCheck`	Skip SIMD16 early exit check in ShaderCodeGen	-
`SkipTrimmingOneCopyFunction`	Don't trim a function whose size contribution is no more than its size	Available
`StagedCompilationExperiments`	Experiment with staged compilation when != 0	-
`StaticProfileGuidedPartitioning`	Enable static analysis in the partitioning algorithm.	Available
`StaticProfileGuidedSpillCostAnalysis`	Use static profile information to estimate spill cost, 1 for profile generation, 2 for profile transfer, 4 for profile embedding, 8 for spill computation, and 16 for enabling frequency-based spill selection	Available
`StaticProfileGuidedSpillCostAnalysisFunc`	Spill cost function where 0 is based on a new spill cost and 1 the existing one	Available
`StaticProfileGuidedSpillCostAnalysisScale`	Scale adjustment for static profile guided spill cost analysis	Available
`StaticProfileGuidedTrimming`	Enable static analysis in the kernel trimming	Available
`StripDebugInfo`	Strip debug info from llvm IR lowered from input to IGC. Possible values: 0 - don't strip, 1 - strip all, 2 - strip non-line info	-
`SubroutineInlinerThreshold`	Subroutine inliner threshold	-
`SubroutineThreshold`	Minimal kernel size to enable subroutines	-
`UnitSizeThreshold`	Compilation unit size threshold	Available
`UpConvertF16Sampler`	up-convert fp16 sampler message to return fp32	-
`UseFrequencyInfoForSPGT`	Consider frequency information for trimming functions	Available
`UseOldSubRoutineAugIntf`	Use the old subroutine augmentation code which is slower	-
`VFPackingDisablePartialElements`	disable packing for partial vertex element as it causes performance drops	-
`VariableReuseByteSize`	The byte size threshold for variable reuse	-
`VectorAlias`	Vector aliasing control under EnableVariableAlias. Some features are still experimental	Available
`VectorAliasBBThreshold`	Max number of BBs of a function that VectorAlias will apply. VectorAlias will skip for functions beyond this threshold	Available
`cl_khr_srgb_image_writes`	Enable cl_khr_srgb_image_writes extension	-
`disableRemat`	disable re-materialization	-
`disableUnormTypedReadWA`	disable software conversion for UNORM surface in Dx10	-
`disableVarSplit`	disable variable splitting	-
`forceGlobalRA`	force global register allocator	-
`forceSamplerHeader`	force sampler messages to use header	-
`samplerHeaderWA`	enable sampler header to solve HW WA	-
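
Several flags in the table above take numeric codes rather than 0/1. For `StaticProfileGuidedSpillCostAnalysis` the listed values (1, 2, 4, 8, 16) are powers of two, which suggests they can be OR-ed together as a bitmask. The table does not state this, so treat it as an assumption. The variable names in this sketch are illustrative only:

```shell
# Assumption: the documented values are independent bits that may be combined.
PROFILE_GENERATION=1   # "1 for profile generation"
SPILL_COMPUTATION=8    # "8 for spill computation"

# Combine the two stages with bitwise OR using POSIX shell arithmetic.
SPILL_COST_MODE=$(( PROFILE_GENERATION | SPILL_COMPUTATION ))

# Flags are passed to IGC as IGC_<flag>=<value> environment variables.
export IGC_StaticProfileGuidedSpillCostAnalysis="$SPILL_COST_MODE"

echo "IGC_StaticProfileGuidedSpillCostAnalysis=$IGC_StaticProfileGuidedSpillCostAnalysis"
```

The variable must be exported in the environment of the process that invokes the compiler. Only flags marked Available take effect in release builds.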

Generating precompiled headers

Flag	Description	Release builds
`ApplyConservativeRastWAHeader`	Apply WaConservativeRasterization for the platforms enabled	-

Raytracing Options

Flag	Description	Release builds
`ContinuationInlineThreshold`	If number of continuations is greater than threshold, default to indirect	Available
`DeferCollectionStateObjectCompilation`	Wait to compile till the RTPSO stage	Available
`DisableCanonizationWA`	WA for A0 to inject shifts to canonize global and local pointers	Available
`DisableCompactifySpills`	Just emit spill/fill at the point of def/use	Available
`DisableCrossFillRemat`	Rematerialize values if they use already spilled values	Available
`DisableDPSE`	Disable Dead PayloadStore Elimination.	Available
`DisableEarlyRemat`	Disable quick remats to avoid some spills	Available
`DisableEntryFences`	Don't emit the evict and invalidate fences for A0 WA	-
`DisableExamineRayFlag`	Don't do IPO to see if we can fold control flow given knowledge of possible rayflag values	-
`DisableFuseContinuations`	If set, we will look for small duplicated continuations to merge into one.	Available
`DisableInvalidateRTStackAfterLastRead`	Disables L1 cache invalidation after the last read of the RT stack. Affects rayqueries only	Available
`DisableInvariantLoad`	Disabled !invariant_load metadata for raytracing shaders	Available
`DisableLSCControlsForRayTracing`	Disable different LSC Controls for HW and SW portions of the RTStack	Available
`DisableLateRemat`	Disable quick remats to avoid some spills	Available
`DisableMatchRegisterRegion`	Disable matching for debug purposes	Available
`DisablePayloadSinking`	sink stores to payload into inlined continuations	Available
`DisablePreSplitOpts`	Disable last minute optimizations before shader splitting	Available
`DisablePredicatedStackIDRelease`	Emit a single stack ID release at the end of the shader	Available
`DisablePrepareLoadsStores`	Disable preparation for MemOpt	Available
`DisableProceedBasedApproachForRayQueryDynamicRayManagementMechanism`	Disables proceed based approach for dynamic ray management mechanism	Available
`DisablePromoteContinuation`	BTD-able continuations in the raygen may be moved to the shader identifier	-
`DisablePromoteToScratch`	Use scratch space rather than SWStack when possible.	Available
`DisableRTAliasAnalysis`	Disable Raytracing Alias Analysis	-
`DisableRTBindlessAccess`	do bindful rather than bindless accesses to raytracing memory	Available
`DisableRTFenceElision`	Disable optimization to remove unneeded fences	-
`DisableRTGlobalsKnownValues`	load MaxBVHLevels from RTGlobals rather than assuming = 2	Available
`DisableRTMemDSE`	Analyze stores to SWStack, etc. that aren't read before Stack ID Release	-
`DisableRTRetryPickBetter`	Disables raytracing retry to pick the best compilation instead of always using the retry compilation.	-
`DisableRTStackOpts`	Disable some optimizations that minimize reads/writes to the RTStack	Available
`DisableRayQueryDynamicRayManagementMechanism`	Dynamic ray management mechanism for Synchronous Ray Tracing	Available
`DisableRayQueryDynamicRayManagementMechanismForBarriers`	Disable dynamic ray management mechanism for shaders with barriers	Available
`DisableRayQueryDynamicRayManagementMechanismForExternalFunctionsCalls`	Disable dynamic ray management mechanism for shaders with external functions calls	Available
`DisableRayTracingConstantCoalescing`	Disable coalescing	Available
`DisableRayTracingOptimizations`	Disable RayTracing Optimizations for debugging	Available
`DisableRaytracingIntrinsicAttributes`	Turn off noalias and dereferenceable attributes	Available
`DisableSWStackOffsetElision`	Avoid loading offseting when known at compile-time	-
`DisableShaderFusion`	Don't check for duplicate, renamed shaders	-
`DisableSpillReorder`	Disables reordering of spills to try to minimize spills in a loop	-
`DisableStatefulRTStackAccess`	do stateless rather than stateful accesses to the HW portion of the async stack	Available
`DisableStatefulRTSyncStackAccess`	do stateless rather than stateful accesses to the HW portion of the sync stack	Available
`DisableStatefulRTSyncStackAccess4RTShader`	do stateless rather than stateful accesses to the HW portion of the sync stack. RT Shader only.	Available
`DisableStatefulRTSyncStackAccess4nonRTShader`	do stateless rather than stateful accesses to the HW portion of the sync stack. nonRT Shader only.	Available
`DisableStatefulSWHotZoneAccess`	do stateless rather than stateful accesses to the SW HotZone	Available
`DisableStatefulSWStackAccess`	do stateless rather than stateful accesses to the SW Stack	Available
`DisableWideTraceRay`	Disable SIMD16 style message payloads for send.rta	Available
`EnableCompressedRayIndices`	Use an alternate form with bit twiddling to pack stack pointer and indices into two DWORDs	Available
`EnableFillScheduling`	Schedule fills for reduced register pressure	-
`EnableHoistRemat`	Hoist rematerialized instructions to shader entry. Longer live ranges but common values fused.	Available
`EnableIndirectContinuations`	Enable BTD for continuation shaders (regardless of inline threshold).	Available
`EnableInlinedContinuations`	Forcibly inline all continuations	Available
`EnableKnownBTIBase`	For testing, assume that we know what baseBTI is in RTGlobals	Available
`EnableLSCCacheOptimization`	Optimize store instructions for utilizing the LSC-L1 cache	-
`EnableOuterLoopHoistingForRayQueryDynamicRayManagementMechanism`	Disable dynamic ray management mechanism for shaders with barriers	Available
`EnableRQHideLatency`	Hide RayQuery Proceed latency.	-
`EnableRTDispatchAlongY`	Dispatch Compute Walker along Y first	Available
`EnableRTPrintf`	Enable printf for ray tracing.	Available
`EnableRayTracingTGMFence`	Enable tgm fence in RT workloads for debugging	-
`EnableSingleRQMemRayStore`	Store RayQuery MemRay[TOP] only once.	-
`EnableStackIDReleaseScheduling`	Schedule Stack ID Release messages prior to the end of the shader	-
`EnableSyncDispatchRays`	Enable sync DispatchRays implementation	-
`ForceCSLeastSIMD4RQ`	Force compute shader with RayQuery to the lowest allowed SIMD mode	-
`ForceCSSimdSize4RQ`	Force RayQuery compute shader simd size. Valid values are 0 (not set), 8, 16 and 32; ignored if it produces an invalid configuration, e.g. simd size too small for workgroup size	Available
`ForceFirstFencesEvict`	Force evict fence op on fences prior to the stack ID release	Available
`ForceGenMemDefaultCacheCtrl`	If enabled, no message specific cache ctrls are set on memory outside of RTStack, SWStack, and SWHotZone	Available
`ForceGenMemLoadCacheCtrl`	Enables GenMemLoadCacheCtrl regkey for custom lsc load cache controls in other memory	Available
`ForceGenMemStoreCacheCtrl`	Enables GenMemStoreCacheCtrl regkey for custom lsc store cache controls in other memory	Available
`ForceIndirectCallsInSyncDispatchRays`	Will skip direct calls in synchronous raytracing and immediately call raytracing shaders via KSP shader ptr	-
`ForceInliningTraceRayCallsInSyncDispatchRays`	Will inline calls to __TraceRay, __Invoke and __TraceRaySyncToAsyncAdapter even when indirect calls are not necessary	-
`ForceNullBVH`	Swap BVH with null pointer. Infinitely fast ray traversal.	Available
`ForceRTCheckInstanceLeafPtr`	Check MemHit::valid before loading GeometryIndex, PrimitiveIndex, etc.	Available
`ForceRTCheckInstanceLeafPtrMask`	Test only. 1: committedindex; 2: potentialindex	Available
`ForceRTConstantBufferCacheCtrl`	Enables RTConstantBufferCacheCtrl regkey for custom lsc load cache controls for constant buffers	Available
`ForceRTRetry`	Raytracing is compiled in the second retry state	-
`ForceRTShortCircuitingOR`	Only for specific test. Short-circuit OR condition if CommittedGeometryIndex is used	Available
`ForceRTStackLoadCacheCtrl`	Enables RTStackLoadCacheCtrl regkey for custom lsc load cache controls in the RTStack	Available
`ForceRTStackStoreCacheCtrl`	Enables RTStackStoreCacheCtrl regkey for custom lsc store cache controls in the RTStack	Available
`ForceSWHotZoneLoadCacheCtrl`	Enables SWHotZoneLoadCacheCtrl regkey for custom lsc load cache controls in the SWHotZone	Available
`ForceSWHotZoneStoreCacheCtrl`	Enables SWHotZoneStoreCacheCtrl regkey for custom lsc store cache controls in the SWHotZone	Available
`ForceSWStackLoadCacheCtrl`	Enables SWStackLoadCacheCtrl regkey for custom lsc load cache controls in the SWStack	Available
`ForceSWStackStoreCacheCtrl`	Enables SWStackStoreCacheCtrl regkey for custom lsc store cache controls in the SWStack	Available
`ForceWholeProgramCompile`	Compile as if we know all of the shaders upfront	Available
`KnownBTIBaseValue`	If EnableKnownBTIBase is set, use this value for baseBTI	Available
`OverrideTMax`	Force TMax to the given value. When 0, do nothing.	-
`PrintfBufferSize`	Set printf buffer size. Unit: KB.	Available
`RTFenceToggle`	Toggle fences	Available
`RTInValidDefaultIndex`	If MemHit::valid is false, the default value to return for some intrinsics like GeometryIndex or PrimitiveIndex etc.	Available
`RayTracingConstantCoalescingMinBlockSize`	Set the minimum load size in # OWords = [1,2,4,8,16].	Available
`RayTracingCustomTileXDim1D`	X dimension of tile (default: 256)	Available
`RayTracingCustomTileXDim2D`	X dimension of tile (default: 32)	Available
`RayTracingCustomTileYDim1D`	Y dimension of tile (default: 1)	Available
`RayTracingCustomTileYDim2D`	Y dimension of tile (default: 4 for XE, 32 for XE2+)	Available
`RayTracingDumpYaml`	Dump yaml input/output files	Available
`RayTracingKeepUDivRemWA`	Workaround till jitIsa supports cr0 for rtz conversions	Available
`RematThreshold`	Tunes how aggressively we should remat values into continuations	Available
`RetryRTPickBetterThreshold`	Only pick the retry shader if the spill cost of the 2nd compilation is at least this percentage better than the previous compilation	-
`RetryRTSpillCostThreshold`	Only retry if the percentage of spills (over total instructions) is more than this value	-
`RetryRTSpillMemThreshold`	Only retry if spill mem used is more than this value	-
`ShaderFusionThrehold`	If there are fewer shaders than this, don't spend time checking duplicates	-
`TotalGRFNum4RQ`	Total GRF used for register allocation for RayQuery only. Test only. Delete later.	-
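
Some raytracing flags accept only a fixed set of values. `ForceCSSimdSize4RQ`, for example, documents 0, 8, 16 and 32. The following is a minimal sketch of a wrapper that rejects other values before exporting the variable. The helper name `set_rq_simd_size` is hypothetical and not part of IGC:

```shell
# Hypothetical helper: export IGC_ForceCSSimdSize4RQ only for the documented values.
set_rq_simd_size() {
    case "$1" in
        0|8|16|32)
            export IGC_ForceCSSimdSize4RQ="$1"
            ;;
        *)
            # Anything else is rejected here; the environment is left unchanged.
            echo "unsupported SIMD size: $1 (expected 0, 8, 16 or 32)" >&2
            return 1
            ;;
    esac
}

set_rq_simd_size 16
echo "IGC_ForceCSSimdSize4RQ=$IGC_ForceCSSimdSize4RQ"
```

According to the flag description, even a valid value is ignored by the compiler if it produces an invalid configuration, such as a SIMD size too small for the workgroup size. The check above therefore only guards against typos.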
