Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Backends: DX9: add programmable rendering pipeline implementation #3874

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

Demonese
Copy link
Contributor

@Demonese Demonese commented Mar 4, 2021

Before

The old pull request #3844 is too messy, so I create a new pr to discuss.

I think this pull request won't be merged for a long time:

  • The master branch is under active development, but I may not be able to merge changes in the master branch in time
  • Direct3D9 is a 20-year-old API set, is very outdated, is used very rarely now
  • The performance improvement is not obvious (see below)

Thanks to @ocornut, bring us such an excellent GUI library 👍.

Why?

So why did I put so much effort into writing this implementation?

  • Current implementation need to convert the vertices. Yes, we can #define IMGUI_OVERRIDE_DRAWVERT_STRUCT_LAYOUT struct { ImVec2 pos; float z; ImVec2 uv; ImU32 col; };, but what about z? Can we ignore it whatever it is? Ok, we will also need to setup our custom MemAlloc and MemFree, but this is not very elegant.
  • My legecy project is using d3d9 programmable rendering pipeline (they still running on older devices).
  • Many online tutorials, blogs only touch fixed rendering pipeline of d3d9, or still using abandoned D3DX Effect library (in DirectX SDK). I share this implementation as a example to help others get rid of the legacy D3DX Effect library.

Features

  • Faster vertices copy without convertion (if IMGUI_USE_BGRA_PACKED_COLOR defined), and smaller vertex size result in faster upload
  • It can automatically enable support for programmable rendering pipelines without more consideration. (device compatible)
  • I didn't modify imgui_impl_dx9.h file. (API compatible)
  • It can compiled under C++98. (same as current implementation, no build issue)
  • It does not require other libraries (same as current implementation, only extra links to d3d9.lib, no MSVC link issue)

Test

Test code:

  • imconfig.h
#define IMGUI_USE_BGRA_PACKED_COLOR
  • imgui_impl_dx9.cpp
#include <stdio.h>
#include <Windows.h>
LARGE_INTEGER freq, t1, t2;
int timer = 0;
double times = 0.0f;

// Render function.
void ImGui_ImplDX9_RenderDrawData(ImDrawData* draw_data)
{
    ::QueryPerformanceCounter(&t1);

    ...

    ::QueryPerformanceCounter(&t2);
    times += (double)(t2.QuadPart - t1.QuadPart) * 1000000.0 / (double)freq.QuadPart;
    timer += 1;
    if ((timer % 60) == 0)
    {
        static char buffer[32];
        snprintf(buffer, 32, "%.3f\n", times / 60.0);
        OutputDebugStringA(buffer);
        times = 0.0f;
    }
}

bool ImGui_ImplDX9_Init(IDirect3DDevice9* device)
{
    ::QueryPerformanceFrequency(&freq);
    
    ...
}
  • main.cpp
...

ImDrawList* renderer = ImGui::GetBackgroundDrawList();
for (float x = 0.0f; x < 2048.0f; x += 8.0f)
{
    for (float y = 0.0f; y < 2048.0f; y += 8.0f)
    {
        renderer->AddRectFilled(ImVec2(x, y), ImVec2(x + 4.0f, y + 4.0f), IM_COL32(255, 255, 255, 128), 2.0f, 64);
    }
}

...

Build on Release config, running result:

  • Fixed rendering pipeline
    31
  • Programmable rendering pipeline
    32

Submitting the contents != rendering. Present is the best way to guarantee everything has been fully processed.

For those who need this...

You are free to merge this pr into your own fork. This implementation is already working on our old devices, although I didn't do a whole lot of testing.
But, anyway, I don't guarantee it's bug-free. If you have any questions, feel free to leave a comment.

@ocornut
Copy link
Owner

ocornut commented Mar 4, 2021 via email

@Demonese
Copy link
Contributor Author

Demonese commented Mar 4, 2021

I am very surprised the fps is so low. Are you running on a very old computer? I would expect to get 1000+ fps on most machines. Printing to system console is slow you can remove this before checking the framerate.

65536 white rects & Intel integrated graphics card (awesome), cause low fps.
Our project using ImDrawList even more heavily, so I have to pay attention to the performance penalty of vertex conversions.

This is the normal FPS:

image

@Demonese Demonese force-pushed the imgui_impl_d3d9_shader branch 2 times, most recently from e662ed4 to 7716489 Compare March 11, 2021 14:24
@Demonese Demonese force-pushed the imgui_impl_d3d9_shader branch 3 times, most recently from dfecb73 to 7229327 Compare March 19, 2021 01:54
@ocornut ocornut force-pushed the master branch 2 times, most recently from 0c1e5bd to bb6a60b Compare August 27, 2021 19:10
@Demonese
Copy link
Contributor Author

Reorganized the code.
Now the newly inserted code is easier to recognize.

However, some duplicate codes have also been generated, and I am still thinking about how to solve it.

Copy link
Owner

@ocornut ocornut left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello,

As you intuited, I'm not yet sure this can be merged because it seems rather esoteric, but I am glad this exist and I would encourage to further tweak it. The suggestions I posted here are to facilitate the likehood of merging.

@@ -46,6 +46,11 @@ struct ImGui_ImplDX9_Data
int VertexBufferSize;
int IndexBufferSize;

IDirect3DVertexDeclaration9* pInputLayout;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggesting to add a comment at the top of those 4 fields, e.g. // Support for Shader Pipeline mode (rare).
Move the bool at the top of this 4 variables block.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have moved the variable position and added comments. It may be surprising that the number of devices supporting shader model 2 is very large, so I didn't use (rare), this should be acceptable.

backends/imgui_impl_dx9.cpp Show resolved Hide resolved
// HLSL source code
/*
float4x4 mvp : register(c0);

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest compacting the source code a little, remove empty lines, compact structures into one-liner.
The reasoning is to make this "optional" feature of DX9 backend not takes too much visibility in the source file, making it more likely to be mergeable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I deleted redundant empty lines

CODE(43,54,41,42),
CODE(1C,00,00,00),
CODE(4B,00,00,00),
CODE(00,02,FE,FF),
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest removing the macros and using raw 0xXX data, with 16-bytes per lines. This will reduce the line count by 4 and remove a subtle layer of complexity.

CODE(00,02,FE,FF),
CODE(FE,FF,1E,00),
CODE(43,54,41,42),
CODE(1C,00,00,00),

into

0x00, 0x02, 0xFE, 0xFF, 0xFE, 0xFF, 0x1E, 0x00, 0x43, 0x54, 0x41, 0x42, 0x1C, 0x00, 0x00, 0x00,

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest adding a comment with an explicit command-line to turn the shader source into binary data.
I realize that MAYBE the output of that compiler is to output CODE(XX,XX,XX,XX) but as it is unlikely to be rebuild (our other shaders have stayed stables for many years) this is mostly as reference.

Copy link
Contributor Author

@Demonese Demonese Nov 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I finally decided to put them on the same line, and do not use hex, this helps reduce the number of source code characters (hex requires at least 3 characters, while decimal requires at least only 1)


IDirect3DDevice9* ctx = bd->pd3dDevice;

// Setup orthographic projection matrix
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this function call regular ImGui_ImplDX9_SetRenderState() and then add the extra stuff afterwise?

@@ -143,6 +577,9 @@ static void ImGui_ImplDX9_SetupRenderState(ImDrawData* draw_data)
// Render function.
void ImGui_ImplDX9_RenderDrawData(ImDrawData* draw_data)
{
if (ImGui_ImplDX9_Shader_RenderDrawData(draw_data))
return;

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the bd->IsShaderPipeline test can be performed here, and the sub-function asserts on that value. Makes the flow more obvious.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really better. I have modified it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants