Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conditionals may include inappropriate parallel inames #724

Closed
kaushikcfd opened this issue Jan 4, 2023 · 2 comments · Fixed by #777
Closed

Conditionals may include inappropriate parallel inames #724

kaushikcfd opened this issue Jan 4, 2023 · 2 comments · Fixed by #777

Comments

@kaushikcfd
Copy link
Collaborator

Consider the fairly straightforward transformation:

import loopy as lp
import numpy as np
import pyopencl as cl


t_unit = lp.make_kernel(
    "{[i,j,k]: 0<=i,j<72 and 0<=k<32}",
    """
    C[i,j] = sum(k, A[i,k] * B[k,j])
    """,
    [lp.GlobalArg("A,B", dtype=np.float64, shape=lp.auto),
     ...],
    lang_version=(2018, 2),
)
ref_t_unit = t_unit

Tx = 8
Ty = 23
Tk = 11

t_unit = lp.split_iname(t_unit, "i", Tx, inner_tag="l.0", outer_tag="g.0")
t_unit = lp.split_iname(t_unit, "j", Ty, inner_tag="l.1", outer_tag="g.1")
t_unit = lp.split_iname(t_unit, "k", Tk)
t_unit = lp.add_prefetch(
    t_unit, "A",
    sweep_inames=["i_inner", "k_inner"],
    temporary_address_space=lp.AddressSpace.LOCAL,
    fetch_outer_inames=frozenset({"i_outer", "j_outer", "k_outer"}),
    dim_arg_names=["iprftch_A", "kprftch_A"],
    default_tag=None,
)

t_unit = lp.add_prefetch(
    t_unit, "B",
    sweep_inames=["k_inner", "j_inner"],
    temporary_address_space=lp.AddressSpace.LOCAL,
    fetch_outer_inames=frozenset({"i_outer", "j_outer", "k_outer"}),
    dim_arg_names=["kprftch_B", "jprftch_B"],
    default_tag=None,
)

t_unit = lp.split_iname(t_unit, "kprftch_A", Tx, inner_tag="l.0")
t_unit = lp.split_iname(t_unit, "iprftch_A", Ty, inner_tag="l.1")
t_unit = lp.split_iname(t_unit, "jprftch_B", Tx, inner_tag="l.0")
t_unit = lp.split_iname(t_unit, "kprftch_B", Ty, inner_tag="l.1")

ctx = cl.create_some_context()
lp.auto_test_vs_ref(ref_t_unit, ctx, t_unit)

fails with

Traceback (most recent call last):
  File "/home/line/temp/loopy_mwe_for_multi_loops.py", line 49, in <module>
    lp.auto_test_vs_ref(ref_t_unit, ctx, t_unit)
  File "/home/line/projects/pytato_env/src/loopy/loopy/auto_test.py", line 602, in auto_test_vs_ref
    raise AutomaticTestFailure(error)
loopy.diagnostic.AutomaticTestFailure: results do not match -- (rel) l_2 err: 5.35702e-05, l_inf err: 1

I'm unsure whether the transformation is incorrect or loopy's code-generator is to be blamed.

@kaushikcfd
Copy link
Collaborator Author

After looking at this for a bit, this indeed looks like a loopy bug. The issue is that the domain contains multiple loops tagged with l.0, l.1 and that leads to incorrect loo- bound calculation. For the above example, loop-bounds for prefetching A should not depend on g.1, but the generated code disagrees.

I attempted to decouple this interference by introducing a transformation called decouple domains but even that does not work as CodeGenerationState.implemented_domains does not distinguish between non-interfering local hardware inames.

Conclusion: This transformation cannot be implemented in current-loopy, but the good thing is that these details are in the undefined realm of Loopy, so we just need to patch those definitions while accounting for such use-cases.

@inducer
Copy link
Owner

inducer commented May 16, 2023

Thanks for finding this issue! I'm pretty surprised this hasn't bitten us sooner.

@inducer inducer changed the title [not verified] DGEMM transformation violates memory dependencies Conditionals may include inappropriate parallel inames May 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants