context parallelism #7739

Merged: 88 commits, Jan 10, 2024

Commits on Jun 6, 2023

  1. make nemo recognize sequence_parallel_size

    Signed-off-by: xren <[email protected]>
    xrennvidia committed Jun 6, 2023
    afce64e
  2. merge with main

    Signed-off-by: xren <[email protected]>
    xrennvidia committed Jun 6, 2023
    3f98473
  3. add helper functions to set up SP running in TE

    Signed-off-by: xren <[email protected]>
    xrennvidia committed Jun 6, 2023
    e313000
  4. 52ff102

Commits on Jun 8, 2023

  1. slice seq length for a specific rank

    Signed-off-by: Xiaowei Ren <[email protected]>
    xrennvidia committed Jun 8, 2023
    5580955
  2. 887c615
  3. fix data_parallel_size calculation

    Signed-off-by: Xiaowei Ren <[email protected]>
    xrennvidia committed Jun 8, 2023
    ebd6323
  4. minor change

    Signed-off-by: Xiaowei Ren <[email protected]>
    xrennvidia committed Jun 8, 2023
    58cca3d
  5. add missing argument of self

    Signed-off-by: Xiaowei Ren <[email protected]>
    xrennvidia committed Jun 8, 2023
    87f027a
  6. pass sp_global_ranks to TE transformer layer

    Signed-off-by: Xiaowei Ren <[email protected]>
    xrennvidia committed Jun 8, 2023
    9ebfcf7

Commits on Jun 9, 2023

  1. fix nsys setting

    Signed-off-by: Xiaowei Ren <[email protected]>
    xrennvidia committed Jun 9, 2023
    728fd43

Commits on Jun 13, 2023

  1. fix seq_len calculation

    Signed-off-by: xren <[email protected]>
    xrennvidia committed Jun 13, 2023
    66615e8

Commits on Jun 17, 2023

  1. fix attn_mask split across seq-length dim

    Signed-off-by: xren <[email protected]>
    xrennvidia committed Jun 17, 2023
    e1f5eb7
  2. code update of input split

    Signed-off-by: xren <[email protected]>
    xrennvidia committed Jun 17, 2023
    cf0c75c

Commits on Jun 21, 2023

  1. fix loss calculation

    Signed-off-by: xren <[email protected]>
    xrennvidia committed Jun 21, 2023
    b57e218
  2. fix loss_mask_sum calculation

    Signed-off-by: xren <[email protected]>
    xrennvidia committed Jun 21, 2023
    69f4ae8

Commits on Jun 22, 2023

  1. fix loss calculation

    Signed-off-by: xren <[email protected]>
    xrennvidia committed Jun 22, 2023
    a38dd9a
  2. merge with main

    Signed-off-by: xren <[email protected]>
    xrennvidia committed Jun 22, 2023
    b31e31f
  3. 8ac42f1

Commits on Jun 24, 2023

  1. minor change

    Signed-off-by: xren <[email protected]>
    xrennvidia committed Jun 24, 2023
    f7c9b5b
  2. fix loss_mask_sum calculation

    Signed-off-by: xren <[email protected]>
    xrennvidia committed Jun 24, 2023
    49b1052

Commits on Aug 1, 2023

  1. merge with main

    Signed-off-by: xren <[email protected]>
    xrennvidia committed Aug 1, 2023
    ae889fc

Commits on Aug 3, 2023

  1. 2c43687
  2. 25bf369
  3. slice position embedding for different CP rank

    Signed-off-by: xren <[email protected]>
    xrennvidia committed Aug 3, 2023
    61af551
  4. fix missing property decorator

    Signed-off-by: xren <[email protected]>
    xrennvidia committed Aug 3, 2023
    dc8a540
  5. typo fix

    Signed-off-by: xren <[email protected]>
    xrennvidia committed Aug 3, 2023
    46479c6

Commits on Aug 4, 2023

  1. fix rpe_bias CP slicing

    Signed-off-by: xren <[email protected]>
    xrennvidia committed Aug 4, 2023
    b64b563

Commits on Aug 6, 2023

  1. 0362de6
  2. code style fix

    Signed-off-by: xren <[email protected]>
    xrennvidia committed Aug 6, 2023
    e1654fb

Commits on Aug 8, 2023

  1. fix loss_mask_sum calculation

    Signed-off-by: xren <[email protected]>
    xrennvidia committed Aug 8, 2023
    4f0a3be
  2. c46b42e

Commits on Aug 21, 2023

  1. merge with main

    Signed-off-by: xren <[email protected]>
    xrennvidia committed Aug 21, 2023
    b6db8f3

Commits on Aug 22, 2023

  1. do not load attention mask if it's not needed

    Signed-off-by: Xiaowei Ren <[email protected]>
    xrennvidia committed Aug 22, 2023
    4076d06
  2. cherry pick attention mask data loader skip

    Signed-off-by: Xiaowei Ren <[email protected]>
    xrennvidia committed Aug 22, 2023
    3353e13

Commits on Aug 23, 2023

  1. bug fix

    Signed-off-by: xren <[email protected]>
    xrennvidia committed Aug 23, 2023
    433f6a7

Commits on Aug 25, 2023

  1. c4592e8

Commits on Sep 6, 2023

  1. fix ubuf size with CP > 1

    Signed-off-by: Xiaowei Ren <[email protected]>
    xrennvidia committed Sep 6, 2023
    5efaa76

Commits on Sep 14, 2023

  1. address naming confusion of mixed dp and cp

    Signed-off-by: xren <[email protected]>
    xrennvidia committed Sep 14, 2023
    006677d
  2. merge with main

    Signed-off-by: Xiaowei Ren <[email protected]>
    xrennvidia committed Sep 14, 2023
    d64b85d

Commits on Sep 25, 2023

  1. 499f0d6

Commits on Sep 30, 2023

  1. 693b8b7

Commits on Oct 3, 2023

  1. 0f7d079
  2. 4dcfdb6
  3. pop context_parallel from dist opt kwargs

    Signed-off-by: xren <[email protected]>
    xrennvidia committed Oct 3, 2023
    3351953

Commits on Oct 5, 2023

  1. 08f785b
  2. remove use_fp8 from initialize_model_parallel

    Signed-off-by: xren <[email protected]>
    xrennvidia committed Oct 5, 2023
    e277b3d
  3. a27155c

Commits on Oct 6, 2023

  1. make implementations of setup_transformer_engine_tp_groups and setup_transformer_engine_cp_running consistent
    
    Signed-off-by: xren <[email protected]>
    xrennvidia committed Oct 6, 2023
    dc65d34

Commits on Oct 10, 2023

  1. 42a6b83

Commits on Oct 11, 2023

  1. cp function renaming

    Signed-off-by: xren <[email protected]>
    xrennvidia committed Oct 11, 2023
    5013189

Commits on Oct 13, 2023

  1. make loss logging broadcast aware of cp

    Signed-off-by: xren <[email protected]>
    xrennvidia committed Oct 13, 2023
    52dd50b
  2. b61fa4e
  3. fix a typo

    Signed-off-by: Xiaowei Ren <[email protected]>
    xrennvidia committed Oct 13, 2023
    52381e8
  4. fb9cc3d

Commits on Oct 14, 2023

  1. 1b92952
  2. var name fix

    Signed-off-by: Xiaowei Ren <[email protected]>
    xrennvidia committed Oct 14, 2023
    e394392

Commits on Oct 16, 2023

  1. import transformer layer specs from MCore

    Signed-off-by: Xiaowei Ren <[email protected]>
    xrennvidia committed Oct 16, 2023
    f9bf0d8

Commits on Oct 17, 2023

  1. upgrade MCore version

    Signed-off-by: Xiaowei Ren <[email protected]>
    xrennvidia committed Oct 17, 2023
    1f8815f
  2. f1bc1a7
  3. merge with main

    Signed-off-by: Xiaowei Ren <[email protected]>
    xrennvidia committed Oct 17, 2023
    a40b183
  4. add context_parallel into the kwargs of dist opt

    Signed-off-by: Xiaowei Ren <[email protected]>
    xrennvidia committed Oct 17, 2023
    d15ae17

Commits on Oct 19, 2023

  1. merge with main

    Signed-off-by: Xiaowei Ren <[email protected]>
    xrennvidia committed Oct 19, 2023
    8ae9061
  2. 6be25b9

Commits on Oct 25, 2023

  1. 4cbdb0e
  2. remove redundant cp check

    Signed-off-by: Xiaowei Ren <[email protected]>
    xrennvidia committed Oct 25, 2023
    55b7e13
  3. 840103e
  4. code style fix

    Signed-off-by: Xiaowei Ren <[email protected]>
    xrennvidia committed Oct 25, 2023
    03b2922
  5. 14a589e
  6. recover docker file

    Signed-off-by: Xiaowei Ren <[email protected]>
    xrennvidia committed Oct 25, 2023
    7c5b9c1

Commits on Nov 1, 2023

  1. 50d0385

Commits on Nov 9, 2023

  1. 45002d4

Commits on Nov 17, 2023

  1. merge with main

    Signed-off-by: Xiaowei Ren <[email protected]>
    xrennvidia committed Nov 17, 2023
    319e659

Commits on Nov 18, 2023

  1. 071d234

Commits on Nov 22, 2023

  1. baafb02

Commits on Nov 23, 2023

  1. bf100fc

Commits on Nov 27, 2023

  1. fix seq_length of CP

    Signed-off-by: Xiaowei Ren <[email protected]>
    xrennvidia committed Nov 27, 2023
    2da819e

Commits on Dec 4, 2023

  1. merge with main

    Signed-off-by: Xiaowei Ren <[email protected]>
    xrennvidia committed Dec 4, 2023
    cd7021a
  2. recover seq-length which has been fixed in mcore

    Signed-off-by: Xiaowei Ren <[email protected]>
    xrennvidia committed Dec 4, 2023
    22eeaf9

Commits on Dec 16, 2023

  1. merge with main

    Signed-off-by: Xiaowei Ren <[email protected]>
    xrennvidia committed Dec 16, 2023
    b56ce02

Commits on Dec 18, 2023

  1. merge with main

    Signed-off-by: Xiaowei Ren <[email protected]>
    xrennvidia committed Dec 18, 2023
    3a29733

Commits on Dec 19, 2023

  1. function name fix

    Signed-off-by: Xiaowei Ren <[email protected]>
    xrennvidia committed Dec 19, 2023
    5d25e67

Commits on Dec 21, 2023

  1. ead55a0

Commits on Jan 2, 2024

  1. merge with main

    Signed-off-by: Xiaowei Ren <[email protected]>
    xrennvidia committed Jan 2, 2024
    3a36003

Commits on Jan 3, 2024

  1. 2d42b1c

Commits on Jan 4, 2024

  1. merge with main

    Signed-off-by: Xiaowei Ren <[email protected]>
    xrennvidia committed Jan 4, 2024
    f66a5aa

Commits on Jan 5, 2024

  1. 5d464c9

Commits on Jan 9, 2024

  1. 2c9c95e