Adding semaphore to lower memory usage #1
Closed
Conversation
Collaborator: Trying it now
Member (Author): Check out #3 instead, as I was trying to merge on the wrong branch.
thomasw21 added a commit that referenced this pull request on Sep 16, 2021
ofirpress pushed a commit to ofirpress/Megatron-DeepSpeed that referenced this pull request on Sep 23, 2021
This is linked to the problem of increasing memory usage when preprocessing datasets. I believe the issue is that `imap` is much faster than the single-threaded write, so unconsumed samples pile up in memory. In this PR, we suggest using a global semaphore that limits the number of samples held in memory, i.e. we wait for the consumer to process X samples before allowing the generator to produce more. I was not able to test this (I don't have access to any dataset); maybe @TevenLeScao you can try?
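For illustration, here is a minimal sketch of the pattern described above, not the actual diff in this PR; `encode`, `write_sample`, the file path, and the limit of 1000 in-flight samples are hypothetical stand-ins for the real preprocessing step, writer, input, and X.

```python
import multiprocessing
import threading

# Hypothetical bound on in-flight samples (the "X" from the description).
MAX_IN_FLIGHT = 1000
semaphore = threading.Semaphore(MAX_IN_FLIGHT)

def encode(line):
    # Stand-in for the real per-sample preprocessing done in pool workers.
    return line.strip().upper()

def write_sample(sample):
    # Stand-in for the slow single-threaded write to the output dataset.
    pass

def bounded_reader(path):
    # Generator feeding Pool.imap. The pool's task-feeding thread consumes
    # this as fast as it can, so without the semaphore it races ahead of the
    # writer and results accumulate in memory. acquire() blocks once
    # MAX_IN_FLIGHT samples have been produced but not yet written.
    with open(path) as f:
        for line in f:
            semaphore.acquire()
            yield line

def main(path):
    with multiprocessing.Pool(processes=4) as pool:
        for sample in pool.imap(encode, bounded_reader(path), chunksize=32):
            write_sample(sample)
            semaphore.release()  # free one slot so the generator can continue

if __name__ == "__main__":
    main("data.txt")
```

The key point is that the semaphore is acquired in the producer, which runs in the parent process, and released by the consumer loop, so a plain `threading.Semaphore` suffices to throttle how far `imap` can run ahead of the write loop.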