Conversation

@adammoody
Contributor

@adammoody adammoody commented Aug 18, 2021

@stas00 and @thomasw21, I introduced a bug in the computation of the pointers list when I switched it to use numpy. This PR fixes that.

Before submitting that previous PR, I remember double checking the resulting index files with cmp. I must have mistakenly checked two files that were both produced with the incorrect code.

So any mmap files produced since merging that last PR will have bad data. Sorry about that!
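For readers following along, here is a minimal sketch of the kind of computation involved. The thread does not show the actual code, so this rests on an assumption: that each pointer is the byte offset at which an item starts, i.e. an exclusive prefix sum of the item sizes scaled by the element size. The function names `pointers_loop` and `pointers_numpy` are hypothetical and are not taken from the repository.

```python
import numpy as np


def pointers_loop(sizes, itemsize):
    """Reference version: byte offset of each item, built with a plain loop."""
    address = 0
    pointers = []
    for size in sizes:
        pointers.append(address)      # item starts where the previous one ended
        address += size * itemsize    # advance by this item's length in bytes
    return pointers


def pointers_numpy(sizes, itemsize):
    """Same offsets via an exclusive prefix sum (assumed layout, see lead-in)."""
    byte_sizes = np.asarray(sizes, dtype=np.int64) * itemsize
    pointers = np.zeros(len(byte_sizes), dtype=np.int64)
    if len(byte_sizes) > 1:
        # Shift by one slot so that pointers[0] stays 0 (exclusive scan).
        np.cumsum(byte_sizes[:-1], out=pointers[1:])
    return pointers


sizes = [3, 5, 2]
assert pointers_loop(sizes, 2) == [0, 6, 16]
assert pointers_numpy(sizes, 2).tolist() == [0, 6, 16]
```

Checking the vectorized version against the loop version on a few inputs, as the last two lines do, is a cheap way to catch an off-by-one between an inclusive and an exclusive scan.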

Member

@thomasw21 thomasw21 left a comment


Oops I missed that too!

@thomasw21 thomasw21 merged commit 191a96b into bigscience-workshop:main Aug 18, 2021
@thomasw21
Member

thomasw21 commented Aug 18, 2021

Can we add a stability test for the data processing? Typically we can run the preprocessing from the official repo on a very small file and check that our version produces the same output. See the next comment.

@thomasw21
Member

I've added a test in order to maintain stability with microsoft/Megatron-DeepSpeed:

I checked that the commit preceding this PR fails the test, and that the commit following it passes.

Maybe we want to have a smaller dataset than openwebtext. I'm currently using this as it was the easiest to work with. cc @stas00
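As an illustration of the idea, a stability check of this kind can come down to a byte-for-byte comparison between a reference output and the output of the code under test. The sketch below is not the test that was added; the helper names and file names are hypothetical, and small in-memory payloads stand in for real preprocessing output.

```python
import filecmp
import os
import tempfile


def files_identical(path_a, path_b):
    """Byte-for-byte comparison, similar in spirit to the `cmp` utility."""
    # shallow=False forces a content comparison instead of trusting os.stat().
    return filecmp.cmp(path_a, path_b, shallow=False)


def write_bytes(directory, name, payload):
    """Hypothetical helper: write payload to directory/name, return the path."""
    path = os.path.join(directory, name)
    with open(path, "wb") as handle:
        handle.write(payload)
    return path


with tempfile.TemporaryDirectory() as tmp:
    # Stand-ins for a known-good reference file and two candidate outputs.
    reference = write_bytes(tmp, "reference.idx", b"\x00\x01\x02\x03")
    same = write_bytes(tmp, "candidate_ok.idx", b"\x00\x01\x02\x03")
    different = write_bytes(tmp, "candidate_bad.idx", b"\x00\x01\x02\xff")
    assert files_identical(reference, same)
    assert not files_identical(reference, different)
```

The comparison is only meaningful if the reference file comes from code known to be correct, for example the official repo; comparing two outputs of the same buggy code will pass without proving anything.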

@adammoody
Contributor Author

Thank you, @thomasw21!

@adammoody adammoody deleted the pointerfix branch August 18, 2021 15:42
thomasw21 referenced this pull request Aug 20, 2021
 - Add a test to stability compared to official repo.
adammoody pushed a commit to adammoody/Megatron-DeepSpeed that referenced this pull request Oct 27, 2022
* Enable Megatron-LM workload on ROCm (#1)

* Enable Megatron workload on ROCm

* Added ds_pretrain_gpt_350M_dense_pipeclean.sh

* removed a file

* Removed an extra line

* Fix to resolve the below rsqrtf() error on ROCm

/root/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_hip_kernel.hip:298:10: error: no matching function for call to 'rsqrtf'
  return rsqrtf(v);
         ^~~~~~
/opt/rocm-5.2.0/llvm/lib/clang/14.0.0/include/__clang_hip_math.h:521:7: note: candidate function not viable: call to __device__ function from __host__ function
float rsqrtf(float __x) { return __ocml_rsqrt_f32(__x); }
      ^

* Simplified code

* Simplified the code

* Removed extra spaces

2 participants