bump version to 2.xx #2301

Merged: 8 commits merged into main from Mddct-torch-2xx on Jan 18, 2024

Mddct (Collaborator) commented Jan 15, 2024

#2298

NOTE: FSDP and model parallelism will be supported in the future; upgrading to torch 2.xx is a prerequisite.

  • sox works in windows and mac
  • make_shard and related files
  • unit test
    • test read shard
    • test read raw
  • reproduce exp
    • raw
    • shard

Mddct force-pushed the Mddct-torch-2xx branch 14 times, most recently from dd16999 to ba598fb, on January 15, 2024 12:48
Mddct (Collaborator, Author) commented Jan 15, 2024

QA

  • Why use the miniconda action?
    With the sox dynamic library (libsox), torchaudio 2.xx reports "open failed".
    Use
    conda install  conda-forge::sox # resolves the .so version problem

update:
On Ubuntu, the following also works:

apt-get install libsox-dev

Mddct force-pushed the Mddct-torch-2xx branch 4 times, most recently from 0eebb0f to bc5ebce, on January 15, 2024 15:57
xingchensong (Member) commented Jan 16, 2024

It would be better to add unit tests that read data in the different formats (test_load_raw.py, test_load_shard.py, etc.). I ran into this error when using shard (python3.10, torch2.1.2+cu118):
https://stackoverflow.com/questions/71617570/pytorchstreamreader-failed-reading-zip-archive-failed-finding-central-directory

The fix is:

https://stackoverflow.com/questions/59155883/how-to-stream-files-from-tarfile-for-reading

tarfile.open(fileobj=sample['stream'], mode="r|*")

-->

tarfile.open(fileobj=sample['stream'], mode="r:*")

Zhou, if you can reproduce this error on your side, could you fix it in this PR as well?

Mddct (Collaborator, Author) commented Jan 16, 2024

tarfile.open(fileobj=sample['stream'], mode="r|*")

Writing the unit test with this call raised the following error:
[Screenshot 2024-01-16 at 21 07 37]

After changing the call to

tarfile.open(fileobj=sample['stream'], mode="r:*")

the unit test passes.

Mddct force-pushed the Mddct-torch-2xx branch 3 times, most recently from c3d4ed5 to 816e72a, on January 17, 2024 02:20
xingchensong (Member) commented

Training verification in shard mode:

Results of training train_transformer.yaml for 120 epochs:
[image: 0f6c1893802da56a2e0d3773a33d644]

Roughly on par with the 240-epoch results:
[image: d2c84b429bab1aaf7868741bf15047d]

Mddct (Collaborator, Author) commented Jan 18, 2024

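Training verification in raw mode:

greedy search
[Screenshot 2024-01-18 at 12 10 18]

attention
[Screenshot 2024-01-18 at 12 10 55]

Training verification in shard mode:

Results of training train_transformer.yaml for 120 epochs: [image: 0f6c1893802da56a2e0d3773a33d644]

Roughly on par with the 240-epoch results: [image: d2c84b429bab1aaf7868741bf15047d]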

Mddct changed the title from "[WIP] bump version to 2.xx" to "bump version to 2.xx" on Jan 18, 2024
Mddct merged commit f71e80b into main on Jan 18, 2024
6 checks passed
Mddct deleted the Mddct-torch-2xx branch on January 18, 2024 04:12