Pull requests: NVIDIA/TransformerEngine
[PyTorch] Minor optimizations to reduce CPU overheads in modules
enhancement
New feature or request
#1191
opened Sep 18, 2024 by
timmoon10
6 of 13 tasks
[PyTorch] Fix detection of 3 in 3hd/h3d layouts
#1187
opened Sep 16, 2024 by
cyanguwa
8 of 13 tasks
Allow to pass architectures like 90a, without being overridden
#1178
opened Sep 11, 2024 by
aurianer
[PyTorch] Miscellaneous fixes for FA3 FP8 attention
#1174
opened Sep 10, 2024 by
cyanguwa
9 of 13 tasks
[PyTorch] Fused dbias-cast-transpose in bias operation
#1168
opened Sep 6, 2024 by
timmoon10
7 of 13 tasks
[PyTorch/C] Exposed Userbuffers configuration option to control comm and compute stream priorities
enhancement
New feature or request
#1149
opened Aug 29, 2024 by
denera
8 of 13 tasks
[PyTorch] Avoid saving fp8_tensors in certain scenarios
#1143
opened Aug 28, 2024 by
cyanguwa
8 of 13 tasks
[PyTorch] Userbuffers support in operation-based API
#1142
opened Aug 27, 2024 by
timmoon10
7 of 13 tasks
Don't save fp8 q/k/v/out tensors when using bf16 bprop
#1139
opened Aug 27, 2024 by
guyueh1
13 tasks
Fix param input order for cudagraph
bug
Something isn't working
#1138
opened Aug 27, 2024 by
yifeis-nv
4 of 13 tasks
Add high_precision_init_val to model params when using fp8_model_init
#1121
opened Aug 19, 2024 by
kunlunl
8 of 13 tasks
[PyTorch] Debug CUDA graph support with operation-based API
bug
Something isn't working
#1117
opened Aug 16, 2024 by
timmoon10
7 of 13 tasks
Draft: Support using fp16 master weights and fp16/fp8 optimizer states in FusedAdam
#1078
opened Aug 5, 2024 by
kunlunl
7 of 13 tasks
[C/PyTorch] Userbuffers and comm+GEMM overlap algorithms refactored and moved to TE/common
enhancement
New feature or request
#1067
opened Jul 31, 2024 by
denera
8 of 13 tasks
[PyTorch] Debug checkpointing with operation-based API
bug
Something isn't working
#1063
opened Jul 31, 2024 by
timmoon10
8 of 13 tasks
Use pyproject.toml to specify build requirements
build
Build system
#1061
opened Jul 30, 2024 by
ksivaman
6 of 13 tasks
[JAX] Support Ring Attention (Context Parallelism)
#1059
opened Jul 30, 2024 by
mingxu1067
•
Draft
1 of 13 tasks
[PyTorch] Normalization ops
enhancement
New feature or request
#1033
opened Jul 22, 2024 by
timmoon10
8 of 13 tasks