{"payload":{"feedbackUrl":"https://github.com/orgs/community/discussions/53140","repo":{"id":539057023,"defaultBranch":"main","name":"TransformerEngine","ownerLogin":"NVIDIA","currentUserCanPush":false,"isFork":false,"isEmpty":false,"createdAt":"2022-09-20T15:20:26.000Z","ownerAvatar":"https://avatars.githubusercontent.com/u/1728152?v=4","public":true,"private":false,"isOrgOwned":true},"refInfo":{"name":"","listCacheKey":"v0:1726598240.0","currentOid":""},"activityList":{"items":[{"before":"63fd8ac4ab924c96c4474ae83f3a8dc0efcd8456","after":"6a2109fd6f8922b7fb6e58d05577e024fe9adf97","ref":"refs/heads/release_v1.11","pushedAt":"2024-09-20T23:05:31.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"ptrendx","name":"Przemyslaw Tredak","path":"/ptrendx","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/8398980?s=80&v=4"},"commit":{"message":"Restore compatibility with Python 3.8 (#1189)\n\n* Restore compatibility with Python 3.8\r\n\r\nSigned-off-by: Przemyslaw Tredak \r\n\r\n* [pre-commit.ci] auto fixes from pre-commit.com hooks\r\n\r\nfor more information, see https://pre-commit.ci\r\n\r\n---------\r\n\r\nSigned-off-by: Przemyslaw Tredak \r\nCo-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>","shortMessageHtmlLink":"Restore compatibility with Python 3.8 (#1189)"}},{"before":"195d703287bf7be49437ce5d458d9342e58230ef","after":"0c74535e1addfcc93e0f871cef06f13f3c88040a","ref":"refs/heads/main","pushedAt":"2024-09-20T23:05:05.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"ptrendx","name":"Przemyslaw Tredak","path":"/ptrendx","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/8398980?s=80&v=4"},"commit":{"message":"Restore compatibility with Python 3.8 (#1189)\n\n* Restore compatibility with Python 3.8\r\n\r\nSigned-off-by: Przemyslaw Tredak \r\n\r\n* [pre-commit.ci] auto fixes from pre-commit.com hooks\r\n\r\nfor more information, see https://pre-commit.ci\r\n\r\n---------\r\n\r\nSigned-off-by: Przemyslaw Tredak \r\nCo-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>","shortMessageHtmlLink":"Restore compatibility with Python 3.8 (#1189)"}},{"before":"4fb25ccfea9e2ad1227fe3a82712849a2dbd5131","after":"63fd8ac4ab924c96c4474ae83f3a8dc0efcd8456","ref":"refs/heads/release_v1.11","pushedAt":"2024-09-20T20:38:45.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"ptrendx","name":"Przemyslaw Tredak","path":"/ptrendx","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/8398980?s=80&v=4"},"commit":{"message":"Allow downloading of model weights automatically (#1172)\n\n* allow tutorial to download the model weights automatically\r\n\r\nSigned-off-by: Sudhakar Singh \r\n\r\n* [pre-commit.ci] auto fixes from pre-commit.com hooks\r\n\r\nfor more information, see https://pre-commit.ci\r\n\r\n* allow users to provide weight cache directory\r\n\r\nSigned-off-by: Sudhakar Singh \r\n\r\n* [pre-commit.ci] auto fixes from pre-commit.com hooks\r\n\r\nfor more information, see https://pre-commit.ci\r\n\r\n---------\r\n\r\nSigned-off-by: Sudhakar Singh \r\nCo-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>","shortMessageHtmlLink":"Allow downloading of model weights automatically (#1172)"}},{"before":"0ee5ccda2eacf96fbc9c7ef1f7c084a71e0df7a6","after":"195d703287bf7be49437ce5d458d9342e58230ef","ref":"refs/heads/main","pushedAt":"2024-09-20T20:38:00.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"ptrendx","name":"Przemyslaw Tredak","path":"/ptrendx","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/8398980?s=80&v=4"},"commit":{"message":"Allow downloading of model weights automatically (#1172)\n\n* allow tutorial to download the model weights automatically\r\n\r\nSigned-off-by: Sudhakar Singh \r\n\r\n* [pre-commit.ci] auto fixes from pre-commit.com hooks\r\n\r\nfor more information, see https://pre-commit.ci\r\n\r\n* allow users to provide weight cache directory\r\n\r\nSigned-off-by: Sudhakar Singh \r\n\r\n* [pre-commit.ci] auto fixes from pre-commit.com hooks\r\n\r\nfor more information, see https://pre-commit.ci\r\n\r\n---------\r\n\r\nSigned-off-by: Sudhakar Singh \r\nCo-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>","shortMessageHtmlLink":"Allow downloading of model weights automatically (#1172)"}},{"before":"1d8752dda743b3d5667cbdbd4aa2e56cc02e9fdb","after":"60735d79954a0867fc0c5dc7619e901960a3b71e","ref":"refs/heads/te_llama_tutorial_enhancement","pushedAt":"2024-09-19T22:54:34.000Z","pushType":"push","commitsCount":8,"pusher":{"login":"sudhakarsingh27","name":"Sudhakar Singh","path":"/sudhakarsingh27","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/4879686?s=80&v=4"},"commit":{"message":"Merge branch 'main' into te_llama_tutorial_enhancement","shortMessageHtmlLink":"Merge branch 'main' into te_llama_tutorial_enhancement"}},{"before":"c0caadbe1cdba42efb09aba205f4de0905739f26","after":"0ee5ccda2eacf96fbc9c7ef1f7c084a71e0df7a6","ref":"refs/heads/main","pushedAt":"2024-09-19T00:55:48.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"cyanguwa","name":"Charlene Yang","path":"/cyanguwa","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/8636796?s=80&v=4"},"commit":{"message":"[PyTorch] Relax the contiguous check for flash attention (#1176)\n\n* relax contiguous check for flash attention\r\n\r\nSigned-off-by: Xin Yao \r\n\r\n* force contiguous for cp\r\n\r\nSigned-off-by: Xin Yao \r\n\r\n---------\r\n\r\nSigned-off-by: Xin Yao ","shortMessageHtmlLink":"[PyTorch] Relax the contiguous check for flash attention (#1176)"}},{"before":"841634cab9662581ed0decaa2d3e6dac2b8b544b","after":"c0caadbe1cdba42efb09aba205f4de0905739f26","ref":"refs/heads/main","pushedAt":"2024-09-18T22:31:45.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"ptrendx","name":"Przemyslaw Tredak","path":"/ptrendx","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/8398980?s=80&v=4"},"commit":{"message":"Expose `rotary_base` as an arg instead of hardcoding (#944)\n\n* make rotary_base arg\r\n\r\nSigned-off-by: Sudhakar Singh \r\n\r\n* rotary base can be a float\r\n\r\nCo-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>\r\nSigned-off-by: Sudhakar Singh \r\n\r\n---------\r\n\r\nSigned-off-by: Sudhakar Singh \r\nCo-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>","shortMessageHtmlLink":"Expose rotary_base as an arg instead of hardcoding (#944)"}},{"before":"7e1068b372745aead2d05c93eb95d3b5726a4432","after":"841634cab9662581ed0decaa2d3e6dac2b8b544b","ref":"refs/heads/main","pushedAt":"2024-09-18T18:09:20.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"denera","name":"Alp Dener","path":"/denera","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/7097165?s=80&v=4"},"commit":{"message":"[PyTorch] Check network interface name when initializing Userbuffers (#1175)\n\n* Check if network interface name is valid and show useful warning message when initializing Userbuffers\r\n\r\nSigned-off-by: Alp Dener \r\n\r\n* [pre-commit.ci] auto fixes from pre-commit.com hooks\r\n\r\nfor more information, see https://pre-commit.ci\r\n\r\n* Fix formatting issue in warning message.\r\n\r\nCo-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>\r\nSigned-off-by: Alp Dener \r\n\r\n* [pre-commit.ci] auto fixes from pre-commit.com hooks\r\n\r\nfor more information, see https://pre-commit.ci\r\n\r\n---------\r\n\r\nSigned-off-by: Alp Dener \r\nCo-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>\r\nCo-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>","shortMessageHtmlLink":"[PyTorch] Check network interface name when initializing Userbuffers (#…"}},{"before":"eb60b1ab817953d49f55766918316e9fe0d92cf4","after":"7e1068b372745aead2d05c93eb95d3b5726a4432","ref":"refs/heads/main","pushedAt":"2024-09-18T01:22:21.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"timmoon10","name":"Tim Moon","path":"/timmoon10","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/4406448?s=80&v=4"},"commit":{"message":"[PyTorch] Port fused optimizer tests to pytest (#1185)\n\nPort optimizer tests to pytest\r\n\r\nSigned-off-by: Tim Moon ","shortMessageHtmlLink":"[PyTorch] Port fused optimizer tests to pytest (#1185)"}},{"before":"28f95bdca76556caca8a600236b4b4248c23efeb","after":"eb60b1ab817953d49f55766918316e9fe0d92cf4","ref":"refs/heads/main","pushedAt":"2024-09-17T19:34:05.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"ksivaman","name":"Kirthi Shankar Sivamani","path":"/ksivaman","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/36168853?s=80&v=4"},"commit":{"message":"Add docs for installing from PyPI (#1184)\n\n* Add PyPI install instructions\r\n\r\nSigned-off-by: Kirthi Shankar Sivamani \r\n\r\n* Review from @timmoon10\r\n\r\nSigned-off-by: Kirthi Shankar Sivamani \r\n\r\n---------\r\n\r\nSigned-off-by: Kirthi Shankar Sivamani ","shortMessageHtmlLink":"Add docs for installing from PyPI (#1184)"}},{"before":"528d44bee97bc39d2335a8ee48c3a854aeb820c9","after":"28f95bdca76556caca8a600236b4b4248c23efeb","ref":"refs/heads/main","pushedAt":"2024-09-17T18:59:25.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"timmoon10","name":"Tim Moon","path":"/timmoon10","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/4406448?s=80&v=4"},"commit":{"message":"Allow specifying cmake setup directory (#1186)\n\nAllow specifying cmake directory\r\n\r\nSigned-off-by: Ryan Li \r\nCo-authored-by: Ryan Li ","shortMessageHtmlLink":"Allow specifying cmake setup directory (#1186)"}},{"before":"e1252b1ecbc1043c67d1281e13f8998165c71c46","after":null,"ref":"refs/heads/pr_python3.8_compat","pushedAt":"2024-09-17T18:37:20.000Z","pushType":"branch_deletion","commitsCount":0,"pusher":{"login":"ptrendx","name":"Przemyslaw Tredak","path":"/ptrendx","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/8398980?s=80&v=4"}},{"before":null,"after":"e1252b1ecbc1043c67d1281e13f8998165c71c46","ref":"refs/heads/pr_python3.8_compat","pushedAt":"2024-09-17T18:36:50.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"ptrendx","name":"Przemyslaw Tredak","path":"/ptrendx","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/8398980?s=80&v=4"},"commit":{"message":"Restore compatibility with Python 3.8\n\nSigned-off-by: Przemyslaw Tredak ","shortMessageHtmlLink":"Restore compatibility with Python 3.8"}},{"before":"44fd316f972a50be8105fdc41d2e5cd9efcf1a82","after":"528d44bee97bc39d2335a8ee48c3a854aeb820c9","ref":"refs/heads/main","pushedAt":"2024-09-17T17:18:58.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"ptrendx","name":"Przemyslaw Tredak","path":"/ptrendx","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/8398980?s=80&v=4"},"commit":{"message":"Changed VERSION to 1.12.0.dev\n\nSigned-off-by: Przemyslaw Tredak ","shortMessageHtmlLink":"Changed VERSION to 1.12.0.dev"}},{"before":null,"after":"4fb25ccfea9e2ad1227fe3a82712849a2dbd5131","ref":"refs/heads/release_v1.11","pushedAt":"2024-09-17T17:18:22.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"ptrendx","name":"Przemyslaw Tredak","path":"/ptrendx","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/8398980?s=80&v=4"},"commit":{"message":"Changed VERSION to 1.11.0\n\nSigned-off-by: Przemyslaw Tredak ","shortMessageHtmlLink":"Changed VERSION to 1.11.0"}},{"before":"bc9d706c627fbc5e2e33fdb4a62fe5d81b9f0ae0","after":"1d8752dda743b3d5667cbdbd4aa2e56cc02e9fdb","ref":"refs/heads/te_llama_tutorial_enhancement","pushedAt":"2024-09-17T15:46:46.000Z","pushType":"push","commitsCount":4,"pusher":{"login":"sudhakarsingh27","name":"Sudhakar Singh","path":"/sudhakarsingh27","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/4879686?s=80&v=4"},"commit":{"message":"Merge branch 'main' into te_llama_tutorial_enhancement","shortMessageHtmlLink":"Merge branch 'main' into te_llama_tutorial_enhancement"}},{"before":"9101a78f124258ef76c598410a846645bff359c9","after":"44fd316f972a50be8105fdc41d2e5cd9efcf1a82","ref":"refs/heads/main","pushedAt":"2024-09-17T14:58:56.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"denera","name":"Alp Dener","path":"/denera","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/7097165?s=80&v=4"},"commit":{"message":"[Common] Default CUDA_HOME to /usr/local/cuda when dynamically loading cuDNN and NVRTC (#1183)\n\nDefaulted CUDA_HOME/CUDA_PATH to /usr/local/cuda when attempting to dynamically load cuDNN and NVRTC\r\n\r\nSigned-off-by: Alp Dener ","shortMessageHtmlLink":"[Common] Default CUDA_HOME to /usr/local/cuda when dynamically loadin…"}},{"before":"d2d4cf9142d522562f6e05b2e5768619bd6a1356","after":"9101a78f124258ef76c598410a846645bff359c9","ref":"refs/heads/main","pushedAt":"2024-09-17T14:26:15.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"mgoldfarb-nvidia","name":"Michael Goldfarb","path":"/mgoldfarb-nvidia","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/171198277?s=80&v=4"},"commit":{"message":"[JAX] Context Parallel Attention with All-Gather (#1106)\n\nImplementation of context parallel fused attention using all-gather.\r\n\r\nSigned-off-by: Michael Goldfarb ","shortMessageHtmlLink":"[JAX] Context Parallel Attention with All-Gather (#1106)"}},{"before":"af5daa09e1bc6e29779ebfb1fc9bd56634218de5","after":"d2d4cf9142d522562f6e05b2e5768619bd6a1356","ref":"refs/heads/main","pushedAt":"2024-09-16T20:46:45.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"timmoon10","name":"Tim Moon","path":"/timmoon10","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/4406448?s=80&v=4"},"commit":{"message":"Update CI users (#1181)\n\nUpdate list of CI users\r\n\r\nSigned-off-by: Tim Moon ","shortMessageHtmlLink":"Update CI users (#1181)"}},{"before":"674e499abe6231eb2ac442b01eb0804eddfc9910","after":"bc9d706c627fbc5e2e33fdb4a62fe5d81b9f0ae0","ref":"refs/heads/te_llama_tutorial_enhancement","pushedAt":"2024-09-16T18:49:10.000Z","pushType":"push","commitsCount":7,"pusher":{"login":"sudhakarsingh27","name":"Sudhakar Singh","path":"/sudhakarsingh27","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/4879686?s=80&v=4"},"commit":{"message":"Merge branch 'main' into te_llama_tutorial_enhancement","shortMessageHtmlLink":"Merge branch 'main' into te_llama_tutorial_enhancement"}},{"before":"27a8d84b350a7678affa4f37ef4f82ede26872ad","after":"674e499abe6231eb2ac442b01eb0804eddfc9910","ref":"refs/heads/te_llama_tutorial_enhancement","pushedAt":"2024-09-16T18:48:39.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"pre-commit-ci[bot]","name":null,"path":"/apps/pre-commit-ci","primaryAvatarUrl":"https://avatars.githubusercontent.com/in/68672?s=80&v=4"},"commit":{"message":"[pre-commit.ci] auto fixes from pre-commit.com hooks\n\nfor more information, see https://pre-commit.ci","shortMessageHtmlLink":"[pre-commit.ci] auto fixes from pre-commit.com hooks"}},{"before":"0c63affc51303ff9e1de0654739dc0c045d7274d","after":"27a8d84b350a7678affa4f37ef4f82ede26872ad","ref":"refs/heads/te_llama_tutorial_enhancement","pushedAt":"2024-09-16T18:48:18.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"sudhakarsingh27","name":"Sudhakar Singh","path":"/sudhakarsingh27","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/4879686?s=80&v=4"},"commit":{"message":"allow users to provide weight cache directory\n\nSigned-off-by: Sudhakar Singh ","shortMessageHtmlLink":"allow users to provide weight cache directory"}},{"before":"df699655e696f9f58c87576c44a58b393aebadb3","after":"af5daa09e1bc6e29779ebfb1fc9bd56634218de5","ref":"refs/heads/main","pushedAt":"2024-09-16T17:08:55.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"ksivaman","name":"Kirthi Shankar Sivamani","path":"/ksivaman","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/36168853?s=80&v=4"},"commit":{"message":"Add dtensor support for TE optimizers (#1171)\n\nadd dtensor support for te optimizers\r\n\r\nSigned-off-by: jasonwan \r\nCo-authored-by: Kirthi Shankar Sivamani ","shortMessageHtmlLink":"Add dtensor support for TE optimizers (#1171)"}},{"before":"c55007b85aa1a6563dfd2d45f5353ba5e3cfe54f","after":"df699655e696f9f58c87576c44a58b393aebadb3","ref":"refs/heads/main","pushedAt":"2024-09-16T15:06:09.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"denera","name":"Alp Dener","path":"/denera","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/7097165?s=80&v=4"},"commit":{"message":"[JAX] Fix unit tests to work around cuDNN 9.4 regression of 0 length sequences (#1179)\n\nModify unit tests to work around cuDNN 9.4 regression.\r\n\r\nSigned-off-by: Michael Goldfarb ","shortMessageHtmlLink":"[JAX] Fix unit tests to work around cuDNN 9.4 regression of 0 length …"}},{"before":"e79d915aa0ae1bfc69a6c6491357ce4dfc7befe8","after":"08a85d3b2657f1d4e0b478f6682c17fe6bba8b05","ref":"refs/heads/stable","pushedAt":"2024-09-11T21:36:59.000Z","pushType":"push","commitsCount":73,"pusher":{"login":"ptrendx","name":"Przemyslaw Tredak","path":"/ptrendx","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/8398980?s=80&v=4"},"commit":{"message":"Release v1.10","shortMessageHtmlLink":"Release v1.10"}},{"before":"e6e060303dd3b4614c6fe85b8fcf0063108b7fc8","after":"c55007b85aa1a6563dfd2d45f5353ba5e3cfe54f","ref":"refs/heads/main","pushedAt":"2024-09-11T19:13:21.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"timmoon10","name":"Tim Moon","path":"/timmoon10","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/4406448?s=80&v=4"},"commit":{"message":"Update CI users (#1180)\n\nSigned-off-by: Tim Moon ","shortMessageHtmlLink":"Update CI users (#1180)"}},{"before":"2d57db8bcc5cf5562e726e978c875877c478a139","after":"e6e060303dd3b4614c6fe85b8fcf0063108b7fc8","ref":"refs/heads/main","pushedAt":"2024-09-11T14:36:53.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"ksivaman","name":"Kirthi Shankar Sivamani","path":"/ksivaman","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/36168853?s=80&v=4"},"commit":{"message":"[PyTorch] Lower atol/rtol for F16 attention tests (#1157)\n\n* reduce atol/rtol for F16 tests\r\n\r\nSigned-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>\r\n\r\n* relax the tols for Ampere\r\n\r\nSigned-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>\r\n\r\n---------\r\n\r\nSigned-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>","shortMessageHtmlLink":"[PyTorch] Lower atol/rtol for F16 attention tests (#1157)"}},{"before":"40dda924a52866c3a5e9b56f1907b4a2602f2fac","after":"2d57db8bcc5cf5562e726e978c875877c478a139","ref":"refs/heads/main","pushedAt":"2024-09-11T13:12:04.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"ksivaman","name":"Kirthi Shankar Sivamani","path":"/ksivaman","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/36168853?s=80&v=4"},"commit":{"message":"[PyTorch] Proxy class for low-precision tensor (#1127)\n\n* Add base class for tensor proxies\r\n\r\nSigned-off-by: Tim Moon \r\n\r\n* Move tensor detaching logic to tensor proxy base class\r\n\r\nSigned-off-by: Tim Moon \r\n\r\n* Use Python wrappers to PyTorch extensions\r\n\r\nSigned-off-by: Tim Moon \r\n\r\n* Include transpose caching logic in proxy encode function\r\n\r\nSigned-off-by: Tim Moon \r\n\r\n* Debug dimension mismatch with amax history\r\n\r\nSigned-off-by: Tim Moon \r\n\r\n* Move dequantize logic to proxy_decode func\r\n\r\nSigned-off-by: Tim Moon \r\n\r\n* Rename to \"QuantizedTensor\"\r\n\r\nSigned-off-by: Tim Moon \r\n\r\n* Rename \"proxy_detach\" to \"detach\"\r\n\r\nSigned-off-by: Tim Moon \r\n\r\n* Include transpose cache in detach and clone funcs\r\n\r\nSigned-off-by: Tim Moon \r\n\r\n* Fix linter warnings\r\n\r\nSigned-off-by: Tim Moon \r\n\r\n* [pre-commit.ci] auto fixes from pre-commit.com hooks\r\n\r\nfor more information, see https://pre-commit.ci\r\n\r\n* Update FP8 workspaces with QuantizedTensor functions\r\n\r\nSigned-off-by: Tim Moon \r\n\r\n* Move logic for FP8 transpose cache in FP8 workspaces to base class\r\n\r\nSigned-off-by: Tim Moon \r\n\r\n* Remove cast-transpose logic from linear op\r\n\r\nSigned-off-by: Tim Moon \r\n\r\n* Remove unnecessary args for Float8Tensor when using FP8 attr dict\r\n\r\nSigned-off-by: Tim Moon \r\n\r\n* Remove __torch_function__ to QuantizedTensor\r\n\r\nSigned-off-by: Tim Moon \r\n\r\n* Fix linter warnings\r\n\r\nSigned-off-by: Tim Moon \r\n\r\n* Update tests/pytorch/test_float8tensor.py\r\n\r\nSigned-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>\r\n\r\n* Debug FP8 transpose test\r\n\r\nSigned-off-by: Tim Moon \r\n\r\n* Debug cast functions\r\n\r\nSigned-off-by: Tim Moon \r\n\r\n---------\r\n\r\nSigned-off-by: Tim Moon \r\nSigned-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>\r\nCo-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>\r\nCo-authored-by: Kirthi Shankar Sivamani ","shortMessageHtmlLink":"[PyTorch] Proxy class for low-precision tensor (#1127)"}},{"before":"3b9db334ba2c8ef1d063d8d859c6cb5b7414bd5a","after":null,"ref":"refs/heads/revert-1158-suppress_compile_warning","pushedAt":"2024-09-10T17:28:23.000Z","pushType":"branch_deletion","commitsCount":0,"pusher":{"login":"ksivaman","name":"Kirthi Shankar Sivamani","path":"/ksivaman","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/36168853?s=80&v=4"}},{"before":"2a9845e1d93440d3c0f65427985e66208d09eff8","after":"40dda924a52866c3a5e9b56f1907b4a2602f2fac","ref":"refs/heads/main","pushedAt":"2024-09-09T20:50:47.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"cyanguwa","name":"Charlene Yang","path":"/cyanguwa","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/8636796?s=80&v=4"},"commit":{"message":"Add a context parallelism implementation with QKVO all-to-all (#1160)\n\n* clean code for CP function args\r\n\r\nSigned-off-by: Xiaowei Ren \r\n\r\n* add a placeholder for Ulysses implementation\r\n\r\nSigned-off-by: Xiaowei Ren \r\n\r\n* commit code change to CP+A2A\r\n\r\nSigned-off-by: Xiaowei Ren \r\n\r\n* finish the draft fwd implementation of Ulysses\r\n\r\nSigned-off-by: Xiaowei Ren \r\n\r\n* add draft bwd implementation of Ulysses\r\n\r\nSigned-off-by: Xiaowei Ren \r\n\r\n* make swa work with ulysses\r\n\r\nSigned-off-by: Xiaowei Ren \r\n\r\n* commit FP8 code for Ulysses\r\n\r\nSigned-off-by: Xiaowei Ren \r\n\r\n* fix qkv type in the bwd of FP8+CP\r\n\r\nSigned-off-by: Xiaowei Ren \r\n\r\n* typo fix\r\n\r\nSigned-off-by: Xiaowei Ren \r\n\r\n* fix qkv_dtype of FP8+CP\r\n\r\nSigned-off-by: Xiaowei Ren \r\n\r\n* code refactoring\r\n\r\nSigned-off-by: Xiaowei Ren \r\n\r\n* minor code change\r\n\r\nSigned-off-by: Xiaowei Ren \r\n\r\n* config cp correction dtype of FP8+CP\r\n\r\nSigned-off-by: Xiaowei Ren \r\n\r\n* [pre-commit.ci] auto fixes from pre-commit.com hooks\r\n\r\nfor more information, see https://pre-commit.ci\r\n\r\n* code style change\r\n\r\nSigned-off-by: Xiaowei Ren \r\n\r\n* save chunk_ids\r\n\r\nSigned-off-by: Xiaowei Ren \r\n\r\n* try to make Ulysses A2A async\r\n\r\nSigned-off-by: Xiaowei Ren \r\n\r\n* make more a2a async\r\n\r\nSigned-off-by: Xiaowei Ren \r\n\r\n* fix a2a_outputs\r\n\r\nSigned-off-by: Xiaowei Ren \r\n\r\n* fix chunk_ids generation for A2A\r\n\r\nSigned-off-by: Xiaowei Ren \r\n\r\n* avoid code duplication of a2a before attn\r\n\r\nSigned-off-by: Xiaowei Ren \r\n\r\n* remove code duplication of a2a after attn\r\n\r\nSigned-off-by: Xiaowei Ren \r\n\r\n* add cp_stream in A2A implementation\r\n\r\nSigned-off-by: Xiaowei Ren \r\n\r\n* bug fix\r\n\r\nSigned-off-by: Xiaowei Ren \r\n\r\n* fix qkv of fp8_fwd + bf16_bwd\r\n\r\nSigned-off-by: Xiaowei Ren \r\n\r\n* fix kernel order in cp a2a communication\r\n\r\nSigned-off-by: Xiaowei Ren \r\n\r\n* code cleaning for CP a2a\r\n\r\nSigned-off-by: Xiaowei Ren \r\n\r\n* fix merging with main\r\n\r\nSigned-off-by: Xiaowei Ren \r\n\r\n* fix a2a communication order\r\n\r\nSigned-off-by: Xiaowei Ren \r\n\r\n* adjust sequence chunk reordering for a2a\r\n\r\nSigned-off-by: Xiaowei Ren \r\n\r\n* add docstring for A2A implementation\r\n\r\nSigned-off-by: Xiaowei Ren \r\n\r\n* change an assert info\r\n\r\nSigned-off-by: Xiaowei Ren \r\n\r\n* add unit tests of A2A implementation\r\n\r\nSigned-off-by: Xiaowei Ren \r\n\r\n* add more A2A unit test\r\n\r\nSigned-off-by: Xiaowei Ren \r\n\r\n* fix CP unit tests\r\n\r\nSigned-off-by: Xiaowei Ren \r\n\r\n* add more cp unit tests\r\n\r\nSigned-off-by: Xiaowei Ren \r\n\r\n* fix window size of no_mask\r\n\r\nSigned-off-by: Xiaowei Ren \r\n\r\n* fused attn does not support swa+no_mask\r\n\r\nSigned-off-by: Xiaowei Ren \r\n\r\n* change num_gqa_groups to 2 for A2A implementation\r\n\r\nSigned-off-by: Xiaowei Ren \r\n\r\n* function and variable renaming\r\n\r\nSigned-off-by: Xiaowei Ren \r\n\r\n* code cleaning for CP all-gather implementation\r\n\r\nSigned-off-by: Xiaowei Ren \r\n\r\n* some function renaming\r\n\r\nSigned-off-by: Xiaowei Ren \r\n\r\n* remove redundant code\r\n\r\nSigned-off-by: Xiaowei Ren \r\n\r\n* commit code change for kv all-gather implementation\r\n\r\nSigned-off-by: Xiaowei Ren \r\n\r\n* fix all-gather implementation\r\n\r\nSigned-off-by: Xiaowei Ren \r\n\r\n* add a window size check\r\n\r\nSigned-off-by: Xiaowei Ren \r\n\r\n* code cleaning\r\n\r\nSigned-off-by: Xiaowei Ren \r\n\r\n* add unit test of all_gather+no_mask\r\n\r\nSigned-off-by: Xiaowei Ren \r\n\r\n* fix all-gather cp implementation\r\n\r\nSigned-off-by: Xiaowei Ren \r\n\r\n* code cleaning\r\n\r\nSigned-off-by: Xiaowei Ren \r\n\r\n* [pre-commit.ci] auto fixes from pre-commit.com hooks\r\n\r\nfor more information, see https://pre-commit.ci\r\n\r\n* code format fix\r\n\r\nSigned-off-by: Xiaowei Ren \r\n\r\n* code format fix\r\n\r\nSigned-off-by: Xiaowei Ren \r\n\r\n* fix FP8 with A2A implementation\r\n\r\nSigned-off-by: Xiaowei Ren \r\n\r\n* add paper references to CP implementations with all-gather and all-to-all\r\n\r\nSigned-off-by: Xiaowei Ren \r\n\r\n* change pdf to abs\r\n\r\nSigned-off-by: Xiaowei Ren \r\n\r\n* elaborate cp_comm_type\r\n\r\nSigned-off-by: Xiaowei Ren \r\n\r\n* fix CP docstring\r\n\r\nSigned-off-by: Xiaowei Ren \r\n\r\n---------\r\n\r\nSigned-off-by: Xiaowei Ren \r\nCo-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>","shortMessageHtmlLink":"Add a context parallelism implementation with QKVO all-to-all (#1160)"}}],"hasNextPage":true,"hasPreviousPage":false,"activityType":"all","actor":null,"timePeriod":"all","sort":"DESC","perPage":30,"cursor":"Y3Vyc29yOnYyOpK7MjAyNC0wOS0yMFQyMzowNTozMS4wMDAwMDBazwAAAAS8be1o","startCursor":"Y3Vyc29yOnYyOpK7MjAyNC0wOS0yMFQyMzowNTozMS4wMDAwMDBazwAAAAS8be1o","endCursor":"Y3Vyc29yOnYyOpK7MjAyNC0wOS0wOVQyMDo1MDo0Ny4wMDAwMDBazwAAAASxWXzb"}},"title":"Activity · NVIDIA/TransformerEngine"}