Releases: microsoft/DeepSpeed
v0.13.3 Patch release
What's Changed
- Update version.txt after 0.13.2 release by @mrwyattii in #5119
- Stop tracking backward chain of broadcast (ZeRO3) by @tohtana in #5113
- [NPU]ZeRO-Infinity feature compatibility by @misstek in #5077
- BF16 optimizer: Improve device utilization by immediate grad update by @deepcharm in #4975
- removed if condition in `if collate_fn is None` by @bm-synth in #5107
- disable compile tests for torch<2.1 by @mrwyattii in #5121
- Update inference test model names by @mrwyattii in #5127
- Fix issue with zero-sized file after merging file on curriculum `map_reduce` by @bm-synth in #5106
- Update return codes in PyTest to properly error out if tests fail by @loadams in #5122
- add missing methods to MPS_Accelerator by @mrwyattii in #5134
- Solve tensor vs numpy dtype conflicts in data efficiency map-reduce. by @bm-synth in #5108
- Fix broadcast deadlock for incomplete batches in data sample for data analysis by @bm-synth in #5117
- Avoid zero-sized microbatches for incomplete minibatches when doing curriculum learning by @bm-synth in #5118
- remove mandatory `index` key from output of `metric_function` in `DataAnalysis` map operation by @bm-synth in #5112
- tensorboard logging: avoid item() outside gas to improve performance by @nelyahu in #5135
- Check overflow on device without host synchronization for each tensor by @BacharL in #5115
- Update nv-inference torch version by @loadams in #5128
- Method `run_map_reduce` to fix errors when running `run_map` followed by `run_reduce` by @bm-synth in #5131
- Added missing `isinstance` check in PR 5112 by @bm-synth in #5142
- Fix UserWarning: The torch.cuda.*DtypeTensor constructors are no long… by @ShukantPal in #5018
- TestEmptyParameterGroup: replace fusedAdam with torch.optim.AdamW by @nelyahu in #5139
- Update deprecated HuggingFace function by @mrwyattii in #5144
- Pin to PyTest 8.0.0 by @loadams in #5163
- get_grad_norm_direct: fix a case of empty norm group by @nelyahu in #5148
- Distributed in-memory map-reduce for data analyzer by @bm-synth in #5129
- DeepSpeedZeroOptimizer_Stage3: remove cuda specific optimizer by @nelyahu in #5138
- MOE: Fix save checkpoint when TP > 1 by @mosheisland in #5157
- Fix gradient clipping by @tohtana in #5150
- Use ninja to speed up build by @jinzhen-lin in #5088
- Update flops profiler to handle attn and matmul by @KimmiShi in #4724
- Fix allreduce for BF16 and ZeRO0 by @tohtana in #5170
- Write multiple items to output file at once, in distributed data analyzer. by @bm-synth in #5169
- Fix typos in blogs/ by @jinyouzhi in #5172
- Inference V2 Human Eval by @lekurile in #4804
- Reduce ds_id name length by @jomayeri in #5176
- Switch cpu-inference workflow from --extra-index-url to --index-url by @loadams in #5182
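The curriculum-learning fix above (avoiding zero-sized microbatches for incomplete minibatches) comes down to deriving the microbatch schedule from the samples actually present. A minimal illustrative sketch, not DeepSpeed's actual code:

```python
def split_microbatches(samples, micro_batch_size):
    # Derive the microbatch count from the data actually present, so an
    # incomplete minibatch yields fewer microbatches rather than padding
    # the schedule with zero-sized ones.
    assert micro_batch_size > 0
    return [samples[i:i + micro_batch_size]
            for i in range(0, len(samples), micro_batch_size)]
```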
New Contributors
Full Changelog: v0.13.2...v0.13.3
v0.13.2 Patch release
What's Changed
- Update version.txt after 0.13.1 release by @mrwyattii in #5002
- Support `exclude_frozen_parameters` for `save_16bit_model` by @LZHgrla in #4999
- Allow nightly tests dispatch by @mrwyattii in #5014
- Enable hpz based on secondary tensor presence by @HeyangQin in #4906
- Enable workflow dispatch on all workflows by @loadams in #5016
- [minor] improve code quality and readability by @ByronHsu in #5011
- Update falcon fused type order by @Yejing-Lai in #5007
- Fix error report of DSElasticAgent._set_master_addr_port() by @RobinDong in #4985
- DS #4993 #662: autotune single node hostfile bugfix by @oushu1zhangxiangxuan1 in #4996
- [minor] Improve logging for multiprocesses by @ByronHsu in #5004
- deepspeed/launcher: add launcher_helper as each rank's start portal by @YizhouZ in #4699
- Graph capture support on HPU accelerators by @deepcharm in #5013
- launcher/launcher_helper.py: fix PMI name and add EnvironmentError by @YizhouZ in #5025
- Remove MI100 badge from landing page by @mrwyattii in #5036
- Remove coverage reports from workflows and fix for inference CI by @loadams in #5028
- Remove Megatron-DeepSpeed CI workflow by @mrwyattii in #5038
- Fix P40 CI failures by @mrwyattii in #5037
- Fix for nightly torch CI by @mrwyattii in #5039
- Fix nv-accelerate and nv-torch-latest-v100. by @loadams in #5035
- update inference pages to point to FastGen by @mrwyattii in #5029
- launcher_helper: enable fds passing by @YizhouZ in #5042
- Fix nv-torch-latest-cpu CI by @mrwyattii in #5045
- [NPU] Add NPU to support hybrid engine by @CurryRice233 in #4831
- MoE type hints by @ringohoffman in #5043
- [doc] update inference related docs from `mp_size` to `tensor_parallel` for TP by @yundai424 in #5048
- Fix broken model names in inference CI by @mrwyattii in #5053
- [NPU] Change log level to debug by @CurryRice233 in #5051
- Delay reduce-scatter for ZeRO3 leaf modules by @tohtana in #5008
- Optimize grad_norm calculations by reducing device/host dependency by @nelyahu in #4974
- load linear layer weight with given dtype by @polisettyvarma in #4044
- Update import for changes to latest diffusers by @mrwyattii in #5065
- adding hccl to init_distributed function description by @nelyahu in #5034
- [Zero++ qgZ] Fall back to reduce_scatter if `tensor.numel() % (2 * global_world_size) != 0` by @ByronHsu in #5056
- Make batch size documentation clearer by @segyges in #5072
- [doc/1-line change] default stage3_param_persistence_threshold is wrong in the doc by @ByronHsu in #5073
- Further refactor deepspeed.moe.utils + deepspeed.moe.layer type hints by @ringohoffman in #5060
- Fix verification for ZeRO3 leaf module by @tohtana in #5074
- Stop tracking backward chain of broadcast in initialization by @tohtana in #5075
- Update torch version for nv-torch-latest-cpu by @loadams in #5086
- Add backwards compatibility w/ older versions of diffusers (<0.25.0) by @lekurile in #5083
- Enable torch.compile with ZeRO (Experimental) by @tohtana in #4878
- Update nv-accelerate to latest torch by @loadams in #5040
- HPU Accelerator: fix supported_dtypes API by @nelyahu in #5094
- [NPU] replace 'cuda' with get_accelerator().device_name() by @minchao-sun in #5095
- optimize clip_grad_norm_ function by @mmhab in #4915
- [xs] fix ZEROPP convergence test by @yundai424 in #5061
- Switch hasattr check from compile to compiler by @loadams in #5096
- Split is_synchronized_device api to multiple apis by @BacharL in #5026
- 47% FastGen speedup for low workload - refactor allocator by @HeyangQin in #5090
- Support `exclude_frozen_parameters` for `zero_to_fp32.py` script by @andstor in #4979
- Fix alignment of optimizer states when loading by @tohtana in #5105
- Skip Triton import for AMD by @lekurile in #5110
- Add HIP conversion file outputs to .gitignore by @lekurile in #5111
- Remove optimizer step on initialization by @tohtana in #5104
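The ZeRO++ qgZ fallback above hinges on the divisibility condition quoted in the entry; it can be sketched as a simple predicate (the function name is illustrative, not a DeepSpeed API):

```python
def qgz_applicable(numel: int, global_world_size: int) -> bool:
    # qgZ's quantized gradient reduction needs the tensor's element count
    # to divide evenly by 2 * global_world_size; when it doesn't, ZeRO++
    # falls back to a plain reduce_scatter.
    return numel % (2 * global_world_size) == 0
```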
New Contributors
- @ByronHsu made their first contribution in #5011
- @RobinDong made their first contribution in #4985
- @oushu1zhangxiangxuan1 made their first contribution in #4996
- @yundai424 made their first contribution in #5048
- @segyges made their first contribution in #5072
- @andstor made their first contribution in #4979
Full Changelog: v0.13.1...v0.13.2
v0.13.1 Patch release
What's Changed
- Update version.txt after 0.13.0 release by @mrwyattii in #4982
- Update FastGen blog title by @arashb in #4983
- Fix the MoE-params gradient-scaling by @RezaYazdaniAminabadi in #4957
- fix some typo under blogs/ by @digger-yu in #4988
- Fix placeholder value in FastGen Blog by @mrwyattii in #5000
- fix for DS_ENV issue by @jeffra in #4992
- Delete unused --deepspeed_mpi command line argument by @ShukantPal in #4981
- Make installable without torch by @mrwyattii in #5001
- Implement some APIs of HPU accelerator by @mmhab in #4935
- Refactor the Qwen positional embedding config code by @ZonePG in #4955
Full Changelog: v0.13.0...v0.13.1
DeepSpeed v0.13.0
New Features
What's Changed
- Update version.txt after 0.12.6 release by @mrwyattii in #4850
- doc corrections by @goodship1 in #4861
- Fix exception handling in get_all_ranks_from_group() function by @HeyangQin in #4862
- deepspeed engine: fp16 support validation on init by @nelyahu in #4843
- Remove hooks on gradient accumulation on engine/optimizer destroy by @chiragjn in #4858
- optimize grad_norm calculation in stage3.py by @mmhab in #4436
- Fix f-string messages by @li-plus in #4865
- [NPU] Fix npu offload bug by @CurryRice233 in #4883
- Partition parameters: Minor refactoring of use_secondary_tensor condition by @deepcharm in #4868
- Pipeline: Add support to eval micro bs configuration by @nelyahu in #4859
- zero_to_fp32.py: Handle a case where shape doesn't have numel attr by @nelyahu in #4842
- Add support of Microsoft Phi-2 model to DeepSpeed-FastGen by @arashb in #4812
- Support cpu tensors without direct device invocation by @abhilash1910 in #3842
- add sharded loading for safetensors in AutoTP by @sywangyi in #4854
- [XPU] XPU accelerator support for Intel GPU device by @delock in #4547
- enable starcoder (kv_head=1) autotp by @Yejing-Lai in #4896
- Release overlap_comm & contiguous_gradients restrictions for ZeRO 1 by @li-plus in #4887
- [NPU]Add ZeRO-Infinity feature for NPU by @misstek in #4809
- fix num_kv_heads sharding in uneven autoTP for Falcon-40b by @Yejing-Lai in #4712
- Nvme offload checkpoint by @eisene in #4707
- Add WarmupCosineLR to Read the Docs by @dwyatte in #4916
- Add Habana Labs HPU accelerator support by @deepcharm in #4912
- Unit tests for MiCS by @zarzen in #4792
- Fix SD workflow to work with latest diffusers version by @lekurile in #4918
- [Fix] Fix cpu inference UT failure by @delock in #4430
- Add paths to run SD tests by @loadams in #4919
- Change PR/schedule triggers for CPU-inference by @loadams in #4924
- fix falcon-40b accuracy issue by @Yejing-Lai in #4895
- Refactor the positional embedding config code by @arashb in #4920
- Pin to triton 2.1.0 to fix issues with nv-inference by @loadams in #4929
- Add support of Qwen models (7b, 14b, 72b) to DeepSpeed-FastGen by @ZonePG in #4913
- DeepSpeedZeroOptimizer: refactor bit16 flattening to support more accelerators by @nelyahu in #4833
- Fix confusing width in simd_load by @yzhblind in #4714
- Specify permissions for secrets.GITHUB_TOKEN by @mrwyattii in #4927
- Enable quantizer op on ROCm by @rraminen in #4114
- autoTP for Qwen by @inkcherry in #4902
- Allow specifying mii branch for nv-a6000 workflow by @mrwyattii in #4936
- Only run MII CI for inference changes by @mrwyattii in #4939
- InfV2 - remove generation config requirement by @mrwyattii in #4938
- Cache HF model list for inference tests by @mrwyattii in #4940
- Fix docs inconsistency on default value for `ignore_unused_parameters` by @loadams in #4949
- Fix bug in CI model caching by @mrwyattii in #4951
- fix uneven issue & add balance autotp by @Yejing-Lai in #4697
- Optimize preprocess for ragged batching by @tohtana in #4942
- Fix bug where ZeRO2 never uses the reduce method. by @CurryRice233 in #4946
- [docs] Add new autotp supported model in tutorial by @delock in #4960
- Add missing op_builder.hpu component for HPU accelerator by @nelyahu in #4963
- Stage_1_and_2.py: fix assert for reduce_scatter configurations combinations by @nelyahu in #4964
- [MiCS]Add the path to support sequence_data_parallel on MiCS by @ys950902 in #4926
- Update the DeepSpeed Phi-2 impl. to work with the HF latest changes by @arashb in #4950
- Prevent infinite recursion when DS_ACCELERATOR is set to cuda by @ShukantPal in #4962
- Fixes for training models with bf16 + freshly initialized optimizer via `load_module_only` by @haileyschoelkopf in #4141
- params partition for skip_init by @inkcherry in #4722
- Enhance query APIs for text generation by @tohtana in #4965
- Add API to set a module as a leaf node when recursively setting Z3 hooks by @tohtana in #4966
- Fix T5 and mistral model meta data error by @Yejing-Lai in #4958
- FastGen Jan 2024 blog by @mrwyattii in #4980
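The `load_module_only` fix above concerns restoring module weights from a checkpoint while leaving a freshly initialized optimizer untouched. A simplified sketch of that behavior, not DeepSpeed's actual loader:

```python
def load_checkpoint(checkpoint: dict, state: dict,
                    load_module_only: bool = False) -> dict:
    # Always restore module weights; when load_module_only is set, skip
    # the optimizer state so a freshly initialized optimizer (e.g. for
    # bf16 fine-tuning) keeps its clean state.
    state["module"] = checkpoint["module"]
    if not load_module_only:
        state["optimizer"] = checkpoint.get("optimizer")
    return state
```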
New Contributors
- @chiragjn made their first contribution in #4858
- @li-plus made their first contribution in #4865
- @misstek made their first contribution in #4809
- @dwyatte made their first contribution in #4916
- @ZonePG made their first contribution in #4913
- @yzhblind made their first contribution in #4714
- @ShukantPal made their first contribution in #4962
- @haileyschoelkopf made their first contribution in #4141
Full Changelog: v0.12.6...v0.13.0
v0.12.6: Patch release
What's Changed
- Update version.txt after 0.12.5 release by @mrwyattii in #4826
- Cache metadata for TP activations and grads by @BacharL in #4360
- Inference changes for incorporating meta loading checkpoint by @oelayan7 in #4692
- Update CODEOWNERS by @mrwyattii in #4838
- support baichuan model by @baodii in #4721
- inference engine: check if accelerator supports FP16 by @nelyahu in #4832
- Update zeropp.md by @goodship1 in #4835
- [NPU] load EXPORT_ENV based on different accelerators to support multi-node training on other devices by @minchao-sun in #4830
- Add cuda_accelerator.py to triggers for A6000 test by @mrwyattii in #4848
- Capture short kernel sequences to graph by @inkcherry in #4318
- Checkpointing: Avoid assigning tensor storage with different device by @deepcharm in #4836
- engine.py: remove unused _curr_save_path by @nelyahu in #4844
- Mixtral FastGen Support by @cmikeh2 in #4828
New Contributors
- @minchao-sun made their first contribution in #4830
Full Changelog: v0.12.5...v0.12.6
v0.12.5: Patch release
What's Changed
- Fix DS Stable Diffusion for latest diffusers version by @lekurile in #4770
- Resolve any '..' in the file paths using os.path.abspath() by @rraminen in #4709
- Update dockerfile with updated versions by @loadams in #4780
- Run workflows when they are edited by @loadams in #4779
- BF16_Optimizer: add support for bf16 grad acc by @nelyahu in #4713
- fix autoTP issue for mpt (trust_remote_code=True) by @sywangyi in #4787
- Fix Hybrid Engine metrics printing by @lekurile in #4789
- [BUG] partition_balanced return wrong result. by @zjjMaiMai in #4312
- improve the way to determine whether a variable is None by @RUAN-ZX in #4782
- [NPU] Add HcclBackend for 1-bit adam, 1-bit lamb, 0/1 adam by @RUAN-ZX in #4733
- Fix for stage3 when setting different communication data type by @BacharL in #4540
- Add support of Falcon models (7b, 40b, 180b) to DeepSpeed-FastGen by @arashb in #4790
- Switch paths-ignore to single quotes, update paths-ignore on nv-pre-compile-ops by @loadams in #4805
- fix for tests using torch<2.1 by @mrwyattii in #4818
- Universal Checkpoint for Sequence Parallelism by @samadejacobs in #4752
- Accelerate CI fix by @mrwyattii in #4819
- fix [BUG] 'DeepSpeedGPTInference' object has no attribute 'dtype' for… by @jxysoft in #4814
- Update broken link in docs by @mrwyattii in #4822
- Update imports from Transformers by @loadams in #4817
- Minor updates to CI workflows by @mrwyattii in #4823
- fix falcon model load from_config meta_data error by @baodii in #4783
- mv DeepSpeedEngine param_names dict init post _configure_distributed_model by @nelyahu in #4803
- Refactor launcher user arg parsing by @mrwyattii in #4824
- Fix 4649 by @Alienfeel in #4650
New Contributors
- @zjjMaiMai made their first contribution in #4312
- @jxysoft made their first contribution in #4814
- @baodii made their first contribution in #4783
- @Alienfeel made their first contribution in #4650
Full Changelog: v0.12.4...v0.12.5
v0.12.4: Patch release
What's Changed
- Update version.txt after 0.12.3 release by @mrwyattii in #4673
- [MII] catch error wrt HF version and Mistral by @jeffra in #4634
- [NPU] Add NPU support for unit test by @RUAN-ZX in #4569
- [op-builder] use unique exceptions for cuda issues by @jeffra in #4653
- Add stable diffusion unit test by @mrwyattii in #2496
- [CANN] Support cpu offload optimizer for Ascend NPU by @hipudding in #4568
- Inference Checkpoints in V2 by @cmikeh2 in #4664
- KV Cache Improved Flexibility by @cmikeh2 in #4668
- Fix for when prompt contains an odd num of apostrophes by @oelayan7 in #4660
- universal-ckp: support megatron-deepspeed llama model by @mosheisland in #4666
- Add new MII unit tests by @mrwyattii in #4693
- [Bug fix] WarmupCosineLR issues by @sbwww in #4688
- infV2 fix for OPT size variants by @mrwyattii in #4694
- Add get and set APIs for the ZeRO-3 partitioned parameters by @yiliu30 in #4681
- Remove unneeded dict reinit (fix for #4565) by @eisene in #4702
- Update flops profiler to recurse by @loadams in #4374
- Communication Optimization for Large-Scale Training by @RezaYazdaniAminabadi in #4695
- [docs] Intel inference blog by @jeffra in #4734
- use all_gather_into_tensor instead of all_gather by @taozhiwei in #4705
- Install `deepspeed-kernels` only on Linux by @aphedges in #4739
- Add nv-sd badge to README by @loadams in #4747
- Re-organize `.gitignore` file to be parsed properly by @aphedges in #4740
- fix mics run with offload++ by @GuanhuaWang in #4749
- Fix logger formatting for partitioning flags by @OAfzal in #4728
- fix: to solve #4726 by @RUAN-ZX in #4727
- Add safetensors support by @jihnenglin in #4659
New Contributors
- @RUAN-ZX made their first contribution in #4569
- @oelayan7 made their first contribution in #4660
- @sbwww made their first contribution in #4688
- @yiliu30 made their first contribution in #4681
- @eisene made their first contribution in #4702
- @taozhiwei made their first contribution in #4705
- @OAfzal made their first contribution in #4728
- @jihnenglin made their first contribution in #4659
Full Changelog: v0.12.3...v0.12.4
v0.12.3: Patch release
New Bug Fixes
- Stable Diffusion now supported with latest Torch, diffusers, and Triton versions.
What's Changed
- Update version.txt after 0.12.2 release by @mrwyattii in #4617
- Fix figure in FlexGen blog by @tohtana in #4624
- Fix figure of llama2 13B in DS-FlexGen blog by @tohtana in #4625
- Fix config format by @xu-song in #4594
- Guanhua/partial offload rebase v2 (#590) by @GuanhuaWang in #4636
- offload++ blog (#623) by @GuanhuaWang in #4637
- Update README in offloadpp blog by @GuanhuaWang in #4641
- [docs] update news items by @jeffra in #4640
- DeepSpeed-FastGen Chinese Blog by @HeyangQin in #4642
- Fix issues with torch cpu builds by @loadams in #4639
- Isolate src code and testing for DeepSpeed-FastGen by @cmikeh2 in #4610
- Add Japanese blog for DeepSpeed-FastGen by @tohtana in #4651
- Fix for MII unit tests by @mrwyattii in #4652
- Enhance the robustness of `module_state_dict` by @LZHgrla in #4587
- Enable ZeRO3 allgather for multiple dtypes by @tohtana in #4647
- add option to disable pipeline partitioning by @nelyahu in #4322
- Added HIP_PLATFORM_AMD=1 for non JIT build by @rraminen in #4585
- Fix rope_theta arg for diffusers_attention by @lekurile in #4656
- `tl.dot(a, b, trans_b=True)` is not supported by triton 2.0+, updating this api by @bmedishe in #4541
- Update ds-chat workflow to work w/ deepspeed-chat install by @lekurile in #4598
- Diffusers attention script update triton2.1 by @bmedishe in #4573
- Fix the openfold training. by @cctry in #4657
- Universal ckp fixes by @mosheisland in #4588
- Update .gitignore [Adding comments, Improved documentation] by @Nadav23AnT in #4631
- Update lr_schedules.py by @CoinCheung in #4563
- Fix UNET and VAE implementations for new diffusers version by @lekurile in #4663
- fix num_kv_heads sharding in autoTP for the new in-repo Falcon-40B by @dc3671 in #4654
New Contributors
- @xu-song made their first contribution in #4594
- @LZHgrla made their first contribution in #4587
- @mosheisland made their first contribution in #4588
- @Nadav23AnT made their first contribution in #4631
- @CoinCheung made their first contribution in #4563
Full Changelog: v0.12.2...v0.12.3
v0.12.2
What's Changed
- Quick bug fix direct to `master` to ensure mismatched cuda environments are shown to the user 4f7dd72
- Update version.txt after 0.12.1 release by @mrwyattii in #4615
Full Changelog: v0.12.1...v0.12.2
v0.12.1: Patch release
What's Changed
- Update version.txt after 0.12.0 release by @mrwyattii in #4611
- Add number for latency comparison by @tohtana in #4612
- Update minor CUDA version compatibility. by @cmikeh2 in #4613
Full Changelog: v0.12.0...v0.12.1