RuntimeError: numel: integer multiplication overflow #38

Open
fanzz1208 opened this issue Mar 5, 2024 · 6 comments

Comments

@fanzz1208

Using cache found in /root/.cache/torch/hub/intel-isl_MiDaS_master
/opt/conda/envs/FSGS/lib/python3.8/site-packages/timm/models/_factory.py:117: UserWarning: Mapping deprecated model name vit_base_resnet50_384 to current vit_base_r50_s16_384.orig_in21k_ft_in1k.
model = create_fn(
Using cache found in /root/.cache/torch/hub/intel-isl_MiDaS_master
[1000, 2000, 3000, 5000, 10000]
Optimizing output/garden
Output folder: output/garden [05/03 01:51:52]
Tensorboard not available: not logging progress [05/03 01:51:52]
Reading camera 185/185 [05/03 01:51:54]
4.750064706802369 cameras_extent [05/03 01:51:54]
Loading Training Cameras [05/03 01:51:54]
24it [00:07, 3.22it/s]
Loading Test Cameras [05/03 01:52:02]
24it [00:04, 5.08it/s]
Number of points at initialisation : 3906731 [05/03 01:52:50]
Training progress: 0%| | 0/10000 [00:00<?, ?it/s]
Traceback (most recent call last):
File "train.py", line 279, in <module>
training(lp.extract(args), op.extract(args), pp.extract(args), args)
File "train.py", line 90, in training
render_pkg = render(viewpoint_cam, gaussians, pipe, background)
File "/workspace/FSGS/gaussian_renderer/__init__.py", line 94, in render
rendered_image, radii, depth, alpha = rasterizer(
File "/opt/conda/envs/FSGS/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/opt/conda/envs/FSGS/lib/python3.8/site-packages/diff_gaussian_rasterization/__init__.py", line 215, in forward
return rasterize_gaussians(
File "/opt/conda/envs/FSGS/lib/python3.8/site-packages/diff_gaussian_rasterization/__init__.py", line 32, in rasterize_gaussians
return _RasterizeGaussians.apply(
File "/opt/conda/envs/FSGS/lib/python3.8/site-packages/diff_gaussian_rasterization/__init__.py", line 92, in forward
num_rendered, color, depth, alpha, radii, geomBuffer, binningBuffer, imgBuffer = _C.rasterize_gaussians(*args)
RuntimeError: numel: integer multiplication overflow
Training progress: 0%| | 0/10000 [00:00<?, ?it/s]
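For context on the first error: PyTorch stores a tensor's element count as a signed 64-bit integer and raises "numel: integer multiplication overflow" when the product of the requested dimensions does not fit in that range. No GPU can hold a tensor with more than 2**63 - 1 elements, so this error means a nonsensical size was requested (for example, a corrupted buffer size computed inside the rasterizer's C++/CUDA code), not that the card ran out of memory. The minimal sketch below models that check in plain Python; `checked_numel` is a hypothetical name, and this is a simplification, not PyTorch's actual implementation.

```python
# Simplified model of the element-count check behind
# "RuntimeError: numel: integer multiplication overflow".
# Assumption: this mirrors the idea of PyTorch's internal check (the product
# of all dimensions must fit in a signed 64-bit integer); it is not PyTorch's code.

INT64_MAX = 2**63 - 1


def checked_numel(sizes):
    """Return the element count for shape `sizes`, or raise on overflow."""
    n = 1
    for s in sizes:
        if s < 0:
            # This sketch rejects negative sizes up front; PyTorch's own
            # handling of negative sizes may differ.
            raise ValueError(f"negative dimension: {s}")
        n *= s  # Python ints never overflow, so the range check happens once, below.
    if n > INT64_MAX:
        raise RuntimeError("numel: integer multiplication overflow")
    return n


# A plausible image-sized buffer passes the check...
print(checked_numel([3, 1080, 1920]))

# ...but a single corrupted dimension is enough to trip it.
try:
    checked_numel([2**40, 2**30])
except RuntimeError as err:
    print(err)
```

Because the failure depends only on the requested size, moving to a GPU with more memory would not be expected to make this particular error go away.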

@fanzz1208
Author

Using cache found in /root/.cache/torch/hub/intel-isl_MiDaS_master
/opt/conda/envs/FSGS/lib/python3.8/site-packages/timm/models/_factory.py:117: UserWarning: Mapping deprecated model name vit_base_resnet50_384 to current vit_base_r50_s16_384.orig_in21k_ft_in1k.
model = create_fn(
Using cache found in /root/.cache/torch/hub/intel-isl_MiDaS_master
[1000, 2000, 3000, 5000, 10000]
Optimizing output/garden
Output folder: output/garden [05/03 02:13:22]
Tensorboard not available: not logging progress [05/03 02:13:22]
Reading camera 185/185 [05/03 02:13:24]
4.750064706802369 cameras_extent [05/03 02:13:24]
Loading Training Cameras [05/03 02:13:24]
24it [00:04, 5.71it/s]
Loading Test Cameras [05/03 02:13:28]
24it [00:02, 9.41it/s]
Number of points at initialisation : 3906731 [05/03 02:13:41]
Training progress: 0%| | 0/10000 [00:00<?, ?it/s]
Traceback (most recent call last):
File "train.py", line 279, in <module>
training(lp.extract(args), op.extract(args), pp.extract(args), args)
File "train.py", line 90, in training
render_pkg = render(viewpoint_cam, gaussians, pipe, background)
File "/workspace/FSGS/gaussian_renderer/__init__.py", line 94, in render
rendered_image, radii, depth, alpha = rasterizer(
File "/opt/conda/envs/FSGS/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/opt/conda/envs/FSGS/lib/python3.8/site-packages/diff_gaussian_rasterization/__init__.py", line 215, in forward
return rasterize_gaussians(
File "/opt/conda/envs/FSGS/lib/python3.8/site-packages/diff_gaussian_rasterization/__init__.py", line 32, in rasterize_gaussians
return _RasterizeGaussians.apply(
File "/opt/conda/envs/FSGS/lib/python3.8/site-packages/diff_gaussian_rasterization/__init__.py", line 92, in forward
num_rendered, color, depth, alpha, radii, geomBuffer, binningBuffer, imgBuffer = _C.rasterize_gaussians(*args)
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Training progress: 0%| | 0/10000 [00:00<?, ?it/s]
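As the error message says, CUDA reports kernel errors asynchronously, so the Python line shown in a traceback like this one may not be where the illegal access actually happened. Setting the environment variable `CUDA_LAUNCH_BLOCKING=1` makes kernel launches synchronous, so the traceback points at the call that failed. The variable has to be in the environment before CUDA is initialized, so a simple approach is to set it for the whole training process. A minimal sketch, assuming training is started via `train.py` in the current directory; `blocking_env` and `run_training_blocking` are hypothetical helpers, and the training arguments are placeholders for whatever you normally pass.

```python
import os
import subprocess
import sys


def blocking_env(base=None):
    """Return a copy of `base` (default: the current environment) with
    synchronous CUDA kernel launches enabled for debugging."""
    env = dict(os.environ if base is None else base)
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def run_training_blocking(train_args):
    """Run train.py in a child process so the variable is set before CUDA
    initializes. `train_args` stands in for your usual command-line arguments."""
    cmd = [sys.executable, "train.py", *train_args]
    return subprocess.run(cmd, env=blocking_env(), check=True)


# Example (hypothetical; pass your real train.py arguments):
# run_training_blocking([...])
```

Synchronous launches slow training down noticeably, so this is meant only for locating the failing kernel, not for normal runs.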

@Mysterious-handsome-man

Hello! I ran into exactly the same problem. When training on "nerf_llff_data", I got "RuntimeError: CUDA error: an illegal memory access was encountered." Then, after switching to "mipnerf360", I got "RuntimeError: numel: integer multiplication overflow."

At first I thought it might be a GPU limitation. However, even after renting a cloud server with an A100 GPU with 40 GB of memory, the problem persisted. This has left me quite frustrated.

Did you manage to solve this issue in the end? If you're willing to discuss it, that would be a great help!

@zhanghaoyu816

Hello, I am also struggling with this problem: "RuntimeError: CUDA error: an illegal memory access was encountered CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1."
The image renders successfully, but the error occurs when computing the SSIM in the loss function.

@Mysterious-handsome-man


Hello, I have solved this problem and successfully obtained the final results. Here are some suggestions for reference:

1. Check your Python, PyTorch, and CUDA versions and your environment configuration.

2. Refer to this issue: graphdeco-inria/gaussian-splatting#41 (comment)

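If I remember correctly, after reconfiguring my environment and then following the solution in that issue, the illegal memory access error went away. You can give it a try. If the solution in that issue doesn't work, feel free to ask me about my environment configuration.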
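One concrete way to act on suggestion 1 is to confirm that the CUDA toolkit (`nvcc`) used to compile the `diff_gaussian_rasterization` extension has the same major.minor version as the CUDA version your PyTorch build targets (`torch.version.cuda`). The sketch below parses the `release X.Y` field that `nvcc --version` prints and compares the two; `parse_nvcc_release` and `versions_match` are hypothetical helpers, and a match is only one sanity check, not a guarantee that the extension was built correctly.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(nvcc_output):
    """Extract (major, minor) from `nvcc --version` output, e.g. 'release 11.8'."""
    m = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    return (int(m.group(1)), int(m.group(2))) if m else None


def versions_match(nvcc_output, torch_cuda):
    """Compare nvcc's release with a version string such as torch.version.cuda.

    torch.version.cuda is None on CPU-only PyTorch builds, which counts as no match.
    """
    nvcc = parse_nvcc_release(nvcc_output)
    if nvcc is None or not torch_cuda:
        return False
    major, minor = (int(x) for x in torch_cuda.split(".")[:2])
    return nvcc == (major, minor)


if __name__ == "__main__":
    if shutil.which("nvcc") is None:
        print("nvcc not found on PATH")
    else:
        out = subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout
        try:
            import torch
        except ImportError:
            print("nvcc:", parse_nvcc_release(out), "(PyTorch not installed)")
        else:
            print("nvcc:", parse_nvcc_release(out),
                  "torch.version.cuda:", torch.version.cuda,
                  "match:", versions_match(out, torch.version.cuda))
```

If the versions differ, rebuilding the extension against the toolkit that matches your PyTorch build is the usual next step.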

@baoachun

@Mysterious-handsome-man Hi, could you share your configuration, for example your Python, PyTorch, and CUDA versions? I still haven't resolved the illegal memory access error despite trying the solutions mentioned above.
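The versions being asked for here can be gathered with a few lines of Python. A minimal sketch: `collect_env_info` is a hypothetical helper, and the PyTorch fields are filled in only if `torch` is importable. For a much fuller report, PyTorch also ships `python -m torch.utils.collect_env`.

```python
import platform


def collect_env_info():
    """Gather the Python, PyTorch, CUDA, and GPU details requested in this thread."""
    info = {"python": platform.python_version(), "platform": platform.platform()}
    try:
        import torch
    except ImportError:
        info["torch"] = None  # PyTorch not installed in this environment
        return info
    info["torch"] = torch.__version__
    info["torch_cuda"] = torch.version.cuda  # CUDA version PyTorch was built with
    info["cuda_available"] = torch.cuda.is_available()
    if info["cuda_available"]:
        info["gpu"] = torch.cuda.get_device_name(0)
        info["compute_capability"] = torch.cuda.get_device_capability(0)
    return info


if __name__ == "__main__":
    for key, value in collect_env_info().items():
        print(f"{key}: {value}")
```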

@trainable

@Mysterious-handsome-man Could you share your environment with me for reference?
