volcano restarts due to panic #3710

Open
Ramonaaaaa opened this issue Sep 9, 2024 · 7 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@Ramonaaaaa

Ramonaaaaa commented Sep 9, 2024

Description

When Volcano is used for scheduling, it restarts repeatedly. Restart cause: an out-of-range index access triggers a panic.

Error details:
kubectl logs -f -n volcano-system volcano-scheduler-xx -p
2024/08/27 10:47:39 maxprocs: Updating GOMAXPROCS=20: determined from CPU quota
W0827 10:47:39.514559 1 client_config.go:617] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I0827 10:48:01.142318 1 trace.go:205] Trace[1974509093]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169 (27-Aug-2024 10:47:39.584) (total time: 21557ms):
Trace[1974509093]: ---"Objects listed" error: 21521ms (10:48:01.105)
Trace[1974509093]: [21.55737372s] [21.55737372s] END
I0827 10:48:03.669284 1 trace.go:205] Trace[585251636]: "DeltaFIFO Pop Process" ID:kube-system/mindx-dl-deviceinfo-kcs-haerbin-agi-s-thbdc,Depth:16,Reason:slow event handlers blocking the queue (27-Aug-2024 10:48:03.472) (total time: 195ms):
Trace[585251636]: [195.826968ms] [195.826968ms] END
E0827 10:48:10.501173 1 cache.go:1245] error occurred in updating Queue : Operation cannot be fulfilled on queues.scheduling.volcano.sh "default": the object has been modified; please apply your changes to the latest version and try again
E0827 10:48:10.501199 1 session.go:216] failed to update queue status: Operation cannot be fulfilled on queues.scheduling.volcano.sh "default": the object has been modified; please apply your changes to the latest version and try again
I0827 10:48:13.363051 1 trace.go:205] Trace[616662840]: "DeltaFIFO Pop Process" ID:kube-system/mindx-dl-deviceinfo-kcs-haerbin-agi-s-jh95z,Depth:32,Reason:slow event handlers blocking the queue (27-Aug-2024 10:48:13.181) (total time: 181ms):
Trace[616662840]: [181.390665ms] [181.390665ms] END
E0827 10:48:24.691004 1 cache.go:1245] error occurred in updating Queue : Operation cannot be fulfilled on queues.scheduling.volcano.sh "default": the object has been modified; please apply your changes to the latest version and try again
E0827 10:48:24.691046 1 session.go:216] failed to update queue status: Operation cannot be fulfilled on queues.scheduling.volcano.sh "default": the object has been modified; please apply your changes to the latest version and try again
panic: runtime error: index out of range [4] with length 4

goroutine 81093 [running]:
k8s.io/api/core/v1.(*PodStatus).DeepCopyInto(0x402dd59be8, 0x4019da0af8)
/devcloud/slavespace/slave1-new/workspace/j_CmcZc9ms/pkg/mod/k8s.io/[email protected]/core/v1/zz_generated.deepcopy.go:3982 +0x5cc
k8s.io/api/core/v1.(*Pod).DeepCopyInto(0x402dd598f0, 0x4019da0800)
/devcloud/slavespace/slave1-new/workspace/j_CmcZc9ms/pkg/mod/k8s.io/[email protected]/core/v1/zz_generated.deepcopy.go:3309 +0x100
k8s.io/api/core/v1.(*Pod).DeepCopy(...)
/devcloud/slavespace/slave1-new/workspace/j_CmcZc9ms/pkg/mod/k8s.io/[email protected]/core/v1/zz_generated.deepcopy.go:3319
volcano.sh/volcano/pkg/scheduler/cache.(*defaultEvictor).Evict(0x400088e2e8, 0x402dd598f0, {0x4046557f68, 0x4})
/devcloud/slavespace/slave1-new/workspace/j_CmcZc9ms/src/volcano.sh/volcano/pkg/scheduler/cache/cache.go:209 +0x1a4
volcano.sh/volcano/pkg/scheduler/cache.(*SchedulerCache).Evict.func1()
/devcloud/slavespace/slave1-new/workspace/j_CmcZc9ms/src/volcano.sh/volcano/pkg/scheduler/cache/cache.go:751 +0x44
created by volcano.sh/volcano/pkg/scheduler/cache.(*SchedulerCache).Evict
/devcloud/slavespace/slave1-new/workspace/j_CmcZc9ms/src/volcano.sh/volcano/pkg/scheduler/cache/cache.go:750 +0x338

Steps to reproduce the issue

Describe the results you received and expected

Expected: the panic is fixed and the scheduler stops restarting.

What version of Volcano are you using?

1.7

Any other relevant information

No response

Ramonaaaaa added the kind/bug label Sep 9, 2024
@Monokaix
Member

Monokaix commented Sep 9, 2024

Is this from the Volcano repo?

@Ramonaaaaa
Author

> Is this from the Volcano repo?

yes

@hwdef
Member

hwdef commented Sep 9, 2024

Did you install volcano-admission correctly?

@Monokaix
Member

Monokaix commented Sep 9, 2024

> > Is this from the Volcano repo?

> yes

Did you add a custom plugin?

@Ramonaaaaa
Author

> Did you install volcano-admission correctly?

Yes, and I added a custom plugin.

@Ramonaaaaa
Author

> > > Is this from the Volcano repo?

> > yes

> Did you add a custom plugin?

Yes. I've been using the custom plugin for two years and only found this problem recently.

@Monokaix
Member

From the picture you pasted, it seems to be caused by your own plugin. Maybe you can check the custom plugin first.
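
For context on how `DeepCopyInto` itself can panic: generated deep-copy code for a slice field typically allocates the destination with the source's current length and then iterates over the source. If another goroutine appends to the same Pod object between those two steps, the loop can index past the destination, which would give an `index out of range [4] with length 4` error like the one in the trace. Whether that is what happens here is an assumption; the trace alone does not confirm it. The Go sketch below simulates the pattern deterministically. All names in it (`status`, `condition`, `deepCopyInto`, the `betweenSteps` hook) are hypothetical stand-ins, not Volcano or Kubernetes code.

```go
package main

import "fmt"

// condition is a simplified, hypothetical stand-in for a PodCondition-like struct.
type condition struct{ Type string }

// status is a simplified, hypothetical stand-in for a PodStatus-like struct.
type status struct{ Conditions []condition }

// deepCopyInto mimics the shape of generated deep-copy code for a slice field:
// size the destination from the source's current length, then range over the source.
// betweenSteps is an illustrative hook that simulates a concurrent writer
// touching the source between the two steps.
func deepCopyInto(in, out *status, betweenSteps func()) {
	src, dst := &in.Conditions, &out.Conditions
	*dst = make([]condition, len(*src))
	if betweenSteps != nil {
		betweenSteps()
	}
	for i := range *src {
		(*dst)[i] = (*src)[i] // panics if the source grew after make()
	}
}

// copyWithInterference copies a shared object while it is being modified
// and returns the recovered panic as an error.
func copyWithInterference() (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("%v", r)
		}
	}()
	shared := &status{Conditions: make([]condition, 4)}
	var out status
	deepCopyInto(shared, &out, func() {
		// Simulated concurrent mutation of the shared object.
		shared.Conditions = append(shared.Conditions, condition{Type: "Extra"})
	})
	return nil
}

func main() {
	// Prints the recovered runtime error.
	fmt.Println(copyWithInterference())
}
```

If the custom plugin modifies Pod objects that it got from the scheduler cache, one thing to check is whether it copies them first (`pod.DeepCopy()`) before changing them. Building with Go's race detector (`go build -race`) may also help locate a conflicting write.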
