panic loading amdgpu #255

Open
rkitover opened this issue Dec 24, 2020 · 30 comments

Comments

@rkitover

I get this in /var/log/messages:

Dec 24 01:14:03 epycfbsd kernel: drmn1: failed to link firmware kernel module with mapped name: amdgpu_navi10_gpu_info_bin
Dec 24 01:14:03 epycfbsd kernel: amdgpu/navi10_gpu_info.bin: could not load firmware image, error 2
Dec 24 01:14:03 epycfbsd syslogd: last message repeated 1 times
Dec 24 01:14:03 epycfbsd kernel: drmn1: failed to load firmware with name: amdgpu/navi10_gpu_info.bin
Dec 24 01:14:03 epycfbsd kernel: drmn1: Failed to load gpu_info firmware "amdgpu/navi10_gpu_info.bin"
Dec 24 01:14:03 epycfbsd kernel: drmn1: Fatal error during GPU init
Dec 24 01:14:03 epycfbsd kernel: [drm] amdgpu: finishing device.
Dec 24 01:14:03 epycfbsd kernel: Warning: can't remove non-dynamic nodes (dri)!

I am using latest git current with latest git drm-devel-kmod and gpu-firmware-kmod from ports.

My card is a 5700xt.

Any help much appreciated.

@valpackett
Contributor

Well do you have the firmware installed? It's packaged as gpu-firmware-kmod.

@rkitover
Author

Yes I have the latest drm-devel-kmod and gpu-firmware-kmod installed from the ports git.

@valpackett
Contributor

Weird. Can you manually kldload amdgpu_navi10_gpu_info_bin?
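The module name in that command follows from the log in the original report, where the firmware path amdgpu/navi10_gpu_info.bin is listed with the mapped name amdgpu_navi10_gpu_info_bin. A minimal sketch of that mapping, assuming it only replaces `/` and `.` with `_` (an inference from those two log lines, not taken from the driver source; `fw_module_name` is a hypothetical helper):

```shell
#!/bin/sh
# Hypothetical helper: derive the kernel module name for a firmware path.
# Assumption: the mapping replaces '/' and '.' with '_', as inferred from
# the "mapped name" line in the log at the top of this issue.
fw_module_name() {
    printf '%s\n' "$1" | tr '/.' '__'
}

# Usage (as root, on the affected machine):
#   kldload "$(fw_module_name amdgpu/navi10_gpu_info.bin)"
```

Other firmware files may contain characters this sketch does not handle, so treat the result as a guess to check against the files installed by gpu-firmware-kmod.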

@rkitover
Author

I will try right now.

@rkitover
Author

So that loads fine, but it seems that amdgpu dies with a backtrace, which I need to capture with hw.syscons.disable=1, and that might be tricky.

@valpackett
Contributor

Drop the syscons.disable and just make sure the loader is not in the highest/"native" resolution. Check with the gop command in the loader prompt, and set efi_max_resolution="1080p" or efi_max_resolution="720p" (or whatever would make it smaller) in loader.conf. There should be no framebuffer problems in that case.
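For reference, the loader.conf setting suggested above would look roughly like this. 720p is just one of the values mentioned; any value below the display's native resolution should serve the same purpose:

```
# /boot/loader.conf
# Cap the EFI framebuffer resolution picked by the loader.
efi_max_resolution="720p"
```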

@rkitover
Author

Nice, thank you, I will try that.

@rkitover
Author

rkitover commented Dec 25, 2020

I set the resolution to 1440p and the framebuffer started at 1600x1200.

I ran:

kldload amdgpu_navi10_gpu_info_bin
kldload amdgpu

I got this backtrace:

[image: backtrace]

Here is my system information:

Motherboard: Supermicro H11DSi
CPUs: 2x 32-core first-gen AMD EPYC
RAM: 128 GB ECC
GPUs: 2x 5700xt
FreeBSD: latest current git
Packages: latest from pkg
Ports: latest git versions of drm-devel-kmod and gpu-firmware-kmod

If you would like to debug this, I'll be happy to do whatever is needed.

rkitover changed the title from "failed to load firmware with name: amdgpu/navi10_gpu_info.bin" to "panic loading amdgpu" on Dec 25, 2020
@valpackett
Contributor

> 2x 5700xt

huh. well the panic is that the driver is trying to create /dev/dri/renderD128 twice (error code 17 is EEXIST). Looks like the first GPU failed to attach for some actual, more serious reason (drmn0 attach returned 2 at the very top line), and we don't clean everything up in that case, so the second GPU fails with that.
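Error code 17 is identified above as EEXIST. By the same table, the code 2 returned by the drmn0 attach (and the "error 2" in the original firmware log) is ENOENT, "No such file or directory". A small sketch for decoding these while reading logs; `errno_name` is a hypothetical helper and deliberately covers only the two codes that appear in this thread:

```shell
#!/bin/sh
# Hypothetical helper: map the errno values seen in this thread to names.
# Only 2 and 17 are covered; anything else is reported as unknown.
errno_name() {
    case "$1" in
        2)  echo "ENOENT" ;;   # No such file or directory
        17) echo "EEXIST" ;;   # File exists
        *)  echo "unknown" ;;
    esac
}

# Usage:
#   errno_name 17
```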

Can you scroll up (with Scroll Lock) in the console to see what's up with the first GPU?

@rkitover
Author

This time I got somewhat different behavior. I ran:

kldload amdgpu

and it initialized the first card and loaded the firmware, I could see the firmware modules in kldstat. However, it failed to initialize the second card.
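The kldstat check mentioned here can be scripted. A minimal sketch, assuming kldstat lists each loaded file by name with a `.ko` suffix; `kld_listed` is a hypothetical helper that reads kldstat output on stdin:

```shell
#!/bin/sh
# Hypothetical helper: succeed if the kldstat output on stdin lists <name>.ko.
# Assumption: loaded files appear in the listing with a ".ko" suffix.
kld_listed() {
    grep -qF -- "$1.ko"
}

# Usage (on the affected machine):
#   kldstat | kld_listed amdgpu_navi10_gpu_info_bin && echo "firmware module loaded"
```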

Here is the first card being successfully initialized:

[image: log]

Here is the second card failing to initialize:

[image: log]

I then tried running xorg with amdgpu to see if it would start on the first card, but I got a panic:

[image: panic]

@valpackett
Contributor

hm, same trace as freebsd/drm-kmod#36 with the vm_page_busy_acquire

Try building the newer driver from freebsd/drm-kmod#40 (https://github.com/myfreeweb/drm-kmod/tree/5.5-wip-amd-pr)

@rkitover
Author

I built and installed that branch, and now I get this backtrace on kldload amdgpu, this is the first page:

[image: page 1 of backtrace]

and this is the second page:

[image: page 2 of backtrace]

@valpackett
Contributor

welcome to the navi FPU kernel context issues suffering club :D (freebsd/drm-kmod#42 etc)

Please try again (git pull to get the latest commit freebsd/drm-kmod@c4cc838)

@rkitover
Author

Thank you, that seems to have gotten further, but still panics:

[image: page 1 of crashdump]

[image: page 2 of crashdump]

Looks to be dying in the same function.

@valpackett
Contributor

Oh, right. *facepalm* Try again freebsd/drm-kmod@7693e3a

@rkitover
Author

Just tried this, module loads and initializes the first GPU, fails to initialize the second GPU, then locks up hard when I start xorg.

[image: console log page 1]

[image: console log page 2]

@valpackett
Contributor

hm. Does it work with only one GPU installed?

@rkitover
Author

Will try today!

@rkitover
Author

Miraculously, it works. I am typing this in KDE on amdgpu now!

Some problems with KDE, but I'll try to work on that.

So what are the next steps here?

I can play with the code a bit if you tell me where to look and what to look at.

@novolaska

novolaska commented Dec 26, 2020

> Oh, right. *facepalm* Try again freebsd/drm-kmod@7693e3a

Works for Renoir 4750G.

dmesg.log
Xorg.0.log

Update:
drm-kmod source, https://github.com/unrelentingtech/drm-kmod/commits/5.5-wip-amd-pr (7693e3a492da031171f33cd2d239392a6ae861f1)

FreeBSD current, https://github.com/freebsd/freebsd-src, before commit 50180d2b52cc16ecb6a6617fdc53f5d83c71a8b4 (included), and patched with commit 9f47eeffa3cfdcb512e2011fb00fc23c7c1a7d75 for this issue.

@rkitover
Author

As a temporary measure, since I do want to put my second GPU back in, is there some way, perhaps in loader.conf, to disable my second GPU so that amdgpu does not try to initialize it?

@valpackett
Contributor

Maybe in /boot/device.hints https://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/device-hints.html

Something like hint.drmn.1.disabled=1 or hint.drm.1.disabled=1 could work? (not sure what exactly the driver name would be)

@rkitover
Author

I will try, thank you very much.

@rkitover
Author

Before I put the second GPU back in, I determined that the correct loader.conf invocation is as you said:

hint.drmn.1.disabled=1

With the second GPU back in, it initializes the first GPU but panics when xorg is being started:

[image: amdgpu panic]

@valpackett
Contributor

Huh. So reproducibly, always when the second GPU is present (but not even initialized, no dmesg lines for it), there are vm_fault panics, but they don't happen without the second GPU? I guess something in our memory code doesn't handle multiple GPUs :/

@rkitover
Author

Well, I could try playing with the code to at least get more information about this. Do you have any tips for working with this codebase and debugging etc., and any specific places I should look at to start?

Also, I wanted to say that I really appreciate all your help on this. Can I buy you a case of beer?

@valpackett
Contributor

> tips for working with this codebase and debugging etc

https://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug.html
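Since the backtraces in this thread were captured one console page at a time, saving a kernel crash dump may be easier to work with. A minimal sketch, assuming a swap device large enough to hold the dump; these are general rc.conf settings rather than anything specific to drm-kmod:

```
# /etc/rc.conf
dumpdev="AUTO"        # write kernel dumps to the configured swap device
dumpdir="/var/crash"  # where savecore(8) stores the dump on the next boot
```

After the next panic and reboot, the saved dump can be examined with kgdb as described in the handbook chapter linked above.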

> any specific places I should look at

Well the functions in the backtrace here I guess..

(Really, first just confirm that this is reproducible, i.e. every time you have a second GPU this crash happens and every time you don't have it, it doesn't.)

> can I buy you a case of beer?

I don't drink :P

@rkitover
Author

Sure that's easy enough to do, I can just unplug the power cable.

@rkitover
Author

rkitover commented Jan 1, 2021

> I don't drink :P

I meant, do you have a PayPal or Patreon or whatever link for sponsoring FreeBSD development, using your beverage of choice?

@rkitover
Author

rkitover commented Jan 9, 2021

@myfreeweb I have once again verified that this is the case. If I unplug the PCIe power from my second GPU, then I can start xorg on amdgpu.

Also, I realized that as a workaround I can just unplug the PCIe power when I want to boot FreeBSD, at least for now.

This is a huge improvement over my previous situation, where my only choices were an NVIDIA GPU or scfb. Thank you very much.

KDE does not seem to be working very well for me here. I'll do the necessary follow-up work on that, but in the meantime, if I can't fix it, I'll just install xfce so I have a working desktop and can start playing with FreeBSD as a daily driver.

Once I do that, I will look at this backtrace and see if I can do anything.
