
Disable Threading for NFFT | does not work and produces NaNs #117

Open
roflmaostc opened this issue Apr 12, 2023 · 6 comments
roflmaostc commented Apr 12, 2023

Hi!

I wanted to disable threading so that I can calculate the NFFT along the first two dimensions and loop over the third dimension myself. The loop itself should be threaded, since that should be more beneficial than threading the NFFT internally.

However, I can still see some threading happening.

Even worse, if I wrap the loop in a manual Threads.@threads, the results contain NaNs. Interestingly, when I use nfft directly instead of the plan, there are no NaNs.

Code:

# ╔═╡ fe43bf1c-d923-11ed-0a5a-49c652d9550e
using NFFT, FFTW

# ╔═╡ 1d50c640-1a26-4e0c-bfc2-9238f0f65f4f
FFTW.set_num_threads(1)

# ╔═╡ 5c0a30ab-dd03-4693-8411-1503e0f8d438
NFFT._use_threads[] = false

# ╔═╡ 5ee3ea66-123a-4875-8139-00f0571c86d9
arr = randn(ComplexF64, (1024, 1024, 10));

# ╔═╡ 5124e68a-2233-4653-831e-0020c7b490e2
coords = rand(-0.4:0.001:0.4, (2, 1024 * 1024))

# ╔═╡ d566e0b9-5e48-41d9-b59d-955aed49b036
function f(x, coords)
    p = NFFT.plan_nfft(coords, (size(x, 1), size(x, 2)))

    out = similar(x)
    # removing the @threads is ok but slow
    Threads.@threads for i in axes(x, 3)
        out[:, :, i] = reshape(p * x[:, :, i], size(x, 1), size(x, 2))
    end
    return out
end

# ╔═╡ 878c4fc3-fafc-4266-b391-bc367dd70b08
abs.(f(arr, coords))

@roflmaostc roflmaostc changed the title Disable Threading for NFFT Disable Threading for NFFT | does not work and produces NaNs Apr 12, 2023
tknopp (Member) commented Apr 12, 2023

The NFFT plan has state and is therefore not thread safe. You need to make one plan per thread. We do this within MRIReco.
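The hazard can be illustrated without NFFT at all. Below is a minimal pure-Julia sketch (all names hypothetical): a `Plan` with an internal scratch buffer stands in for the stateful NFFT plan, and constructing one plan per loop iteration keeps each task's state private.

```julia
# A stand-in for a stateful plan: it reuses an internal scratch buffer,
# so sharing one instance across threads would corrupt results.
struct Plan
    scratch::Vector{Float64}
end

# Fill the scratch buffer from the input column, then reduce it.
apply!(p::Plan, col) = (p.scratch .= 2 .* col; sum(p.scratch))

function apply_safe(x::Matrix{Float64})
    out = zeros(size(x, 2))
    Threads.@threads for j in axes(x, 2)
        # one plan per iteration: no state is shared between tasks
        p = Plan(zeros(size(x, 1)))
        out[j] = apply!(p, view(x, :, j))
    end
    return out
end
```

Applied to the snippet from the issue, the same fix means creating the plan with `NFFT.plan_nfft` inside the threaded loop body (or once per chunk of work) instead of sharing one `p` across all tasks.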

roflmaostc (Author) commented

Ah, thanks! 😄

But still, NFFT uses more than one core even when @threads is removed. Any idea why?

tknopp (Member) commented Apr 12, 2023

Yes, that is probably because in
https://github.com/JuliaMath/NFFT.jl/blob/master/src/implementation.jl#L87
we are missing a check.

The deeper problem is that NFFT._use_threads[] will likely not work as one hopes. I think our threaded for loops don't change after the first compilation; it currently does not seem possible to change the number of threads a for loop uses at runtime in Julia. In other words: we can't do what FFTW.jl does, which is quite unfortunate.
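One workaround that does work at runtime is to branch between a threaded and a serial loop body on a flag checked at call time, rather than trying to retune an already-compiled `@threads` loop. A minimal sketch with hypothetical names:

```julia
# Runtime switch: the flag is read on every call, so callers can toggle
# threading without recompiling anything.
const use_threads = Ref(true)

function sum_columns(x::Matrix{Float64})
    out = zeros(size(x, 2))
    if use_threads[]
        # threaded path
        Threads.@threads for j in axes(x, 2)
            out[j] = sum(view(x, :, j))
        end
    else
        # serial path, identical body
        for j in axes(x, 2)
            out[j] = sum(view(x, :, j))
        end
    end
    return out
end
```

The cost is some code duplication (or a helper that takes the loop body as a function), but both paths produce identical results and the choice is made per call.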

roflmaostc (Author) commented

For FFTW, I believe FFTW.forget_wisdom exists so that threading can be changed at runtime (although I have the feeling that's also buggy...).

How do you do it in MRIReco? Is the NFFT itself threaded, and do you thread the for loop over it as well?

tknopp (Member) commented Apr 12, 2023

In general we try to use nested parallelism, which seems to work ok. This means we have cases that are highly oversubscribed. I recall that there was some issue with FFTW; therefore, when we copy the NFFTPlan, we implicitly set the number of FFTW threads to 1 in this line:
https://github.com/JuliaMath/NFFT.jl/blob/master/src/implementation.jl#L60

Nested parallelism requires @spawn-based tasks. @floop from FLoops.jl is compatible with them; @threads seems not to be (this might change with Julia 1.9).
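A minimal sketch of the @spawn-based pattern (hypothetical example, not the MRIReco code): the outer level spawns one task per slice and each task may spawn inner work, so the two levels compose under Julia's task scheduler instead of fighting over a fixed thread pool.

```julia
# Inner level: splits its own work into a spawned half and a local half.
function inner(x::Vector{Float64})
    half = length(x) ÷ 2
    a = Threads.@spawn sum(view(x, 1:half))   # runs as its own task
    b = sum(view(x, half+1:length(x)))        # computed on this task
    return fetch(a) + b
end

# Outer level: one task per slice; nesting is fine because @spawn tasks
# are scheduled cooperatively rather than pinned to a thread count.
function outer(xs::Vector{Vector{Float64}})
    tasks = [Threads.@spawn inner(x) for x in xs]
    return fetch.(tasks)
end
```

With a flat `@threads` loop at both levels, the inner loop would contend with the outer one; with `@spawn`, oversubscription is handled by the scheduler.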

roflmaostc (Author) commented

I see.

Do you have an idea what gives the best performance with CUDA in this case?

A naive for loop calling CuNFFT in each iteration is not that fast. Hence, I was wondering whether we could speed this up by using KernelAbstractions to parallelize the loop with CUDA.
