
DepthwiseConv does not run on GPU #459

Closed
terasakisatoshi opened this issue Oct 24, 2018 · 24 comments · Fixed by #1921
@terasakisatoshi

I'm happy to hear that "Adds support for Depthwise Convolutions" #279 has been merged into master.

I updated to the latest Flux, i.e. pkg> add Flux#master.

Here is the sample code I tested:

using Flux: DepthwiseConv
num_ch = 10
test_input = rand(224, 224, num_ch, 2)  # WHCB style
depthwiseconv = DepthwiseConv((3, 3), num_ch, stride=1, pad=1)

@show size(depthwiseconv(test_input))  # works fine

This works fine, but once I copy the depthwiseconv layer onto the GPU via the gpu function,

using CuArrays
depthwiseconv = depthwiseconv |> gpu
test_input = test_input |> gpu
@show size(depthwiseconv(test_input)) # Oops

the code above fails with the following error message:

ERROR: LoadError: UndefVarError: gpu not defined
Stacktrace:
 [1] top-level scope at none:0
 [2] include_string(::Module, ::String, ::String) at ./loading.jl:1002
 [3] (::getfield(Atom, Symbol("##129#135")){String,String,Module})() at /home/terasaki/.julia/packages/Atom/7rQ1O/src/eval.jl:125
 [4] withpath(::getfield(Atom, Symbol("##129#135")){String,String,Module}, ::String) at /home/terasaki/.julia/packages/CodeTools/hB4Hy/src/utils.jl:30
 [5] withpath at /home/terasaki/.julia/packages/Atom/7rQ1O/src/eval.jl:46 [inlined]
 [6] #128 at /home/terasaki/.julia/packages/Atom/7rQ1O/src/eval.jl:122 [inlined]
 [7] with_logstate(::getfield(Atom, Symbol("##128#134")){String,String,Module}, ::Base.CoreLogging.LogState) at ./logging.jl:397
 [8] with_logger at ./logging.jl:493 [inlined]
 [9] #127 at /home/terasaki/.julia/packages/Atom/7rQ1O/src/eval.jl:121 [inlined]
 [10] hideprompt(::getfield(Atom, Symbol("##127#133")){String,String,Module}) at /home/terasaki/.julia/packages/Atom/7rQ1O/src/repl.jl:85
 [11] macro expansion at /home/terasaki/.julia/packages/Atom/7rQ1O/src/eval.jl:120 [inlined]
 [12] (::getfield(Atom, Symbol("##126#132")){Dict{String,Any}})() at ./task.jl:85
in expression starting at /home/terasaki/work/juliaExer/neuralNetwork/fluxExer/layers/depthwiseconv.jl:10
@MikeInnes
Member

Try using Flux: gpu. If you write using Flux: DepthwiseConv then you'll only get that specific name.

You can also use cu instead of gpu for single arrays.
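To illustrate the scoping behaviour being described, a minimal sketch (both forms assume Flux is installed):

```julia
using Flux: DepthwiseConv   # imports only DepthwiseConv; gpu stays undefined
using Flux: gpu             # explicitly brings gpu into scope as well

# Alternatively, import every exported name at once:
using Flux                  # DepthwiseConv, gpu, and friends are all available
```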

@terasakisatoshi
Author

terasakisatoshi commented Oct 24, 2018

I'm sorry, I should have written using Flux.

The following code is what I wanted to say:

using Flux: DepthwiseConv

num_ch = 10
test_input = rand(224, 224, num_ch, 2)  # WHCB style
depthwiseconv = DepthwiseConv((3, 3), num_ch, stride=1, pad=1)

@show size(depthwiseconv(test_input))  # works fine

using Flux
using CuArrays
depthwiseconv = depthwiseconv |> gpu
test_input = test_input |> gpu
@show size(depthwiseconv(test_input))  # error here

EDITED: this raises the following error.

ERROR: LoadError: conversion to pointer not defined for CuArray{Float32,2}
Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] unsafe_convert(::Type{Ptr{Float32}}, ::CuArray{Float32,2}) at ./pointer.jl:67
 [3] pointer at ./abstractarray.jl:861 [inlined]
 [4] #depthwiseconv2d!#37(::Tuple{Int64,Int64}, ::Tuple{Int64,Int64}, ::Int64, ::Float32, ::Function, ::CuArray{Float32,4}, ::CuArray{Float32,4}, ::CuArray{Float32,4}) at /home/terasaki/.julia/packages/NNlib/0EAe7/src/impl/conv.jl:200
 [5] #depthwiseconv2d! at ./none:0 [inlined]
 [6] #depthwiseconv!#69 at /home/terasaki/.julia/packages/NNlib/0EAe7/src/conv.jl:100 [inlined]
 [7] #depthwiseconv! at ./none:0 [inlined]
 [8] #depthwiseconv#68(::Tuple{Int64,Int64}, ::Tuple{Int64,Int64}, ::Function, ::CuArray{Float32,4}, ::CuArray{Float32,4}) at /home/terasaki/.julia/packages/NNlib/0EAe7/src/conv.jl:96

@MikeInnes
Member

I guess we may not have an implementation in CuArrays yet; cc @avik-pal

@avik-pal
Member

Yes, that is the case. Since the CPU code works by calling im2col (that's where the conversion-to-pointer error comes from) and the corresponding functions, the implementation does not work on the GPU. However, this can be fixed easily by using cuDNN (I recently learned about this feature):
this function needs to be added, with the group count set equal to the number of channels.
This cuDNN function will also allow us to support grouped convolutions. :)

@Sleort
Contributor

Sleort commented Jul 27, 2019

What's the status of this issue, @avik-pal? My current research depends on DepthwiseConv, so I'm currently limited to the CPU...

Unfortunately, I'm not really familiar with CUDNN/CuArrays.jl "under the hood", but if there's anything I'm able to assist with, I'm happy to help...

@dbadrian

dbadrian commented Aug 9, 2019

I second Sleort's comment. I'm highly interested in getting this available on the GPU; CPU-only performance is prohibitively slow for all but the tiniest problems.

@avik-pal
Member

avik-pal commented Aug 9, 2019

@Sleort @dbadrian I am currently working on some other problems, so I won't be able to get to this immediately. I have linked the function that needs to be wrapped for this functionality.

You could look into the functions in libcudnn.jl to see how these are wrapped.

@dbadrian

dbadrian commented Aug 9, 2019

Admittedly, at this point this is out of my depth, and I don't fully grasp what to do.

I gather you are referring to https://github.com/JuliaGPU/CuArrays.jl/blob/master/src/dnn/libcudnn.jl ?

Is the linked function the only thing that needs to be wrapped? Where do you think it should go? Essentially, libcudnn would be the appropriate place, right? Tell me where, and I'll open the PR if so desired.

function cudnnSetConvolutionGroupCount(convDesc, groupCount)
    @check ccall((:cudnnSetConvolutionGroupCount, libcudnn),
                 cudnnStatus_t,
                 (cudnnConvolutionDescriptor_t, Cint),
                 convDesc, groupCount)
end

function cudnnGetConvolutionGroupCount(convDesc, groupCount)
    @check ccall((:cudnnGetConvolutionGroupCount, libcudnn),
                 cudnnStatus_t,
                 (cudnnConvolutionDescriptor_t, Ptr{Cint}),
                 convDesc, groupCount)
end

It would be great to see this supported in Flux, as it's an extremely useful feature for various architectures.

As you said regarding grouped convolutions, this would then also resolve #330.

@avik-pal
Member

avik-pal commented Aug 9, 2019

Yes, this is the only function that needs to be wrapped. You should open the PR in CuArrays.jl and we can discuss it there.

The next thing would be to modify the ConvDesc call here to call the wrapped function first.

@dbadrian

dbadrian commented Aug 9, 2019

Great, I opened the PR, as you've probably been notified. Happy to have a crack at the other required changes as well (albeit probably with some help :)).

@00sapo

00sapo commented Jul 13, 2021

Hello, any news?

@ToucheSir
Member

ToucheSir commented Jul 13, 2021

The PRs to follow are #948 and FluxML/NNlib.jl#146. If you're interested in getting this through, contributions are always welcome ;)

@mateuszatki

Looks like the last comments on this are pretty old; is there currently any workaround to run DepthwiseConv on the GPU? Do you need help with testing?

@DhairyaLGandhi
Member

Yes please, this should be usable now.

@camillasterud-sintef

Any progress on this?

@ToucheSir
Member

You can use depthwise convolutions on GPU today by simply setting groups = input channels in Conv as is done in other frameworks. DepthwiseConv is a bit of an aberration and it's an open question whether that type should exist at all :)
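As a sketch of what that looks like (assuming a recent Flux version where Conv accepts the groups keyword):

```julia
using Flux

num_ch = 10
# A depthwise convolution is a grouped Conv with groups == input channels,
# so each input channel is convolved with its own filter.
layer = Conv((3, 3), num_ch => num_ch; groups = num_ch, pad = 1)

x = rand(Float32, 224, 224, num_ch, 2)   # WHCB layout
size(layer(x))                           # (224, 224, 10, 2)

# On the GPU, move both layer and input with gpu (requires CUDA.jl):
# layer_gpu, x_gpu = gpu(layer), gpu(x)
# size(layer_gpu(x_gpu))
```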

@camillasterud-sintef

Great! Thanks :)

@CarloLucibello
Member

We could deprecate DepthwiseConv or rewire it to Conv. Marking this as a decision to be made for the next release.

@CarloLucibello CarloLucibello added this to the v0.13 milestone Mar 11, 2022
@mcabbott
Member

So we just need this, and delete it all?

function FakeDepthwiseConv(k::NTuple{<:Any,Integer}, ch::Pair{<:Integer,<:Integer}, σ = identity;
                           stride = 1, pad = 0, dilation = 1, bias = true, init = glorot_uniform)
  Conv(k, ch, σ; groups = ch.first, stride, pad, dilation, bias, init)
end

function FakeDepthwiseConv(w::AbstractArray{T,N}, bias = true, σ = identity;
                           stride = 1, pad = 0, dilation = 1) where {T,N}
  w2 = reshape(w, size(w)[1:end-2]..., 1, :)
  Conv(w2, bias, σ; groups = size(w)[end-1], stride, pad, dilation)
end
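Usage would then be a drop-in replacement; a hypothetical sketch, assuming the definition above is loaded alongside using Flux:

```julia
# 10 input channels, one filter per channel, identity activation by default
layer = FakeDepthwiseConv((3, 3), 10 => 10; pad = 1)

x = rand(Float32, 28, 28, 10, 2)
size(layer(x))  # (28, 28, 10, 2)
```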

@CarloLucibello
Member

I think so

@ToucheSir
Member

We need to set up forwarding to the CPU depthwise kernels in NNlib based on group count. That's the tricky bit, and I'm not sure how best to do it.

@mcabbott
Member

Are you thinking that these are more efficient than the groups code?

@CarloLucibello
Member

> We need to set up forwarding to the CPU depthwise kernels in NNlib based on group count. That's the tricky bit and I'm not sure how best to do it.

Choosing the most performant implementation based on the group count (if there is any need for that) should be handled by NNlib itself.
