
DepthwiseConv does not run on GPU #459

Closed
terasakisatoshi opened this issue Oct 24, 2018 · 24 comments · Fixed by #1921
@terasakisatoshi

I'm happy to hear that "Adds support for Depthwise Convolutions" #279 has been merged into master.

I updated to the latest Flux, i.e. pkg> add Flux#master.

Here is the sample code I tested:

using Flux: DepthwiseConv
num_ch = 10
test_input = rand(224, 224, num_ch, 2)  # WHCB style
depthwiseconv = DepthwiseConv((3, 3), num_ch, stride=1, pad=1)

@show size(depthwiseconv(test_input))  # works fine

This works fine, but once I copy the depthwiseconv layer onto the GPU via the gpu function,

using CuArrays
depthwiseconv = depthwiseconv |> gpu
test_input = test_input |> gpu
@show size(depthwiseconv(test_input)) # Oops

the code above fails with the following error message:

ERROR: LoadError: UndefVarError: gpu not defined
Stacktrace:
 [1] top-level scope at none:0
 [2] include_string(::Module, ::String, ::String) at ./loading.jl:1002
 [3] (::getfield(Atom, Symbol("##129#135")){String,String,Module})() at /home/terasaki/.julia/packages/Atom/7rQ1O/src/eval.jl:125
 [4] withpath(::getfield(Atom, Symbol("##129#135")){String,String,Module}, ::String) at /home/terasaki/.julia/packages/CodeTools/hB4Hy/src/utils.jl:30
 [5] withpath at /home/terasaki/.julia/packages/Atom/7rQ1O/src/eval.jl:46 [inlined]
 [6] #128 at /home/terasaki/.julia/packages/Atom/7rQ1O/src/eval.jl:122 [inlined]
 [7] with_logstate(::getfield(Atom, Symbol("##128#134")){String,String,Module}, ::Base.CoreLogging.LogState) at ./logging.jl:397
 [8] with_logger at ./logging.jl:493 [inlined]
 [9] #127 at /home/terasaki/.julia/packages/Atom/7rQ1O/src/eval.jl:121 [inlined]
 [10] hideprompt(::getfield(Atom, Symbol("##127#133")){String,String,Module}) at /home/terasaki/.julia/packages/Atom/7rQ1O/src/repl.jl:85
 [11] macro expansion at /home/terasaki/.julia/packages/Atom/7rQ1O/src/eval.jl:120 [inlined]
 [12] (::getfield(Atom, Symbol("##126#132")){Dict{String,Any}})() at ./task.jl:85
in expression starting at /home/terasaki/work/juliaExer/neuralNetwork/fluxExer/layers/depthwiseconv.jl:10
@MikeInnes
Member

Try using Flux: gpu. If you write using Flux: DepthwiseConv then you'll only get that specific name.

You can also use cu instead of gpu for single arrays.
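To illustrate the scoping behaviour being described, a minimal sketch (both forms assume Flux is installed):

```julia
using Flux: DepthwiseConv   # imports only DepthwiseConv; gpu stays undefined
using Flux: gpu             # explicitly brings gpu into scope as well

# Alternatively, import every exported name at once:
using Flux                  # DepthwiseConv, gpu, and friends are all available
```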

@terasakisatoshi
Author

terasakisatoshi commented Oct 24, 2018

I'm sorry, I should have written using Flux.

The following code is what I wanted to say:

using Flux: DepthwiseConv

num_ch = 10
test_input = rand(224, 224, num_ch, 2)  # WHCB style
depthwiseconv = DepthwiseConv((3, 3), num_ch, stride=1, pad=1)

@show size(depthwiseconv(test_input))  # works fine

using Flux
using CuArrays
depthwiseconv = depthwiseconv |> gpu
test_input = test_input |> gpu
@show size(depthwiseconv(test_input))  # error here

EDITED: this raises the following error.

ERROR: LoadError: conversion to pointer not defined for CuArray{Float32,2}
Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] unsafe_convert(::Type{Ptr{Float32}}, ::CuArray{Float32,2}) at ./pointer.jl:67
 [3] pointer at ./abstractarray.jl:861 [inlined]
 [4] #depthwiseconv2d!#37(::Tuple{Int64,Int64}, ::Tuple{Int64,Int64}, ::Int64, ::Float32, ::Function, ::CuArray{Float32,4}, ::CuArray{Float32,4}, ::CuArray{Float32,4}) at /home/terasaki/.julia/packages/NNlib/0EAe7/src/impl/conv.jl:200
 [5] #depthwiseconv2d! at ./none:0 [inlined]
 [6] #depthwiseconv!#69 at /home/terasaki/.julia/packages/NNlib/0EAe7/src/conv.jl:100 [inlined]
 [7] #depthwiseconv! at ./none:0 [inlined]
 [8] #depthwiseconv#68(::Tuple{Int64,Int64}, ::Tuple{Int64,Int64}, ::Function, ::CuArray{Float32,4}, ::CuArray{Float32,4}) at /home/terasaki/.julia/packages/NNlib/0EAe7/src/conv.jl:96

@MikeInnes
Member

I guess we may not have an implementation in CuArrays yet; cc @avik-pal

@avik-pal
Member

Yes, that is the case. Since the CPU code works by calling im2col (that's where the conversion-to-pointer error comes from) and the corresponding functions, the implementation does not work on the GPU. However, this can be fixed easily by using cuDNN (I recently learned about this feature):
this function needs to be added, with the group count set equal to the number of channels.
This cuDNN function will also allow us to support grouped convolutions. :)

@Sleort
Contributor

Sleort commented Jul 27, 2019

What's the status of this issue, @avik-pal? My current research depends on DepthwiseConv, so I'm currently limited to the CPU...

Unfortunately, I'm not really familiar with CUDNN/CuArrays.jl "under the hood", but if there's anything I'm able to assist with, I'm happy to help...

@dbadrian

dbadrian commented Aug 9, 2019

I second Sleort's comment. I'm highly interested in getting this available on the GPU; CPU-only performance is prohibitively slow for all but the tiniest problems.

@avik-pal
Member

avik-pal commented Aug 9, 2019

@Sleort @dbadrian I am currently working on some other problems, so I won't be able to get to this immediately. I have linked the function that needs to be wrapped for this functionality.

You could look into the functions in libcudnn.jl to see how these are wrapped.

@dbadrian

dbadrian commented Aug 9, 2019

Admittedly, at this point this is out of my depth, and I don't fully grasp what to do.

I gather you are referring to https://github.com/JuliaGPU/CuArrays.jl/blob/master/src/dnn/libcudnn.jl ?

Is the linked function the only thing that needs to be wrapped? Where do you think it should go? Essentially, libcudnn would be the appropriate place, right? Tell me where, and I'll open the PR if so desired.

function cudnnSetConvolutionGroupCount(convDesc, groupCount)
    @check ccall((:cudnnSetConvolutionGroupCount, libcudnn),
                 cudnnStatus_t,
                 (cudnnConvolutionDescriptor_t, Cint),
                 convDesc, groupCount)
end

function cudnnGetConvolutionGroupCount(convDesc, groupCount)
    @check ccall((:cudnnGetConvolutionGroupCount, libcudnn),
                 cudnnStatus_t,
                 (cudnnConvolutionDescriptor_t, Ptr{Cint}),
                 convDesc, groupCount)
end

It would be great to see this supported in Flux, as it's an extremely useful feature for various architectures.

As you said regarding grouped convolutions, this would then also resolve #330.

@avik-pal
Member

avik-pal commented Aug 9, 2019

Yes, this is the only function that needs to be wrapped. You should open the PR in CuArrays.jl and we can discuss it there.

The next thing would be to modify the ConvDesc call here to call the wrapped function first.

@dbadrian

dbadrian commented Aug 9, 2019

Great, I opened the PR, as you've probably been notified. Happy to have a crack at the other required changes as well (albeit probably with some help :)).

@00sapo

00sapo commented Jul 13, 2021

Hello, any news?

@ToucheSir
Member

ToucheSir commented Jul 13, 2021

The PRs to follow are #948 and FluxML/NNlib.jl#146. If you're interested in getting this through, contributions are always welcome ;)

@mateuszatki

Looks like the last comments on this are pretty old; is there currently any workaround to run DepthwiseConv on the GPU? Do you need help with testing?

@DhairyaLGandhi
Member

Yes please, this should be usable now.

@camillasterud-sintef

Any progress on this?

@ToucheSir
Member

You can use depthwise convolutions on GPU today by simply setting groups = input channels in Conv as is done in other frameworks. DepthwiseConv is a bit of an aberration and it's an open question whether that type should exist at all :)
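As a sketch of what that looks like (assuming a recent Flux version where Conv accepts the groups keyword):

```julia
using Flux

num_ch = 10
# A depthwise convolution is a grouped Conv with groups == input channels,
# so each input channel is convolved with its own filter.
layer = Conv((3, 3), num_ch => num_ch; groups = num_ch, pad = 1)

x = rand(Float32, 224, 224, num_ch, 2)   # WHCB layout
size(layer(x))                           # (224, 224, 10, 2)

# On the GPU, move both layer and input with gpu (requires CUDA.jl):
# layer_gpu, x_gpu = gpu(layer), gpu(x)
# size(layer_gpu(x_gpu))
```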

@camillasterud-sintef

Great! Thanks :)

@CarloLucibello
Member

We could deprecate DepthwiseConv or rewire it to Conv. Marking this as a decision to be made for the next release.

@CarloLucibello CarloLucibello added this to the v0.13 milestone Mar 11, 2022
@mcabbott
Member

So we just need this, and delete it all?

function FakeDepthwiseConv(k::NTuple{<:Any,Integer}, ch::Pair{<:Integer,<:Integer}, σ = identity;
                           stride = 1, pad = 0, dilation = 1, bias = true, init = glorot_uniform)
  Conv(k, ch, σ; groups = ch.first, stride, pad, dilation, bias, init)
end

function FakeDepthwiseConv(w::AbstractArray{T,N}, bias = true, σ = identity;
                           stride = 1, pad = 0, dilation = 1) where {T,N}
  w2 = reshape(w, size(w)[1:end-2]..., 1, :)
  Conv(w2, bias, σ; groups = size(w)[end-1], stride, pad, dilation)
end
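Usage would then be a drop-in replacement; a hypothetical sketch, assuming the definition above is loaded alongside using Flux:

```julia
# 10 input channels, one filter per channel, identity activation by default
layer = FakeDepthwiseConv((3, 3), 10 => 10; pad = 1)

x = rand(Float32, 28, 28, 10, 2)
size(layer(x))  # (28, 28, 10, 2)
```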

@CarloLucibello
Member

I think so

@ToucheSir
Member

We need to set up forwarding to the CPU depthwise kernels in NNlib based on group count. That's the tricky bit, and I'm not sure how best to do it.

@mcabbott
Member

Are you thinking that these are more efficient than the groups code?

@CarloLucibello
Member

> We need to set up forwarding to the CPU depthwise kernels in NNlib based on group count. That's the tricky bit and I'm not sure how best to do it.

Choosing the most performant implementation based on the group count (if there is any need for that) should be handled by NNlib itself.
