use vmap for all activations #221

Open · wants to merge 5 commits into master

Conversation

@AStupidBear (Contributor) commented Jul 11, 2020:

fix #220
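
For context, a minimal sketch of the idea (one plausible mechanism, not a verbatim excerpt from this PR's diff): intercept broadcasting of each NNlib-owned activation and route it through LoopVectorization's vmap.

using LoopVectorization: vmap
using NNlib: relu

# Sketch only: dispatch relu.(x) to vmap(relu, x). The same pattern
# would be applied to every activation benchmarked below.
Base.Broadcast.broadcasted(::typeof(relu), x::Array{<:Union{Float32,Float64}}) =
    vmap(relu, x)

The benchmark script and results: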

using NNlib, BenchmarkTools

ACTIVATION_FUNCTIONS = [σ, hardσ, logσ, hardtanh, relu, leakyrelu, relu6, rrelu, elu, gelu, celu, swish, lisht, selu, trelu, softplus, softsign, logcosh, mish, tanhshrink, softshrink];

for T in (Float32, Float64)
    x = rand(T, 500)
    for a in ACTIVATION_FUNCTIONS
        @show a
        @btime $a.($x)
        @btime map($a, $x)
    end
end

Float32 (first timing in each pair is a.(x), the second is map(a, x)):

a = NNlib.σ
  753.493 ns (1 allocation: 2.13 KiB)
  4.052 μs (2 allocations: 2.14 KiB)
a = NNlib.hardσ
  644.024 ns (1 allocation: 2.13 KiB)
  1.653 μs (2 allocations: 2.14 KiB)
a = NNlib.logσ
  2.659 μs (1 allocation: 2.13 KiB)
  27.581 μs (2 allocations: 2.14 KiB)
a = NNlib.hardtanh
  542.984 ns (1 allocation: 2.13 KiB)
  2.261 μs (2 allocations: 2.14 KiB)
a = NNlib.relu
  576.228 ns (1 allocation: 2.13 KiB)
  716.754 ns (2 allocations: 2.14 KiB)
a = NNlib.leakyrelu
  531.237 ns (1 allocation: 2.13 KiB)
  1.965 μs (2 allocations: 2.14 KiB)
a = NNlib.relu6
  524.547 ns (1 allocation: 2.13 KiB)
  1.252 μs (2 allocations: 2.14 KiB)
a = NNlib.rrelu
  919.600 ns (1 allocation: 2.13 KiB)
  5.401 μs (2 allocations: 2.14 KiB)
a = NNlib.elu
  794.806 ns (1 allocation: 2.13 KiB)
  3.864 μs (2 allocations: 2.14 KiB)
a = NNlib.gelu
  1.525 μs (1 allocation: 2.13 KiB)
  11.793 μs (2 allocations: 2.14 KiB)
a = NNlib.celu
  824.069 ns (1 allocation: 2.13 KiB)
  3.813 μs (2 allocations: 2.14 KiB)
a = NNlib.swish
  799.932 ns (1 allocation: 2.13 KiB)
  4.438 μs (2 allocations: 2.14 KiB)
a = NNlib.lisht
  1.367 μs (1 allocation: 2.13 KiB)
  9.907 μs (2 allocations: 2.14 KiB)
a = NNlib.selu
  779.846 ns (1 allocation: 2.13 KiB)
  4.023 μs (2 allocations: 2.14 KiB)
a = NNlib.trelu
  524.082 ns (1 allocation: 2.13 KiB)
  654.231 ns (2 allocations: 2.14 KiB)
a = NNlib.softplus
  2.565 μs (1 allocation: 2.13 KiB)
  28.825 μs (2 allocations: 2.14 KiB)
a = NNlib.softsign
  565.306 ns (1 allocation: 2.13 KiB)
  807.525 ns (2 allocations: 2.14 KiB)
a = NNlib.logcosh
  2.840 μs (1 allocation: 2.13 KiB)
  31.271 μs (2 allocations: 2.14 KiB)
a = NNlib.mish
  4.533 μs (1 allocation: 2.13 KiB)
  46.708 μs (2 allocations: 2.14 KiB)
a = NNlib.tanhshrink
  1.415 μs (1 allocation: 2.13 KiB)
  9.858 μs (2 allocations: 2.14 KiB)
a = NNlib.softshrink
  528.125 ns (1 allocation: 2.13 KiB)
  2.321 μs (2 allocations: 2.14 KiB)

Float64:

a = NNlib.σ
  1.169 μs (1 allocation: 4.06 KiB)
  5.805 μs (2 allocations: 4.08 KiB)
a = NNlib.hardσ
  838.711 ns (1 allocation: 4.06 KiB)
  1.612 μs (2 allocations: 4.08 KiB)
a = NNlib.logσ
  5.995 μs (1 allocation: 4.06 KiB)
  34.751 μs (2 allocations: 4.08 KiB)
a = NNlib.hardtanh
  677.167 ns (1 allocation: 4.06 KiB)
  2.299 μs (2 allocations: 4.08 KiB)
a = NNlib.relu
  745.859 ns (1 allocation: 4.06 KiB)
  722.837 ns (2 allocations: 4.08 KiB)
a = NNlib.leakyrelu
  742.216 ns (1 allocation: 4.06 KiB)
  1.854 μs (2 allocations: 4.08 KiB)
a = NNlib.relu6
  723.690 ns (1 allocation: 4.06 KiB)
  1.297 μs (2 allocations: 4.08 KiB)
a = NNlib.rrelu
  1.327 μs (1 allocation: 4.06 KiB)
  5.316 μs (2 allocations: 4.08 KiB)
a = NNlib.elu
  1.120 μs (1 allocation: 4.06 KiB)
  5.244 μs (2 allocations: 4.08 KiB)
a = NNlib.gelu
  3.206 μs (1 allocation: 4.06 KiB)
  15.002 μs (2 allocations: 4.08 KiB)
a = NNlib.celu
  1.176 μs (1 allocation: 4.06 KiB)
  5.046 μs (2 allocations: 4.08 KiB)
a = NNlib.swish
  1.181 μs (1 allocation: 4.06 KiB)
  5.662 μs (2 allocations: 4.08 KiB)
a = NNlib.lisht
  2.822 μs (1 allocation: 4.06 KiB)
  13.317 μs (2 allocations: 4.08 KiB)
a = NNlib.selu
  1.079 μs (1 allocation: 4.06 KiB)
  5.663 μs (2 allocations: 4.08 KiB)
a = NNlib.trelu
  774.219 ns (1 allocation: 4.06 KiB)
  785.364 ns (2 allocations: 4.08 KiB)
a = NNlib.softplus
  5.846 μs (1 allocation: 4.06 KiB)
  37.976 μs (2 allocations: 4.08 KiB)
a = NNlib.softsign
  718.581 ns (1 allocation: 4.06 KiB)
  993.700 ns (2 allocations: 4.08 KiB)
a = NNlib.logcosh
  6.133 μs (1 allocation: 4.06 KiB)
  45.347 μs (2 allocations: 4.08 KiB)
a = NNlib.mish
  11.400 μs (1 allocation: 4.06 KiB)
  57.121 μs (2 allocations: 4.08 KiB)
a = NNlib.tanhshrink
  2.799 μs (1 allocation: 4.06 KiB)
  13.361 μs (2 allocations: 4.08 KiB)
a = NNlib.softshrink
  729.505 ns (1 allocation: 4.06 KiB)
  2.459 μs (2 allocations: 4.08 KiB)

@CarloLucibello (Member):

Could you update the OP with the benchmark results for your system?

@CarloLucibello (Member):

We are missing tanh here, unless we decide to exclude it in order to avoid type piracy.
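
For illustration (my sketch of the concern, using the same dispatch mechanism as the sketch above): both tanh and Array are owned by Base, so a method like this defined in NNlib would be type piracy.

using LoopVectorization: vmap

# Type piracy: NNlib owns neither tanh nor Array, so this method would
# silently change broadcast behavior for every package that uses tanh.
Base.Broadcast.broadcasted(::typeof(tanh), x::Array{Float32}) = vmap(tanh, x)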

@@ -118,7 +119,7 @@ elu(x::RealOrFloatType, α = one(x)) = vifelse(x ≥ 0, x / one(x), α * (exp(x)
 activation function.
 """
 function gelu(x::RealOrFloatType)
-    p = oftype(x / 1, π)
+    p = oftype(x / 1, Float64(π))
Member comment on the diff:

Why is this hardcoding the type here?

@AStupidBear (Contributor, Author) replied:

Zygote will fail for oftype(..., ::Irrational).
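
A minimal reconstruction of the failure mode (my sketch; the exact error depends on the Zygote version):

using Zygote

f(x) = x * oftype(x / 1, π)           # passes π::Irrational; reportedly breaks Zygote
g(x) = x * oftype(x / 1, Float64(π))  # eager conversion, as in this PR; works

gradient(g, 1f0)  # (3.1415927f0,)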

@DhairyaLGandhi (Member):

Needs testing against Zygote and CUDA to make sure we don't break any dispatch that we are relying on.
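
For the Zygote side, a smoke test along these lines (my sketch; the repository's actual tests may differ) would catch broken dispatch:

using Zygote, NNlib

# Check that a few activations still yield well-typed gradients
# through the vmap-backed broadcast path.
for a in (σ, relu, gelu)
    g, = gradient(x -> sum(a.(x)), randn(Float32, 100))
    @assert g isa Vector{Float32}
end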

@AStupidBear (Contributor, Author) commented Jul 14, 2020:

Zygote is already tested against here. But testing against CUDA may be out of scope for NNlib?

@johnnychen94 commented Jul 18, 2020:

Do we need to update the lower bound for the LoopVectorization version in Project.toml? I don't know what happens under the hood; just asking to be sure.
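
For reference, the bound in question lives under [compat] in NNlib's Project.toml; the version below is a placeholder, not the actual required bound:

[compat]
LoopVectorization = "0.8"  # placeholder; bump if the PR relies on newer vmap behavior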

@CarloLucibello (Member):

Given #224 and the discussion in FluxML/Flux.jl#1272, I no longer think this is the correct way forward. The whole vectorization logic should live in Flux's layer definitions, and we should revert NNlib to its pre-LoopVectorization state.

@DhairyaLGandhi (Member):

We should revert the vectorisation stuff and release a patch that drops the packages from the dependencies.

@DhairyaLGandhi (Member):

Can you also add the same benchmarks using a simple Dense(28*28, 50) to see the relative speedup?

@avx f.(w * x .+ b), basically.
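
A sketch of what that could look like (the batch size and the use of BenchmarkTools are my assumptions):

using Flux, NNlib, LoopVectorization, BenchmarkTools

w, b = randn(Float32, 50, 28*28), randn(Float32, 50)
x = randn(Float32, 28*28, 128)    # batch of 128 is an assumption
d = Dense(28*28, 50, relu)

@btime $d($x)                     # Dense forward pass (broadcast path)
@btime @avx relu.($w * $x .+ $b)  # explicitly vectorized comparison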
