
Gradient calculation in paper #27

Open
vb123er951 opened this issue May 15, 2020 · 11 comments

Comments

@vb123er951

Hi,
I have recently become interested in CSPNet and am reading the paper: https://arxiv.org/pdf/1911.11929.pdf.
I have a question about the gradient calculation on page 4. In the paper, the weight updating is written as

$w_1' = f(w_1, g_0)$
$w_2' = f(w_2, g_0, g_1)$
...
$w_k' = f(w_k, g_0, g_1, g_2, \ldots, g_{k-1})$

Shouldn't this part be calculated as follows?

$w_1' = f(w_1, g_0, g_1, g_2, \ldots, g_k)$
$w_2' = f(w_2, g_1, g_2, \ldots, g_k)$
...
$w_k' = f(w_k, g_k)$

I also want to confirm the definition of $g_i$: is it the partial derivative of the error with respect to the weight, i.e. $g_i = \partial E / \partial w_i$?

I am very confused about this part; I hope you can help me.
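
To make the notation concrete, here is a minimal sketch (a hypothetical toy construction of my own, not code from this repository or the paper) of a three-layer dense block in the spirit of equation (1): each layer transforms the concatenation of all preceding outputs, so the gradient that autograd stores in `w[i].grad` corresponds to the gradient information involved in updating $w_{i+1}$ in the formulas above.

```python
# A toy dense block (assumptions mine): layer i maps the concatenation of
# [x0, x1, ..., x_{i-1}] through a weight w_i, mirroring equation (1).
import torch

torch.manual_seed(0)
c = 4                                   # channels produced by every layer
x0 = torch.randn(1, c)                  # input features (flattened for simplicity)

# one weight matrix per dense layer; layer i sees i*c input channels
w = [torch.randn(c * (i + 1), c, requires_grad=True) for i in range(3)]

feats = [x0]
for wi in w:
    xi = torch.cat(feats, dim=1) @ wi   # concatenate everything so far, then transform
    feats.append(torch.relu(xi))

loss = feats[-1].sum()
loss.backward()

for i, wi in enumerate(w):
    print(f"dL/dw{i+1} shape:", wi.grad.shape)
```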

@WongKinYiu
Owner

[image: explanation posted as a screenshot]

@baopmessi

baopmessi commented Jan 12, 2021

I am still confused about this part (what $g_i$ means). Does it mean:
[images: equations]
If so, why does the paper say "We can find that large amount of gradient information are reused for updating weights of different dense layers. This will result in different dense layers repeatedly learn copied gradient information."?
Can you help me? Thank you.

@Pcyslist

If
[images: equations]
then how is the gradient of the weights of layer 0 expressed? And what is the meaning of
[image: expression]?

@Pcyslist

If you define
[image: definition of $g_0$]
then you would have to define the $g_0$ of the $(k+1)$-th layer as follows:
[images: equations]
so the $g_0$ terms in the red rectangle will not be the same thing. Your explanation of $g_0$ contradicts the repeated $g_0$ in your paper.

@WongKinYiu
Owner

WongKinYiu commented Oct 17, 2021

Please notice that $g_{i}$ represents the gradient propagated to the $i^{th}$ layer, not the gradient generated from the $i^{th}$ layer.

The equation with the red rectangle contains only one timestamp of the full weight update, so it shows the case of the out-degrees of the $k$-th layer. The full weight-updating process accumulates the gradients over all timestamps.

It is too complicated to show the timestamps $t_{j}$ in one equation. If you want to add the gradient information of the $(k+1)$-th layer to this equation, it means adding the gradient generated at the $(t-1)$-th timestamp. In that case, the $g$ terms have to carry timestamp annotations too, for example $g^{t}_{0}$ and $g^{t-1}_{0}$. To understand more details about timestamps and the partitioning of the gradient, you may want to see Figures 5 and 6 of the PRN paper. Edit: for the general case, you have to note from_where, to_where, and the timestamp of all gradients.
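
As a side note, a minimal sketch (a single toy linear layer of my own choosing, not the author's formulation) of what a timestamp annotation such as $g^{t}_{0}$ versus $g^{t-1}_{0}$ can refer to: the gradient reaching a given layer differs between two consecutive weight updates, because the weights change in between.

```python
# g_0 at update step t-1 versus step t for a toy linear layer (assumptions mine).
import torch

torch.manual_seed(0)
x0 = torch.randn(1, 4)
w1 = torch.randn(4, 4, requires_grad=True)

g0_history = []
for t in range(2):                           # two consecutive timestamps
    x0_t = x0.clone().requires_grad_(True)
    loss = (x0_t @ w1).sum()
    loss.backward()
    g0_history.append(x0_t.grad.clone())     # gradient propagated to layer 0
    with torch.no_grad():
        w1 -= 0.1 * w1.grad                  # one weight update
        w1.grad.zero_()

print("g_0 at timestamp t-1:", g0_history[0])
print("g_0 at timestamp t  :", g0_history[1])
```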

@Pcyslist

Thanks for your reply, @WongKinYiu.
As you said, the gradient of the $(k+1)$-th layer is generated at the $(t-1)$-th timestamp, which is the general rule of the back-propagation weight-update algorithm. So in equation (6), when calculating the gradient information generated by the $k$-th layer, we only need the already-generated gradients of layers $k+1$ to $k+n$ and the weight information of layers $1$ to $k-1$, but not the gradient information of layers $1$ to $k-1$, because those gradients $g_i$ have not been generated yet. So why do you update $w_k$ in the formula with $g_i$ ($1 \le i \le k-1$) that has not been generated yet? Maybe I mean you should replace $g_i$ with $w_i$. Your explanation of $g_i$ is that it is the gradient propagated to the $i$-th layer, but what does the update of $w_k$ have to do with $g_i$? Maybe $w_i$ would be OK?
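
A hedged illustration of the dependency this comment refers to, using a single toy linear layer of my own choosing (not the paper's equation (6)): standard backprop computes a layer's weight gradient from that layer's forward input (which is built from earlier layers' outputs) and the gradient arriving at the layer's output.

```python
# dL/dw_k = input_k^T @ grad_out_k for one toy linear layer (assumptions mine).
import torch

torch.manual_seed(0)
inp = torch.randn(1, 8)                    # concatenation of earlier layers' outputs
wk = torch.randn(8, 4, requires_grad=True)

out = inp @ wk
grad_out = torch.ones_like(out)            # gradient arriving from later layers
out.backward(grad_out)

# manual computation matches autograd's result
manual = inp.t() @ grad_out
print(torch.allclose(wk.grad, manual))     # True
```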

@WongKinYiu
Owner

WongKinYiu commented Oct 17, 2021

Please notice that $g_{i}$ represents the gradient propagated to the $i^{th}$ layer, not the gradient generated from the $i^{th}$ layer. It means that in the equation, $g_{0}$, $g_{1}$, ... are all generated by the $k$-th layer at timestamp $t$, and then propagated to the $0$-th, $1$-st, ... layers. In your description, $g_{i}$ still means the gradient generated by the $i$-th layer, which is not the same as our definition.

At a specific timestamp $t$, the gradient will propagate to all layers that have a shortcut connection to the current layer. Since DenseNet has shortcut connections to all previous layers, the gradient used to update the $k$-th layer will also propagate to all of the $0$-th, $1$-st, ..., $(k-1)$-th layers. And because the architecture has concatenations, the equations become (1) and (2). From (1), you can see that the input of the $k$-th layer is the concatenation of the outputs of all previous layers. Obviously, the gradient for updating the weights of the $k$-th layer will propagate to all previous layers according to their channel dimensions.

Just take a glance at the figure and you will see which layers' weights each of $g_{0}$, $g_{1}$, ... is used to update.
[image: figure showing how the gradients flow to the dense layers]
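
To illustrate the channel-wise split described above with a hypothetical toy example of my own (not code from this repository): the gradient computed at the $k$-th layer is partitioned along the channel dimension of its concatenated input, and each slice is the piece that propagates back to the corresponding earlier layer (the $g_0$, $g_1$, ... of the discussion).

```python
# Channel-wise gradient split through a concatenation (assumptions mine).
import torch

torch.manual_seed(0)
c = 4
x0 = torch.randn(1, c, requires_grad=True)   # output of layer 0
x1 = torch.randn(1, c, requires_grad=True)   # output of layer 1
w2 = torch.randn(2 * c, c, requires_grad=True)

# layer 2 consumes the concatenation [x0, x1]
cat = torch.cat([x0, x1], dim=1)
out = cat @ w2
out.sum().backward()

# the single gradient computed at layer 2 splits by channels into the pieces
# propagated back to layer 0 and layer 1 respectively
print("piece propagated to layer 0 (g_0):", x0.grad)
print("piece propagated to layer 1 (g_1):", x1.grad)
```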

@Pcyslist

@WongKinYiu Thanks very much for your patient reply. Good luck to you.


@NeoZng

NeoZng commented Feb 6, 2022

@WongKinYiu Dear author, I still don't understand why we should use $g_{0}$ to update $w_1$. In your description, $g_{0}$ equals
[image: expression]
but if we are going to update $w_1$, we should use
[image: expression]
in order to calculate
[image: expression]
using the chain rule.
That is why I think there is no connection between
[image: expression]
and
[image: expression]
Only when the variable $x_k$ changes can it affect the weights in previous layers, while the weights in a later layer do nothing to the previous ones.

And another question:
[image: quote from the paper]
What do you mean by "truncated"?

@JianjianSha

After reading the author's interpretation above, why do I think the gradient
[image: expression]
should be the one propagated to the $i$-th layer?
