tests: add gradient tests for all backends #932

Merged: 5 commits into ggerganov:master on Sep 3, 2024

Conversation

JohannesGaessler (Collaborator)

My goal with this PR is to add tests that check the gradients from the backwards pass in a generic way for all backends. My current state is that I can successfully calculate and check gradients for CUDA as long as all ops are FP32 and supported. This PR is nowhere near usable, but I'm opening it anyway to discuss the best way to implement the tests. Right now I'm doing the tests via modification of tests/test-grad0 (and some hacks), but I think long-term it would make more sense to re-use the code in tests/test-backend-ops. From my perspective the easiest way to do this would be to extend the existing code constructs in tests/test-backend-ops with a new mode that checks gradients numerically. @slaren your feedback would be appreciated.

slaren (Collaborator) commented Aug 27, 2024

I am not very familiar with test-grad0 (or any of the training code), but I think this could be integrated very well into test-backend-ops. The eval mode could also be extended to optionally generate backwards graphs and compare the results with the CPU backend, though I don't know if that would add anything of value over just checking the gradients by computing them numerically.

JohannesGaessler (Collaborator, Author)

I pushed an extension to test-backend-ops that adds a new mode for numerically checking the gradients calculated by the backwards pass. Some comments:

  • I am using the mean absolute asymmetry to judge whether the results from the backwards pass are correct. The asymmetry of $a$ and $b$ is defined as $(a - b) / (a + b)$; it is similar to the mean absolute relative error but more stable when the reference value goes towards zero. The threshold for most tests is $10^{-4}$, or $10^{-3}$ if the gradient calculation is more difficult (a sketch of this check follows the list).
  • For some of the more difficult operations, 5 instead of 3 points are used to estimate the gradient numerically.
  • I reduced the input sizes for most operations where I got the impression that they were selected arbitrarily. The primary reason is simply that the numerical calculation of gradients is very slow; I am also skipping test cases where more than 10000 gradients would be checked. My secondary reason for reducing the input sizes is that for non-scalar outputs the results are combined by summing them up, and the fewer input values feed into this sum, the more precisely the numerical gradients can be calculated.
  • I added the missing GGML_ASSERTs to unsupported ops so that they fail once the backwards pass is invoked. I did this for consistency, but I think it's kind of a bad solution. Long-term it would be good to at least have a CPU implementation of the backwards pass for all ops so that the compute graph can always be safely constructed and simply checked with ggml_backend_supports_op (right now there is additional logic in the test code to prevent crashes from the failed asserts).
  • I added CUDA support for SUM, ADD1, and NEG.
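
To make the check described above concrete, here is a minimal, self-contained sketch (illustrative only, not the PR's actual code; the example values and function names are made up):

#include <cmath>
#include <cstdio>
#include <vector>

// asymmetry of a and b: (a - b) / (a + b); more stable than the relative
// error when the reference value approaches zero
static double mean_absolute_asymmetry(const std::vector<double> & a, const std::vector<double> & b) {
    double sum = 0.0;
    for (size_t i = 0; i < a.size(); ++i) {
        const double denom = a[i] + b[i];
        sum += denom == 0.0 ? 0.0 : std::fabs((a[i] - b[i]) / denom);
    }
    return sum / a.size();
}

// 3-point (central difference) estimate of df/dx; a 5-point stencil can be
// used instead for the more difficult ops
template <typename F>
static double numeric_grad(F f, double x, double eps = 1e-3) {
    return (f(x + eps) - f(x - eps)) / (2.0*eps);
}

int main() {
    // grad_backward: gradients from the backwards pass
    // grad_numeric:  gradients estimated numerically, e.g. via numeric_grad
    const std::vector<double> grad_backward = {0.50000, -1.25013, 2.00002};
    const std::vector<double> grad_numeric  = {0.50004, -1.25000, 1.99995};

    const double err       = mean_absolute_asymmetry(grad_backward, grad_numeric);
    const double threshold = 1e-4; // 1e-3 for ops where the gradient is harder to estimate

    printf("mean absolute asymmetry: %g -> %s\n", err, err < threshold ? "OK" : "FAIL");
    printf("d/dx sin(x) at x=1: %f (expected %f)\n",
           numeric_grad([](double x) { return std::sin(x); }, 1.0), std::cos(1.0));
    return 0;
}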

tests/test-backend-ops.cpp (review thread, outdated, resolved)

ggml_build_forward_expand(gf, out);
ggml_graph_cpy(gf, gb);
ggml_build_backward_expand(ctx, gf, gb, true); // TODO why can the results sometimes be wrong with keep == false?
ggerganov (Owner)

Unless we are computing second-order gradients (i.e. backward of backward), keep should be set to false (though it should work with keep == true as well). In this case, if the results are sometimes wrong with keep == false, it likely means there is a bug somewhere.

The computation of backward graphs assumes that the initial grad tensors are initialized with zeros. This was the purpose of ggml_graph_reset(), though today this function only makes sense for the CPU backend.
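
(For illustration, a minimal sketch of the setup being described, assuming the standard ggml graph API; ctx and out would come from the surrounding test code:)

#include "ggml.h"

// sketch: build the forward graph, mirror it into a backward graph with
// keep == false, and zero the grads before computing
static void build_grad_graphs(struct ggml_context * ctx, struct ggml_tensor * out,
                              struct ggml_cgraph ** p_gf, struct ggml_cgraph ** p_gb) {
    struct ggml_cgraph * gf = ggml_new_graph_custom(ctx, GGML_DEFAULT_GRAPH_SIZE, /*grads =*/ true);
    struct ggml_cgraph * gb = ggml_new_graph_custom(ctx, GGML_DEFAULT_GRAPH_SIZE, /*grads =*/ true);

    ggml_build_forward_expand(gf, out);
    ggml_graph_cpy(gf, gb);

    // keep == false is sufficient unless second-order gradients
    // (backward of backward) are needed
    ggml_build_backward_expand(ctx, gf, gb, /*keep =*/ false);

    // the backward graph assumes the grad tensors start out zeroed;
    // ggml_graph_reset() does that (currently only meaningful for the CPU backend)
    ggml_graph_reset(gf);

    *p_gf = gf;
    *p_gb = gb;
}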

How do I reproduce this issue?

JohannesGaessler (Collaborator, Author)

On the last commit I also no longer have issues with keep == false. The problem I was seeing was likely caused by #943 and I misinterpreted the cause.

ggerganov (Owner) left a comment

Some of the gradient tests are failing occasionally (SIN, COS, ROPE) - I guess the error threshold might have to be adjusted, but we can do that as we go.

Might want to wait for @slaren's review as well before merging.

JohannesGaessler (Collaborator, Author)

The failure rate of SIN and COS was ~12%. With the adjusted parameters, the failure rate for a threshold of $10^{-4}$ was ~4%; with the increased threshold of $10^{-3}$ it is ~0.1%.

slaren (Collaborator) left a comment

There are a lot more cases like the last two. If every created tensor needs to be a param, do it automatically in the ggml_new_tensor functions in the parent class. If the reason some are disabled is that ggml does not support the backwards pass, extract that into a function that makes this clear, and disable it by default.
It is very important that test-backend-ops remains very simple to add new test cases to, and nobody is going to understand whether or not they need to add calls to ggml_set_param for their test.
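
A rough sketch of how such a default could look (hypothetical helper on the test_case base class, not slaren's exact proposal; the names are illustrative):

#include "ggml.h"
#include <array>
#include <cstdint>

struct test_case {
    // ... existing members of the test_case base class ...

    // hypothetical wrapper around ggml_new_tensor: create a tensor and mark it
    // as a param by default, so individual tests don't need explicit
    // ggml_set_param calls
    ggml_tensor * new_tensor(ggml_context * ctx, ggml_type type,
                             const std::array<int64_t, 4> & ne, bool param = true) {
        ggml_tensor * t = ggml_new_tensor(ctx, type, 4, ne.data());
        if (param && type == GGML_TYPE_F32) {
            ggml_set_param(ctx, t); // gradients are only checked for FP32 tensors
        }
        return t;
    }
};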

tests/test-backend-ops.cpp (two review threads, outdated, resolved)
Comment on lines 1026 to 1028
if (ggml_is_matrix(in) && ggml_is_vector(rows)) {
ggml_set_param(ctx, in);
}
slaren (Collaborator)

Why?

Comment on lines 1274 to 1277
if (op == ggml_add || ggml_are_same_shape(a, b)) {
ggml_set_param(ctx, a);
ggml_set_param(ctx, b);
}
slaren (Collaborator)

Why?

JohannesGaessler (Collaborator, Author)

> There are a lot more cases like the last two. If every created tensor needs to be a param, do it automatically in the ggml_new_tensor functions in the parent class. If the reason some are disabled is that ggml does not support the backwards pass, extract that into a function that makes this clear, and disable it by default.

There are some tricky edge cases like CROSS_ENTROPY_LOSS where the backwards pass calculates gradients only for the logits but not for the labels, even though a variation of the labels leads to a change in the output. Would something like this be okay (a rough sketch follows the list):

  • Add a function like should_be_param(const struct ggml_tensor * t) to determine which tensors should be made parameters (and that defaults to false).
  • Extend ggml_new_tensor with an optional argument for the tensor name that can be used to resolve edge cases in should_be_param when it's not possible to do so via e.g. the type.
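
For illustration, one possible shape of that helper (hypothetical, not merged code; the "labels" name is made up to show how a tensor name could resolve the CROSS_ENTROPY_LOSS edge case):

#include "ggml.h"
#include <cstring>

// hypothetical helper: decide per tensor whether it should be made a param,
// using the tensor name to resolve edge cases
static bool should_be_param(const struct ggml_tensor * t) {
    if (t->type != GGML_TYPE_F32) {
        return false; // gradients are only checked for FP32 tensors
    }
    // edge case resolved via the name: the backwards pass of CROSS_ENTROPY_LOSS
    // produces gradients only for the logits, not the labels
    if (strcmp(t->name, "labels") == 0) {
        return false;
    }
    return true;
}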

slaren (Collaborator) commented Sep 2, 2024

I think it would be enough to add some comments explaining the cases where ggml_set_param is omitted, and to remove the commented-out ones from the operators that do not support a backwards pass.

tests/test-backend-ops.cpp (two review threads, resolved; one outdated)
JohannesGaessler (Collaborator, Author)

I added a small bit of documentation at the top of the file to explain the intended usage, as well as a heavily commented example struct for a new op that covers the bare minimum needed for a new test (see the sketch below). Going through the code I noticed that it's possible to avoid explicit logic for ggml_set_param for at least some of the ops, so I did that where possible.
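
A sketch of what such a minimal test struct looks like inside tests/test-backend-ops.cpp (based on the file's existing test_case pattern and VARS_TO_STR helpers; the actual example struct added in the PR may differ in its details):

// minimal test for ggml_sqr in the style of the other test structs in the file
struct test_sqr_example : public test_case {
    const ggml_type type;            // tensor type for the op
    const std::array<int64_t, 4> ne; // input tensor shape

    // printed when a test fails so the failing configuration can be reproduced
    std::string vars() override {
        return VARS_TO_STR2(type, ne);
    }

    test_sqr_example(ggml_type type = GGML_TYPE_F32,
                     std::array<int64_t, 4> ne = {10, 5, 4, 3})
        : type(type), ne(ne) {}

    // builds the graph for the op; tensors marked with ggml_set_param
    // get their gradients checked in the new grad mode
    ggml_tensor * build_graph(ggml_context * ctx) override {
        ggml_tensor * a = ggml_new_tensor(ctx, type, 4, ne.data());
        ggml_set_param(ctx, a);
        return ggml_sqr(ctx, a);
    }
};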

JohannesGaessler merged commit d02b23d into ggerganov:master on Sep 3, 2024. 4 checks passed.