Made critical changes to small_gemm #568

Open
wants to merge 5 commits into
base: master

Commits on Oct 28, 2021

  1. Made some critical changes to small_gemm kernels

    Details:
    - For GEMM, whenever beta is zero we must compute
      C = alpha * (A * B) instead of C = beta * C + alpha * (A * B).
      Added checks on the value of beta at several levels inside the
      small_gemm kernels to decide whether to scale C by beta.
    - Modified small_gemm kernels to use BLIS-specific functions to retrieve
      the fields of objects.
    - Now calling bli_gemm_check before entering bli_gemm_small to allow an
      early return on invalid inputs.
    - For corner cases inside small_gemm kernels, a buffer called f_temp
      is used to load and store data to and from registers.
      This buffer is now populated with zeroes before use.
    - In bli_gemm_front, the datatype of 'status' did not match the return
      type of bli_gemm_small.
      Corrected the datatype of the variable 'status' inside bli_gemm_front
      to err_t.
    
    Change-Id: I8b52ad55008f028d6c8b7e0d20f746a869d9daea
    Signed-off-by: Meghana Vankadari <[email protected]>
    AMD-Internal: [CPUPL-689,SWLCSG-104]
    Meghana-vankadari committed Oct 28, 2021
    ed7780d
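The beta-zero rule from the first bullet can be illustrated with a plain reference loop. This is a minimal sketch, not the small_gemm kernel itself: the function name `ref_dgemm_nn`, the column-major layout, and the scalar loops are assumptions made for illustration. The point it shows is that when beta is zero, C is overwritten without being read, so stale values in C (including NaN or Inf) cannot leak into the result, which matches the usual BLAS convention.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference GEMM (not BLIS code), column-major storage:
 *   C = beta * C + alpha * (A * B)
 * A is m x k (leading dimension lda), B is k x n (ldb), C is m x n (ldc).
 * When beta == 0.0, C is only written, never read. */
static void ref_dgemm_nn(size_t m, size_t n, size_t k,
                         double alpha, const double *a, size_t lda,
                         const double *b, size_t ldb,
                         double beta, double *c, size_t ldc)
{
    assert(lda >= m && ldb >= k && ldc >= m);

    for (size_t j = 0; j < n; ++j) {
        for (size_t i = 0; i < m; ++i) {
            /* Accumulate the (i, j) entry of A * B. */
            double acc = 0.0;
            for (size_t p = 0; p < k; ++p)
                acc += a[i + p * lda] * b[p + j * ldb];

            if (beta == 0.0)
                c[i + j * ldc] = alpha * acc;   /* skip scaling C */
            else
                c[i + j * ldc] = beta * c[i + j * ldc] + alpha * acc;
        }
    }
}
```

Without the explicit branch, `0.0 * c[...]` would still propagate a NaN already present in C, which is why the check has to be made rather than relying on multiplication by zero.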
  2. Implemented 16x3 based gemm kernel for the case where A is transposed

    Details:
    - This implementation transposes A while packing each 16xk panel of
      the A buffer, then passes the packed panel to the 16x3-nn kernel.
    - The same implementation works for the case where B is transposed.
    
    AMD-Internal: [CPUPL-1376]
    Change-Id: I81f74deb609926598f62c30f5bd6fc80fb1b9a17
    Meghana-vankadari committed Oct 28, 2021
    1c6d455
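The transpose-while-packing idea described in this commit can be sketched as follows. This is an illustration under stated assumptions, not the code from the PR: the function name `pack_a_transposed_16xk`, the column-major layout, and the packed leading dimension of 16 are all hypothetical. A is assumed to be stored as a k x m matrix, so that op(A) = A^T is m x k, and one 16 x k panel of op(A) is copied into a contiguous buffer that a non-transposed (nn) kernel could then consume.

```c
#include <assert.h>
#include <stddef.h>

enum { PANEL_ROWS = 16 };

/* Hypothetical packing routine (not BLIS code).
 * a_stored : k x m matrix, column-major, leading dimension lda >= k,
 *            so op(A)(i, p) = a_stored[p + i * lda].
 * i0       : first row of op(A) covered by this panel.
 * packed   : output, 16 x k, column-major with leading dimension 16. */
static void pack_a_transposed_16xk(size_t k,
                                   const double *a_stored, size_t lda,
                                   size_t i0, double *packed)
{
    assert(a_stored != NULL && packed != NULL && lda >= k);

    for (size_t i = 0; i < PANEL_ROWS; ++i)
        for (size_t p = 0; p < k; ++p)
            /* The transpose happens in the index swap on the right. */
            packed[i + p * PANEL_ROWS] = a_stored[p + (i0 + i) * lda];
}
```

Because the packed panel has the same layout as a panel packed from a non-transposed A, the compute kernel itself does not need a separate transposed variant.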
  3. Disabled calling of bli_dgemm_small from gemm_front

    Details:
    - The decision logic that chooses small_gemm has been moved to the
      BLAS interface.
    - All calls that gemm_front previously routed to small_gemm now go to
      the native implementation.
    
    AMD-Internal: [CPUPL-1376]
    Change-Id: I6490f67113e9f7c272269f441c86f2a0b3c89a53
    Meghana-vankadari committed Oct 28, 2021
    c597fa6
  4. Fixed blastest failure for haswell configuration

    Details:
    - Placed the optimized BLAS DGEMM and ZGEMM definitions under
      BLIS_CONFIG_EPYC, since they use the gemm small kernels, which are
      defined only for zen-family configurations.
    - Added code to query and set cntx in the gemv and trsv framework
      before cntx is used to look up any function pointers, to avoid
      dereferencing a NULL pointer.
    
    AMD-Internal: [CPUPL-1562]
    Change-Id: I977d028ec4ddb57dcdc70e443e7708f36c01cca9
    Meghana-vankadari committed Oct 28, 2021
    ac2a50f
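The NULL-context guard from the second bullet follows a common pattern: if the caller passes no context, query a default one before reading any function pointer from it. The sketch below uses mock types only; `mock_cntx_t`, `mock_query_default_cntx`, and `mock_gemv_like` are hypothetical stand-ins and not BLIS APIs (in BLIS itself the default context is, as far as I know, obtained with `bli_gks_query_cntx()`).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for a context holding kernel function pointers. */
typedef void (*axpy_fn)(size_t n, double alpha, const double *x, double *y);
typedef struct { axpy_fn axpy; } mock_cntx_t;

/* y := y + alpha * x */
static void mock_axpy(size_t n, double alpha, const double *x, double *y)
{
    for (size_t i = 0; i < n; ++i)
        y[i] += alpha * x[i];
}

/* Returns a process-wide default context (mock of a global query). */
static const mock_cntx_t *mock_query_default_cntx(void)
{
    static const mock_cntx_t default_cntx = { mock_axpy };
    return &default_cntx;
}

/* A framework-level routine that tolerates cntx == NULL. */
static void mock_gemv_like(const mock_cntx_t *cntx, size_t n,
                           double alpha, const double *x, double *y)
{
    /* Query and set the context BEFORE touching any of its fields. */
    if (cntx == NULL)
        cntx = mock_query_default_cntx();

    assert(cntx->axpy != NULL);
    cntx->axpy(n, alpha, x, y);
}
```

Calling `mock_gemv_like(NULL, ...)` is then safe, whereas reading `cntx->axpy` before the guard would dereference a NULL pointer.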

Commits on Nov 26, 2021

  1. faf5540