Armv8-A Row-major Kernel Improvements #698

Open
wants to merge 5 commits into
base: master

Commits on Nov 20, 2022

  1. Arm NEON Improve C-Prefetching for DGEMM

    - Only DGEMM at this moment.
    - Prefetch whole lines.
    - Scatter prefetching insts.
    xrq-phys committed Nov 20, 2022
    e8068f6
  2. Arm NEON Init. Opt. For DGEMM

    Instead of clearing the C rows, use FMUL for the first k iteration
    so that instructions are saved.
    xrq-phys committed Nov 20, 2022
    ad73717
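
The "first-k FMUL" idea from commit ad73717 can be illustrated in portable C. This is only a sketch under assumptions, not the PR's NEON assembly: the tile size `MR` x `NR`, the packed layouts of `a` and `b`, and all function names are hypothetical. The baseline clears the accumulators and then accumulates over every k step; the variant lets the first k step write a plain product, so the clear is no longer needed.

```c
#include <assert.h>
#include <stddef.h>

enum { MR = 6, NR = 8 }; /* hypothetical register-tile size, for illustration */

/* Baseline: clear the accumulators, then accumulate over all k steps. */
static void ukr_clear_then_fma(size_t k, const double *a, const double *b,
                               double acc[MR][NR])
{
    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < MR; ++i)
        for (size_t j = 0; j < NR; ++j)
            acc[i][j] = 0.0;
    for (size_t p = 0; p < k; ++p)
        for (size_t i = 0; i < MR; ++i)
            for (size_t j = 0; j < NR; ++j)
                acc[i][j] += a[p * MR + i] * b[p * NR + j];
}

/* Variant: the first k step writes a plain product, so no clear is needed. */
static void ukr_first_k_mul(size_t k, const double *a, const double *b,
                            double acc[MR][NR])
{
    assert(a != NULL && b != NULL);
    if (k == 0) { /* nothing to multiply: fall back to an explicit clear */
        for (size_t i = 0; i < MR; ++i)
            for (size_t j = 0; j < NR; ++j)
                acc[i][j] = 0.0;
        return;
    }
    for (size_t i = 0; i < MR; ++i)          /* p == 0: multiply only */
        for (size_t j = 0; j < NR; ++j)
            acc[i][j] = a[i] * b[j];
    for (size_t p = 1; p < k; ++p)           /* p >= 1: multiply-accumulate */
        for (size_t i = 0; i < MR; ++i)
            for (size_t j = 0; j < NR; ++j)
                acc[i][j] += a[p * MR + i] * b[p * NR + j];
}

/* Self-check on small integer-valued inputs, where both variants are exact. */
static int first_k_variants_agree(void)
{
    enum { K = 5 };
    double a[K * MR], b[K * NR], c0[MR][NR], c1[MR][NR];
    for (size_t i = 0; i < K * MR; ++i) a[i] = (double)(i % 7) - 3.0;
    for (size_t i = 0; i < K * NR; ++i) b[i] = (double)(i % 5) + 1.0;
    ukr_clear_then_fma(K, a, b, c0);
    ukr_first_k_mul(K, a, b, c1);
    for (size_t i = 0; i < MR; ++i)
        for (size_t j = 0; j < NR; ++j)
            if (c0[i][j] != c1[i][j]) return 0;
    return 1;
}
```

Calling `first_k_variants_agree()` compares the two variants on the same inputs. The variant removes one zeroing pass over the accumulators per micro-kernel call, at the cost of a special-cased first iteration.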

Commits on Nov 21, 2022

  1. Arm NEON DGEMM Change Regs IO

    Instead of loading from the stack, pass the registers in directly.
    Arm64 has 30 regs for use. This may or may not speed things up a tiny bit.
    xrq-phys committed Nov 21, 2022
    0ddde0f
  2. Fix Init. Bug

    Forgot to commit the header for ad73717.
    xrq-phys committed Nov 21, 2022
    Configuration menu
    Copy the full SHA
    04b5b71 View commit details
    Browse the repository at this point in the history

Commits on Dec 16, 2022

  1. Armv8-A Port Row-maj DGEMM Uker Changes to SGEMM

    - Init k-loop clears C.
    - Scattered C preloading.
    xrq-phys committed Dec 16, 2022
    Configuration menu
    Copy the full SHA
    47c63c1 View commit details
    Browse the repository at this point in the history
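
The "scattered" C prefetching mentioned in the commits above can be sketched in portable C as well. This is an illustration under assumptions rather than the PR's assembly: it uses the GCC/Clang builtin `__builtin_prefetch`, assumes a row-major C tile with row stride `rs_c` in which one row of `NR` doubles fits a 64-byte cache line, and uses hypothetical names throughout. Rather than issuing every prefetch up front, one row of C is prefetched in each of the first `MR` iterations of the k loop.

```c
#include <assert.h>
#include <stddef.h>

enum { MR = 6, NR = 8 }; /* hypothetical tile: one row of C = 8 doubles = 64 bytes */

/* C += A * B on one MR x NR tile, with the C prefetches spread over the k loop. */
static void ukr_scattered_prefetch(size_t k, const double *a, const double *b,
                                   double *c, size_t rs_c)
{
    assert(a != NULL && b != NULL && c != NULL);
    double acc[MR][NR] = {{0.0}};

    for (size_t p = 0; p < k; ++p) {
        /* One prefetch-for-write per early iteration, one row of C each.
           If k < MR, the remaining rows are simply not prefetched. */
        if (p < MR)
            __builtin_prefetch(&c[p * rs_c], 1, 3);

        for (size_t i = 0; i < MR; ++i)
            for (size_t j = 0; j < NR; ++j)
                acc[i][j] += a[p * MR + i] * b[p * NR + j];
    }

    /* Write-back: by now the C rows have had time to arrive in cache. */
    for (size_t i = 0; i < MR; ++i)
        for (size_t j = 0; j < NR; ++j)
            c[i * rs_c + j] += acc[i][j];
}

/* Self-check: A = 1, B = 2, C = 1, k = 8 gives C = 1 + 8 * 2 = 17 everywhere. */
static int scattered_prefetch_demo(void)
{
    enum { K = 8 };
    double a[K * MR], b[K * NR], c[MR * NR];
    for (size_t i = 0; i < K * MR; ++i) a[i] = 1.0;
    for (size_t i = 0; i < K * NR; ++i) b[i] = 2.0;
    for (size_t i = 0; i < MR * NR; ++i) c[i] = 1.0;
    ukr_scattered_prefetch(K, a, b, c, NR);
    for (size_t i = 0; i < MR * NR; ++i)
        if (c[i] != 17.0) return 0;
    return 1;
}
```

Prefetching does not change the numerical result, so the self-check only confirms that the kernel still computes `C += A * B`. Any performance effect depends on the target core and would have to be measured.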