Faster Cuda Decoder #4811

Open. Wants to merge 8 commits into base: master

Commits on Dec 13, 2022

  1. [src] Several cuda decoder fixes.

    These affect both correctness and performance.
    
    - Add missing cudaStreamSynchronize()
    
    This was not caught before because we were running at smaller batch
    sizes, which allowed the init decoding kernels to run in parallel with
    the nnet3 kernels, and thus have completed at this point. At large
    enough batch sizes, no such parallelization is possible (all blocks of
    the GPU are occupied).
    
    - Faster host paged to pinned memory copy via multithreading.
    
    - Disable timing in cuda events for increased performance.
    
    Before (on A100 PCIe):
    
    Overall:  Aggregate Total Time: 26.6364 Total Audio: 194525
    RealTimeX: 7302.96
    
    After (on A100 PCIe):
    
    Overall:  Aggregate Total Time: 26.0323 Total Audio: 194525
    RealTimeX: 7472.43
    
    - In the online decoder, create writers before initializing CUDA.
    
    CUDA initialization creates a lot of virtual memory (for unified
    virtual memory, if I understand correctly) that can cause errors if
    memory oversubscription is not set high enough when using the fork()
    syscall.
    
    The issue is further described here:
    
    https://groups.google.com/g/kaldi-help/c/3hc0xsRpqqY?pli=1
    
    - Add cudaProfilerStart/Stop to online binary
    
    - Name H2H copy threads in NSight Systems.
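
The multithreaded paged-to-pinned copy listed above can be sketched roughly as follows. This is a minimal host-only illustration, not the code from this commit: `ParallelHostCopy` and its parameters are hypothetical names, and the destination is treated as ordinary host memory, whereas in the decoder it would presumably be a pinned (page-locked) buffer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <thread>
#include <vector>

// Hypothetical helper (an assumption, not Kaldi's actual function): copy
// `num_bytes` from `src` to `dst` using up to `num_threads` worker threads,
// each handling one contiguous slice of the buffer.
void ParallelHostCopy(void *dst, const void *src, std::size_t num_bytes,
                      unsigned num_threads) {
  assert(num_threads > 0);
  // Round up so that the slices together cover every byte.
  const std::size_t slice = (num_bytes + num_threads - 1) / num_threads;
  std::vector<std::thread> workers;
  for (unsigned t = 0; t < num_threads; ++t) {
    const std::size_t begin = std::min(num_bytes, t * slice);
    const std::size_t end = std::min(num_bytes, begin + slice);
    if (begin == end) break;  // nothing left to copy
    workers.emplace_back([=] {
      std::memcpy(static_cast<char *>(dst) + begin,
                  static_cast<const char *>(src) + begin, end - begin);
    });
  }
  // Wait for every slice before the caller touches `dst`.
  for (std::thread &w : workers) w.join();
}
```

Spawning threads on every call is only reasonable for large buffers; a long-lived set of copy threads would avoid the startup cost and would also give the profiler stable threads to name.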
    galv committed Dec 13, 2022
    31d61c0
  2. 2c247a1
  3. Decrease latency by doing partial hypothesis work on host at the same time as cuda calls on the device.

    galv committed Dec 13, 2022
    abbaa6e
  4. New Thread pool implementation.

    galv committed Dec 13, 2022
    0d31458
  5. Add RTFx calculation to online decoder.

    Note that the maximum RTFx in online mode is necessarily the value of --num-parallel-streaming-channels.
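
For reference, a minimal sketch of the RTFx computation, assuming RTFx is defined as seconds of audio decoded per second of wall-clock time. That definition is consistent with the "Total Audio" and "Aggregate Total Time" figures reported in the first commit. `ComputeRtfx` is a hypothetical name, not a function from this PR.

```cpp
#include <cassert>

// Assumed definition: RTFx = audio duration / wall-clock decoding time,
// with both arguments in seconds. Higher is faster.
double ComputeRtfx(double total_audio_seconds, double total_time_seconds) {
  assert(total_time_seconds > 0.0);
  return total_audio_seconds / total_time_seconds;
}
```

With the numbers from the first commit, `ComputeRtfx(194525, 26.6364)` comes out at roughly 7303 and `ComputeRtfx(194525, 26.0323)` at roughly 7472, in line with the reported RealTimeX values up to rounding of the printed inputs.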
    galv committed Dec 13, 2022
    dc95de1
  6. Improve online performance of cuda decoder.

    Use a thread pool that sleeps when there is no data to retrieve.
    
    Sort data at the right point to improve cache performance.
    
    Remove spin locks implemented with atomics. These cause slowdowns
    compared to condition variables, in particular because we cannot sleep
    accurately for 200 microseconds or less. (A 200 microsecond sleep turns
    out to take 250 microseconds.) These delays cause unnecessary slowdown.
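
The idea of sleeping on a condition variable instead of polling an atomic flag with short sleeps can be sketched as below. This is a generic illustration under stated assumptions, not the thread pool added in this PR: `SleepingThreadPool`, `Push`, and `WorkerLoop` are hypothetical names.

```cpp
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical minimal pool (an assumption, not the PR's implementation).
// Idle workers block in cv_.wait() and are woken by the OS when work
// arrives, so no timed sleep (and no sleep overshoot) is involved.
class SleepingThreadPool {
 public:
  explicit SleepingThreadPool(unsigned num_threads) {
    for (unsigned i = 0; i < num_threads; ++i)
      workers_.emplace_back([this] { WorkerLoop(); });
  }

  // Drains any queued tasks, then joins all workers.
  ~SleepingThreadPool() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      stopping_ = true;
    }
    cv_.notify_all();
    for (std::thread &w : workers_) w.join();
  }

  void Push(std::function<void()> task) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      tasks_.push_back(std::move(task));
    }
    cv_.notify_one();  // wake exactly one sleeping worker
  }

 private:
  void WorkerLoop() {
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lock(mutex_);
        // Sleep until there is work or the pool is shutting down.
        cv_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
        if (tasks_.empty()) return;  // stopping and queue drained
        task = std::move(tasks_.front());
        tasks_.pop_front();
      }
      task();  // run outside the lock so other workers can proceed
    }
  }

  std::mutex mutex_;
  std::condition_variable cv_;
  std::deque<std::function<void()>> tasks_;
  bool stopping_ = false;
  std::vector<std::thread> workers_;
};
```

The predicate form of `wait` guards against spurious wakeups, and running each task outside the lock keeps the critical section limited to queue manipulation.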
    galv committed Dec 13, 2022
    501b9d9
  7. [misc] Install python2.7

    This is to fix a CI error.
    
    It appears that this is from using "ubuntu-latest" in the CI
    workflow. It got upgraded to ubuntu 22.04 automatically, and this
    doesn't have python2.7 by default.
    galv committed Dec 13, 2022
    6d94122
  8. Make codefactor changes.

    galv committed Dec 13, 2022
    87d577f