[WIP] add more example for permute/scatter-gather/moe #1471
Open
carlushuang wants to merge 22 commits into develop from ck_tile/moe
+8,194
−99
Changes from 6 commits
Commits
7971bb5 add test for scatter/gather (carlushuang)
1b4b640 add permute example (carlushuang)
0eb75e2 Merge remote-tracking branch 'origin/develop' into ck_tile/moe (carlushuang)
f19355a fix typo (carlushuang)
b2e95e2 fix typo 2 (carlushuang)
22ab193 add ck_tile for matrix_core swizzle kernel (carlushuang)
bf21466 add b_nr_kr_waveflatten pattern (carlushuang)
1ba8a08 update tmp work (carlushuang)
33ceea6 merge to convert address (carlushuang)
199f7f7 modify moe (carlushuang)
54d3e2f Merge remote-tracking branch 'origin/develop' into ck_tile/moe (carlushuang)
4412a07 remove extra files (carlushuang)
00a0a07 add topk reference and test (carlushuang)
9a232a2 add another test (carlushuang)
9b24c14 Merge remote-tracking branch 'origin/develop' into ck_tile/moe (carlushuang)
ee956e8 add one more test case (carlushuang)
bf8e6de support argmax reduce in test (carlushuang)
840cba8 Merge remote-tracking branch 'origin/develop' into ck_tile/moe (carlushuang)
667047b topk-softmax (carlushuang)
a24c569 Merge remote-tracking branch 'origin/develop' into ck_tile/moe (carlushuang)
a848e00 support topk-softmax up to 64 experts (carlushuang)
41659ab Merge remote-tracking branch 'origin/develop' into ck_tile/moe (carlushuang)
@@ -0,0 +1,75 @@
# generate a list of kernels, but not actually emit files at config stage
execute_process(
  COMMAND ${Python3_EXECUTABLE} ${CMAKE_CURRENT_LIST_DIR}/generate.py
  --api fwd,fwd_splitkv --list_blobs ${CMAKE_CURRENT_BINARY_DIR}/fwd_blob_list.txt
)

execute_process(
  COMMAND ${Python3_EXECUTABLE} ${CMAKE_CURRENT_LIST_DIR}/generate.py
  --api bwd --list_blobs ${CMAKE_CURRENT_BINARY_DIR}/bwd_blob_list.txt
)

# NOTE: for cmake, the FMHA_FWD_GEN_BLOBS/FMHA_BWD_GEN_BLOBS files must be in the same directory
#       as current cmake list, otherwise will not figure out the dependency properly
file(STRINGS ${CMAKE_CURRENT_BINARY_DIR}/fwd_blob_list.txt FMHA_FWD_GEN_BLOBS)
file(STRINGS ${CMAKE_CURRENT_BINARY_DIR}/bwd_blob_list.txt FMHA_BWD_GEN_BLOBS)

add_custom_command(
  OUTPUT ${FMHA_FWD_GEN_BLOBS}
  COMMAND ${Python3_EXECUTABLE} ${CMAKE_CURRENT_LIST_DIR}/generate.py
  --api fwd,fwd_splitkv --output_dir ${CMAKE_CURRENT_BINARY_DIR}
)

add_custom_command(
  OUTPUT ${FMHA_BWD_GEN_BLOBS}
  COMMAND ${Python3_EXECUTABLE} ${CMAKE_CURRENT_LIST_DIR}/generate.py
  --api bwd --output_dir ${CMAKE_CURRENT_BINARY_DIR}
)

set(EXAMPLE_FMHA_FWD "tile_example_fmha_fwd")
# not using add_example_executable() to add this target, since we don't want this to have
# to be included in "make all/install/check"
message("adding example ${EXAMPLE_FMHA_FWD}")
add_executable(${EXAMPLE_FMHA_FWD} EXCLUDE_FROM_ALL fmha_fwd.cpp)
target_include_directories(${EXAMPLE_FMHA_FWD} PRIVATE ${CMAKE_CURRENT_LIST_DIR})
target_sources(${EXAMPLE_FMHA_FWD} PRIVATE ${FMHA_FWD_GEN_BLOBS})

set(EXAMPLE_FMHA_BWD "tile_example_fmha_bwd")
# not using add_example_executable() to add this target, since we don't want this to have
# to be included in "make all/install/check"
message("adding example ${EXAMPLE_FMHA_BWD}")
add_executable(${EXAMPLE_FMHA_BWD} EXCLUDE_FROM_ALL fmha_bwd.cpp)
target_include_directories(${EXAMPLE_FMHA_BWD} PRIVATE ${CMAKE_CURRENT_LIST_DIR})
target_sources(${EXAMPLE_FMHA_BWD} PRIVATE ${FMHA_BWD_GEN_BLOBS})

# NOTE: this is dangerous since will change the whole kernel to flush denormals
# WIP with compiler team for an exp2 intrinsic..., then remove this
if(NOT DEFINED FMHA_FWD_FAST_EXP2)
  set(FMHA_FWD_FAST_EXP2 true)
endif()

set(EXAMPLE_FMHA_FWD_COMPILE_OPTIONS)
set(EXAMPLE_FMHA_BWD_COMPILE_OPTIONS)

# NOTE: we turn off undefined-func-template to let source compile without explicit declare function specializations
# ... because they are auto-generated
if(FMHA_FWD_FAST_EXP2)
  list(APPEND EXAMPLE_FMHA_FWD_COMPILE_OPTIONS -Wno-undefined-func-template -DCK_TILE_FMHA_FWD_FAST_EXP2=1 -fgpu-flush-denormals-to-zero)
  list(APPEND EXAMPLE_FMHA_BWD_COMPILE_OPTIONS -Wno-undefined-func-template -DCK_TILE_FMHA_FWD_FAST_EXP2=1 -fgpu-flush-denormals-to-zero)
else()
  list(APPEND EXAMPLE_FMHA_FWD_COMPILE_OPTIONS -Wno-undefined-func-template -DCK_TILE_FMHA_FWD_FAST_EXP2=0)
  list(APPEND EXAMPLE_FMHA_BWD_COMPILE_OPTIONS -Wno-undefined-func-template -DCK_TILE_FMHA_FWD_FAST_EXP2=0)
endif()

# Allow comparing floating points directly in order to check sentinel values
list(APPEND EXAMPLE_FMHA_FWD_COMPILE_OPTIONS -Wno-float-equal)
list(APPEND EXAMPLE_FMHA_BWD_COMPILE_OPTIONS -Wno-float-equal)

target_compile_options(${EXAMPLE_FMHA_FWD} PRIVATE ${EXAMPLE_FMHA_FWD_COMPILE_OPTIONS})
target_compile_options(${EXAMPLE_FMHA_BWD} PRIVATE ${EXAMPLE_FMHA_BWD_COMPILE_OPTIONS})

# TODO: we have to turn off this global prop, otherwise the progress bar generated
# by cmake will print too many files, execvp: /bin/sh: Argument list too long
# however, this property may affect global
# TODO: consider codegen a makefile by us
set_property(GLOBAL PROPERTY RULE_MESSAGES OFF)
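The CMake file above calls generate.py in two modes: at configure time with --list_blobs, to learn which source files will exist, and at build time with --output_dir, to actually write them. generate.py itself is not shown in this diff, so the following is only a minimal sketch of a script that would satisfy that contract. The VARIANTS table, the file naming scheme, and the helper names blob_names, list_blobs, and write_blobs are all hypothetical; only the three command-line flags come from the CMake file.

```python
import argparse
from pathlib import Path

# Hypothetical kernel variants per API. A real generator would derive these
# from its own kernel tables, which are not part of this diff.
VARIANTS = {
    "fwd": ["fp16", "bf16"],
    "fwd_splitkv": ["fp16"],
    "bwd": ["fp16", "bf16"],
}


def blob_names(apis):
    """Return the deterministic list of file names generated for `apis`."""
    return [f"fmha_{api}_{dtype}.cpp" for api in apis for dtype in VARIANTS[api]]


def list_blobs(apis, list_file):
    """Configure-time mode: record the would-be paths, write no sources.

    Paths are placed next to the list file, matching the CMake note that the
    blobs must live in the same directory as the list.
    """
    out_dir = Path(list_file).resolve().parent
    paths = [str(out_dir / name) for name in blob_names(apis)]
    Path(list_file).write_text("\n".join(paths) + "\n")
    return paths


def write_blobs(apis, output_dir):
    """Build-time mode: emit one (placeholder) source file per kernel."""
    out_dir = Path(output_dir).resolve()
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name in blob_names(apis):
        path = out_dir / name
        path.write_text(f"// auto-generated placeholder for {name}\n")
        written.append(str(path))
    return written


def main(argv):
    parser = argparse.ArgumentParser()
    parser.add_argument("--api", default="fwd", help="comma-separated API list")
    parser.add_argument("--list_blobs", help="file to receive the blob list")
    parser.add_argument("--output_dir", help="directory to receive the blobs")
    args = parser.parse_args(argv)
    apis = args.api.split(",")
    if args.list_blobs:
        return list_blobs(apis, args.list_blobs)
    if args.output_dir:
        return write_blobs(apis, args.output_dir)
    return []
```

The important property is that both modes compute the file names from the same function, so the OUTPUT list that add_custom_command sees at configure time matches what the build step later writes. To use the sketch as a script, call main(sys.argv[1:]) from a `__main__` guard.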
I don't see generate.py in this folder. Do you use it elsewhere?