[C++][Compute] Pre-solve chunked indices before merging chunks in Sort kernels #44084

Open
pitrou opened this issue Sep 12, 2024 · 2 comments

Comments

@pitrou
Member

pitrou commented Sep 12, 2024

Describe the enhancement requested

In the chunked sort kernels (for ChunkedArray and Table), the most expensive step can be the recursive merge of sorted chunks after each individual chunk has been sorted.

Currently, this merge step resolves a chunked index every time it reads a value. Since the recursive merge has log2(k) levels and each level reads all n values, chunk resolution is performed O(n*log2(k)) times (where n is the input length and k is the number of chunks).
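To make the per-access cost concrete, here is a minimal C++ sketch of one merge step in a simplified model: each chunk is a `std::vector<int64_t>` of values, and `offsets` holds each chunk's logical start plus the total length. `Loc`, `CountingResolver` and `MergeWithResolution` are hypothetical names, not Arrow APIs (the real resolver also caches the last chunk it found, which this sketch omits). Every comparison resolves both logical indices through a binary search over `offsets`, and the counter records how often that happens.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for a resolved chunk location.
struct Loc {
  int64_t chunk_index;
  int64_t index_in_chunk;
};

// Hypothetical resolver: offsets[c] is the logical start of chunk c and the
// last entry is the total length, e.g. {0, 3, 5} for chunk lengths 3 and 2.
struct CountingResolver {
  std::vector<int64_t> offsets;
  mutable int64_t num_resolutions = 0;  // counts calls to Resolve()

  Loc Resolve(int64_t index) const {
    assert(!offsets.empty() && index >= 0 && index < offsets.back());
    ++num_resolutions;
    // Binary search over the chunk offsets: O(log2(k)) per call.
    auto it = std::upper_bound(offsets.begin(), offsets.end(), index);
    int64_t chunk = (it - offsets.begin()) - 1;
    return {chunk, index - offsets[chunk]};
  }
};

using Chunks = std::vector<std::vector<int64_t>>;

// Merge two already-sorted runs of logical indices, [lo, mid) and [mid, hi),
// resolving both indices on every comparison (the current behavior).
void MergeWithResolution(std::vector<int64_t>& indices, int64_t lo, int64_t mid,
                         int64_t hi, const Chunks& chunks,
                         const CountingResolver& resolver) {
  auto value_of = [&](int64_t logical) {
    Loc loc = resolver.Resolve(logical);
    return chunks[loc.chunk_index][loc.index_in_chunk];
  };
  std::inplace_merge(indices.begin() + lo, indices.begin() + mid,
                     indices.begin() + hi,
                     [&](int64_t a, int64_t b) { return value_of(a) < value_of(b); });
}
```

With k chunks, a full recursive merge repeats this over log2(k) levels, so the resolution count grows with both n and the number of levels.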

Instead, we could compute the resolved chunk indices once, right after sorting the individual chunks. The merge would then perform no chunk resolution at all, only direct accesses through ResolvedChunks.
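A minimal sketch of the proposed alternative, under the same simplified model where chunks are plain `std::vector<int64_t>`. `Loc`, `SortChunksResolved` and `MergeResolved` are hypothetical names standing in for ResolvedChunks and the real kernel code. Each chunk is sorted into locations whose chunk number is known by construction, so no resolution is needed; the runs are then merged pairwise over log2(k) levels, reading values directly through each location.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for a resolved chunk location.
struct Loc {
  int64_t chunk_index;
  int64_t index_in_chunk;
};

using Chunks = std::vector<std::vector<int64_t>>;

// Step 1: sort each chunk independently, emitting pre-resolved locations.
std::vector<std::vector<Loc>> SortChunksResolved(const Chunks& chunks) {
  std::vector<std::vector<Loc>> runs;
  for (int64_t c = 0; c < static_cast<int64_t>(chunks.size()); ++c) {
    std::vector<Loc> run(chunks[c].size());
    for (int64_t i = 0; i < static_cast<int64_t>(run.size()); ++i) run[i] = {c, i};
    std::stable_sort(run.begin(), run.end(), [&](const Loc& a, const Loc& b) {
      return chunks[c][a.index_in_chunk] < chunks[c][b.index_in_chunk];
    });
    runs.push_back(std::move(run));
  }
  return runs;
}

// Step 2: merge runs pairwise, one level at a time. Values are read directly
// through each Loc, so no index resolution happens during the merge.
std::vector<Loc> MergeResolved(std::vector<std::vector<Loc>> runs, const Chunks& chunks) {
  auto value_less = [&](const Loc& a, const Loc& b) {
    return chunks[a.chunk_index][a.index_in_chunk] <
           chunks[b.chunk_index][b.index_in_chunk];
  };
  if (runs.empty()) return {};
  while (runs.size() > 1) {
    std::vector<std::vector<Loc>> next;
    for (std::size_t i = 0; i + 1 < runs.size(); i += 2) {
      std::vector<Loc> merged;
      merged.reserve(runs[i].size() + runs[i + 1].size());
      std::merge(runs[i].begin(), runs[i].end(), runs[i + 1].begin(),
                 runs[i + 1].end(), std::back_inserter(merged), value_less);
      next.push_back(std::move(merged));
    }
    if (runs.size() % 2 == 1) next.push_back(std::move(runs.back()));  // odd run out
    runs = std::move(next);
  }
  assert(runs.size() == 1);
  return std::move(runs.front());
}
```

The output is still a sequence of locations rather than absolute indices, which is where the trade-offs discussed in the follow-up comment come in.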

Component(s)

C++

@pitrou
Member Author

pitrou commented Sep 12, 2024

cc @felipecrv

@pitrou
Member Author

pitrou commented Sep 14, 2024

Two potential downsides to this approach:

  1. the temporary ResolvedChunks take twice as much space as the indices, hence a bigger CPU cache footprint
  2. there has to be a final "reverse resolution" step that converts the sorted ResolvedChunks back into absolute indices... or we keep those absolute indices alongside each ResolvedChunk, which implies an even bigger cache footprint
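Both downsides can be sketched in the same simplified model, assuming a location is two 64-bit integers and an absolute index is one `uint64_t`; the `static_assert`s hold on common 64-bit ABIs. `Loc`, `LocWithIndex` and `ReverseResolve` are hypothetical names, not Arrow types. Reverse resolution itself is a single linear pass that adds each chunk's logical start offset back.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for a resolved chunk location.
struct Loc {
  int64_t chunk_index;
  int64_t index_in_chunk;
};

// Hypothetical variant for option 2b: carry the absolute index alongside.
struct LocWithIndex {
  Loc loc;
  uint64_t absolute_index;
};

// Per-element footprint compared with a plain uint64_t index.
static_assert(sizeof(Loc) == 2 * sizeof(uint64_t), "location is twice an index");
static_assert(sizeof(LocWithIndex) == 3 * sizeof(uint64_t), "three times an index");

// Final "reverse resolution": map sorted locations back to absolute indices.
// offsets[c] is the logical start of chunk c, e.g. {0, 3, 5} for lengths 3 and 2.
std::vector<uint64_t> ReverseResolve(const std::vector<Loc>& sorted,
                                     const std::vector<int64_t>& offsets) {
  std::vector<uint64_t> out;
  out.reserve(sorted.size());
  for (const Loc& loc : sorted) {
    assert(loc.chunk_index >= 0 &&
           loc.chunk_index < static_cast<int64_t>(offsets.size()));
    out.push_back(static_cast<uint64_t>(offsets[loc.chunk_index] + loc.index_in_chunk));
  }
  return out;
}
```

This pass is O(n) with no searching, so the open question is mostly whether the larger working set during the merge costs more than the resolutions it saves.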

Experimenting will tell whether this can be beneficial.
