Describe the enhancement requested

In the chunked sort kernels (for ChunkedArray and Table), the most expensive step can be the recursive merge of sorted chunks after each individual chunk has been sorted.

Currently, this merge step resolves chunked indices on every value access, so chunk resolution is computed O(n * log2(k)) times (where n is the input length and k is the number of chunks).

Instead, we could compute chunked indices once, after sorting the individual chunks. The merge step would then involve no chunk resolution at all, only direct accesses through ResolvedChunks.
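A minimal sketch of the idea, using illustrative names (`ResolvedIndex`, `Resolve`, `ResolveAll` are hypothetical, not Arrow's actual API): each logical index is resolved into a (chunk, in-chunk offset) pair once, up front, so the merge comparisons can read values directly instead of performing a binary search per access.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for Arrow's ResolvedChunk: a (chunk, offset) pair.
struct ResolvedIndex {
  int32_t chunk;     // which chunk the logical index falls in
  int64_t in_chunk;  // offset within that chunk
};

// offsets[i] is the logical start of chunk i; offsets.back() is the total
// length. One resolution is a binary search over the chunk offsets.
ResolvedIndex Resolve(const std::vector<int64_t>& offsets, int64_t logical) {
  // upper_bound - 1 locates the chunk containing `logical`.
  auto it = std::upper_bound(offsets.begin(), offsets.end(), logical);
  int32_t chunk = static_cast<int32_t>(it - offsets.begin()) - 1;
  return {chunk, logical - offsets[chunk]};
}

// The proposed change: resolve the whole batch of sorted indices once,
// instead of calling Resolve inside every merge comparison.
std::vector<ResolvedIndex> ResolveAll(const std::vector<int64_t>& offsets,
                                      const std::vector<int64_t>& indices) {
  std::vector<ResolvedIndex> out;
  out.reserve(indices.size());
  for (int64_t idx : indices) out.push_back(Resolve(offsets, idx));
  return out;
}
```

With pre-resolution, the O(log2 k) binary search runs once per index rather than once per access during the O(log2 k) merge levels.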
Component(s)
C++
There are potential drawbacks, though:

- the size taken by those temporary ResolvedChunks is twice the size of the indices, hence a bigger CPU cache footprint
- there has to be a final "reverse resolution" step that converts the sorted ResolvedChunks back into absolute indices... or we maintain those absolute indices alongside the ResolvedChunks, which implies an even bigger cache footprint

Experimenting will tell whether this can be beneficial.
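The "reverse resolution" step mentioned above could look like the following sketch (again with illustrative names; `Unresolve` and the `ResolvedIndex` struct are assumptions, not Arrow's API): each (chunk, in-chunk offset) pair is mapped back to an absolute index by adding the chunk's logical start offset.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for Arrow's ResolvedChunk: a (chunk, offset) pair.
// Note it holds two fields, which is why its footprint is roughly twice
// that of a plain index array.
struct ResolvedIndex {
  int32_t chunk;
  int64_t in_chunk;
};

// Convert merged (chunk, in-chunk) pairs back into absolute indices.
// chunk_starts[i] is the logical start offset of chunk i.
std::vector<int64_t> Unresolve(const std::vector<int64_t>& chunk_starts,
                               const std::vector<ResolvedIndex>& resolved) {
  std::vector<int64_t> out;
  out.reserve(resolved.size());
  for (const auto& r : resolved) {
    out.push_back(chunk_starts[r.chunk] + r.in_chunk);
  }
  return out;
}
```

This final pass is linear in the number of indices, so the question is whether the per-access savings during the merge outweigh the extra pass and the larger working set.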