Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add intermediate InstructionDetails struct for instruction processing #2399

Conversation

tao-stones
Copy link

Problem

Process instructions (for a transaction) is really 2 steps: parse tx to gather deterministic ixs info; then (optionally) use current working bank to sanitize and convert gather instruction details to something else.

An intermediate struct between these two steps can be useful.

Summary of Changes

  • add InstructionDetails struct
  • break process_compute_budget_instruction() into two steps: try_from() to built Result<InstructionDetails>, and sanitize_and_convert_to_compute_budget_limits() to get Result<ComputeBudgetLimits>.
  • keep process_compute_budget_instructions() as a wrapper for backward compatibility, temporarily (it will be refactored out in follow up changes)

Fixes #2352

@tao-stones tao-stones marked this pull request as draft August 2, 2024 02:31
@tao-stones
Copy link
Author

ah, forgot to cherry-pick tests, convert to draft till tests are imported.

@tao-stones tao-stones marked this pull request as ready for review August 5, 2024 16:27
@tao-stones tao-stones force-pushed the feat-add-intermediate-instruction_details-for-processing-ixs branch from af05e47 to b0b1b68 Compare August 5, 2024 16:46
Comment on lines 29 to 38
compute_budget_instruction_details.process_instruction(
i as u8,
program_id,
instruction,
)?;
builtin_instruction_details.process_instruction(program_id, instruction)?;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this API very strange. I would prefer if we did something like add a method to SanitizedMessage called something like try_process_compute_budget_instructions or use a ComputeBudgetInstructionDetails::try_from_message constructor which takes the SanitizedMessage as an argument.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

continue from above discussion, say if we filtered a list of interested instructions, which might be a mix of builtins, pre-compiles, compute-budgets and perhaps some other types; It's still good to iterate through the list once, pass each instruction for different processors to collect individual info. So we are operates on per-instruction basis, instead of per SanitizedMessage (or SVMMessage). Here it loosely implements Visitor pattern, where different instructions are processed by different processors. Might be better to refactor to something like this:

for ix in fitlered_instructions {
    ix.accept(&mut compute_budget_instruction_details)?;
    ix.accept(&mut builtin_instruction_details)?;
    ... ...
}

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah the visitor pattern is fine I guess but I don't like that this interface operates over individual unfiltered instructions and does a hash for each program id as @apfitzge mentioned above.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

been doing bench and profiling on iteration and hashing. Will share finding later.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a782e33 adds an old-school quick builtins-filter, using the bench from #2498 , it addresses the concern: it no longer wastes time to hash non-builtin program key to check against hashmap.

bench of master
running 5 tests
test bench_process_compute_budget_instructions_builtins        ... bench:      16,077 ns/iter (+/- 397)
test bench_process_compute_budget_instructions_compute_budgets ... bench:      31,168 ns/iter (+/- 952)
test bench_process_compute_budget_instructions_empty           ... bench:       3,264 ns/iter (+/- 20)
test bench_process_compute_budget_instructions_mixed           ... bench:     449,149 ns/iter (+/- 6,138)
test bench_process_compute_budget_instructions_non_builtins    ... bench:      16,066 ns/iter (+/- 132)
bench before this commit - almost doubled time for all scenarios
running 5 tests
test bench_process_compute_budget_instructions_builtins        ... bench:      42,953 ns/iter (+/- 1,501)
test bench_process_compute_budget_instructions_compute_budgets ... bench:      43,980 ns/iter (+/- 820)
test bench_process_compute_budget_instructions_empty           ... bench:       6,138 ns/iter (+/- 176)
test bench_process_compute_budget_instructions_mixed           ... bench:   1,353,764 ns/iter (+/- 27,608)
test bench_process_compute_budget_instructions_non_builtins    ... bench:      35,780 ns/iter (+/- 1,129)
bench with this commit - non_builtins perf back t o original
running 5 tests
test bench_process_compute_budget_instructions_builtins        ... bench:      40,815 ns/iter (+/- 269)
test bench_process_compute_budget_instructions_compute_budgets ... bench:      44,544 ns/iter (+/- 155)
test bench_process_compute_budget_instructions_empty           ... bench:       5,770 ns/iter (+/- 49)
test bench_process_compute_budget_instructions_mixed           ... bench:     389,491 ns/iter (+/- 5,886)
test bench_process_compute_budget_instructions_non_builtins    ... bench:      15,358 ns/iter (+/- 515)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But this still shows the first 2 scenarios as having much worse performance, no?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea, these two demonstrate the added cost of locating builtin program cost from BUILTIN_INSTRUCTION_COSTS, but to builtins only

let mut compute_budget_instruction_details = ComputeBudgetInstructionDetails::default();
let mut builtin_instruction_details = BuiltinInstructionDetails::default();

for (i, (program_id, instruction)) in instructions.enumerate() {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This approach of going instruction by instruction and comparing each program id is not very efficient. We should probably use a deduped program id list to find the compute budget program id, filter relevant instructions for processing, and then build the struct of the tx requested compute budget parameters.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an interesting idea to explore. So the idea is to collect as many instruction related information in one iteration, so we don't have to scan tx many times. For the immediate needs, InstructionDetails currently collects info about compute-budget and builtin instructions. We could filter just builtin instructions first. I am also planning to add pre-compiled info into instructionDetails (such as num_signatures), that might also be able to fit into same schema. I'd like to think about it more, maybe can do this in follow up PRs? As for this PR, it can continue keep currently behavior.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't resolve conversations that aren't resolved

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you try benching this approach as well?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't resolve conversations that aren't resolved

I closed it cause I thought this conversation has merged into above one, there I added filter and bench results. Sound like you have different idea to try here. Can you explain more? By "use a deduped program id list", I am assuming you are referring to compute-budget programs specifically, but we are not just processing compute budget anymore, also want to accumulate all builtin's cost.

Copy link

@apfitzge apfitzge Aug 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the idea is that a program_id_index may appear multiple times in a single tx. We do not need to resolve it to builtin/compute-budget each time we see that index - only the first time.

runtime-transaction/src/instruction_details.rs Outdated Show resolved Hide resolved
Comment on lines 16 to 29
pub struct InstructionDetails {
builtin_instruction_details: BuiltinInstructionDetails,
compute_budget_instruction_details: ComputeBudgetInstructionDetails,
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I see the value of combining different "instruction details" into a single struct. Can we keep them separate? I don't see much value in eliminating a loop since the work inside the loop is more expensive than the loop itself.

Copy link
Author

@tao-stones tao-stones Aug 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kind like this umbrella struct, it defines concept well: giving the list of instructions, it fetches and stores their "details". What "details" are can just be internal knowledge. Information in builtin_instruction_details and compute_budget_instruction_details are used together (in num_non_compute_budget_instructions()), they probably could be flatted into one InstructionDetails struct (as in my initial version). From call site perspective, it's easier to work with one instructionDetails vs different "details". I broke them into separate pieces to make changing and unit testing easier.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

787326d removed the "inner details" structs.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer to keep the inner structs and remove the outer struct

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I tried to say was the "inner details" are not truly independent two set of datas, rather they were artificially separated.

Can you help me to understand why keeping inner structs is better than what it is?

@tao-stones tao-stones force-pushed the feat-add-intermediate-instruction_details-for-processing-ixs branch from a782e33 to f15f7e7 Compare August 8, 2024 16:24
@tao-stones tao-stones force-pushed the feat-add-intermediate-instruction_details-for-processing-ixs branch from f15f7e7 to 787326d Compare August 8, 2024 23:08
builtins-default-costs/src/lib.rs Outdated Show resolved Hide resolved
runtime-transaction/src/instruction_details.rs Outdated Show resolved Hide resolved
let mut compute_budget_instruction_details = ComputeBudgetInstructionDetails::default();
let mut builtin_instruction_details = BuiltinInstructionDetails::default();

for (i, (program_id, instruction)) in instructions.enumerate() {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you try benching this approach as well?

@jstarry
Copy link

jstarry commented Aug 9, 2024

I think this PR should just focus on the refactor and then have another PR handle the builtin instruction stuff. I'm not actually super onboard with #2212 because some people could be counting on builtin instructions giving them 200k cu's. We should move away from the 200k cu per ix logic entirely and make sure everyone specifies a cu limit in my opinion.

@tao-stones
Copy link
Author

Additional bench results for to_bytes -> as_ref:

to_bytes
running 5 tests
test bench_process_compute_budget_instructions_builtins        ... bench:      44,762 ns/iter (+/- 978)
test bench_process_compute_budget_instructions_compute_budgets ... bench:      48,917 ns/iter (+/- 685)
test bench_process_compute_budget_instructions_empty           ... bench:       6,238 ns/iter (+/- 136)
test bench_process_compute_budget_instructions_mixed           ... bench:     442,665 ns/iter (+/- 6,131)
test bench_process_compute_budget_instructions_non_builtins    ... bench:      16,802 ns/iter (+/- 448)
as_ref
running 5 tests
test bench_process_compute_budget_instructions_builtins        ... bench:      44,262 ns/iter (+/- 329)
test bench_process_compute_budget_instructions_compute_budgets ... bench:      48,939 ns/iter (+/- 677)
test bench_process_compute_budget_instructions_empty           ... bench:       6,234 ns/iter (+/- 47)
test bench_process_compute_budget_instructions_mixed           ... bench:     441,409 ns/iter (+/- 4,118)
test bench_process_compute_budget_instructions_non_builtins    ... bench:      16,682 ns/iter (+/- 413)

@tao-stones
Copy link
Author

I think this PR should just focus on the refactor and then have another PR handle the builtin instruction stuff.

At the moment, the conversation is on handling builtin stuff. Not sure if breaking pr would make difference.

I'm not actually super onboard with #2212 because some people could be counting on builtin instructions giving them 200k cu's.

When builtin is allocated 200K budget, it creates room to execute other instructions that otherwise could fail due budget limit. This inconsistency should be fixed, it could lead to overpacking blocks.

We should move away from the 200k cu per ix logic entirely and make sure everyone specifies a cu limit in my opinion.

100% agree to remove 200K per ix logic, especially after all builtins now consume budgets just like other instructions; I don't see reason for cost model (banking side code) to treat builtins differently from other instructions. But "make sure everyone specifies cu limit" might take longer.

My preference is to do this one first, then update cost-model to access "instructionDetails" for cu limits etc, so cost model and compute budget are on same page. Then later if we want to remove 200K cu logic, we just need to do that in one place.

@jstarry
Copy link

jstarry commented Aug 9, 2024

At the moment, the conversation is on handling builtin stuff. Not sure if breaking pr would make difference.

It does because typically refactors and new features are not done in the same PR

@tao-stones
Copy link
Author

At the moment, the conversation is on handling builtin stuff. Not sure if breaking pr would make difference.

It does because typically refactors and new features are not done in the same PR

It was intended as feat to add InstructionDetails that collects builtin ix costs, so the follow-up PR can use that to set correct budget for builtins with a feature gate. It does have refactoring and performance improving elements in it. How about breaking it into follow steps:

  1. refactor: break process_compute_budget_instructions() into "process" part and "sanitizing" part, with InstructionDetails struct that only have compute-budget stuff;
  2. perf: optimize "process" part, address the inefficiency of iterating through all instructions to look for CB ix.
  3. feat: collects builtin costs into InstructionDetails
  4. fix: allocating exact budget for builtins, feature gated.

@tao-stones
Copy link
Author

close this PR, break into into #2559, #2560, #2561, #2562

@tao-stones tao-stones closed this Aug 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BankingStage: Add InstructionDetails struct to cache and reuse static tx instruction info
3 participants