feat: add intermediate InstructionDetails struct for instruction processing #2399

tao-stones · 2024-08-01T23:25:51Z

Problem

Process instructions (for a transaction) is really 2 steps: parse tx to gather deterministic ixs info; then (optionally) use current working bank to sanitize and convert gather instruction details to something else.

An intermediate struct between these two steps can be useful.

Summary of Changes

add InstructionDetails struct
break process_compute_budget_instruction() into two steps: try_from() to built Result<InstructionDetails>, and sanitize_and_convert_to_compute_budget_limits() to get Result<ComputeBudgetLimits>.
keep process_compute_budget_instructions() as a wrapper for backward compatibility, temporarily (it will be refactored out in follow up changes)

Fixes #2352

tao-stones · 2024-08-02T02:32:45Z

ah, forgot to cherry-pick tests, convert to draft till tests are imported.

runtime-transaction/src/instruction_details.rs

runtime-transaction/src/instructions_processor.rs

runtime-transaction/src/builtin_instruction_details.rs

runtime-transaction/src/instruction_details.rs

runtime-transaction/src/instructions_processor.rs

runtime-transaction/src/compute_budget_instruction_details.rs

jstarry · 2024-08-06T10:18:56Z

runtime-transaction/src/instruction_details.rs

+            compute_budget_instruction_details.process_instruction(
+                i as u8,
+                program_id,
+                instruction,
+            )?;
+            builtin_instruction_details.process_instruction(program_id, instruction)?;


I find this API very strange. I would prefer if we did something like add a method to SanitizedMessage called something like try_process_compute_budget_instructions or use a ComputeBudgetInstructionDetails::try_from_message constructor which takes the SanitizedMessage as an argument.

continue from above discussion, say if we filtered a list of interested instructions, which might be a mix of builtins, pre-compiles, compute-budgets and perhaps some other types; It's still good to iterate through the list once, pass each instruction for different processors to collect individual info. So we are operates on per-instruction basis, instead of per SanitizedMessage (or SVMMessage). Here it loosely implements Visitor pattern, where different instructions are processed by different processors. Might be better to refactor to something like this:

for ix in fitlered_instructions { ix.accept(&mut compute_budget_instruction_details)?; ix.accept(&mut builtin_instruction_details)?; ... ... }

Yeah the visitor pattern is fine I guess but I don't like that this interface operates over individual unfiltered instructions and does a hash for each program id as @apfitzge mentioned above.

been doing bench and profiling on iteration and hashing. Will share finding later.

a782e33 adds an old-school quick builtins-filter, using the bench from #2498 , it addresses the concern: it no longer wastes time to hash non-builtin program key to check against hashmap.

bench of master

running 5 tests test bench_process_compute_budget_instructions_builtins ... bench: 16,077 ns/iter (+/- 397) test bench_process_compute_budget_instructions_compute_budgets ... bench: 31,168 ns/iter (+/- 952) test bench_process_compute_budget_instructions_empty ... bench: 3,264 ns/iter (+/- 20) test bench_process_compute_budget_instructions_mixed ... bench: 449,149 ns/iter (+/- 6,138) test bench_process_compute_budget_instructions_non_builtins ... bench: 16,066 ns/iter (+/- 132)

bench before this commit - almost doubled time for all scenarios

running 5 tests test bench_process_compute_budget_instructions_builtins ... bench: 42,953 ns/iter (+/- 1,501) test bench_process_compute_budget_instructions_compute_budgets ... bench: 43,980 ns/iter (+/- 820) test bench_process_compute_budget_instructions_empty ... bench: 6,138 ns/iter (+/- 176) test bench_process_compute_budget_instructions_mixed ... bench: 1,353,764 ns/iter (+/- 27,608) test bench_process_compute_budget_instructions_non_builtins ... bench: 35,780 ns/iter (+/- 1,129)

bench with this commit - non_builtins perf back t o original

running 5 tests test bench_process_compute_budget_instructions_builtins ... bench: 40,815 ns/iter (+/- 269) test bench_process_compute_budget_instructions_compute_budgets ... bench: 44,544 ns/iter (+/- 155) test bench_process_compute_budget_instructions_empty ... bench: 5,770 ns/iter (+/- 49) test bench_process_compute_budget_instructions_mixed ... bench: 389,491 ns/iter (+/- 5,886) test bench_process_compute_budget_instructions_non_builtins ... bench: 15,358 ns/iter (+/- 515)

But this still shows the first 2 scenarios as having much worse performance, no?

yea, these two demonstrate the added cost of locating builtin program cost from BUILTIN_INSTRUCTION_COSTS, but to builtins only

jstarry · 2024-08-06T10:20:37Z

runtime-transaction/src/instruction_details.rs

+        let mut compute_budget_instruction_details = ComputeBudgetInstructionDetails::default();
+        let mut builtin_instruction_details = BuiltinInstructionDetails::default();
+
+        for (i, (program_id, instruction)) in instructions.enumerate() {


This approach of going instruction by instruction and comparing each program id is not very efficient. We should probably use a deduped program id list to find the compute budget program id, filter relevant instructions for processing, and then build the struct of the tx requested compute budget parameters.

This is an interesting idea to explore. So the idea is to collect as many instruction related information in one iteration, so we don't have to scan tx many times. For the immediate needs, InstructionDetails currently collects info about compute-budget and builtin instructions. We could filter just builtin instructions first. I am also planning to add pre-compiled info into instructionDetails (such as num_signatures), that might also be able to fit into same schema. I'd like to think about it more, maybe can do this in follow up PRs? As for this PR, it can continue keep currently behavior.

Please don't resolve conversations that aren't resolved

Can you try benching this approach as well?

Please don't resolve conversations that aren't resolved

I closed it cause I thought this conversation has merged into above one, there I added filter and bench results. Sound like you have different idea to try here. Can you explain more? By "use a deduped program id list", I am assuming you are referring to compute-budget programs specifically, but we are not just processing compute budget anymore, also want to accumulate all builtin's cost.

I think the idea is that a program_id_index may appear multiple times in a single tx. We do not need to resolve it to builtin/compute-budget each time we see that index - only the first time.

runtime-transaction/src/compute_budget_instruction_details.rs

runtime-transaction/src/instruction_details.rs

jstarry · 2024-08-07T01:26:06Z

runtime-transaction/src/instruction_details.rs

+pub struct InstructionDetails {
+    builtin_instruction_details: BuiltinInstructionDetails,
+    compute_budget_instruction_details: ComputeBudgetInstructionDetails,
+}


I don't think I see the value of combining different "instruction details" into a single struct. Can we keep them separate? I don't see much value in eliminating a loop since the work inside the loop is more expensive than the loop itself.

I kind like this umbrella struct, it defines concept well: giving the list of instructions, it fetches and stores their "details". What "details" are can just be internal knowledge. Information in builtin_instruction_details and compute_budget_instruction_details are used together (in num_non_compute_budget_instructions()), they probably could be flatted into one InstructionDetails struct (as in my initial version). From call site perspective, it's easier to work with one instructionDetails vs different "details". I broke them into separate pieces to make changing and unit testing easier.

787326d removed the "inner details" structs.

I prefer to keep the inner structs and remove the outer struct

What I tried to say was the "inner details" are not truly independent two set of datas, rather they were artificially separated.

Can you help me to understand why keeping inner structs is better than what it is?

…deterministic ix info from transaction; Add function to sanitize and convert InstructionDetails to ComputeBudgetLimits; Issue solana-labs#2352

one of them.

Co-authored-by: Andrew Fitzgerald <[email protected]>

… bug that misses counting non-builtins, that broke tests

builtins-default-costs/src/lib.rs

runtime-transaction/src/instruction_details.rs

jstarry · 2024-08-09T02:58:38Z

runtime-transaction/src/instruction_details.rs

+        let mut compute_budget_instruction_details = ComputeBudgetInstructionDetails::default();
+        let mut builtin_instruction_details = BuiltinInstructionDetails::default();
+
+        for (i, (program_id, instruction)) in instructions.enumerate() {


Can you try benching this approach as well?

jstarry · 2024-08-09T03:01:48Z

I think this PR should just focus on the refactor and then have another PR handle the builtin instruction stuff. I'm not actually super onboard with #2212 because some people could be counting on builtin instructions giving them 200k cu's. We should move away from the 200k cu per ix logic entirely and make sure everyone specifies a cu limit in my opinion.

Co-authored-by: Justin Starry <[email protected]>

tao-stones · 2024-08-09T03:26:22Z

Additional bench results for to_bytes -> as_ref:

to_bytes

running 5 tests
test bench_process_compute_budget_instructions_builtins        ... bench:      44,762 ns/iter (+/- 978)
test bench_process_compute_budget_instructions_compute_budgets ... bench:      48,917 ns/iter (+/- 685)
test bench_process_compute_budget_instructions_empty           ... bench:       6,238 ns/iter (+/- 136)
test bench_process_compute_budget_instructions_mixed           ... bench:     442,665 ns/iter (+/- 6,131)
test bench_process_compute_budget_instructions_non_builtins    ... bench:      16,802 ns/iter (+/- 448)

as_ref

running 5 tests
test bench_process_compute_budget_instructions_builtins        ... bench:      44,262 ns/iter (+/- 329)
test bench_process_compute_budget_instructions_compute_budgets ... bench:      48,939 ns/iter (+/- 677)
test bench_process_compute_budget_instructions_empty           ... bench:       6,234 ns/iter (+/- 47)
test bench_process_compute_budget_instructions_mixed           ... bench:     441,409 ns/iter (+/- 4,118)
test bench_process_compute_budget_instructions_non_builtins    ... bench:      16,682 ns/iter (+/- 413)

tao-stones · 2024-08-09T03:47:40Z

I think this PR should just focus on the refactor and then have another PR handle the builtin instruction stuff.

At the moment, the conversation is on handling builtin stuff. Not sure if breaking pr would make difference.

I'm not actually super onboard with #2212 because some people could be counting on builtin instructions giving them 200k cu's.

When builtin is allocated 200K budget, it creates room to execute other instructions that otherwise could fail due budget limit. This inconsistency should be fixed, it could lead to overpacking blocks.

We should move away from the 200k cu per ix logic entirely and make sure everyone specifies a cu limit in my opinion.

100% agree to remove 200K per ix logic, especially after all builtins now consume budgets just like other instructions; I don't see reason for cost model (banking side code) to treat builtins differently from other instructions. But "make sure everyone specifies cu limit" might take longer.

My preference is to do this one first, then update cost-model to access "instructionDetails" for cu limits etc, so cost model and compute budget are on same page. Then later if we want to remove 200K cu logic, we just need to do that in one place.

jstarry · 2024-08-09T04:12:13Z

At the moment, the conversation is on handling builtin stuff. Not sure if breaking pr would make difference.

It does because typically refactors and new features are not done in the same PR

tao-stones · 2024-08-09T18:13:58Z

At the moment, the conversation is on handling builtin stuff. Not sure if breaking pr would make difference.

It does because typically refactors and new features are not done in the same PR

It was intended as feat to add InstructionDetails that collects builtin ix costs, so the follow-up PR can use that to set correct budget for builtins with a feature gate. It does have refactoring and performance improving elements in it. How about breaking it into follow steps:

refactor: break process_compute_budget_instructions() into "process" part and "sanitizing" part, with InstructionDetails struct that only have compute-budget stuff;
perf: optimize "process" part, address the inefficiency of iterating through all instructions to look for CB ix.
feat: collects builtin costs into InstructionDetails
fix: allocating exact budget for builtins, feature gated.

tao-stones · 2024-08-12T17:17:50Z

close this PR, break into into #2559, #2560, #2561, #2562

tao-stones requested review from jstarry and apfitzge August 2, 2024 02:29

tao-stones marked this pull request as draft August 2, 2024 02:31

apfitzge reviewed Aug 2, 2024

View reviewed changes

runtime-transaction/src/instruction_details.rs Outdated Show resolved Hide resolved

runtime-transaction/src/instruction_details.rs Outdated Show resolved Hide resolved

runtime-transaction/src/instructions_processor.rs Show resolved Hide resolved

tao-stones marked this pull request as ready for review August 5, 2024 16:27

tao-stones force-pushed the feat-add-intermediate-instruction_details-for-processing-ixs branch from af05e47 to b0b1b68 Compare August 5, 2024 16:46

apfitzge requested changes Aug 5, 2024

View reviewed changes

tao-stones requested a review from apfitzge August 5, 2024 22:34

jstarry reviewed Aug 6, 2024

View reviewed changes

jstarry reviewed Aug 7, 2024

View reviewed changes

tao-stones force-pushed the feat-add-intermediate-instruction_details-for-processing-ixs branch from a782e33 to f15f7e7 Compare August 8, 2024 16:24

tao-stones and others added 8 commits August 8, 2024 16:40

feat: add intermediate InstructionDetails struct to gather and store …

da7d12f

…deterministic ix info from transaction; Add function to sanitize and convert InstructionDetails to ComputeBudgetLimits; Issue solana-labs#2352

Separate inner detail struct into their own model, add tests for each

ee05678

one of them.

Update runtime-transaction/src/builtin_instruction_details.rs

131a8e8

Co-authored-by: Andrew Fitzgerald <[email protected]>

make raw structs private

39bc426

add helper to count non-compute-budget-instructions

70087f8

move requested_heap_bytes sanitization

dc27560

add quick program_id filter for builtins

0b14274

remove inner "details" struct, flatten into single structure. Fixed a…

787326d

… bug that misses counting non-builtins, that broke tests

tao-stones force-pushed the feat-add-intermediate-instruction_details-for-processing-ixs branch from f15f7e7 to 787326d Compare August 8, 2024 23:08

jstarry reviewed Aug 9, 2024

View reviewed changes

tao-stones and others added 2 commits August 8, 2024 22:12

Update runtime-transaction/src/instruction_details.rs

6369e53

Co-authored-by: Justin Starry <[email protected]>

Update builtins-default-costs/src/lib.rs

2dd223b

Co-authored-by: Justin Starry <[email protected]>

tao-stones closed this Aug 12, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: add intermediate InstructionDetails struct for instruction processing #2399

feat: add intermediate InstructionDetails struct for instruction processing #2399

tao-stones commented Aug 1, 2024

tao-stones commented Aug 2, 2024

jstarry Aug 6, 2024

tao-stones Aug 6, 2024

jstarry Aug 7, 2024

tao-stones Aug 7, 2024

tao-stones Aug 8, 2024

apfitzge Aug 8, 2024

tao-stones Aug 8, 2024

jstarry Aug 6, 2024

tao-stones Aug 6, 2024

jstarry Aug 9, 2024

jstarry Aug 9, 2024

tao-stones Aug 9, 2024

apfitzge Aug 9, 2024 •

edited

Loading

jstarry Aug 7, 2024

tao-stones Aug 8, 2024 •

edited

Loading

tao-stones Aug 8, 2024

jstarry Aug 9, 2024

tao-stones Aug 9, 2024

jstarry Aug 9, 2024

jstarry commented Aug 9, 2024

tao-stones commented Aug 9, 2024

tao-stones commented Aug 9, 2024

jstarry commented Aug 9, 2024

tao-stones commented Aug 9, 2024

tao-stones commented Aug 12, 2024

feat: add intermediate InstructionDetails struct for instruction processing #2399

feat: add intermediate InstructionDetails struct for instruction processing #2399

Conversation

tao-stones commented Aug 1, 2024

Problem

Summary of Changes

tao-stones commented Aug 2, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

apfitzge Aug 9, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tao-stones Aug 8, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jstarry commented Aug 9, 2024

tao-stones commented Aug 9, 2024

tao-stones commented Aug 9, 2024

jstarry commented Aug 9, 2024

tao-stones commented Aug 9, 2024

tao-stones commented Aug 12, 2024

apfitzge Aug 9, 2024 •

edited

Loading

tao-stones Aug 8, 2024 •

edited

Loading