Data visualization based on evaluation CSV files #296

ruiAzevedo19 · 2024-07-30T10:07:44Z

Goal: create an HTML report with graphs for data visualization.
Tool: D3.js library for data visualization graphs.

TODO

- Create a table for the evaluation CSV file
- Scatter plot
  - Create a CSV file that stores the models' meta information, such as pricing and human-readable names
    - Extend the model's interface with a MetaInformation function that returns a model's meta information
    - Write the meta information to a CSV file
  - Create a scatter plot with the model costs and scores
- Error bars over multiple runs, to show variance
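
The error bars over multiple runs could be derived from per-run totals. The following is a minimal sketch in Python, assuming each run yields one total score per model. The function name `score_error_bars` and the input shape are illustrative and not part of the project.

```python
import statistics
from collections import defaultdict


def score_error_bars(run_totals):
    """Return {model_id: (mean, spread)} from (model_id, total_score) pairs.

    Each pair is assumed to be the total score of one model in one run.
    The spread is the sample standard deviation, or 0.0 for a single run.
    """
    by_model = defaultdict(list)
    for model_id, score in run_totals:
        by_model[model_id].append(float(score))

    bars = {}
    for model_id, scores in by_model.items():
        spread = statistics.stdev(scores) if len(scores) > 1 else 0.0
        bars[model_id] = (statistics.mean(scores), spread)
    return bars
```

The mean would give the point position in the scatter plot and the spread the half-length of the error bar. Other measures of variance (min/max, confidence intervals) would work the same way.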


bauersimon · 2024-08-01T09:46:58Z

Leaving this here until we have the summing logic in the visualization.

# script.sh <evaluation-without-extension> <meta-without-extension>

pip install csvkit

sed -i '1s/-/_/g' $1.csv # SQL does not like hyphens in column names.
sed -i '1s/-/_/g' $2.csv # SQL does not like hyphens in column names.

csvsql --query "SELECT model_id, language, SUM(score) AS score, SUM(coverage) AS coverage, SUM(files_executed) AS files_executed, SUM(files_executed_maximum_reachable) AS files_executed_maximum_reachable, SUM(generate_tests_for_file_character_count) AS generate_tests_for_file_character_count, SUM(processing_time) AS processing_time, SUM(response_character_count) AS response_character_count, SUM(response_no_error) AS response_no_error, SUM(response_no_excess) AS response_no_excess, SUM(response_with_code) AS response_with_code, SUM(tests_passing) AS tests_passing FROM $1 WHERE task NOT LIKE '%-symflower-fix' GROUP BY model_id, language" $1.csv > $1-by-language.csv

csvsql --query "SELECT model_id, SUM(score) AS score, SUM(CASE WHEN language = 'golang' THEN score ELSE 0 END) AS golang_score, SUM(CASE WHEN language = 'java' THEN score ELSE 0 END) AS java_score, SUM(CASE WHEN language = 'ruby' THEN score ELSE 0 END) AS ruby_score FROM $1 WHERE task NOT LIKE '%-symflower-fix' GROUP BY model_id" $1.csv > $1-by-language-score.csv

csvsql --query "SELECT $1.model_id, model_name, (completion + prompt + request) AS cost, SUM(score) AS score, SUM(coverage) AS coverage, SUM(files_executed) AS files_executed, SUM(files_executed_maximum_reachable) AS files_executed_maximum_reachable, SUM(generate_tests_for_file_character_count) AS generate_tests_for_file_character_count, SUM(processing_time) AS processing_time, SUM(response_character_count) AS response_character_count, SUM(response_no_error) AS response_no_error, SUM(response_no_excess) AS response_no_excess, SUM(response_with_code) AS response_with_code, SUM(tests_passing) AS tests_passing FROM $1 LEFT JOIN $2 ON $1.model_id = $2.model_id WHERE task NOT LIKE '%-symflower-fix' GROUP BY $1.model_id" $1.csv $2.csv > $1-total.csv

csvsql --query "SELECT model_id, task, SUM(score) AS score, SUM(coverage) AS coverage, SUM(files_executed) AS files_executed, SUM(files_executed_maximum_reachable) AS files_executed_maximum_reachable, SUM(generate_tests_for_file_character_count) AS generate_tests_for_file_character_count, SUM(processing_time) AS processing_time, SUM(response_character_count) AS response_character_count, SUM(response_no_error) AS response_no_error, SUM(response_no_excess) AS response_no_excess, SUM(response_with_code) AS response_with_code, SUM(tests_passing) AS tests_passing FROM $1 WHERE task NOT LIKE '%-symflower-fix' GROUP BY model_id, task" $1.csv > $1-by-task.csv

csvsql --query "SELECT model_id, task, language, SUM(score) AS score, SUM(coverage) AS coverage, SUM(files_executed) AS files_executed, SUM(files_executed_maximum_reachable) AS files_executed_maximum_reachable, SUM(generate_tests_for_file_character_count) AS generate_tests_for_file_character_count, SUM(processing_time) AS processing_time, SUM(response_character_count) AS response_character_count, SUM(response_no_error) AS response_no_error, SUM(response_no_excess) AS response_no_excess, SUM(response_with_code) AS response_with_code, SUM(tests_passing) AS tests_passing FROM $1 WHERE task NOT LIKE '%-symflower-fix' GROUP BY model_id, task, language" $1.csv > $1-by-task-by-language.csv

csvsql --query "SELECT model_id, SUM(CASE WHEN task NOT LIKE '%-symflower-fix' THEN score ELSE 0 END) AS score, SUM(CASE WHEN task LIKE '%-symflower-fix' THEN score ELSE 0 END) AS score_fix, SUM(CASE WHEN task NOT LIKE '%-symflower-fix' THEN files_executed ELSE 0 END) AS files_executed, SUM(CASE WHEN task LIKE '%-symflower-fix' THEN files_executed ELSE 0 END) AS files_executed_fix FROM $1 WHERE (task LIKE 'transpile%' OR task LIKE 'write-tests%') AND language = 'golang' GROUP BY model_id " $1.csv > $1-by-symflower-fix.csv
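
For comparison, the summing logic of the "total" query could look roughly like this outside of SQL. This is a sketch in Python under assumptions: the header names are already normalized to underscores (as the `sed` calls do), only `score` is summed, and the function name `total_scores` is made up for illustration.

```python
import csv
import io
from collections import defaultdict


def total_scores(evaluation_csv, meta_csv):
    """Sum scores per model, skipping "-symflower-fix" tasks, and attach the cost.

    Both arguments are open text files (or file-like objects) with a header row.
    Models without an entry in the meta file get a cost of None (like a LEFT JOIN).
    """
    costs = {}
    for row in csv.DictReader(meta_csv):
        costs[row["model_id"]] = (
            float(row["completion"]) + float(row["prompt"]) + float(row["request"])
        )

    scores = defaultdict(float)
    for row in csv.DictReader(evaluation_csv):
        if row["task"].endswith("-symflower-fix"):
            continue
        scores[row["model_id"]] += float(row["score"])

    return {m: {"score": s, "cost": costs.get(m)} for m, s in scores.items()}


# Usage with in-memory data; real files would be opened with open(..., newline="").
example_evaluation = io.StringIO("model_id,task,score\nm1,write-tests,10\n")
example_meta = io.StringIO("model_id,completion,prompt,request\nm1,0.5,0.25,0\n")
example_totals = total_scores(example_evaluation, example_meta)
```

The remaining columns of the queries (coverage, files_executed, and so on) would be summed in the same loop.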


ruiAzevedo19 added the "enhancement" (New feature or request) label Jul 30, 2024

ruiAzevedo19 added this to the v0.6.0 milestone Jul 30, 2024

ruiAzevedo19 self-assigned this Jul 30, 2024

bauersimon mentioned this issue Jul 30, 2024: Dump the assessments in the CSV files once they happen and not in the end of all executions #237 (Closed, 7 tasks)

ruiAzevedo19 added commits that referenced this issue Jul 30, 2024 (each "Part of #296"):

- 02eb24b: refactor, Extract the logic to fetch models from the openrouter API into a helper, so to it can be reused
- 74f81a9: refactor, Move the logic that creates report files into a helper, so it can be reused
- c6ddd10: Store models cost information along with the corresponding scoring in a CSV file, so it can be used for data visualization

ruiAzevedo19 mentioned this issue Jul 30, 2024: Store models meta information in a CSV file, so it can be further used in data visualization #298 (Merged)

ruiAzevedo19 added commits that referenced this issue Jul 30, 2024 (each "Part of #296"):

- 8ae0ab8: refactor, Extract the logic to fetch models from the openrouter API into a helper, so to it can be reused
- b968df6: refactor, Move the logic that creates report files into a helper, so it can be reused
- 06442e3: Store models cost information along with the corresponding scoring in a CSV file, so it can be used for data visualization
- b515b6f: HTML report for data visualization

ruiAzevedo19 mentioned this issue Jul 30, 2024: HTML report for data visualization #299 (Draft)

ruiAzevedo19 added commits that referenced this issue Jul 31, 2024 (each "Part of #296"):

- fa6db2c: refactor, Convert models costs to numeric values when unmarshaling the JSON response, to avoid these values to be converted latter on
- 3486ebc: refactor, Move the model's meta information structures from the provider to the model package, since it is model related
- 162e998: refactor, Extract the logic to fetch models from the openrouter API into a helper, so to it can be reused
- b1c89fb: refactor, Use the built-in Golang function to open report files, since it can error if the file already exists
- cf3c0f1: Let the LLM models have meta information such as pricing and human-readable names
- 152ecc2: Store models meta information in a CSV file, so it can be further used in data visualization
- 6dbdd09: Store models meta information in a CSV file, so it can be further used in data visualization
- c477647: HTML report for data visualization

ruiAzevedo19 modified the milestones: v0.6.0, v0.7.0 Aug 1, 2024

ruiAzevedo19 added commits that referenced this issue Aug 1, 2024 (each "Part of #296"):

- 7bf5359: refactor, Convert models costs to numeric values when unmarshaling the JSON response, to avoid these values to be converted latter on
- 2d04793: refactor, Move the model's meta information structures from the provider to the model package, since it is model related
- 99ebb90: refactor, Extract the logic to fetch models from the openrouter API into a helper, so to it can be reused
- 71d0e8c: refactor, Use the built-in Golang function to open report files, since it can error if the file already exists
- a31b419: Let the LLM models have meta information such as pricing and human-readable names
- 37f2824: Store models meta information in a CSV file, so it can be further used in data visualization
- e723387: Store models meta information in a CSV file, so it can be further used in data visualization
- e86e508: Store models meta information in a CSV file, so it can be further used in data visualization

ruiAzevedo19 added commits that referenced this issue Aug 2, 2024 (each "Part of #296"):

- 856fe8b: refactor, Rename the function that sorts evaluation records to a more generic name, since it can sort all kind of CSV records
- e8a9929: Let the LLM models have meta information such as pricing and human-readable names
- e553c2a: Store models meta information in a CSV file, so it can be further used in data visualization
- 36dc3c6: refactor, Rename the function that sorts evaluation records to a more generic name, since it can sort all kind of CSV records
- 3c6c20f: Let the LLM models have meta information such as pricing and human-readable names
- 7a6b88b: Store models meta information in a CSV file, so it can be further used in data visualization

ruiAzevedo19 added commits that referenced this issue Aug 6, 2024 (each "Part of #296"):

- 7bdbb8f: refactor, Convert models costs to numeric values when unmarshaling the JSON response, to avoid these values to be converted latter on
- 74d4990: refactor, Move the model's meta information structures from the provider to the model package, since it is model related
- 7749312: refactor, Extract the logic to fetch models from the openrouter API into a helper, so to it can be reused
- 3f46219: refactor, Use the built-in Golang function to open report files, since it can error if the file already exists
- a4dfe3e: refactor, Rename the function that sorts evaluation records to a more generic name, since it can sort all kind of CSV records
- 88d69f0: Let the LLM models have meta information such as pricing and human-readable names
- 38c7c6b: Store models meta information in a CSV file, so it can be further used in data visualization

Munsio pushed commits that referenced this issue Aug 28, 2024 (each "Part of #296"):

- 516843d: refactor, Convert models costs to numeric values when unmarshaling the JSON response, to avoid these values to be converted latter on
- 9fc4b96: refactor, Move the model's meta information structures from the provider to the model package, since it is model related
- 887b1bd: refactor, Extract the logic to fetch models from the openrouter API into a helper, so to it can be reused
- e6ffee1: refactor, Use the built-in Golang function to open report files, since it can error if the file already exists
- fc0e671: refactor, Rename the function that sorts evaluation records to a more generic name, since it can sort all kind of CSV records
- ab700a7: Let the LLM models have meta information such as pricing and human-readable names
- 2f527d2: Store models meta information in a CSV file, so it can be further used in data visualization
