Make BFCL User-Friendly and Easy to Extend #510

devanshamin · 2024-07-07T23:12:56Z

Thank you for open-sourcing BFCL and for your efforts in maintaining it. As I explored the codebase, I noticed some areas for improvement, including duplicate functions, constants, and variables that are inaccessible outside their functions, as well as a lack of abstract classes/methods.

To address these issues, I've begun refactoring the codebase to make it more straightforward to follow, customize, install, and extend. Although this work is still in progress, I wanted to share it with the contributors to seek feedback and gauge interest in reviewing or contributing to the PR. I understand that it’s a significant refactor and may require considerable time and effort, so any assistance or feedback would be greatly appreciated.

To-Do:

- Refactor
- Test
  - Dependencies
  - benchmark -> OSS and Proprietary model handlers
  - evaluate

Goal: To make BFCL similar to lm-evaluation-harness but for function calling.

Please let me know your thoughts and if you would be interested in reviewing or contributing to this PR.


ShishirPatil · 2024-07-07T23:45:49Z

Hey @devanshamin Thank you so much for flagging and suggesting improvements! We agree with everything you mentioned. If it helps, one design decision we have adopted is that when in doubt, defer to simplicity and ease of code readability, given this is an OSS. re: landing the PR: We are absolutely delighted you want to contribute to the Berkeley Function Calling Leaderboard (BFCL) project, and will be absolutely on-board to review and land this PR! Welcome aboard mate!

devanshamin · 2024-07-08T01:17:12Z

> Hey @devanshamin Thank you so much for flagging and suggesting improvements! We agree with everything you mentioned. If it helps, one design decision we have adopted is that when in doubt, defer to simplicity and ease of code readability, given this is an OSS. re: landing the PR: We are absolutely delighted you want to contribute to the Berkeley Function Calling Leaderboard (BFCL) project, and will be absolutely on-board to review and land this PR! Welcome aboard mate!

Awesome! I'm glad to hear you're on board. Keeping the theme of simplicity in mind, I was thinking of coming up with a detailed plan for the refactor and getting your feedback on it. @HuanzhiMao reached out to me, and we are planning on setting up a Zoom call. During the call, I can go over the changes that I have made and the plan, and hear your thoughts and feedback on how to move forward. After the meeting, I can write up a draft with next steps which we can track over here.


- Use the same test groups for benchmarking and evaluation
- Add a custom enum class with intuitive methods to dynamically create test groups
- Use the custom enum to reduce manual creation of test groups
- Update benchmark CLI args to accept a test group argument
- Add a pydantic validator to validate test groups and test categories
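As a rough illustration of the enum-based test groups described above, here is a minimal sketch. Everything in it is an assumption made for illustration: the class names `Category` and `Group`, the helper `resolve_categories`, and the category names are hypothetical and are not taken from the PR. The PR mentions a pydantic validator; this sketch uses a plain function instead so that it runs with the standard library only.

```python
from enum import Enum
from typing import List, Optional


class Category(str, Enum):
    # Illustrative category names only; the real set lives in the BFCL codebase.
    SIMPLE = "simple"
    MULTIPLE_FUNCTION = "multiple_function"
    EXECUTABLE_SIMPLE = "executable_simple"
    RELEVANCE = "relevance"


class Group(Enum):
    """Hypothetical test groups; each value is the tuple of categories it contains."""

    AST = (Category.SIMPLE, Category.MULTIPLE_FUNCTION)
    EXECUTABLE = (Category.EXECUTABLE_SIMPLE,)
    RELEVANCE = (Category.RELEVANCE,)

    @classmethod
    def from_name(cls, name: str) -> "Group":
        # Look a group up by its (case-insensitive) name and fail with a clear message.
        try:
            return cls[name.upper()]
        except KeyError:
            valid = ", ".join(member.name.lower() for member in cls)
            raise ValueError(
                f"Unknown test group {name!r}; expected one of: {valid}"
            ) from None

    @property
    def categories(self) -> List[str]:
        # Category names in this group, in declaration order.
        return [category.value for category in self.value]


def resolve_categories(
    group: Optional[str] = None, categories: Optional[List[str]] = None
) -> List[str]:
    """Validate CLI-style input and return the category names to run."""
    selected = Group.from_name(group).categories if group else []
    for name in categories or []:
        value = Category(name).value  # raises ValueError for an unknown category
        if value not in selected:
            selected.append(value)
    if not selected:
        raise ValueError("Provide a test group, test categories, or both")
    return selected
```

Because the same `Group` definition can be imported by both the benchmark and the evaluation code, the two steps cannot drift apart on which categories a group contains.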

devanshamin · 2024-07-09T13:44:19Z

Here is an article outlining steps on merging this PR - #521


This PR aims to improve the organization and distribution of the BFCL codebase by packaging it. It is part of a series of changes that break down the tasks outlined in #510.

Co-authored-by: Huanzhi Mao

This PR reorganizes the model handler by splitting it into two distinct components: an Open Source (OSS) model handler and a Proprietary model handler. This change is part of a series of updates that address the tasks outlined in issue #510.

Co-authored-by: Huanzhi Mao

devanshamin added 9 commits July 7, 2024 18:47

Move *.jsonl files from eval_checker to data dir

b1b67b1

Add pyproject.toml file

1a0a932

Ignore poetry.lock and .cache dir

ae89841

Add .env.example containing all the env vars

76e1bde

Remove changelog from README

e11240d

Refactor model_handler

ebc2142

- Move `model_handler` to `bfcl/model_handler`
- Separate `oss` and `proprietary` model
- Move java and javascript parsers to `bfcl/parser`
- Standardize model handlers and remove duplicate methods
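One common way to standardize handlers and remove duplicated methods is an abstract base class that owns the shared logic. The sketch below is an assumption about what that could look like, not the PR's actual code: `BaseHandler`, `EchoHandler`, `inference`, and `write_result` are hypothetical names chosen for illustration.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class BaseHandler(ABC):
    """Hypothetical shared interface; the names are illustrative, not BFCL's API."""

    def __init__(self, model_name: str, temperature: float = 0.0) -> None:
        self.model_name = model_name
        self.temperature = temperature

    @abstractmethod
    def inference(self, prompt: str, functions: List[Dict[str, Any]]) -> str:
        """Return the raw model output for one test example."""

    def write_result(self, example: Dict[str, Any], raw: str) -> Dict[str, Any]:
        # Shared logic is defined once here instead of being copied into every handler.
        return {"id": example["id"], "model": self.model_name, "result": raw}


class EchoHandler(BaseHandler):
    """Stand-in subclass so the sketch runs without a model or an API key."""

    def inference(self, prompt: str, functions: List[Dict[str, Any]]) -> str:
        # Pretend the model always calls the first available function.
        names = [function["name"] for function in functions]
        return f"{names[0]}()" if names else ""
```

With this shape, an OSS handler and a proprietary handler would each override only `inference`, and instantiating `BaseHandler` directly raises `TypeError` because the abstract method is missing.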

Move eval_checker to bfcl/eval_checker

4121e51

Add benchmark module

12bdeed

Remove eval_data_compilation

837c767

- Test data compilation handled by `bfcl/types.py:LeaderboardCategories.load_data` method

devanshamin added 6 commits July 8, 2024 18:03

Remove poetry.lock

1e0004f

- poetry build system is no longer used

Add hugging face hub token

e52d531

Update build system

f0833ed

Move functionary from oss_model to proprietary_model

1e8da5a

- To allow for separate dependencies for oss and proprietary model

Fix type error

34a170a

Remove test category

f736521

- test category is already added to each example during loading the data

devanshamin force-pushed the refactor_bfcl branch from 1cd7222 to f736521 Compare July 8, 2024 18:04

Make eval_checker consistent with main branch by merging (Shishir…

893c9af

…Patil#496)

devanshamin force-pushed the refactor_bfcl branch 2 times, most recently from 4d8a86c to 58a0648 Compare July 9, 2024 01:05

devanshamin force-pushed the refactor_bfcl branch from 58a0648 to 88e8462 Compare July 9, 2024 01:53

devanshamin added 6 commits July 9, 2024 19:32

Improve test data downloading and saving model responses

cb7349a

- Load original json test data files
- Add `id` and `test_category` keys to each example
- Save model responses for each test category in a separate file
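The three steps in this commit message can be sketched roughly as follows. This is a sketch under stated assumptions, not the code from the commit: it assumes the test files are JSON Lines (one object per line), and the function names `load_examples` and `save_responses`, the `<category>_<index>` id format, and the `<category>.jsonl` output naming are all hypothetical.

```python
import json
from collections import defaultdict
from pathlib import Path
from typing import Dict, Iterable, List


def load_examples(path: Path, test_category: str) -> List[dict]:
    """Read one JSON Lines test file and tag every example (hypothetical helper)."""
    examples = []
    with path.open(encoding="utf-8") as fh:
        # Skip blank lines so the index counts real examples only.
        for index, line in enumerate(ln for ln in fh if ln.strip()):
            example = json.loads(line)
            example["id"] = f"{test_category}_{index}"  # assumed id format
            example["test_category"] = test_category
            examples.append(example)
    return examples


def save_responses(responses: Iterable[dict], out_dir: Path) -> Dict[str, Path]:
    """Write one JSON Lines file per test category and return the written paths."""
    grouped = defaultdict(list)
    for response in responses:
        grouped[response["test_category"]].append(response)

    out_dir.mkdir(parents=True, exist_ok=True)
    written = {}
    for category, rows in grouped.items():
        target = out_dir / f"{category}.jsonl"  # assumed naming scheme
        with target.open("w", encoding="utf-8") as fh:
            for row in rows:
                fh.write(json.dumps(row) + "\n")
        written[category] = target
    return written
```

Tagging each example at load time is what lets later steps group responses by `test_category` without passing the category around separately.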

Support benchmarking of proprietary models

a4a1c4f

Replaced with bfcl/benchmark.py

795d959

Add relevance evaluator

c7c5167

- Single cli entrypoint with subcommands to run benchmark and evaluation

Rename benchmark to llm_generation

1605012

Rename evaluate to evaluation

90a6bde
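A single CLI entrypoint with subcommands, as described in the commits above, could be sketched with `argparse` as below. The subcommand names `llm_generation` and `evaluation` follow the rename commits; the program name `bfcl`, the `--model` and `--test-group` flags, and the returned strings are assumptions for illustration only, and the PR may use a different CLI library.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    # "bfcl" as the program name is an assumption, as are all flags below.
    parser = argparse.ArgumentParser(prog="bfcl")
    subparsers = parser.add_subparsers(dest="command", required=True)

    generation = subparsers.add_parser(
        "llm_generation", help="Collect model responses for the selected tests"
    )
    generation.add_argument("--model", required=True)
    generation.add_argument("--test-group", default=None)

    evaluation = subparsers.add_parser(
        "evaluation", help="Score previously saved model responses"
    )
    evaluation.add_argument("--model", required=True)
    evaluation.add_argument("--test-group", default=None)
    return parser


def main(argv: Optional[List[str]] = None) -> str:
    # Dispatch on the chosen subcommand; a real tool would call into the
    # benchmark or evaluation code here instead of returning a string.
    args = build_parser().parse_args(argv)
    if args.command == "llm_generation":
        return f"generate responses with {args.model}"
    return f"evaluate responses of {args.model}"
```

Calling `main(["evaluation", "--model", "some-model"])` exercises the evaluation branch; omitting the subcommand makes `argparse` exit with a usage error.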

devanshamin added 10 commits July 11, 2024 12:41

Update sub-commands

fb0a599

Add evaluation for executable group

a42fd29

Standardize checker result

fa2694a

Convert checker from module to directory

09384e3

Add evaluation for ast group

7bd671e

Remove eval_checker dir

7c65495

Generate bfcl leaderboard result csv file

159039d

Fix issue of incorrect test category comparison

707e2bd

Update comments

e85ca86

Add new readme

3f73201

devanshamin force-pushed the refactor_bfcl branch from d7847fd to 3f73201 Compare July 13, 2024 18:49

Fix evaluation section

15b9c6a

HuanzhiMao changed the base branch from main to dev/huanzhi July 14, 2024 01:50

HuanzhiMao force-pushed the dev/huanzhi branch from 7bef000 to 897e068 Compare July 14, 2024 02:11

HuanzhiMao changed the base branch from dev/huanzhi to main July 15, 2024 06:16

update package dependency version

e0645b1

devanshamin mentioned this pull request Aug 1, 2024

[BFCL] Package the Codebase #565

Merged

devanshamin mentioned this pull request Aug 28, 2024

[BFCL] Refactor Model Handler into OSS and Proprietary Components #612

Merged
