This repository has been archived by the owner on Sep 30, 2023. It is now read-only.

Build issues on Mac #1

Closed
Dimitrije-V opened this issue Apr 11, 2023 · 7 comments

Comments

Dimitrije-V (Author) commented Apr 11, 2023

When building on macOS, it is not possible to use cmake -D CMAKE_EXE_LINKER_FLAGS="-static" .., as the build fails with:

ld: library not found for -lcrt0.o
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make[3]: *** [bin/codegen] Error 1
make[2]: *** [examples/codegen/CMakeFiles/codegen.dir/all] Error 2
make[1]: *** [examples/codegen

Instead, we need to use a plain cmake .. (macOS does not support fully statically linked executables, which is why the linker cannot find crt0.o).

Furthermore, I needed to amend ggml/examples/codegen/CMakeLists.txt so that it explicitly finds the Boost headers, by adding:

find_package(Boost REQUIRED)
include_directories(${Boost_INCLUDE_DIRS})

Finally, I had to change a few lines in ggml/examples/codegen/serve.cpp to stop further build errors:

Line 54: crow::json::wvalue response = {{"token","1"}, {"expires_at", 2600000000}, {"refresh_in",900}}; had to change to:
crow::json::wvalue response = {{"token","1"}, {"expires_at", static_cast<std::uint64_t>(2600000000)}, {"refresh_in",900}};

Line 191: {"logprobs", NULL} had to change to: {"logprobs", nullptr}

Line 198: {"prompt_tokens", embd_inp.size()}, had to change to: {"prompt_tokens", static_cast<std::uint64_t>(embd_inp.size())},

Line 199: {"total_tokens", n_past + embd_inp.size()} had to change to: {"total_tokens", static_cast<std::uint64_t>(n_past + embd_inp.size())}

Line 206: {"created", std::time(NULL)}, had to change to: {"created", static_cast<std::int64_t>(std::time(nullptr))},
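
For reference, all of these changes boil down to giving crow::json::wvalue unambiguous types: the 2600000000 literal overflows a 32-bit int, and the size_t / time_t expressions (plus NULL) apparently leave Apple clang without a clean conversion to pick. A minimal standalone sketch of the corrected constructions - not the actual layout of serve.cpp, and with the Crow include path and the embd_inp / n_past stand-ins assumed - looks something like this:

// Sketch only: mirrors the casts above, not the real serve.cpp structure.
// Assumes Crow's JSON header is available; serve.cpp may include crow_all.h instead.
#include <crow/json.h>
#include <cstdint>
#include <ctime>
#include <vector>

int main() {
    std::vector<int> embd_inp(29); // stand-in for the tokenized prompt
    int n_past = 0;                // stand-in for the past-token counter

    // Line 54: 2600000000 overflows a 32-bit int, so the width is spelled out.
    crow::json::wvalue auth = {
        {"token", "1"},
        {"expires_at", static_cast<std::uint64_t>(2600000000)},
        {"refresh_in", 900}};

    // Lines 191/198/199/206: nullptr instead of NULL, and explicit 64-bit casts
    // for the size_t / time_t expressions (unsigned long and long on 64-bit macOS).
    crow::json::wvalue usage = {
        {"logprobs", nullptr},
        {"prompt_tokens", static_cast<std::uint64_t>(embd_inp.size())},
        {"total_tokens", static_cast<std::uint64_t>(n_past + embd_inp.size())},
        {"created", static_cast<std::int64_t>(std::time(nullptr))}};

    return 0;
}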

After all of these changes, I was finally able to get this tool working. I've opened a PR with the fixes I made:
ravenscroftj/ggml#1

Dimitrije-V (Author) commented Apr 11, 2023

If this PR is reviewed and merged, I'm happy to write up detailed installation steps for Mac within the README.

dabdine commented Apr 12, 2023

@Dimitrije-V Thanks, this PR works on my end.

For those wondering about dependencies: you'll need cmake and Boost, which you can install with Homebrew:

brew install cmake boost

In terms of performance, my 14" 2021 M1 MacBook Pro returned a response via the API within about 20-30 seconds. When used with fauxpilot in an existing Python project, it took several minutes to complete a request; I didn't investigate why, but I assume it's due to the increased token count.

ravenscroftj (Owner) commented Apr 12, 2023

This is awesome, thank you @Dimitrije-V - I have merged your PR and really appreciate the contribution.

@dabdine thank you for reporting your performance numbers. I'd also be interested to know which model you were using (2B, 6B, or something else?) and how the -t switch affects performance on a Mac. GGML is fully ARM NEON compatible and should make good use of Apple silicon.

I opened issue #3 separately to track the slow completion for long inputs.

I will also add some notes to the BUILD.md with both of your observations.

Thanks again!

dabdine commented Apr 12, 2023

Great! I'll give it another shot today. Do you have any specific scenarios in mind for performance testing (number of threads, prompt, etc.)?

ravenscroftj (Owner) commented:

Thanks very much @dabdine. What I might do is write some proper benchmark scripts and standard prompts that can be run after compiling, but for now could I get you to try a short Python prompt - maybe something like the one below:

import os
import json

def main():
   """this is the main function that opens the file and loads the json data"""

And a longer Python prompt (perhaps you can load the convert ggml script from the repo and go to the bottom of the file).

I believe the 2021 M1 MBP has 6 "performance" cores, so maybe try -t 6 and see how that goes?

For reference I'm able to get sub 10 second generation on my AMD Ryzen 5000 for the first prompt and the 2B model with -t 6.

tectiv3 commented Apr 14, 2023

From an M2 Pro:

./bin/codegen -t 10 -m ../../models/codegen-6B-multi-ggml-4bit-quant-001.bin -p 'import os
import json

def main():
   """this is the main function that opens the file and loads the json data"""'
(22s 939ms)
main: seed = 1681443411
gptj_model_load: loading model from '../../models/codegen-6B-multi-ggml-4bit-quant-001.bin' - please wait ...
gptj_model_load: n_vocab = 51200
gptj_model_load: n_ctx   = 2048
gptj_model_load: n_embd  = 4096
gptj_model_load: n_head  = 16
gptj_model_load: n_layer = 33
gptj_model_load: n_rot   = 64
gptj_model_load: f16     = 2
gptj_model_load: ggml ctx size = 5269.92 MB
gptj_model_load: memory_size =  1056.00 MB, n_mem = 67584
gptj_model_load: ......................................... done
gptj_model_load: model size =  4213.84 MB / num tensors = 335
main: number of tokens in prompt = 29

import os
import json

def main():
   """this is the main function that opens the file and loads the json data"""

  path = 'test_data/output_data.json'
  file = open(path, 'r')
  data = json.load(file)

  #print data

  return data


if __name__ == '__main__':
  main()<|endoftext|>

main: mem per token = 18109648 bytes
main:     load time =  1242.47 ms
main:   sample time =     9.70 ms
main:  predict time =  8380.64 ms / 89.16 ms per token
main:    total time = 10096.66 ms

When I run with -t 12, it takes over 20 seconds.

./bin/codegen -t 8 -m ../../models/codegen-6B-multi-ggml-4bit-quant-001.bin -p 'import os
import json

def main():
   """this is the main function that opens the file and loads the json data"""'
(12s 872ms)
main: seed = 1681443538
gptj_model_load: loading model from '../../models/codegen-6B-multi-ggml-4bit-quant-001.bin' - please wait ...
gptj_model_load: n_vocab = 51200
gptj_model_load: n_ctx   = 2048
gptj_model_load: n_embd  = 4096
gptj_model_load: n_head  = 16
gptj_model_load: n_layer = 33
gptj_model_load: n_rot   = 64
gptj_model_load: f16     = 2
gptj_model_load: ggml ctx size = 5269.92 MB
gptj_model_load: memory_size =  1056.00 MB, n_mem = 67584
gptj_model_load: ......................................... done
gptj_model_load: model size =  4213.84 MB / num tensors = 335
main: number of tokens in prompt = 29

import os
import json

def main():
   """this is the main function that opens the file and loads the json data"""

    # open json file
    with open("/home/pi/raspi-weather/json/weather.json") as json_data:
        data = json.load(json_data)
        # print the data out
        print data

if __name__ == "__main__":
    main()
<|endoftext|>

main: mem per token = 18109616 bytes
main:     load time =  1242.58 ms
main:   sample time =    10.98 ms
main:  predict time =  4640.35 ms / 44.62 ms per token
main:    total time =  6138.13 ms

The results are wildly inconsistent though.

ravenscroftj (Owner) commented Apr 15, 2023

Thanks for sharing these results @tectiv3 - certainly food for thought.
