
build example/main.cpp as shared library and intercept token printing using FFI #8339


Open
mtasic85 wants to merge 1 commit into master

Conversation

mtasic85 (Contributor) commented Jul 6, 2024

build example/main.cpp as a shared library, libllama-cli.so
set custom STDOUT and STDERR file descriptors
set custom fprintf and fflush functions to intercept token generation in the shared library

This would allow FFI integration such as the following in Python:

from ctypes import *

#
# open shared library
#
lib = CDLL('./libllama-cli.so')
lib.llama_cli_main.argtypes = [c_int, POINTER(c_char_p)]
lib.llama_cli_main.restype = c_int

#
# redefine fprintf and fflush
#
@CFUNCTYPE(c_int, c_void_p, c_char_p, c_char_p)
def fprintf(file_obj, fmt, *args):
    content = fmt.decode('utf-8') % tuple(arg.decode('utf-8') for arg in args)
    print(content, flush=True, end='')
    size = len(content)
    return size


@CFUNCTYPE(c_int, c_void_p)
def fflush(file_obj):
    print(flush=True, end='')
    return 0


lib.llama_set_fprintf(fprintf)
lib.llama_set_fflush(fflush)

#
# generate and print token by token
#
argv: list[bytes] = [
    b'llama-cli',
    b'-m',
    b'models/7B/ggml-model.bin',
    b'--no-display-prompt',
    b'--simple-io',
    b'--log-disable',
    b'-p',
    b'What is cosmos?',
]

argc = len(argv)
argv = (c_char_p * argc)(*argv)
res = lib.llama_cli_main(argc, argv)
assert res == 0
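
The shared-library side that the snippet above relies on could look roughly like the sketch below. Only the exported names (llama_cli_main, llama_set_fprintf, llama_set_fflush) and the callback shape come from the Python example; the hook types, defaults, and the print_token_piece helper are assumptions for illustration, not the actual diff.

// Sketch of the hook mechanism in example/main.cpp (illustrative, not the PR diff).
#include <cstdio>

// Hook types matching the Python CFUNCTYPE(c_int, c_void_p, c_char_p, c_char_p)
// prototype: a stream, a format string, and a single string argument.
typedef int (*llama_fprintf_t)(FILE * stream, const char * fmt, const char * arg);
typedef int (*llama_fflush_t)(FILE * stream);

// Defaults fall back to regular stdio so the plain llama-cli binary behaves as before.
static int default_fprintf(FILE * stream, const char * fmt, const char * arg) {
    return std::fprintf(stream, fmt, arg);
}
static int default_fflush(FILE * stream) {
    return std::fflush(stream);
}

static llama_fprintf_t g_fprintf = default_fprintf;
static llama_fflush_t  g_fflush  = default_fflush;

// Setters called from the FFI side to install custom callbacks.
extern "C" void llama_set_fprintf(llama_fprintf_t fn) { g_fprintf = fn ? fn : default_fprintf; }
extern "C" void llama_set_fflush (llama_fflush_t  fn) { g_fflush  = fn ? fn : default_fflush;  }

// A token-printing call site inside the generation loop then routes through the
// hooks, so an FFI caller that installed callbacks sees every generated piece.
static void print_token_piece(const char * piece) {
    g_fprintf(stdout, "%s", piece);
    g_fflush(stdout);
}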

mtasic85 (Contributor, Author) commented Jul 7, 2024

This would allow wrapper libraries in other programming languages to avoid patching the llama.cpp source code (which is currently the case in a few WASM libraries/packages) and instead build a version of llama.cpp that is easier to integrate via their own FFI.

ggerganov (Member) commented:

This is not the intention of the example - if we accept this, then we will have to maintain this new library and be careful about things like backwards compatibility, etc. 3rd party projects should interface only with the llama and ggml libraries

ggerganov added the demo label (Demonstrate some concept or idea, not intended to be merged) on Jul 7, 2024
mtasic85 (Contributor, Author) commented Jul 7, 2024

There are four local functions defined/declared, and one function (llama_cli_main) that is conditionally defined when main.cpp is compiled as a shared library. All five functions are local to main.cpp and should not in any way affect llama / ggml.
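
In outline, the conditional entry point could look like the sketch below. The guard macro name (LLAMA_CLI_SHARED) and the llama_cli_impl helper are hypothetical; only the exported llama_cli_main symbol comes from the example above.

// Sketch of the conditional build of example/main.cpp (illustrative only).
static int llama_cli_impl(int argc, char ** argv) {
    // ... existing CLI logic from example/main.cpp ...
    return 0;
}

#ifdef LLAMA_CLI_SHARED
// Built as libllama-cli.so: expose the CLI entry point with C linkage for FFI callers.
extern "C" int llama_cli_main(int argc, char ** argv) {
    return llama_cli_impl(argc, argv);
}
#else
// Built as the regular llama-cli executable.
int main(int argc, char ** argv) {
    return llama_cli_impl(argc, argv);
}
#endif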

This can be seen as a few very high-level API calls that take care of most of the work.

On the other hand, integrating llama.cpp into existing projects would be much easier for someone who does not want to deal with the internals of llama / ggml. One could compile llama-cli (main.cpp), use the FFI library of their choice, and immediately use llama.cpp.

Instead of waiting on maintainers of llama.cpp bindings/wrappers to update their packages, we would depend only on llama.cpp and an FFI library (for Python that is usually ctypes or cffi).

mofosyne added the Review Complexity : Medium label (Generally require more time to grok but manageable by beginner to medium expertise level) on Jul 13, 2024
Labels: demo, examples, Review Complexity : Medium