Running an LLM locally is appealing because we can build applications without sending data to third-party services, which sidesteps data-privacy concerns.
This article explains how to deploy GPT4All on a Raspberry Pi and expose a REST API that other applications can call.
Installation
Preparation
- Raspberry Pi 4 (8 GB RAM model)
- Raspberry Pi OS
Reference
- https://qengineering.eu/install-vulkan-on-raspberry-pi.html
- https://github.com/nomic-ai/gpt4all/blob/main/gpt4all-bindings/python/README.md
- https://github.com/nomic-ai/gpt4all/issues/1530
Step-by-step instructions
Install Vulkan SDK
Install the dependencies we need:

```shell
sudo apt-get update
sudo apt-get upgrade -y
sudo apt-get install libxcb-randr0-dev libxrandr-dev -y
sudo apt-get install libxcb-xinerama0-dev libxinerama-dev libxcursor-dev -y
sudo apt-get install libxcb-cursor-dev libxkbcommon-dev xutils-dev -y
sudo apt-get install xutils-dev libpthread-stubs0-dev libpciaccess-dev -y
sudo apt-get install libffi-dev x11proto-xext-dev libxcb1-dev libxcb-*dev -y
sudo apt-get install libssl-dev libgnutls28-dev x11proto-dri2-dev -y
sudo apt-get install x11proto-dri3-dev libx11-dev libxcb-glx0-dev -y
sudo apt-get install libx11-xcb-dev libxext-dev libxdamage-dev libxfixes-dev -y
sudo apt-get install libva-dev x11proto-randr-dev x11proto-present-dev -y
sudo apt-get install libclc-dev libelf-dev mesa-utils -y
sudo apt-get install libvulkan-dev libvulkan1 libassimp-dev -y
sudo apt-get install libdrm-dev libxshmfence-dev libxxf86vm-dev libunwind-dev -y
sudo apt-get install libwayland-dev wayland-protocols -y
sudo apt-get install libwayland-egl-backend-dev -y
sudo apt-get install valgrind libzstd-dev vulkan-tools -y
sudo apt-get install git build-essential bison flex ninja-build -y
sudo apt-get install python3-mako -y
```
Build the Vulkan driver

```shell
# install meson
sudo pip3 install meson
# install mako
sudo pip3 install mako
# install v3dv
cd ~
git clone -b 20.3 https://gitlab.freedesktop.org/mesa/mesa.git mesa_vulkan
# build v3dv (± 30 min)
cd mesa_vulkan
CFLAGS="-mcpu=cortex-a72" \
CXXFLAGS="-mcpu=cortex-a72" \
meson --prefix /usr \
  -D platforms=x11 \
  -D vulkan-drivers=broadcom \
  -D dri-drivers= \
  -D gallium-drivers=kmsro,v3d,vc4 \
  -D buildtype=release build
ninja -C build -j4
sudo ninja -C build install
# check your driver
glxinfo -B
```
Build Python bindings
Set up llmodel

```shell
git clone --recurse-submodules https://github.com/nomic-ai/gpt4all.git
cd gpt4all/gpt4all-backend/
vim CMakeLists.txt
```
Find the line `set(LLAMA_KOMPUTE YES)` and comment it out (change it to `# set(LLAMA_KOMPUTE YES)`).
Save the file, and continue with the following commands:
```shell
mkdir build
cd build
cmake .. -DKOMPUTE_OPT_DISABLE_VULKAN_VERSION_CHECK=ON
cmake --build . --parallel
```

Make sure `libllmodel.*` exists in `gpt4all-backend/build`.
Set up the Python package

```shell
cd ../../gpt4all-bindings/python
pip3 install -e .
```
Test API Using Python
Now we can test GPT4All on the Pi using the following Python script. Note that `ask` must return the generated text, otherwise `answer` would be `None`:

```python
import time
from functools import wraps

from gpt4all import GPT4All


def timer(func):
    """A decorator that prints the execution time of the function it decorates."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start_time = time.time()
        result = func(*args, **kwargs)
        end_time = time.time()
        print(f"{func.__name__} ran in: {end_time - start_time:.2f} sec")
        return result
    return wrapper


@timer
def ask(prompt):
    model = GPT4All("orca-mini-3b-gguf2-q4_0.gguf")
    # Return the generated text so the caller can print it.
    return model.generate(prompt, max_tokens=512)


if __name__ == "__main__":
    answer = ask("The capital of France is ")
    print(answer)
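To expose the model as a REST API, as the introduction promises, one option is a minimal sketch built on Python's standard-library `http.server`. The route `/generate`, port `8000`, and the JSON shape (`{"prompt": ...}` in, `{"answer": ...}` out) are my own choices, not from the GPT4All project; the model call is isolated in `generate_fn` so the server logic stays independent of the model:

```python
# Minimal REST wrapper sketch using only the standard library.
# The endpoint name, port, and JSON fields are illustrative choices.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def make_handler(generate_fn):
    """Build a request handler that answers POST /generate with JSON."""
    class Handler(BaseHTTPRequestHandler):
        def do_POST(self):
            if self.path != "/generate":
                self.send_error(404)
                return
            length = int(self.headers.get("Content-Length", 0))
            body = json.loads(self.rfile.read(length) or b"{}")
            answer = generate_fn(body.get("prompt", ""))
            payload = json.dumps({"answer": answer}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)

        def log_message(self, *args):
            # Keep the console quiet on the Pi.
            pass

    return Handler


if __name__ == "__main__":
    # Load the model only when run directly (same model as the test script).
    from gpt4all import GPT4All

    model = GPT4All("orca-mini-3b-gguf2-q4_0.gguf")
    handler = make_handler(lambda p: model.generate(p, max_tokens=512))
    HTTPServer(("0.0.0.0", 8000), handler).serve_forever()
```

Other applications could then call it with something like `curl -X POST http://<pi-address>:8000/generate -d '{"prompt": "The capital of France is "}'`. For anything beyond a sketch, a framework such as Flask or FastAPI would be a more robust choice.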
Results
It takes the Raspberry Pi more than 2 minutes to generate a response with this 3B model. Still, that is quite respectable for such a tiny machine, isn't it?
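As a rough back-of-the-envelope estimate (both figures are assumptions: the script's `max_tokens=512` cap as the token count, and about 120 seconds as the run time), the throughput works out to only a few tokens per second:

```python
# Rough throughput estimate; 512 tokens and 120 s are assumed
# from the script's max_tokens cap and the ~2-minute runtime.
max_tokens = 512
seconds = 120
tokens_per_second = max_tokens / seconds
print(f"~{tokens_per_second:.1f} tokens/s")  # ~4.3 tokens/s
```

The real rate is likely lower, since a response rarely uses the full token budget.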