
EndAI

2 devlogs
10h 2m 19s

Run AI fully offline with ease! You can also use my custom-trained EndAI models!
This is for Lockin week 3.

This project uses AI

I have used AI for the frontend, for some bugs in the code that I was stuck on, and for some debugging. I used it especially before I wrote it in Python, in the old, abandoned C part.

Demo Repository


Bram (QKing)

EndAI – Devlog #2

Hello hello! This is the second devlog for EndAI.

What’s done so far:

Everything from the previous devlog.
I have also extended the web UI with several options.
I have also made two models that can be run in the web UI.
I have uploaded them to Hugging Face so you can all use them (one way to fetch them is sketched below the links).
https://huggingface.co/QKing-Official/EndAI
https://huggingface.co/QKing-Official/EndAI-Small
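
If you want to grab the GGUF files from those repos programmatically, something like this should work (a sketch using the huggingface_hub package; the file names are discovered at runtime rather than assumed):

```python
# Sketch: list and download the GGUF files from an EndAI repo on Hugging Face.
from huggingface_hub import hf_hub_download, list_repo_files

repo_id = "QKing-Official/EndAI-Small"
gguf_files = [f for f in list_repo_files(repo_id) if f.endswith(".gguf")]
for filename in gguf_files:
    path = hf_hub_download(repo_id=repo_id, filename=filename)
    print("Downloaded to", path)
```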

What’s next:

Read the comments and you will know! But I think I will leave this untouched until Stardance.

That’s it for now. I will continue coding, but on another project! Cya! Thanks for the support, and make sure to check out #coding-with-bram and #scared-of-ai!

Bram (QKing)

EndAI – Devlog #1

Hey! This is the first devlog for EndAI.

This project is a local AI chat system that runs GGUF models using llama.cpp through Python. The goal is to load models, chat with them, and manage sessions in one simple server.
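
To give an idea of what "running GGUF models using llama.cpp through Python" looks like, here is a minimal sketch using the llama-cpp-python bindings; the model path and settings are placeholders, not the exact EndAI code.

```python
# Minimal sketch of loading a GGUF model and chatting with it.
# The path and settings below are illustrative placeholders.
from llama_cpp import Llama

llm = Llama(
    model_path="models/endai.gguf",  # hypothetical path to a downloaded GGUF file
    n_ctx=4096,                      # context window size
)

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello! What can you do offline?"},
    ],
    max_tokens=256,
)
print(response["choices"][0]["message"]["content"])
```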

What’s done so far:

The backend is written in Python using Flask. It can load GGUF models and run them locally. It detects your hardware (CUDA, ROCm, Metal, or CPU) and automatically decides how to use it for better performance.
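
As a rough illustration of that kind of hardware detection (a sketch with assumed checks and defaults, not EndAI's actual logic):

```python
# Sketch: pick a llama.cpp backend and decide how many layers to offload.
# The checks and the -1 (= all layers) default are illustrative assumptions.
import platform
import shutil

def detect_backend() -> str:
    if platform.system() == "Darwin":
        return "metal"  # Apple GPUs via Metal
    if shutil.which("nvidia-smi"):
        return "cuda"   # NVIDIA GPU present
    if shutil.which("rocminfo") or shutil.which("rocm-smi"):
        return "rocm"   # AMD GPU present
    return "cpu"

def gpu_layers_for(backend: str) -> int:
    # Offload all layers when a GPU backend is available, none on plain CPU.
    return -1 if backend != "cpu" else 0

backend = detect_backend()
print(f"backend={backend}, n_gpu_layers={gpu_layers_for(backend)}")
```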

You can load and unload models without restarting the server. There is also a basic downloader to fetch models in the background and track progress while they download.
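
A background downloader with progress tracking can be done with a thread and streamed HTTP requests; here is a hedged sketch (the URL, paths, and progress dict are assumptions, not the actual EndAI downloader):

```python
# Sketch: download a model in the background and track progress.
import threading
import requests

progress = {}  # destination path -> fraction downloaded (0.0 to 1.0)

def download(url: str, dest: str) -> None:
    with requests.get(url, stream=True, timeout=30) as resp:
        resp.raise_for_status()
        total = int(resp.headers.get("content-length", 0))
        done = 0
        with open(dest, "wb") as f:
            for chunk in resp.iter_content(chunk_size=1 << 20):  # 1 MiB chunks
                f.write(chunk)
                done += len(chunk)
                if total:
                    progress[dest] = done / total

# Daemon thread so the Flask server stays responsive while downloading.
threading.Thread(
    target=download,
    args=("https://example.com/model.gguf", "models/model.gguf"),  # placeholder URL/path
    daemon=True,
).start()
```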

I added multiple prompt templates so different model formats work properly (like ChatML, Llama 2, Llama 3, Mistral, Alpaca). There is also a simple token counter and a system to trim long chats so the model does not run out of context.
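
To show the idea behind templates and trimming, here is a simplified sketch using ChatML formatting and a crude token estimate (both are assumptions for illustration, not the EndAI implementation):

```python
# Sketch: format a chat as ChatML and trim old turns to fit a token budget.
def to_chatml(messages: list[dict]) -> str:
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>" for m in messages]
    parts.append("<|im_start|>assistant\n")  # cue the model to answer
    return "\n".join(parts)

def rough_tokens(text: str) -> int:
    return len(text) // 4  # crude estimate (~4 characters per token)

def trim(messages: list[dict], budget: int = 3500) -> list[dict]:
    # Keep the system prompt, drop the oldest turns until the prompt fits.
    system, rest = messages[:1], messages[1:]
    while rest and rough_tokens(to_chatml(system + rest)) > budget:
        rest = rest[1:]
    return system + rest
```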

Chat sessions are saved in a JSON file so you don’t lose your conversations.
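
Persisting sessions like that can be as simple as this sketch (the file name and session layout are assumptions, not necessarily EndAI's format):

```python
# Sketch: save and load chat sessions from a JSON file.
import json
from pathlib import Path

SESSIONS_FILE = Path("sessions.json")  # hypothetical file name

def load_sessions() -> dict:
    if SESSIONS_FILE.exists():
        return json.loads(SESSIONS_FILE.read_text())
    return {}

def save_sessions(sessions: dict) -> None:
    SESSIONS_FILE.write_text(json.dumps(sessions, indent=2))

sessions = load_sessions()
sessions.setdefault("default", []).append({"role": "user", "content": "Hi!"})
save_sessions(sessions)
```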

What’s next:

More stability and cleanup. The system works, but it still needs polishing and better structure before I expand it further.

Why it took longer:

This project took longer than expected because llama.cpp didn’t behave properly at first. I also mixed different languages and approaches while building it, so I had to rewrite parts of the code from scratch after realizing it was getting messy and inconsistent. Basically a lot of trial, error, and fixing my own confusion.

That’s it for now. I will continue coding! Cya!
