I have a SQL database, and a schema document that goes with it, is there any kind of AI I can self host and train it on this data?
The goal would be to ask it simple questions like “from which table in [dbname] can I find a list of products I’ve sold in the last x days”
Would be even better if I could ask it to write some queries to find exactly what I’m after.
What kind of hardware would I need to run something like this, if it exists?
Vector embeddings with ChromaDB. Basically you pre compute the word embeddings of every row / table / whatever granularity you want and then stick that into a vector DB. Then you do an embedding computation of your query and compare similarity. You can either return the table / row / whatever you want that’s most similar (“semantic search”) or you use that as context for an LLM (“RAG”)
You could use something like https://huggingface.co/defog/sqlcoder-34b-alpha / https://github.com/defog-ai/sqlcoder however I haven’t used this one myself but it builds up on CodeLlama model so the quality should be good. I haven’t seen other models specifically for SQL queries yet.
Edit: it includes all the hardware requirements and some demo examples in the links
New Lemmy Post: Self hosted AI to train on an existing database (https://lemmy.world/post/9013086)
Tagging: #SelfHosted(Replying in the OP of this thread (NOT THIS BOT!) will appear as a comment in the lemmy discussion.)
I am a FOSS bot. Check my README: https://github.com/db0/lemmy-tagginator/blob/main/README.md