Self-hosted + self-trained LLMs are probably the future for enterprise.
While consumers are happy to get their data mined to avoid paying, businesses are the opposite: willing to pay a lot to avoid feeding data to MSFT/GOOG/META.
They may give assurances on data protection (even here, GitHub Copilot's TOS has sketchy language around saving derived data), but they can't get around the fundamental problem that their products need user interactions to work well.
So it seems that with BigTechLLM there's an inherent tension between product competitiveness and data privacy, which makes them incompatible with enterprise.
Biz ideas along these lines:

- Help enterprises set up, train, and maintain their own customized LLMs
- Security, compliance, and monitoring tools
- Help AI startups get compliant with enterprise security
- A fine-tuning service
I'm always interested in seeing the prompt that drives these kinds of tools.
In this case it appears to be using RetrievalQA from LangChain, which I think is this prompt here: https://github.com/hwchase17/langchain/blob/v0.0.176/langcha...
    Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

    {context}

    Question: {question}
    Helpful Answer:
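If that's right, the wiring is roughly the sketch below. This is my reading of the LangChain API, not this project's actual code; the embedding model, vector store, and model path are illustrative assumptions.

```python
# Sketch of the RetrievalQA flow (LangChain ~0.0.17x API).
# The "stuff" chain type pastes the retrieved chunks into {context}
# in the prompt quoted above.
from langchain.chains import RetrievalQA
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.llms import GPT4All
from langchain.vectorstores import Chroma

embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")   # assumed embedding model
db = Chroma(persist_directory="db", embedding_function=embeddings)  # assumed persisted index
llm = GPT4All(model="./models/ggml-gpt4all-j-v1.3-groovy.bin")      # assumed local model path

qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",  # fills {context} with the retrieved chunks verbatim
    retriever=db.as_retriever(search_kwargs={"k": 4}),  # top-4 chunks by similarity
)
print(qa.run("What does the contract say about termination?"))
```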
"System requirements" section should really mention what amount of RAM or VRAM is needed for inference.
These are the similar projects I've come across:
- [GitHub - e-johnstonn/BriefGPT: Locally hosted tool that connects documents to LLMs for summarization and querying, with a simple GUI.](https://github.com/e-johnstonn/BriefGPT)
- [GitHub - go-skynet/LocalAI: Self-hosted, community-driven, local OpenAI-compatible API. Drop-in replacement for OpenAI running LLMs on consumer-grade hardware. No GPU required. LocalAI is a RESTful API to run ggml compatible models: llama.cpp, alpaca.cpp, gpt4all.cpp, rwkv.cpp, whisper.cpp, vicuna, koala, gpt4all-j, cerebras and many others!](https://github.com/go-skynet/LocalAI)
- [GitHub - paulpierre/RasaGPT: RasaGPT is the first headless LLM chatbot platform built on top of Rasa and Langchain. Built w/ Rasa, FastAPI, Langchain, LlamaIndex, SQLModel, pgvector, ngrok, telegram](https://github.com/paulpierre/RasaGPT)
- [GitHub - imartinez/privateGPT: Interact privately with your documents using the power of GPT, 100% privately, no data leaks](https://github.com/imartinez/privateGPT)
- [GitHub - reworkd/AgentGPT: Assemble, configure, and deploy autonomous AI Agents in your browser.](https://github.com/reworkd/AgentGPT)
- [GitHub - deepset-ai/haystack: Haystack is an open source NLP framework to interact with your data using Transformer models and LLMs (GPT-4, ChatGPT and alike). Haystack offers production-ready tools to quickly build complex question answering, semantic search, text generation applications, and more.](https://github.com/deepset-ai/haystack)
- [PocketLLM « ThirdAi](https://www.thirdai.com/pocketllm/)
- [GitHub - imClumsyPanda/langchain-ChatGLM: langchain-ChatGLM, local knowledge based ChatGLM with langchain | ChatGLM Q&A over a local knowledge base](https://github.com/imClumsyPanda/langchain-ChatGLM)
Got this working locally. It badly needs GPU support (I have a 3090, so come on!); there is a workaround, but I expect it will land pretty soon. This video was a useful walkthrough, especially on using a different model and upping the CPU threads: https://www.youtube.com/watch?v=A3F5riM5BNE
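For reference, the thread count is typically just a parameter on the llama.cpp wrapper. A minimal sketch, assuming LangChain's LlamaCpp class; the model path and values are illustrative, not what this project ships with:

```python
# Sketch: pointing LangChain's llama.cpp wrapper at a different model
# and raising the CPU thread count. Path and values are illustrative.
from langchain.llms import LlamaCpp

llm = LlamaCpp(
    model_path="./models/ggml-vicuna-7b.bin",  # assumed local model file
    n_threads=8,   # roughly match your physical core count
    n_ctx=1024,    # context window size
)
print(llm("Say hello."))
```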
I tried this on my M2 MacBook with 16GB of RAM but got:
"ggml_new_tensor_impl: not enough space in the context's memory pool (needed 18296202768, available 18217606000)"
One quick plug:
I want to have the memory part of LangChain down: a vector store + local database + a client to chat with an LLM (the gpt4all model can be swapped for the OpenAI API just by switching the base URL).
https://github.com/aldarisbm/memory
It's still got a ways to go; if someone wants to help, let me know :)
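For anyone curious, the base-URL swap looks roughly like this. A sketch assuming a local OpenAI-compatible server (e.g. LocalAI) on port 8080 and the pre-1.0 openai client; the model name is whatever your server exposes:

```python
# Sketch: pointing the openai client (pre-1.0 API) at a local
# OpenAI-compatible server instead of api.openai.com.
import openai

openai.api_base = "http://localhost:8080/v1"  # assumed local server address
openai.api_key = "not-needed-locally"         # placeholder; local servers usually ignore it

resp = openai.ChatCompletion.create(
    model="ggml-gpt4all-j",  # illustrative name exposed by the local server
    messages=[{"role": "user", "content": "Summarize my notes on Q3 planning."}],
)
print(resp["choices"][0]["message"]["content"])
```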
Working on something similar that uses keyterm extraction for traversal of topics and fragments, without using Langchain. It's not designed to be private, however: https://github.com/FeatureBaseDB/DocGPT/tree/main
Wow. I keep a personal wiki and a journal, and I use plain-text accounting...
This project could help me create a personal AI that answers any question about my life, finances, or knowledge...
Quick how-to/demo:
https://www.youtube.com/watch?v=A3F5riM5BNE
It also suggests a few alternative models to use.
Hi, very interesting... what are the memory/disk requirements to run it? Would 16GB of RAM be enough? I suggest adding these requirements to the README.
Would someone do me the kindness of explaining (a little more) how this works?
It looks like you can ask a question and the model will use its combined knowledge of all your documents to figure out the answer. It looks like it isn't fine-tuned or trained on all the documents, is that right? How is each document turned into an embedding, and then how does the model figure out which documents to consult to answer the question?
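My rough mental model of the pattern, as a sketch of generic embedding retrieval rather than this project's exact code (the model name and texts are illustrative): each chunk is embedded once at ingest time, the question is embedded the same way at query time, and the nearest chunks are selected by similarity. No training involved.

```python
# Rough sketch of embedding-based retrieval: no fine-tuning involved.
# Chunks and the question are embedded into the same vector space;
# the nearest chunks get stuffed into the LLM prompt as context.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

chunks = [
    "The lease term is 24 months starting January 2023.",
    "Payment is due on the first of each month.",
    "The office is located at 123 Example St.",
]
chunk_vecs = model.encode(chunks, normalize_embeddings=True)

question = "When does the lease start?"
q_vec = model.encode([question], normalize_embeddings=True)[0]

# Cosine similarity reduces to a dot product on normalized vectors.
scores = chunk_vecs @ q_vec
top_k = np.argsort(scores)[::-1][:2]           # indices of the 2 best chunks
context = "\n".join(chunks[i] for i in top_k)  # becomes {context} in the prompt
print(context)
```

A vector store like Chroma just persists those embeddings and does the nearest-neighbor search at scale.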
When you split a document into chunks, doesn't some crucial information get cut in half? In that case, you'd probably lose it from the retrieved context if it sits next to irrelevant text that drags down the cosine similarity. Is there a "smarter" way to feed documents as context to LLMs?
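One partial mitigation I've seen is overlapping chunk windows, so a fact cut at one boundary survives whole in the neighboring chunk. A sketch with LangChain's splitter (the sizes are arbitrary):

```python
# Overlapping chunks: each chunk repeats the tail of the previous one,
# so sentences straddling a boundary appear whole in at least one chunk.
from langchain.text_splitter import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,     # characters per chunk (arbitrary choice)
    chunk_overlap=100,  # characters repeated between adjacent chunks
)
chunks = splitter.split_text(open("document.txt").read())
```

It's not a full fix; people also use sentence-aware splitting, or retrieve a matched chunk's neighbors along with it.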
This will still hallucinate, right?
Projects like this for use with your own document collections are invaluable, but everything I've tried so far hallucinates, so it's not practical. What's the state of the art for LLMs without hallucination at the moment?
This is a shortcut/workaround to transforming the private docs into a prompt:answer dataset and fine-tuning, right?
What would be the difference in user experience or information retrieval performance between the two?
My impression is that it saves work on the dataset transformation and compute for fine-tuning, so it must be less performant. Is there a reason to prefer the strategy here other than ease of setup?
Does something like this exist for local code repos? (Excuse my ignorance since the space is moving faster than light.)
With so many LLM options out there, how do we keep track of which ones are good?
For some reason, downloading the model they suggest keeps failing. I tried to download it in Firefox and Edge. I'm using Windows, if that matters. Anyone else seeing similar issues?
Is there a benchmark for retrieval from multiple ft documents? I tried LangChain QA with Pinecone and wasn't impressed with the search results when using it on my Zotero library.
How many tokens/second on an average machine?
If you select a gpt4all model like GPT-J, can this be used commercially, or is there another dependency that limits the license?
Would this work better with something like LLaMA, or with an instruction-following model like Alpaca?
So many good links here, thanks to the OP for sharing, and to all commenters as well!
Does this only work with llama.cpp? I.e., you can't use GPU models with this?
I've always wondered about the pros and cons of Chroma vs. Qdrant. Can someone explain?
This is the future.
> Put any and all your files into the source_documents directory
Why? Why can't I define any directory (my existing Obsidian vault, for example) as the source directory?
I posted it 9 days ago and somehow this one gets the attention. The same freaking post. Unbelievable.
Granted, I'm not coming from the Python world, but I have tried many of these projects, and very few of them install out of the box. They usually end with some incompatibility and files scattered all over the place, leading to future nightmares.
Just for fun, here's the result of `python -m pip install -r ./requirements.txt` for tortoise-tts: …many many lines
… I'm not asking for support, just saying that if people really want to make something 'easy', they'd use Docker. I gather there are better Python package managers, but that's a bit of a mess too. Someone is thinking "this is part of learning the language," but I think it's just bad design.