For example, we're building an LLM chatbot that pulls in a technical book publisher's catalog: 20 years of technical books and 20 years of videotaped conference talks.
The hard parts:
- We're using LangChain, which isn't always great
- The data pipeline was trickier than I had initially thought
- Indexing embeddings (in Postgres) is just hard and eats a ton of RAM (rough sketch of what I mean right after this list)
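
On the indexing point, this is a minimal sketch of the step where the memory pressure shows up. The table name `chunks`, the column, and the connection string are placeholders, and it assumes pgvector 0.5+ for HNSW; the gist is that the index build wants to fit in maintenance_work_mem or it crawls.

```python
# Sketch: building a pgvector HNSW index from Python (psycopg2).
# Table/column/connection details are made up for illustration.
import psycopg2

conn = psycopg2.connect("dbname=books user=app")  # placeholder DSN
conn.autocommit = True
cur = conn.cursor()

# The build is where RAM matters: if the HNSW graph doesn't fit in
# maintenance_work_mem, the build slows down dramatically.
cur.execute("SET maintenance_work_mem = '4GB';")
cur.execute("""
    CREATE INDEX IF NOT EXISTS chunks_embedding_idx
    ON chunks
    USING hnsw (embedding vector_cosine_ops);
""")

cur.close()
conn.close()
```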
But the hardest thing has been conversation quality. We've started using LangSmith, which came out fairly recently and has been a godsend for tracing and observability. Still, it's not perfect, and I wish there were better tools out there.
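
To be concrete about the LangSmith piece, most of the setup is environment variables plus a decorator for code that lives outside the chains. This is a sketch: the project name and the function are made up, and it assumes the API key is already in the environment.

```python
# Sketch: turning on LangSmith tracing for our app code.
import os
from langsmith import traceable

os.environ["LANGCHAIN_TRACING_V2"] = "true"            # enable tracing
os.environ["LANGCHAIN_PROJECT"] = "publisher-chatbot"  # placeholder project name
# LANGCHAIN_API_KEY is assumed to be set in the environment already

@traceable(run_type="chain", name="answer_question")
def answer_question(question: str) -> str:
    # Retrieval + generation would go here; nested LangChain calls get
    # traced under this run in the LangSmith UI.
    return "stub answer"

answer_question("Which talks cover Postgres indexing?")
```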
Chain of thought is underutilized. It almost never makes sense to show the user the "bare" response of the LLM. It's so easy to have LLMs self-critique, think through user intent, etc., and that drastically improves the final output.
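
A minimal sketch of that draft/critique/revise loop is below. The model name and prompts are placeholders, and in our setup this would sit inside a chain rather than as raw OpenAI calls; the pattern, where only the final pass reaches the user, is the point.

```python
# Sketch: hide the intermediate chain-of-thought passes from the user.
from openai import OpenAI

client = OpenAI()

def _complete(system: str, user: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
    )
    return resp.choices[0].message.content

def answer(question: str) -> str:
    # Pass 1: bare draft answer (never shown to the user).
    draft = _complete("Answer the user's question about our book catalog.", question)
    # Pass 2: self-critique against the user's likely intent.
    critique = _complete(
        "Critique the draft: does it address the user's intent? Anything missing or wrong?",
        f"Question: {question}\n\nDraft: {draft}",
    )
    # Pass 3: revision is the only thing the user ever sees.
    return _complete(
        "Rewrite the draft answer, fixing the issues raised in the critique.",
        f"Question: {question}\n\nDraft: {draft}\n\nCritique: {critique}",
    )
```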