Automatic schema inference eliminates the need to manually define input and output schemas.
Preconfigured API endpoints, such as /invoke, /batch, and /stream, that can handle multiple requests simultaneously.
Monitoring
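To make the difference between the three endpoints concrete, here is a minimal sketch in plain Python. It is not the LangServe API: the class ToyRunnable and its methods are hypothetical stand-ins that only mimic what a call to /invoke, /batch, or /stream would return for a served chain.

```python
from typing import Callable, Iterator, List


class ToyRunnable:
    """Hypothetical stand-in for a served chain; not the LangServe API."""

    def __init__(self, fn: Callable[[str], str]) -> None:
        self.fn = fn

    def invoke(self, text: str) -> str:
        # Like /invoke: one input in, one complete output back.
        return self.fn(text)

    def batch(self, texts: List[str]) -> List[str]:
        # Like /batch: several inputs in one request, one output per input,
        # returned in the same order.
        return [self.fn(t) for t in texts]

    def stream(self, text: str) -> Iterator[str]:
        # Like /stream: the output is delivered piece by piece.
        # Here each whitespace-separated word is one chunk.
        for word in self.fn(text).split():
            yield word


shout = ToyRunnable(lambda s: s.upper())
print(shout.invoke("hello world"))
print(shout.batch(["a b", "c"]))
print(list(shout.stream("hello world")))
```

In a real deployment the chunks from a streaming endpoint would typically be model tokens arriving over HTTP rather than words from a finished string.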
LangServe can be integrated with LangSmith tracing for real-time monitoring capabilities such as:
Monitor performance metrics, debug issues, and gain insights into application behavior.
Keep applications at a high level of performance.
LangServe provides a playground for both technical and non-technical users to interact with and test the application: it supports streaming outputs, logging of intermediate steps, and configurable options for fine-tuning applications. LangServe also automatically generates API documentation.
Deployment with LangServe can be done using GitHub for one-click deployment and is compatible with multiple hosting platforms such as Google Cloud and Replit.
Key Components of LlamaIndex
LlamaIndex adds RAG functionality to LLM applications by using external knowledge sources, databases, and indexes as query engines that serve as a form of memory.
Typical LlamaIndex Workflow
Indexing phase
During this stage, your private data is effectively converted into a searchable vector index. LlamaIndex can process various types of data, including unstructured text documents, structured database records, and knowledge graphs.
The data is transformed into numerical embeddings that capture its semantic meaning, allowing for rapid similarity searches. This stage ensures that all relevant information is indexed and ready for retrieval.
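The idea of the indexing phase can be sketched without any library. The example below is a toy, not LlamaIndex code: embed, build_index, and ToyIndex are hypothetical names, and the "embedding" is just a normalised word-count vector. A real embedding model produces learned vectors that capture meaning, which word counts do not.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ToyIndex:
    vocab: List[str]                 # one vector dimension per known word
    vectors: Dict[str, List[float]]  # document id -> embedding


def embed(text: str, vocab: List[str]) -> List[float]:
    """Toy embedding: unit-length word counts over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    vec = [float(counts[word]) for word in vocab]
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def build_index(docs: Dict[str, str]) -> ToyIndex:
    """Turn every document into a vector so it can be searched later."""
    vocab = sorted({w for text in docs.values() for w in text.lower().split()})
    vectors = {doc_id: embed(text, vocab) for doc_id, text in docs.items()}
    return ToyIndex(vocab=vocab, vectors=vectors)


docs = {
    "a": "cats chase mice",
    "b": "dogs chase cats",
    "c": "stocks fell today",
}
index = build_index(docs)
print(len(index.vocab), "dimensions,", len(index.vectors), "documents indexed")
```

Once every document is a vector of the same length, a similarity search reduces to comparing the query vector against the stored ones.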
Storage
Once you've loaded and indexed data, you'll want to store it to avoid the time and cost of re-indexing it. By default, indexed data is stored in memory only, but there are ways to persist it for future use.
The simplest approach is the .persist() method, which writes all data to disk at a specified location. For example, after creating an index, you can call .persist() to save it to a directory.
To reload persisted data, you rebuild the storage context from the saved directory and then load the index using this context. This restores the stored index without re-indexing, saving time and computing resources.
You can learn how to do this in our complete LlamaIndex tutorial.
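The persist-then-reload pattern itself is simple, as the following sketch shows. It uses only the Python standard library and a JSON file; persist_index, load_index, and the file name toy_index.json are hypothetical and are not the LlamaIndex storage format or API.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List


def persist_index(vectors: Dict[str, List[float]], persist_dir: Path) -> Path:
    """Write the in-memory index to disk so it survives a restart."""
    persist_dir.mkdir(parents=True, exist_ok=True)
    path = persist_dir / "toy_index.json"
    path.write_text(json.dumps(vectors))
    return path


def load_index(persist_dir: Path) -> Dict[str, List[float]]:
    """Rebuild the index from disk instead of re-embedding every document."""
    return json.loads((persist_dir / "toy_index.json").read_text())


vectors = {"doc-1": [0.1, 0.9], "doc-2": [0.8, 0.2]}

# A temporary directory keeps the example from leaving files behind.
with tempfile.TemporaryDirectory() as tmp:
    storage = Path(tmp) / "storage"
    persist_index(vectors, storage)
    restored = load_index(storage)

print(restored == vectors)
```

The saving comes from skipping the embedding step on reload: reading a file is far cheaper than calling an embedding model for every document again.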
Vector stores
Vector stores are useful for storing embeddings created during the indexing process.
Embeddings
By default, LlamaIndex uses OpenAI's text-embedding-ada-002 model to generate these embeddings. Depending on the LLM used, different embeddings may be preferable for efficiency and computational cost.
The VectorStoreIndex converts all text into embeddings by calling the embedding model's API. At query time, the input query is also converted into an embedding, and the stored embeddings are ranked by their similarity to it. The index returns the top k most similar embeddings as text fragments.
To retrieve the most relevant data, a method known as "top-k semantic retrieval" is used.
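Top-k retrieval can be sketched in a few lines. The example assumes cosine similarity as the ranking measure and uses hand-written three-dimensional vectors in place of real embeddings; the function names cosine and top_k and the sample data are hypothetical, not part of LlamaIndex.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; defined as 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec: List[float],
          index: Dict[str, List[float]],
          k: int = 2) -> List[Tuple[str, float]]:
    """Score every stored vector against the query and keep the k best."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hand-written stand-ins for stored embeddings.
index = {
    "cats": [0.9, 0.1, 0.0],
    "dogs": [0.8, 0.3, 0.0],
    "stocks": [0.0, 0.1, 0.9],
}
query = [1.0, 0.0, 0.0]

results = top_k(query, index, k=2)
print([doc_id for doc_id, _ in results])
```

This sketch scores every stored vector on each query, which is fine for small collections; dedicated vector stores usually rely on approximate nearest-neighbour search to keep this fast at larger scale.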
If embeddings are already created and stored, you can load them directly from the vector store, without needing to reload documents or re-create the index.
A summary index is a simpler form of indexing that is better suited to generating summaries from text documents. It stores all documents and returns all of them to the query engine.