# Retrieval Augmented Generation

The IONOS AI Model Hub allows you to combine Large Language Models and a vector database to implement Retrieval 
Augmented Generation use cases. 

Retrieval Augmented Generation is an approach that lets an existing Large Language Model, such as Llama
or Mistral, answer not only from the knowledge it learned during training but also from knowledge you provide yourself.

Retrieval Augmented Generation uses two components: 
* a Large Language Model (we offer corresponding models for [<mark style="color:blue;">text generation</mark>](https://docs.ionos.com/cloud/ai/ai-model-hub/how-tos/text-generation)) and
* [<mark style="color:blue;">Document Collections</mark>](https://docs.ionos.com/cloud/ai/ai-model-hub/how-tos/document-collections)

When a user queries your Retrieval Augmented Generation system, you first retrieve the most similar documents from the
corresponding document collection. You then ask the Large Language Model to answer the query using both the knowledge
it was trained on and the retrieved documents.
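
To make the two steps concrete, here is a minimal offline sketch. It does not call the IONOS API: the word-overlap ranking in `retrieve` only stands in for the similarity search of a document collection, and the function names `retrieve` and `build_prompt` as well as the prompt layout are hypothetical, chosen for illustration.

```python
def retrieve(query, documents, limit=1):
    """Toy retrieval: rank documents by the number of words shared with the query."""
    query_words = set(query.lower().split())

    def overlap(doc):
        return len(query_words & set(doc.lower().split()))

    # Highest overlap first; a real system would use vector similarity instead.
    return sorted(documents, key=overlap, reverse=True)[:limit]


def build_prompt(query, context_docs):
    """Combine the retrieved documents and the user query into one prompt."""
    context = "; ".join(context_docs)
    return f"Context: {context}\nQuestion: {query}\nAnswer:"


documents = ["IONOS hosts your AI workloads", "Bananas are yellow"]
query = "What does IONOS do?"

top_docs = retrieve(query, documents)   # step 1: find the most similar documents
prompt = build_prompt(query, top_docs)  # step 2: hand query and documents to the model
print(prompt)
```

The resulting prompt is what you would send to the Large Language Model; the remainder of this tutorial replaces both toy steps with calls to the AI Model Hub API.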

## Overview

This tutorial is intended for developers. It assumes you have basic knowledge of:

* REST APIs and how to call them
* A programming language to handle REST API endpoints (for illustration purposes, the tutorials use Python and Bash scripting)

You should also be familiar with the following IONOS features:

* [<mark style="color:blue;">Text Generation</mark>](https://docs.ionos.com/cloud/ai/ai-model-hub/how-tos/text-generation)
* [<mark style="color:blue;">Document Collections</mark>](https://docs.ionos.com/cloud/ai/ai-model-hub/how-tos/document-collections)

By the end of this tutorial, you'll be able to answer customer queries using a Large Language Model that
adds data from your document collections to its answers.

## Background

* The IONOS AI Model Hub API offers both document collections and Large Language Models that you can use to implement retrieval augmented generation without having to manage corresponding hardware yourself.
* Our AI Model Hub API provides all required functionality without your data being transferred out of Germany.

### Prerequisite: Access the API token from an environment variable

We strongly suggest that you save your IONOS API token as an environment variable in your operating system. You can then access it using the following lines of code:

```python
from dotenv import load_dotenv
import os

load_dotenv()
IONOS_API_TOKEN = os.getenv('IONOS_API_TOKEN')
```
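
If the variable is not set, `os.getenv` returns `None`, and the problem only surfaces later when a request is rejected. The following sketch fails early instead. The helpers `require_token` and `auth_header` are hypothetical names introduced for illustration; they are not part of any IONOS library.

```python
import os


def require_token(env=os.environ, name="IONOS_API_TOKEN"):
    """Return the token stored under `name`, or raise if it is missing or empty."""
    token = env.get(name, "").strip()
    if not token:
        raise RuntimeError(f"Environment variable {name} is not set")
    return token


def auth_header(token):
    """Build the request headers used throughout this tutorial."""
    return {"Authorization": f"Bearer {token}", "Content-Type": "application/json"}


# Example with an explicit mapping instead of the real environment:
example_header = auth_header(require_token({"IONOS_API_TOKEN": "example-token"}))
print(example_header["Authorization"])
```

In your own code, call `require_token()` without arguments so that it reads the real environment.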

#### Before you begin

To get started, set up a document collection using [<mark style="color:blue;">Document Collections</mark>](https://docs.ionos.com/cloud/ai/ai-model-hub/how-tos/document-collections) and get the **identifier** of this document collection.

You will need this **identifier** in the subsequent steps. Set the variable `COLLECTION_ID` to this identifier. To make this documentation self-contained, we generate a minimal document collection and use its identifier in the remainder of this document:

```python
import requests
import base64

header = { "Authorization": f"Bearer {IONOS_API_TOKEN}", "Content-Type": "application/json" }

# Create an empty document collection and keep its identifier
COLLECTION_NAME = 'test collection'
body = {"properties": {"name": COLLECTION_NAME }}
response = requests.post("https://inference.de-txl.ionos.com/collections", json=body, headers=header)
COLLECTION_ID = response.json()['id']

# Add one plain-text document; the content must be Base64-encoded
CONTENT = "IONOS hosts your AI workloads"
content_base64 = base64.b64encode(CONTENT.encode('utf-8')).decode("utf-8")
body = {"items": [{"properties": {"contentType": "text/plain", "name": "test", "content": content_base64} }] }
requests.put(f"https://inference.de-txl.ionos.com/collections/{COLLECTION_ID}/documents", json=body, headers=header)
```

```
<Response [200]>
```
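
If you want to add several documents at once, the request body shown above can be assembled with a small helper. This is a sketch under the assumption that the `items` list accepts more than one entry of the same shape as above; `make_document_item` is a hypothetical name, not an IONOS function.

```python
import base64


def make_document_item(name, text):
    """Build one entry for the "items" list, with the text Base64-encoded."""
    encoded = base64.b64encode(text.encode("utf-8")).decode("utf-8")
    return {"properties": {"contentType": "text/plain", "name": name, "content": encoded}}


# Illustrative sample texts keyed by document name
texts = {
    "hosting": "IONOS hosts your AI workloads",
    "region": "Data stays in Germany",
}
example_body = {"items": [make_document_item(name, text) for name, text in texts.items()]}

# Decoding the stored content returns the original text
first_content = example_body["items"][0]["properties"]["content"]
print(base64.b64decode(first_content).decode("utf-8"))
```

You would then send `example_body` with the same `requests.put` call as in the cell above.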

### Step 1: Retrieve Available Models

Fetch a list of models to see which are available for your use case:

```python
import requests

endpoint = "https://inference.de-txl.ionos.com/models"

header = {
    "Authorization": f"Bearer {IONOS_API_TOKEN}",
    "Content-Type": "application/json"
}
requests.get(endpoint, headers=header).json()
```

```
{'href': 'https://inference.de-txl.ionos.com/models',
 'id': '32bec5d9-6174-5f08-8ed1-881e226ccf74',
 'items': [{'id': '0b6c4a15-bb8d-4092-82b0-f357b77c59fd',
   'metadata': {'createdDate': '2024-12-04T14:07:23.474Z',
    'lastModifiedDate': '2024-12-04T14:07:23.477Z'},
   'properties': {'category': 'text/nlp',
    'description': 'meta-llama/Meta-Llama-3.1-405B-Instruct-FP8',
    'name': 'meta-llama-3-1-405b-instruct-fp8'},
   'type': 'model'},
  {'id': '8965182de-4f5f-47cd-98c7-f71a6330fc6bb',
   'metadata': {'createdDate': '2024-12-04T14:07:23.505Z',
    'lastModifiedDate': '2024-12-04T14:07:23.508Z'},
   'properties': {'category': 'text/nlp',
    'description': 'meta-llama/Meta-Llama-3.1-70B-Instruct',
    'name': 'meta-llama-3-70b-instruct'},
   'type': 'model'},
  {'id': 'b2a34c2d-82a0-42db-949d-5c11197b0f65',
   'metadata': {'createdDate': '2024-12-04T14:07:23.557Z',
    'lastModifiedDate': '2024-12-04T14:07:23.561Z'},
   'properties': {'category': 'text/nlp',
    'description': 
```

This query returns a JSON document listing all foundation models and their meta information.

The JSON document contains an entry **items**. This is a list of all available foundation models. Of the seven attributes per foundation model, three are relevant for you:
* **id**: The identifier of the foundation model 
* **properties.description**: The textual description of the model
* **properties.name**: The name of the model

**Note:** The identifiers for the foundation models differ between our API for Retrieval Augmented Generation and the
OpenAI-compatible endpoints for image generation and text generation.

From the list you generated in the previous step, choose the model you want to use and note its **id**. You will use this **id**
in the next step to call the foundation model.
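
Instead of copying the **id** by hand, you can look it up by model name. The sketch below assumes the response structure shown above (`items`, each with `id` and `properties.name`); `find_model_id` is a hypothetical helper, and the sample data is a shortened copy of the output above.

```python
def find_model_id(models_response, name):
    """Return the id of the model whose properties.name equals `name`, else None."""
    for item in models_response.get("items", []):
        if item.get("properties", {}).get("name") == name:
            return item["id"]
    return None


# Shortened copy of the response shown above
sample_response = {
    "items": [
        {
            "id": "0b6c4a15-bb8d-4092-82b0-f357b77c59fd",
            "properties": {"name": "meta-llama-3-1-405b-instruct-fp8"},
            "type": "model",
        },
    ]
}

model_id = find_model_id(sample_response, "meta-llama-3-1-405b-instruct-fp8")
print(model_id)
```

With a live response, pass the result of `requests.get(endpoint, headers=header).json()` as the first argument.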

## Manual retrieval augmented generation

This section shows how to use the document collection and the contained documents to answer a user query.

### Step 1: Retrieve relevant documents

To retrieve the documents relevant to answering the user query, invoke the **query** endpoint as follows:

```python
import requests
import base64

USER_QUERY = "What does IONOS do?"
NUM_OF_DOCUMENTS = 1

endpoint = f"https://inference.de-txl.ionos.com/collections/{COLLECTION_ID}/query"
body = {"query": USER_QUERY, "limit": NUM_OF_DOCUMENTS }
relevant_documents = requests.post(endpoint, json=body, headers=header)

# Each match carries the document content Base64-encoded; decode it back to text
relevant_documents_decoded = [
    base64.b64decode(entry['document']['properties']['content']).decode()
    for entry in relevant_documents.json()['properties']['matches']
]

relevant_documents_decoded
```

```
['IONOS hosts your AI workloads']
```

This will return a list of the `NUM_OF_DOCUMENTS` most relevant documents in your document collection for answering the user query. 

### Step 2: Generate final answer

Now, combine the user query and the result from the document collection in one prompt:

```python
import requests

MODEL_ID = '8965182de-4f5f-47cd-98c7-f71a6330fc6bb'
endpoint = f"https://inference.de-txl.ionos.com/models/{MODEL_ID}/predictions"
# The retrieved documents are joined with "; " and inserted as context
prompt = f"""
    <|begin_of_text|><|start_header_id|>system<|end_header_id|>
    Please use the information specified as context to answer the question.
    Formulate your answer in one sentence and be an honest AI.<|eot_id|>
    <|begin_of_text|><|start_header_id|>context<|end_header_id|>
    {"; ".join(relevant_documents_decoded)}<|eot_id|>
    <|start_header_id|>user<|end_header_id|>
    {USER_QUERY}<|eot_id|>
    <|start_header_id|>assistant<|end_header_id|>
"""
body = { "properties": {"input": prompt} }
response = requests.post(endpoint, json=body, headers=header)
response.json()['properties']['output']
```

```
'    IONOS hosts AI workloads.'
```

The result is a JSON document consisting of the answer to the customer and some meta information. You can access the answer
in the field **properties.output**.

**Note:** The best prompt strongly depends on the Large Language Model used. You might need to adapt your prompt to improve results.
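
The output shown above starts with several spaces, and the prompt itself is indented inside the triple-quoted string. If you prefer clean text on both sides, you can dedent the prompt before sending it and strip the answer after receiving it. This is a sketch using only the Python standard library; `clean_prompt` is a hypothetical helper, and whether the indentation affects answer quality for a given model is an assumption you would need to test.

```python
import textwrap


def clean_prompt(template):
    """Remove the common leading indentation and the surrounding blank lines."""
    return textwrap.dedent(template).strip()


raw_prompt = """
    Please answer in one sentence.
    What does IONOS do?
"""
cleaned = clean_prompt(raw_prompt)
print(repr(cleaned))

# The same idea applied to a model answer with leading whitespace
answer = "    IONOS hosts AI workloads.".strip()
print(repr(answer))
```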

## Automated Retrieval Augmented Generation

The IONOS AI Model Hub lets you automate the process described above. If you pass the collection ID and the collection
query directly to the foundation model endpoint, the endpoint first queries the document collection and stores the result in a
variable that you can use directly in your prompt. This section describes how to do this.

### Apply combined retrieval augmented generation prompt to foundation model

To implement a Retrieval Augmented Generation use case with only one prompt, you have to invoke the **/predictions** endpoint of 
the Large Language Model you want to use and send the prompt as part of the body of this query:

```python
import requests

endpoint = f"https://inference.de-txl.ionos.com/models/{MODEL_ID}/predictions"
# {{{{ and }}}} inside an f-string produce the literal {{ and }} of the template variables
body = { "properties": {
    "input": f"""
    <|begin_of_text|><|start_header_id|>system<|end_header_id|>
    Please use the information specified as context to answer the question.
    Formulate your answer in one sentence and be an honest AI.<|eot_id|>
    <|begin_of_text|><|start_header_id|>context<|end_header_id|>
    {{{{.context}}}}<|eot_id|>
    <|start_header_id|>user<|end_header_id|>
    {{{{.collection_query}}}} <|eot_id|>
    <|start_header_id|>assistant<|end_header_id|>
    """,
    "collectionId": COLLECTION_ID,
    "collectionQuery": USER_QUERY,
    "options": {
        "max_length": "500",
        "temperature": "0.01"
    }
}}
response = requests.post(endpoint, json=body, headers=header)
response.json()['properties']['output']
```

```
' IONOS hosts your AI workloads.'
```

This query conducts all steps necessary to answer a user query using Retrieval Augmented Generation:

* The user query (passed as `collectionQuery`) is sent to the collection (specified by `collectionId`).
* The results of this query are saved in a variable **.context**, while the user query is saved in a variable **.collection_query**.
  You can use both variables in your prompt.
* The example prompt uses the variables **.context** and **.collection_query** to answer the customer query.
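
The quadruple braces in the example prompt are a consequence of Python's f-string syntax: inside an f-string, `{{` and `}}` each produce a single literal brace, so `{{{{.context}}}}` is sent as `{{.context}}`. The short check below illustrates this escaping only; it makes no claim about how the endpoint processes the variables beyond what is described above.

```python
# In an f-string, doubled braces are escapes for single literal braces
context_placeholder = f"{{{{.context}}}}"
query_placeholder = f"{{{{.collection_query}}}}"

print(context_placeholder)  # prints {{.context}}
print(query_placeholder)    # prints {{.collection_query}}

# A plain (non-f) string needs only the double braces
plain_placeholder = "{{.context}}"
print(context_placeholder == plain_placeholder)  # prints True
```

If your prompt contains no Python expressions, a plain string with `{{.context}}` is therefore equivalent and easier to read.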

**Note:** The best prompt strongly depends on the Large Language Model used. You might need to adapt your prompt to improve results.  

## Summary
In this tutorial, you learned how to use the IONOS AI Model Hub API to implement Retrieval Augmented Generation
use cases.

Specifically, you learned how to derive answers to user queries using the content of your document collection and one of the IONOS foundation models.