IBM VEST Workshops

Integrating Watson Discovery with watsonx.ai foundation models

This lab will show you how we can use Watson Discovery and watsonx.ai together to solve a Retrieval-Augmented Generation (RAG) use case. By combining watsonx.ai LLM (Large Language Model) foundation models with an existing knowledge base of data, we can garner more insightful responses from watsonx that are based on known data and facts.

What is RAG?

For context, RAG or Retrieval-Augmented Generation is a now common AI framework used for improving the quality of LLM-generated responses by grounding the model using external sources of knowledge to supplement an LLM’s internal representation of information.

The use of external sources provides a few benefits, such as improving the quality of responses from an LLM as well as the overall trustworthiness of those responses.

This is done, in part, by providing an LLM with the most current, reliable facts (i.e. knowledge base) and guaranteeing that users have access to the model’s sources, ensuring that its claims can be checked for accuracy and ultimately trusted.

Similarly, by grounding an LLM on a set of external, verifiable facts, the model has fewer opportunities to pull information baked into its parameters or biases from its training data. This reduces the chances that an LLM will leak sensitive data or ‘hallucinate’ incorrect or misleading information.

Watson Discovery

Watson Discovery is an AI-powered platform developed by IBM, accessible through the cloud and designed to assist businesses in deriving meaningful insights from unstructured data. Through utilizing natural language processing (NLP) and machine learning techniques, Watson Discovery helps organizations uncover valuable information from a diverse range of sources, such as text documents and web content. It's an integral part of IBM's suite of AI tools known as Watson.

In this lab, we will make use of a knowledge base consisting of some financial documents, which are ingested into Watson Discovery. We will then make use of watsonx.ai to ask some insightful questions about the financial documents based on the knowledge base. A link to the financial data used can be found here.

Ingesting Data into Watson Discovery

For this section of the lab, we will showcase how you can ingest data into Watson Discovery to create a knowledge base that can later be used within watsonx.ai.

If you are completing this lab in a workshop setting, a Watson Discovery instance with a collection of documents may already be set up for you; ask your lab instructor. Otherwise, you can create your own Watson Discovery instance from IBM Cloud with your own collection of documents. Steps for how to do so can be found below.

Creating a project in Watson Discovery

Before we ingest data into Watson Discovery, we will start by creating a new project.

  1. From the main Watson Discovery home screen, click the New Project button.

    create new project
  2. On the next screen, give your new project a name and select the Document Retrieval project type.

    name new project
  3. On the next screen, select the Upload Data option so you can upload your own data into your Watson Discovery project.

    upload data
  4. On the next screen, give your new collection a new name, such as Financial Documents, and select the OCR option under more processing settings.

    discovery collection name
  5. On this final screen, drag and drop your data for upload using the dialog box. Note the file limitations.

    For consistency of this lab, you can find a link to the documents used for in this labs collections here.

    discovery upload data
  6. If uploaded successfully, you should see the total number of uploaded documents under the card with the collection you named earlier.

    discovery main collections page
  7. If you see missing documents or would like to investigate the uploaded data for processing, you can click on your named collection and view additional details. Any issues related to processing of your data will be noted by Watson Discovery on this activity tab page.

    discovery collections detail page
  8. Under the Manage data tab, review the individual documents that make up your collection.

    discovery collections manage data page
  9. Finally, if you want to integrate your Watson Discovery collection into other services, you can view the project_id associated with your Watson Discovery collection under the Integrate and deploy left-hand side-bar item.

    discovery project id

Run the Watson Discovery + watsonx.ai lab

In conjunction with Watson Discovey as our knowledge base, we will be making use of watsonx.ai foundation models to answer questions about the documents in our Watson Discovery Collection.

To run the lab for this section, we will start by logging into the watsonx platform. After navigating to the watsonx homepage here, we will want to open the Notebook editor that we can use to run the notebook associated with this lab.

If you don't know how to acccess watsonx.ai or are unsure how to open to the notebook editor, follow this reference link, which will walk you through the process for accessing watsonx.ai and opening the Jupyter notebook editor.

Use the following values for this lab:

  • Name: {uniqueid}-rag-discovery
  • Notebook URL: https://raw.githubusercontent.com/ibm-build-lab/VAD-VAR-Workshop/main/content/labs/Watsonx/WatsonxAI/files/rag-with-discovery.ipynb

After your notebook is launched and created, you can follow along and run through each cell of the notebook to complete the lab. The notebook contains comments explaining what code in each cell does as well as any necessary input that you might need to provide in order to successfully run a cell.

Good luck!