IBM VEST Workshops
25 min
Last updated 06/02/2023

102: Configuration of DataStage flow and Databand setup

Initial set-up

Register for the Business Partner Databand demo environment

Note: If you have previously registered with the Business Partner Databand demo environment, you do not need to repeat this step. Open a session in your preferred web browser at https://ibm-bp-demo.databand.ai/ and click on Registration. Complete the form and click on Register. Then, in the workshop Slack, provide the e-mail address you registered with so the workshop team can activate your registration.

Create an Object Storage service instance in your IBM Cloud account

Follow the service creation instructions to configure a lite plan instance of IBM Cloud Object Storage. If you get a message that a lite plan instance already exists in the account, you do not need to add another one.

Register for Cloud Pak for Data as a Service in the Dallas region

If you have not previously registered for an IBM Cloud Pak for Data as a Service account in the Dallas region, click this link to sign up for one.

After agreeing to the terms, use the Log in with your IBMid button to complete the registration.

Download the DataStage flow source file

Download this zip file and save it for use in Section 1 where you will create a new DataStage flow.

Set up the Data Integration Flow/Job

datastage flow
Figure A – This lab's Data Integration flow

This Next-Gen DataStage flow integrates data from a Db2 Warehouse on Cloud instance, a Postgres database, and a MongoDB instance. The flow transforms this data by joining tables, filtering the records by State, calculating a level of debt, and ultimately assigning each individual mortgage applicant an appropriate mortgage rate.
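To make the four transformation stages concrete, here is a minimal Python sketch of the same sequence: join, filter by state, calculate a debt level, and assign a rate. It is only an illustration and is not how DataStage runs the flow. Every field name, sample record, and rate threshold below is a hypothetical stand-in, not a value taken from the lab's data sources.

```python
# Illustrative only: the real flow runs in DataStage. All field names,
# sample rows, and thresholds here are hypothetical stand-ins.

applicants = [
    {"id": 1, "name": "Applicant One", "state": "CA"},
    {"id": 2, "name": "Applicant Two", "state": "NY"},
    {"id": 3, "name": "Applicant Three", "state": "CA"},
    {"id": 4, "name": "Applicant Four", "state": "CA"},  # no financial record
]

financials = [
    {"id": 1, "income": 100000, "loans": 10000, "credit_card_debt": 5000},
    {"id": 2, "income": 50000, "loans": 30000, "credit_card_debt": 0},
    {"id": 3, "income": 80000, "loans": 40000, "credit_card_debt": 0},
]


def join_on_id(left, right):
    """Inner join: keep left rows that have a matching id on the right."""
    right_by_id = {row["id"]: row for row in right}
    return [
        {**row, **right_by_id[row["id"]]}
        for row in left
        if row["id"] in right_by_id
    ]


def debt_ratio(row):
    """Total debt as a fraction of income (a hypothetical 'level of debt')."""
    return (row["loans"] + row["credit_card_debt"]) / row["income"]


def assign_rate(ratio):
    """Map a debt ratio to a mortgage rate using made-up thresholds."""
    if ratio < 0.2:
        return 5.0
    if ratio < 0.4:
        return 6.0
    return 7.0


def run_flow(applicants, financials, state):
    """Join, filter by state, compute the debt ratio, and assign a rate."""
    joined = join_on_id(applicants, financials)
    filtered = [row for row in joined if row["state"] == state]
    results = []
    for row in filtered:
        ratio = debt_ratio(row)
        results.append({**row, "debt_ratio": ratio, "rate": assign_rate(ratio)})
    return results


if __name__ == "__main__":
    for row in run_flow(applicants, financials, "CA"):
        print(row["name"], row["rate"])
```

Each function corresponds loosely to one stage on the DataStage canvas, which is also the granularity at which Databand reports on the job once the flow is synced.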

To begin, perform the following steps:

  1. If you have not already done so, log in to IBM Cloud Pak for Data. You will use your personal Cloud Pak for Data as a Service account in the Dallas region to do this lab.

    cpd login
  2. From the Cloud Pak for Data home screen, click Work with data to create a new project.

    work with data
  3. Click the Create an empty project tile.

    create project
  4. Name the project Databand_<YOUR_INITIALS>_vest like the example shown. Keep the settings as is (you can optionally add a description), and select an object storage instance to use for the project. Then click Create.

    project settings
    Important: If you did not provision a Cloud Object Storage instance in the prerequisites, this page shows a link to add one, which takes you to the catalog page. Create an instance using the lite plan and then refresh the project settings page.

  5. Once this project is created, select the Assets tab in the project overview screen and click the blue New asset icon.

    new asset
  6. Scroll down to the Graphical builders section, and click on the DataStage tile.

    datastage tile
  7. Select the Local file tab on the left-hand menu. Either drag and drop, or click Browse and upload the "Multicloud Data Integration.zip" file that you downloaded as a prerequisite to this lab.

    upload file
  8. Leave all the settings as-is, and press the blue Create button. Wait a few moments for the import process to complete.

    create flow

    After this import process completes, you will see three Data Fabric Trial connections, and a single Multicloud Data Integration Parallel Job.

    flow imported

Sync DataStage with Databand

  1. Close the import screen by clicking the x in the top right corner. Open the DataStage flow titled Multicloud Data Integration by clicking on it.

    open flow
    Your DataStage flow should look like the one in Figure A (shown below).

    open flow
    At this point, your DataStage environment is ready to be integrated with Databand. Open a new web browser tab and go to your IBM Cloud console.

  2. After logging in to IBM Cloud, make sure you are in your own account by verifying your account is selected at the top.

    check account
  3. Create an API key for your cloud account by clicking the Manage dropdown on the top menu bar and selecting Access (IAM). This API key will be used later to sync your DataStage job with Databand.

    ibmcloud iam
  4. On the IAM screen, select the API keys tab on the left-hand menu.

  5. Click the blue Create button.

    create apikey
  6. Name your API key Databand_<YOUR_INITIALS>, optionally add a description, and click the blue Create button.

    create apikey
    Your API Key will be generated. Save this key in a safe place, as you will need it to create your DataStage Syncer in Databand.

    IMPORTANT – You will not be able to see the API key again. If you exit the screen before saving this key, or lose it, simply delete the key you created and make a new one by repeating Steps 4-6 above.

    copy apikey
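If you want to confirm that the saved API key works before pasting it into Databand, one optional check is to exchange it for an IAM access token. The sketch below is not part of the lab steps. It assumes the public IBM Cloud IAM token endpoint and the `apikey` grant type, and the environment variable name `IBMCLOUD_API_KEY` is a hypothetical choice made for this example.

```python
import json
import os
import urllib.error
import urllib.parse
import urllib.request

# Public IBM Cloud IAM token endpoint (assumed here; not named in the lab).
IAM_TOKEN_URL = "https://iam.cloud.ibm.com/identity/token"


def build_token_request(api_key):
    """Build the POST request that exchanges an API key for an access token."""
    body = urllib.parse.urlencode(
        {
            "grant_type": "urn:ibm:params:oauth:grant-type:apikey",
            "apikey": api_key,
        }
    ).encode("ascii")
    return urllib.request.Request(
        IAM_TOKEN_URL,
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/json",
        },
        method="POST",
    )


def api_key_is_valid(api_key, timeout=10):
    """Return True if IAM issues a token for the key, False if it rejects it.

    Network failures (urllib.error.URLError) are left to propagate so they
    are not mistaken for an invalid key.
    """
    try:
        with urllib.request.urlopen(build_token_request(api_key), timeout=timeout) as resp:
            return "access_token" in json.load(resp)
    except urllib.error.HTTPError:
        return False


if __name__ == "__main__":
    # IBMCLOUD_API_KEY is a hypothetical variable name used for this sketch.
    key = os.environ.get("IBMCLOUD_API_KEY", "")
    print("API key accepted" if key and api_key_is_valid(key) else "API key missing or rejected")
```

Reading the key from an environment variable keeps it out of the script itself, which matters because the key cannot be displayed again after it is created.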

Getting Started with Databand

  1. Open a new browser tab and go to the Databand environment. Log in using the credentials you were given after signup.

    databand dashboard

    We will now create our DataStage Syncer within Databand. A syncer "syncs", or integrates, your DataStage environment with your Databand environment.

  2. Select the Integrations tab on the left-hand menu.

  3. Click the purple Add Integration button in the top right corner.

    add integration
  4. Select the DataStage tile under integration type.

    datastage integration
  5. Select Cloud user and click Confirm.

    cloud integration
  6. Create a unique syncer name (for example, <YOUR_INITIALS>_datastage_syncer) and paste the API key that you saved into the API key field. Then click Next.

    datastage syncer
  7. Select the Databand_<YOUR_INITIALS>_vest project that you created at the beginning of this lab. Then click Save.

    datastage project
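The lab uses three related names built from your initials: the project, the API key, and the syncer. Purely as a convenience, the small sketch below generates all three from one input so they stay consistent across IBM Cloud and Databand. The function name and the 2-to-4-letter restriction on initials are assumptions made for this example, not requirements of either product.

```python
import re


def lab_names(initials):
    """Return the project, API key, and syncer names used in this lab.

    The 2-4 letter check is an assumption of this sketch, not a product rule.
    """
    cleaned = initials.strip().upper()
    if not re.fullmatch(r"[A-Z]{2,4}", cleaned):
        raise ValueError("initials should be 2 to 4 letters, for example 'AB'")
    return {
        "project": f"Databand_{cleaned}_vest",
        "api_key": f"Databand_{cleaned}",
        "syncer": f"{cleaned}_datastage_syncer",
    }


if __name__ == "__main__":
    for label, name in lab_names("ab").items():
        print(f"{label}: {name}")
```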

Before continuing, it’s important to rename the source for the DataStage project. By default, the source name is the name of the account that owns that DataStage project. This is not very helpful since most people don’t know their account ID off the top of their head.

  1. Find your DataStage syncer. Select the Integrations tab in the left-hand menu.

  2. Start typing the beginning of your unique syncer name in the Search bar.

  3. Click Edit under the Actions column on the right side of your DataStage syncer.

    edit-syncer
    This will open the edit pane for your DataStage integration.

  4. Click Next to view your available projects.

    view projects
  5. If your Databand project is not already selected, select the checkbox to the left of the Databand project source you want to edit.

  6. Click the pencil icon to the right of your Databand project to rename it.

    rename source
  7. Change the source name to something unique that will help you identify the source later (for example, Alyssa B’s Account).

  8. Click Save.

    save name

We have successfully synced our Cloud Pak for Data as a Service project, which contains the Multicloud Data Integration flow, with Databand, and changed the project source name to a unique identifier.

Continue on to the next lab to start using Databand to observe the DataStage flow.