
dashDB

Load delimited data using the REST API and cURL

What this tutorial is about

With the IBM dashDB REST API, you can load delimited data files (CSV, XLS, XLSX) into a dashDB table, monitor the status of load jobs, and retrieve detailed log files. This tutorial shows you how to create a dashDB table, load data into the table, monitor the status of your load jobs, and retrieve the associated load log files.

What you should be able to do

At the end of this tutorial, you should be able to:
  • Use cURL to access the dashDB REST API
  • (Optional) Use jq or jsoneditoronline.org to format JSON responses so they're easier to read
  • Load delimited files (.CSV, .XLS, .XLSX) from your local client into a dashDB table
  • Monitor the status of load jobs
  • Download and view log files for load jobs
  • Delete log files for completed load jobs

Requirements

You need the sample dataset, which contains the delimited data files, and the cURL command-line utility installed on your client. Optionally, you might find it helpful to format the JSON responses, either by installing jq alongside cURL or by using an online JSON editor. Use the following links to install or access these requirements:
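
For example, once jq is installed, you can pipe any of the cURL commands in this tutorial through jq to pretty-print the JSON response. A minimal sketch, assuming a bash-like shell and the placeholder credentials and host name used in the procedures below:

    # Pretty-print the response of the load-status call from Procedure 4
    # (dash123456, the password, and <hostname> are placeholders for your own values)
    curl --user dash123456:******** -X GET "https://<hostname>:8443/dashdb-api/load/1234567890" | jq '.'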

Tutorial instructions

Procedure 1: Determining the base URL

In order to run dashDB REST endpoints, you'll need to construct the base URL that includes the host name and port number of your dashDB instance.
  1. Get the host name and port number of your dashDB instance.
    • Entry plan users:
      1. Log in to IBM Bluemix and navigate to the dashboard.
      2. Click your dashDB service tile, and then the Launch button to open the dashDB web console.
      3. Identify the host name and port number in the URL shown in your web browser.
    • Enterprise plan users:
      1. Identify the host name and port in the URL you use to access your dashDB instance.
     
  2. Construct the base URL using the following pattern:
    https://<host_name>:<port>/dashdb-api
    	
    Example: If the URL you use to access dashDB is https://awh-yp-small02.services.dal.bluemix.net:8443/console/ibmblu/index_Customer.jsp, the host name is awh-yp-small02.services.dal.bluemix.net and the port number is 8443, which results in the following base URL:
    https://awh-yp-small02.services.dal.bluemix.net:8443/dashdb-api   
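
    If you plan to run several REST requests, it can be convenient to keep the base URL in a shell variable. This is just a convenience sketch, assuming a bash-like shell; the variable name DASHDB_API is not part of the API:

      # Store the base URL once and reuse it in later cURL commands (substitute your own host name and port)
      export DASHDB_API="https://awh-yp-small02.services.dal.bluemix.net:8443/dashdb-api"
      # Example reuse with an endpoint introduced in Procedure 4:
      curl --user dash123456:******** -X GET "$DASHDB_API/load/DASH123456.ORDER_DETAILS"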
    	

Procedure 2: Creating the target table

For this tutorial, we assume that the target table doesn't exist yet. If you already created the target table in dashDB, skip to Procedure 3.
  1. In the dashDB web console, click Tables in the left navigation menu.
  2. Enter the following statement in the Edit the DDL statements field:
    CREATE TABLE ORDER_DETAILS (
        "ORDER_DETAIL_CODE" INTEGER NOT NULL ,
        "ORDER_NUMBER" INTEGER NOT NULL ,
        "SHIP_DATE" TIMESTAMP NOT NULL ,
        "PRODUCT_NUMBER" INTEGER NOT NULL ,
        "PROMOTION_CODE" INTEGER NOT NULL ,
        "QUANTITY" BIGINT NOT NULL ,
        "UNIT_COST" DECIMAL(19,2) ,
        "UNIT_PRICE" DECIMAL(19,2) ,
        "UNIT_SALE_PRICE" DECIMAL(19,2) );
    	
  3. Click Run DDL. dashDB creates a table called ORDER_DETAILS in your default schema, which uses your user ID as its name. After the table is created, you'll see the message DDL ran successfully.

Procedure 3: Loading delimited data

  1. Determine if you can load all of your data in a single REST request or if you need to load it using multiple REST requests.
    • If none of your files have header rows, load all of your files using a single REST request. Loading everything in one request produces a better compression dictionary when you're loading data into an empty table: the dictionary is built from a sample of the data initially loaded into the table, so loading the largest possible amount of data in the first load job creates the best possible dictionary. Data loaded into the table by subsequent REST requests is compressed based on this dictionary.
    • If some of your files have header rows and some don't, we recommend removing the headers and loading all of the files in a single REST request to create the best possible compression dictionary. It is also possible to load your files using two REST requests (one for files with header rows, one for files without header rows). If you decide to load your files using this method, make sure the first load job contains the largest possible amount of data.
    For this tutorial, you will load the sample data files using two REST requests, since the order_details_0.csv file contains a header row, while the other three files don't. The first REST request will load the three files without a header row so that you're loading the largest amount of data to create the compression dictionary.
  2. Verify that the same delimiter is used in all files included in the same REST request.
    • If the files are all .CSV format, you can specify any delimiter supported by dashDB. Valid values are hexadecimal values 0x01 to 0x7F, excluding line-feed (0x0A), carriage return (0x0D), space (0x20), and decimal point (0x2E). The default delimiter is , (0x2C).
    • If the files are all .XLS/.XLSX format, or a combination of .CSV and .XLS/.XLSX formats, the only supported delimiter is , (0x2C).
    The sample data files are all .CSV format and use the default delimiter.
  3. Determine if there are any query parameters that need to be percent-encoded. Reserved characters that have special meaning in URIs, as well as blank spaces, must be percent-encoded. The sample data includes a column called SHIP_DATE that uses the YYYY-MM-DD HH:MM:SS.U timestamp format. The blank space in this timestamp format must be percent-encoded as %20, which results in the percent-encoded timestamp format YYYY-MM-DD%20HH:MM:SS.U.
  4. Load the data using the POST /load/local/del/{tableName} endpoint.
    • To load multiple files in a single REST request, use the -F parameter for each file. Let's load the three sample data files that don't contain a header row (hasHeaderRow=false). The following cURL command sends a REST request to start a job that loads data into the ORDER_DETAILS table (a sketch at the end of this procedure shows one way to capture the returned LOAD_ID with jq):
      curl --user dash123456:******** -H "Content-Type: multipart/form-data" -X POST -F loadFile1=@"C:\dashDB\tutorial\order_details_1.csv" -F loadFile2=@"C:\dashDB\tutorial\order_details_2.csv" -F loadFile3=@"C:\dashDB\tutorial\order_details_3.csv" "https://<hostname>:8443/dashdb-api/load/local/del/DASH123456.ORDER_DETAILS?hasHeaderRow=false&timestampFormat=YYYY-MM-DD%20HH:MM:SS.U" 
      JSON result showing the load job started successfully and is currently running:
      {
      		"errorMessageCode": "NONE",
      		"resultCode": "SUCCESS",
      		"message": "LOAD started successfully.",
      		"result": {
      			"END_TIME": "",
      			"LOAD_ID": 1234567890,
      			"LOAD_LOGFILE": "loadlogs/load_1234567890_DASH123456_ORDER_DETAILS_20160202-094119-CST.json",
      			"LOAD_STATUS": "RUNNING",
      			"ROWS_COMMITTED": "",
      			"ROWS_DELETED": "",
      			"ROWS_LOADED": "",
      			"ROWS_READ": "",
      			"ROWS_REJECTED": "",
      			"ROWS_SKIPPED": "",
      			"SCHEMA": "DASH123456",
      			"START_TIME": "20160101 09:41:19 CST",
      			"TABLE": "ORDER_DETAILS"
      		}
      	}
      	
    • To load a single file in a REST request, use a single -F parameter. Let's load the order_details_0.csv sample file that contains a header row. The following cURL command sends a REST request to start a job that loads data into the ORDER_DETAILS table:
      curl --user dash123456:******** -H "Content-Type: multipart/form-data" -X POST -F loadFile=@"C:\dashDB\tutorial\order_details_0.csv" "https://<hostname>:8443/dashdb-api/load/local/del/DASH123456.ORDER_DETAILS?hasHeaderRow=true&timestampFormat=YYYY-MM-DD%20HH:MM:SS.U"
      JSON result showing the load job started successfully and is currently running:
      {
      		"errorMessageCode": "NONE",
      		"resultCode": "SUCCESS",
      		"message": "LOAD started successfully.",
      		"result": {
      			"END_TIME": "",
      			"LOAD_ID": 2345678901,
      			"LOAD_LOGFILE": "loadlogs/load_2345678901_DASH123456_ORDER_DETAILS_20160101-095500-CST.json",
      			"LOAD_STATUS": "RUNNING",
      			"ROWS_COMMITTED": "",
      			"ROWS_DELETED": "",
      			"ROWS_LOADED": "",
      			"ROWS_READ": "",
      			"ROWS_REJECTED": "",
      			"ROWS_SKIPPED": "",
      			"SCHEMA": "DASH123456",
      			"START_TIME": "20160101 09:55:00 CST",
      			"TABLE": "ORDER_DETAILS"
      		} 
      	}
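
If you have jq installed, you can capture the LOAD_ID and LOAD_LOGFILE values from the POST response for use in Procedures 4 and 5. The following is a minimal sketch, not part of the API itself, assuming a bash-like shell and the same placeholder credentials, host name, and file locations used above:

    # Start the load and keep the JSON response (placeholder credentials, host name, and file paths)
    RESPONSE=$(curl --user dash123456:******** -H "Content-Type: multipart/form-data" -X POST \
      -F loadFile1=@"order_details_1.csv" -F loadFile2=@"order_details_2.csv" -F loadFile3=@"order_details_3.csv" \
      "https://<hostname>:8443/dashdb-api/load/local/del/DASH123456.ORDER_DETAILS?hasHeaderRow=false&timestampFormat=YYYY-MM-DD%20HH:MM:SS.U")
    # Extract the load ID (used in Procedure 4) and the log file path (used in Procedure 5)
    LOAD_ID=$(echo "$RESPONSE" | jq -r '.result.LOAD_ID')
    LOAD_LOGFILE=$(echo "$RESPONSE" | jq -r '.result.LOAD_LOGFILE')
    echo "Load $LOAD_ID started; log file: $LOAD_LOGFILE"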
      	

Procedure 4: Monitoring the status of load jobs

When you load large data files into dashDB, the POST /load/local/del/{tableName} request returns before the load job is complete. In this scenario, monitor the status of the load job using either the GET /load/{loadID} or GET /load/{tableName} endpoint.
Note: These two GET /load endpoints can only retrieve information about load jobs run by the specified user and started by the POST /load/local/del/{tableName} endpoint.
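
If you script your loads, you can poll the status endpoint until the job leaves the RUNNING state. The following is a minimal polling sketch, not part of the dashDB API, assuming a bash-like shell, jq, and the placeholder credentials and host name used above:

    # Poll GET /load/{loadID} every 10 seconds until the job is no longer RUNNING
    LOAD_ID=1234567890   # placeholder; use the LOAD_ID returned by your own POST request
    while true; do
      STATUS=$(curl -s --user dash123456:******** -X GET \
        "https://<hostname>:8443/dashdb-api/load/$LOAD_ID" | jq -r '.result.result.LOAD_STATUS')
      echo "LOAD_STATUS: $STATUS"
      [ "$STATUS" != "RUNNING" ] && break
      sleep 10
    done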

Monitoring the status of a load job based on load ID
  1. Get the load ID from the POST /load/local/del/{tableName} request. The LOAD_ID returned in the JSON result for the load job that loaded data from three sample data files is 1234567890.
  2. Get the status of the load job using the following cURL command:
    curl --user dash123456:******** -X GET "https://<hostname>:8443/dashdb-api/load/1234567890"
    JSON result showing the load job completed successfully and loaded all rows (ROWS_COMMITTED = ROWS_LOADED):
        {
    			"errorMessageCode": "NONE",
    			"resultCode": "SUCCESS",
    			"message": "LOAD retrieved successfully",
    			"result": {
    				"errorMessageCode": "NONE",
    				"resultCode": "SUCCESS",
    				"message": "Load completed successfully.",
    				"result": {
    					"END_TIME": "20160101 09:41:27 CST",
    					"LOAD_ID": 1234567890,
    					"LOAD_LOGFILE": "loadlogs/load_1234567890_DASH123456_ORDER_DETAILS_20151222-094119-CST.json",
    					"LOAD_STATUS": "COMPLETE",
    					"ROWS_COMMITTED": 334518,
    					"ROWS_DELETED": 0,
    					"ROWS_LOADED": 334518,
    					"ROWS_READ": 334518,
    					"ROWS_REJECTED": 0,
    					"ROWS_SKIPPED": 0,
    					"SCHEMA": "DASH123456",
    					"START_TIME": "20160101 09:41:19 CST",
    					"TABLE": "ORDER_DETAILS"
    				}
    			}
    		}
    	
Monitoring the status of load jobs based on table name
  1. Get the status of load jobs using the GET /load/{tableName} endpoint:
    curl --user dash123456:******** -X GET "https://<hostname>:8443/dashdb-api/load/DASH123456.ORDER_DETAILS"
    JSON result:
    {
    	  "errorMessageCode": "NONE",
    	  "resultCode": "SUCCESS",
    	  "message": "LOAD retrieved successfully",
    	  "result": [
    		{
    		  "errorMessageCode": "NONE",
    		  "resultCode": "SUCCESS",
    		  "message": "Load completed successfully.",
    		  "result": {
    			"END_TIME": "20160101 09:41:27 CST",
    			"LOAD_ID": 1234567890,
    			"LOAD_LOGFILE": "loadlogs/load_1234567890_DASH123456_ORDER_DETAILS_20160101-094119-CST.json",
    			"LOAD_STATUS": "COMPLETE",
    			"ROWS_COMMITTED": 334518,
    			"ROWS_DELETED": 0,
    			"ROWS_LOADED": 334518,
    			"ROWS_READ": 334518,
    			"ROWS_REJECTED": 0,
    			"ROWS_SKIPPED": 0,
    			"SCHEMA": "DASH123456",
    			"START_TIME": "20160101 09:41:19 CST",
    			"TABLE": "ORDER_DETAILS"
    		  }
    		},
    		{
    		  "errorMessageCode": "NONE",
    		  "resultCode": "SUCCESS",
    		  "message": "Load completed successfully.",
    		  "result": {
    			"END_TIME": "20160101 09:55:03 CST",
    			"LOAD_ID": 2345678901,
    			"LOAD_LOGFILE": "loadlogs/load_2345678901_DASH123456_ORDER_DETAILS_20160101-095500-CST.json",
    			"LOAD_STATUS": "COMPLETE",
    			"ROWS_COMMITTED": 111505,
    			"ROWS_DELETED": 0,
    			"ROWS_LOADED": 111505,
    			"ROWS_READ": 111505,
    			"ROWS_REJECTED": 0,
    			"ROWS_SKIPPED": 0,
    			"SCHEMA": "DASH123456",
    			"START_TIME": "20160101 09:55:00 CST",
    			"TABLE": "ORDER_DETAILS"
    		  }
    		}
    	  ]
    	}
    	

Procedure 5: Downloading the load log file

You can retrieve a detailed load log file that contains the information returned by the GET /load/{loadID} endpoint plus additional SQLCODE and MESSAGE details. These details are useful for verifying that a load job completed and for troubleshooting failed load jobs.
  1. Get the path and name of the load log file from the LOAD_LOGFILE value in the JSON result for the POST /load/local/del/{tableName} endpoint. The load log file for the load job that loaded data from three sample data files is loadlogs/load_1234567890_DASH123456_ORDER_DETAILS_20160101-094119-CST.json.
  2. Download the load log file using the following cURL command:
    curl --user dash123456:******** -X GET "https://<hostname>:8443/dashdb-api/home/loadlogs/load_1234567890_DASH123456_ORDER_DETAILS_20160101-094119-CST.json"
    Note: This command prints the contents of the log file to stdout. Use the cURL -o option to direct the output to a local file; a sketch that saves the log locally and filters its messages with jq follows the JSON result below. JSON result:
    {
    		"message": "Load completed successfully.",
    		"errorMessageCode": "NONE",
    		"resultCode": "SUCCESS",
    		"result": {
    			"ROWS_COMMITTED": 334518,
    			"LOAD_ID": 1234567890,
    			"SCHEMA": "DASH123456",
    			"ROWS_LOADED": 334518,
    			"ROWS_DELETED": 0,
    			"START_TIME": "20160101 09:41:19 CST",
    			"LOAD_STATUS": "COMPLETE",
    			"ROWS_SKIPPED": 0,
    			"TABLE": "ORDER_DETAILS",
    			"LOAD_LOGFILE": "loadlogs/load_1234567890_DASH123456_ORDER_DETAILS_20160101-094119-CST.json",
    			"ROWS_REJECTED": 0,
    			"END_TIME": "20160101 09:42:38 CST",
    			"ROWS_READ": 334518,
    			"LOAD_OUTPUT": [{
    				"MESSAGE": "The utility is beginning to load data from file "/opt/ibm/dsserver/work/846010489.pipe".",
    				"SQLCODE": "SQL3109N"
    			}, {
    				"MESSAGE": "The utility is beginning the "ANALYZE" phase at time "01/01/2016 09:41:36.279334".",
    				"SQLCODE": "SQL3500W"
    			}, {
    				"MESSAGE": "Begin Load Consistency Point. Input record count = "0".",
    				"SQLCODE": "SQL3519W"
    			}, {
    				"SQLCODE": "SQL3520W",
    				"MESSAGE": "Load Consistency Point was successful."
    			}, {
    				"SQLCODE": "SQL3515W",
    				"MESSAGE": "The utility has finished the "ANALYZE" phase at time "01/01/2016 09:42:38.239780"."
    			}, {
    				"MESSAGE": "The utility is beginning the "LOAD" phase at time "01/01/2016 09:42:38.251327".",
    				"SQLCODE": "SQL3500W"
    			}, {
    				"SQLCODE": "SQL3110N",
    				"MESSAGE": "The utility has completed processing.  "334518" rows were read from the input file."
    			}, {
    				"MESSAGE": "Begin Load Consistency Point. Input record count = "334518".",
    				"SQLCODE": "SQL3519W"
    			}, {
    				"MESSAGE": "Load Consistency Point was successful.",
    				"SQLCODE": "SQL3520W"
    			}, {
    				"SQLCODE": "SQL3515W",
    				"MESSAGE": "The utility has finished the "LOAD" phase at time "01/01/2016 09:42:38.785531"."
    			}, {
    				"MESSAGE": "The utility is beginning the "BUILD" phase at time "01/01/2016 09:42:38.788546".",
    				"SQLCODE": "SQL3500W"
    			}, {
    				"SQLCODE": "SQL3213I",
    				"MESSAGE": "The indexing mode is "REBUILD"."
    			}, {
    				"SQLCODE": "SQL3515W",
    				"MESSAGE": "The utility has finished the "BUILD" phase at time "12/12/2015 09:42:38.922468"."
    			}]
    		}
    	}
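
    To keep a local copy and scan just the messages, you can combine the -o option with jq. A minimal sketch under the same assumptions as above (placeholder credentials and host name; substitute your own LOAD_LOGFILE path):

      # Save the load log locally, then print only the SQLCODE and MESSAGE pairs
      curl --user dash123456:******** -o load_log.json -X GET \
        "https://<hostname>:8443/dashdb-api/home/loadlogs/load_1234567890_DASH123456_ORDER_DETAILS_20160101-094119-CST.json"
      jq -r '.result.LOAD_OUTPUT[] | "\(.SQLCODE): \(.MESSAGE)"' load_log.json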
    	

Procedure 6: Deleting log files

It is good practice to delete old log files to prevent disk space issues. You can use the DELETE /home/{fileName} endpoint to delete old load log files; a sketch at the end of this procedure shows how to delete several log files in one pass. As an example, let's delete the loadlogs/load_1234567890_DASH123456_ORDER_DETAILS_20160101-094119-CST.json log file.
Note: After you delete a log file for a particular load job, the details of the load job are not returned by the GET /load/{loadID} or GET /load/{tableName} endpoints.
  1. Delete the load log file using the following cURL command:
    curl --user dash123456:******** -X DELETE "https://<hostname>:8443/dashdb-api/home/loadlogs/load_1234567890_DASH123456_ORDER_DETAILS_20160101-094119-CST.json"
    JSON result showing the log file was deleted successfully:
    {
    		"message":"NONE",
    		"result":[],
    		"errorMessageCode":"NONE",
    		"resultCode":"SUCCESS"
    	}
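
If you accumulate several old log files, you can wrap the DELETE call in a small loop. A minimal sketch, assuming a bash-like shell and that you supply your own list of log file paths (the names below are placeholders based on the examples in this tutorial):

    # Delete a list of old load log files using the DELETE /home/{fileName} endpoint
    for LOGFILE in \
        "loadlogs/load_1234567890_DASH123456_ORDER_DETAILS_20160101-094119-CST.json" \
        "loadlogs/load_2345678901_DASH123456_ORDER_DETAILS_20160101-095500-CST.json"; do
      curl --user dash123456:******** -X DELETE "https://<hostname>:8443/dashdb-api/home/$LOGFILE"
    done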
    	

For more details, see the dashDB API Reference and read the documentation on loading your data in dashDB.


