Tutorial: Install the Data Hub Framework
1 - Set Up the Project Directory and Sample Data
- Create a directory called data-hub. This directory will be referred to as “your project root” or simply “root”.
- Download the quick-start-4.3.2.war file and place it in your project root directory.
- Under your project root, create a directory called input.
- Download the sample data .zip file. Expand it, as needed.
- Copy the subdirectories (e.g., campaigns, customers, orders) inside the sample data .zip file into the input directory.
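The steps above can be sketched as a short shell script. This is a minimal sketch, not an official installer: the variable names PROJECT_ROOT, WAR, and SAMPLE_DIR are illustrative, and it assumes you have already downloaded the .war file into the current directory and expanded the sample data .zip into a directory named sample-data. Adjust the paths to match your downloads.

```shell
#!/bin/sh
# Sketch of step 1. All variable names and default paths are assumptions.
PROJECT_ROOT="${PROJECT_ROOT:-data-hub}"
WAR="${WAR:-quick-start-4.3.2.war}"       # assumed download location of the .war
SAMPLE_DIR="${SAMPLE_DIR:-sample-data}"   # assumed location of the expanded sample data

# Create the project root and its input directory.
mkdir -p "$PROJECT_ROOT/input"

# Place the QuickStart .war in the project root, if it has been downloaded.
if [ -f "$WAR" ]; then
  cp "$WAR" "$PROJECT_ROOT/"
fi

# Copy each sample subdirectory (campaigns, customers, orders, ...) into input.
if [ -d "$SAMPLE_DIR" ]; then
  for d in "$SAMPLE_DIR"/*/; do
    if [ -d "$d" ]; then
      cp -R "${d%/}" "$PROJECT_ROOT/input/"
    fi
  done
fi
```

The script skips the copy steps when the downloads are not present, so it is safe to run before or after fetching the files.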
Result
Your project directory structure will be as follows:
data-hub
├─ quick-start-4.3.2.war
└─ input
   ├─ campaigns
   ├─ customers
   ├─ issuehistories
   ├─ issues
   ├─ orders
   ├─ parties
   ├─ products
   │  ├─ games
   │  └─ misc
   ├─ responses
   └─ supportcustomers
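To confirm that your project root matches this layout before continuing, a small check along the following lines may help. The function name check_layout is hypothetical; the file and directory names are taken from the structure shown above.

```shell
#!/bin/sh
# Sketch: report anything missing from the expected project layout.
# check_layout is an illustrative helper name, not part of DHF or QuickStart.
check_layout() {
  root="$1"
  status=0
  # The QuickStart .war must sit directly in the project root.
  if [ ! -f "$root/quick-start-4.3.2.war" ]; then
    echo "missing: quick-start-4.3.2.war" >&2
    status=1
  fi
  # Each sample data subdirectory must exist under input.
  for d in campaigns customers issuehistories issues orders parties \
           products/games products/misc responses supportcustomers; do
    if [ ! -d "$root/input/$d" ]; then
      echo "missing: input/$d" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example, `check_layout data-hub && echo "layout OK"` prints a confirmation only when every expected entry is present; otherwise each missing entry is listed on standard error.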
2 - Start QuickStart
- Open a command-line window, and navigate to your DHF project root directory.
- Run the QuickStart .war.
  - To use the default port number for the internal web server (port 8080):
    java -jar quick-start-4.3.2.war
  - To use a custom port number; e.g., port 9000:
    java -jar quick-start-4.3.2.war --server.port=9000
  NOTE: If you are using Windows and a firewall alert appears, click Allow access.
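The two launch variants can be wrapped in a small script if you start QuickStart often. The following is a sketch under stated assumptions: WAR, PORT, DRY_RUN, and launch_quickstart are illustrative names, and only the java -jar command and the --server.port option come from the steps above. By default the sketch only prints the command it would run; set DRY_RUN=0 to actually start QuickStart.

```shell
#!/bin/sh
# Sketch of a launcher for the QuickStart .war (run from the project root).
# WAR, PORT, and DRY_RUN are assumed names, not QuickStart settings.
WAR="${WAR:-quick-start-4.3.2.war}"
PORT="${PORT:-}"          # empty means the default port (8080)
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the command; 0 = run it

launch_quickstart() {
  # Build the command line, adding --server.port only for a custom port.
  if [ -n "$PORT" ]; then
    set -- java -jar "$WAR" "--server.port=$PORT"
  else
    set -- java -jar "$WAR"
  fi
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    # Fail early with a clear message if the .war is not in this directory.
    if [ ! -f "$WAR" ]; then
      echo "missing: $WAR" >&2
      return 1
    fi
    "$@"
  fi
}

launch_quickstart
```

For example, running the script with PORT=9000 and DRY_RUN=0 set in the environment starts QuickStart on port 9000.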
3 - Install the Data Hub
- Open a web browser, and navigate to http://localhost:8080 (or the custom port, if you specified one when starting QuickStart).
- Browse to your project root directory. Then click the button to continue.
- Click the button to initialize your project directory.
- After initializing your Data Hub Framework project, your project directory contains additional files and directories. Click the button to continue.
- Choose the local environment, then click the button to continue.
- Enter your MarkLogic Server credentials, then click the button to continue.
- Click the button to install the data hub into MarkLogic.
- Wait for the installation to complete.
- When installation is complete, click the button to continue.
Result
When installation is complete, the Dashboard page displays the three initial databases and the number of documents in each.
- Staging contains incoming data.
- Final contains harmonized data.
- Jobs contains data about the jobs that are run and tracing data about each harmonized document.