Project Directory Structure for DHF 4.0.x
When you initialize a Data Hub Framework (DHF) project using QuickStart or the hubInit Gradle task, DHF creates the following directory hierarchy. The diagram includes some placeholder entries to show, for example, where entities, flows, and mappings are placed in the layout. An explanation of the subdirectories and files follows the diagram.
|- your-data-hub-dir
   |- build.gradle
   |- gradle
      |- wrapper
         |- gradle-wrapper.jar
         |- gradle-wrapper.properties
   |- gradle.properties
   |- gradle-local.properties
   |- gradlew
   |- gradlew.bat
   |- plugins
      |- entities
         |- Employee
            |- input
               |- inputflow 1
                  |- content.(sjs|xqy)
                  |- headers.(sjs|xqy)
                  |- main.(sjs|xqy)
                  |- triples.(sjs|xqy)
               |- inputflow 2
               |- ...
               |- inputflow N
            |- harmonize
               |- harmonizeflow 1
                  |- collector.(sjs|xqy)
                  |- content.(sjs|xqy)
                  |- headers.(sjs|xqy)
                  |- main.(sjs|xqy)
                  |- triples.(sjs|xqy)
                  |- writer.(sjs|xqy)
               |- ...
               |- harmonizeflow N
      |- mappings
         |- mappingName 1
            |- mappingName-0.mapping.json
            |- ...
            |- mappingName-N.mapping.json
   |- src
      |- main
         |- hub-internal-config
         |- ml-config
         |- ml-modules
   |- .tmp
Root
build.gradle
This file enables you to use Gradle to configure and manage your data hub instance. Visit the Gradle website for full documentation on how to configure it.
gradle
This directory houses the Gradle wrapper, which is created when you provision a new DHF project. The Gradle wrapper is a specific, local version of Gradle. You can use the wrapper to avoid having to install Gradle on your system.
gradle.properties
This properties file defines variables needed by the hub to install and run properly. Ideally you would store values here that apply to all instances of your data hub.
gradle-local.properties
This properties file overrides the variables in gradle.properties for your local environment. If you need to change a value to run locally, this is where to do it.
gradle-{env}.properties
DHF determines which environments are available from the override files in your hub project. You can have as many environments as you like. To add one, create a new override file with the environment name after the dash.
For example: gradle-dev.properties, gradle-qa.properties, gradle-prod.properties
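The layering of an override file over gradle.properties can be sketched as follows. This is only an illustration in plain JavaScript, not how Gradle itself loads properties. The function names (parseProperties, overrideFileName, resolveProperties) and the sample property values are hypothetical.

```javascript
// Parse the text of a .properties file into a plain object.
// Simplified: handles only key=value lines, blank lines, and # comments.
function parseProperties(text) {
  const props = {};
  for (const rawLine of text.split("\n")) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue;
    const idx = line.indexOf("=");
    if (idx === -1) continue;
    props[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
  }
  return props;
}

// Name of the override file for an environment, e.g. "gradle-qa.properties".
function overrideFileName(env) {
  return `gradle-${env}.properties`;
}

// Overlay the environment-specific values on top of the shared ones.
function resolveProperties(baseText, overrideText) {
  return { ...parseProperties(baseText), ...parseProperties(overrideText) };
}

// Hypothetical file contents, for illustration only.
const base = "mlHost=localhost\nmlUsername=admin\n";
const qa = "# QA overrides\nmlHost=qa.example.com\n";

const resolved = resolveProperties(base, qa);
console.log(overrideFileName("qa"), resolved);
```

In this sketch, any key present in the override file wins, and keys that appear only in gradle.properties are kept. In DHF projects the environment is usually selected on the command line with a flag such as -PenvironmentName=qa; treat that as an assumption and check your own build configuration.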
gradlew, gradlew.bat
These are the *nix and Windows executable files that run the Gradle wrapper, so you do not need to install Gradle on your system.
plugins folder
This folder contains project-specific server-side modules that get deployed into MarkLogic. You can put any server-side files in here that you like, but the recommended location for custom modules and transforms is src/main/ml-modules; see the ml-gradle documentation.
When deployed to MarkLogic, ./plugins is equivalent to the root URI /, so a library module at ./plugins/my-folder/my-lib.xqy would be loaded into the modules database as /my-folder/my-lib.xqy.
The only caveat is that the entities folder is reserved for Hub use and is treated as a special case by the deploy process.
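The path-to-URI rule for the plugins folder can be expressed as a small helper. This is a sketch of the rule as described here, not code from DHF; the function names moduleUri and isReservedForHub are hypothetical.

```javascript
// Map a file path under ./plugins to the URI it would have in the modules
// database, following the rule that ./plugins corresponds to the root URI "/".
function moduleUri(localPath) {
  // Normalize Windows separators and strip an optional leading "./".
  const normalized = localPath.replace(/\\/g, "/").replace(/^\.\//, "");
  const prefix = "plugins/";
  if (!normalized.startsWith(prefix)) {
    throw new Error(`Not under the plugins folder: ${localPath}`);
  }
  return "/" + normalized.slice(prefix.length);
}

// Flag paths inside the reserved entities folder, which the deploy process
// treats as a special case. This helper only detects them.
function isReservedForHub(localPath) {
  return moduleUri(localPath).startsWith("/entities/");
}

console.log(moduleUri("./plugins/my-folder/my-lib.xqy")); // → /my-folder/my-lib.xqy
```

A check such as isReservedForHub can help keep custom libraries out of the entities folder.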
plugins/entities
This folder contains your entity definitions. An entity is a domain object like Employee or SalesOrder. Each entity folder contains two subfolders: input and harmonize. DHF has custom logic to handle the deployment of this folder to MarkLogic.
plugins/entities/{entity}/input
The input subfolder contains all of the input flows for a given entity. Input flows are responsible for creating an XML or JSON envelope during content ingest. Each input flow has its own subfolder containing one server-side module for each part of the envelope (content, headers, and triples) and a main module that orchestrates them. You may also optionally include a REST folder that contains custom MarkLogic REST extensions related to this input flow.
plugins/entities/{entity}/input/content.(sjs|xqy)
The server-side module (XQuery or JavaScript) responsible for creating the content section of your envelope.
plugins/entities/{entity}/input/headers.(sjs|xqy)
The server-side module (XQuery or JavaScript) responsible for creating the headers section of your envelope.
plugins/entities/{entity}/input/main.(sjs|xqy)
The server-side module (XQuery or JavaScript) responsible for orchestrating your plugins.
plugins/entities/{entity}/input/triples.(sjs|xqy)
The server-side module (XQuery or JavaScript) responsible for creating the triples section of your envelope.
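The way the input flow modules fit together can be sketched in plain JavaScript. The function names, their parameters, and the envelope field names below are assumptions made for illustration; the modules that DHF generates have their own signatures and run inside MarkLogic.

```javascript
// Hypothetical stand-ins for content.sjs, headers.sjs, and triples.sjs.
function createContent(id, rawDoc) {
  return rawDoc; // pass the ingested document through unchanged
}

function createHeaders(id, content) {
  return { sourceId: id };
}

function createTriples(id, content, headers) {
  return []; // no triples in this sketch
}

// Hypothetical stand-in for main.sjs: call each plugin in turn and wrap the
// results in an envelope. The field names are assumed for illustration.
function main(id, rawDoc) {
  const content = createContent(id, rawDoc);
  const headers = createHeaders(id, content);
  const triples = createTriples(id, content, headers);
  return { envelope: { headers, triples, content } };
}

const doc = main("/employees/1.json", { name: "Ada", dept: "R&D" });
console.log(JSON.stringify(doc, null, 2));
```

The point of the sketch is the division of labor: each plugin produces one section, and main decides the order in which they run.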
plugins/entities/{entity}/harmonize
The harmonize subfolder contains all of the harmonize flows for a given entity. Harmonize flows are responsible for creating an XML or JSON envelope during content harmonization. Each harmonize flow has its own subfolder containing one server-side module for each part of the envelope (content, headers, and triples) and a main module that orchestrates them. It also contains collector and writer modules as described below. You may also optionally include a REST folder that contains custom MarkLogic REST extensions related to this harmonize flow.
plugins/entities/{entity}/harmonize/collector.(sjs|xqy)
The server-side module (XQuery or JavaScript) responsible for returning a list of things to harmonize. Harmonization is a batch process that operates on one or more items. The collector should return an array of strings. Each string can have any meaning you like: a URI, an identifier, a sequence number, and so on.
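A collector can be pictured as a function that returns an array of strings. The sketch below uses a hard-coded list in place of a database query so that it runs as plain JavaScript. The stagingUris data and the prefix option are hypothetical.

```javascript
// Hypothetical in-memory stand-in for the documents available to harmonize.
const stagingUris = ["/employees/1.json", "/employees/2.json", "/orders/7.json"];

// A collector returns an array of strings identifying the items to harmonize.
// Here the strings are URIs filtered by a prefix taken from the options; a
// real collector would typically query the database instead.
function collect(options) {
  const prefix = (options && options.prefix) || "";
  return stagingUris.filter((uri) => uri.startsWith(prefix));
}

console.log(collect({ prefix: "/employees/" }));
```

Because the returned strings are opaque to the framework, the same shape works whether they are URIs, business keys, or sequence numbers, provided the other plugins in the flow interpret them consistently.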
plugins/entities/{entity}/harmonize/content.(sjs|xqy)
The server-side module (XQuery or JavaScript) responsible for creating the content section of your envelope.
plugins/entities/{entity}/harmonize/headers.(sjs|xqy)
The server-side module (XQuery or JavaScript) responsible for creating the headers section of your envelope.
plugins/entities/{entity}/harmonize/main.(sjs|xqy)
The server-side module (XQuery or JavaScript) responsible for orchestrating your plugins.
plugins/entities/{entity}/harmonize/triples.(sjs|xqy)
The server-side module (XQuery or JavaScript) responsible for creating the triples section of your envelope.
plugins/entities/{entity}/harmonize/writer.(sjs|xqy)
The server-side module (XQuery or JavaScript) responsible for persisting your envelope into MarkLogic.
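Taken together, the harmonize modules form a small batch pipeline: the collector lists the items, main builds an envelope for each one from the content, headers, and triples plugins, and the writer persists the result. The following is a simplified sketch under stated assumptions. The in-memory maps stand in for databases, and every function name, parameter, and field name is hypothetical rather than taken from DHF.

```javascript
// Hypothetical in-memory stand-ins for the source and target databases.
const stagingDb = new Map([
  ["/employees/1.json", { first: "Ada", last: "Lovelace" }],
  ["/employees/2.json", { first: "Alan", last: "Turing" }],
]);
const finalDb = new Map();

// collector: list the items to process (here, every staging URI).
function collect() {
  return Array.from(stagingDb.keys());
}

// content, headers, triples: build each envelope section for one item.
function createContent(id) {
  const src = stagingDb.get(id);
  return { fullName: `${src.first} ${src.last}` };
}
function createHeaders(id) {
  return { harmonizedFrom: id };
}
function createTriples() {
  return [];
}

// writer: persist the finished envelope under the item's identifier.
function write(id, envelope) {
  finalDb.set(id, envelope);
}

// main: orchestrate the plugins for a single item.
function main(id) {
  const content = createContent(id);
  const headers = createHeaders(id);
  const triples = createTriples();
  write(id, { envelope: { headers, triples, content } });
}

// Stand-in for the framework: run main once per collected item.
function runFlow() {
  const ids = collect();
  ids.forEach((id) => main(id));
  return ids.length;
}

const processed = runFlow();
console.log(processed, finalDb.get("/employees/1.json"));
```

Keeping the writer separate from main means the persistence target can change without touching the code that builds the envelope.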
plugins/entities/{entity}/harmonize/REST
In DHF 4.0, items that were previously stored here should be placed in src/main/ml-modules instead.
plugins/mappings
This folder contains model-to-model mapping configuration artifacts that can be used to configure an input flow. See Using Model-to-Model Mapping.
plugins/mappings/{mapping}
This folder contains all versions of a given model-to-model mapping. The name of the folder is the same as the mapping name. See Using Model-to-Model Mapping.
plugins/mappings/{mapping}/{mapping}-{version}.mapping.json
A model-to-model mapping configuration file. There may be multiple versions. For example, QuickStart creates a new version each time you modify a mapping. See Using Model-to-Model Mapping.
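Because several versions of a mapping can sit side by side, tooling that reads this folder may need to pick one. The sketch below selects the file with the highest version number, using the file name pattern shown in the diagram (for example, mappingName-0.mapping.json). It assumes that versions are non-negative integers and that the highest number is the most recent. The function names are hypothetical.

```javascript
// Extract the version number from a file name such as
// "mappingName-3.mapping.json". Returns null when the name does not match.
function mappingVersion(mappingName, fileName) {
  const prefix = `${mappingName}-`;
  const suffix = ".mapping.json";
  if (!fileName.startsWith(prefix) || !fileName.endsWith(suffix)) return null;
  const middle = fileName.slice(prefix.length, fileName.length - suffix.length);
  return /^\d+$/.test(middle) ? Number(middle) : null;
}

// Pick the file with the highest version number, or null if none match.
function latestMappingFile(mappingName, fileNames) {
  let best = null;
  let bestVersion = -1;
  for (const name of fileNames) {
    const version = mappingVersion(mappingName, name);
    if (version !== null && version > bestVersion) {
      best = name;
      bestVersion = version;
    }
  }
  return best;
}

// Hypothetical folder listing, for illustration only.
const files = [
  "mappingName-0.mapping.json",
  "mappingName-10.mapping.json",
  "mappingName-2.mapping.json",
  "README.md",
];
console.log(latestMappingFile("mappingName", files));
```

Versions are compared as numbers rather than strings, so version 10 ranks above version 2.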
src/main/hub-internal-config folder
This folder contains subfolders and JSON files used to configure your DHF project’s STAGING environment.
These files represent the minimum configuration necessary for DHF to function. Do not edit anything in this directory. Instead, make a file with the same name and directory structure under the ml-config folder and add any properties you’d like to override.
These files are deployed using ml-gradle and should work as documented in that project.
|- databases
   |- staging-database.json
   |- job-database.json
   |- modules-database.json
   |- staging-schemas-database.json
   |- staging-triggers-database.json
|- security
   |- roles
      |- data-hub-role.json
      |- hub-admin-role.json
   |- users
      |- data-hub-user.json
      |- hub-admin-user.json
|- servers
   |- job-server.json
   |- staging-server.json
Each of the above JSON files conforms to the MarkLogic REST API for creating databases, mimetypes, roles, users, or servers.
src/main/ml-config folder
This folder contains subfolders and JSON files used to configure your DHF project’s FINAL environment.
You can add configuration artifacts here to customize the system. See the ml-gradle wiki.
Any JSON files you put here will be merged with the hub-internal-config configurations by the Data Hub Framework during deployment.
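The effect of merging an ml-config file with the hub-internal-config file of the same name can be pictured with a small recursive merge. The exact merge rules that DHF applies are not specified here. This sketch assumes that objects are merged key by key and that any other value from ml-config, including an array, replaces the internal one. The file contents are hypothetical.

```javascript
// True for objects that should be merged key by key (not arrays or null).
function isPlainObject(value) {
  return value !== null && typeof value === "object" && !Array.isArray(value);
}

// Recursively merge an override object into a base object without mutating
// either. Non-object values in the override replace the base value; this
// policy is an assumption made for the sketch.
function mergeConfig(base, override) {
  const result = { ...base };
  for (const [key, value] of Object.entries(override)) {
    result[key] =
      isPlainObject(value) && isPlainObject(result[key])
        ? mergeConfig(result[key], value)
        : value;
  }
  return result;
}

// Hypothetical contents of a hub-internal-config file and of the file with
// the same name under ml-config.
const internal = { "database-name": "example-db", "triple-index": true };
const custom = { "triple-index": false, "range-element-index": [] };

const merged = mergeConfig(internal, custom);
console.log(merged);
```

Under these assumptions, the ml-config file only needs to contain the properties you want to add or change. Everything else is inherited from the internal configuration.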
src/main/ml-modules
This folder is the standard ml-gradle location for artifacts to be deployed to the modules database. It comes out of the box with a default Search options configuration.
src/main/ml-modules-jobs
This folder is the standard ml-gradle location for artifacts that are deployed to the modules database but used with the JOBS app server (specifically, the jobs and traces search options configuration). You will probably not need to add anything to this directory.
.tmp folder
This folder contains temporary hub artifacts. You may safely ignore it.