Tutorial: Harmonize the Product Data by Mapping
A harmonize flow is another series of plugins that harmonizes the data in the staging database and stores the results in the final database. Harmonization includes standardizing formats, enriching data, resolving duplicates, indexing, and other tasks.
We can specify the source of an entity property value using one of two methods:
- By customizing the default harmonization code.
- By defining mappings that specify which fields in the raw datasets correspond with which properties in the entity model.
Model-to-model mapping (between the source data model and the canonical entity model) was introduced in DHF v4.0.0 so that users can create a harmonization flow without coding. Mappings are ideal when the source data can be easily converted for use as the value of the entity property; for example, when the only difference is the case of the label or a simple data type.
We have already loaded the raw Product data.
In this section, we will:
- Define the entity model by adding properties to the entity model.
- Define the mappings to specify which fields in the dataset correspond to which properties in the entity model.
- Create and Run the Harmonize Flow.
1 - Define the Entity Model
We first define the entity model, which specifies the standard labels for the fields we want to harmonize. For the Product dataset, we will harmonize two fields: sku and price. Therefore, we must add those fields as properties to our Product entity model.
Name | Type | Other settings | Notes |
---|---|---|---|
sku | string | key | Used as the primary key because the SKU is unique for each product. |
price | decimal | | Set as a decimal because we need to perform calculations with the price. |
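To illustrate why price is typed as a decimal, here is a minimal Python sketch. The Product class below is a hypothetical in-memory stand-in for the entity model, not something QuickStart produces, and the sample values are made up. It shows that decimal arithmetic stays exact where binary floating point does not.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Product:
    """Hypothetical stand-in for the Product entity model in this tutorial."""
    sku: str        # primary key: unique for each product
    price: Decimal  # decimal, so calculations on prices stay exact


# Made-up sample instances.
first = Product(sku="ABC-001", price=Decimal("0.10"))
second = Product(sku="ABC-002", price=Decimal("0.20"))

# Decimal addition is exact.
decimal_total = first.price + second.price
print(decimal_total == Decimal("0.30"))  # True

# The same sum with binary floats picks up a rounding error.
float_total = 0.10 + 0.20
print(float_total == 0.30)  # False
```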
To define the Product entity model,
- In QuickStart's navigation bar, click .
- At the top of the Product entity card, click the pencil icon to edit the Product entity definition.
- In the Product entity editor, click in the Properties section to add a new property.
  - Set Name to sku.
  - Set Type to string.
  - To make sku the primary key, click the area in the key column for the sku row.
- Click again to add another property.
  - Set Name to price.
  - Set Type to decimal.
- Click .
- If prompted to update the index, click .
- Drag the bottom-right corner of the entity card to resize it and see the newly added properties.
2 - Define the Mappings
For the Product entity, we define the following simple mappings:
field in raw dataset (type) | property in entity model (type) | Notes |
---|---|---|
SKU (string) | sku (string) | Difference (case-sensitive) between field names |
price (string) | price (decimal) | Difference in types |
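Conceptually, the two mappings in the table amount to a rename and a type conversion. The following Python sketch shows that idea only; it is not the code that QuickStart generates. The PRODUCT_MAPPING structure, the apply_mapping helper, and the sample record are all assumptions made for this example.

```python
from decimal import Decimal

# Hypothetical mapping spec: entity property -> (source field, converter).
PRODUCT_MAPPING = {
    "sku": ("SKU", str),          # same type, field name differs in case
    "price": ("price", Decimal),  # same name, string converted to decimal
}


def apply_mapping(raw, mapping):
    """Build an entity instance by reading each mapped source field
    from the raw record and converting it to the property's type."""
    return {
        prop: convert(raw[field])
        for prop, (field, convert) in mapping.items()
    }


# Made-up raw record; "title" is not mapped, so this sketch ignores it.
raw_record = {"SKU": "ABC-123", "price": "19.99", "title": "Sample item"}

instance = apply_mapping(raw_record, PRODUCT_MAPPING)
print(instance)  # {'sku': 'ABC-123', 'price': Decimal('19.99')}
```

Keeping the mapping as data rather than as code mirrors the idea of defining mappings without writing harmonization logic by hand.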
To create a mapping named Product Mapping,
- In QuickStart’s navigation bar, click Mapping.
- In the left panel, click the + icon for the Product entity.
- In the Create New Mapping form, set Mapping Name to Product Mapping.
- Click CREATE.
Your new mapping appears under the tab named Product in the left panel.
The mapping editor displays a row for each property in your entity model. In each row,
- the right column displays the entity property, and
- the left column contains a dropdown list from which you can select the source field that corresponds to that entity property.
To configure the mapping,
- For each entity property, expand the dropdown list under Source and select the source field that corresponds to that entity property.
  TIP: You can enter part of the field name to filter the dropdown list.
- Click SAVE MAPPING.
3 - Create and Run the Harmonize Flow
Harmonization uses the data in your STAGING database to generate canonical entity instances in the FINAL database.
To create a harmonization flow for the Product entity,
- In QuickStart’s navigation bar, click Flows.
- Expand the tab named Product in the left panel.
- Click the + for Harmonize Flows.
- In the Create Harmonize Flow dialog, set Harmonize Flow Name to Harmonize Products.
- Under Mapping Generation, select the mapping that you created (Product Mapping).
- Click CREATE.
When you create a flow with mapping, QuickStart automatically generates harmonization code based on the entity model and the mapping and then deploys the code to MarkLogic Server.
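As a rough mental model of what a harmonize flow does with that generated code, the Python sketch below moves records from a staging store to a final store. This is an illustration under stated assumptions, not the generated code itself, which is deployed to and runs inside MarkLogic Server. The two dictionaries stand in for the STAGING and FINAL databases, and the URIs, sample records, and function names are hypothetical.

```python
from decimal import Decimal


def harmonize(raw):
    """Turn one raw record into a Product instance:
    SKU becomes sku, and the price string becomes a decimal."""
    return {"sku": raw["SKU"], "price": Decimal(raw["price"])}


def run_harmonize_flow(staging, final):
    """Harmonize every staging record and store the result in final.
    Returns the number of records processed."""
    for uri, raw in staging.items():
        final[uri] = harmonize(raw)
    return len(staging)


# In-memory stand-ins for the STAGING and FINAL databases (made-up data).
staging = {
    "/products/1.json": {"SKU": "ABC-001", "price": "10.50"},
    "/products/2.json": {"SKU": "ABC-002", "price": "4.25"},
}
final = {}

processed = run_harmonize_flow(staging, final)
print(processed)  # 2
print(final["/products/1.json"]["price"] + final["/products/2.json"]["price"])  # 14.75
```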
To run the harmonization flow,
- Click the Flow Info tab.
- Click Run Harmonize.