Tutorial: Harmonize the Order Data by Custom Code
Harmonization of the Order entity is more complex.
- The `price` property of the entity model is the total amount for the entire order; therefore, it must be calculated.
- The `product` property is an array of the products ordered, but they are not represented as an array in the source.
Therefore, we must use DHF code scaffolding to generate the harmonization code and then customize it.
We have already loaded the Order raw data by:
In this section, we will:
- Define the Order entity model by adding properties to it.
- Create the Harmonize flow.
- Customize the Harmonize flow, specifically the Collector code and the Content code.
- Run the Harmonize Flow.
- View the results.
1 - Define the Entity Model
We assume the following about the Order data:
- Each product is identified by its SKU.
- Each order can have more than one product.
- Each product in the order has a specified quantity.
- Each order includes a total amount, which must be calculated.
Based on these assumptions, we will add the following properties to the Order entity model for harmonization:
Name | Type | Other settings | Notes |
---|---|---|---|
`id` | string | | Used as the primary key because order ID is unique for each order. Needs an element range index. |
`total` | decimal | | The calculated total amount of the entire order. |
`products` | Product entity | Cardinality: 1..∞ | An array of pointers to the Product entities in our FINAL database. |
To define the Order entity model,
- In QuickStart's navigation bar, click .
- At the top of the Order entity card, click the pencil icon to edit the Order entity definition.
- In the Order entity editor, click in the Properties section to add a new property.
  - Set Name to `id`.
  - Set Type to `string`.
  - To make `id` the primary key, click the area in the key column for the `id` row.
  - To specify that `id` needs an element range index, click the area in the lightning bolt column for the `id` row.
- Click again to add another property.
  - Set Name to `total`.
  - Set Type to `decimal`.
- Click again to add another property.
  - Set Name to `products`.
  - Set Type to the entity Product.
  - To indicate that the entity can have multiple instances of this property, set Cardinality to 1..∞.
- Click .
- If prompted to update the index, click .
- Drag the bottom-right corner of the entity card to resize it and see the newly added properties.
Result
Because the Order entity contains pointers to the Product entity, an arrow connects the Order entity card to the Product entity card with the cardinality we selected (1..∞).
2 - Create the Harmonize Flow
Harmonization uses the data in your STAGING database to generate canonical entity instances (documents) in the FINAL database.
To create a harmonization flow for the Order entity,
- In QuickStart’s navigation bar, click Flows.
- Expand the tab named Order in the left panel.
- Click the + for Harmonize Flows.
- In the Create Harmonize Flow dialog, set Harmonize Flow Name to `Harmonize Orders`.
- Click CREATE.
Because we used the default Create Structure from Entity Definition and we did not specify a mapping, DHF creates boilerplate code based on the entity model. This code includes default initialization for the entity properties, which we will customize.
3 - Customize the Harmonize Flow
3a - Customize the Collector Plugin
The Collector plugin generates a list of IDs for the flow to operate on. The IDs can be whatever your application needs (e.g., URIs, relational row IDs, Twitter handles). The default Collector plugin produces a list of source document URIs.
An `options` parameter is passed to the Collector plugin, and it contains the following properties:
- entity: the name of the entity this plugin belongs to (e.g., “Order”)
- flow: the name of the flow this plugin belongs to (e.g., “Harmonize Orders”)
- flowType: the type of flow being run (“input” or “harmonize”; e.g., “harmonize”)
The Load Orders input flow automatically groups the source documents into a collection named Order. The default Collector plugin uses that collection to derive a list of URIs.
```javascript
cts.uris(null, null, cts.collectionQuery(options.entity))
```

In our source Order CSV file, each row represented one line item in an order. For example, if the order had three line items, then three documents were created for that order in the staging database during the input phase. To combine all three documents into a single Order entity, they must be harmonized.

Each of those three documents would have the same order ID but a different URI. Therefore, we must customize the Collector plugin to return a list of unique order IDs, instead of a list of URIs.
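To see why URIs are the wrong unit of work here, the following plain JavaScript sketch models three staged line-item documents for one order plus one document for a second order. The URIs and field values are hypothetical, and the array only stands in for the staging database; this is not MarkLogic code.

```javascript
// Hypothetical staged documents: one document per CSV row (line item).
const stagedDocs = [
  { uri: '/orders/row-1.json', id: '1001', sku: 'SKU-A' },
  { uri: '/orders/row-2.json', id: '1001', sku: 'SKU-B' },
  { uri: '/orders/row-3.json', id: '1001', sku: 'SKU-C' },
  { uri: '/orders/row-4.json', id: '1002', sku: 'SKU-A' }
];

// What a URI-based collector would hand to the flow: one item per line item.
function collectUris(docs) {
  return docs.map(function (doc) { return doc.uri; });
}

// What we want instead: one item per distinct order ID.
function collectOrderIds(docs) {
  return Array.from(new Set(docs.map(function (doc) { return doc.id; })));
}

const uris = collectUris(stagedDocs);
const orderIds = collectOrderIds(stagedDocs);
console.log(uris.length);  // 4
console.log(orderIds);     // [ '1001', '1002' ]
```

With URIs, the flow would run four times and produce four partial orders; with unique order IDs, it runs twice, once per order.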
Technical Notes
- In our custom Collector plugin code, we use the jsearch library to find all the values of `id` in the Order collection and return the result.
- By default, jsearch paginates results; therefore, we call `slice()` to get all results at once.
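The effect of pagination can be modeled in plain JavaScript. The sketch below is not jsearch: `valuesQuery` is a hypothetical stand-in, and its default window of 10 results is an assumption made for illustration.

```javascript
// Hypothetical stand-in for a paginated values query (not the jsearch API).
function valuesQuery(allValues) {
  let start = 0;
  let end = 10; // assumed default page window, for illustration only
  return {
    // Override the window of results to return.
    slice: function (from, to) {
      start = from;
      end = to;
      return this;
    },
    // Return only the values inside the current window.
    result: function () {
      return allValues.slice(start, end);
    }
  };
}

// 25 hypothetical order IDs: '1', '2', ..., '25'.
const allIds = Array.from({ length: 25 }, function (_, i) { return String(i + 1); });

const firstPage = valuesQuery(allIds).result();
const everything = valuesQuery(allIds).slice(0, Number.MAX_SAFE_INTEGER).result();

console.log(firstPage.length);   // 10
console.log(everything.length);  // 25
```

Without the explicit window, a collector modeled this way would silently skip every order past the first page.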
Steps
To customize the Collector plugin,
- Click the COLLECTOR tab.
- Replace the collector plugin code with the following:

  ```javascript
  /*
   * Collect IDs plugin
   *
   * @param options - a map containing options. Options are sent from Java
   *
   * @return - an array of ids or uris
   */
  function collect(options) {
    const jsearch = require('/MarkLogic/jsearch.sjs');
    return jsearch
      .values('id')
      .where(cts.collectionQuery(options.entity))
      .slice(0, Number.MAX_SAFE_INTEGER)
      .result();
  }

  module.exports = {
    collect: collect
  };
  ```
- Click SAVE.
3b - Customize the Content Plugin
The list of order IDs collected by our custom Collector plugin is passed to the Content plugin, specifically to its `createContent` function.
We will customize `createContent` to do the following:
- Collect all the line items of the same order into a single Order entity.
- Calculate the total cost of the order.
Technical Notes
- A jsearch library query searches the Order collection for all source documents that have the same order `id`. We also apply a `map` function to each matching document to extract the original content inside the envelope. The `orders` variable will contain an array of original JSON objects.

  ```javascript
  var orders = jsearch
    .collections('Order')
    .documents()
    .where(
      jsearch.byExample({
        'id': id
      })
    )
    .result('value')
    .results.map(function(doc) {
      return doc.document.envelope.instance;
    });
  ```
- After collecting the line items in the same order,
  - We calculate the total amount of the order and
  - We store the appropriate Product entity references (using the SKU) in the `products` property of the Order instance.

  ```javascript
  /* The following property is a local reference. */
  var products = [];
  var price = 0;

  for (var i = 0; i < orders.length; i++) {
    var order = orders[i];
    if (order.sku) {
      products.push(makeReferenceObject('Product', order.sku));
      price += xs.decimal(parseFloat(order.price)) * xs.decimal(parseInt(order.quantity, 10));
    }
  }
  ```
- The default code includes some additional functions that we will remove because we do not need them.
  - `extractInstanceProduct`: Extracts a Product instance in a form suitable for insertion into an Order instance. Because we reference Product entities within the Order instance, we do not need this function.
  - `extractInstanceOrder`: Extracts an Order instance from an order source document. Since we do not have a one-to-one correspondence, we cannot use this function.

  However, although we do not use `extractInstanceOrder`, our customized `createContent` function must produce a similar structure.

  ```javascript
  return {
    '$attachments': attachments,
    '$type': 'Order',
    '$version': '0.0.1',
    'id': id,
    'price': price,
    'products': products
  }
  ```
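The three notes above can be combined into one plain JavaScript sketch that runs outside MarkLogic. The sample line items are hypothetical, `makeReference` is an illustrative stand-in for the scaffolded `makeReferenceObject` helper (its return shape here is an assumption), and plain numbers replace `xs.decimal`, so the sample prices are chosen to be exactly representable.

```javascript
// Hypothetical line items, as they might look after unwrapping the envelope.
const lineItems = [
  { id: '1001', sku: 'SKU-A', price: '10.50', quantity: '2' },
  { id: '1001', sku: 'SKU-B', price: '4.25', quantity: '4' },
  { id: '1002', sku: 'SKU-A', price: '10.50', quantity: '1' }
];

// Illustrative stand-in for makeReferenceObject; the returned shape is assumed.
function makeReference(type, ref) {
  return { '$type': type, '$ref': ref };
}

// Build one Order object from every line item that shares the given ID.
function buildOrder(id, items) {
  const matching = items.filter(function (item) { return item.id === id; });
  const products = [];
  let price = 0;
  matching.forEach(function (item) {
    if (item.sku) {
      products.push(makeReference('Product', item.sku));
      price += parseFloat(item.price) * parseInt(item.quantity, 10);
    }
  });
  return { '$type': 'Order', '$version': '0.0.1', id: id, price: price, products: products };
}

const order1001 = buildOrder('1001', lineItems);
console.log(order1001.price);            // 38
console.log(order1001.products.length);  // 2
```

Order 1001 totals 10.50 × 2 + 4.25 × 4 = 38 and carries two Product references. In the actual plugin, the arithmetic stays in `xs.decimal`, which avoids the rounding issues that plain floating-point numbers can introduce for other prices.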
Steps
To customize the content plugin code,
- Click the CONTENT tab.
- Replace the content plugin code with the following:
- Click SAVE.
4 - Run the Harmonize Flow
When you create a flow with mapping, QuickStart automatically generates harmonization code based on the entity model and the mapping and then deploys the code to MarkLogic Server.
To run the harmonization flow,
- Click the Flow Info tab.
- Click Run Harmonize.
5 - View the Harmonized Orders
As with other flow runs, you can view the job status.
- In the QuickStart menu, click Jobs to open the Jobs list.
- In the list, click >_ for .
TIP: You can filter the list by using the free-text search field or the faceted search filters.
You can also explore your harmonized data in the FINAL database.
- In the QuickStart menu, click Browse Data.
- From the database selection dropdown, choose the FINAL database.
- (Optional) To narrow the list to include entities only, check the Entities Only box.
TIP: You can further filter the list by using the free-text search field or the faceted search filters.
- In the list, click the row of the first Order dataset item.