Search Index v/s Data Hub

1. Search Index v/s Data Hub

Employee

Akshay Pandita
Posted 08-20-2019 02:16
Edited by Akshay Pandita 08-22-2019 01:50

Hi Team -

I am working with a customer who wants to understand the major differences between our Elasticsearch-based Search Index and the Data Hub in terms of performance and scalability. For example, if they need to do matching using fuzzy logic, should they go with the Search Index method (with 24+ algorithms), since it can handle complexity to a great extent, or use the Data Hub, which has limited fuzzy matching (I need to understand more about the Data Hub's matching capability) and a limited set of predefined operators such as contains, like, and not like? They also report that whether they scale horizontally (add more nodes) or vertically (add more power), the Hub does not perform adequately.

I wanted to share this with the community so that anyone who has seen such a scenario, or has some insights, can share them with everyone. My basic understanding is that the Search Index (SI) is scalable and the Hub is also scalable (but to a lesser extent), but as the data load increases, write complexity in the Hub increases compared to SI. The end objective is better performance with no data lag.

Thanks,
Akshay.

------------------------------
Akshay Pandita
Advisory Consultant - Professional Services

------------------------------
2. RE: Search Index v/s Data Hub

Employee

Brad Stengel
Posted 08-22-2019 14:15

Hi Akshay,

I guess let's start by clarifying some terms. I suspect you are not actually matching using Search Index or Data Hub, but rather querying one of those to retrieve candidates to be passed to the matching engine. You correctly identify that the search index, being an implementation of the Elasticsearch search engine, provides robust indexing and searching that scales across the cluster. The Data Hub is a graph implementation that supports similar indexing and searching - including range, substring (contains), starts- or ends-with, etc. Because every node has a complete copy of the graph, any node can service any query, so queries scale well across the cluster. In reality you could use either the Search Index or the Data Hub as a source of candidates for matching. The implementation would be slightly different, using the candidate finder stage to query the search index, and the query hub stage for data hub.
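
To make the two-step idea concrete, here is a rough sketch in plain Python of "query for candidates, then pass them to matching". It is not Spectrum code: the in-memory RECORDS list stands in for either the Search Index or the Data Hub, and find_candidates and score_candidates are hypothetical names standing in for the candidate-retrieval stage and the matching engine. The fuzzy score uses Python's standard difflib, purely as an assumption for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical records standing in for a Search Index or Data Hub.
RECORDS = [
    {"id": 1, "name": "Jonathan Smith"},
    {"id": 2, "name": "Jon Smyth"},
    {"id": 3, "name": "Joan Smithers"},
    {"id": 4, "name": "Maria Garcia"},
]

def find_candidates(records, query_name, prefix_len=2):
    """Step 1: cheap, index-style lookup (here a starts-with filter)."""
    prefix = query_name[:prefix_len].lower()
    return [r for r in records if r["name"].lower().startswith(prefix)]

def score_candidates(candidates, query_name, threshold=0.7):
    """Step 2: fuzzy-score only the retrieved candidates."""
    scored = []
    for r in candidates:
        ratio = SequenceMatcher(None, query_name.lower(), r["name"].lower()).ratio()
        if ratio >= threshold:
            scored.append((r["id"], round(ratio, 2)))
    # Best match first.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

candidates = find_candidates(RECORDS, "Jon Smith")
matches = score_candidates(candidates, "Jon Smith")
print(matches)
```

The point of the split is that the expensive fuzzy comparison only runs on the handful of records the query returns, so the choice of Search Index or Data Hub mostly affects how candidates are retrieved, not how they are scored.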

Since you mention write complexity in the hub in talking about scalability, I'm going to assume you're talking about the performance of Write to Hub stage operations in a cluster vs a single server implementation (scaling out vs scaling up). One of the options for writing to the data hub locks the record(s) on all nodes when performing a write or update, and this takes longer with more nodes than with fewer. There are write strategies that avoid this although they aren't appropriate to all use cases.
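
As a toy illustration of why a lock-on-all-nodes write gets slower as the cluster grows, the sketch below models each node as a Python object with its own lock. This is an assumption-laden model, not how the Data Hub is implemented: Node and write_with_cluster_lock are hypothetical names, and a single lock per node stands in for record-level locking.

```python
import threading

class Node:
    """One hypothetical cluster node holding a full replica of the data."""
    def __init__(self):
        self.lock = threading.Lock()
        self.data = {}

def write_with_cluster_lock(nodes, key, value):
    """Lock every node, apply the write to every replica, then unlock.

    Returns the number of lock acquisitions, which grows with cluster size.
    """
    acquired = []
    try:
        for node in nodes:  # fixed order so concurrent writers cannot deadlock
            node.lock.acquire()
            acquired.append(node)
        for node in nodes:
            node.data[key] = value
    finally:
        for node in reversed(acquired):
            node.lock.release()
    return len(acquired)

small = [Node() for _ in range(2)]
large = [Node() for _ in range(8)]
locks_small = write_with_cluster_lock(small, "rec-1", "A")
locks_large = write_with_cluster_lock(large, "rec-1", "A")
print(locks_small, locks_large)
```

In this model every write pays one lock round per node, so adding nodes helps reads (any replica can answer) while making each locked write more expensive.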

You can contact me directly by email if you want to talk more about the particular client you are working with.

------------------------------
Brad Stengel
PITNEY BOWES SOFTWARE, INC
Miami Lakes FL
------------------------------
