Big data may be semi-structured or unstructured. A massively parallel processing (MPP) architecture structures big data so that it can be queried easily for reporting and analytics. MPP systems are sometimes referred to as shared-nothing systems: data is partitioned across many servers (also known as nodes), and each server processes queries locally against its own partition.
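To make the shared-nothing idea concrete, here is a minimal sketch in Python. It is not how any particular MPP product is implemented. The cluster size, the `customer` distribution key, the sample rows, and all function names are assumptions made up for illustration. Rows are hash-partitioned across in-memory "nodes", each node aggregates only the rows it holds, and a coordinator step merges the partial results.

```python
from collections import defaultdict
from zlib import crc32

NUM_NODES = 3  # hypothetical cluster size, chosen only for this sketch


def node_for(key: str, num_nodes: int = NUM_NODES) -> int:
    """Pick the node that owns a row by hashing its distribution key."""
    # crc32 is used because it is stable across runs, unlike built-in hash() on str.
    return crc32(key.encode("utf-8")) % num_nodes


def partition(rows, num_nodes: int = NUM_NODES):
    """Split rows so that each node holds a disjoint slice of the data."""
    nodes = [[] for _ in range(num_nodes)]
    for row in rows:
        nodes[node_for(row["customer"], num_nodes)].append(row)
    return nodes


def local_totals(node_rows):
    """Work done on one node, using only the rows stored on that node."""
    totals = defaultdict(float)
    for row in node_rows:
        totals[row["customer"]] += row["amount"]
    return dict(totals)


def run_query(nodes):
    """Coordinator step: merge the partial results returned by every node."""
    merged = defaultdict(float)
    for node_rows in nodes:
        for customer, total in local_totals(node_rows).items():
            merged[customer] += total
    return dict(merged)


# Made-up sample data for the sketch.
sales = [
    {"customer": "alice", "amount": 10.0},
    {"customer": "bob", "amount": 5.0},
    {"customer": "alice", "amount": 20.0},
    {"customer": "carol", "amount": 12.5},
]

nodes = partition(sales)
result = run_query(nodes)
print(result)
```

Because rows are distributed by the same key that the query groups on, every customer's rows land on a single node, so no node needs data held by another. In a real system the per-node work would run in parallel on separate machines rather than in a loop.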
Let's explore MPP in detail using the following diagram as a point of reference:
The diagram can be explained as follows:
- The process begins with the Client issuing a query, which is then passed to the Master Node.
- The Master Node contains ...