Hadoop consists of two core components:

1. HDFS (Hadoop Distributed File System)

2. MapReduce

HDFS

HDFS splits files into fixed-size blocks, stores those blocks across multiple nodes in a cluster, and replicates each block on several nodes to ensure fault tolerance.
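The block-and-replica idea can be sketched as follows. This is a toy illustration of the concept, not HDFS's real placement policy; the block size, node names, and round-robin assignment are all simplifying assumptions.

```python
# Toy sketch of HDFS-style storage: split a file into fixed-size blocks,
# then place each block on several distinct DataNodes.
BLOCK_SIZE = 4    # bytes per block (real HDFS default is 128 MB in 2.x+)
REPLICATION = 3   # HDFS's default replication factor

def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    """Split raw data into fixed-size blocks, as HDFS does on write."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_replicas(num_blocks: int, datanodes: list, replication: int = REPLICATION):
    """Assign each block to `replication` distinct DataNodes (round-robin here;
    real HDFS is rack-aware)."""
    return {
        b: [datanodes[(b + r) % len(datanodes)] for r in range(replication)]
        for b in range(num_blocks)
    }

blocks = split_into_blocks(b"hello hdfs!!")          # 12 bytes -> 3 blocks
placement = place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"])
```

Because every block lives on three distinct nodes, any single DataNode can fail without losing data; the NameNode then re-replicates the affected blocks elsewhere.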

MapReduce

MapReduce breaks down a large job into smaller tasks that can be executed in parallel on different nodes in the cluster.
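The classic example is word count. The sketch below shows the map, shuffle, and reduce phases of the programming model in plain Python; it is an illustration of the concept, not the Hadoop Java API, and the function names are our own.

```python
from collections import defaultdict

def map_phase(split: str):
    """Map: turn one input split into (word, 1) pairs."""
    return [(word, 1) for word in split.split()]

def shuffle(pairs):
    """Shuffle: group all intermediate values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce: aggregate each key's values independently."""
    return {word: sum(counts) for word, counts in grouped.items()}

# In a real cluster, each split would be mapped by a different TaskTracker
# in parallel; here we just iterate.
splits = ["big data big", "data big cluster"]
pairs = [pair for split in splits for pair in map_phase(split)]
counts = reduce_phase(shuffle(pairs))
# counts == {"big": 3, "data": 2, "cluster": 1}
```

Because each map task touches only its own split and each reduce task only its own key group, the framework can run many of them at once on different nodes.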

Hadoop 1.x has five main daemons:
  • NameNode:
    • The NameNode is the master node in HDFS. It is responsible for maintaining the metadata of the file system, including the location of all data blocks. The NameNode also manages the replication of data blocks to ensure that data is not lost if a node fails.
  • DataNode:
    • DataNodes are the worker nodes in HDFS. They are responsible for storing data blocks. DataNodes also periodically send heartbeat messages to the NameNode to indicate that they are still alive.
  • Secondary NameNode:
    • The Secondary NameNode is a helper node in HDFS, not a hot standby. It periodically fetches the NameNode's edit log, merges it into the FSImage, and ships the updated FSImage back, keeping the edit log from growing without bound and shortening NameNode restart time.
  • JobTracker:
    • The JobTracker is the master node in MapReduce. It is responsible for scheduling and monitoring MapReduce jobs. The JobTracker also assigns tasks to TaskTrackers and tracks the progress of tasks.
  • TaskTracker:
    • TaskTrackers are the worker nodes in MapReduce. They are responsible for executing tasks assigned by the JobTracker. TaskTrackers also periodically send heartbeat messages to the JobTracker to indicate that they are still alive.
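Both master daemons rely on the same heartbeat pattern: the NameNode tracks DataNodes this way, and the JobTracker tracks TaskTrackers. A minimal sketch of that pattern, with an illustrative timeout and class names that are our own, not Hadoop's:

```python
HEARTBEAT_TIMEOUT = 10.0  # seconds without a heartbeat before a node is "dead"
                          # (illustrative; Hadoop's defaults differ)

class Master:
    """Tracks worker liveness from periodic heartbeats, as the NameNode
    does for DataNodes and the JobTracker does for TaskTrackers."""

    def __init__(self):
        self.last_heartbeat = {}  # node id -> timestamp of last heartbeat

    def receive_heartbeat(self, node_id: str, now: float):
        self.last_heartbeat[node_id] = now

    def live_nodes(self, now: float):
        return [node for node, t in self.last_heartbeat.items()
                if now - t <= HEARTBEAT_TIMEOUT]

master = Master()
master.receive_heartbeat("worker1", now=0.0)
master.receive_heartbeat("worker2", now=0.0)
master.receive_heartbeat("worker1", now=8.0)  # worker2 stops reporting
alive = master.live_nodes(now=15.0)           # ["worker1"]
```

When a worker misses its heartbeats, the master stops scheduling work on it and (for DataNodes) re-replicates the blocks it held.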

Disadvantages of Hadoop 1.x Architecture:

  • Single point of failure:
    • The NameNode in Hadoop 1.x is a single point of failure. If the NameNode fails, the entire cluster becomes unavailable.
  • Limited scalability:
    • Hadoop 1.x is limited in terms of scalability. It can support a maximum of 4,000 nodes.
  • Limited support for real-time processing:
    • Hadoop 1.x is primarily designed for batch processing. It is not well-suited for real-time processing of data.
  • Lack of flexibility:
    • The Hadoop 1.x architecture is not very flexible. It is difficult to use Hadoop 1.x for other distributed computing technologies besides MapReduce.
