Hadoop framework: Building a distributed environment in cluster mode


1、 Basic environment configuration

1. Three servers

Prepare three CentOS 7 servers, cloned from the base environment of the pseudo-distributed setup.

133 hop01
134 hop02
136 hop03

2. Set host name

## Set the host name (use hop02 and hop03 on the other two servers)
hostnamectl set-hostname hop01
reboot -f

3. Host name resolution

vim /etc/hosts
# Add an entry for each node: hop01, hop02, hop03
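The hosts entries themselves are not shown in the listing. As a minimal sketch, the function below appends one line per node and skips any host name that is already present. It assumes that the numbers 133, 134, and 136 listed for the three servers are the last octets of their IP addresses, and that the network prefix is 192.168.37. The prefix and the function names are hypothetical, not values from the original setup.

```shell
#!/usr/bin/env bash
# Sketch: add "IP hostname" lines to a hosts file, skipping names that already exist.
# SUBNET is a hypothetical prefix; replace it with the prefix of your own network.
SUBNET="192.168.37"

add_host_entry() {
  local file="$1" ip="$2" name="$3"
  # -w matches whole words, so "hop01" does not match "hop010"
  if ! grep -qw -- "$name" "$file" 2>/dev/null; then
    printf '%s %s\n' "$ip" "$name" >> "$file"
  fi
}

add_cluster_hosts() {
  local file="${1:-/etc/hosts}"
  add_host_entry "$file" "$SUBNET.133" hop01
  add_host_entry "$file" "$SUBNET.134" hop02
  add_host_entry "$file" "$SUBNET.136" hop03
}
```

Running add_cluster_hosts with no argument as root on each server targets /etc/hosts. Passing a different path makes it easy to try the function on a scratch file first.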

4. Passwordless SSH login

Configure passwordless SSH login among the three servers.

[root@hop01 ~]# ssh-keygen -t rsa
... press Enter at every prompt to accept the defaults
[root@hop01 ~]# cd .ssh
... copy the public key to each node in the cluster
[root@hop01 .ssh]# ssh-copy-id hop01
[root@hop01 .ssh]# ssh-copy-id hop02
[root@hop01 .ssh]# ssh-copy-id hop03
... log in to hop02 from hop01
[root@hop01 ~]# ssh hop02

The steps above were run on hop01. Repeat the same operation on hop02 and hop03.
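Because the same ssh-copy-id calls are repeated on every server, they can be wrapped in a loop. The sketch below is one way to do that. The RUN variable is a hypothetical dry-run switch introduced only for this example: by default the function prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Sketch: copy the current user's public key to every node in the cluster.
NODES="hop01 hop02 hop03"

distribute_key() {
  # RUN unset -> "echo" (print the commands only); RUN set to empty -> execute them
  local run="${RUN-echo}"
  local node
  for node in $NODES; do
    $run ssh-copy-id "$node"
  done
}
```

Calling distribute_key prints the three commands for review. Calling it as `RUN= distribute_key` runs them, and each one still prompts for the target server's password once.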

5. Time synchronization

Install the NTP components

yum install ntpdate ntp -y
rpm -qa|grep ntp

Basic management commands

# View status
service ntpd status
# Start the service
service ntpd start
# Enable at boot
chkconfig ntpd on

Configure the time server on hop01

# Modify the NTP configuration
vim /etc/ntp.conf
# Add the following content
restrict mask nomodify notrap
fudge stratum 10
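The addresses in the two directives above appear to be missing from the listing. In a conventional ntpd setup, restrict takes a network and a netmask, and fudge takes the local clock pseudo-address 127.127.1.0 together with a matching server line. The sketch below appends such a block to a configuration file. The function name is hypothetical, and the network and netmask are parameters because the original values are not known.

```shell
#!/usr/bin/env bash
# Sketch: append local-clock time-server settings to an ntp.conf file.
append_ntp_server_conf() {
  local conf="$1" network="$2" netmask="$3"
  cat >> "$conf" <<EOF
# Let hosts on the cluster network query this server without modifying it
restrict $network mask $netmask nomodify notrap
# Serve time from the local clock (pseudo-address 127.127.1.0) at stratum 10
server 127.127.1.0
fudge 127.127.1.0 stratum 10
EOF
}
```

For example, `append_ntp_server_conf /etc/ntp.conf 192.168.37.0 255.255.255.0` assumes a hypothetical 192.168.37.0/24 network. Restart ntpd afterwards with `service ntpd restart`.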

On hop02 and hop03, synchronize time from hop01 and comment out the public network time servers in /etc/ntp.conf.

# server 0.centos.pool.ntp.org iburst
# server 1.centos.pool.ntp.org iburst
# server 2.centos.pool.ntp.org iburst
# server 3.centos.pool.ntp.org iburst

Add a scheduled task

[root@hop02 ~]# crontab -e
*/10 * * * * /usr/sbin/ntpdate hop01

Change the system time on hop02 and hop03 to test synchronization

# Set a specific time
date -s "2018-05-20 13:14:55"
# View the time
date

The scheduled task then corrects the clock on these servers to match the time on hop01.

6. Clean up the environment

The three CentOS 7 servers were cloned from the virtual machine of the pseudo-distributed environment, so delete the data and logs directories left over from the original Hadoop configuration.

[root@hop02 hadoop2.7]# rm -rf data/ logs/

2、 Construction of cluster environment

1. Overview of cluster configuration

Server   HDFS       YARN          Single-instance service
hop01    DataNode   NodeManager   NameNode
hop02    DataNode   NodeManager   ResourceManager
hop03    DataNode   NodeManager   SecondaryNameNode

2. Modify configuration

vim core-site.xml


On each of the three servers, specify the host name in this file.
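The contents of core-site.xml are not shown in the listing. As a rough sketch of what such a file commonly contains in Hadoop 2.x, the function below writes the fs.defaultFS and hadoop.tmp.dir properties. The port 9000, the function name, and the directory passed in the usage example are assumptions, not values from the original configuration.

```shell
#!/usr/bin/env bash
# Sketch: generate a minimal core-site.xml.
# Port 9000 is a commonly used value and an assumption here.
write_core_site() {
  local file="$1" namenode_host="$2" tmp_dir="$3"
  cat > "$file" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <!-- Address of the NameNode that clients and DataNodes connect to -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://$namenode_host:9000</value>
  </property>
  <!-- Base directory for files Hadoop generates at run time -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>$tmp_dir</value>
  </property>
</configuration>
EOF
}
```

A call might look like `write_core_site core-site.xml hop01 /opt/hadoop2.7/data/tmp`, where the directory is hypothetical. DataNodes locate the NameNode through fs.defaultFS, which is why the example passes hop01, the server planned for the NameNode.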

vim hdfs-site.xml



Here, set the replication factor to 3 and specify the SecondaryNameNode address. Make the same change on all three servers, pointing the SecondaryNameNode at hop03.
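The hdfs-site.xml listing is also missing. A minimal sketch of the two settings described, using the Hadoop 2.x property names dfs.replication and dfs.namenode.secondary.http-address, might look like the following. The function name is hypothetical, and port 50090 is the Hadoop 2.x default rather than a value confirmed by the original.

```shell
#!/usr/bin/env bash
# Sketch: generate a minimal hdfs-site.xml.
# Port 50090 is the Hadoop 2.x default for the SecondaryNameNode HTTP address.
write_hdfs_site() {
  local file="$1" replication="$2" secondary_host="$3"
  cat > "$file" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <!-- Number of replicas kept for each block -->
  <property>
    <name>dfs.replication</name>
    <value>$replication</value>
  </property>
  <!-- HTTP address of the SecondaryNameNode -->
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>$secondary_host:50090</value>
  </property>
</configuration>
EOF
}
```

For the cluster planned here, the call would be `write_hdfs_site hdfs-site.xml 3 hop03`.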

vim yarn-site.xml


Specify the ResourceManager service on hop02.
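The yarn-site.xml listing did not survive either. As a sketch, the function below writes the yarn.resourcemanager.hostname property together with the mapreduce_shuffle auxiliary service that MapReduce jobs on YARN normally require. The function name is hypothetical, and the inclusion of the shuffle setting is an assumption about the original file.

```shell
#!/usr/bin/env bash
# Sketch: generate a minimal yarn-site.xml.
write_yarn_site() {
  local file="$1" rm_host="$2"
  cat > "$file" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <!-- Auxiliary service that lets reducers fetch map output -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <!-- Host that runs the ResourceManager -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>$rm_host</value>
  </property>
</configuration>
EOF
}
```

With the plan in this article, the call would be `write_yarn_site yarn-site.xml hop02`.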

vim mapred-site.xml

<!-- Server address -->

<!-- Server web address -->

Specify the addresses used for web-based viewing on server hop01.
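Only the two comments remain from the mapred-site.xml listing. One plausible reading is that they belong to the MapReduce job history server settings. Under that assumption, a sketch of the file could look like this. The function name is hypothetical, and ports 10020 and 19888 are the Hadoop 2.x defaults, not values confirmed by the original.

```shell
#!/usr/bin/env bash
# Sketch: generate a minimal mapred-site.xml.
# Ports 10020 and 19888 are the Hadoop 2.x defaults for the job history server.
write_mapred_site() {
  local file="$1" history_host="$2"
  cat > "$file" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <!-- Run MapReduce jobs on YARN -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <!-- Server address -->
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>$history_host:10020</value>
  </property>
  <!-- Server web address -->
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>$history_host:19888</value>
  </property>
</configuration>
EOF
}
```

For this cluster the call would be `write_mapred_site mapred-site.xml hop01`.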

3. Cluster node configuration


File: vim slaves


This file lists the three nodes of the cluster. Apply the same configuration changes to the other servers.
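The contents of the slaves file are not shown. Since all three servers are planned as DataNode and NodeManager hosts, the file presumably lists one host name per line. The sketch below writes such a file and copies the configuration to the other servers with scp. The function names and the configuration directory used in the usage example are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: write the slaves file and push the configuration to other nodes.

# Write one host name per line to the given file.
write_slaves() {
  local file="$1"; shift
  printf '%s\n' "$@" > "$file"
}

# Copy the XML files and the slaves file to the same directory on each node.
sync_conf() {
  local conf_dir="$1"; shift
  local node
  for node in "$@"; do
    scp "$conf_dir"/*.xml "$conf_dir"/slaves "$node:$conf_dir/"
  done
}
```

On hop01 this could be used as `write_slaves /opt/hadoop2.7/etc/hadoop/slaves hop01 hop02 hop03` followed by `sync_conf /opt/hadoop2.7/etc/hadoop hop02 hop03`, assuming Hadoop is installed at the same path on every server.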

4. Format the NameNode

Note that the NameNode is configured on hop01, so run the format command there.

[root@hop01 hadoop2.7]# bin/hdfs namenode -format

5. Start HDFS

[root@hop01 hadoop2.7]# sbin/start-dfs.sh
Starting namenodes on [hop01]
hop01: starting namenode
hop03: starting datanode
hop02: starting datanode
hop01: starting datanode
Starting secondary namenodes [hop03]
hop03: starting secondarynamenode

Note that the printed output matches the configuration: the NameNode starts on hop01 and the SecondaryNameNode starts on hop03. You can verify the processes on each server with the jps command.

6. Start YARN

Note that the ResourceManager is configured on hop02, so run the start command on hop02.

[root@hop02 hadoop2.7]# sbin/start-yarn.sh
starting yarn daemons
starting resourcemanager
hop03: starting nodemanager
hop01: starting nodemanager
hop02: starting nodemanager

Note the startup log output. At this point, all the services planned for the cluster are running.

[root@hop01 hadoop2.7]# jps
4306 NodeManager
4043 DataNode
3949 NameNode
[root@hop02 hadoop2.7]# jps
3733 ResourceManager
3829 NodeManager
3613 DataNode
[root@hop03 hadoop2.7]# jps
3748 DataNode
3928 NodeManager
3803 SecondaryNameNode

The processes running on each server match the planned configuration.
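Comparing jps output against the plan by eye works for three servers, but the check is easy to script. The sketch below takes jps output as text and reports any expected daemon that is absent. The function name check_daemons is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: verify that every expected daemon name appears in jps output.
check_daemons() {
  local jps_output="$1"; shift
  local daemon missing=0
  for daemon in "$@"; do
    # -w matches whole words, so "NameNode" does not match "SecondaryNameNode"
    if ! printf '%s\n' "$jps_output" | grep -qw -- "$daemon"; then
      echo "missing: $daemon"
      missing=1
    fi
  done
  return "$missing"
}
```

On hop03, for example, `check_daemons "$(jps)" DataNode NodeManager SecondaryNameNode` returns success when all three processes are present and prints the missing names otherwise.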

7. Web interface


3、 Source code address

GitHub · address
Gitee · address
