
予早 2026-04-30 23:53:27

Complete Spark on YARN setup

1. Ubuntu Installation

Three machines, named node1, node2, and node3.

Ubuntu minimal installation

https://blog.csdn.net/qq_41004932/article/details/124955610

Setting the root password on Ubuntu

 After Ubuntu is installed, the root (default) password is unknown, so it needs to be set.
 
 1. Log in with the user created during installation.
 
 2. Run: sudo passwd and press Enter.
 
 3. Enter the new password, repeat it, and you should see: passwd: password updated successfully
 
 The root password is now set.
 
 4. Run: su root to switch to the root user.

Reset the hostname
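The notes do not show the actual commands; a minimal sketch of resetting the hostname and mapping all three nodes in /etc/hosts might look like this (the node2/node3 IPs are assumptions in the same subnet as the static IP configured below):

```shell
# run on each node with its own name (node1 shown here)
sudo hostnamectl set-hostname node1
# map all three nodes so they can reach each other by name;
# 192.168.153.132/133 are assumed addresses, adjust to your network
cat <<'EOF' | sudo tee -a /etc/hosts
192.168.153.131 node1
192.168.153.132 node2
192.168.153.133 node3
EOF
```

These are host-configuration commands for the target machines; repeat on node2 and node3 with the matching hostname.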

Configure the network interface with a static IP

 vi /etc/netplan/00-installer-config.yaml
 # default configuration (DHCP):
 network:
   ethernets:
     ens33:
       dhcp4: true
   version: 2
 # change it to a static IP:
 network:
   ethernets:
     ens33:
       dhcp4: no
       addresses: [192.168.153.131/24]
       gateway4: 192.168.153.2
       nameservers:
         addresses: [8.8.8.8, 8.8.4.4]
   version: 2
 # apply the change
 sudo netplan apply

Disable the firewall

 systemctl stop ufw
 systemctl disable ufw
 systemctl status ufw

2. SSH Installation

 ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQDOi24D5KnKcSPOILCJE+1Ocqnkci3ffSGTbyY9FO4bsOGrnBLXexdhC3GZ/yGvscayQLTo1tI/G8j/7fZD1rAQo+B+UktsJXW4TilNoPGPOaZfs3bRB47ehqnNOPgRe2jwWB3RMX8/hsdDhdogTE9EbU2tP4+WyjgUXQIYINe6FYnPsThhuNiAQXbuJEGeKrTVtLvDOO8poOOH1ihc9yhMPvmaVbUmfi+nihImkCT/W6YQ6Z6rRR1m+/Empr+O6eCDn+yEOjJHEOZmvvWFGCPaprp8lDwNdyibnRdNov/mf38ZIC+ioMPjtPkIe86VP56KqRIhX/oASFhLePoKbAuPkxFnN+VSh8NPyIxOQDj6RRliZts/7Osu4Qbr6lzqO8rp9dMfphPtr6fhTjCY2aCP6C+tWAklnRh+gZbXsXc17p7tGZPcodv7gwHOgOGmpwEmpQPASD6M1MO7zPNGvDSvfxC7S4Ky8cd8PgREvdMH82mi182EG1r7ie+CVyFsoK8= ubuntu@node1
 
 ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQDCA6yahy/6p+1YtzWoGjk9VpmeHGcSYNyZ/AJm9Ah6tGU8umbO6z5sxMSk9Egj/DC/hC+aFbHit9f++HP2p3gn59Yi3eljRPdQ6/Lvhz1d6NRpyhZUx2vzd/abFTYt0FKpm9pnW6rZfgo9M0rpGd4AzTIAvrDXg14BoWxM9hD/Cs9bq7LUzMnOpaGBf82QAdfeDPo+QdTY3v41SndNVg8D43gvo39P3OJDtY9OtnYsDquOVRInXcWqAm+0b04fVkNwvenOXEPeQObsRA4R5NPlrL+C9/rwO/XqOH9FaN1uXD8uO5xMu7dWtZcapRtiiQo7ldS0iI47o3+anYEpZaFd6oM/cz+t75d4tsq/BXcmaP3zzLefrl1McoX6R+1WKtmyPDF2TQSwMdyVOdmJM+A5t8HATqR3xgDfJnaJlOtaF3x7kPuv8iOICpD0OwMbLhAYaEESh9hZpRDe3cWV1vwSwWVABywjigfl0DZdgqxoLYhrh0D/a3DOYlVWv2l5oIs= ubuntu@node2
 
 ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQCsPpGIQbx++GcVrwyg7MMfjnNH2lZGtojtfX5Rh47+B0obN2mkf82zJswSiK+lpS2ty1uXQ7edboVKb06Ft1ZOm4JP1SkxbqZzfqP1olwTWofJ/64C+7sKVGij+9ePQx3TvaFtwGuwgv9zIXtaj4aGUyc1IN9+jqVqYiPnzcRCfkUFtUxCTWMQvzACgVjVJTOPcCUl9YgvMJNH2/ahAQ47iNtdQi1Zl1jRyY7fwNKPMTTN0Id3bfHHrpZx8bXoCnDl3+KneI7544itTp5W0k6GmDAzvdV/28Q/G7usgXtUI94N17D30ZUtndvZ+GRP3z7q+agq3olkw9H7pJKToMado5uxED95UlvB/w5knOhdPTZ9CSJy2uzKMDbzwd0j2I1X1dfjGWQWyGLYdmUeiZuTrT0p+KMx5l6IkjZq0J/y8DMqTHyuCEPEMNGJEeF5sOYgQGJ1JZe/9TC2nWNg/MrsL/mqfZBhmabXN7mo9VfyQ84TXXBm7K+qhVxrp8VwuuM= ubuntu@node3
 This step copies each node's id_rsa.pub public key over and renames it to master_rsa.pub,
 
 then appends that key to the authorized_keys file:
 
 tjuwzq@master:~$ cat master_rsa.pub >> authorized_keys
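The notes show the resulting public keys but not how they were generated and distributed. A hedged sketch of the usual flow (assuming the ubuntu user and the node1/node2/node3 hostnames from this guide):

```shell
# on each node, generate an RSA keypair (press Enter through the prompts)
ssh-keygen -t rsa -b 3072
# from each node, copy its public key to every node (including itself);
# ssh-copy-id appends to ~/.ssh/authorized_keys and fixes permissions
for host in node1 node2 node3; do
  ssh-copy-id ubuntu@"$host"
done
# verify passwordless login works
ssh ubuntu@node2 hostname
```

This achieves the same result as manually concatenating the keys into authorized_keys, with file permissions handled automatically.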

3. Installing Java

 Installing Java JDK 8 on Ubuntu 20.04 remotely
 
 sudo apt-get update
 sudo apt-get install openjdk-8-jdk
 java -version
 
 
 After installation the environment variables are not configured for you; you only get the bin directory on PATH, so set them yourself:
 export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
 export PATH=$PATH:${JAVA_HOME}/bin
 
 https://blog.csdn.net/qq_36801710/article/details/79306319
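The exports above only last for the current shell session. A sketch of persisting them (assuming the JDK path from the apt install above):

```shell
# persist JAVA_HOME for future shells
cat <<'EOF' >> ~/.bashrc
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export PATH=$PATH:${JAVA_HOME}/bin
EOF
source ~/.bashrc
```

After this, java -version should work in any new shell. The same pattern applies to the SCALA_HOME, HADOOP_HOME, and SPARK_HOME exports later in these notes.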

4. Installing Scala

https://www.scala-lang.org/download/

 curl -fL https://github.com/coursier/coursier/releases/latest/download/cs-x86_64-pc-linux.gz | gzip -d > cs && chmod +x cs && ./cs setup
 cs install scala:2.13.12 && cs install scalac:2.13.12

https://scala-lang.org/download/2.13.12.html

 tar -zxvf scala-2.13.12.tgz
 mv scala-2.13.12 scala
 export SCALA_HOME=/home/ubuntu/scala
 export PATH=$PATH:$SCALA_HOME/bin

5. Installing Hadoop

https://archive.apache.org/dist/hadoop/common/hadoop-3.3.4/

 hadoop-3.3.4.tar.gz
 tar -zxvf hadoop-3.3.4.tar.gz
 mv hadoop-3.3.4 hadoop
 
 
 export HADOOP_HOME=/home/ubuntu/hadoop
 export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

Configuration

  1. hadoop-env.sh
 export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
 # distribute the whole config directory to the other nodes
 scp -r /home/ubuntu/hadoop/etc node2:/home/ubuntu/hadoop/
 scp -r /home/ubuntu/hadoop/etc node3:/home/ubuntu/hadoop/
 
 or copy the files one by one:
 
 scp core-site.xml node2:/home/ubuntu/hadoop/etc/hadoop
 scp hdfs-site.xml node2:/home/ubuntu/hadoop/etc/hadoop
 scp mapred-site.xml node2:/home/ubuntu/hadoop/etc/hadoop
 scp yarn-site.xml node2:/home/ubuntu/hadoop/etc/hadoop
 scp workers node2:/home/ubuntu/hadoop/etc/hadoop
 
 scp core-site.xml node3:/home/ubuntu/hadoop/etc/hadoop
 scp hdfs-site.xml node3:/home/ubuntu/hadoop/etc/hadoop
 scp mapred-site.xml node3:/home/ubuntu/hadoop/etc/hadoop
 scp yarn-site.xml node3:/home/ubuntu/hadoop/etc/hadoop
 scp workers node3:/home/ubuntu/hadoop/etc/hadoop
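The contents of core-site.xml and workers are not reproduced in these notes. As a sketch only, a minimal core-site.xml consistent with the hdfs://node1:9000 URI referenced later in spark-defaults.conf might look like this (the values are assumptions, not taken from the original cluster):

```shell
# minimal core-site.xml (assumed values); fs.defaultFS must match the
# hdfs://node1:9000 address used later in spark-defaults.conf
cat > /home/ubuntu/hadoop/etc/hadoop/core-site.xml <<'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://node1:9000</value>
  </property>
</configuration>
EOF
# the workers file just lists the three hostnames, one per line
printf 'node1\nnode2\nnode3\n' > /home/ubuntu/hadoop/etc/hadoop/workers
```

hdfs-site.xml, mapred-site.xml, and yarn-site.xml likewise need to be filled in (replication factor, YARN resource manager hostname, and so on) before copying them out.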
 ./bin/hdfs namenode -format
 ./sbin/start-all.sh
 # or start the daemons one by one
 
 jps
 http://node1:9870/ # namenode web ui
 http://node1:9864/ # datanode web ui
 http://node2:9864/
 http://node3:9864/
 http://node1:8088/ # resource manager web ui
 Reformatting the NameNode:
 https://blog.csdn.net/hylpeace/article/details/88371411

6. Installing Spark

 spark-3.2.4-bin-hadoop3.2-scala2.13.tgz
 tar -zxvf spark-3.2.4-bin-hadoop3.2-scala2.13.tgz
 mv spark-3.2.4-bin-hadoop3.2-scala2.13 spark
 
 export SPARK_HOME=/home/ubuntu/spark
 export PATH=$PATH:$SPARK_HOME/bin
 cp spark-env.sh.template spark-env.sh # in /home/ubuntu/spark/conf

spark-env.sh

 export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
 export SCALA_HOME=/home/ubuntu/scala
 
 export HADOOP_HOME=/home/ubuntu/hadoop
 export HADOOP_CONF_DIR=/home/ubuntu/hadoop/etc/hadoop
 export YARN_CONF_DIR=/home/ubuntu/hadoop/etc/hadoop
 
 export SPARK_CONF_DIR=/home/ubuntu/spark/conf
 scp /home/ubuntu/spark/conf/spark-env.sh node2:/home/ubuntu/spark/conf/

hadoop 3.3.4

spark 3.2.4

Spark/Scala version compatibility:

https://mvnrepository.com/artifact/org.apache.spark/spark-sql

 In spark-on-yarn mode the Spark cluster itself does not need to be started; once a job is submitted with spark-submit, YARN takes care of resource scheduling.
 In standalone mode you must start a Spark cluster (one master plus workers) and submit jobs to it; with spark on yarn, YARN takes over the master/worker roles and jobs are submitted to YARN instead.
 A standalone Spark cluster mainly exists for resource management.
 spark-submit --master yarn --deploy-mode cluster --class org.apache.spark.examples.SparkPi --name SparkPi --num-executors 1 --executor-memory 1g  --executor-cores 1 /home/ubuntu/spark/examples/jars/spark-examples_2.13-3.2.4.jar 100
 hdfs dfs -mkdir -p /spark/sparkhistory
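The example above uses cluster deploy mode, where the driver runs inside YARN. For quick debugging, the same job can be submitted in client mode, which keeps the driver on the submitting machine so its output appears in the local terminal (a sketch using the same example jar):

```shell
# client mode: driver runs locally, driver logs print to this terminal
spark-submit --master yarn --deploy-mode client \
  --class org.apache.spark.examples.SparkPi --name SparkPi \
  --num-executors 1 --executor-memory 1g --executor-cores 1 \
  /home/ubuntu/spark/examples/jars/spark-examples_2.13-3.2.4.jar 100
```

In cluster mode the driver output has to be retrieved from the YARN application logs (yarn logs -applicationId ...) instead.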

spark-defaults.conf

 spark.yarn.historyServer.address node1:18080
 spark.history.ui.port 18080
 spark.eventLog.enabled true
 spark.eventLog.dir hdfs://node1:9000/spark/sparkhistory
 spark.eventLog.compress true
 scp spark-defaults.conf node2:/home/ubuntu/spark/conf

In spark-env.sh, configure SPARK_HISTORY_OPTS, keeping its values consistent with the corresponding settings in spark-defaults.conf:

 export SPARK_HISTORY_OPTS="-Dspark.history.ui.port=18080 -Dspark.history.retainedApplications=3 -Dspark.history.fs.logDirectory=hdfs://node1:9000/spark/sparkhistory"
 /home/ubuntu/spark/sbin/start-history-server.sh
 
 jps
 
 http://node1:18080/
 
 spark-submit --master yarn --deploy-mode cluster --class org.apache.spark.examples.SparkPi --name SparkPi --num-executors 1 --executor-memory 1g  --executor-cores 1 /home/ubuntu/spark/examples/jars/spark-examples_2.13-3.2.4.jar 100
 
 /home/ubuntu/spark/sbin/start-history-server.sh
 hadoop fs -getfacl /user
 hdfs dfs -setfacl -R -m user:zzz:r-x filepath
 
 The command above grants the account named zzz read access to filepath.
 -R makes it apply recursively to filepath and everything under it.
 Note that the x (execute) permission must be included, otherwise the ACL does not take effect on directories.
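After setting the ACL, it can be checked with getfacl (a sketch; zzz and filepath are the placeholders from above, and HDFS ACLs require dfs.namenode.acls.enabled=true in hdfs-site.xml):

```shell
# confirm the ACL entry was added; the output should contain a line like
#   user:zzz:r-x
hdfs dfs -getfacl filepath
```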

HDFS authorization and the default dr.who web user

https://blog.csdn.net/weixin_42656458/article/details/113176108