How to Install and Set Up Apache Spark on Ubuntu/Debian


Apache Spark is an open-source distributed computing framework designed to deliver fast computational results. It is an in-memory computation engine, which means that data is processed in memory.

Spark provides APIs for streaming, graph processing, SQL, and machine learning (MLlib). It supports Java, Python, Scala, and R. Spark is most often deployed on Hadoop clusters, but you can also install and configure it in standalone mode.

In this article, we will see how to install Apache Spark on Debian and Ubuntu-based distributions.

Install Java and Scala on Ubuntu

To install Apache Spark on Ubuntu, you need Java and Scala installed on your machine. Java may already be present on your system; you can check with the following command.

$ java -version

If the command is not found, you can follow our guide on how to install Java on Ubuntu, or simply run the following commands to install Java on Ubuntu and Debian-based distributions.

$ sudo apt update
$ sudo apt install default-jre
$ java -version
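The Spark 3.1 documentation lists Java 8 and 11 as supported runtimes, and default-jre resolves to different OpenJDK releases on different Ubuntu and Debian versions, so it can be worth checking which major version you ended up with. The helper below is a minimal sketch: the function name java_major_version is hypothetical, and it assumes the usual version "..." line that java -version prints.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `java -version` output on stdin and print the
# major Java version (8 for "1.8.0_292", 11 for "11.0.11", 17 for "17").
java_major_version() {
  local ver
  # Take the first quoted version string, e.g. openjdk version "11.0.11".
  ver=$(sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
  [ -n "$ver" ] || return 1
  case "$ver" in
    1.*)
      # Legacy numbering: 1.8.0_292 means Java 8.
      ver=${ver#1.}
      echo "${ver%%.*}"
      ;;
    *)
      # Modern numbering: the major version is everything before '.', '+' or '-'.
      echo "${ver%%[.+-]*}"
      ;;
  esac
}

# Example: java -version writes to stderr, so redirect it into the pipe.
# java -version 2>&1 | java_major_version
```

If this prints a version other than 8 or 11, consider installing one of those releases before continuing.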

Next, you can install Scala from the distribution's package repository by running the following commands to search for the scala package and install it.

$ sudo apt search scala  # Search for the package
$ sudo apt install scala # Install the package

To verify the Scala installation, run the following command.

$ scala -version 

Scala code runner version 2.11.12 -- Copyright 2002-2017, LAMP/EPFL
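Before moving on, you may want to confirm that every tool the following steps rely on is on your PATH. This is a small sketch with a hypothetical require_cmds function; the command list in the example comment (java, scala, wget, tar) simply mirrors the tools used in this article.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report every command that is not on PATH and
# return non-zero if any of them are missing.
require_cmds() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "not found on PATH: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example:
# require_cmds java scala wget tar && echo "all prerequisites found"
```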

Install Apache Spark on Ubuntu

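Now use the wget command to download the Spark 3.1.1 archive, prebuilt for Hadoop 2.7, directly in the terminal. Mirror URLs change over time, so if this one no longer responds, pick a current download link from the official Apache Spark download page.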

$ wget https://apachemirror.wuchna.com/spark/spark-3.1.1/spark-3.1.1-bin-hadoop2.7.tgz
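Apache publishes a SHA-512 checksum alongside each release archive, and comparing it with the downloaded file catches truncated or corrupted downloads. The function below is a hedged sketch (verify_sha512 is a hypothetical name) that assumes GNU coreutils sha512sum is available; it strips whitespace and lower-cases the expected digest in case the value you copy is grouped or written in upper case.

```bash
#!/usr/bin/env bash
# Hypothetical helper: verify_sha512 FILE EXPECTED_DIGEST
# Succeeds only if FILE's SHA-512 digest matches EXPECTED_DIGEST.
verify_sha512() {
  local file=$1 expected actual
  # Normalise the expected digest: drop all whitespace, lower-case the hex.
  expected=$(printf '%s' "$2" | tr -d '[:space:]' | tr '[:upper:]' '[:lower:]')
  [ -n "$expected" ] || return 1
  # sha512sum prints "<digest>  <file>"; keep only the digest field.
  actual=$(sha512sum "$file" | cut -d ' ' -f 1)
  [ "$actual" = "$expected" ]
}

# Example (paste the digest published for this release):
# verify_sha512 spark-3.1.1-bin-hadoop2.7.tgz '<digest>' && echo "checksum OK"
```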

Now open your terminal, change to the directory where the downloaded file is located, and run the following command to extract the Apache Spark tar file.

$ tar -xvzf spark-3.1.1-bin-hadoop2.7.tgz

Finally, move the extracted Spark directory to /opt/spark.

$ sudo mv spark-3.1.1-bin-hadoop2.7 /opt/spark

Configure Environment Variables for Spark

Now set a few environment variables in your .profile file before starting Spark.

$ echo 'export SPARK_HOME=/opt/spark' >> ~/.profile
$ echo 'export PATH=$PATH:/opt/spark/bin:/opt/spark/sbin' >> ~/.profile
$ echo 'export PYSPARK_PYTHON=/usr/bin/python3' >> ~/.profile
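Running these echo commands a second time appends duplicate lines to ~/.profile. If you script this step, a guard like the hypothetical append_once function below adds each line only when an identical line is not already there. Single quotes around each line keep $PATH from being expanded when it is written, so the file stores the literal text and PATH is evaluated when the profile is loaded.

```bash
#!/usr/bin/env bash
# Hypothetical helper: append_once LINE FILE
# Appends LINE to FILE unless an identical line already exists.
append_once() {
  local line=$1 file=$2
  touch "$file"
  # If the file does not end with a newline, add one so LINE starts cleanly.
  if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
    printf '\n' >> "$file"
  fi
  # -x: match the whole line; -F: treat LINE as a fixed string, not a regex.
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example:
# append_once 'export SPARK_HOME=/opt/spark' ~/.profile
# append_once 'export PATH=$PATH:/opt/spark/bin:/opt/spark/sbin' ~/.profile
# append_once 'export PYSPARK_PYTHON=/usr/bin/python3' ~/.profile
```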

To make the new variables available in your current shell without logging out and back in, run the following command to apply the changes.

$ source ~/.profile

The scripts that start and stop the Spark services are in the sbin folder. List the installation directory to confirm the layout.

$ ls -l /opt/spark
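For a scripted check instead of reading the listing by eye, the sketch below confirms that a few key launchers exist and are executable. The function name check_spark_home is hypothetical, and the file list is an assumption based on the Spark 3.1.x binary distribution layout (earlier releases used different sbin script names).

```bash
#!/usr/bin/env bash
# Hypothetical helper: check_spark_home [DIR]  (defaults to /opt/spark)
# Returns non-zero and lists each expected launcher that is missing.
check_spark_home() {
  local home=${1:-/opt/spark} missing=0 f
  for f in bin/spark-shell bin/spark-submit sbin/start-master.sh sbin/start-worker.sh; do
    if [ ! -x "$home/$f" ]; then
      echo "missing or not executable: $home/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example:
# check_spark_home "$SPARK_HOME" && echo "Spark layout looks complete"
```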

Start Apache Spark on Ubuntu

Run the following commands to start the Spark master service and a worker service.

$ start-master.sh
$ start-worker.sh spark://localhost:7077

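Once the services have started, open one of the following URLs in your browser. The page should show the master running, with the worker listed on it. It also displays the spark:// URL that workers connect to; if no worker appears, start the worker again with that exact URL.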

http://localhost:8080/
OR
http://127.0.0.1:8080
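The start scripts launch the daemons in the background and return straight away, so the web UI can take a few seconds to come up. If you script the startup, a small polling loop can wait for the port to accept connections before you open the browser. This sketch relies on bash's /dev/tcp redirection, so it needs bash rather than plain sh; the name wait_for_port and its 30-second default are hypothetical choices.

```bash
#!/usr/bin/env bash
# Hypothetical helper: wait_for_port HOST PORT [TIMEOUT_SECONDS]
# Polls once per second until HOST:PORT accepts a TCP connection.
wait_for_port() {
  local host=$1 port=$2 timeout=${3:-30} waited=0
  while (( waited < timeout )); do
    # The subshell opens (and on exit closes) a TCP connection on fd 3;
    # a refused connection only produces an error, which we discard.
    if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
      return 0
    fi
    sleep 1
    (( waited += 1 ))
  done
  return 1
}

# Example:
# start-master.sh && wait_for_port localhost 8080 30 && echo "master web UI is up"
```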

You can also check that the Spark shell works by launching the spark-shell command.

$ spark-shell

That's all for this article. We'll be back with another interesting article soon.