How to Install and Set Up Apache Spark on Ubuntu/Debian
Apache Spark is an open-source distributed computing framework designed to deliver fast computational results. It is an in-memory computing engine, meaning that data is processed in memory.
Spark provides APIs for streaming, graph processing, SQL, and machine learning (MLlib). It supports Java, Python, Scala, and R. Spark is most often installed on Hadoop clusters, but you can also install and configure it in standalone mode.
In this article, we will see how to install Apache Spark on Debian and Ubuntu-based distributions.
Install Java and Scala on Ubuntu
To install Apache Spark on Ubuntu, you need Java and Scala on your machine. Most modern distributions ship with Java installed by default, and you can verify this with the following command.
$ java -version
If the command is not found or prints no version, you can install Java by following our article on how to install Java on Ubuntu, or simply run the following commands on Ubuntu and Debian-based distributions.
$ sudo apt update
$ sudo apt install default-jre
$ java -version
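As a rough check before moving on, the sketch below reads the installed Java version and reduces it to a major version number. It assumes the usual version "x.y.z" line in the output of java -version, and the helper names java_major and installed_java_version are made up for this example. The Spark 3.1 documentation lists Java 8 and 11 as supported, so a result of 8 or 11 is the safe range for the release used in this article.

```shell
# Extract the major Java version from a version string such as
# "1.8.0_292" (Java 8) or "11.0.11" (Java 11).
java_major() {
    case "$1" in
        1.*) echo "$1" | cut -d. -f2 ;;   # legacy scheme: 1.8.x means Java 8
        *)   echo "$1" | cut -d. -f1 ;;   # modern scheme: 11.x means Java 11
    esac
}

# Read the installed version string; java -version writes to stderr.
installed_java_version() {
    java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1
}

if command -v java >/dev/null 2>&1; then
    echo "Java major version: $(java_major "$(installed_java_version)")"
else
    echo "Java is not installed"
fi
```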
Next, you can install Scala from the apt repository. Run the following commands to search for the scala package and install it.
$ sudo apt search scala     # Search for the package
$ sudo apt install scala    # Install the package
To verify the Scala installation, run the following command.
$ scala -version
Scala code runner version 2.11.12 -- Copyright 2002-2017, LAMP/EPFL
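Install Apache Spark on Ubuntu
Now use the wget command to download the Spark release directly from the terminal. If this mirror no longer hosts the release, older Spark releases are kept in the Apache archive at archive.apache.org/dist/spark/.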
$ wget https://apachemirror.wuchna.com/spark/spark-3.1.1/spark-3.1.1-bin-hadoop2.7.tgz
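Apache publishes a SHA-512 checksum alongside each release file. The following is a minimal sketch for comparing the download against a digest you have copied from the Apache site, assuming sha512sum from coreutils is available. verify_sha512 is a hypothetical helper name, and the digest argument is something you have to supply yourself.

```shell
# Compare a file's SHA-512 digest with an expected hex string.
# Spaces, newlines, and letter case in the expected value are ignored.
# Returns 0 on match, non-zero on mismatch.
verify_sha512() {
    file=$1
    expected=$(echo "$2" | tr 'A-F' 'a-f' | tr -d ' \n')
    actual=$(sha512sum "$file" | cut -d' ' -f1)
    [ "$actual" = "$expected" ]
}

# Usage (replace the placeholder with the published digest):
#   verify_sha512 spark-3.1.1-bin-hadoop2.7.tgz "<published digest>" && echo "checksum OK"
```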
Now open your terminal, change to the directory where the downloaded file was saved, and run the following command to extract the Apache Spark tar file.
$ tar -xvzf spark-3.1.1-bin-hadoop2.7.tgz
Finally, move the extracted Spark directory to the /opt directory.
$ sudo mv spark-3.1.1-bin-hadoop2.7 /opt/spark
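If you later want a different Spark release, note that the file and directory names in the steps above follow one pattern. The sketch below builds those names from two variables and only prints the commands instead of running them. SPARK_VERSION, HADOOP_PROFILE, and spark_package are names chosen for this example.

```shell
# Assumed values matching the release used in this article.
SPARK_VERSION=3.1.1
HADOOP_PROFILE=hadoop2.7

# Build the base name shared by the archive and the extracted directory.
spark_package() {
    echo "spark-$1-bin-$2"
}

pkg=$(spark_package "$SPARK_VERSION" "$HADOOP_PROFILE")

# Dry run: print the extract and move commands without executing them.
echo "tar -xvzf $pkg.tgz"
echo "sudo mv $pkg /opt/spark"
```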
Configure Environment Variables for Spark
Next, set a few environment variables in your .profile file before starting Spark. The single quotes around the PATH line keep $PATH from being expanded when the line is written, so it is resolved each time the profile is loaded.
$ echo "export SPARK_HOME=/opt/spark" >> ~/.profile
$ echo 'export PATH=$PATH:/opt/spark/bin:/opt/spark/sbin' >> ~/.profile
$ echo "export PYSPARK_PYTHON=/usr/bin/python3" >> ~/.profile
To make these new environment variables available in the current shell and to Apache Spark, run the following command so the changes take effect.
$ source ~/.profile
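Running the echo commands above more than once leaves duplicate lines in .profile. One way to avoid that, sketched here with a hypothetical append_once helper, is to append a line only when it is not already in the file. The example writes to a temporary file so it can be tried safely; pointing it at ~/.profile is left to you.

```shell
# Append a line to a file only if that exact line is not already present.
append_once() {
    line=$1
    file=$2
    touch "$file"
    # -x matches whole lines, -F treats the pattern as a fixed string.
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Demo on a scratch file: two passes, yet each line is written once.
demo_profile=$(mktemp)
for run in 1 2; do
    append_once 'export SPARK_HOME=/opt/spark' "$demo_profile"
    append_once 'export PATH=$PATH:/opt/spark/bin:/opt/spark/sbin' "$demo_profile"
    append_once 'export PYSPARK_PYTHON=/usr/bin/python3' "$demo_profile"
done
cat "$demo_profile"
```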
All the Spark scripts for starting and stopping services are in the sbin folder, which you can see by listing the installation directory.
$ ls -l /opt/spark
Start Apache Spark on Ubuntu
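Run the following commands to start the Spark master service and a worker service. The worker is given the master's URL; if the master reports a different address in its log or web UI, use that one instead.
$ start-master.sh
$ start-worker.sh spark://localhost:7077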
Once the services have started, open a browser and go to the following URL to reach the Spark web UI. The page shows that the master and worker services are running.
http://localhost:8080/ OR http://127.0.0.1:8080
You can also check that spark-shell works by launching it.
$ spark-shell
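For a non-interactive check, Spark ships a run-example script in its bin directory that can run the bundled SparkPi example. The following is a small sketch, assuming /opt/spark/bin is on your PATH as configured earlier. smoke_test is a hypothetical wrapper name, and the digits printed will vary from run to run.

```shell
# Run the bundled SparkPi example and show only its result line.
# Returns 127 if run-example is not on PATH.
smoke_test() {
    if ! command -v run-example >/dev/null 2>&1; then
        echo "run-example not found on PATH" >&2
        return 127
    fi
    # SparkPi estimates pi and prints a line starting with "Pi is roughly".
    run-example SparkPi 10 2>/dev/null | grep "Pi is roughly"
}

smoke_test || echo "Spark smoke test did not pass"
```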
That's it for this article. We will be back with another interesting article soon.