Posts

Map Dataframe

import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{MapType, StringType, StructField, StructType}

object SparkProject {
  def main(args: Array[String]): Unit = {
    // Set log levels
    org.apache.log4j.LogManager.getLogger("org").setLevel(org.apache.log4j.Level.ERROR)
    org.apache.log4j.LogManager.getLogger("akka").setLevel(org.apache.log4j.Level.ERROR)

    // Create a Spark session
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("SparkByExample")
      .getOrCreate()

    // Define the schema for the DataFrame
    val schema = StructType(Seq(
      StructField("name", StringType, true),
      StructField("songs", MapType(StringType, StringType, true), true)
    ))

    // Create a Seq of Rows representing the data
    val data = Seq(
      Row("sublime", Map("good_song" -> "santeria",
    // (the snippet is cut off at this point in the original post)
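The snippet above is truncated in the original post. As a minimal sketch of how it could be finished (the map entries and the second row below are illustrative placeholders, not the original data), the DataFrame can be built from the schema and the map column queried like this:

    // Illustrative rows; the original data is truncated in the post
    val data = Seq(
      Row("sublime", Map("good_song" -> "santeria", "bad_song" -> "doesnt exist")),
      Row("prince_royce", Map("good_song" -> "darte un beso"))
    )

    // Build the DataFrame from the Rows and the schema defined above
    val df = spark.createDataFrame(spark.sparkContext.parallelize(data), schema)
    df.printSchema()
    df.show(false)

    // Look up a single key inside the map column
    df.select(df("name"), df("songs").getItem("good_song").alias("good_song")).show(false)

    spark.stop()
  }
}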

Jupyter Notebook setup

1) Don't use the root user except for installing Python.
2) Install Jupyter (don't run this command as root):
python3 -m pip install --user jupyterlab
3) Start Jupyter with the command below to make it accessible from a remote browser (on my system I used port 8888; the example below uses 8080):
jupyter notebook --no-browser --port=8080 --ip=0.0.0.0

When there is a configuration error, this link contains the solution

http://www.alternatestack.com/development/intellij-add-new-scala-class-option-not-available/

IntelliJ Class not Found

While running my first Spark program with Scala, I started getting an error message that blocked my learning. I spent the whole night figuring out the problem, and fortunately I was able to resolve the issue. Below is the solution:
1) Close the project.
2) Close IntelliJ.
3) Check your environment variables. (I was missing HADOOP_HOME.)
4) Restart IntelliJ.
5) Open the project.
6) Wait for the sbt configuration to complete. (This was the other thing I was missing.)
7) The project will now run.
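For reference, a minimal build.sbt for a Spark-with-Scala project might look like the sketch below; the project name and the Scala and Spark versions are assumptions, so use whatever matches your setup.

name := "SparkProject"

scalaVersion := "2.12.18"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "3.5.0",  // assumed Spark version
  "org.apache.spark" %% "spark-sql"  % "3.5.0"
)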

Running a PySpark Program from a File

If you have created a file containing a PySpark program and need to run it, you can do so through Spark's spark-submit utility, which is located at ./spark/bin/spark-submit <FileName.py>
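For example, assuming the script is saved as my_job.py (a placeholder name) and you want to run it on all local cores, the invocation might look like:
./spark/bin/spark-submit --master local[*] my_job.py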

Create File URL

A file URL can be created by prefixing the absolute path of the file with the file scheme: file:///PATH_TO_FILE
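As a quick sketch (the path below is just a placeholder), such a URL can be passed straight to Spark's readers:

val df = spark.read.text("file:///home/user/data.txt")
df.show(false)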

Running SQLContext

val sqlcontext = new org.apache.spark.sql.SQLContext(sc)
val cataDF = sqlcontext.read.format("jdbc")
  .option("url", "jdbc:vertica://172.16.67.241:5433/vertica_db")
  .option("driver", "com.vertica.jdbc.Driver")
  .option("dbtable", "DT1_0_8_OOB.Char1_Table")
  .option("user", "release")
  .option("password", "gl")
  .load()
cataDF.show()
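In Spark 2.x and later, SQLContext is superseded by SparkSession, so the same JDBC read can be done directly from the session. A sketch, reusing the connection options above:

val cataDF = spark.read.format("jdbc")
  .option("url", "jdbc:vertica://172.16.67.241:5433/vertica_db")
  .option("driver", "com.vertica.jdbc.Driver")
  .option("dbtable", "DT1_0_8_OOB.Char1_Table")
  .option("user", "release")
  .option("password", "gl")
  .load()
cataDF.show()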