
Map Dataframe

import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{MapType, StringType, StructField, StructType}
object SparkProject {
  def main(args: Array[String]): Unit = {
    // Set log levels
    org.apache.log4j.LogManager.getLogger("org").setLevel(org.apache.log4j.Level.ERROR)
    org.apache.log4j.LogManager.getLogger("akka").setLevel(org.apache.log4j.Level.ERROR)
    // Create a Spark session
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("SparkByExample")
      .getOrCreate()
    // Define the schema for the DataFrame
    val schema = StructType(Seq(
      StructField("name", StringType, true),
      StructField("songs", MapType(StringType, StringType, true), true)
    ))
    // Create a Seq of Rows representing the data
    val data = Seq(
      Row("sublime", Map("good_song" -> "santeria", ...

Jupyter Notebook setup

1) Do not use the root user except for installing Python.
Install Jupyter (do not run this command as root):
python3 -m pip install --user jupyterlab
Start Jupyter with the command below to make it accessible from a remote browser:
// On my system, I used port 8888
jupyter notebook --no-browser --port=8080 --ip=

If there is a configuration error, this link contains the solution.

IntelliJ Class not Found

While running my first Spark program in Scala, I started getting an error message that kept me from making progress. I spent a whole night figuring out the problem. Fortunately, I was able to resolve it. Below is the solution:
1) Close the project.
2) Close IntelliJ.
3) Check the environment variables. (I was missing HADOOP_HOME.)
4) Restart IntelliJ.
5) Open the project.
6) Wait for the sbt configuration to complete. This is another thing I was missing.
7) The project will now execute.
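The environment variable check can also be done from a small script before reopening IntelliJ. The following is a minimal Python sketch, not part of the original fix. HADOOP_HOME is the variable named above; JAVA_HOME and SPARK_HOME are added here only as assumptions about what a typical setup needs, and the helper name missing_env_vars is hypothetical.

```python
import os


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    # HADOOP_HOME is the variable from the notes above;
    # JAVA_HOME and SPARK_HOME are assumed extras, adjust as needed.
    required = ["HADOOP_HOME", "JAVA_HOME", "SPARK_HOME"]
    missing = missing_env_vars(required)
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All required environment variables are set.")
```

Run it from the same terminal or user account that starts IntelliJ, since environment variables can differ between sessions.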

Running PySpark Program through file

If you have created a file containing a PySpark program and need to run it, you can do so with the spark-submit utility of Spark, which is at the location below:
./spark/bin/spark-submit <>
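If the file has to be launched from another Python script rather than typed at the shell, the same command can be assembled with the standard subprocess module. This is only a sketch under assumptions: the function names build_submit_command and run_pyspark_file are hypothetical, the default spark_home of ./spark mirrors the location given above, and my_job.py in the usage note stands in for your own file.

```python
import subprocess


def build_submit_command(script_path, spark_home="./spark"):
    """Build the argument list for running a PySpark file with spark-submit."""
    return [f"{spark_home}/bin/spark-submit", script_path]


def run_pyspark_file(script_path, spark_home="./spark"):
    """Run the file through spark-submit and return the process exit code."""
    completed = subprocess.run(build_submit_command(script_path, spark_home))
    return completed.returncode
```

For example, run_pyspark_file("my_job.py") would start ./spark/bin/spark-submit my_job.py and return 0 if the job finished without error.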

Create File URL

A file URL is created by prefixing the absolute path of the file with file://, which gives file:///PATH_TO_FILE for an absolute path such as /PATH_TO_FILE.
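In Python, the standard pathlib module can build the URL so the slashes do not have to be counted by hand. The sketch below assumes a POSIX system (Linux or macOS); the helper name to_file_url and the path /tmp/data.csv are illustrative only.

```python
from pathlib import Path


def to_file_url(path):
    """Convert a path to a file URL; relative paths are made absolute first."""
    return Path(path).absolute().as_uri()


if __name__ == "__main__":
    print(to_file_url("/tmp/data.csv"))  # file:///tmp/data.csv on a POSIX system
```

The resulting string can then be passed wherever a file URL is expected.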

Running SQLContext

val sqlcontext = new org.apache.spark.sql.SQLContext(sc)
// Read a Vertica table over JDBC into a DataFrame
val cataDF = sqlcontext.read.format("jdbc")
  .option("url", "jdbc:vertica://")
  .option("driver", "com.vertica.jdbc.Driver")
  .option("dbtable", "DT1_0_8_OOB.Char1_Table")
  .option("user", "release")
  .option("password", "gl")
  .load()