# Monitor Job Runs
To view logs for a Job run, click the run on the Job's page. For each Job run, you will see the Spark driver logs, which are retained for 10 days, and a link to the active Spark UI or the Spark History Server.
The Spark UI and Spark History Server are accessible only from within your VPC. To reach them, port-forward through a bastion host in the VPC: forward port 4040 for the active application and port 18080 for past applications.
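For example, an SSH tunnel through the bastion host might look like the following sketch. The host names are placeholders for your own environment, and your bastion setup may differ:

```sh
# Hypothetical example: forward the active Spark UI (4040) and the
# Spark History Server (18080) to localhost through the bastion host.
ssh -N \
  -L 4040:<spark-driver-host>:4040 \
  -L 18080:<history-server-host>:18080 \
  <user>@<bastion-host>
```

With the tunnel open, the active Spark UI is reachable at http://localhost:4040 and the History Server at http://localhost:18080.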
## JAR Job Logs
By default, JAR Jobs use log4j. Logs are available in the Onehouse console, under the monitoring tab of the Job run details page.
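As an illustration, a minimal logger in a JAR Job might look like the sketch below. It assumes the log4j 1.x API that Spark exposes (the same API the Python example below reaches through the JVM gateway); `SampleJarJob` is a hypothetical class name:

```java
import org.apache.log4j.LogManager;
import org.apache.log4j.Logger;

public final class SampleJarJob {
    // Logger named after the class; messages land in the Spark driver logs.
    private static final Logger LOG = LogManager.getLogger(SampleJarJob.class);

    public static void main(String[] args) {
        LOG.info("Starting the application");
        // Main code
    }
}
```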
## Python Job Logs
For Python Job logs to appear in the Onehouse console, the logs must be written through log4j. This must be set up in the Job code. One simple way is to add the following class to the Job code:
```python
from typing import Optional

from pyspark.sql import SparkSession


class LoggerProvider:
    def get_logger(self, spark: SparkSession, custom_prefix: Optional[str] = ""):
        # Reach through the JVM gateway so Python messages are written by
        # the same log4j instance that produces the Spark driver logs.
        log4j_logger = spark._jvm.org.apache.log4j  # noqa
        return log4j_logger.LogManager.getLogger(custom_prefix + self.__full_name__())

    def __full_name__(self):
        klass = self.__class__
        module = klass.__module__
        if module == "builtins":
            return klass.__name__  # avoid outputs like 'builtins.str'
        return module + "." + klass.__name__
```
Then, the class that runs Spark should inherit from `LoggerProvider`, as in the sample code below. Finally, write all logs through `self.logger`:
```python
class SampleApp(LoggerProvider):
    def __init__(self):
        self.spark = SparkSession.builder \
            .appName("SampleApp") \
            .getOrCreate()
        self.logger = self.get_logger(self.spark)

    def run(self):
        self.logger.info("Starting the application")
        # Main code
        self.spark.stop()


if __name__ == "__main__":
    app = SampleApp()
    app.run()
```
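Once the Job runs, messages written through `self.logger` appear with the Spark driver logs in the Onehouse console, under the monitoring tab of the Job run details page.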