How to deploy a Spark job to a cluster

In the previous post we created a simple Spark job and executed it locally on a single machine. However, the main goal of the Spark framework is to utilize the resources of a cluster of multiple servers and thereby increase data processing throughput. In real life the amount of data…
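
To make the deployment step concrete, here is a minimal sketch of a Java driver prepared for cluster submission. The class name, jar name, cluster URL, and input path are illustrative assumptions, not details from the post:

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class WordCountJob {
    public static void main(String[] args) {
        // Leave the master unset in code; spark-submit supplies it at launch, e.g.:
        //   spark-submit --class WordCountJob --master spark://master-host:7077 word-count-job.jar <input-path>
        // (hypothetical host, jar, and path)
        SparkConf conf = new SparkConf().setAppName("WordCountJob");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            // Input path is passed as a program argument so the same jar runs anywhere
            JavaRDD<String> lines = sc.textFile(args[0]);
            System.out.println("Lines processed: " + lines.count());
        }
    }
}
```

The `--deploy-mode` flag of spark-submit (`client` or `cluster`) then controls whether this driver runs on the submitting machine or inside the cluster itself.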

How to write a Big Data application with Java and Spark

Spark is a modern Big Data framework for building highly scalable, feature-rich data transformation pipelines. Its main advantages are simplicity and high performance compared to its predecessor, Hadoop. You can write Spark applications in 3 main languages: Scala, Java, and Python. In this guide I will show you…
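
As a taste of what such a guide covers, below is a minimal self-contained Java application that runs Spark in local mode; the application name and the `people.json` input file are illustrative assumptions:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class SimpleApp {
    public static void main(String[] args) {
        // local[*] runs the job on the local machine using all available cores
        SparkSession spark = SparkSession.builder()
                .appName("SimpleApp")
                .master("local[*]")
                .getOrCreate();

        // Load a JSON file into a DataFrame and inspect it
        Dataset<Row> people = spark.read().json("people.json");
        people.printSchema();
        people.show();

        spark.stop();
    }
}
```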

Persisting 100k messages per second on a single server in real time

Besides the regular problem of Big Data analysis there is one more complex and subtle task: persisting a highly intensive data stream in real time. Imagine a scenario where your application cluster generates 100k business transactions per second, each of which should be properly processed and written to data storage for further analysis…
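
For flavor, here is an illustrative Java sketch of one common approach to sustaining such write rates: batching messages in memory before each bulk write, since per-message writes rarely reach 100k ops/s. The queue capacity, batch size, and the `writeBatch` stub are assumptions, not the post's actual design:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class BatchingWriter implements Runnable {
    private static final int BATCH_SIZE = 1_000;
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>(200_000);

    public void submit(String message) throws InterruptedException {
        queue.put(message); // back-pressure: blocks producers when the buffer is full
    }

    @Override
    public void run() {
        List<String> batch = new ArrayList<>(BATCH_SIZE);
        while (!Thread.currentThread().isInterrupted()) {
            try {
                batch.add(queue.take());               // wait for at least one message
                queue.drainTo(batch, BATCH_SIZE - 1);  // grab whatever else is ready
                writeBatch(batch);                     // one bulk write instead of thousands of small ones
                batch.clear();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    private void writeBatch(List<String> batch) {
        // Placeholder for a bulk insert into the actual data store
    }
}
```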