Top 5 challenges and recipes when starting an Apache Spark project

Intro Apache Spark is a powerful, mature computation engine built for big data analytics. Besides the core data engine, it also provides libraries for streaming (Spark Streaming), machine learning (MLlib) and graph processing (GraphX). Historically, Spark emerged as a successor to the Hadoop ecosystem with the following key advantages: Spark provides…

How to work with Big Data from Java Spring applications

Intro This article shows how to get around the tough parts of Big Data infrastructure and work with data quickly and comfortably, without thinking about code deployment, staying focused on business goals and getting things done as fast as possible. Zentadata Platform is one answer. It is…

Data analytics for everyone with Zentadata Data Studio

Intro This article is addressed to a wider audience of business analysts, data scientists, quality engineers and developers; in other words, to people who work with data, perform analysis, build reports, etc. There are many real business cases that can be solved simply with the Zentadata platform. Today we are…

Quick start guide - Zentadata Developer Edition

Overview Zentadata Developer Edition is the simplest way to start analyzing data on your local machine right away. It is completely free and available to everyone. Zentadata Developer Edition consists of two modules: Data Studio, a data analytics IDE where you actually work with the data, and Developer Cluster, a data processing engine…

How a Data-Driven Enterprise makes business effective

Intro There is a lot of information about the benefits of a Data-Driven Enterprise, with the most prominent characteristics being the following: leverage data to test multiple theories and choose the best one; continuously research business data to find new opportunities; seamlessly integrate ML into business processes to gain extra revenue…

How to deploy a Spark job to a cluster

In the previous post we created a simple Spark job and executed it locally on a single machine. However, the main goal of the Spark framework is to utilize the resources of a cluster consisting of multiple servers and thereby increase data processing throughput. In real life, the amount of data…
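The full post covers cluster deployment step by step; as a quick taste, submitting a packaged job to a standalone Spark cluster boils down to a single `spark-submit` invocation. The master URL, main class and JAR name below are placeholder values:

```shell
# Submit a packaged Spark application to a standalone cluster.
# spark://master-host:7077, the class name and the JAR path are placeholders.
spark-submit \
  --master spark://master-host:7077 \
  --class com.example.WordCountJob \
  --deploy-mode cluster \
  --executor-memory 2g \
  target/my-spark-job-1.0.jar
```

With `--deploy-mode cluster` the driver itself runs on one of the cluster nodes rather than on your workstation.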

How to write a Big Data application with Java and Spark

Spark is a modern Big Data framework for building highly scalable, feature-rich data transformation pipelines. Spark's main advantages are simplicity and high performance compared to its predecessor, Hadoop. You can write Spark applications in three main languages: Scala, Java and Python. In this guide I will show you…
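To give a feel for what the guide builds toward, here is a minimal sketch of a Java Spark job that counts words in a text file. It assumes the Spark dependency is on the classpath; the app name and input path are placeholder values:

```java
// Minimal word-count sketch with the Java Spark API.
// Assumes spark-sql is on the classpath; paths and names are placeholders.
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.sql.SparkSession;

import java.util.Arrays;

public class WordCount {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("WordCount")
                .master("local[*]")       // run locally, using all CPU cores
                .getOrCreate();

        // Read the input file as an RDD of lines.
        JavaRDD<String> lines = spark.read()
                .textFile("input.txt")    // placeholder input path
                .javaRDD();

        // Split lines into words and count the non-empty ones.
        long wordCount = lines
                .flatMap(line -> Arrays.asList(line.split("\\s+")).iterator())
                .filter(word -> !word.isEmpty())
                .count();

        System.out.println("Total words: " + wordCount);
        spark.stop();
    }
}
```

The same job runs unchanged on a cluster once `local[*]` is replaced by the cluster's master URL.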

How to run Postgres in Docker

If you need a Postgres database for development, you can simply install it manually on your local machine. But be aware that this requires some knowledge of Postgres installation and maintenance procedures. On the other hand, there is a much simpler way -…
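The post walks through the details; the core of the Docker approach is a single `docker run` command. The container name, password, database name and host port below are example values:

```shell
# Start a disposable Postgres container for local development.
# Container name, password, database and port mapping are example values.
docker run -d \
  --name dev-postgres \
  -e POSTGRES_PASSWORD=devpassword \
  -e POSTGRES_DB=devdb \
  -p 5432:5432 \
  -v pgdata:/var/lib/postgresql/data \
  postgres:16
```

The named volume `pgdata` keeps your data across container restarts; drop the `-v` flag if you want a fully throwaway database.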