Apache Spark 2 and 3 using Python 3 (Formerly CCA 175)

Description

As part of this course, you will learn the key skills needed to build Data Engineering Pipelines with Spark SQL and Spark Data Frame APIs, using Python as the programming language. This course was formerly the CCA 175 Spark and Hadoop Developer course, intended as preparation for the certification exam. The exam was retired on 10/31/2021, and we have renamed the course to Apache Spark 2 and 3 using Python 3, as it covers industry-relevant topics beyond the scope of the certification.

About Data Engineering

Data Engineering is the processing of data according to downstream needs. As part of Data Engineering, we build different kinds of pipelines, such as Batch Pipelines and Streaming Pipelines. All roles related to Data Processing are consolidated under Data Engineering; conventionally, they were known as ETL Development, Data Warehouse Development, etc. Apache Spark has evolved into a leading technology for Data Engineering at scale.

I have prepared this course for anyone who would like to transition into a Data Engineer role using PySpark (Python + Spark). I am a Data Engineering Solution Architect with proven experience in designing solutions using Apache Spark.

Let us go through the details of what you will learn in this course. Keep in mind that the course is built around hands-on tasks that give you plenty of practice with the right tools. There are also many tasks and exercises for evaluating yourself.

Setup of Single Node Big Data Cluster

Many of you would like to transition to Big Data from conventional technologies such as Mainframes, Oracle PL/SQL, etc., and you might not have access to Big Data Clusters. It is very important that you set up the environment correctly. Don’t worry if you do not have a cluster handy; we will guide you through the setup, with support via Udemy Q&A.

   Set up an Ubuntu-based AWS Cloud9 Instance with the right configuration
   Ensure Docker is set up
   Set up Jupyter Lab and other key components
   Set up and Validate Hadoop, Hive, YARN and Spark

A quick recap of Python

This course requires a decent knowledge of Python. To make sure you understand Spark from a Data Engineering perspective, we have added a module to quickly warm up with Python. If you are not familiar with Python, we suggest you go through our other course, Data Engineering Essentials – Python, SQL and Spark.

Data Engineering using Spark SQL

Let us deep-dive into Spark SQL to understand how it can be used to build Data Engineering Pipelines. Spark SQL lets us combine the distributed computing capabilities of Spark with an easy-to-use, developer-friendly SQL-style syntax.

   Getting Started with Spark SQL
   Basic Transformations using Spark SQL
   Managing Spark Metastore Tables – Basic DDL and DML
   Managing Spark Metastore Tables – DML and Partitioning
   Overview of Spark SQL Functions
   Windowing Functions using Spark SQL
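
To give a feel for the SQL-style windowing syntax listed above, here is a minimal sketch. It is not taken from the course material: the table, columns, and data are hypothetical, and Python's built-in sqlite3 module stands in for a Spark session purely so that the snippet runs without a cluster. The query uses standard rank() OVER (PARTITION BY ... ORDER BY ...) syntax, which Spark SQL also accepts.

```python
import sqlite3

# Hypothetical daily revenue figures; names and values are illustrative only.
rows = [
    ("2022-01-01", "books", 120.0),
    ("2022-01-02", "books", 200.0),
    ("2022-01-03", "books", 150.0),
    ("2022-01-01", "games", 300.0),
    ("2022-01-02", "games", 250.0),
]

# sqlite3 is a stand-in engine here, not Spark (assumption made for runnability).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE daily_revenue (order_date TEXT, category TEXT, revenue REAL)"
)
conn.executemany("INSERT INTO daily_revenue VALUES (?, ?, ?)", rows)

# Rank each day's revenue within its category, highest revenue first.
query = """
SELECT category, order_date, revenue,
       rank() OVER (PARTITION BY category ORDER BY revenue DESC) AS rnk
FROM daily_revenue
ORDER BY category, rnk
"""
ranked = conn.execute(query).fetchall()

# Keep only the best day per category.
top_per_category = [(cat, day) for cat, day, _revenue, rnk in ranked if rnk == 1]
print(top_per_category)
conn.close()
```

With a SparkSession available, the same query text would typically be passed to spark.sql(...) against a table or temporary view of the same shape.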

Data Engineering using Spark Data Frame APIs

Spark Data Frame APIs are an alternative way of building Data Engineering applications at scale leveraging distributed computing capabilities of Spark. Data Engineers from application development backgrounds might prefer Data Frame APIs over Spark SQL to build Data Engineering applications.

   Data Processing Overview using Spark Data Frame APIs
   Processing Column Data using Spark Data Frame APIs
   Basic Transformations using Spark Data Frame APIs – Filtering, Aggregations, and Sorting
   Joining Data Sets using Spark Data Frame APIs
   Windowing Functions using Spark Data Frame APIs – Aggregations, Ranking, and Analytic Functions
   Spark Metastore Databases and Tables

Apache Spark Application Development and Deployment Life Cycle

As Apache Spark based Data Engineers, we should be familiar with the Application Development and Deployment Life Cycle. In this section, you will learn the complete Development and Deployment Life Cycle. It includes, but is not limited to, productionizing the code, externalizing the properties, and reviewing the details of Spark Jobs.

   Apache Spark Application Development Lifecycle
   Spark Application Execution Life Cycle and Spark UI
   Set up SSH Proxy to access Spark Application logs
   Deployment Modes of Spark Applications
   Passing Application Properties Files and External Dependencies
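
As one possible illustration of externalizing the properties, the sketch below keeps environment-specific settings in an INI-style properties file and selects a section at run time. It is an assumption-laden example rather than the course's own code: the section names, keys, paths, and the ENVIRON variable are all hypothetical, and the file content is inlined as a string only to keep the snippet self-contained.

```python
import configparser
import os

# Hypothetical properties file content; sections, keys, and paths are illustrative only.
SAMPLE_PROPERTIES = """
[DEFAULT]
app.name = Sample Spark Application

[dev]
execution.mode = local
input.base.dir = /tmp/data/input
output.base.dir = /tmp/data/output

[prod]
execution.mode = yarn
input.base.dir = /public/data/input
output.base.dir = /user/etl/output
"""


def load_properties(text, environ=None):
    """Return the settings of one environment as a plain dict.

    The environment name comes from the argument or, failing that, from the
    (hypothetical) ENVIRON environment variable, defaulting to "dev".
    """
    environ = environ or os.environ.get("ENVIRON", "dev")
    parser = configparser.ConfigParser()
    parser.read_string(text)
    if environ not in parser:
        raise KeyError(f"No section for environment {environ!r}")
    # Values from [DEFAULT] are merged into every section by configparser.
    return dict(parser[environ])


props = load_properties(SAMPLE_PROPERTIES, "prod")
print(props["execution.mode"], props["input.base.dir"])
```

In a real application the text would be read from a file shipped alongside the job, and values such as execution.mode would be used when building the Spark session, so the same code can move between environments without edits.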

All the demos are given on our state-of-the-art Big Data cluster. You can get one month of complimentary lab access by contacting support@itversity.com with your Udemy receipt.

Who this course is for:

   Any IT aspirant/professional willing to learn Data Engineering using Apache Spark
   Python Developers who want to learn Spark as a key skill for becoming a Data Engineer

Requirements

   Basic programming skills using any programming language
   Self-supported lab (instructions provided) or ITVersity lab at additional cost, to provide an appropriate environment
   A 64-bit operating system; the minimum memory required depends on the environment you are using
   4 GB RAM with access to proper clusters or 16 GB RAM with virtual machines such as Cloudera QuickStart VM

Last Updated 1/2022
