AWS Certified Data Analytics – Specialty (DAS-C01) Exam Learning Path

  • Recently validated myself with the AWS Certified Data Analytics – Specialty (DAS-C01).
  • Data Analytics – Specialty (DAS-C01) has replaced the previous Big Data – Specialty (BDS-C00) exam.
  • Big Data in itself is a vast topic, and with AWS services there is a lot to cover and know for the exam.
  • If you have worked on Big Data technologies, including a bit of visualization, it would be a great asset for passing this exam.

The AWS Certified Data Analytics – Specialty (DAS-C01) exam validates the ability to

  • Define AWS data analytics services and understand how they integrate with each other.
  • Explain how AWS data analytics services fit in the data lifecycle of collection, storage, processing, and visualization.

Refer to the AWS Certified Data Analytics – Specialty Exam Guide for details.

AWS Certified Data Analytics - Specialty DAS-C01 Domains

AWS Certified Data Analytics – Specialty (DAS-C01) Exam Summary

  • AWS Certified Data Analytics – Specialty exam, as its name suggests, covers a lot of Big Data concepts, right from data transfer and collection techniques, storage, pre- and post-processing, analytics, and visualization, with the added concept of data security at each layer.
  • AWS Certified Data Analytics – Specialty exam has 65 questions to be solved within a time limit of 170 minutes.
  • Questions and answer options are pretty long, so you need time to read through them to make sense of the requirements and filter out the answers.
  • As the exam was online from home, there was no access to paper and pen, but the trick remains the same: read the question, draw a rough architecture in your head, and focus on the areas that matter. Trust me, you will be able to eliminate two answers for sure and then need to focus on only the other two. Compare the remaining two answers to spot where they differ; that would help you reach the right answer, or at least give you a 50% chance of getting it right.
  • Be sure to cover the following topics
    • Whitepapers and articles
    • Analytics
        • Make sure you know and cover all the services in depth; around 80% of the exam is focused on topics like Glue, Kinesis and Redshift.
      • Glue
        • DAS-C01 covers Glue in detail. This is one of the newly added services as compared to the Big Data – Specialty exam.
        • Understand Glue as a fully-managed, extract, transform, and load (ETL) service
        • Glue natively supports RDS, Redshift, S3 and databases on EC2 instances.
        • Glue provides Glue crawlers to crawl data and helps discover and create schema in Glue Data Catalog
        • Glue supports Job Bookmark that helps track data that has already been processed during a previous run of an ETL job by persisting state information from the job run. Job bookmarks help AWS Glue maintain state information and prevent the reprocessing of old data or duplicate records.
      • Elastic Map Reduce
        • Understand EMR in depth
        • Understand EMRFS (hint: Use Consistent view to make sure S3 objects referred by different applications are in sync)
        • Know EMR Best Practices (hint: start with many small nodes instead of a few large nodes)
        • Know EMR Encryption options
          • supports SSE-S3, SSE-KMS, CSE-KMS and CSE-Custom encryption for EMRFS
          • doesn’t support SSE-C encryption
          • supports LUKS encryption for local disks
          • supports TLS for data in transit encryption
        • Know Hive can be externally hosted using RDS, Aurora and AWS Glue Data Catalog
        • Know also different technologies
          • Presto is a fast SQL query engine designed for interactive analytic queries over large datasets from multiple sources
          • Spark is a distributed processing framework and programming model that helps do machine learning, stream processing, or graph analytics using Amazon EMR clusters
          • Zeppelin/Jupyter serve as notebooks for interactive data exploration; open-source web applications that can be used to create and share documents containing live code, equations, visualizations, and narrative text
          • Phoenix is used for OLTP and operational analytics, allowing you to use standard SQL queries and JDBC APIs to work with an Apache HBase backing store
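        • The cluster setup and best practice above can be sketched with boto3's `run_job_flow` parameters; the cluster name, log bucket, and key name below are hypothetical placeholders:

```python
def emr_cluster_params(name, log_uri, key_name):
    """Build RunJobFlow parameters for an EMR cluster running Spark,
    Presto and Hive, following the best practice of many small core
    nodes rather than a few large ones."""
    return {
        "Name": name,
        "ReleaseLabel": "emr-6.15.0",
        "Applications": [
            {"Name": "Spark"},   # distributed processing / ML / streaming
            {"Name": "Presto"},  # interactive SQL over large datasets
            {"Name": "Hive"},    # metastore can be externalized to Glue/RDS
        ],
        "LogUri": log_uri,
        "Instances": {
            "InstanceGroups": [
                {"Name": "master", "InstanceRole": "MASTER",
                 "InstanceType": "m5.xlarge", "InstanceCount": 1},
                # best practice: many small core nodes, not few large ones
                {"Name": "core", "InstanceRole": "CORE",
                 "InstanceType": "m5.xlarge", "InstanceCount": 4},
            ],
            "Ec2KeyName": key_name,
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }

if __name__ == "__main__":
    import boto3

    emr = boto3.client("emr")
    emr.run_job_flow(**emr_cluster_params(
        "analytics-cluster", "s3://my-bucket/emr-logs/", "my-key"))
```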
      • Kinesis
        • Understand Kinesis Data Streams and Kinesis Data Firehose in depth
        • Know Kinesis Data Streams vs Kinesis Firehose
          • Know Kinesis Data Streams is open ended on both the producer and consumer side. It supports KCL and works with Spark.
          • Know Kinesis Firehose is open ended for producers only; it delivers data to S3, Redshift and Elasticsearch.
          • Kinesis Firehose works in batches with a minimum 60-second buffer interval and is near real time.
          • Kinesis Firehose supports data transformation, including custom transformation using Lambda.
        • Understand Kinesis Encryption (hint: use server side encryption or encrypt in producer for data streams)
        • Know the difference between KPL and the SDK (hint: PutRecords is synchronous, while KPL supports asynchronous batching)
        • Kinesis Best Practices (hint: increase performance by increasing the number of shards)
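        • A minimal producer sketch tying the hints together: records are batched for `PutRecords`, and the partition key determines the shard, so a well-distributed key spreads load across shards (stream name and field names below are hypothetical):

```python
import json

def batch_records(events, partition_key_field):
    """Build a PutRecords request body: each record carries a partition
    key that determines its shard, so a well-distributed key spreads
    load across shards (the main lever for throughput)."""
    return [
        {"Data": json.dumps(e).encode("utf-8"),
         "PartitionKey": str(e[partition_key_field])}
        for e in events
    ]

if __name__ == "__main__":
    import boto3

    kinesis = boto3.client("kinesis")
    recs = batch_records(
        [{"user": "u1", "clicks": 1}, {"user": "u2", "clicks": 2}], "user")
    # one batched, synchronous call; KPL would buffer and send asynchronously
    kinesis.put_records(StreamName="clickstream", Records=recs)
```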
      • Elasticsearch
        • Know Elasticsearch is a search service which supports indexing, full-text search, faceting, etc.
        • Elasticsearch can be used for analysis and supports visualization using Kibana, which can be real time.
      • Redshift
        • Understand Redshift in depth
        • Understand Redshift Advanced topics like Workload Management, Distribution Style, Sort key
        • Understand Redshift Spectrum, which allows querying data in S3 without loading it into the Redshift cluster. It also helps join S3 data with data already in Redshift.
        • Know Redshift Best Practices w.r.t selection of Distribution style, Sort key, importing/exporting data
          • a single COPY command loads data in parallel and performs better than multiple concurrent COPY commands
          • COPY command can use manifest files to load data
          • COPY command handles encrypted data
        • Understand Redshift Resizing cluster options (elastic resize did not support node type changes before, but does now)
        • Know Redshift views to control access to data.
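        • The COPY best practices above can be sketched as a single manifest-driven COPY statement issued through the Redshift Data API (table name, manifest location, and role ARN below are hypothetical placeholders):

```python
def copy_statement(table, manifest_s3_uri, iam_role_arn):
    """Build a single COPY statement using a manifest file; one COPY
    loads all listed files in parallel across slices, which performs
    better than issuing multiple concurrent COPY commands."""
    return (
        f"COPY {table} "
        f"FROM '{manifest_s3_uri}' "
        f"IAM_ROLE '{iam_role_arn}' "
        "MANIFEST GZIP CSV;"  # manifest lists the S3 files to load
    )

if __name__ == "__main__":
    import boto3

    rsd = boto3.client("redshift-data")
    rsd.execute_statement(
        ClusterIdentifier="analytics-cluster",  # hypothetical cluster
        Database="dev",
        DbUser="awsuser",
        Sql=copy_statement(
            "sales",
            "s3://my-bucket/load/manifest",
            "arn:aws:iam::123456789012:role/RedshiftCopyRole",
        ),
    )
```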
      • Athena
        • serverless, interactive query service to analyze data in S3 using standard SQL
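        • A minimal sketch of running an Athena query against S3 data via boto3 (database name and output location are hypothetical placeholders; query results always land in S3):

```python
def athena_query_params(sql, database, output_s3):
    """Build StartQueryExecution parameters: Athena runs standard SQL
    against data in S3 and writes the results back to S3."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_s3},
    }

if __name__ == "__main__":
    import boto3

    athena = boto3.client("athena")
    athena.start_query_execution(**athena_query_params(
        "SELECT region, count(*) FROM sales GROUP BY region",
        "analytics_db",                 # hypothetical Glue/Athena database
        "s3://my-bucket/athena-results/",
    ))
```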
      • QuickSight
        • Understand QuickSight
        • Know Visual Types (hint: esp. plotting line, bar and story based visualizations)
        • Know Supported Data Sources (hint: supports files)
        • QuickSight provides direct integration with Microsoft AD
        • QuickSight supports Row level security using dataset rules
        • QuickSight supports ML insights as well
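        • Row-level security dataset rules are themselves a dataset, typically a CSV mapping users or groups to the field values they may see. A small sketch of generating such a rules file (the `Region` column is a hypothetical dataset field):

```python
import csv
import io

def rls_rules_csv(rules):
    """Write QuickSight row-level security dataset rules as CSV: each
    row maps a UserName to the Region values that user may see.
    ('Region' is a hypothetical dataset field used for illustration.)"""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["UserName", "Region"])
    writer.writeheader()
    writer.writerows(rules)
    return buf.getvalue()

# Example: alice sees only EU rows, bob only US rows.
rules_file = rls_rules_csv([
    {"UserName": "alice", "Region": "EU"},
    {"UserName": "bob", "Region": "US"},
])
```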
      • Know Data Pipeline for data transfer
    • Security, Identity & Compliance
    • Management & Governance Tools
      • Understand AWS CloudWatch for Logs and Metrics. Also, CloudWatch Events provides more real-time alerts as compared to CloudTrail.

AWS Certified Data Analytics – Specialty (DAS-C01) Exam Resources
