AWS RDS Read Replicas

January 17, 2024 ~ Last updated on : February 12, 2024 ~ jayendrapatil

RDS Read Replicas

RDS Read Replica is a read-only copy of the DB instance.

RDS Read Replicas provide enhanced performance and durability for RDS.
RDS Read Replicas allow elastic scaling beyond the capacity constraints of a single DB instance for read-heavy database workloads.

RDS Read replicas enable increased scalability and database availability in the case of an AZ failure.
Read Replicas can help reduce the load on the source DB instance by routing read queries from applications to the Read Replica.
Read replicas can also be promoted when needed to become standalone DB instances.

RDS read replicas can be Multi-AZ i.e. set up with their own standby instances in a different AZ.
One or more replicas of a given source DB Instance can serve high-volume application read traffic from multiple copies of the data, thereby increasing aggregate read throughput.
RDS uses DB engines’ built-in replication functionality to create a special type of DB instance called a Read Replica from a source DB instance. It uses the engines’ native asynchronous replication to update the read replica whenever there is a change to the source DB instance.

Read Replicas are eventually consistent due to asynchronous replication.
RDS sets up a secure communications channel using public-key encryption between the source DB instance and the read replica, even when replicating across regions.
Read replica operates as a DB instance that allows only read-only connections. Applications can connect to a read replica just as they would to any DB instance.
Read replicas are available in RDS for MySQL, MariaDB, PostgreSQL, Oracle, and SQL Server as well as Aurora.

RDS replicates all databases in the source DB instance.
RDS supports replication between an RDS MySQL or MariaDB DB instance and a MySQL or MariaDB instance that is external to RDS using Binary Log File Position or Global Transaction Identifiers (GTIDs) replication.

Read Replicas Creation

Read Replicas can be created within the same AZ, different AZ within the same region, and cross-region as well.
Up to five Read Replicas can be created from one source DB instance.

Creation process
- Automatic backups must be enabled on the source DB instance by setting the backup retention period to a value other than 0
- An existing DB instance needs to be specified as the source.
- RDS takes a snapshot of the source instance and creates a read-only instance from the snapshot.
- RDS then uses the asynchronous replication method for the DB engine to update the Read Replica for any changes to the source DB instance.
RDS replicates all databases in the source DB instance.
RDS sets up a secure communications channel between the source DB instance and the Read Replica if that Read Replica is in a different AWS region from the DB instance.

RDS establishes any AWS security configurations, such as adding security group entries, needed to enable the secure channel.
During the Read Replica creation, a brief I/O suspension on the source DB instance can be experienced as the DB snapshot occurs.
I/O suspension typically lasts about one minute and can be avoided if the source DB instance is a Multi-AZ deployment (in the case of Multi-AZ deployments, DB snapshots are taken from the standby).

Read Replica creation time can be slow if any long-running transactions are being executed and should wait for completion
For multiple Read Replicas created in parallel from the same source DB instance, only one snapshot is taken at the start of the first create action.
A Read Replica can be promoted to a new independent source DB, in which case the replication link is broken between the Read Replica and the source DB. However, the replication continues for other replicas using the original source DB as the replication source

Read Replica Deletion & DB Failover

Read Replicas must be explicitly deleted, using the same mechanisms for deleting a DB instance.
If the source DB instance is deleted without deleting the replicas, each replica is promoted to a stand-alone, single-AZ DB instance.
If the source instance of a Multi-AZ deployment fails over to the standby, any associated Read Replicas are switched to use the secondary as their replication source.

Read Replica Storage & Compute requirements

A Read Replica, by default, is created with the same storage type as the source DB instance.
For replication to operate effectively, each Read Replica should have the same amount of compute & storage resources as the source DB instance.
Read Replicas should be scaled accordingly if the source DB instance is scaled.

Read Replicas Promotion

A read replica can be promoted into a standalone DB instance.
When the read replica is promoted
- New DB instance is rebooted before it becomes available.
- New DB instance that is created retains the option group and the parameter group of the former read replica.
- The promotion process can take several minutes or longer to complete, depending on the size of the read replica.
- If a source DB instance has several read replicas, promoting one of the read replicas to a DB instance has no effect on the other replicas.
If you plan to promote a read replica to a standalone instance, AWS recommends that you enable backups and complete at least one backup prior to promotion.

Read Replicas Promotion can help with
- Performing DDL operations (MySQL and MariaDB only)
  - DDL Operations such as creating or rebuilding indexes can take time and can be performed on the read replica once it is in sync with its primary DB instance.
- Sharding
  - Sharding embodies the “share-nothing” architecture and essentially involves breaking a large database into several smaller databases.
  - Read Replicas can be created and promoted corresponding to each of the shards and then using a hashing algorithm to determine which host receives a given update.
- Implementing failure recovery
  - Read replica promotion can be used as a data recovery scheme if the primary DB instance fails.

Read Replicas Multi-AZ

RDS read replicas can be Multi-AZ and we can have read-only standby instances in a different AZ.

Read Replicas is currently supported for MySQL, MariaDB, PostgreSQL, and Oracle database engines.
Read Replicas with Multi-AZ help build a resilient disaster recovery strategy and simplify the database engine upgrade process.
Read replica as Multi-AZ, allows you to use the read replica as a DR target providing automatic failover.

Also, when you promote the read replica to be a standalone database, it will already be Multi-AZ enabled.

Cross-Region Read Replicas

Supported for MySQL, PostgreSQL, MariaDB, and Oracle.
Not supported for SQL Server
Cross-Region Read Replicas help to improve
- disaster recovery capabilities (reduces RTO and RPO),
- scale read operations into a region closer to end users,
- migration from a data center in one region to another region
A source DB instance can have cross-region read replicas in multiple AWS Regions.
Cross-Region RDS read replica can be created from a source RDS DB instance that is not a read replica of another RDS DB instance.

Replica lags are higher for Cross-region replicas. This lag time comes from the longer network channels between regional data centers.
RDS can’t guarantee more than five cross-region read replica instances, due to the limit on the number of access control list (ACL) entries for a VPC
Read Replica uses the default DB parameter group and DB option group for the specified DB engine.

Read Replica uses the default security group.
Deleting the source for a cross-Region read replica will result in
- read replica promotion for MariaDB, MySQL, and Oracle DB instances
- no read replica promotion for PostgreSQL DB instances and the replication status of the read replica is set to terminated.

Read Replica Features & Limitations

RDS does not support circular replication.
DB instance cannot be configured to serve as a replication source for an existing DB instance; a new Read Replica can be created only from an existing DB instance for e.g., if MyDBInstance replicates to ReadReplica1, ReadReplica1 can’t be configured to replicate back to MyDBInstance. From ReadReplica1, only a new Read Replica can be created, such as ReadRep2.
Read Replica can be created from other Read replicas as well. However, the replica lag is higher for these instances and there cannot be more than four instances involved in a replication chain.

RDS Read Replicas Use Cases

Scaling beyond the compute or I/O capacity of a single DB instance for read-heavy database workloads, directing excess read traffic to Read Replica(s)

Serving read traffic while the source DB instance is unavailable for e.g. If the source DB instance cannot take I/O requests due to backups I/O suspension or scheduled maintenance, the read traffic can be directed to the Read Replica(s). However, the data might be stale.
Business reporting or data warehousing scenarios where business reporting queries can be executed against a Read Replica, rather than the primary, production DB instance.
Implementing disaster recovery by promoting the read replica to a standalone instance as a disaster recovery solution, if the primary DB instance fails.

AWS Certification Exam Practice Questions

Questions are collected from Internet and the answers are marked as per my knowledge and understanding (which might differ with yours).

AWS services are updated everyday and both the answers and questions might be outdated soon, so research accordingly.

AWS exam questions are not updated to keep up the pace with AWS updates, so even if the underlying feature has changed the question might not be updated

Open to further feedback, discussion and correction.

You are running a successful multi-tier web application on AWS and your marketing department has asked you to add a reporting tier to the application. The reporting tier will aggregate and publish status reports every 30 minutes from user-generated information that is being stored in your web applications database. You are currently running a Multi-AZ RDS MySQL instance for the database tier. You also have implemented ElastiCache as a database caching layer between the application tier and database tier. Please select the answer that will allow you to successfully implement the reporting tier with as little impact as possible to your database.
1. Continually send transaction logs from your master database to an S3 bucket and generate the reports off the S3 bucket using S3 byte range requests.
2. Generate the reports by querying the synchronously replicated standby RDS MySQL instance maintained through Multi-AZ (Standby instance cannot be used as a scaling solution)
3. Launch a RDS Read Replica connected to your Multi-AZ master database and generate reports by querying the Read Replica.
4. Generate the reports by querying the ElastiCache database caching tier. (ElasticCache does not maintain full data and is simply a caching solution)
Your company is getting ready to do a major public announcement of a social media site on AWS. The website is running on EC2 instances deployed across multiple Availability Zones with a Multi-AZ RDS MySQL Extra Large DB Instance. The site performs a high number of small reads and writes per second and relies on an eventual consistency model. After comprehensive tests you discover that there is read contention on RDS MySQL. Which are the best approaches to meet these requirements? (Choose 2 answers)
1. Deploy ElastiCache in-memory cache running in each availability zone
2. Implement sharding to distribute load to multiple RDS MySQL instances (this is only a read contention, the writes work fine)
3. Increase the RDS MySQL Instance size and Implement provisioned IOPS (not scalable, this is only a read contention, the writes work fine)
4. Add an RDS MySQL read replica in each availability zone

Your company has HQ in Tokyo and branch offices all over the world and is using logistics software with a multi-regional deployment on AWS in Japan, Europe and US. The logistic software has a 3-tier architecture and currently uses MySQL 5.6 for data persistence. Each region has deployed its own database. In the HQ region you run an hourly batch process reading data from every region to compute cross-regional reports that are sent by email to all offices this batch process must be completed as fast as possible to quickly optimize logistics. How do you build the database architecture in order to meet the requirements?
1. For each regional deployment, use RDS MySQL with a master in the region and a read replica in the HQ region
2. For each regional deployment, use MySQL on EC2 with a master in the region and send hourly EBS snapshots to the HQ region
3. For each regional deployment, use RDS MySQL with a master in the region and send hourly RDS snapshots to the HQ region
4. For each regional deployment, use MySQL on EC2 with a master in the region and use S3 to copy data files hourly to the HQ region
5. Use Direct Connect to connect all regional MySQL deployments to the HQ region and reduce network latency for the batch process

Your business is building a new application that will store its entire customer database on a RDS MySQL database, and will have various applications and users that will query that data for different purposes. Large analytics jobs on the database are likely to cause other applications to not be able to get the query results they need to, before time out. Also, as your data grows, these analytics jobs will start to take more time, increasing the negative effect on the other applications. How do you solve the contention issues between these different workloads on the same data?
1. Enable Multi-AZ mode on the RDS instance
2. Use ElastiCache to offload the analytics job data
3. Create RDS Read-Replicas for the analytics work
4. Run the RDS instance on the largest size possible
If I have multiple Read Replicas for my master DB Instance and I promote one of them, what happens to the rest of the Read Replicas?
1. The remaining Read Replicas will still replicate from the older master DB Instance
2. The remaining Read Replicas will be deleted
3. The remaining Read Replicas will be combined to one read replica
You need to scale an RDS deployment. You are operating at 10% writes and 90% reads, based on your logging. How best can you scale this in a simple way?
1. Create a second master RDS instance and peer the RDS groups.
2. Cache all the database responses on the read side with CloudFront.
3. Create read replicas for RDS since the load is mostly reads.
4. Create a Multi-AZ RDS installs and route read traffic to standby.

A customer is running an application in US-West (Northern California) region and wants to setup disaster recovery failover to the Asian Pacific (Singapore) region. The customer is interested in achieving a low Recovery Point Objective (RPO) for an Amazon RDS multi-AZ MySQL database instance. Which approach is best suited to this need?
1. Synchronous replication
2. Asynchronous replication
3. Route53 health checks
4. Copying of RDS incremental snapshots
A user is using a small MySQL RDS DB. The user is experiencing high latency due to the Multi AZ feature. Which of the below mentioned options may not help the user in this situation?
1. Schedule the automated back up in non-working hours
2. Use a large or higher size instance
3. Use PIOPS
4. Take a snapshot from standby Replica
My Read Replica appears “stuck” after a Multi-AZ failover and is unable to obtain or apply updates from the source DB Instance. What do I do?
1. You will need to delete the Read Replica and create a new one to replace it.
2. You will need to disassociate the DB Engine and re associate it.
3. The instance should be deployed to Single AZ and then moved to Multi- AZ once again
4. You will need to delete the DB Instance and create a new one to replace it.
A company is running a batch analysis every hour on their main transactional DB running on an RDS MySQL instance to populate their central Data Warehouse running on Redshift. During the execution of the batch their transactional applications are very slow. When the batch completes they need to update the top management dashboard with the new data. The dashboard is produced by another system running on-premises that is currently started when a manually-sent email notifies that an update is required The on-premises system cannot be modified because is managed by another team. How would you optimize this scenario to solve performance issues and automate the process as much as possible?
1. Replace RDS with Redshift for the batch analysis and SNS to notify the on-premises system to update the dashboard
2. Replace RDS with Redshift for the batch analysis and SQS to send a message to the on-premises system to update the dashboard
3. Create an RDS Read Replica for the batch analysis and SNS to notify me on-premises system to update the dashboard
4. Create an RDS Read Replica for the batch analysis and SQS to send a message to the on-premises system to update the dashboard.

References

AWS_RDS_Read_Replicas

AWS RDS Multi-AZ Deployment

December 17, 2023 ~ Last updated on : January 31, 2024 ~ jayendrapatil ~ 1 Comment

RDS Multi-AZ Deployment

RDS Multi-AZ deployments provide high availability and automatic failover support for DB instances

Multi-AZ helps improve the durability and availability of a critical system, enhancing availability during planned system maintenance, DB instance failure, and Availability Zone disruption.
A Multi-AZ DB instance deployment
- has one standby DB instance that provides failover support but doesn’t serve read traffic.
- There is only one row for the DB instance.
- The value of Role is Instance or Primary.
- The value of Multi-AZ is Yes.
A Multi-AZ DB cluster deployment
- has two standby DB instances that provide failover support and can also serve read traffic.
- There is a cluster-level row with three DB instance rows under it.
- For the cluster-level row, the value of Role is Multi-AZ DB cluster.
- For each instance-level row, the value of Role is Writer instance or Reader instance.
- For each instance-level row, the value of Multi-AZ is 3 Zones.

RDS Multi-AZ DB Instance Deployment

RDS automatically creates a primary DB Instance and synchronously replicates the data to a standby instance in a different AZ.
RDS performs an automatic failover to the standby, so that database operations can be resumed as soon as the failover is complete.

RDS Multi-AZ deployment maintains the same endpoint for the DB Instance after a failover, so the application can resume database operation without the need for manual administrative intervention.
Multi-AZ is a high-availability feature and NOT a scaling solution for read-only scenarios; a standby replica can’t be used to serve read traffic. To service read-only traffic, use a Read Replica.
RDS performs an automatic failover to the standby, so that database operations can be resumed as soon as the failover is complete.

Multi-AZ deployments for Oracle, PostgreSQL, MySQL, and MariaDB DB instances use Amazon technology, while SQL Server DB instances use SQL Server Mirroring.

RDS Multi-AZ DB Cluster Deployment

RDS Multi-AZ DB cluster deployment is a high-availability deployment mode of RDS with two readable standby DB instances.
RDS Multi-AZ DB cluster has a writer DB instance and two reader DB instances in three separate AZs in the same AWS Region.

With a Multi-AZ DB cluster, RDS semi-synchronously replicates data from the writer DB instance to both of the reader DB instances using the DB engine’s native replication capabilities.
Multi-AZ DB clusters provide high availability, increased capacity for read workloads, and lower write latency when compared to Multi-AZ DB instance deployments.
If an event of an outage, RDS manages failover from the writer DB instance to one of the reader DB instances. RDS does this based on which reader DB instance has the most recent change record.

Multi-AZ DB Instance vs Multi-AZ DB Cluster

AWS Certification Exam Practice Questions

Questions are collected from Internet and the answers are marked as per my knowledge and understanding (which might differ with yours).

AWS services are updated everyday and both the answers and questions might be outdated soon, so research accordingly.

AWS exam questions are not updated to keep up the pace with AWS updates, so even if the underlying feature has changed the question might not be updated

Open to further feedback, discussion and correction.

A company is deploying a new two-tier web application in AWS. The company has limited staff and requires high availability, and the application requires complex queries and table joins. Which configuration provides the solution for the company’s requirements?
1. MySQL Installed on two Amazon EC2 Instances in a single Availability Zone (does not provide High Availability out of the box)
2. Amazon RDS for MySQL with Multi-AZ
3. Amazon ElastiCache (Just a caching solution)
4. Amazon DynamoDB (Not suitable for complex queries and joins)
What would happen to an RDS (Relational Database Service) multi-Availability Zone deployment if the primary DB instance fails?
1. IP of the primary DB Instance is switched to the standby DB Instance.
2. A new DB instance is created in the standby availability zone.
3. The canonical name record (CNAME) is changed from primary to standby.
4. The RDS (Relational Database Service) DB instance reboots.

Will my standby RDS instance be in the same Availability Zone as my primary?
1. Only for Oracle RDS types
2. Yes
3. Only if configured at launch
4. No
Is creating a Read Replica of another Read Replica supported?
1. Only in certain regions
2. Only with MySQL based RDS
3. Only for Oracle RDS types
4. No
A user is planning to set up the Multi-AZ feature of RDS. Which of the below mentioned conditions won’t take advantage of the Multi-AZ feature?
1. Availability zone outage
2. A manual failover of the DB instance using Reboot with failover option
3. Region outage
4. When the user changes the DB instance’s server type
When you run a DB Instance as a Multi-AZ deployment, the “_____” serves database writes and reads
1. secondary
2. backup
3. stand by
4. primary
When running my DB Instance as a Multi-AZ deployment, can I use the standby for read or write operations?
1. Yes
2. Only with MSSQL based RDS
3. Only for Oracle RDS instances
4. No

Read Replicas require a transactional storage engine and are only supported for the _________ storage engine
1. OracleISAM
2. MSSQLDB
3. InnoDB
4. MyISAM
A user is configuring the Multi-AZ feature of an RDS DB. The user came to know that this RDS DB does not use the AWS technology, but uses server mirroring to achieve replication. Which DB is the user using right now?
1. MySQL
2. Oracle
3. MS SQL
4. PostgreSQL
If you have chosen Multi-AZ deployment, in the event of a planned or unplanned outage of your primary DB Instance, Amazon RDS automatically switches to the standby replica. The automatic failover mechanism simply changes the ______ record of the main DB Instance to point to the standby DB Instance.
1. DNAME
2. CNAME
3. TXT
4. MX

When automatic failover occurs, Amazon RDS will emit a DB Instance event to inform you that automatic failover occurred. You can use the _____ to return information about events related to your DB Instance
1. FetchFailure
2. DescriveFailure
3. DescribeEvents
4. FetchEvents
The new DB Instance that is created when you promote a Read Replica retains the backup window period.
1. TRUE
2. FALSE
Will I be alerted when automatic failover occurs?
1. Only if SNS configured
2. No
3. Yes
4. Only if Cloudwatch configured

Can I initiate a “forced failover” for my MySQL Multi-AZ DB Instance deployment?
1. Only in certain regions
2. Only in VPC
3. Yes
4. No
A user is accessing RDS from an application. The user has enabled the Multi-AZ feature with the MS SQL RDS DB. During a planned outage how will AWS ensure that a switch from DB to a standby replica will not affect access to the application?
1. RDS will have an internal IP which will redirect all requests to the new DB
2. RDS uses DNS to switch over to standby replica for seamless transition
3. The switch over changes Hardware so RDS does not need to worry about access
4. RDS will have both the DBs running independently and the user has to manually switch over

Which of the following is part of the failover process for a Multi-AZ Amazon Relational Database Service (RDS) instance?
1. The failed RDS DB instance reboots.
2. The IP of the primary DB instance is switched to the standby DB instance.
3. The DNS record for the RDS endpoint is changed from primary to standby.
4. A new DB instance is created in the standby availability zone.
Which of these is not a reason a Multi-AZ RDS instance will failover?
1. An Availability Zone outage
2. A manual failover of the DB instance was initiated using Reboot with failover
3. To autoscale to a higher instance class (Refer link)
4. Master database corruption occurs
5. The primary DB instance fails

How does Amazon RDS multi Availability Zone model work?
1. A second, standby database is deployed and maintained in a different availability zone from master, using synchronous replication. (Refer link)
2. A second, standby database is deployed and maintained in a different availability zone from master using asynchronous replication.
3. A second, standby database is deployed and maintained in a different region from master using asynchronous replication.
4. A second, standby database is deployed and maintained in a different region from master using synchronous replication.
A user is using a small MySQL RDS DB. The user is experiencing high latency due to the Multi AZ feature. Which of the below mentioned options may not help the user in this situation?
1. Schedule the automated back up in non-working hours
2. Use a large or higher size instance
3. Use PIOPS
4. Take a snapshot from standby Replica

What is the charge for the data transfer incurred in replicating data between your primary and standby?
1. No charge. It is free.
2. Double the standard data transfer charge
3. Same as the standard data transfer charge
4. Half of the standard data transfer charge
A user has enabled the Multi AZ feature with the MS SQL RDS database server. Which of the below mentioned statements will help the user understand the Multi AZ feature better?
1. In a Multi AZ, AWS runs two DBs in parallel and copies the data asynchronously to the replica copy
2. In a Multi AZ, AWS runs two DBs in parallel and copies the data synchronously to the replica copy
3. In a Multi AZ, AWS runs just one DB but copies the data synchronously to the standby replica
4. AWS MS SQL does not support the Multi AZ feature

Choosing the Right Data Science Specialization: Where to Focus Your Skills

December 13, 2023 ~ jayendrapatil

Choosing the Right Data Science Specialization: Where to Focus Your Skills

In the rapidly evolving world of technology, data science stands out as a field of endless opportunities and diverse pathways. With its foundations deeply rooted in statistics, computer science, and domain-specific knowledge, data science has become indispensable for organizations seeking to make data-driven decisions. However, the vastness of this field can be overwhelming, making specialization a strategic necessity for aspiring data scientists.

This article aims to navigate through the labyrinth of data science specializations, helping you align your career with your interests, skills, and the evolving demands of the job market.

Understanding the Breadth of Data Science

Data science is a multidisciplinary field that uses scientific methods, processes, algorithms, and systems to draw knowledge and discover insights from structured and unstructured data. It includes multiranged activities, from data collection and cleaning to complex algorithmic computations and predictive modeling.

Key Areas Within Data Science

Machine Learning: This involves creating algorithms that can learn from pre-fed data and make predictions or decisions based on it.
Deep Learning: A specialized subdomain of machine learning, focusing on neural networks and algorithms inspired by the structure and function of the brain.
Data Engineering: This is the backbone of data science, focusing on the practical aspects of data collection, storage, and retrieval.

Data Visualization: It involves converting complex data sets into understandable and interactive graphical representations.
Big Data Analytics: This deals with extracting meaningful insights from very large, diverse data sets that are often beyond the capability of traditional data-processing applications.
AI and Robotics: This cutting-edge field combines data science with robotics, focusing on creating machines that can perform actions/operations that typically require human intelligence.

Interconnectivity of These Areas

While these specializations are distinct, they are interconnected. For instance, data engineering is foundational for machine learning, and AI applications often rely on insights derived from big data analytics.

Factors to Consider When Choosing a Specialization

Personal Interests and Strengths
- Your choice should resonate with your personal interests. If you are fascinated by how algorithms can mimic human learning, deep learning could be your calling. Alternatively, if you enjoy the challenges of handling and organizing large data sets, data engineering might suit you.

Industry Demand and Job Market Trends
- It’s crucial to align your specialization with the market demand. Fields like AI and machine learning are rapidly growing and offer numerous job opportunities. Tracking industry trends can provide valuable insights into which specializations are most in demand.
Long-term Career Goals
- Consider where you want to be in your career in the next five to ten years. Some specializations may offer more opportunities for growth, leadership roles, or transitions into different areas of data science.
Impact of Emerging Technologies
- Emerging technologies can redefine the landscape of data science. Continuously updating with the knowledge about these changes can help you choose a specialization that remains relevant in the future.

Deep Dive into Popular Data Science Specializations

Machine Learning
- Overview and Applications: From predictive modeling in finance to recommendation systems in e-commerce, machine learning is revolutionizing various industries.
- Required Skills and Tools: Proficiency in programming languages like Python or R, understanding of algorithms, and familiarity with TensorFlow or Scikit-learn like machine learning frameworks are essential.

Data Engineering
- Role in Data Science: Data engineers build and maintain the infrastructure that allows data scientists to analyze and utilize data effectively.
- Key Skills and Technologies: Skills in database management, ETL (Extract, Transform, Load) processes, and knowledge of SQL, NoSQL, Hadoop, and Spark are crucial.

Big Data Analytics
- Understanding Big Data: This specialization deals with extremely large data sets that discover patterns, trends, and associations, particularly relating to human behavior and interactions.
- Tools and Techniques: Familiarity with big data platforms like Apache Hadoop and Spark, along with data mining and statistical analysis, is important.

AI and Robotics
- The Frontier of Data Science: This field is at the cutting edge, developing intelligent systems with the capability of performing tasks that particularly require human intelligence.
- Skills and Knowledge Base: A deep understanding of AI principles, programming, and robotics is necessary, along with skills in machine learning and neural networks.

Educational Pathways for Each Specialization

Academic Courses and Degrees
- Pursuing a formal education in data science or a related field can provide a strong theoretical foundation. Many universities like MIT now offer specialized courses in machine learning, AI, and big data analytics, like the Data Analysis Certificate program.
Online Courses and Bootcamps
- Online platforms like Great Learning offer specialized courses that are more flexible and often industry-oriented. Bootcamps, on the other hand, provide intensive, hands-on training in specific areas of data science.
Certifications and Workshops
- Professional certifications from recognized bodies can add significant value to your resume. Educational choices like the Data Science course showcase your expertise and commitment to professional development.

Self-learning Resources
- The internet is replete with resources for self-learners. From online tutorials and forums to webinars and eBooks, the opportunities for self-paced learning in data science are abundant.

Building Experience in Your Chosen Specialization

Internships and Entry-level Positions
- Gaining practical experience is crucial. Internships and entry-level positions provide real-world experience and help you understand the practical challenges and applications of your chosen specialization.
Personal and Open-source Projects
- Working on personal data science projects or contributing to open-source projects can be a great way to apply your skills. These projects can also be a valuable addition to your portfolio.

Networking and Community Involvement
- Building a professional network and participating in data science communities can lead to job opportunities and collaborations. Attending industry conferences and seminars is also a great way to stay updated and connected.
Industry Conferences and Seminars
- These events are excellent for learning about the latest industry trends, best data science practices, and emerging technologies. They also offer opportunities to meet industry leaders and peers.

Future Trends and Evolving Specializations

Predicting the Future of Data Science
- The field of data science is constantly evolving. Staying informed about future trends is crucial for choosing a specialization that will remain relevant and in demand.

Emerging Specializations and Technologies
- Areas like quantum computing, edge analytics, and ethical AI are emerging as new frontiers in data science. These fields are likely to offer exciting new opportunities for specialization in the coming years.
Staying Adaptable and Continuous Learning
- The work-way to a successful career in data science is adaptability and a commitment to continuous learning. The field is dynamic, and staying abreast of new developments is essential.

Conclusion

Choosing the right data science specialization is a critical decision that can shape your career trajectory. It requires a careful consideration of your personal interests, the current job market, and future industry trends. Whether your passion lies in the intricate algorithms of machine learning, the structural challenges of data engineering, or the innovative frontiers of AI and robotics, there is a niche for every aspiring data scientist. The journey is one of continuous learning, adaptability, and an unwavering curiosity about the power of data. As the field continues to grow and diversify, the opportunities for data scientists are bound to expand, offering a rewarding and dynamic career path.

Kubernetes and Cloud Native Associate KCNA Exam Learning Path

December 13, 2023 ~ Last updated on : February 20, 2026 ~ jayendrapatil

Kubernetes and Cloud Native Associate KCNA Exam Learning Path

I recently certified for the Kubernetes and Cloud Native Associate – KCNA exam.

KCNA exam focuses on a user’s foundational knowledge and skills in Kubernetes and the wider cloud native ecosystem.
KCNA exam is intended to prepare candidates to work with cloud-native technologies and pursue further CNCF credentials, including CKA, CKAD, and CKS.

KCNA validates the conceptual knowledge of
- the entire cloud native ecosystem, particularly focusing on Kubernetes.
- Kubernetes and cloud-native technologies, including how to deploy an application using basic kubectl commands, the architecture of Kubernetes (containers, pods, nodes, clusters), understanding the cloud-native landscape and projects (storage, networking, GitOps, service mesh), and understanding the principles of cloud-native security.

KCNA Exam Pattern

Certification Validity Changed: From April 1, 2024, all Linux Foundation certifications valid for 24 months (previously 36 months)

Curriculum Changes (Effective 2025): Observability domain rolled under Cloud Native Architecture; domains reduced from 5 to 4

KCNA exam curriculum includes these general domains and their weights on the exam:
- Kubernetes Fundamentals – 46%
- Container Orchestration – 22%
- Cloud Native Architecture – 16%
- Cloud Native Observability – 8%
- Cloud Native Application Delivery – 8%
KCNA exam requires you to solve 60 questions in 90 minutes.

Exam questions can be attempted in any order and don’t have to be sequential. So be sure to move ahead and come back later.
Time is more than sufficient if you are well prepared. I was able to get through the exam within an hour.

KCNA Exam Preparation and Tips

I used the courses from KodeKloud KCNA for practicing and it would be good enough to cover what is required for the exam.

KCNA Resources

Go through the KCNA Curriculum
Linux Foundation KCNA Course and Certification Bundle
KodeKloud – Mumshad Mannambeth Kubernetes and Cloud-Native Associate (KCNA) with Practice Tests
Braincert Kubernetes and Cloud Native Associate – KCNA Practice Exams

KCNA Key Topics

Kubernetes Fundamentals

Kubernetes is a highly popular open-source container orchestration platform that can be used to automate deployment, scaling, and the management of containerized workloads.

Kubernetes Architecture
- A Kubernetes cluster consists of at least one main (control) plane, and one or more worker machines, called nodes.
- Both the control planes and node instances can be physical devices, virtual machines, or instances in the cloud.

ETCD (key-value store)
- Etcd is a consistent, distributed, and highly-available key-value store.
- is stateful, persistent storage that stores all of Kubernetes cluster data (cluster state and config).
- is the source of truth for the cluster.
- can be part of the control plane, or, it can be configured externally.

Kubernetes API
- API server exposes a REST interface to the Kubernetes cluster. It is the front end for the Kubernetes control plane.
- All operations against Kubernetes objects are programmatically executed by communicating with the endpoints provided by it.
- It tracks the state of all cluster components and manages the interaction between them.
- It is designed to scale horizontally.
- It consumes YAML/JSON manifest files.
- It validates and processes the requests made via API.
Scheduling
- The scheduler is responsible for assigning work to the various nodes. It keeps watch over the resource capacity and ensures that a worker node’s performance is within an appropriate threshold.
- It schedules pods to worker nodes.
- It watches api-server for newly created Pods with no assigned node, and selects a healthy node for them to run on.
- If there are no suitable nodes, the pods are put in a pending state until such a healthy node appears.
- It watches API Server for new work tasks.
- Factors taken into account for scheduling decisions include:
  - Individual and collective resource requirements.
  - Hardware/software/policy constraints.
  - Affinity and anti-affinity specifications.
  - Data locality.
  - Inter-workload interference.
  - Deadlines and taints.

Controller Manager
- Controller manager is responsible for making sure that the shared state of the cluster is operating as expected.
- It watches the desired state of the objects it manages and watches their current state through the API server.
- It takes corrective steps to make sure that the current state is the same as the desired state.
- It is a controller of controllers.
- It runs controller processes. Logically, each controller is a separate process, but to reduce complexity, they are all compiled into a single binary and run in a single process.
Kubelet
- A Kubelet tracks the state of a pod to ensure that all the containers are running and healthy
- provides a heartbeat message every few seconds to the control plane.
- runs as an agent on each node in the cluster.
- acts as a conduit between the API server and the node.
- instantiates and executes Pods.
- watches API Server for work tasks.
- gets instructions from master and reports back to Masters.
Kube-proxy
- Kube proxy is a networking component that routes traffic coming into a node from the service to the correct containers.
- is a network proxy that runs on each node in a cluster.
- manages IP translation and routing.
- maintains network rules on nodes. These network rules allow network communication to Pods from inside or outside of cluster.
- ensures each Pod gets a unique IP address.
- makes possible that all containers in a pod share a single IP.
- facilitates Kubernetes networking services and load-balancing across all pods in a service.
- It deals with individual host sub-netting and ensures that the services are available to external parties.

Kubernetes Resources
- Nodes manage and run pods; it’s the machine (whether virtualized or physical) that performs the given work.
- Namespaces
  - provide a mechanism for isolating groups of resources within a single cluster.
  - Kubernetes starts with four initial namespaces:
    - default – default namespace for objects with no other namespace.
    - kube-system – namespace for objects created by the Kubernetes system.
    - kube-public – namespace is created automatically and is readable by all users (including those not authenticated).
    - kube-node-lease – namespace holds Lease objects associated with each node. Node leases allow the kubelet to send heartbeats so that the control plane can detect node failure.
  - Resource Quotas can be defined for each namespace to limit the resources consumed.
  - Resources within the namespaces can refer to each other with their service names.
- Pods
  - is a group of containers and is the smallest unit that Kubernetes administers.
  - Containers in a pod share the same resources such as memory and storage.
- ReplicaSet
  - ensures a stable set of replica Pods running at any given time.
  - helps guarantee the availability of a specified number of identical Pods.
- Deployments
  - provide declarative updates for Pods and ReplicaSets.
  - describe the number of desired identical pod replicas to run and the preferred update strategy used when updating the deployment.
  - supports Rolling Update and Recreate update strategy.
- Services
  - is an abstraction over the pods, and essentially, the only interface the various application consumers interact with.
  - exposes a single machine name or IP address mapped to pods whose underlying names and numbers are unreliable.
  - supports the following types
    - ClusterIP
    - NodePort
    - Load Balancer
- Ingress
  - exposes HTTP and HTTPS routes from outside the cluster to services within the cluster.
- DaemonSet
  - ensures that all (or some) Nodes run a copy of a Pod.
  - ensures pods are added to the newly created nodes and garbage collected as nodes are removed.
- StatefulSet
  - is ideal for stateful applications using ReadWriteOnce volumes.
  - designed to deploy stateful applications and clustered applications that save data to persistent storage, such as persistent disks.
- ConfigMaps
  - helps to store non-confidential data in key-value pairs.
  - can be consumed by pods as environment variables, command-line arguments, or configuration files in a volume.
- Secrets
  - provides a container for sensitive data such as a password without putting the information in a Pod specification or a container image.
  - are not encrypted but only base64 encoded.
- Job & ConJobs
  - creates one or more Pods and will continue to retry execution of the Pods until a specified number of them successfully terminate.
  - A CronJob creates Jobs on a repeating schedule.
- Volumes
  - supports Persistent volumes that exist beyond the lifetime of a pod.
  - When a pod ceases to exist, Kubernetes destroys ephemeral volumes; however, Kubernetes does not destroy persistent volumes.
  - PersistentVolume (PV) is a cluster scoped piece of storage in the cluster that has been provisioned by an administrator or dynamically provisioned using Storage Classes.
  - PersistentVolumeClaim (PVC) is a request for storage by a user.
- Labels and Annotations attach metadata to objects in Kubernetes.
  - Labels are identifying key/value pairs that can be attached to Kubernetes objects and are used in conjunction with selectors to identify groups of related resources.
  - Annotations are key/value pairs designed to hold non-identifying information that can be leveraged by tools and libraries.

Containers
- Container runtime is responsible for running containers (in Pods).
- Kubernetes supports any implementation of the Kubernetes Container Runtime Interface CRI specifications
- To run the containers, each worker node has a container runtime engine.
- It pulls images from a container image registry and starts and stops containers.
- Kubernetes supports several container runtimes:
  - Docker
  - contained
  - CRI-O

Container Orchestration

Container Orchestration Fundamentals
- Containers help manage the dependencies of an application and run much more efficiently than spinning up a lot of virtual machines.
- While virtual machines emulate a complete machine, including the operating system and a kernel, containers share the kernel of the host machine and are only isolated processes.
- Virtual machines come with some overhead, be it boot time, size or resource usage to run the operating system. Containers on the other hand are processes, like the browser, therefore they start a lot faster and have a smaller footprint.
Runtime
- Container runtime is responsible for running containers (in Pods).
- Kubernetes supports any implementation of the Kubernetes Container Runtime Interface CRI specifications
- To run the containers, each worker node has a container runtime engine.
- It pulls images from a container image registry and starts and stops containers.
- Kubernetes supports several container runtimes:
  - Docker – Standard for a long time but the usage of Docker as the runtime for Kubernetes has been deprecated and removed in Kubernetes 1.24
  - contained – containerd is the most popular lightweight and performant implementation to run containers used by all major cloud providers for the Kubernetes As A Service products.
  - CRI-O – CRI-O was created by Red Hat and with a similar code base closely related to podman and buildah.
  - gvisor – Made by Google, provides an application kernel that sits between the containerized process and the host kernel.
  - Kata Containers – A secure runtime that provides a lightweight virtual machine, but behaves like a container
- Security
  - 4C’s of Cloud Native security are Cloud, Clusters, Containers, and Code.
  - Containers are started on a machine and they always share the same kernel, which then becomes a risk for the whole system, if containers are allowed to call kernel functions like for example killing other processes or modifying the host network by creating routing rules.
  - Kubernetes provides security features
    - Authentication using Users & Certificates
      - Certificates are the recommended way
      - Service accounts can be used to provide bearer tokens to authenticate with Kubernetes API.
    - Authorization using Node, ABAC, RBAC, Webhooks
      - Role-based access control is the most secure and recommended authorization mechanism in Kubernetes.
    - Admission Controller is an interceptor to the Kubernetes API server requests prior to persistence of the object, but after the request is authenticated and authorized.
    - Security Context helps define privileges and access control settings for a Pod or Container that includes
    - Service Mesh like Istio and Linkerd can help implement MTLS for intra-cluster pod-to-pod communication.
    - Network Policies help specify how a pod is allowed to communicate with various network “entities” over the network.
    - Kubernetes auditing provides a security-relevant, chronological set of records documenting the sequence of actions in a cluster for activities generated by users, by applications that use the Kubernetes API, and by the control plane itself.
- Networking
  - Container Network Interface (CNI) is a standard that can be used to write or configure network plugins and makes it very easy to swap out different plugins in various container orchestration platforms.
  - Kubernetes networking addresses four concerns:
    - Containers within a Pod use networking to communicate via loopback.
    - Cluster networking provides communication between different Pods.
    - Service API helps expose an application running in Pods to be reachable from outside your cluster.
      - Ingress provides extra functionality specifically for exposing HTTP applications, websites, and APIs.
      - Gateway API is an add-on that provides an expressive, extensible, and role-oriented family of API kinds for modeling service networking.
    - Services can also be used to publish services only for consumption inside the cluster.
- Service Mesh
  - Service Mesh is a dedicated infrastructure layer added to the applications that allows you to transparently add capabilities without adding them to your own code.
  - Service Mesh provides capabilities like service discovery, load balancing, failure recovery, metrics, and monitoring and complex operational requirements, like A/B testing, canary deployments, rate limiting, access control, encryption, and end-to-end authentication.
  - Service mesh uses a proxy to intercept all your network traffic, allowing a broad set of application-aware features based on the configuration you set.
  - Istio is an open source service mesh that layers transparently onto existing distributed applications.
  - An Envoy proxy is deployed along with each service that you start in the cluster, or runs alongside services running on VMs.
  - Istio provides
    - Secure service-to-service communication in a cluster with TLS encryption, strong identity-based authentication and authorization
    - Automatic load balancing for HTTP, gRPC, WebSocket, and TCP traffic
    - Fine-grained control of traffic behavior with rich routing rules, retries, failovers, and fault injection
    - A pluggable policy layer and configuration API supporting access controls, rate limits and quotas
    - Automatic metrics, logs, and traces for all traffic within a cluster, including cluster ingress and egress
- Storage
  - Container images are read-only and consist of different layers that include everything added during the build phase ensuring that a container from an image provides the same behavior and functionality.
  - To allow writing files, a read-write layer is put on top of the container image when you start a container from an image.
  - Container on-disk files are ephemeral and lost if the container crashes.
  - Container Storage Interface (CSI) provides a uniform and standardized interface that allows attaching different storage systems no matter if it’s cloud or on-premises storage.
  - Kubernetes supports Persistent volumes that exist beyond the lifetime of a pod. When a pod ceases to exist, Kubernetes destroys ephemeral volumes; however, Kubernetes does not destroy persistent volumes.
  - Persistent Volumes is supported using API resources
    - PersistentVolume (PV)
      - is a piece of storage in the cluster that has been provisioned by an administrator or dynamically provisioned using Storage Classes.
      - is a cluster-level resource and not bound to a namespace
      - are volume plugins like Volumes, but have a lifecycle independent of any individual pod that uses the PV.
    - PersistentVolumeClaim (PVC)
      - is a request for storage by a user.
      - is similar to a Pod.
      - Pods consume node resources and PVCs consume PV resources.
      - Pods can request specific levels of resources (CPU and Memory).
      - Claims can request specific size and access modes (e.g., they can be mounted ReadWriteOnce, ReadOnlyMany, or ReadWriteMany, see AccessModes).
  - Persistent Volumes can be provisioned
    - - Statically – where the cluster administrator creates the PVs which is available for use by cluster users
      - Dynamically using StorageClasses where the cluster may try to dynamically provision a volume, especially for the PVC.

Cloud Native Architecture

Cloud Native Architecture Fundamentals
- Cloud native architecture guides us to optimize the software for scalability, high availability, cost efficiency, reliability, security, and faster time-to-market by using a combination of cultural, technological, and architectural design patterns.
- Cloud native architecture includes containers, service meshes, microservices, immutable infrastructure, and declarative APIs.
- Cloud native techniques enable loosely coupled systems that are resilient, manageable, and observable.
Microservices
- Microservices are small, independent applications with a clearly defined scope of functions and responsibilities.
- Microservices help break down an application into multiple decoupled applications, that communicate with each other in a network, which are more manageable.
- Microservices enable multiple teams to hold ownership of different functions of the application,
- Microservices also enable functions to be operated and scaled individually.
Autoscaling
- Autoscaling pattern provides the ability to dynamically adjust the resources based on the current demand without the need to over or under provision the resources.
- Autoscaling can be performed using
  - Horizontal scaling – Adds new compute resources which can be new copies of the application, Virtual Machines, or physical servers.
  - Vertical scaling – Adds more resources to the existing underlying hardware.
Serverless
- Serverless allows you to just focus on the code while the cloud provider takes care of the underlying resources required to execute the code.
- Most cloud providers provide this feature as Function as a Service (FaaS) like AWS Lambda, GCP Cloud Functions, etc.
- Serverless enables on-demand provisioning and scaling of the applications with a pay-as-you-use model.
- CloudEvents aims to standardize serverless and event-driven architectures on multiple platforms.
  - It provides a specification of how event data should be structured.
  - Events are the basis for scaling serverless workloads or triggering corresponding functions.
Community and Governance
- Open source projects hosted and supported by the CNCF are categorized according to maturity and go through a sandbox and incubation stage before graduating.
- CNCF Technical Oversight Committee – TOC
  - is responsible for defining and maintaining the technical vision, approving new projects, accepting feedback from the end-user committee, and defining common practices that should be implemented in CNCF projects.
  - does not control the projects, but encourages them to be self-governing and community owned and practices the principle of “minimal viable governance”.
- CNCF Project Maturity Levels
  - Sandbox Stage
    - Entry point for early stage projects.
  - Incubating Stage
    - Project meeting the sandbox stage requirements plus full technical due diligence performed, including documentation, a healthy number of committers, contributions, clear versioning scheme, documented security processes, and at least one public reference implementation
  - Graduation Stage
    - Project meeting the incubation stage criteria plus committers from at least two organizations, well-defined project governance, and committer process, maintained Core Infrastructure Initiative Best Practices Badge, third party security audit, public list of project adopters, received a supermajority vote from the TOC.
Personas
- SRE, Security, Cloud, DevOps, and Containers have opened up a lot of different Cloud Native roles
  - Cloud Engineer & Architect
  - DevOps Engineer
  - Security Engineer
  - DevSecOps Engineer
  - Data Engineer
  - Full-Stack Developer
  - Site Reliability Engineer (SRE)
- Site Reliability Engineer – SRE
  - Founded around 2003 by Google, SRE has become an important job for many organizations.
  - SRE’s goal is to create and maintain software that is reliable and scalable.
  - To measure performance and reliability, SREs use three main metrics:
    - Service Level Objectives – SLO: Specify a target level for the reliability of your service.
    - Service Level Indicators – SLI: A carefully defined quantitative measure of some aspect of the level of service that is provided
    - Service Level Agreements – SLA: An explicit or implicit contract with your users that includes consequences of meeting (or missing) the SLOs they contain.
  - Around these metrics, SREs might define an error budget. An error budget defines the amount (or time) of errors the application can have before actions are taken, like stopping deployments to production.
Open Standards
- Open Standards help provide a standardized way to build, package, run, and ship modern software.
- Open standards covers
  - Open Container Initiative (OCI) Spec: image, runtime, and distribution specification on how to run, build, and distribute containers
  - Container Network Interface (CNI): A specification on how to implement networking for Containers.
  - Container Runtime Interface (CRI): A specification on how to implement container runtimes in container orchestration systems.
  - Container Storage Interface (CSI): A specification on how to implement storage in container orchestration systems.
  - Service Mesh Interface (SMI): A specification on how to implement Service Meshes in container orchestration systems with a focus on Kubernetes.
- OCI provides open industry standards for container technologies and defines
  - Image-spec defines how to build and package container images.
  - Runtime-spec specifies the configuration, execution environment, and lifecycle of containers.
  - Distribution-Spec, which provides a standard for the distribution of content in general and container images in particular.

Cloud Native Observability

Telemetry & Observability
- Telemetry is the process of measuring and collecting data points and then transferring them to another system.
- Observability is the ability to understand the state of a system or application by examining its outputs, logs, and performance metrics.
- It’s a measure of how well the internal states of a system can be inferred from knowledge of its external outputs.
- Observability mainly consists of
  - Logs: Interactions between data and the external world with messages from the application.
  - Metrics: Quantitative measurements with numerical values describing service or component behavior over time
  - Traces: Records the progression of the request while passing through multiple distributed systems.
    - Trace consists of Spans, which can include information like start and finish time, name, tags, or a log message.
    - Traces can be stored and analyzed in a tracing system like Jaeger.
- OpenTelemetry
  - is a set of APIs, SDKs, and tools that can be used to integrate telemetry such as metrics, and protocols, but especially traces into applications and infrastructures.
  - OpenTelemetry clients can be used to export telemetry data in a standardized format to central platforms like Jaeger.
Prometheus
- Prometheus is a popular, open-source monitoring system.
- Prometheus can collect metrics that were emitted by applications and servers as time series data
- Prometheus data model provides four core metrics:
  - Counter: A value that increases, like a request or error count
  - Gauge: Values that increase or decrease, like memory size
  - Histogram: A sample of observations, like request duration or response size
  - Summary: Similar to a histogram, but also provides the total count of observations.
- Prometheus provides PromQL (Prometheus Query Language) to query data stored in the Time Series Database (TSDB).
- Prometheus integrates with Grafana, which can be used to build visualization and dashboards from the collected metrics.
- Prometheus integrates with Alertmanager to configure alerts when certain metrics reach or pass a threshold.
Cost Management
- All the Cloud providers work on the Pay-as-you-use model.
- Cost optimization can be performed by analyzing what is really needed, how long, and scaling dynamically as per the needs.
- Some of the cost optimization techniques include
  - Right sizing the workloads and dynamically scaling as per the demand
  - Identify wasted, unused resources and have proper archival techniques
  - Using Reserved or Spot instances as per the workloads
  - Defining proper budgets and alerts

Cloud Native Application Delivery

Application Delivery Fundamentals
- Application delivery includes the application lifecycle right from source code, versioning, building, testing, packaging, and deployments.
- The old process included a lot of error-prone manual steps and the constant fear that something would break.
- DevOps process includes both the developers and administrators and focuses on frequent, error-free, repeatable, rapid deployments.
- Version control systems like Git provide a decentralized system that can be used to track changes in the source code.
CI/CD
- Continuous Integration/Continuous Delivery (CI/CD) provides very fast, more frequent, and higher quality software rollouts with automated builds, tests, code quality checks, and deployments.
  - Continuous Integration focuses on building and testing the written code. High automation and usage of version control allow multiple developers and teams to work on the same code base.
  - Continuous Delivery focuses on automated deployment of the pre-built software.
- CI/CD tools include Jenkins, Spinnaker, Gitlab, ArgoCD, etc.
- CI/CD can be performed using two different approaches
  - Push-based
    - The pipeline is started and runs tools that make the changes in the platform. Changes can be triggered by a commit or merge request.
  - Pull-based
    - An agent watches the git repository for changes and compares the definition in the repository with the actual running state.
    - If changes are detected, the agent applies the changes to the infrastructure.
GitOps
- Infrastructure as a Code with tools like Terraform provides complete automation with versioning and better controls increasing the quality and speed of providing infrastructure.
- GitOps takes the idea of Git as the single source of truth a step further and integrates the provisioning and change process of infrastructure with version control operations.
GitOps frameworks that use the pull-based approach are Flux and ArgoCD.
- ArgoCD is implemented as a Kubernetes controller
- Flux is built with the GitOps Toolkit

KCNA General information and practices

The exam can be taken online from anywhere.
Make sure you have prepared your workspace well before the exams.
Make sure you have a valid government-issued ID card as it would be checked.
You are not allowed to have anything around you and no one should enter the room.
The exam proctor will be watching you always, so refrain from doing any other activities. Your screen is also always shared.

All the Best …

Certified Kubernetes Security Specialist CKS Learning Path

Certified Kubernetes Security Specialist Certificate

November 26, 2023 ~ Last updated on : February 20, 2026 ~ jayendrapatil ~ 3 Comments

Certified Kubernetes Security Specialist CKS Learning Path

With Certified Kubernetes Security Specialist CKS certification, I have recertified the triad of Kubernetes certification. After knowing how to use and administer Kubernetes, the last piece was to understand the security intricacies and CKS preparation does provide you a deep dive into it.

CKS is more of an open-book test, where you have access to the official Kubernetes documentation exam, but it focuses more on hands-on experience.
CKS focuses on securing container-based applications and Kubernetes platforms during build, deployment, and runtime.

Unlike AWS and GCP certifications, you would be required to solve, debug actual problems, and provision resources on a live Kubernetes cluster.
Even though it is an open book test, you need to know where the information is.
Trust me, if you are not prepared this time is not going to be sufficient.

CKS Exam Pattern

IMPORTANT: CKS Exam Updated October 15, 2024 – New topics added including Cilium, SBOM, Kubesec, KubeLinter, and updated security practices
Certification Validity Changed: From April 1, 2024, all Linux Foundation certifications are valid for 24 months (previously 36 months)
CKS exam curriculum includes these general domains and their weights on the exam:
- Cluster Setup – 10%
- Cluster Hardening – 15%
- System Hardening – 15%
- Minimize Microservice Vulnerabilities – 20%
- Supply Chain Security – 20%
- Monitoring, Logging and Runtime Security – 20%

CKS exam has been upgraded and requires you to solve 15-20 questions in 2 hours. I got 16 questions.
CKS was upgraded to use k8s 1.31 version (as of October 2024). Exam keeps being upgraded with new Kubernetes versions.
You are allowed to open another browser tab which can be from kubernetes.io or other product documentation like Falco, Cilium. Do not open any other windows.

Exam questions can be attempted in any order and don’t have to be sequential. So be sure to move ahead and come back later.

CKS Exam Preparation and Tips

I used the courses from KodeKloud CKS for practicing and it would be good enough to cover what is required for the exam.
Prepare yourself with the imperative commands as much as you can. This will help cut down the time required to solve half of the questions.

Each exam question carries weight so be sure you attempt the exams with higher weights before focusing on the lower ones. So target the ones with higher weights and quicker solutions like debugging ones.
CKS exam provides 6-8 different preconfigured K8s clusters. Each question refers to a different Kubernetes cluster, and the context needs to be switched. Be sure to execute the kubectl use context command, which is available with every question and you just need to copy-paste it.
Check for the namespace mentioned in the question, to find resources and create resources. Use the -n <namespace>

You would be performing most of the interaction from the client node. However, pay attention to the node (master or worker) you need to execute the exams and make sure you return back to the base node.
With CKS is important to move the master node for any changes to the cluster kube-apiserver .
SSH to nodes and gaining root access is allowed if needed.

Read carefully the Information provided within the questions with the i mark. They would provide very useful hints in addressing the question and save time. for e.g., namespaces to look into for a failed pod, what has already been created like configmap, secrets, network policies so that you do not create the same.
Make sure you know the imperative commands to create resources, as you won’t have much time to create and edit YAML files.
If you need to edit further use --dry-run=client -o yaml to get a headstart with the YAML spec file and edit the same.

I personally use alias kk=kubectl to avoid typing kubectl

CKS Resources

Go through the CKS Curriculum
Linux Foundation CKS Course and CKS Certification Bundle
KodeKloud – Mumshad Mannambeth Certified Kubernetes Security Specialist (CKS) with Practice Tests
- Excellent course which covers the right topics required for the CKS
- It also provides hands-on labs for each of the topics, giving you actual experience working on the Kubernetes cluster.
- Make sure to practice the labs, as long as you don’t need to refer to the hints and can do most of it without documentation.
Udemy Kubernetes CKS 2021 Complete Course – Theory – Practice

Cover Kubernetes Security Overview
Practice CKS Exercises
Cover Kubernetes tutorials which provide a good hands-on guide
Cover kubectl cheatsheet for commands
Cover Tasks from Kubernetes documentation

CKS Key Topics

Cluster Setup – 10%

Practice CKS Exercises – Cluster Setup
Securing a Cluster covers a lot of these features
Use Network security policies to restrict cluster level access
- Understand Network Policies
- Use Network security policies to restrict cluster level access
- Exam tip: Know how to create Network Policies using proper selectors
Use CIS benchmark to review the security configuration of Kubernetes components (etcd, kubelet, kubedns, kubeapi)
- Center of Internet Security – CIS defines security best practices for Kubernetes and can help evaluate and recommendation for the fixes.
- Aqua Security kube-bench is a free tool that can help evaluate the k8s cluster for CIS rules.
- Exam tip: Know how to read the CIS report, identify failures, map it to the recommendation, and fix the same.

Properly set up Ingress objects with security control
- Ingress endpoint can be configured with TLS endpoint
- Exam tip: Know how to create a TLS secret and associate the same with the Ingress
Protect node metadata and endpoints
- Authentication using Certificates and Service Accounts
- Authorization using Node and RBAC
- Exam tip: Know how to create Service Accounts, Roles, and Cluster Roles and associate them together using Role Binding and Cluster Role Binding.
- Exam tip: Know to create Service Accounts with automount disabled using the automountServiceAccountToken flag.
Minimize use of, and access to, GUI elements
- Kubernetes Dashboard is a GUI component that needs to be secured.
Verify platform binaries before deploying
- Exam tip: Know how to verify platform binaries digest using sha

Cluster Hardening – 15%

Practice CKS Exercises – Cluster Harding
Restrict access to Kubernetes API
- Control anonymous requests to Kube-apiserver
Use Role-Based Access Controls to minimize exposure
- Exam tip: Know how to create Service Accounts, Roles, and Cluster Roles and associate them together using Role Binding and Cluster Role Binding.

Exercise caution in using service accounts e.g. disable defaults, minimize permissions on newly created ones.
- Exam tip: Know how to create Service Accounts, Roles, and Cluster Roles and associate them together using Role Binding and Cluster Role Binding.
- Exam tip: Know automountServiceAccountToken can be used to prevent the service account from being auto-mounted.

Update Kubernetes frequently
- Kubernetes supports N to N-2 versions and it is recommended to upgrade the components
- Exam tip: Know how to upgrade a Kubernetes cluster (although it did not appear on my exam)

System Hardening – 15%

Practice CKS Exercises – System Harding
Minimize host OS footprint (reduce attack surface)
- Control access using SSH, disable root and password-based logins
- Remove unwanted packages and ports
Minimize IAM roles
- IAM roles are usually with Cloud providers and relate to the least privilege access principle.

Minimize external access to the network
- External access can be controlled using Network Policies through egress policies.
Appropriately use kernel hardening tools such as AppArmor, seccomp
- Runtime classes provided by gvisor and kata containers can help provide further isolation of the containers
- Secure Computing – Seccomp tool helps control syscalls made by containers
- AppArmor can be configured for any application to reduce its potential host attack surface and provide a greater in-depth defense.
- PodSecurityPolicies – PSP enables fine-grained authorization of pod creation and updates.
  - DEPRECATED: PSP was deprecated in Kubernetes 1.21 and removed in 1.25. Replaced by Pod Security Standards (PSS) and Pod Security Admission (PSA)
  - Pod Security Standards (PSS) define three policy levels: Privileged, Baseline, and Restricted
  - Pod Security Admission (PSA) is the built-in replacement that enforces PSS policies
  - Alternative: Use Policy-as-Code solutions like Kyverno, OPA Gatekeeper, or Kubewarden
  - Apply host updates
  - Install minimal required OS fingerprint
  - Identify and address open ports
  - Remove unnecessary packages
  - Protect access to data with permissions
  - Restrict allowed hostpaths
- Exam tip: Know how to load AppArmor profiles, and enable them for the pods. AppArmor is in beta and needs to be enabled using container.apparmor.security.beta.kubernetes.io/<container_name>: <profile_ref>

Minimize Microservice Vulnerabilities – 20%

Practice CKS Exercises – Minimize Microservice Vulnerabilities
Setup appropriate OS-level security domains e.g. using PSP, OPA, security contexts.
- Pod Security Contexts help define security for pods and containers at the pod or at the container level. Capabilities can be added at the container level only.
- Pod Security Policies enable fine-grained authorization of pod creation and updates and is implemented as an optional admission controller.
- Open Policy Agent helps enforce custom policies on Kubernetes objects without recompiling or reconfiguring the Kubernetes API server.
- Admission controllers
  - can be used for validating configurations as well as mutating the configurations.
  - Mutating controllers are triggered before validating controllers.
  - Allows extension by adding custom controllers using MutatingAdmissionWebhook and ValidatingAdmissionWebhook.
- Exam tip: Know how to configure Pod Security Context, Pod Security Policies
Manage Kubernetes secrets
- Exam Tip: Know how to read secret values, create secrets and mount the same on the pods.
Use container runtime sandboxes in multi-tenant environments (e.g. gvisor, kata containers)
- Exam tip: Know how to create a Runtime and associate it with a pod using runtimeClassName

Implement pod to pod encryption by use of mTLS
- Practice manage TLS certificates in a Cluster
- Service Mesh Istio can be used to establish MTLS for Intra pod communication.
- Istio automatically configures workload sidecars to use mutual TLS when calling other workloads. By default, Istio configures the destination workloads using PERMISSIVE mode. When PERMISSIVE mode is enabled, a service can accept both plain text and mutual TLS traffic. In order to only allow mutual TLS traffic, the configuration needs to be changed to STRICT mode.
- Exam tip: No questions related to mTLS appeared in the exam

Supply Chain Security – 20%

Practice CKS Exercises – Supply Chain Security
Minimize base image footprint
- Remove unnecessary tools. Remove shells, package manager & vi tools.
- Use slim/minimal images with required packages only. Do not include unnecessary software like build tools and utilities, troubleshooting, and debug binaries.
- Build the smallest image possible – To reduce the size of the image, install only what is strictly needed
- Use distroless, Alpine, or relevant base images for the app.
- Use official images from verified sources only.

Secure your supply chain: whitelist allowed registries, sign and validate images
- Work with images securely using a private repository
- Consider before using public images as you cannot control what’s inside them
- Configure the Kubernetes cluster to pull the images from a private registry instead of an external registry.
- Using ImagePolicyWebhook admission Controller to whitelist allowed image registries to sign and validate images.
- Task @ Pulling Image from Private Registry
Use static analysis of user workloads (e.g.Kubernetes resources, Docker files)
- Tools like Kubesec can be used to perform a static security risk analysis of the configurations files.

Scan images for known vulnerabilities
- Aqua Security Trivy & Anchore can be used for scanning vulnerabilities in the container images.
- Exam Tip: Know how to use the Trivy tool to scan images for vulnerabilities. Also, remember to use the --severity for e.g. --severity=CRITICAL flag for filtering a specific category.

Monitoring, Logging and Runtime Security – 20%

Practice CKS Exercises – Monitoring, Logging, and Runtime Security
Perform behavioral analytics of syscall process and file activities at the host and container level to detect malicious activities
Detect threats within a physical infrastructure, apps, networks, data, users, and workloads
Detect all phases of attack regardless of where it occurs and how it spreads

Perform deep analytical investigation and identification of bad actors within the environment
- Tools like strace and Aqua Security Tracee can be used to check the syscalls. However, with a number of processes, it would be tough to track and monitor all and they do not provide alerting.
- Tools like Falco & Sysdig provide deep, process-level visibility into dynamic, distributed production environments and can be used to define rules to track, monitor, and alert on activities when a certain rule is violated.
- Exam Tip: Know how to use Falco, define new rules, enable logging. Make use of the falco_rules.local.yaml file for overrides. (I did not get questions for Falco in my exam).
Ensure immutability of containers at runtime
- Immutability prevents any changes from being made to the container or to the underlying host through the container.
- It is recommended to create new images and perform a rolling deployment instead of modifying the existing running containers.
- Launch the container in read-only mode using the --read-only flag from the docker run or by using the readOnlyRootFilesystem option in Kubernetes.
- PodSecurityContext and PodSecurityPolicy can be used to define and enforce container immutability
  - ReadOnlyRootFilesystem – Requires that containers must run with a read-only root filesystem (i.e. no writable layer).
  - Privileged – determines if any container in a pod can enable privileged mode. This allows the container nearly all the same access as processes running on the host.
- Task @ Configure Pod Container Security Context
- Exam Tip: Know how to define a PodSecurityPolicy to enforce rules. Remember, Cluster Roles and Role Binding needs to be configured to provide access to the PSP to make it work.

Use Audit Logs to monitor access
- Kubernetes auditing is handled by the kube-apiserver which requires defining an audit policy file.
- Auditing captures the stages as RequestReceived -> (Authn and Authz) -> ResponseStarted (-w) -> ResponseComplete (for success) OR Panic (for failures)
- Exam Tip: Know how to configure audit policies and enable audit on the kube-apiserver. Make sure the kube-apiserver is up and running.
- Task @ Kubernetes Auditing

CKS Articles

Securing a Cluster
11 ways not to get hacked
GKE Best Practices for Building Containers
Security Best Practices (A bit older but still parts are relevant)

CKS General information and practices

The exam can be taken online from anywhere.
Make sure you have prepared your workspace well before the exams.

Make sure you have a valid government-issued ID card as it would be checked.
You are not allowed to have anything around you and no one should enter the room.
The exam proctor will be watching you always, so refrain from doing any other activities. Your screen is also always shared.

Copy + Paste works fine.
You will have an online notepad on the right corner to note down. I hardly used it, but it can be useful to type and modify text instead of using VI editor.

All the Best …

2025 Black Friday & Cyber Monday Deals

November 21, 2023 ~ Last updated on : November 22, 2025 ~ jayendrapatil

Udemy – Black Friday Sale (Upto 85% Off)- till 28th Nov

Braincert – till 27th Nov

Use Coupon Code – BLACK_FRIDAY

AWS Certifications

KodeKloud – Black Friday Sale – till 30th Nov

Coursera – till 28th Nov

Whizlabs – Black Friday Sale – till 27th Nov

Kubernetes Security

November 20, 2023 ~ Last updated on : December 6, 2023 ~ jayendrapatil ~ 2 Comments

Kubernetes Security

Security in general is not something that can be achieved only at the container layer. It’s a continuous process that needs to be adapted on all layers and all the time.

4C’s of Cloud Native security are Cloud, Clusters, Containers, and Code.
Containers are started on a machine and they always share the same kernel, which then becomes a risk for the whole system, if containers are allowed to call kernel functions like for example killing other processes or modifying the host network by creating routing rules.

Authentication

Users

Kubernetes does not support the creation of users
Users can be passed as --basic-auth-file or --token-auth-file to the kube-apiserver using a static user + password (deprecated) or static user + token file.
This approach is deprecated.

X509 Client Certificates

Kubernetes requires PKI certificates for authentication over TLS.
Kubernetes requires PKI for the following operations:
- Client certificates for the kubelet to authenticate to the API server
- Server certificate for the API server endpoint
- Client certificates for administrators of the cluster to authenticate to the API server
- Client certificates for the API server to talk to the kubelet
- Client certificate for the API server to talk to etcd
- Client certificate/kubeconfig for the controller manager to talk to the API server
- Client certificate/kubeconfig for the scheduler to talk to the API server.
  Client and server certificates for the front-proxy
Client certificates can be signed in two ways so that they can be used to authenticate with the Kubernetes API.
1. Internally signing the certificate using the Kubernetes API.
  1. It involves the creation of a certificate signing request (CSR) by a client.
  2. Administrators can approve or deny the CSR.
  3. Once approved, the administrator can extract and provide a signed certificate to the requesting client or user.
  4. This method cannot be scaled for large organizations as it requires manual intervention.
2. Use enterprise PKI, which can sign the client-submitted CSR.
  1. The signing authority can send signed certificates back to clients.
  2. This approach requires the private key to be managed by an external solution.

Refer Authentication Exercises

Service Accounts

Kubernetes service accounts can be used to provide bearer tokens to authenticate with Kubernetes API.
Bearer tokens can be verified using a webhook, which involves API configuration with option --authentication-token-webhook-config-file, which includes the details of the remote webhook service.

Kubernetes internally uses Bootstrap and Node authentication tokens to initialize the cluster.
Each namespace has a default service account created.
Each service account creates a secret object which stores the bearer token.
Existing service account for a pod cannot be modified, the pod needs to be recreated.

The service account can be associated with the pod using the serviceAccountName field in the pod specification and the service account secret is auto-mounted on the pod.
automountServiceAccountToken flag can be used to prevent the service account from being auto-mounted.

Practice Service Account Exercises

Authorization

Node

Node authorization is used by Kubernetes internally and enables read, write, and auth-related operations by kubelet.

In order to successfully make a request, kubelet must use a credential that identifies it as being in the system:nodes group.
Node authorization can be enabled using the --authorization-mode=Node option in Kubernetes API Server configurations.

ABAC

Kubernetes defines attribute-based access control (ABAC) as “an access control paradigm whereby access rights are granted to users through the use of policies which combine attributes together.”

ABAC can be enabled by providing a .json file to --authorization-policy-file and --authorization-mode=ABAC options in Kubernetes API Server configurations.
The .json file needs to be present before Kubernetes API can be invoked.
Any changes in the ABAC policy file require a Kube API Server restart and hence the ABAC approach is not preferred.

AlwaysDeny/AlwaysAllow

AlwaysDeny or AlwaysAllow authorization mode is usually used in development environments where all requests to the Kubernetes API need to be allowed or denied.
AlwaysDeny or AlwaysAllow mode can be enabled using the option --authorization-mode=AlwaysDeny/AlwaysAllow while configuring Kubernetes API.
This mode is considered insecure and hence is not recommended in production environments.

RBAC

Role-based access control is the most secure and recommended authorization mechanism in Kubernetes.
It is an approach to restrict system access based on the roles of
users within the cluster.
It allows organizations to enforce the principle of least privileges.

Kubernetes RBAC follows a declarative nature with clear permissions (operations), API objects (resources), and subjects (users, groups, or service accounts) declared in authorization requests.
RBAC authorization can be enabled using the --authorization-mode=RBAC option in Kubernetes API Server configurations.
RBAC can be configured using
- Role or ClusterRole – is made up of verbs, resources, and subjects, which provide a capability (verb) on a resource
- RoleBinding or ClusterRoleBinding – helps assign privileges to the user, group, or service account.
Role vs ClusterRole AND RoleBinding vs ClusterRoleBinding
- ClusterRole is a global object whereas Role is a namespace object.
- Roles and RoleBindings are the only namespaced resources.
- ClusterRoleBindings (global resource) cannot be used with Roles, which is a namespaced resource.
- RoleBindings (namespaced resource) cannot be used with ClusterRoles, which are global resources.
- Only ClusterRoles can be aggregated.

RBAC Role Binding

Practice RBAC Exercises

Admission Controllers

Admission Controller is an interceptor to the Kubernetes API server requests prior to persistence of the object, but after the request is authenticated and authorized.
Admission controllers limit requests to create, delete, modify or connect to (proxy). They do not support read requests.
Admission controllers may be “validating”, “mutating”, or both.
Mutating controllers may modify the objects they admit; validating controllers may not.

Mutating controllers are executed before the validating controllers.
If any of the controllers in either phase reject the request, the entire request is rejected immediately and an error is returned to the end-user.
Admission Controllers provide fine-grained control over what can be performed on the cluster, that cannot be handled using Authentication or Authorization.

Kubernetes Admission Controllers

Admission controllers can only be enabled and configured by the cluster administrator using the --enable-admission-plugins and --admission-control-config-file flags.

Few of the admission controllers are as below
- PodSecurityPolicy acts on the creation and modification of the pod and determines if it should be admitted based on the requested security context and the available Pod Security Policies.
- ImagePolicyWebhook to decide if an image should be admitted.
- MutatingAdmissionWebhook to modify a request.
- ValidatingAdmissionWebhook to decide whether the request should be allowed to run at all.

Practice Admission Controller Exercises

Pod Security Policies

Pod Security Policies enable fine-grained authorization of pod creation and updates and is implemented as an optional admission controller.
A Pod Security Policy is a cluster-level resource that controls security-sensitive aspects of the pod specification.
PodSecurityPolicy is disabled, by default. Once enabled using --enable-admission-plugins, it applies itself to all the pod creation requests.

PodSecurityPolicies enforced without authorizing any policies will prevent any pods from being created in the cluster. The requesting user or target pod’s service account must be authorized to use the policy, by allowing the use verb on the policy.
PodSecurityPolicy acts both as validating and mutating admission controller. PodSecurityPolicy objects define a set of conditions that a pod must run with in order to be accepted into the system, as well as defaults for the related fields.

Practice Pod Security Policies Exercises

Pod Security Context

Security Context helps define privileges and access control settings for a Pod or Container that includes
- Discretionary Access Control: Permission to access an object, like a file, is based on user ID (UID) and group ID (GID)
- Security-Enhanced Linux (SELinux): Objects are assigned security labels.
- Running as privileged or unprivileged.
- Linux Capabilities: Give a process some privileges, but not all the privileges of the root user.
- AppArmor: Use program profiles to restrict the capabilities of individual programs.
- Seccomp: Filter a process’s system calls.
- AllowPrivilegeEscalation: Controls whether a process can gain more privileges than its parent process. AllowPrivilegeEscalation is true always when the container is: 1) run as Privileged OR 2) has CAP_SYS_ADMIN.
- readOnlyRootFilesystem: Mounts the container’s root filesystem as read-only.
PodSecurityContext holds pod-level security attributes and common container settings.
Fields present in container.securityContext over the field values of PodSecurityContext.

# Pod Security Context example
apiVersion: v1
kind: Pod
metadata:
  name: security-context-demo
spec:
  securityContext: # Pod level Security Context, can also be defined at container level
    runAsUser: 1000 # run as uid
    runAsGroup: 3000 # run as gid
    fsGroup: 2000
    runAsNonRoot: true # Prevents running a container with 'root' user as part of the pod
    readOnlyRootFilesystem: Controls whether a container will be able to write into the root filesystem.
    seccompProfile: # secure computing i.e. seccomp profile
      type: RuntimeDefault
    seLinuxOptions: # se linux options
      level: "s0:c123,c456"
  containers:
  - name: sec-ctx-demo
    image: gcr.io/google-samples/node-hello:1.0
    securityContext: # Container level security context overrides Pod level settings
      runAsUser: 2000
      allowPrivilegeEscalation: false # allow running as privileged user
      capabilities: # controls the Linux capabilities assigned to the container
        add: ["NET_ADMIN", "SYS_TIME"]

# Pod Security Context example

apiVersion: v1

kind: Pod

metadata:

spec:

securityContext: # Pod level Security Context, can also be defined at container level

runAsUser: 1000 # run as uid

runAsGroup: 3000 # run as gid

fsGroup: 2000

runAsNonRoot: true # Prevents running a container with 'root' user as part of the pod

readOnlyRootFilesystem: Controls whether a container will be able to write into the root filesystem.

seccompProfile: # secure computing i.e. seccomp profile

type: RuntimeDefault

seLinuxOptions: # se linux options

level: "s0:c123,c456"

containers:

- name: sec-ctx-demo

image: gcr.io/google-samples/node-hello:1.0

securityContext: # Container level security context overrides Pod level settings

runAsUser: 2000

allowPrivilegeEscalation: false # allow running as privileged user

capabilities: # controls the Linux capabilities assigned to the container

add: ["NET_ADMIN", "SYS_TIME"]

Practice Pod Security Context Exercises

MTLS or Two Way Authentication

Service Mesh like Istio and Linkerd can help implement MTLS for intra-cluster pod-to-pod communication.

Istio deploys a side-car container that handles the encryption and decryption transparently.
Istio supports both permissive and strict modes

Network Policies

By default, pods are non-isolated; they accept traffic from any source.
NetworkPolicies help specify how a pod is allowed to communicate with various network “entities” over the network.

NetworkPolicies can be used to control traffic to/from Pods, Namespaces or specific IP addresses
Pod- or namespace-based NetworkPolicy uses a selector to specify what traffic is allowed to and from the Pod(s) that match the selector.

# Kubernetes Network Policy
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy # defines the Network Policy
metadata:
  name: test-network-policy
  namespace: default
spec:
  podSelector: # selects the pod - leave it empty {} to apply to all the pods 
    matchLabels: # match the pods based on the labels
      role: db
  policyTypes:
  - Ingress # Enables Ingress rules
  - Egress #Enables Egress rules
  ingress: # Ingress rules incoming to the target
  - from:
    - ipBlock: # access limited through IPs
        cidr: 172.17.0.0/16
        except:
        - 172.17.1.0/24
    - namespaceSelector: # access limited through Namespace labels
        matchLabels:
          project: myproject
    - podSelector: # access limited through pods with matching labels
        matchLabels:
          role: frontend
    ports: # ingress rules for the ports - if not specified its opens for all ports
    - protocol: TCP
      port: 6379
  egress: # egress rules outgoing from the target
  - to:
    - ipBlock: # access limited through IPs
        cidr: 10.0.0.0/24
    ports: # ingress rules for the ports - if not specified its opens for all ports
    - protocol: TCP
      port: 5978

# Kubernetes Network Policy

apiVersion: networking.k8s.io/v1

kind: NetworkPolicy # defines the Network Policy

metadata:

namespace: default

spec:

podSelector: # selects the pod - leave it empty {} to apply to all the pods

matchLabels: # match the pods based on the labels

role: db

policyTypes:

- Ingress # Enables Ingress rules

- Egress #Enables Egress rules

ingress: # Ingress rules incoming to the target

- from:

- ipBlock: # access limited through IPs

cidr: 172.17.0.0/16

except:

- 172.17.1.0/24

- namespaceSelector: # access limited through Namespace labels

matchLabels:

project: myproject

- podSelector: # access limited through pods with matching labels

matchLabels:

role: frontend

ports: # ingress rules for the ports - if not specified its opens for all ports

- protocol: TCP

port: 6379

egress: # egress rules outgoing from the target

- to:

- ipBlock: # access limited through IPs

cidr: 10.0.0.0/24

ports: # ingress rules for the ports - if not specified its opens for all ports

- protocol: TCP

port: 5978

Practice Network Policies Exercises

Kubernetes Auditing

Kubernetes auditing provides a security-relevant, chronological set of records documenting the sequence of actions in a cluster for activities generated by users, by applications that use the Kubernetes API, and by the control plane itself.

Audit records begin their lifecycle inside the kube-apiserver component.
Each request on each stage of its execution generates an audit event, which is then pre-processed according to a certain policy and written to a backend.
Audit policy determines what’s recorded and the backends persist the records.
Backend implementations include logs files and webhooks.
Each request can be recorded with an associated stage as below
- RequestReceived – generated as soon as the audit handler receives the request, and before it is delegated down the handler chain.
- ResponseStarted – generated once the response headers are sent, but before the response body is sent. This stage is only generated for long-running requests (e.g. watch).
- ResponseComplete – generated once the response body has been completed and no more bytes will be sent.
- Panic – generated when a panic or a failure occurs.

Kubernetes Audit Policy

# kubernetes audit policy
apiVersion: audit.k8s.io/v1 # This is required.
kind: Policy # Policy object
omitStages: # audit events to be omitted or ignored
  - "RequestReceived" # Options RequestReceived, ResponseStarted, ResponseComplete, Panic
rules:
  - level: RequestResponse # Log pod level changes, Options RequestResponse, Request, Metadata, None
    namespace: ["prod"] # limit to namespace - optional
    resources: # resources array which is consistent with the RBAC policy.
    - group: ""
      resources: ["pods"]

# kubernetes audit policy

apiVersion: audit.k8s.io/v1 # This is required.

kind: Policy # Policy object

omitStages: # audit events to be omitted or ignored

- "RequestReceived" # Options RequestReceived, ResponseStarted, ResponseComplete, Panic

rules:

- level: RequestResponse # Log pod level changes, Options RequestResponse, Request, Metadata, None

namespace: ["prod"] # limit to namespace - optional

resources: # resources array which is consistent with the RBAC policy.

- group: ""

resources: ["pods"]

Kubernetes kube-apiserver.yaml file with audit configuration

# kubernetes audit configuration
--audit-policy-file=/etc/kubernetes/audit-policy.yaml # audit policy file
--audit-log-path=/var/log/audit.log # specifies the log file path that log backend uses to write audit events.
--audit-log-maxage=1 # defined the maximum number of days to retain old audit log files
--audit-log-maxbackup=1 #defines the maximum number of audit log files to retain
--audit-log-maxsize=1 # defines the maximum size in megabytes of the audit log file before it gets rotated

# kubernetes audit configuration

--audit-policy-file=/etc/kubernetes/audit-policy.yaml # audit policy file

--audit-log-path=/var/log/audit.log # specifies the log file path that log backend uses to write audit events.

--audit-log-maxage=1 # defined the maximum number of days to retain old audit log files

--audit-log-maxbackup=1 #defines the maximum number of audit log files to retain

--audit-log-maxsize=1 # defines the maximum size in megabytes of the audit log file before it gets rotated

Practice Kubernetes Auditing Exercises

Seccomp – Secure Computing

Seccomp stands for secure computing mode and has been a feature of the Linux kernel since version 2.6.12.
Seccomp can be used to sandbox the privileges of a process, restricting the calls it is able to make from user space into the kernel.

Kubernetes lets you automatically apply seccomp profiles loaded onto a Node to the Pods and containers.

Seccomp profile

# fine grained Seccomp profile
{
    "defaultAction": "SCMP_ACT_ERRNO", # default deny
    "architectures": [
        "SCMP_ARCH_X86_64",
        "SCMP_ARCH_X86",
        "SCMP_ARCH_X32"
    ],
    "syscalls": [
        {
            "names": [
                "accept4",
                "epoll_wait",
                "pselect6",
                ....
            ],
            "action": "SCMP_ACT_ALLOW" # explicitly whitelist calls
        }
    ]
}

# fine grained Seccomp profile

{

"defaultAction": "SCMP_ACT_ERRNO", # default deny

"architectures": [

"SCMP_ARCH_X86_64",

"SCMP_ARCH_X86",

"SCMP_ARCH_X32"

"syscalls": [

{

"names": [

"accept4",

"epoll_wait",

"pselect6",

....

"action": "SCMP_ACT_ALLOW" # explicitly whitelist calls

}

]

}

Seccomp profile attached to the pod

# Seccomp profile attached to pod
apiVersion: v1
kind: Pod
metadata:
  name: audit-pod
  labels:
    app: audit-pod
spec:
  securityContext:
    seccompProfile:
      type: Localhost
      localhostProfile: profiles/audit.json
  containers:
  - name: nginx
    image: nginx

# Seccomp profile attached to pod

apiVersion: v1

kind: Pod

metadata:

labels:

app: audit-pod

spec:

securityContext:

seccompProfile:

type: Localhost

localhostProfile: profiles/audit.json

containers:

- name: nginx

image: nginx

Practice Seccomp Exercises

AppArmor

AppArmor is a Linux kernel security module that supplements the standard Linux user and group-based permissions to confine programs to a limited set of resources.
AppArmor can be configured for any application to reduce its potential attack surface and provide a greater in-depth defense.

AppArmor is configured through profiles tuned to allow the access needed by a specific program or container, such as Linux capabilities, network access, file permissions, etc.
Each profile can be run in either enforcing mode, which blocks access to disallowed resources or complain mode, which only reports violations.
AppArmor helps to run a more secure deployment by restricting what containers are allowed to do, and/or providing better auditing through system logs.

Use aa-status to check AppArmor status and profiles are loaded
Use apparmor_parser -q <<profile file>> to load profiles
AppArmor is in beta and needs annotations to enable it using container.apparmor.security.beta.kubernetes.io/<container_name>: <profile_ref>

AppArmor profile

# sample AppArmor profile
profile k8s-apparmor-example-deny-write flags=(attach_disconnected) {
  file, # all access to files
  deny /** w, # Deny all file writes.
}

# sample AppArmor profile

profile k8s-apparmor-example-deny-write flags=(attach_disconnected) {

file, # all access to files

deny /** w, # Deny all file writes.

}

AppArmor usage

# AppArmor usage
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  annotations: # define apparmor security 
    container.apparmor.security.beta.kubernetes.io/nginx: localhost/k8s-apparmor-example-deny-write
spec:
  containers:
  - name: nginx
    image: nginx

# AppArmor usage

apiVersion: v1

kind: Pod

metadata:

annotations: # define apparmor security

container.apparmor.security.beta.kubernetes.io/nginx: localhost/k8s-apparmor-example-deny-write

spec:

containers:

- name: nginx

image: nginx

Practice App Armor Exercises

Kubesec

Kubesec can be used to perform a static security risk analysis of the configurations files.

Sample configuration file

# pod with privileged container
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - image: nginx
    name: nginx
    securityContext:
      privileged: true # security issue
      readOnlyRootFilesystem: false # security issue

# pod with privileged container

apiVersion: v1

kind: Pod

metadata:

spec:

containers:

- image: nginx

securityContext:

privileged: true # security issue

readOnlyRootFilesystem: false # security issue

Kubesec Report

# Kubesec Report
[
  {
    "object": "Pod/nginx.default",
    "valid": true,
    "fileName": "kubesec-test.yaml",
    "message": "Failed with a score of -30 points",
    "score": -30,
    "scoring": {
      "critical": [
        {
          "id": "Privileged",
          "selector": "containers[] .securityContext .privileged == true",
          "reason": "Privileged containers can allow almost completely unrestricted host access",
          "points": -30
        }
      ],
      "advise": [
        ...
        {
          "id": "ReadOnlyRootFilesystem",
          "selector": "containers[] .securityContext .readOnlyRootFilesystem == true",
          "reason": "An immutable root filesystem can prevent malicious binaries being added to PATH and increase attack cost",
          "points": 1
        },
        ...
      ]
    }
  }
]

# Kubesec Report

[

{

"object": "Pod/nginx.default",

"valid": true,

"fileName": "kubesec-test.yaml",

"message": "Failed with a score of -30 points",

"score": -30,

"scoring": {

"critical": [

{

"id": "Privileged",

"selector": "containers[] .securityContext .privileged == true",

"reason": "Privileged containers can allow almost completely unrestricted host access",

"points": -30

}

"advise": [

...

{

"id": "ReadOnlyRootFilesystem",

"selector": "containers[] .securityContext .readOnlyRootFilesystem == true",

"reason": "An immutable root filesystem can prevent malicious binaries being added to PATH and increase attack cost",

"points": 1

...

]

}

]

Practice Kubesec Exercises

Trivy (or Clair or Anchore)

Trivy is a simple and comprehensive scanner for vulnerabilities in container images, file systems, and Git repositories, as well as for configuration issues.
Trivy detects vulnerabilities of OS packages (Alpine, RHEL, CentOS, etc.) and language-specific packages (Bundler, Composer, npm, yarn, etc.).
Trivy scans Infrastructure as Code (IaC) files such as Terraform, Dockerfile, and Kubernetes, to detect potential configuration issues that expose your deployments to the risk of attack.

Use trivy image <<image_name>> to scan images
Use --severity flag to filter the vulnerabilities as per the category.

Practice Trivy Exercises

Falco

Falco Architecture

Falco can be installed as a package on the nodes OR as Daemonsets on the Kubernetes cluster
Falco is driven through configuration (defaults to /etc/falco/falco.yaml ) files which includes
1. Rules
  1. Name and description
  2. Condition to trigger the rule
  3. Priority emergency, alert, critical, error, warning, notice, info, debug
  4. Output data for the event
  5. Multiple rule files can be specified, with the last one taking the priority in case of the same rule defined in multiple files
2. Log attributes for Falco i.e. level, format
3. Output file and format i.e JSON or text
4. Alerts output destination which includes stdout, file, HTTP, etc.

Practice Falco Exercises

Reduce Attack Surface

Follow the principle of least privilege and limit access
Limit Node access,
- keep nodes private
- disable login using the root account PermitRootLogin No and use privilege escalation using sudo .
- disable password-based authentication PasswordAuthentication No and use SSH keys.
Remove any unwanted packages
Block or close unwanted ports
Keep the base image light and limited to the bare minimum required
Identify and fix any open ports

Certified Kubernetes Administrator CKA Learning Path

November 12, 2023 ~ Last updated on : February 20, 2026 ~ jayendrapatil ~ 9 Comments

Certified Kubernetes Administrator CKA Learning Path

Recertified Certified Kubernetes Administrator CKA certification recently with 91%. After knowing how to use Kubernetes, it was really interesting and intriguing to know Kubernetes internals and how the overall system works.

CKA is more of an open-book test, where you have access to the official Kubernetes documentation exam, but it focuses more on hands-on experience.
CKA focuses on “The skills required to be a successful Kubernetes Administrator “. It tests the candidate’s ability to do basic installation as well as configuring and managing production-grade Kubernetes clusters.

Unlike AWS and GCP certifications, you would be required to solve, debug actual problems, and provision resources on a live Kubernetes cluster.
Even though it is an open book test, you need to know where the information is.
Trust me, if you are not prepared this time is not going to be sufficient.

CKA Exam Pattern

IMPORTANT: CKA Exam Updated February 10, 2025 – New topics: Gateway API, Helm, Kustomize, CRDs & Operators
Certification Validity Changed: From April 1, 2024, all Linux Foundation certifications valid for 24 months (previously 36 months)
CKA exam curriculum includes these general domains and their weights on the exam:
- Cluster Architecture, Installation & Configuration – 25%
- Workloads & Scheduling – 15%
- Services & Networking – 20%
- Storage – 10%
- Troubleshooting – 30%

~~CKA requires you to solve 24 questions in 3 hours.~~
CKA exam has been upgraded and requires you to solve 15-20 questions in 2 hours. I got 17 questions.
CKA was already upgraded to use the k8s 1.31 (as of November 2024) version. But it keeps on being upgraded with new Kubernetes versions.

You are allowed to open another browser tab which can be from kubernetes.io or other product documentation like Falco. Do not open any other windows.
Exam questions can be attempted in any order and don’t have to be sequential. So be sure to move ahead and come back later.

CKA Exam Preparation and Tips

I used the courses from KodeKloud CKA for practicing and it would be good enough to cover what is required for the exam.

Prepare yourself with the imperative commands as much as you can. This will help cut down the time required to solve half of the questions. I was not stretched for time for CKA and had much time to review.
Each exam question carries weight so be sure you attempt the exams with higher weights before focusing on the lower ones. So target the ones with higher weights and quicker solutions like debugging ones.
CKA exam provides 6-8 different preconfigured K8s clusters. Each question refers to a different Kubernetes cluster, and the context needs to be switched. Be sure to execute the kubectl use context command, which is available with every question and you just need to copy-paste it.

Check for the namespace mentioned in the question, to find resources and create resources. Use the -n <namespace>
You would be performing most of the interaction from the client node. However, pay attention to the node (master or worker) you need to execute the exams and make sure you return back to the base node.
With CKA is important to move the master node for any changes to the cluster kube-apiserver .

SSH to nodes and gaining root access is allowed if needed.
Read carefully the Information provided within the questions with the i mark. They would provide very useful hints in addressing the question and save time. for e.g., namespaces to look into for a failed pod, what has already been created like configmap, secrets, network policies so that you do not create the same.
Make sure you know the imperative commands to create resources, as you won’t have much time to create and edit YAML files.

If you need to edit further use --dry-run=client -o yaml to get a headstart with the YAML spec file and edit the same.
I personally use alias kk=kubectl to avoid typing kubectl

CKA Learning Path

Go through the CKA Curriculum
Mumshad Mannambeth Kodekloud course
- Excellent course which covers the right topics required for the CKA
- It also provides hands-on labs for each of the topics, giving you actual experience working on the Kubernetes cluster.
- Make sure to practice the labs, as long as you don’t need to refer to the hints and can do most of it without documentation.

Udemy Certified Kubernetes Administrator by Zeal Vora. It does offer practical hands-on though.
Practice CKA Exercises
Cover Kubernetes tutorials which provide a good hands-on guide
Cover kubectl cheatsheet for commands
Cover Tasks from Kubernetes documentation

CKA Key Topics

Cluster Architecture, Installation & Configuration – 25%

Practice CKA Exercises – Cluster Architecture, Installation & Configuration
Manage role based access control (RBAC)
- Authorization using Node and RBAC
Use Kubeadm to install a basic cluster
- Practice creating Kubernetes Cluster using Kubeadm

Manage a highly-available Kubernetes cluster
- Configure a highly-available Kubernetes cluster
Provision underlying infrastructure to deploy a Kubernetes cluster
Perform a version upgrade on a Kubernetes cluster using Kubeadm
- Practice Upgrading kubeadm clusters
Implement etcd backup and restore
- Make sure you read ETCD backup and practice using documentation

Workloads & Scheduling – 15%

Practice CKA Exercises – Workloads & Scheduling
Understand deployments and how to perform rolling update and rollbacks
- Understand deployments and how to perform rolling update and rollbacks. Practice kubectl rollout commands to check status and undo deployments.

Use ConfigMaps and Secrets to configure applications
- ConfigMaps are used to store non-confidential data in key-value pairs.
- Task Create a ConfigMap and mount it as a volume.
- Know how to Manage Kubernetes secrets
- Task Create Secrets and refer to them in a Pod.
- Exam Tip: Know how to read secret values, create secrets, and mount the same on the pods.
- Exam Tip: Know how to create ConfigMaps and mount the same on the pods.

Know how to scale applications
- Understand Scaling an Application using Deployment
Understand the primitives used to create robust, self-healing, application deployments
- Know how to scale and create self-healing applications using replicas

Understand how resource limits can affect Pod scheduling
- Know how to assign consumed CPU and Memory resources
- Exam Tip: Know how to configure pods with requests and limits.
Awareness of manifest management and common templating tools

Services & Networking – 20%

Practice CKA Exercises – Services & Networking
Understand host networking configuration on the cluster nodes
Understand connectivity between Pods
- Understand Cluster Networking
Understand ClusterIP, NodePort, LoadBalancer service types and endpoints
- Understand Service Networking and practice how to expose pod and. deployments as service.
Know how to use Ingress controllers and Ingress resources
- Know Ingress and how to use Ingress rules
Know how to configure and use CoreDNS
- Practice DNS for Services and Pods using nslookup
- Understand CoreDNS for Service Discovery

Choose an appropriate container network interface plugin
- Know Network Plugins

Storage – 10%

Practice CKA Exercises – Storage
Understand storage classes, persistent volumes
- Understand and focus on creating Persistent Volumes,
Understand volume mode, access modes, and reclaim policies for volumes
- Understand volume mode, access modes, and reclaim policies

Understand persistent volume claims primitive
- Understand Persistent Volume Claims and associate them with Pods
Know how to configure applications with persistent storage
- Practice Configure a Pod to Use a Volume for Storage – focus on using Empty Dir as the volume, so the storage is ephemeral to pod.
- Practice Configure Pod Container Persistent Volume Storage – focus on creating Pods with host path volumes

Troubleshooting – 30%

Practice CKA Exercises – Troubleshooting
Evaluate cluster and node logging
- Refer Cluster logging
Understand how to monitor applications
- Know resource usage monitoring as you would be needed to check resource usage using the kubectl top command

Manage container stdout & stderr logs
- Know how to Debug running pods using the kubectl logs command
Troubleshoot application failure
- Practice Debug application for troubleshooting application failures

Troubleshoot cluster component failure
- Practice Debug cluster for troubleshooting control plane failure and worker node failure.
  - Understand the control plane architecture.
  - Focus on kube-apiserver, static pod config which causes the control panel pods to be referred and deployed.
  - Check pods in kube-system if they are all running. Use docker ps -a command on the node to inspect the reason for exiting containers.
  - Check kubelet service if the worker node is shown not ready
Troubleshoot networking

Scheduling

Understand label selectors to schedule Pods on nodes using nodeSelector and Practice Assign Pod Nodes

Understand DaemonSets and how to provision. Remember there is no imperative way to create DaemonSet, so either create a deployment and filter of copy from the documentation.
Understand how resource limits can affect Pod scheduling
Understand how to run multiple schedulers and how to configure Pods to use them
Practice how to Create Static Pods esp. on worker nodes. Static pods can be configured using yaml files located in staticPodPath referred by the kube-apiserver. Make sure the property is defined.

Security

Know how to configure authentication and authorization using CertificateSigningRequest and RBAC authorization
Know how to configure network policies
Practice manage TLS certificates in a Cluster
Work with images securely using private repository
Define security contexts
Secure persistent key value store using Secrets. Practice passing Secrets to Pods using Volumes and Environment variables.

CKA General information and practices

The exam can be taken online from anywhere.
Make sure you have prepared your workspace well before the exams.

Make sure you have a valid government-issued ID card as it would be checked.
You are not allowed to have anything around you and no one should enter the room.
The exam proctor will be watching you always, so refrain from doing any other activities. Your screen is also always shared.

Copy + Paste works fine.
You will have an online notepad on the right corner to note down. I hardly used it, but it can be useful to type and modify text instead of using VI editor.

All the Best …

Certified Kubernetes Application Developer CKAD Learning Path

October 21, 2023 ~ Last updated on : February 20, 2026 ~ jayendrapatil ~ 38 Comments

Certified Kubernetes Application Developer CKAD Learning Path

After working on Kubernetes for quite some time, it was time to recertify my Certified Kubernetes Application Developer, and am glad to have cleared it with a score of 89 with minimal preparation.

CKAD is more of an open-book test, where you have access to the official Kubernetes documentation exam, but it focuses more on hands-on experience.
CKAD focuses on “Using a Kubernetes cluster once already provisioned“. It tests the candidate’s ability to design, build, configure, and expose cloud native applications for Kubernetes.

Unlike AWS and GCP certifications, you would be required to solve, debug actual problems, and provision resources on a live Kubernetes cluster.
Even though it is an open book test, you need to know where the information is.
Trust me, if you are not prepared this time is not going to be sufficient.

CKAD Exam Pattern

Certification Validity Changed: From April 1, 2024, all Linux Foundation certifications valid for 24 months (previously 36 months)
CKAD exam curriculum includes these general domains and their weights on the exam:
- Application Design and Build – 20%
- Application Environment, Configuration and Security – 25%
- Application Deployment – 20%
- Services & Networking – 20%
- Application observability and maintenance – 15%
CKAD requires you to solve 16 questions in 2 hours.

CKAD was already upgraded to use the k8s 1.31 (as of November 2024) version. But it keeps on being upgraded with new Kubernetes versions.
You are allowed to open another browser tab that can be from kubernetes.io or other product documentation like Falco. Do not open any other windows.
Exam questions can be attempted in any order and don’t have to be sequential. So be sure to flag them and move ahead and come back later.

CKAD Exam Preparation and Tips

I used the courses from KodeKloud CKAD for practicing and it would be good enough to cover what is required for the exam.
Prepare yourself with the imperative commands as much as you can. This will help cut down the time required to solve half of the questions. I was not stretched for time for CKAD and had much time to review.
Each exam question carries weight so be sure you attempt the exams with higher weights before focusing on the lower ones. So target the ones with higher weights and quicker solutions like debugging ones.

CKAD exam provides 6-8 different preconfigured K8s clusters. Each question refers to a different Kubernetes cluster, and the context needs to be switched. Be sure to execute the kubectl use context command, which is available with every question and you just need to copy-paste it.
Check for the namespace mentioned in the question, to find resources and create resources. Use the -n <namespace>
You would be performing most of the interaction from the client node. However, pay attention to the node (master or worker) you need to execute the exams and make sure you return back to the base node.

SSH to nodes and gaining root access is allowed if needed.
Read carefully the Information provided within the questions with the i mark. They would provide very useful hints in addressing the question and save time. for e.g. namespaces to look into. for a failed pod, what has already been created like configmap, secrets, network policies so that you do not create the same.
Make sure you know the imperative commands to create resources, as you won’t have much time to create and edit YAML files.

If you need to edit further use --dry-run=client -o yaml to get a headstart with the YAML spec file and edit the same.
I personally use alias kk=kubectl to avoid typing kubectl

CKAD Resources

Go through the CKAD Curriculum. Check for the latest one.

Mumshad Mannambeth KodeKloud
- Excellent course which covers the right topics required for the CKAD exam
- It also provides hands-on labs for each of the topics, giving you actual experience working on the Kubernetes cluster

Udemy Certified Kubernetes Application Developer by Zeal Vora. It does offer practical hands-on though.
Practice CKAD Exercises
Cover kubectl Cheatsheet for commands
Cover Tasks from Kubernetes documentation

CKAD Key Topics

Application Design and Build – 20%

Practice CKAD Exercises – Application Design and Build
Define, build and modify container images
Understand Jobs and CronJobs
- Know how to Create Cron Jobs with recurring frequency and set a time limit for completion.

Understand multi-container Pod design patterns (e.g. sidecar, init, and others)
- Understand Init Containers and usage
- Know how to Create a multi-container pod
Utilize persistent and ephemeral volumes
- Practice Configure Pod Container Persistent Volume Storage

Application Environment, Configuration and Security – 25%

Practice CKAD Exercises – Application Environment, Configuration and Security
Discover and use resources that extend Kubernetes (CRD)
- Understand Custom Resources

Understand authentication, authorization, and admission control
- Authentication using Certificates and Service Accounts
- Authorization using Node and RBAC
- Admission controllers
  1. can be used for validating configurations as well as mutating the configurations.
  2. Mutating controllers are triggered before validating controllers.
  3. Allows extension by adding custom controllers using MutatingAdmissionWebhook and ValidatingAdmissionWebhook.
Understanding and defining resource requirements, limits, and quotas
- Know how to assign consumed CPU and Memory resources
- Exam Tip: Know how to configure pods with requests and limits.

Understand ConfigMaps
- ConfigMaps are used to store non-confidential data in key-value pairs.
- Task Create a ConfigMap and mount it as a volume.
- Exam Tip: Know how to create ConfigMaps and mount the same on the pods.
Create & consume Secrets
- Know how to Manage Kubernetes secrets
- Task Create Secrets and refer to them in a Pod.
- Exam Tip: Know how to read secret values, create secrets and mount the same on the pods.
Understand ServiceAccounts
- Understand Service Accounts & Managing Service Accounts
- Task Create a Service Account and configure a pod to run with it.
- Exam tip: Know how to create Service Accounts, Roles, and Cluster Roles and associate them together using Role Binding and Cluster Role Binding.
Understand SecurityContexts
- Pod Security Contexts help define security for pods and containers at the pod or at the container level. Capabilities can be added at the container level only.
- Task Configure pod container security context
- Exam tip: Know how to run containers using different users and groups and add capabilities to the containers.

Application Deployment – 20%

Practice CKAD Exercises – Application Deployment
Use Kubernetes primitives to implement common deployment strategies (e.g. blue/green or canary)
- Kubernetes supports only Recreate and Rolling deployments within the same cluster.
- A service mesh like Istio can be used for traffic management and canary deployments.
Understand Deployments and how to perform rolling updates
- Understand Deployments & Create deployments, update deployments, and rollback deployments.
- Task Stateless Application Deployment
- Exam tip: Know how to create and update pods and deployments
- NOTE – Unlike Pods, Deployments can be edited inline and the Pods are recreated.
Use the Helm package manager to deploy existing packages

Services & Networking – 20%

Practice CKAD Exercise – Application Deployment
Demonstrate basic understanding of NetworkPolicies
- Understand Network Policies & Use Network security policies to restrict cluster level access
- Task Declare Network Policy.
- Exam tip: Know how to create Network Policies using proper selectors
Provide and troubleshoot access to applications via services
- Understand Services & Connecting applications with Services
- Task Access Application Cluster using Service
- Exam tip: Know how to expose a port for a pod, expose a service for a deployment
Use Ingress rules to expose applications
- Understand Ingress
- Task Access Application Cluster using Ingress

Application observability and maintenance – 15%

Practice CKAD Exercise – Application Observability and Maintenance
Understand API deprecations
- Know Kubernetes Deprecation Policy
Implement probes and health checks
- Know the difference between Liveness and Readiness. Readiness provides an indication of when the pod is ready and liveness provides an indication if the pod is healthy.
- Task Configure liveness readiness startup probes
Use provided tools to monitor Kubernetes applications
Utilize container logs
- Know Kubernetes Logging Architecture
- Know how to check logs for Pods. kubectl logs pod_name
Debugging in Kubernetes

CKAD General information and practices

The exam can be taken online from anywhere.
Make sure you have prepared your workspace well before the exams.
Make sure you have a valid government-issued ID card as it would be checked.
You are not allowed to have anything around you and no one should enter the room.

The exam proctor will always watch you, so refrain from doing other activities. Your screen is also always shared.
Copy + Paste works fine.
You will have an online notepad on the right corner to note down. I hardly used it, but it can be useful to type and modify text instead of using the VI editor if you are not comfortable with it.

All the Best …

AWS Certified Database – Specialty (DBS-C01) Exam Learning Path

September 19, 2023 ~ Last updated on : May 13, 2024 ~ jayendrapatil ~ 19 Comments

AWS Certified Database – Specialty (DBS-C01) Exam Learning Path

I recently revalidated my AWS Certified Database – Specialty (DBS-C01) certification just before it expired. The format and domains are pretty much the same as the previous exam, however, it has been enhanced to cover a lot of new services.

AWS Certified Database – Specialty (DBS-C01) Exam Content

AWS Certified Database – Specialty (DBS-C01) exam validates your understanding of databases, including the concepts of design, migration, deployment, access, maintenance, automation, monitoring, security, and troubleshooting, and covers the following tasks:

Understand and differentiate the key features of AWS database services.

Analyze needs and requirements to design and recommend appropriate database solutions using AWS services

Refer to AWS Database – Specialty Exam Guide

AWS Certified Database – Specialty (DBS-C01) Exam Summary

Specialty exams are tough, lengthy, and tiresome. Most of the questions and answers options have a lot of prose and a lot of reading that needs to be done, so be sure you are prepared and manage your time well.
DBS-C01 exam has 65 questions to be solved in 170 minutes which gives you roughly 2 1/2 minutes to attempt each question.

DBS-C01 exam includes two types of questions, multiple-choice and multiple-response.
DBS-C01 has a scaled score between 100 and 1,000. The scaled score needed to pass the exam is 750.
Specialty exams currently cost $ 300 + tax.
You can get an additional 30 minutes if English is your second language by requesting Exam Accommodations. It might not be needed for Associate exams but is helpful for Professional and Specialty ones.

As always, mark the questions for review, move on, and come back to them after you are done with all.
As always, having a rough architecture or mental picture of the setup helps focus on the areas that you need to improve. Trust me, you will be able to eliminate 2 answers for sure and then need to focus on only the other two. Read the other 2 answers to check the difference area and that would help you reach the right answer or at least have a 50% chance of getting it right.
AWS exams can be taken either remotely or online, I prefer to take them online as it provides a lot of flexibility. Just make sure you have a proper place to take the exam with no disturbance and nothing around you.

Also, if you are taking the AWS Online exam for the first time try to join at least 30 minutes before the actual time as I have had issues with both PSI and Pearson with long wait times.

AWS Certified Database – Specialty (DBS-C01) Exam Resources

Online Courses
- Stephane Maarek – AWS Certified Database Specialty Exam
- Whizlabs – AWS Certified Database Specialty Course
Practice tests
- Braincert – AWS Certified Database – Specialty (DBS-C01) Practice Exams
- Stephane Maarek – AWS Database – Specialty Practice Tests
- Whizlabs – AWS Certified Database Specialty Practice Tests

AWS Certified Database – Specialty (DBS-C01) Exam Summary

AWS Certified Database – Specialty exam focuses completely on AWS Data services from relational, non-relational, graph, caching, and data warehousing. It also covers deployments, automation, migration, security, monitoring, and troubleshooting aspects of them.

Database Services

Make sure you know and cover all the services in-depth, as 80% of the exam is focused on topics like Aurora, RDS, DynamoDB

DynamoDB
- is a fully managed NoSQL database service providing single-digit millisecond latency.
- DynamoDB provisioned throughput supports On-demand and provisioned throughput capacity modes.
  - On-demand mode
    - provides a flexible billing option capable of serving thousands of requests per second without capacity planning
    - does not support reserved capacity
  - Provisioned mode
    - requires you to specify the number of reads and writes per second as required by the application
    - Understand the provisioned capacity calculations
- DynamoDB Auto Scaling uses the AWS Application Auto Scaling service to dynamically adjust provisioned throughput capacity on your behalf, in response to actual traffic patterns.
- Know DynamoDB Burst capacity, Adaptive capacity
- DynamoDB Consistency mode determines the manner and timing in which the successful write or update of a data item is reflected in a subsequent read operation of that same item.
  - supports eventual and strongly consistent reads.
  - Eventual requires less throughput but might return stale data, whereas, Strongly consistent reads require higher throughput but would always return correct data.
- DynamoDB secondary indexes provide efficient access to data with attributes other than the primary key.
  - LSI uses the same partition key but a different sort key, whereas, GSI is a separate table with a different partition key and/or sort key.
  - GSI can cause primary table throttling if under-provisioned.
  - Make sure you understand the difference between the Local Secondary Index and the Global Secondary Index
- DynamoDB Global Tables is a new multi-master, cross-region replication capability of DynamoDB to support data access locality and regional fault tolerance for database workloads.
  - Understand the differences between DynamoDB Global tables and Aurora Global databases esp. in terms of allowing writes in multiple regions.
- DynamoDB Time to Live – TTL enables a per-item timestamp to determine when an item is no longer needed. (hint: know TTL can expire the data and this can be captured by using DynamoDB Streams)
- DynamoDB cross-region replication allows identical copies (called replicas) of a DynamoDB table (called master table) to be maintained in one or more AWS regions.
- DynamoDB Streams provides a time-ordered sequence of item-level changes made to data in a table.
- DynamoDB Triggers (just like database triggers) is a feature that allows the execution of custom actions based on item-level updates on a table.
- DynamoDB Accelerator – DAX is a fully managed, highly available, in-memory cache for DynamoDB that delivers up to a 10x performance improvement even at millions of requests per second.
  - DAX does not support fine-grained access control like DynamoDB.
- DynamoDB Backups support PITR
  - AWS Backup can be used to backup and restore, and it supports cross-region snapshot copy as well.
- VPC Gateway Endpoints provide private access to DynamoDB from within a VPC without the need for an internet gateway or NAT gateway
- Understand DynamoDB Best practices (hint: selection of keys to avoid hot partitions and creation of LSI and GSI)

Aurora
- is a relational database engine that combines the speed and reliability with the simplicity and cost-effectiveness of open-source databases.
- provides MySQL and PostgreSQL compatibility
- Aurora Disaster Recovery & High Availability can be achieved using Read Replicas with very minimal downtime.
  - Aurora promotes read replicas as per the priority tier (tier 0 is the highest), the largest size if the tier matches
- Aurora Global Database provides cross-region read replicas for low-latency reads. Remember it is not multi-master and would not provide low latency writes across regions as DynamoDB Global tables.
- Aurora Connection endpoints support
  - Cluster for primary read/write
  - Reader for read replicas
  - Custom for a specific group of instances
  - Instance for specific single instance – Not recommended
- Aurora Fast Failover techniques
  - set TCP keepalives low
  - set Java DNS caching timeouts low
  - Set the timeout variables used in the JDBC connection string as low
  - Use the provided read and write Aurora endpoints
  - Use cluster cache management for Aurora PostgreSQL. Cluster cache management ensures that application performance is maintained if there’s a failover.
- Aurora Serverless is an on-demand, autoscaling configuration for the MySQL-compatible and PostgreSQL-compatible editions of Aurora.
- Aurora Backtrack feature helps rewind the DB cluster to the specified time. It is not a replacement for backups.
- Aurora Server Auditing Events for different activities cover log-in, DML, permission changes DCL, schema changes DDL, etc.
- Aurora Cluster Cache management feature which helps fast failover
- Aurora Clone feature which allows you to create quick and cost-effective clones
- Aurora supports fault injection queries to simulate various failovers like node down, primary failover, etc.
- RDS PostgreSQL and MySQL can be migrated to Aurora, by creating an Aurora Read Replica from the instance. Once the replica lag is zero, switch the DNS with no data loss
- Aurora Database Activity Streams help stream audit logs to external services like Kinesis
- Supports stored procedures calling lambda functions
Relational Database Service (RDS)
- provides a relational database in the cloud with multiple database options.
- RDS Snapshots, Backups, and Restore
  - restoring a DB from a snapshot does not retain the parameter group and security group
  - automated snapshots cannot be shared. Make a manual backup from the snapshot before sharing the same.
- RDS Read Replicas
  - allow elastic scaling beyond the capacity constraints of a single DB instance for read-heavy database workloads.
  - increased scalability and database availability in the case of an AZ failure.
  - supports cross-region replicas.
- RDS Multi-AZ provides high availability and automatic failover support for DB instances.
- Understand the differences between RDS Multi-AZ vs Read Replicas
  - Multi-AZ failover can be simulated using Reboot with Failure option
  - Read Replicas require automated backups enabled
- Understand DB components esp. DB parameter group, DB options groups
  - Dynamic parameters are applied immediately
  - Static parameters need manual reboot.
  - Default parameter group cannot be modified. Need to create custom parameter group and associate to RDS
  - Know max connections also depends on DB instance size
- RDS Custom automates database administration tasks and operations. while making it possible for you as a database administrator to access and customize the database environment and operating system.
- RDS Performance Insights is a database performance tuning and monitoring feature that helps you quickly assess the load on the database, and determine when and where to take action.
- RDS Security
  - RDS supports security groups to control who can access RDS instances
  - RDS supports data at rest encryption and SSL for data in transit encryption
  - RDS supports IAM database authentication with temporary credentials.
  - Existing RDS instance cannot be encrypted, create a snapshot -> encrypt it –> restore as encrypted DB
  - RDS PostgreSQL requires rds.force_ssl=1 and sslmode=ca/verify-full to enable SSL encryption
  - Know RDS Encrypted Database limitations
- Understand RDS Monitoring and Notification
  - Know RDS supports notification events through SNS for events like database creation, deletion, snapshot creation, etc.
  - CloudWatch gathers metrics about CPU utilization from the hypervisor for a DB instance, and Enhanced Monitoring gathers its metrics from an agent on the instance.
  - Enhanced Monitoring metrics are useful to understand how different processes or threads on a DB instance use the CPU.
  - RDS Performance Insights is a database performance tuning and monitoring feature that helps illustrate the database’s performance and help analyze any issues that affect it
- RDS instance cannot be stopped if with read replicas
ElastiCache
- is a managed web service that helps deploy and run Memcached or Redis protocol-compliant cache clusters in the cloud easily.
- Understand the differences between Redis vs. Memcached
Neptune
- is a fully managed database service built for the cloud that makes it easier to build and run graph applications. Neptune provides built-in security, continuous backups, serverless compute, and integrations with other AWS services.
- provides Neptune loader to quickly import data from S3
- supports VPC endpoints

Amazon Keyspaces (for Apache Cassandra) is a scalable, highly available, and managed Apache Cassandra–compatible database service.
Amazon Quantum Ledger Database (Amazon QLDB) is a fully managed ledger database that provides a transparent, immutable, and cryptographically verifiable transaction log.
Redshift
- is a fully managed, fast, and powerful, petabyte-scale data warehouse service. It is not covered in depth.
- Know Redshift Best Practices w.r.t selection of Distribution style, Sort key, importing/exporting data
  - COPY command which allows parallelism, and performs better than multiple COPY commands
  - COPY command can use manifest files to load data
  - COPY command handles encrypted data
- Know Redshift cross region encrypted snapshot copy
  - Create a new key in destination region
  - Use CreateSnapshotCopyGrant to allow Amazon Redshift to use the KMS key from the destination region.
  - In the source region, enable cross-region replication and specify the name of the copy grant created.
- Know Redshift supports Audit logging which covers authentication attempts, connections and disconnections usually for compliance reasons.
Data Migration Service (DMS)
- DMS helps in migration of homogeneous and heterogeneous database
- DMS with Full load plus Change Data Capture (CDC) migration capability can be used to migrate databases with zero downtime and no data loss.
- DMS with SCT (Schema Conversion Tool) can be used to migrate heterogeneous databases.
- Premigration Assessment evaluates specified components of a database migration task to help identify any problems that might prevent a migration task from running as expected.
- Multiserver assessment report evaluates multiple servers based on input that you provide for each schema definition that you want to assess.
- DMS provides support for data validation to ensure that your data was migrated accurately from the source to the target.
- DMS supports LOB migration as a 2-step process. It can do a full or limited LOB migration
  - In full LOB mode, AWS DMS migrates all LOBs from source to target regardless of size. Full LOB mode can be quite slow.
  - In limited LOB mode, a maximum LOB size can be set that AWS DMS should accept. Doing so allows AWS DMS to pre-allocate memory and load the LOB data in bulk. LOBs that exceed the maximum LOB size are truncated and a warning is issued to the log file. In limited LOB mode, you get significant performance gains over full LOB mode.
  - Recommended to use limited LOB mode whenever possible.

Security, Identity & Compliance

Identity and Access Management (IAM)
- Understand IAM in depth
- Understand IAM Roles
Key Management Services
- is a managed encryption service that allows the creation and control of encryption keys to enable data encryption.
- provides data at rest encryption for the databases.
AWS Secrets Manager
- protects secrets needed to access applications, services, etc.
- enables you to easily rotate, manage, and retrieve database credentials, API keys, and other secrets throughout their lifecycle
- supports automatic rotation of credentials for RDS, DocumentDB, etc.

Secrets Manager vs. Systems Manager Parameter Store
- Secrets Manager supports automatic rotation while SSM Parameter Store does not
- Parameter Store is cost-effective as compared to Secrets Manager.
Trusted Advisor provides RDS Idle instances

Management & Governance Tools

Understand AWS CloudWatch for Logs and Metrics.
- EventBridge (CloudWatch Events) provides real-time alerts
- CloudWatch can be used to store RDS logs with a custom retention period, which is indefinite by default.
- CloudWatch Application Insights support .Net and SQL Server monitoring
Know CloudFormation for provisioning, in terms of
- Stack drifts – to understand the difference between current state and on actual environment with any manual changes
- Change Set – allows you to verify the changes before being propagated
- parameters – allows you to configure variables or environment-specific values
- Stack policy defines the update actions that can be performed on designated resources.
- Deletion policy for RDS allows you to configure if the resources are retained, snapshot, or deleted once destroy is initiated
- Supports secrets manager for DB credentials generation, storage, and easy rotation
- System parameter store for environment-specific parameters

Whitepapers and articles

AWS Database Services Cheat Sheet

On the Exam Day

Make sure you are relaxed and get some good night’s sleep. The exam is not tough if you are well-prepared.

If you are taking the AWS Online exam
- Try to join at least 30 minutes before the actual time as I have had issues with both PSI and Pearson with long wait times.
- The online verification process does take some time and usually, there are glitches.
- Remember, you would not be allowed to take the take if you are late by more than 30 minutes.
- Make sure you have your desk clear, no hand-watches, or external monitors, keep your phones away, and nobody can enter the room.

Finally, All the Best 🙂