- SageMaker is a fully managed machine learning service to build, train, and deploy machine learning (ML) models quickly.
- SageMaker removes the heavy lifting from each step of the machine learning process to make it easier to develop high quality models.
- SageMaker is designed for high availability with no maintenance windows or scheduled downtimes
- SageMaker APIs run in Amazon’s proven, high-availability data centers, with service stack replication configured across three facilities in each AWS region to provide fault tolerance in the event of a server failure or AZ outage
- SageMaker provides a full end-to-end workflow, but users can continue to use their existing tools with SageMaker.
- SageMaker supports Jupyter notebooks.
- SageMaker allows users to select the number and type of instance used for the hosted notebook, training & model hosting.
SageMaker Machine Learning
Generate example data
- Involves exploring and preprocessing, or “wrangling,” example data before using it for model training.
- To preprocess data, you typically do the following:
- Fetch the data
- Clean the data
- Prepare or transform the data
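The fetch/clean/transform steps above can be sketched in plain Python; the dataset and field names below are hypothetical, not from any SageMaker sample:

```python
import csv
import io

# Hypothetical raw data, as it might be fetched from S3 or another source
raw_csv = """age,income
34,52000
,61000
45,48000
"""

# Fetch: parse the CSV into rows of dicts
rows = list(csv.DictReader(io.StringIO(raw_csv)))

# Clean: drop rows with missing values
clean = [r for r in rows if all(v.strip() for v in r.values())]

# Transform: convert types and rescale income to thousands
examples = [{"age": int(r["age"]), "income_k": float(r["income"]) / 1000}
            for r in clean]

print(examples)  # two usable training examples remain
```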
Train a model
- Model training includes both training and evaluating the model, as follows:
- Training the model
- Needs an algorithm, which depends on a number of factors.
- Need compute resources for training.
- Evaluating the model
- Determine whether the accuracy of the inferences is acceptable.
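Evaluation can be as simple as comparing held-out labels against the model's inferences and checking accuracy against an acceptance threshold; the labels, predictions, and threshold below are illustrative:

```python
# Hypothetical held-out labels and model inferences
labels      = [1, 0, 1, 1, 0, 1, 0, 0]
predictions = [1, 0, 1, 0, 0, 1, 0, 1]

# Accuracy: fraction of inferences that match the true label
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)

# Decide whether the model is acceptable for deployment
ACCEPTANCE_THRESHOLD = 0.70
acceptable = accuracy >= ACCEPTANCE_THRESHOLD
print(accuracy, acceptable)  # 0.75 True
```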
Training Data Format – File mode vs Pipe mode
- Most Amazon SageMaker algorithms work best when using the optimized protobuf recordIO format for the training data.
- Using RecordIO format allows algorithms to take advantage of Pipe mode when training the algorithms that support it.
- File mode loads all of the data from S3 to the training instance volumes
- In Pipe mode, the training job streams data directly from S3.
- Streaming can provide faster start times for training jobs and better throughput.
- With Pipe mode, the size of the EBS volumes for the training instances can also be reduced, as Pipe mode needs only enough disk space to store the final model artifacts.
- File mode needs disk space to store both the final model artifacts and the full training dataset.
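In the CreateTrainingJob API this choice surfaces as the `TrainingInputMode` field. A minimal request fragment might look like the sketch below; the job name, image URI, bucket, and instance type are placeholders:

```python
# Sketch of a CreateTrainingJob request fragment illustrating Pipe mode;
# the image URI and S3 bucket below are placeholders, not real resources.
training_job = {
    "TrainingJobName": "example-pipe-mode-job",
    "AlgorithmSpecification": {
        "TrainingImage": "123456789012.dkr.ecr.us-east-1.amazonaws.com/example:latest",
        "TrainingInputMode": "Pipe",  # stream from S3 instead of copying ("File")
    },
    "InputDataConfig": [{
        "ChannelName": "train",
        "DataSource": {"S3DataSource": {
            "S3DataType": "S3Prefix",
            "S3Uri": "s3://example-bucket/train/",
            "S3DataDistributionType": "FullyReplicated",
        }},
        "ContentType": "application/x-recordio-protobuf",
    }],
    # Pipe mode only needs space for model artifacts, so the volume can be small
    "ResourceConfig": {"InstanceType": "ml.m5.xlarge",
                       "InstanceCount": 1,
                       "VolumeSizeInGB": 5},
}

print(training_job["AlgorithmSpecification"]["TrainingInputMode"])  # Pipe
```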
- SageMaker provides several built-in machine learning algorithms that can be used for a variety of problem types
- Write a custom training script in a machine learning framework that SageMaker supports, and use one of the pre-built framework containers to run it in SageMaker.
- Bring your own algorithm or model to train or host in SageMaker.
- SageMaker provides pre-built Docker images for its built-in algorithms and the supported deep learning frameworks used for training and inference
- By using containers, machine learning models can be trained and deployed quickly and reliably at any scale.
- Use an algorithm that you subscribe to from AWS Marketplace.
Deploy the model
- A model may need to be re-engineered before it is integrated with the application and deployed.
- supports both hosting services and batch transform
- provides an HTTPS endpoint where the machine learning model is available to provide inferences.
- supports Canary deployment using ProductionVariant and deploying multiple variants of a model to the same SageMaker HTTPS endpoint.
- supports automatic scaling for production variants. Automatic scaling dynamically adjusts the number of instances provisioned for a production variant in response to changes in your workload
- to get inferences on entire datasets, consider using batch transform as an alternative to hosting services.
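A canary rollout with production variants amounts to weighting traffic across model variants on one endpoint via `InitialVariantWeight`. A sketch of the endpoint config, with placeholder model names and an assumed 90/10 split:

```python
# Sketch of an endpoint config for a canary deployment; model and config
# names are placeholders. InitialVariantWeight controls the traffic split.
endpoint_config = {
    "EndpointConfigName": "example-canary-config",
    "ProductionVariants": [
        {"VariantName": "current",
         "ModelName": "model-v1",
         "InstanceType": "ml.m5.large",
         "InitialInstanceCount": 2,
         "InitialVariantWeight": 9.0},   # ~90% of traffic
        {"VariantName": "canary",
         "ModelName": "model-v2",
         "InstanceType": "ml.m5.large",
         "InitialInstanceCount": 1,
         "InitialVariantWeight": 1.0},   # ~10% of traffic
    ],
}

# Traffic share for each variant = its weight / sum of all weights
total = sum(v["InitialVariantWeight"] for v in endpoint_config["ProductionVariants"])
shares = {v["VariantName"]: v["InitialVariantWeight"] / total
          for v in endpoint_config["ProductionVariants"]}
print(shares)  # {'current': 0.9, 'canary': 0.1}
```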
- SageMaker ensures that ML model artifacts and other system artifacts are encrypted in transit and at rest.
- SageMaker allows using encrypted S3 buckets for model artifacts and data, as well as pass a KMS key to SageMaker notebooks, training jobs, and endpoints, to encrypt the attached ML storage volume.
- Requests to the SageMaker API and console are made over a secure (SSL) connection.
- SageMaker stores code in ML storage volumes, secured by security groups and optionally encrypted at rest.
- SageMaker notebooks are collaborative notebooks that are built into SageMaker Studio that can be launched quickly.
- can be accessed without setting up compute instances and file storage
- charged only for the resources consumed while the notebook is running
- instance types can be easily switched if more or less computing power is needed during the experimentation phase.
SageMaker Built-in Algorithms
Please refer to SageMaker Built-in Algorithms for details
Elastic Inference (EI)
- helps speed up the throughput and decrease the latency of getting real-time inferences from the deep learning models deployed as SageMaker hosted models
- adds inference acceleration to a hosted endpoint for a fraction of the cost of using a full GPU instance.
SageMaker Ground Truth
- provides automated data labeling using machine learning
- helps building highly accurate training datasets for machine learning quickly.
- offers easy access to labelers through Amazon Mechanical Turk and provides them with built-in workflows and interfaces for common labeling tasks.
- allows using your own labelers or using vendors recommended by Amazon through AWS Marketplace.
- helps lower the labeling costs by up to 70% using automatic labeling, which works by training Ground Truth from data labeled by humans so that the service learns to label data independently.
- significantly reduces the time and effort required to create datasets for training to reduce costs
- provides annotation consolidation to help improve the accuracy of the data object’s labels. It combines the results of multiple worker’s annotation tasks into one high-fidelity label.
- first selects a random sample of data and sends it to Amazon Mechanical Turk to be labeled.
- results are then used to train a labeling model that attempts to label a new sample of raw data automatically.
- labels are committed when the model can label the data with a confidence score that meets or exceeds a threshold you set.
- for confidence scores falling below the defined threshold, the data is sent to human labelers.
- Some of the data labeled by humans is used to generate a new training dataset for the labeling model, and the model is automatically retrained to improve its accuracy.
- process repeats with each sample of raw data to be labeled.
- labeling model becomes more capable of automatically labeling raw data with each iteration, and less data is routed to humans.
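The confidence-threshold routing in the loop above can be sketched as follows; the item IDs, scores, and threshold are made-up values for illustration:

```python
# Illustrative sketch of Ground Truth's confidence-threshold routing.
# Each item carries the labeling model's confidence score (made-up values).
items = [
    {"id": 1, "confidence": 0.97},
    {"id": 2, "confidence": 0.42},
    {"id": 3, "confidence": 0.88},
    {"id": 4, "confidence": 0.61},
]

THRESHOLD = 0.80  # user-defined confidence threshold

# Labels at or above the threshold are committed automatically;
# everything else is routed to human labelers, whose results later
# feed back into retraining the labeling model.
auto_labeled = [i["id"] for i in items if i["confidence"] >= THRESHOLD]
to_humans    = [i["id"] for i in items if i["confidence"] < THRESHOLD]

print(auto_labeled, to_humans)  # [1, 3] [2, 4]
```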
SageMaker Automatic Model Training
- Hyperparameters are parameters exposed by machine learning algorithms that control how the underlying algorithm operates; their values affect the quality of the trained models.
- Automatic model tuning is the process of finding a set of hyperparameters for an algorithm that can yield an optimal model.
- Best Practices for Hyperparameter tuning
- Choosing the Number of Hyperparameters – limit the search to a smaller number, as the difficulty of a hyperparameter tuning job depends primarily on the number of hyperparameters that Amazon SageMaker has to search
- Choosing Hyperparameter Ranges – DO NOT specify a very large range to cover every possible value for a hyperparameter. Range of values for hyperparameters that you choose to search can significantly affect the success of hyperparameter optimization.
- Using Logarithmic Scales for Hyperparameters – searching a hyperparameter whose values span several orders of magnitude on a logarithmic scale can improve hyperparameter optimization.
- Choosing the Best Number of Concurrent Training Jobs – running one training job at a time achieves the best results with the least amount of compute time.
- Running Training Jobs on Multiple Instances – design distributed training jobs so that they report the objective metric that you want.
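In the HyperParameterTuningJobConfig API, a log-scaled search range is expressed through the `ScalingType` field of a parameter range. A sketch, with illustrative parameter names and bounds:

```python
# Sketch of the ParameterRanges section of a hyperparameter tuning job config.
# The parameter names and bounds are illustrative; "Logarithmic" scaling
# searches a range spanning orders of magnitude more effectively than "Linear".
parameter_ranges = {
    "ContinuousParameterRanges": [
        {"Name": "learning_rate",
         "MinValue": "0.00001",
         "MaxValue": "0.1",
         "ScalingType": "Logarithmic"},
    ],
    "IntegerParameterRanges": [
        {"Name": "num_layers",
         "MinValue": "2",
         "MaxValue": "8",
         "ScalingType": "Linear"},
    ],
}

lr = parameter_ranges["ContinuousParameterRanges"][0]
print(lr["Name"], lr["ScalingType"])  # learning_rate Logarithmic
```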
SageMaker Neo
- SageMaker Neo enables machine learning models to train once and run anywhere in the cloud and at the edge.
- Automatically optimizes models built with popular deep learning frameworks that can be used to deploy on multiple hardware platforms.
- Optimized models run up to two times faster and consume less than a tenth of the resources of typical machine learning models.
SageMaker Pricing
- Users pay for the ML compute, storage, and data processing resources they use for hosting the notebook, training the model, performing predictions & logging the outputs.