- provides highly optimized implementations of the Word2vec and text classification algorithms.
- Word2vec algorithm
  - useful for many downstream natural language processing (NLP) tasks, such as sentiment analysis, named entity recognition, machine translation, etc.
  - maps words to high-quality distributed vectors; the resulting vector representations are called word embeddings
  - word embeddings capture the semantic relationships between words.
- Text classification
  - is an important task for applications performing web searches, information retrieval, ranking, and document classification
- provides the Skip-gram and continuous bag-of-words (CBOW) training architectures
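
The difference between the two training architectures is easiest to see in the training examples each one builds from a sentence. The sketch below is not BlazingText code; the function names `skipgram_pairs` and `cbow_pairs` and the `window` parameter are made up for illustration. Skip-gram pairs a center word with each neighbor separately, while CBOW groups all neighbors together to predict the center word.

```python
def skipgram_pairs(tokens, window=2):
    """Skip-gram: one (center, context) pair per neighbor inside the window."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cbow_pairs(tokens, window=2):
    """CBOW: one (context_words, center) pair per position in the sentence."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        pairs.append((context, center))
    return pairs


sentence = "the quick brown fox".split()
sg = skipgram_pairs(sentence, window=1)
cb = cbow_pairs(sentence, window=1)
print(sg)
print(cb)
```

In a real trainer these pairs feed a shallow neural network whose learned weights become the word embeddings; that part is left out here.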

DeepAR forecasting algorithm

- is a supervised learning algorithm for forecasting scalar (one-dimensional) time series using recurrent neural networks (RNN).
- the trained model can be used to generate forecasts for new time series that are similar to the ones it was trained on.

Factorization machine

- is a general-purpose supervised learning algorithm used for both classification and regression tasks.
- is an extension of a linear model, designed to economically capture interactions between features within high-dimensional sparse datasets
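
To make "a linear model plus feature interactions" concrete, here is a minimal sketch of the standard second-order factorization machine score: a bias, a linear term, and a pairwise term in which each feature pair is weighted by the dot product of two small factor vectors. The names `fm_predict`, `fm_predict_naive`, `w0`, `w`, and `V` are illustrative assumptions, and this is not the SageMaker implementation. The first function uses the usual algebraic rewrite that avoids looping over every pair, which is what keeps the model cheap on wide sparse inputs.

```python
def fm_predict(x, w0, w, V):
    """Second-order FM score for one feature vector x.

    w0: global bias, w: linear weight per feature,
    V: one k-dimensional factor vector per feature (list of lists).
    """
    linear = w0 + sum(wi * xi for wi, xi in zip(w, x))
    k = len(V[0])
    pairwise = 0.0
    for f in range(k):
        # Identity: sum_{i<j} v_if v_jf x_i x_j
        #         = 0.5 * ((sum_i v_if x_i)^2 - sum_i (v_if x_i)^2)
        s = sum(V[i][f] * x[i] for i in range(len(x)))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(len(x)))
        pairwise += 0.5 * (s * s - s_sq)
    return linear + pairwise


def fm_predict_naive(x, w0, w, V):
    """Same score computed pair by pair, kept only as a cross-check."""
    score = w0 + sum(wi * xi for wi, xi in zip(w, x))
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(a * b for a, b in zip(V[i], V[j]))
            score += dot * x[i] * x[j]
    return score


x = [1, 0, 2]                    # sparse input: the middle feature is absent
w0, w = 0.5, [0.1, 0.2, 0.3]
V = [[1, 2], [0, 1], [3, 1]]     # k = 2 factors per feature
print(fm_predict(x, w0, w, V), fm_predict_naive(x, w0, w, V))
```

For classification, the same score would typically be passed through a sigmoid; for regression it is used directly.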

Image classification algorithm

- is a supervised learning algorithm that supports multi-label classification
- takes an image as input and outputs one or more labels
- uses a convolutional neural network (ResNet) that can be trained from scratch or, when a large number of training images is not available, trained using transfer learning.
- recommended input format is Apache MXNet RecordIO. Also supports raw images in .jpg or .png format.

IP Insights

- is an unsupervised learning algorithm that learns the usage patterns for IPv4 addresses.
- designed to capture associations between IPv4 addresses and various entities, such as user IDs or account numbers

K-means algorithm

- is an unsupervised learning algorithm for clustering
- attempts to find discrete groupings within data, where members of a group are as similar as possible to one another and as different as possible from members of other groups
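
The grouping idea can be sketched with plain Lloyd's algorithm: assign each point to its nearest centroid, move each centroid to the mean of its members, and repeat. This is the textbook procedure, not the SageMaker implementation, and the names `kmeans` and `squared_distance`, the fixed iteration count, and the seed are assumptions made for the example.

```python
import random


def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans(points, k, iterations=20, seed=0):
    """Textbook Lloyd's algorithm on a list of equal-length tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)   # start from k distinct data points
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignments


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, assignments = kmeans(points, k=2)
print(centroids, assignments)
```

On this toy data the first three points end up in one cluster and the last three in the other; on real data the result depends on the initial centroids, so several restarts are common.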

K-nearest neighbors (k-NN) algorithm

- is an index-based algorithm.
- uses a non-parametric method for classification or regression.
- For classification problems, the algorithm queries the k points that are closest to the sample point and returns the most frequently used label of their class as the predicted label.
- For regression problems, the algorithm queries the k closest points to the sample point and returns the average of their label (target) values as the predicted value.
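
Both query modes can be sketched with a brute-force search. This is only an illustration: the sketch scans every training point instead of building an index, the names `k_nearest`, `knn_classify`, and `knn_regress` are made up, and the regression variant assumes the value averaged is the numeric target stored with each neighbor.

```python
from collections import Counter


def k_nearest(train, query, k):
    """train is a list of (features, target); return targets of the k closest points."""
    ranked = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )
    return [target for _, target in ranked[:k]]


def knn_classify(train, query, k=3):
    """Majority vote over the labels of the k nearest neighbors."""
    return Counter(k_nearest(train, query, k)).most_common(1)[0][0]


def knn_regress(train, query, k=3):
    """Mean of the numeric targets of the k nearest neighbors."""
    neighbors = k_nearest(train, query, k)
    return sum(neighbors) / len(neighbors)


labeled = [((0, 0), "a"), ((0, 1), "a"), ((5, 5), "b"), ((6, 5), "b"), ((5, 6), "b")]
valued = [((1,), 10.0), ((2,), 20.0), ((3,), 30.0), ((10,), 100.0)]
print(knn_classify(labeled, (0.2, 0.1), k=3))
print(knn_regress(valued, (2,), k=3))
```

Because nothing is learned beyond storing the points, the cost moves to query time, which is why an index matters at scale.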

Latent Dirichlet Allocation (LDA) algorithm

- is an unsupervised learning algorithm that attempts to describe a set of observations as a mixture of distinct categories.
- used to discover a user-specified number of topics shared by documents within a text corpus.

Linear Learner

- is a supervised learning algorithm used for solving either classification or regression problems

Neural Topic Model (NTM) algorithm

- is an unsupervised learning algorithm that is used to organize a corpus of documents into topics that contain word groupings based on their statistical distribution
- Topic modeling can be used to classify or summarize documents based on the topics detected or to retrieve information or recommend content based on topic similarities.

Object2Vec algorithm

- is a general-purpose neural embedding algorithm that is highly customizable
- can learn low-dimensional dense embeddings of high-dimensional objects.

Object Detection algorithm

- detects and classifies objects in images using a single deep neural network.
- is a supervised learning algorithm that takes images as input and identifies all instances of objects within the image scene.

Principal Component Analysis

- is an unsupervised machine learning algorithm that attempts to reduce the dimensionality (number of features) within a dataset while still retaining as much information as possible
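
As a rough illustration of reducing features while keeping information, the sketch below finds the first principal component of a small dataset and projects each row onto it, turning two features into one. It uses power iteration on the covariance matrix, which is a stand-in chosen to keep the example dependency-free and is not how SageMaker computes PCA; the name `first_principal_component` is hypothetical. It assumes the data has non-zero variance and that the all-ones starting vector is not orthogonal to the leading component.

```python
def first_principal_component(rows, iterations=100):
    """Return (direction, projections) for the top principal component."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the centered data.
    cov = [
        [sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
        for a in range(d)
    ]
    # Power iteration: repeatedly apply cov and renormalize.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    projections = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, projections


rows = [(1, 2), (2, 4), (3, 6), (4, 8)]   # second feature is twice the first
direction, projections = first_principal_component(rows)
print(direction, projections)
```

Here the two features are perfectly correlated, so a single projected value per row keeps all of the variance; with real data some variance is lost and more components may be needed.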

Random Cut Forest (RCF)

- is an unsupervised algorithm for detecting anomalous data points within a dataset.

Semantic segmentation algorithm

- provides a fine-grained, pixel-level approach to developing computer vision applications

SageMaker Sequence to Sequence (seq2seq)

- is a supervised learning algorithm where the input is a sequence of tokens (for example, text, audio) and the output generated is another sequence of tokens.
- key use cases are machine translation (input a sentence in one language and predict what that sentence would be in another language), text summarization (input a longer string of words and predict a shorter string of words that is a summary), and speech-to-text (audio clips converted into output sentences in tokens)

XGBoost (eXtreme Gradient Boosting)

- is a popular and efficient open-source implementation of the gradient boosted trees algorithm.
- Gradient boosting is a supervised learning algorithm that attempts to accurately predict a target variable by combining an ensemble of estimates from a set of simpler, weaker models
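
The "ensemble of weaker models" idea can be sketched with one-split decision stumps fitted to residuals under squared error. This is deliberately much simpler than XGBoost, which uses deeper trees and a regularized objective; the names `fit_stump`, `boost`, and `predict`, the learning rate, and the round count are assumptions for the example. The sketch handles a single numeric feature and expects at least two distinct feature values.

```python
def fit_stump(xs, residuals):
    """Pick the threshold on a 1-D feature that minimizes squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:   # skip the max so the right side is never empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = sum((r - left_mean) ** 2 for r in left)
        error += sum((r - right_mean) ** 2 for r in right)
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def boost(xs, ys, rounds=50, learning_rate=0.3):
    """Start from the mean, then add stumps that each fit the current residuals."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, left_value, right_value = fit_stump(xs, residuals)
        stumps.append((threshold, left_value, right_value))
        predictions = [
            p + learning_rate * (left_value if x <= threshold else right_value)
            for x, p in zip(xs, predictions)
        ]
    return base, learning_rate, stumps


def predict(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(
        left if x <= threshold else right for threshold, left, right in stumps
    )


xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.2, 0.9, 5.0, 5.1, 4.9]
model = boost(xs, ys)
print([round(predict(model, x), 2) for x in xs])
```

Each stump alone is a poor predictor, but because every round targets what the previous rounds got wrong, the summed model tracks the target closely on this toy data.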

Jayendra's Cloud Certification Blog

AWS SageMaker Built-in Algorithms Summary
