BlazingText algorithm
provides highly optimized implementations of the Word2vec and text classification algorithms.
useful for many downstream natural language processing (NLP) tasks, such as sentiment analysis, named entity recognition, machine translation, etc.
maps words to high-quality distributed vectors; this representation is known as word embeddings.
word embeddings capture the semantic relationships between words.
is an important task for applications performing web searches, information retrieval, ranking, and document classification.
provides the Skip-gram and continuous bag-of-words (CBOW) training architectures.
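The difference between the two architectures can be seen in the training pairs they generate. Below is a minimal sketch (not the BlazingText implementation): CBOW predicts the center word from its surrounding context, while Skip-gram predicts each context word from the center word.

```python
# Illustrative only: how Word2vec's two architectures turn a token
# sequence into training pairs.

def cbow_pairs(tokens, window=2):
    """(context words, center word) pairs - CBOW predicts the center."""
    pairs = []
    for i, center in enumerate(tokens):
        context = [tokens[j]
                   for j in range(max(0, i - window), min(len(tokens), i + window + 1))
                   if j != i]
        pairs.append((context, center))
    return pairs

def skipgram_pairs(tokens, window=2):
    """(center word, context word) pairs - Skip-gram predicts the context."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

sentence = ["the", "cat", "sat", "on", "the", "mat"]
```

For the same corpus, Skip-gram produces more training pairs (one per context word), which is one reason it tends to do better on rare words while CBOW trains faster.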
DeepAR forecasting algorithm
is a supervised learning algorithm for forecasting scalar (one-dimensional) time series using recurrent neural networks (RNN).
use the trained model to generate forecasts for new time series that are similar to the ones it has been trained on.
Factorization machines algorithm
is a general-purpose supervised learning algorithm used for both classification and regression tasks.
an extension of a linear model designed to capture interactions between features within high-dimensional sparse datasets economically.
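The "economical" part is the model form itself. The sketch below (names `w0`, `w`, `V` are illustrative, not SageMaker's API) shows a factorization machine's prediction function: pairwise feature interactions are captured through k-dimensional factor vectors, and the interaction sum is computed in O(k·n) rather than O(n²) via the standard reformulation.

```python
# Factorization machine prediction (model form only):
#   y = w0 + sum_i w_i x_i + sum_{i<j} <v_i, v_j> x_i x_j
# The pairwise term is rewritten as
#   0.5 * sum_f [ (sum_i v_if x_i)^2 - sum_i (v_if x_i)^2 ]
# which costs O(k*n) instead of O(n^2).

def fm_predict(x, w0, w, V):
    linear = w0 + sum(wi * xi for wi, xi in zip(w, x))
    k = len(V[0])
    interaction = 0.0
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(len(x)))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(len(x)))
        interaction += 0.5 * (s * s - s_sq)
    return linear + interaction
```

Because each feature only stores a k-dimensional factor vector, interactions between features that never co-occur in training can still be estimated, which is what makes the model work on high-dimensional sparse data.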
Image classification algorithm
is a supervised learning algorithm that supports multi-label classification.
takes an image as input and outputs one or more labels
uses a convolutional neural network (ResNet) that can be trained from scratch or trained using transfer learning when a large number of training images are not available.
the recommended input format is Apache MXNet RecordIO; raw images in .jpg or .png format are also supported.
IP Insights algorithm
is an unsupervised learning algorithm that learns the usage patterns for IPv4 addresses.
designed to capture associations between IPv4 addresses and various entities, such as user IDs or account numbers.
K-means algorithm
is an unsupervised learning algorithm for clustering.
attempts to find discrete groupings within data, where members of a group are as similar as possible to one another and as different as possible from members of other groups
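The grouping described above can be sketched with Lloyd's algorithm on 1-D points (illustrative only; SageMaker's implementation is a scalable streaming variant): points are assigned to their nearest center, then each center moves to the mean of its group, and the two steps repeat.

```python
# Minimal k-means (Lloyd's algorithm) on 1-D points.

def kmeans_1d(points, centers, iters=10):
    for _ in range(iters):
        # Assignment step: each point joins its nearest center's group.
        groups = [[] for _ in centers]
        for p in points:
            idx = min(range(len(centers)), key=lambda c: abs(p - centers[c]))
            groups[idx].append(p)
        # Update step: each center moves to the mean of its group.
        centers = [sum(g) / len(g) if g else centers[i]
                   for i, g in enumerate(groups)]
    return centers

# e.g. kmeans_1d([1.0, 1.2, 0.8, 9.0, 9.5, 8.5], [0.0, 5.0])
# pulls the two centers toward the two obvious clusters.
```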
K-nearest neighbors (k-NN) algorithm
is an index-based algorithm.
uses a non-parametric method for classification or regression.
For classification problems, the algorithm queries the k points that are closest to the sample point and returns the most frequent label among them as the predicted label.
For regression problems, the algorithm queries the k closest points to the sample point and returns the average of their target values as the predicted value.
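Both query modes above reduce to the same neighbor lookup followed by a different aggregation. A pure-Python sketch (illustrative; not SageMaker's index-based implementation):

```python
from collections import Counter

def knn_query(train, query, k):
    """train: list of (feature_vector, label_or_target) pairs.
    Returns the k training points nearest to the query (squared
    Euclidean distance)."""
    dist = lambda a, b: sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return sorted(train, key=lambda t: dist(t[0], query))[:k]

def knn_classify(train, query, k):
    # Most frequent label among the k nearest neighbors.
    labels = [label for _, label in knn_query(train, query, k)]
    return Counter(labels).most_common(1)[0][0]

def knn_regress(train, query, k):
    # Average target value of the k nearest neighbors.
    targets = [t for _, t in knn_query(train, query, k)]
    return sum(targets) / k
```

The method is non-parametric because nothing is fit at training time beyond building the index; all the work happens at query time against the stored points.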
Latent Dirichlet Allocation (LDA) algorithm
is an unsupervised learning algorithm that attempts to describe a set of observations as a mixture of distinct categories.
used to discover a user-specified number of topics shared by documents within a text corpus.
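The "mixture of distinct categories" view is easiest to see through LDA's generative story, sketched below with toy numbers (not a trained model, and not SageMaker's inference code): each document has its own mixture over topics, each topic is a fixed distribution over words, and every word is produced by first drawing a topic, then drawing a word from it.

```python
import random

# Toy topic-word distributions (assumed for illustration).
topics = {
    "sports": {"game": 0.5, "team": 0.4, "win": 0.1},
    "finance": {"stock": 0.6, "market": 0.3, "win": 0.1},
}

def generate_doc(topic_mixture, n_words, rng):
    """Generate a document from LDA's generative process:
    draw a topic from the document's mixture, then a word from
    that topic, once per word position."""
    doc = []
    for _ in range(n_words):
        topic = rng.choices(list(topic_mixture),
                            weights=list(topic_mixture.values()))[0]
        words = topics[topic]
        doc.append(rng.choices(list(words), weights=list(words.values()))[0])
    return doc

rng = random.Random(0)
doc = generate_doc({"sports": 0.7, "finance": 0.3}, 20, rng)
```

Training runs this story in reverse: given only the documents, LDA infers the topic-word distributions and each document's topic mixture.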
Linear learner algorithm
is a supervised learning algorithm used for solving either classification or regression problems.
Neural Topic Model (NTM) Algorithm
is an unsupervised learning algorithm that is used to organize a corpus of documents into topics that contain word groupings based on their statistical distribution.
Topic modeling can be used to classify or summarize documents based on the topics detected or to retrieve information or recommend content based on topic similarities.
Object2Vec algorithm
is a general-purpose neural embedding algorithm that is highly customizable.
can learn low-dimensional dense embeddings of high-dimensional objects.
Object Detection algorithm
detects and classifies objects in images using a single deep neural network.
is a supervised learning algorithm that takes images as input and identifies all instances of objects within the image scene.
Principal Component Analysis
is an unsupervised machine learning algorithm that attempts to reduce the dimensionality (number of features) within a dataset while still retaining as much information as possible.
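The "retaining as much information as possible" means projecting onto the directions of maximum variance. A sketch of the idea on 2-D data, using the closed form for a 2×2 covariance matrix (real implementations use SVD and handle arbitrary dimensions):

```python
import math

def pca_2d_leading_component(points):
    """Unit eigenvector of the 2x2 covariance matrix with the
    largest eigenvalue - the direction of maximum variance."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Covariance matrix [[a, b], [b, c]], centered on the mean.
    a = sum((p[0] - mx) ** 2 for p in points) / n
    c = sum((p[1] - my) ** 2 for p in points) / n
    b = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    # Largest eigenvalue of a symmetric 2x2 matrix (closed form).
    lam = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    # Corresponding eigenvector.
    if abs(b) > 1e-12:
        vx, vy = b, lam - a
    elif a >= c:
        vx, vy = 1.0, 0.0
    else:
        vx, vy = 0.0, 1.0
    norm = math.hypot(vx, vy)
    return (vx / norm, vy / norm)
```

Projecting each point onto this vector reduces two features to one while keeping most of the dataset's variance.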
Random Cut Forest (RCF)
is an unsupervised algorithm for detecting anomalous data points within a data set.
Semantic segmentation algorithm
provides a fine-grained, pixel-level approach to developing computer vision applications.
SageMaker Sequence to Sequence (seq2seq)
is a supervised learning algorithm where the input is a sequence of tokens (for example, text, audio) and the output generated is another sequence of tokens.
key use cases are machine translation (input a sentence in one language and predict that sentence in another language), text summarization (input a longer string of words and predict a shorter string that is a summary), and speech-to-text (audio clips converted into output sentences in tokens).
XGBoost (eXtreme Gradient Boosting)
is a popular and efficient open-source implementation of the gradient boosted trees algorithm.
Gradient boosting is a supervised learning algorithm that attempts to accurately predict a target variable by combining an ensemble of estimates from a set of simpler, weaker models.
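The ensemble-of-weaker-models idea can be sketched for squared-error regression on 1-D inputs (illustrative only; XGBoost adds regularization, second-order gradients, column subsampling, and much more). Each round fits a weak learner - here a decision stump - to the current residuals, so every new model corrects what the ensemble so far gets wrong.

```python
def fit_stump(xs, residuals):
    """Best single-split predictor over candidate thresholds."""
    best = None
    for t in xs:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lv = sum(left) / len(left) if left else 0.0
        rv = sum(right) / len(right) if right else 0.0
        err = sum((r - (lv if x <= t else rv)) ** 2
                  for x, r in zip(xs, residuals))
        if best is None or err < best[0]:
            best = (err, t, lv, rv)
    _, t, lv, rv = best
    return lambda x: lv if x <= t else rv

def boost(xs, ys, rounds=20, lr=0.5):
    """Gradient boosting for squared error: for this loss the
    negative gradient is just the residual, so each round fits a
    stump to the residuals and adds it with a learning rate."""
    stumps = []
    pred = [0.0] * len(xs)
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, pred)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        pred = [p + lr * stump(x) for p, x in zip(pred, xs)]
    return lambda x: sum(lr * s(x) for s in stumps)

model = boost([0, 1, 2, 3, 4, 5], [0, 0, 0, 10, 10, 10])
```

On this step-function target the ensemble converges geometrically toward the true values as rounds accumulate, which is the behavior the "combining weaker models" description refers to.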