MLlib (DataFrame-based) for Spark Connect
Warning
The namespace for this package may change in a future Spark version.
Pipeline APIs
Abstract class for transformers that transform one dataset into another.
Abstract class for estimators that fit models to data.
Abstract class for models that are fitted by estimators.
Base class for evaluators that compute metrics from predictions.
A simple pipeline, which acts as an estimator.
Represents a compiled pipeline with transformers and fitted models.
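The relationship between these abstractions can be sketched in plain Python. This is a minimal illustration, not the Spark Connect implementation: the names `Doubler`, `MeanCenterer`, `MeanCentererModel`, `ToyPipeline`, and `ToyPipelineModel` are hypothetical, and a list of floats stands in for a DataFrame. It shows an estimator's `fit` returning a model, and a pipeline fitting its stages in order to produce a compiled pipeline that holds only transformers and fitted models.

```python
from typing import List, Sequence

Dataset = List[float]  # stand-in for a DataFrame (assumption for illustration)


class Doubler:
    """A transformer: maps one dataset to another without any fitting."""

    def transform(self, data: Dataset) -> Dataset:
        return [2.0 * x for x in data]


class MeanCentererModel:
    """A model: a transformer produced by fitting an estimator."""

    def __init__(self, mean: float) -> None:
        self.mean = mean

    def transform(self, data: Dataset) -> Dataset:
        return [x - self.mean for x in data]


class MeanCenterer:
    """An estimator: learns state from data and returns a fitted model."""

    def fit(self, data: Dataset) -> MeanCentererModel:
        return MeanCentererModel(sum(data) / len(data))


class ToyPipelineModel:
    """A compiled pipeline: only transformers and fitted models remain."""

    def __init__(self, stages: Sequence) -> None:
        self.stages = list(stages)

    def transform(self, data: Dataset) -> Dataset:
        for stage in self.stages:
            data = stage.transform(data)
        return data


class ToyPipeline:
    """A pipeline acts as an estimator over an ordered list of stages."""

    def __init__(self, stages: Sequence) -> None:
        self.stages = list(stages)

    def fit(self, data: Dataset) -> ToyPipelineModel:
        fitted = []
        for stage in self.stages:
            # Estimators are fitted on the output of the earlier stages.
            if hasattr(stage, "fit"):
                stage = stage.fit(data)
            fitted.append(stage)
            data = stage.transform(data)
        return ToyPipelineModel(fitted)


pipeline_model = ToyPipeline([Doubler(), MeanCenterer()]).fit([1.0, 2.0, 3.0])
centered = pipeline_model.transform([1.0, 2.0, 3.0])
```

Here `centered` holds the doubled values shifted by their mean. The point of the design is that, once fitted, the compiled pipeline can be applied to new data without refitting any stage.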
Feature
Rescales each feature individually to the range [-1, 1] by dividing by the largest absolute value of that feature.
Model fitted by MaxAbsScaler.
Standardizes features by removing the mean and scaling to unit variance, using column summary statistics computed on the samples in the training set.
Model fitted by StandardScaler.
A feature transformer that merges multiple input columns into a single array-type column.
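The two scaling rules described above can be illustrated with a small plain-Python sketch. It is not the library code: the helper names `max_abs_scale` and `standardize` are hypothetical, rows are lists of floats, the use of the sample standard deviation (n - 1 denominator) is an assumption, and mapping a constant feature to 0.0 is a choice made only for this sketch.

```python
import statistics
from typing import List

Matrix = List[List[float]]  # rows of feature values (illustrative stand-in)


def max_abs_scale(rows: Matrix) -> Matrix:
    """Divide each feature (column) by its largest absolute value."""
    n_features = len(rows[0])
    max_abs = [max(abs(row[j]) for row in rows) for j in range(n_features)]
    # An all-zero feature is left at 0.0 (a choice made for this sketch).
    return [
        [row[j] / max_abs[j] if max_abs[j] else 0.0 for j in range(n_features)]
        for row in rows
    ]


def standardize(rows: Matrix) -> Matrix:
    """Subtract each feature's mean and divide by its standard deviation."""
    n_features = len(rows[0])
    columns = [[row[j] for row in rows] for j in range(n_features)]
    means = [statistics.mean(col) for col in columns]
    # Assumption: sample standard deviation (n - 1 in the denominator).
    stds = [statistics.stdev(col) for col in columns]
    return [
        [(row[j] - means[j]) / stds[j] if stds[j] else 0.0 for j in range(n_features)]
        for row in rows
    ]


scaled = max_abs_scale([[1.0, -8.0], [2.0, 4.0], [-4.0, 0.0]])
standardized = standardize([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
```

In both cases the statistics (maximum absolute value, mean, standard deviation) are what a fitted model would store, so that the same scaling can later be applied to unseen data.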
Classification
Logistic regression estimator.
Model fitted by LogisticRegression.
Functions
Converts a column of numeric arrays into a column of pyspark.ml.linalg.DenseVector instances.
Converts a column of MLlib sparse/dense vectors into a column of dense arrays.
Tuning
K-fold cross validation performs model selection by splitting the dataset into a set of non-overlapping, randomly partitioned folds, which are used as separate training and test datasets. For example, with k=3 folds, K-fold cross validation generates 3 (training, test) dataset pairs, each of which uses 2/3 of the data for training and 1/3 for testing.
CrossValidatorModel contains the model with the highest average cross-validation metric across folds and uses this model to transform input data.
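The fold arithmetic and the selection rule can be sketched in plain Python. This is only an illustration under stated assumptions: the helpers `k_fold_splits` and `best_by_average` are hypothetical, a list stands in for the dataset, folds are made exactly equal in size (an actual implementation may partition rows differently), and a larger metric is assumed to be better.

```python
import random
from typing import Dict, List, Sequence, Tuple


def k_fold_splits(rows: Sequence, k: int, seed: int = 0) -> List[Tuple[list, list]]:
    """Return k (training, test) pairs built from non-overlapping random folds."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds; no index appears in two folds.
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for fold in folds:
        held_out = set(fold)
        train = [rows[j] for j in indices if j not in held_out]
        test = [rows[j] for j in fold]
        splits.append((train, test))
    return splits


def best_by_average(fold_metrics: Dict[str, List[float]]) -> str:
    """Pick the candidate with the highest metric averaged across folds."""
    return max(fold_metrics, key=lambda name: sum(fold_metrics[name]) / len(fold_metrics[name]))


# With 9 rows and k=3, each pair trains on 6 rows (2/3) and tests on 3 (1/3).
splits = k_fold_splits(list(range(9)), k=3)

# Hypothetical per-fold metrics for two candidate parameter settings.
best = best_by_average({
    "candidate_a": [0.80, 0.90, 0.70],
    "candidate_b": [0.85, 0.90, 0.80],
})
```

Every row is used for testing exactly once across the three pairs, which is why the averaged metric gives a less split-dependent estimate than a single train/test split.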
Evaluation
Evaluator for regression, which expects input columns prediction and label.
Evaluator for binary classification, which expects input columns prediction and label.
Evaluator for multiclass classification, which expects input columns prediction and label.
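As a rough illustration of what computing a metric from prediction and label columns means, the sketch below uses plain Python on (prediction, label) pairs. The helper names `rmse` and `accuracy` are hypothetical, and the choice of these two metrics is an assumption made for the example; the descriptions above do not state which metrics each evaluator supports or uses by default.

```python
import math
from typing import Sequence, Tuple

Pair = Tuple[float, float]  # (prediction, label), mirroring the two input columns


def rmse(pairs: Sequence[Pair]) -> float:
    """Root mean squared error between predictions and labels."""
    return math.sqrt(sum((p - y) ** 2 for p, y in pairs) / len(pairs))


def accuracy(pairs: Sequence[Pair]) -> float:
    """Fraction of rows whose predicted class equals the label."""
    return sum(1 for p, y in pairs if p == y) / len(pairs)


regression_score = rmse([(3.0, 1.0), (1.0, 3.0)])
classification_score = accuracy([(1.0, 1.0), (0.0, 1.0), (2.0, 2.0), (0.0, 0.0)])
```

Note that the two metrics point in opposite directions: a lower RMSE is better, while a higher accuracy is better, so any model selection built on an evaluator needs to know which direction applies.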
Utilities
The base interface that Estimator, Transformer, Model, and Evaluator classes must inherit to support saving and loading.
Meta-algorithms such as pipelines and cross validators must implement this interface.