Getting Started

K8s Install

If you are running in a Jupyter Notebook inside the Splice Machine Cloud Service, MLManager is already installed for you. To install or upgrade it yourself, run

[sudo] pip install [--upgrade] splicemachine
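To confirm which version pip actually installed (for example, after an upgrade), you can read the package metadata from Python. This is a minimal sketch that uses only the standard library's importlib.metadata (Python 3.8+). The helper name installed_version is our own, not part of the splicemachine API.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "splicemachine") -> Optional[str]:
    """Return the installed version string of `package`, or None if it is absent.

    installed_version is a hypothetical helper for illustration; it only reads
    the distribution metadata that pip writes at install time.
    """
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(f"splicemachine {found}" if found else "splicemachine is not installed")
```

Run it with the same Python interpreter your notebook or application uses, since each environment has its own set of installed packages.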

External Installation

To install outside of the K8s cluster (for use with the ExtPySpliceContext), install the stable build from PyPI with

[sudo] pip install [--upgrade] splicemachine

Package Extras

The splicemachine PyPI package has two extras, stats and notebook. They add the dependencies needed for the built-in ML/statistics functionality and for Jupyter-specific features (such as Feature Store feature search), respectively.

To install them, use pip's standard extras syntax. To get both (recommended), install the all extra:

[sudo] pip install [--upgrade] splicemachine[all]
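Extras are recorded in a package's metadata as requirements carrying an extra == "..." environment marker. The sketch below groups such requirement strings by extra, so you can see what each extra would pull in. On a machine where splicemachine is installed, you could feed it importlib.metadata.requires("splicemachine"). The sample requirement strings in the code are made up for illustration and are not splicemachine's actual dependency list; group_by_extra is a hypothetical helper.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Optional

# Matches the extra name in an environment marker such as: extra == "stats"
_EXTRA_MARKER = re.compile(r"""extra\s*==\s*["']([^"']+)["']""")


def group_by_extra(requires: Iterable[str]) -> Dict[Optional[str], List[str]]:
    """Group requirement strings by the extra that enables them.

    Requirements without an extra marker are always installed and are
    grouped under the key None.
    """
    groups: Dict[Optional[str], List[str]] = defaultdict(list)
    for req in requires:
        # "name>=1.0; marker" -> spec "name>=1.0", marker " marker"
        spec, _, marker = req.partition(";")
        match = _EXTRA_MARKER.search(marker)
        groups[match.group(1) if match else None].append(spec.strip())
    return dict(groups)


if __name__ == "__main__":
    # Made-up requirement strings for illustration only.
    sample = [
        "requests",
        'scipy; extra == "stats"',
        'ipywidgets; extra == "notebook"',
    ]
    print(group_by_extra(sample))
```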

If you are using zsh, you must escape the square brackets, because zsh otherwise treats them as a glob pattern:

[sudo] pip install [--upgrade] splicemachine\[all\]
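If you script the installation from Python instead, passing the arguments as a list to subprocess bypasses the shell entirely, so no escaping is needed. When you do need a shell command string, the standard library's shlex.join quotes each argument safely. The helper pip_install_args below is a hypothetical sketch, not part of the splicemachine package.

```python
import shlex
import sys
from typing import List, Sequence


def pip_install_args(
    package: str = "splicemachine",
    extras: Sequence[str] = ("all",),
    upgrade: bool = True,
) -> List[str]:
    """Build a pip argument list for installing `package` with optional extras."""
    spec = f"{package}[{','.join(extras)}]" if extras else package
    # Use the current interpreter's pip so the package lands in this environment.
    args = [sys.executable, "-m", "pip", "install"]
    if upgrade:
        args.append("--upgrade")
    args.append(spec)
    return args


if __name__ == "__main__":
    args = pip_install_args()
    # shlex.join wraps "splicemachine[all]" in single quotes, so zsh will not glob it.
    print(shlex.join(args))
    # To actually install, run the list directly (no shell involved):
    # subprocess.run(args, check=True)
```

Single-quoting ('splicemachine[all]') works just as well as backslash-escaping when you type the command by hand.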

Usage

This section covers importing and instantiating the Native Spark DataSource, along with the Feature Store and the MLflow integration.

To use the Native Spark DataSource inside the `cloud service <https://cloud.splicemachine.io/register?utm_source=pydocs&utm_medium=header&utm_campaign=sandbox>`_, first create a Spark Session and then pass it to a PySpliceContext:

from pyspark.sql import SparkSession
from splicemachine.spark import PySpliceContext
from splicemachine.mlflow_support import * # Connects your MLflow session automatically
from splicemachine.features import FeatureStore # Splice Machine Feature Store

spark = SparkSession.builder.getOrCreate()
splice = PySpliceContext(spark) # The Native Spark DataSource (PySpliceContext) takes a Spark Session
fs = FeatureStore(splice) # Create your Feature Store
mlflow.register_splice_context(splice) # Gives MLflow a native database connection
mlflow.register_feature_store(fs) # Tracks Feature Store work in MLflow automatically
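Once the context exists, a quick sanity check is to pull a few rows back as a Spark DataFrame. This is a minimal sketch assuming that PySpliceContext exposes a df(sql) method that runs a SQL query and returns a Spark DataFrame (please confirm against the API reference for your version). The helper preview_table and its identifier check are hypothetical additions for illustration, and the query uses the standard FETCH FIRST n ROWS ONLY clause.

```python
import re

# Simple identifier check so caller-supplied names cannot inject arbitrary SQL.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def preview_table(splice, schema: str, table: str, limit: int = 10):
    """Return the first `limit` rows of schema.table via splice.df(sql).

    Assumes `splice` is a PySpliceContext (or any object) whose df(sql)
    method runs the query and returns a Spark DataFrame.
    """
    for name in (schema, table):
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"invalid SQL identifier: {name!r}")
    if limit < 1:
        raise ValueError("limit must be positive")
    sql = f"SELECT * FROM {schema}.{table} FETCH FIRST {int(limit)} ROWS ONLY"
    return splice.df(sql)
```

For example, preview_table(splice, "SYS", "SYSTABLES").show() would print a few rows of the system catalog, assuming your user can read it.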