hip-data-ml-utils

This Python library packages the common utilities that data/ML projects use.

hip-data-ml-utils has a few utilities that we try to generalise across projects.


Why are we packaging this into a Python library?

Several utilities are copied and pasted across different repositories (and projects); packaging them lets us replace those copies with a single package import.

This also saves time, since each project needs to write fewer tests.

Additionally, this aligns how analysts query Athena through Python.

Getting Started

You can install or upgrade hip-data-ml-utils with pip:

$ pip install hip-data-ml-utils --upgrade

Pyathena

We try to keep each function call as simple as possible, ideally a one-liner. This covers:

query Athena tables and return the results as a pandas DataFrame
drop Athena tables from the offline feature store
create Athena tables from a schema, and update tables with missing partitions
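
As a rough illustration of the drop and partition-update operations above, the sketch below builds the Athena statements involved (`DROP TABLE IF EXISTS` and `MSCK REPAIR TABLE`). The helper names `drop_table_sql` and `repair_table_sql` are hypothetical and are not the hip-data-ml-utils API. The sketch also assumes the database is already selected on the connection, so only the table name is passed.

```python
import re

# Hypothetical helpers for illustration only; not the hip-data-ml-utils API.
# Assumption: table names are plain identifiers (letters, digits, underscores).
_IDENTIFIER = re.compile(r"^[A-Za-z][A-Za-z0-9_]*$")


def _check_identifier(name: str) -> str:
    """Reject anything that is not a plain identifier, to avoid SQL injection."""
    if not _IDENTIFIER.match(name):
        raise ValueError(f"invalid identifier: {name!r}")
    return name


def drop_table_sql(table: str) -> str:
    """Build the statement that drops a table if it exists."""
    return f"DROP TABLE IF EXISTS {_check_identifier(table)}"


def repair_table_sql(table: str) -> str:
    """Build the statement that registers partitions missing from the catalog."""
    return f"MSCK REPAIR TABLE {_check_identifier(table)}"
```

The resulting strings can then be executed through an Athena client, for example a PyAthena cursor.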

See Pyathena client for more details.

MLflow tracker

We try to keep each function call as simple as possible, ideally a one-liner. This covers:

log artifact
log and register a model
log params
log metrics
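
To give a feel for what a one-liner over these logging steps might look like, here is a minimal sketch. The function `log_run` and the class `RecordingTracker` are hypothetical names introduced for illustration and are not the hip-data-ml-utils API. The sketch assumes the `tracker` argument exposes `start_run`, `log_params`, `log_metrics` and `log_artifact`, which the `mlflow` module does; the in-memory stand-in only exists so the example runs without an MLflow server.

```python
from contextlib import contextmanager
from typing import Any, Dict, Iterator, List, Mapping, Optional


def log_run(
    tracker: Any,
    params: Mapping[str, Any],
    metrics: Mapping[str, float],
    artifact_path: Optional[str] = None,
) -> None:
    """Log params, metrics and optionally one artifact inside a single run."""
    with tracker.start_run():
        tracker.log_params(dict(params))
        tracker.log_metrics(dict(metrics))
        if artifact_path is not None:
            tracker.log_artifact(artifact_path)


class RecordingTracker:
    """Hypothetical in-memory stand-in for the mlflow module (offline use)."""

    def __init__(self) -> None:
        self.params: Dict[str, Any] = {}
        self.metrics: Dict[str, float] = {}
        self.artifacts: List[str] = []

    @contextmanager
    def start_run(self) -> Iterator[None]:
        # A real tracker would open and close a run here.
        yield

    def log_params(self, params: Mapping[str, Any]) -> None:
        self.params.update(params)

    def log_metrics(self, metrics: Mapping[str, float]) -> None:
        self.metrics.update(metrics)

    def log_artifact(self, local_path: str) -> None:
        self.artifacts.append(local_path)


tracker = RecordingTracker()
log_run(tracker, {"max_depth": 5}, {"auc": 0.91}, artifact_path="model_card.md")
```

Against a real tracking server, the same call would pass the imported `mlflow` module in place of `tracker`.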

See MLflow tracker utils for more details.

MLflow utils

We try to keep each function call as simple as possible, ideally a one-liner. This covers:

load model
load artifact
get MLflow model evaluation metrics
get registered model run info and MLflow run_id
promote an MLflow model
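
Loading a model or an artifact in MLflow starts from a URI, so a small sketch of how those URIs are formed may help. The helper names below are hypothetical and are not the hip-data-ml-utils API; the `models:/` and `runs:/` schemes themselves are MLflow's own.

```python
def registered_model_uri(name: str, version_or_stage: str) -> str:
    """URI of a registered model, addressed by version number or stage name."""
    return f"models:/{name}/{version_or_stage}"


def run_artifact_uri(run_id: str, artifact_path: str) -> str:
    """URI of an artifact logged under a given run."""
    return f"runs:/{run_id}/{artifact_path.lstrip('/')}"
```

Either URI can be passed to `mlflow.pyfunc.load_model` when it points at a logged model.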

See MLflow utils for more details.

MLflow serve

We try to keep each function call as simple as possible, ideally a one-liner. This covers:

enable model endpoint
get endpoint status
get endpoint state status
update Databricks model endpoint compute config
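
After enabling or updating an endpoint, a caller typically polls its status until it is ready. The sketch below shows one generic way to do that. `wait_for_state` is a hypothetical helper, not the hip-data-ml-utils API, and the state names `"PENDING"` and `"READY"` are placeholders: the real values depend on the serving API in use.

```python
import time
from typing import Callable


def wait_for_state(
    fetch_state: Callable[[], str],
    target: str,
    timeout_s: float = 600.0,
    interval_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll fetch_state until it returns target; False if the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if fetch_state() == target:
            return True
        if time.monotonic() >= deadline:
            return False
        # Wait before the next status check to avoid hammering the API.
        sleep(interval_s)
```

Here `fetch_state` would wrap whatever call returns the endpoint's current state, and `sleep` is injectable so the loop can be tested without waiting.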

See MLflow serve for more details.

MLflow prediction requests

We try to keep each function call as simple as possible, ideally a one-liner. This covers:

verify that predictions returned for requests match the expected values
post requests for integration tests
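
The two steps above can be sketched as follows: build a request body, then compare the returned predictions with the expected ones. Both function names are hypothetical and are not the hip-data-ml-utils API. The payload shape assumes the endpoint accepts MLflow's `dataframe_split` JSON input format, and the tolerance value is an arbitrary choice for illustration.

```python
import math
from typing import Any, Dict, List, Sequence


def build_payload(columns: Sequence[str], rows: Sequence[Sequence[Any]]) -> Dict[str, Any]:
    """Build a request body in MLflow's dataframe_split input format."""
    return {
        "dataframe_split": {
            "columns": list(columns),
            "data": [list(row) for row in rows],
        }
    }


def predictions_match(
    actual: Sequence[float],
    expected: Sequence[float],
    abs_tol: float = 1e-6,
) -> bool:
    """True when both sequences have equal length and agree within abs_tol."""
    if len(actual) != len(expected):
        return False
    return all(math.isclose(a, e, abs_tol=abs_tol) for a, e in zip(actual, expected))
```

Comparing with a tolerance rather than `==` avoids spurious failures from floating-point rounding between environments.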

See MLflow prediction requests for more details.
