
pyglmnet

A python implementation of elastic-net regularized generalized linear models


[Documentation (stable version)] | [Documentation (development version)]

https://user-images.githubusercontent.com/15852194/67919367-70482600-fb76-11e9-9b86-891969bd2bee.jpg

  • Pyglmnet provides a wide range of noise models (and paired canonical link functions): 'gaussian', 'binomial', 'probit', 'gamma', 'poisson', and 'softplus'.
  • It supports a wide range of regularizers: ridge, lasso, elastic net, group lasso, and Tikhonov regularization (a configuration sketch follows this list).
  • We have implemented a cyclical coordinate descent optimizer with Newton updates, active sets, update caching, and warm restarts. This optimization approach is identical to the one used in the R glmnet package.
  • A number of Python wrappers exist for the R glmnet package (e.g. here and here), but in contrast to these, Pyglmnet is a pure Python implementation. Therefore, it is easy to modify and to introduce additional noise models and regularizers in the future.
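
For illustration, here is a minimal sketch (not part of the original examples) of how these modeling choices map onto the GLM constructor. The distr, alpha, group, and Tau parameters follow the library's documented options, but check the API reference for exact defaults and solver settings.

import numpy as np
from pyglmnet import GLM

n_features = 100

# Elastic net: alpha mixes the L1 (lasso) and L2 (ridge) penalties; alpha=1.0 is pure lasso.
glm_enet = GLM(distr='poisson', alpha=0.5, reg_lambda=0.05)

# Group lasso: one integer group label per feature.
groups = np.repeat(np.arange(10), n_features // 10)
glm_group = GLM(distr='binomial', alpha=1.0, group=groups, reg_lambda=0.05)

# Tikhonov regularization: Tau is an (n_features x n_features) penalty matrix.
Tau = np.eye(n_features)
glm_tikhonov = GLM(distr='gaussian', alpha=0.0, Tau=Tau, reg_lambda=0.05)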

Installation

Install the stable PyPI version with pip:

$ pip install pyglmnet

For the bleeding edge development version, install the latest snapshot directly from GitHub (or clone the repository, as sketched below):

$ pip install https://api.github.com/repos/glm-tools/pyglmnet/zipball/master
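
Alternatively, you can clone the repository and install it in editable mode. This is a standard pip workflow rather than a documented command set; the clone URL is inferred from the repository path used above.

$ git clone https://github.com/glm-tools/pyglmnet.git
$ cd pyglmnet
$ pip install -e .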

Getting Started

Here is an example of how to use the GLM estimator.

import numpy as np
import scipy.sparse as sps

import matplotlib.pyplot as plt
from pyglmnet import GLM, simulate_glm

n_samples, n_features = 1000, 100
distr = 'poisson'

# sample a sparse model
np.random.seed(42)
beta0 = np.random.rand()
beta = sps.random(1, n_features, density=0.2).toarray()[0]

# simulate data from the sparse model
Xtrain = np.random.normal(0.0, 1.0, [n_samples, n_features])
ytrain = simulate_glm(distr, beta0, beta, Xtrain)
Xtest = np.random.normal(0.0, 1.0, [n_samples, n_features])
ytest = simulate_glm(distr, beta0, beta, Xtest)

# create an instance of the GLM class
glm = GLM(distr=distr, score_metric='pseudo_R2', reg_lambda=0.01)

# fit the model on the training data
glm.fit(Xtrain, ytrain)

# predict using fitted model on the test data
yhat = glm.predict(Xtest)

# score the model on test data
pseudo_R2 = glm.score(Xtest, ytest)
print('Pseudo R^2 is %.3f' % pseudo_R2)

# plot the true coefficients and the estimated ones
plt.stem(beta, markerfmt='r.', label='True coefficients')
plt.stem(glm.beta_, markerfmt='b.', label='Estimated coefficients')
plt.ylabel(r'$\beta$')
plt.legend(loc='upper right')

# plot the true vs predicted label
plt.figure()
plt.plot(ytest, yhat, '.')
plt.xlabel('True labels')
plt.ylabel('Predicted labels')
plt.plot([0, ytest.max()], [0, ytest.max()], 'r--')
plt.show()
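
To choose reg_lambda by cross-validation instead of fixing it, pyglmnet also provides a GLMCV estimator. The snippet below continues from the example above and is a hedged sketch: the cv parameter, the list of reg_lambda values, and the fitted reg_lambda_opt_ attribute are assumed to behave as in the project's example gallery.

from pyglmnet import GLMCV

# cross-validate over a small grid of regularization strengths
glm_cv = GLMCV(distr=distr, score_metric='pseudo_R2',
               reg_lambda=[0.1, 0.05, 0.01], cv=3)
glm_cv.fit(Xtrain, ytrain)

print('Best reg_lambda: %.3f' % glm_cv.reg_lambda_opt_)
print('Test pseudo R^2: %.3f' % glm_cv.score(Xtest, ytest))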

More pyglmnet examples and use cases.

Tutorial

Here is an extensive tutorial on GLMs, optimization and pseudo-code.

Here are slides from a talk at PyData Chicago 2016, along with the corresponding tutorial notebooks and a video.

How to contribute?

We welcome pull requests. Please see our developer documentation page for more details.

Citation

If you use the pyglmnet package in your publication, please cite our JOSS paper using the following BibTeX entry:

@article{Jas2020,
doi = {10.21105/joss.01959},
url = {https://doi.org/10.21105/joss.01959},
year = {2020},
publisher = {The Open Journal},
volume = {5},
number = {47},
pages = {1959},
author = {Mainak Jas and Titipat Achakulvisut and Aid Idrizović
          and Daniel Acuna and Matthew Antalek and Vinicius Marques
          and Tommy Odland and Ravi Garg and Mayank Agrawal
          and Yu Umegaki and Peter Foley and Hugo Fernandes
          and Drew Harris and Beibin Li and Olivier Pieters
          and Scott Otterson and Giovanni De Toni and Chris Rodgers
          and Eva Dyer and Matti Hamalainen and Konrad Kording and Pavan Ramkumar},
title = {{P}yglmnet: {P}ython implementation of elastic-net regularized generalized linear models},
journal = {Journal of Open Source Software}
}

Acknowledgments

License

MIT License Copyright (c) 2016-2019 Pavan Ramkumar