Changelog

v3.9 (development)

API Changes

New Features

  • Generalized `KappaAngleFeaturizer` to compute the angles between arbitrarily offset CA atoms.
  • Added new featurizer `FeatureSlicer`. `FeatureSlicer` can slice the output of regular featurizer objects to just the required indices (gh-1022).
  • Added functions to compute error bars for transition probabilities to account for finite sampling, and sample transition matrices from these error distributions (i.e. bootstrapping). Located in `msmbuilder.msm.validation.transmat_errorbar` (gh-1010).
  • Added methods for computing the Kullback-Leibler, symmetric KL, and Jensen-Shannon divergences of probability distributions, arrays thereof, or flattened MSM objects. The array and (flattened) MSM metrics are compatible with the custom distance function in `LandmarkAgglomerative` (gh-1035).
  • Added minimum variance cluster analysis (MVCA) for macrostating to msmbuilder.lumping (gh-1045).
  • Added the Bayesian agglomerative clustering engine (BACE) for macrostating to msmbuilder.lumping (gh-1050).
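
For readers unfamiliar with the divergences listed above, the standard textbook definitions can be sketched in plain Python. This is a minimal illustration, not the msmbuilder implementation: the function names `kl_divergence`, `symmetric_kl`, and `js_divergence` are hypothetical, natural logarithms are assumed, and the symmetric KL is assumed to be the sum of the two directed divergences (some references use the average instead).

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q), in nats, for discrete distributions.

    Assumes q[i] > 0 wherever p[i] > 0; otherwise the divergence is infinite
    and this sketch raises ZeroDivisionError.
    """
    # Terms with p[i] == 0 contribute nothing (0 * log 0 is taken as 0).
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def symmetric_kl(p, q):
    """Symmetrized KL divergence (assumed here to be the sum of both directions)."""
    return kl_divergence(p, q) + kl_divergence(q, p)


def js_divergence(p, q):
    """Jensen-Shannon divergence: mean KL divergence to the midpoint distribution."""
    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Example: compare two rows of a hypothetical transition matrix.
row_a = [0.7, 0.2, 0.1]
row_b = [0.5, 0.3, 0.2]
print(kl_divergence(row_a, row_b), symmetric_kl(row_a, row_b), js_divergence(row_a, row_b))
```

Unlike the plain KL divergence, the symmetric KL and Jensen-Shannon variants give the same value regardless of argument order, which is usually what a clustering distance function needs.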

Improvements

  • `SparseTICA` now supports commute mapping.
  • `FeatureSelector` is now compatible with the Tree-structured Parzen Estimator method in Osprey (gh-1018).
  • Fixed bug in `from_msm` method for `PCCA` and `PCCAPlus` which now allows a `PCCAPlus` objective function to be specified (gh-1036).
  • `msmbuilder.io.sampling.sample_dimension` with `scheme='edge'` now works properly (#1043).
  • Changed zippy_maker code so that `Featurizer.describe_features` will return ordered unique lists to make reading and subselecting features easier.

v3.8 (April 26, 2017)

We’re pleased to announce the release of MSMBuilder 3.8. This release features updates and improvements to contact featurizers, kernel tICA, HMMs, and preprocessing. There are also some bugfixes and API hygiene improvements. We recommend all users upgrade to MSMBuilder 3.8.

API Changes

New Features

  • ContactFeaturizer now lets you use a soft_min option for closest contact distances.

Improvements

  • The stride parameter in KernelTICA now works as intended to automatically generate a set of landmark points (gh-972).
  • The contacts parameter in CommonContactFeaturizer now performs as the contacts method in regular ContactFeaturizer albeit after validating all the contacts.
  • GaussianHMM and VonMisesHMM are now compatible with sklearn.pipeline.Pipeline workflows (gh-980).
  • msmbuilder.preprocessing is now compatible with sklearn.pipeline.Pipeline workflows (gh-987).
  • Fixed error in pickling HMMs (gh-996).

v3.7 (January 26, 2017)

We’re pleased to announce the release of MSMBuilder 3.7. This release introduces several new featurizers that can handle multiple sequences or multiple chains within a topology file. There are also some bugfixes and API hygiene improvements. We recommend all users upgrade to MSMBuilder 3.7.

API Changes

  • TrajFeatureUnion and SubsetFeatureUnion have been removed due to incompatibilities with the scikit-learn API.

New Features

  • KSparseTICA lets you specify the number of non-zero entries, k, rather than a regularization strength (gh-916).
  • BootStrapMarkovStateModel optionally saves all the models that it generates (gh-919).
  • tICA supports commute mapping (see 10.1021/acs.jctc.6b00762) (gh-925).
  • CommonContactFeaturizer featurizes different trajectories with different topologies using a common set of inter-residue contacts (gh-876).
  • msmbuilder.tpt.mfpt.mfpts can now compute distributions of MFPTs, accounting for the model error due to finite sampling.
  • Three new featurization schemes for protein-ligand trajectories are now available: LigandContactFeaturizer, BinaryLigandContactFeaturizer, and LigandRMSDFeaturizer (gh-883).
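
As background for the commute mapping entry above: in the cited paper, each tICA coordinate is rescaled by sqrt(t_i / 2), where t_i = -lag_time / ln|lambda_i| is the implied timescale of the i-th component. The sketch below illustrates that rescaling in plain Python. It is not the msmbuilder code path: the function names `implied_timescale` and `commute_map` are hypothetical, the inputs are made up, and a library implementation may additionally regularize the timescales before scaling.

```python
import math


def implied_timescale(eigenvalue, lag_time):
    """Implied timescale t = -lag_time / ln|eigenvalue| for 0 < |eigenvalue| < 1."""
    return -lag_time / math.log(abs(eigenvalue))


def commute_map(tics, eigenvalues, lag_time):
    """Scale each tICA coordinate by sqrt(t_i / 2) (unregularized sketch).

    tics is a list of frames, each a list with one value per tICA component.
    """
    scales = [math.sqrt(implied_timescale(ev, lag_time) / 2.0) for ev in eigenvalues]
    return [[x * s for x, s in zip(frame, scales)] for frame in tics]


# Hypothetical two-component projection with made-up eigenvalues.
projected = [[1.0, 1.0], [-2.0, 0.5]]
print(commute_map(projected, eigenvalues=[0.9, 0.5], lag_time=1.0))
```

Slow components (eigenvalues close to 1) receive large scale factors, so distances in the mapped space are dominated by the slowest processes.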

Improvements

  • Compatibility with scikit-learn 0.18 (gh-915).
  • FeatureSelector feature order is deterministic (gh-920).
  • SASAFeaturizer supports the describe_features method (gh-913).
  • All LandmarkAgglomerative clusterers now have cluster_centers_ except when metric = rmsd (gh-958).

v3.6 (September 15, 2016)

We’re pleased to announce the release of MSMBuilder 3.6. This release introduces project templating and a whole host of new sklearn estimators. There are also some bugfixes and API hygiene improvements. We recommend all users upgrade to MSMBuilder 3.6.

API Changes

  • version.short_version is now 3.y instead of 3.y.z (gh-829).
  • weighted_transform is no longer supported in tICA methods (gh-807). Please use kinetic_mapping instead.
  • The cached filenames and formats for DoubleWell, QuadWell, and MullerPotential example datasets have changed. The API through msmbuilder.example_datasets is still the same, but the data may be re-generated instead of using a cached version from a previous installation of MSMBuilder (gh-854).
  • The alias for Ward clustering has been removed. Modelers should now use LandmarkAgglomerative(linkage='ward') (gh-874). Ward clustering is also available in AgglomerativeClustering, but without a prediction algorithm.

New Features

  • Butterworth, DoubleEWMA, StandardScaler, RobustScaler are available via the command line (gh-895).
  • BinaryContactFeaturizer featurizes a trajectory into a boolean array corresponding to whether each residue-residue distance is below a cutoff (gh-798).
  • LogisticContactFeaturizer produces a logistic transform of residue-residue distances about a center distance (#798).
  • FactorAnalysis, FastICA, and KernelPCA are available in the decomposition module (gh-807).
  • Butterworth, EWMA, and DoubleEWMA are available in the preprocessing module (gh-818).
  • We encourage users to download the msmb_data conda package to easily install example data. The data can be loaded through existing methods in msmbuilder.example_datasets (gh-854, gh-867).
  • An example dataset MinimalFsPeptide is available. This is a strided version of the existing FsPeptide dataset. We use it for testing, when a fully-converged dataset is not required (gh-867).
  • Project templates! Read the new tutorial or the I/O page for details (gh-768).
  • LandmarkAgglomerative clustering now features the ward linkage option. An algorithm for predicting cluster assignments with the ward objective function has been developed and implemented (gh-874).
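
The difference between the binary and logistic contact transforms described above can be sketched in plain Python. This is only an illustration of the idea, not the featurizers' actual code: the function names, the `steepness` parameter, and the sign convention (scores near 1 for distances shorter than the center) are assumptions, and the distances are made up.

```python
import math


def binary_contacts(distances, cutoff):
    """True where a residue-residue distance is below the cutoff."""
    return [d < cutoff for d in distances]


def logistic_contacts(distances, center, steepness):
    """Smooth contact score in (0, 1), equal to 0.5 at d == center.

    Assumed convention: the score approaches 1 for distances below center.
    """
    return [1.0 / (1.0 + math.exp(steepness * (d - center))) for d in distances]


# Hypothetical residue-residue distances in nanometers.
distances_nm = [0.35, 0.80, 1.50]
print(binary_contacts(distances_nm, cutoff=0.8))
print(logistic_contacts(distances_nm, center=0.8, steepness=20.0))
```

The binary form discards how far a pair is from the cutoff, while the logistic form keeps a continuous signal near the center distance.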

Improvements

  • Remove a unicode character from ktica.py (gh-833)
  • msmbuilder.decomposition.KernelTICA now includes all parameters in its __init__, making it compatible with Osprey (gh-823).
  • msmbuilder.tpt methods can now handle BayesianMarkovStateModels as input. Please note that we still do not recommend using this module with BootStrapMarkovStateModel.

v3.5 (June 14, 2016)

We’re pleased to announce the release of MSMBuilder 3.5. This release wraps more relevant sklearn estimators and transformers. There are also some bugfixes and API hygiene improvements. We recommend all users upgrade to MSMBuilder 3.5.

API Changes

  • msmbuilder.featurizer.FeatureUnion is now deprecated. Please use msmbuilder.feature_selection.FeatureSelector instead (#799).
  • msmbuilder.feature_extraction has been added to conform to the scikit-learn API. This is essentially an alias of msmbuilder.featurizer (#799).

New Features

  • KernelTICA, Nystroem, and LandmarkNystroem are available in the decomposition module (#807).
  • FeatureSelector and VarianceThreshold are available in the feature_selection module (#799).
  • SparsePCA and MiniBatchSparsePCA are available in the decomposition module (#791).
  • Binarizer, FunctionTransformer, Imputer, KernelCenterer, LabelBinarizer, MultiLabelBinarizer, MinMaxScaler, MaxAbsScaler, Normalizer, RobustScaler, StandardScaler, and PolynomialFeatures are available in the preprocessing module (#796).

Improvements

  • Fix a compilation error on gcc 5 (#783)
  • Fix pickle-ing of ContinuousTimeMSM. The optimizer_state_ parameter is not saved (#822).

v3.4 (March 29, 2016)

We’re pleased to announce MSMBuilder 3.4. It contains a plethora of new features, bug fixes, and improvements.

API Changes

  • Range-based slicing on dataset objects is no longer allowed. Keys in the dataset object don’t have to be continuous. The empty slice, e.g. ds[:] loads all trajectories in a list (#610).
  • Ward clustering has been renamed AgglomerativeClustering in scikit-learn. Please use the new msmbuilder wrapper class AgglomerativeClustering. An alias for Ward has been made available (#685).
  • PCCA.trimmed_microstates_to_macrostates has been removed. This dictionary was actually keyed by untrimmed microstate labels. PCCA.transform would throw an exception when operating on a system with trimming because it was using this misleading dictionary. Please use pcca.microstate_mapping_ for this functionality (#709).
  • UnionDataset has been removed after deprecation in 3.3. Please use FeatureUnion instead (#671).
  • SubsetFeaturizer and ilk have been removed from the msmbuilder.featurizer namespace. Please import them from msmbuilder.featurizer.subset (#738).
  • FirstSlicer has been removed. Use Slicer(first=x) for the same functionality (#738).
  • msmbuilder.featurizer.load has been removed. Featurizer.save has been removed. Please use utils.load, utils.dump (#738).

New Features

  • Dataset objects can call fit_transform_with() to simplify the common pattern of applying an estimator to a dataset object to produce a new dataset object (#610).
  • kinetic_mapping is a new option to tICA. It’s similar to weighted_transform, but based on a better theoretical framework. weighted_transform is deprecated (#766).
  • VonMisesFeaturizer uses soft bins around the unit-circle to give an alternate representation of dihedral angles (#744).
  • MarkovStateModel has a partial_transform() method (#707).
  • KappaAngleFeaturizer is available via the command line (#681).
  • MarkovStateModel has a new attribute, percent_retained_, for ergodic trimming (#689).
  • AlphaAngleFeaturizer computes the dihedral angles between alpha carbons (#691).
  • FunctionFeaturizer computes features based on an arbitrary Python function or callable (#717).
  • Automatic State Partitioning (APM) uses kinetic information to cluster conformations (#748).

Improvements

  • Consistent counts setup and ergodic cutoff across various flavors of Markov models (#718, #729, #701, #705).
  • Tests no longer depend on sklearn.hmm, which has been removed (#690).
  • Improvements to RMSDFeaturizer (#695, #764).
  • SparseTICA is completely re-written with large performance improvements when dealing with large numbers of features (#704).
  • Links for downloading example data are un-broken after figshare changed URLs (#751).

v3.3 (August 27, 2015)

We’re pleased to announce the release of MSMBuilder v3.3.0. The focus of this release is a completely re-written module for constructing HMMs as well as bug fixes and incremental improvements.

API Changes

  • FeatureUnion is an estimator that deprecates the functionality of UnionDataset. Passing a list of paths to dataset() will no longer automatically yield a UnionDataset. This behavior is still available by specifying fmt="dir-npy-union", but is deprecated (#611).
  • The command line flag for featurizers --out (deprecated in 3.2) now saves the featurizer as a pickle file (#546). Please use --transformed for the old behavior. This is consistent with other command-line commands.
  • The default number of timescales in MarkovStateModel is now one less than the number of states (was 10). This addresses some bugs with implied_timescales and PCCA(+) (#603).
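
To see why the natural number of timescales is one less than the number of states: an n-state transition matrix has n eigenvalues, but the stationary eigenvalue is exactly 1 and has no finite implied timescale. The sketch below works this out for a two-state model, where the eigenvalues are known in closed form. The function name `two_state_timescales` and the example probabilities are hypothetical, and this is not the MarkovStateModel code.

```python
import math


def two_state_timescales(a, b, lag_time):
    """Finite implied timescales of the two-state model [[1 - a, a], [b, 1 - b]].

    Assumes 0 < a + b < 2 and a + b != 1, so the second eigenvalue is
    non-zero and strictly inside the unit interval in magnitude.
    """
    # Exact eigenvalues of a 2x2 row-stochastic matrix.
    eigenvalues = [1.0, 1.0 - a - b]
    # The stationary eigenvalue (exactly 1) is skipped: ln(1) = 0 gives no finite timescale.
    return [-lag_time / math.log(abs(ev)) for ev in eigenvalues[1:]]


timescales = two_state_timescales(a=0.1, b=0.1, lag_time=1.0)
print(len(timescales), timescales)
```

Two states yield one finite timescale, and the same counting gives n - 1 timescales for n states.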

New Features

  • GaussianHMM and VonMisesHMM have been rewritten to feature higher code reuse and code quality (#583, #582, #584, #572, #570).
  • KDTree can find n nearest points to e.g. a cluster center (#599).
  • Slicer featurizer can slice feature arrays as part of a pipeline (#567).

Improvements

  • PCCAPlus is compatible with scipy 0.16 (#620).
  • Documentation improvements (#618, #608, #604, #602)
  • Test improvements, especially for Windows (#593, #590, #588, #579, #578, #577, #576)
  • Bug fix: MarkovStateModel.sample() produced trajectories of incorrect length. This function is still deprecated (#556).
  • Bug fix: The muller example dataset did not respect users’ specifications for initial coordinates (#631).
  • MarkovStateModel.draw_samples failed if discrete trajectories did not contain every possible state (#638). Function can now accept a single trajectory, as well as a list of them.
  • SuperposeFeaturizer now respects the topology argument when loading the reference trajectory (#555).

v3.2 (April 14, 2015)

  • tICA ignores too-short trajectories during fitting instead of raising an exception
  • New methods for sampling from MSM models
  • Datasets can be opened in “append” mode
  • Compatibility with scipy 0.16
  • utils.dump saves using the pickle protocol. utils.load is backwards compatible.
  • The command line flag for featurizers --out is deprecated. Use --transformed instead. This is consistent with other command-line commands.
  • Bug fixes

v3.1 (Feb 27, 2015)

  • Numerous improvements to ContinuousTimeMSM optimization
  • Switch ContinuousTimeMSM.score to transmat-style GMRQ
  • New example dataset with Muller potential
  • Assorted bug fixes in the command line layer

v3.0.1 (January 9, 2015)

  • Fix missing file on PyPI.

v3.0.0 (January 9, 2015)

MSMBuilder 3.0 is a complete rewrite of our previous work. The focus is on power and extensibility, with a much wider class of estimators and models supported throughout the codebase. All users are encouraged to switch to MSMBuilder 3.0. Pre-release versions of MSMBuilder 3.0 were called mixtape.