Introduction to SmallK

SmallK is a high performance software package for low rank matrix approximation via the nonnegative matrix factorization (NMF). NMF is a low-rank approximation where a matrix is approximated by a product of two nonnegative factors. The role of NMF in data analytics has been as significant as the singular value decomposition (SVD). However, due to nonnegativity constraints, NMF has far superior interpretability of its results for many practical problems such as image processing, chemometrics, bioinformatics, topic modeling for text analytics and many more. Our approach to solving the NMF nonconvex optimization problem has proven convergence properties and is one of the most efficient methods developed to date.

Distributed Versions

Recently open sourced: MPI-FAUN! Both MPI and OPENMP implementations for MU, HALS and ANLS/BPP based NMF algorithms are now available. The implementations can run off the shelf or can be easily integrated into other source code. These are very highly tuned NMF algorithms to work on super computers. We have tested this software in NERSC as well OLCF cluster. The openmp implementation is tested on many different linux variants with intel processors. The library works well for both sparse and dense matrices.

Please visit MPI-FAUN for more information and source code.

Ground truth data for graph clustering and community detection

Community discovery is an important task for revealing structures in large networks. The massive size of contemporary social networks poses a tremendous challenge to the scalability of traditional graph clustering algorithms and the evaluation of discovered communities.

Please visit dblp ground truth data to obtain the data.


This work was funded in part by the DARPA XDATA program under contract FA8750-12-2-0309. Our DARPA program manager is Mr. Wade Shen and our XDATA Principal Investigator is Prof. Haesun Park of the Georgia Institute of Technology.

Copyright and Software License

SmallK is under copyright by the Georgia Institute of Technology, 2017. All source code is released under the Apache 2.0 license.