A Hybrid Parallel Technique for Large-Scale Text Classification with Symmetric ADMM and 1-Factorization
Abstract
Distributed computing has become essential for handling large-scale datasets in machine learning. This study implements two prominent multiclass Support Vector Machine (SVM) algorithms, Crammer and Singer (CS) and Weston and Watkins (WW), in a distributed computing environment using the symmetric Alternating Direction Method of Multipliers (ADMM). Designed for multiclass classification, these algorithms extend traditional binary SVMs by optimizing a single objective function that captures all class relationships simultaneously. Both are adapted to exploit the computational power of multi-node clusters, ensuring scalability and efficiency. Symmetric ADMM decomposes the optimization problem across the nodes of the cluster, enabling parallel computation on the large-scale LSHTC dataset: each node processes a subset of the data, and symmetric ADMM coordinates convergence by exchanging updates between nodes in a balanced manner. The optimization employs polynomial kernels to handle non-linear separability effectively. Compared with traditional single-node methods, the distributed framework reduces computational overhead and memory pressure while scaling efficiently with the number of nodes. Results highlight improved training times and robust classification accuracy, demonstrating the effectiveness of symmetric ADMM in balancing workload and accelerating convergence. The implementation therefore offers a scalable solution for high-dimensional datasets in resource-intensive environments.
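To make the coordination scheme concrete, the following minimal Python sketch illustrates a consensus-style symmetric ADMM loop over data partitions, one partition per node. It is an assumption-laden illustration, not the paper's implementation: the function names (local_update, symmetric_admm), the squared-hinge Weston-Watkins-style local loss solved by a few gradient steps, the linear (non-kernelized) model in place of the polynomial kernel, and the dual step factors alpha and beta are all illustrative choices.

import numpy as np

def local_update(X, y, z, u, rho, n_steps=50, lr=0.1):
    # Approximate solver for one node's subproblem (assumption: a few gradient
    # steps on a squared-hinge Weston-Watkins-style loss plus the ADMM proximal
    # term; the paper's exact kernelized subproblem solver is not reproduced):
    #   min_W (1/n) * sum_i sum_{m != y_i} max(0, 1 + w_m.x_i - w_{y_i}.x_i)^2
    #         + (rho/2) * ||W - z + u||_F^2
    W = z.copy()
    n, k = X.shape[0], z.shape[0]
    onehot = np.zeros((n, k))
    onehot[np.arange(n), y] = 1.0
    for _ in range(n_steps):
        scores = X @ W.T                                    # (n, k) class scores
        correct = scores[np.arange(n), y][:, None]          # score of the true class
        margins = np.maximum(0.0, 1.0 + scores - correct)   # hinge margins per class
        margins[np.arange(n), y] = 0.0                      # true class carries no slack
        coeff = 2.0 * margins - onehot * (2.0 * margins.sum(axis=1, keepdims=True))
        grad = coeff.T @ X / n + rho * (W - z + u)          # loss gradient + proximal term
        W -= lr * grad
    return W

def symmetric_admm(partitions, d, k, rho=1.0, alpha=0.9, beta=0.9, iters=30):
    # Consensus symmetric ADMM: each node i keeps a local model W_i and a scaled
    # dual U_i; z is the shared consensus model. The "symmetric" variant performs
    # two dual updates per iteration, one before and one after the consensus step.
    z = np.zeros((k, d))
    U = [np.zeros((k, d)) for _ in partitions]
    for _ in range(iters):
        # 1) local subproblems, solvable in parallel across nodes
        W = [local_update(X, y, z, u, rho) for (X, y), u in zip(partitions, U)]
        # 2) first (intermediate) dual step
        U = [u + alpha * (w - z) for w, u in zip(W, U)]
        # 3) consensus update: average of local models shifted by their duals
        z = np.mean([w + u for w, u in zip(W, U)], axis=0)
        # 4) second dual step
        U = [u + beta * (w - z) for w, u in zip(W, U)]
    return z

# Toy usage: two "nodes", each holding a slice of a random 3-class problem.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
y = rng.integers(0, 3, size=200)
parts = [(X[:100], y[:100]), (X[100:], y[100:])]
model = symmetric_admm(parts, d=20, k=3)
print("consensus model shape:", model.shape)

The extra dual update between the local solve and the consensus average is what distinguishes the symmetric variant from standard ADMM; in practice it is this balanced exchange of updates that the abstract credits with faster convergence across nodes.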