Performance Analysis of Parallel Programs with RAPIDS as a Framework of Execution

EasyChair Preprint 15191, version 2 • 17 pages • Date: October 9, 2024

Abstract

Data is growing at an astronomical rate, and unfettered access to digital information has made scientific computation, analysis, and inference more complex. However, recent designs from NVIDIA and other market players have produced state-of-the-art GPUs that handle complex mathematical simulations and computations, Artificial Intelligence, Machine Learning, and high-performance computing with greatly improved speed and efficiency, and with room for scalability. In this work, we analyzed the parquet-formatted New York City yellow taxi dataset on a RAPIDS- and DASK-supported distributed data-parallel training platform using multiple NVIDIA GPUs. The dataset was used to train Extreme Gradient Boosting (XGBoost), RandomForest Regressor, and Elastic Net models for trip fare prediction. Our models achieved notable performance metrics. XGBoost achieved a mean squared error (MSE) of 11.38 and an R-squared of 0.9678. Model training and evaluation took 38.51 seconds despite the large size of the training dataset, which shows how computationally efficient the system was. The RandomForest model achieved an MSE of 21.96 and an R-squared of 0.9378. To show that our experimental design scales and generalizes to other machine learning domains, we extended the GPU-accelerated training to an image classification task using the pre-trained MobileNet-V3-Large architecture on the CIFAR-100 dataset, achieving a ROC AUC of over 95%. This work advances the state of the art in parallel computing by implementing the RAPIDS and DASK frameworks on a distributed data-parallel training platform that uses multiple NVIDIA GPUs.
Keyphrases: Coefficient of determination, Description and Analysis, GPU (Graphics Processing Unit), Mathematical simulations, RAPIDS-24.06, Scalability, city yellow taxi dataset, data parallel model, deep learning, distributed data parallel training platform, distributed training pipeline, execution time, fare distribution analysis, fare predictions, machine learning, mean squared, nvidia multi gpus, parallel programming, rapids integration figure, trip fare prediction
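
For readers unfamiliar with the two regression metrics reported in the abstract, the following is a minimal sketch of how mean squared error and R-squared (the coefficient of determination) are defined. It is not the authors' evaluation code: the function names and the toy fare values are hypothetical, chosen only for illustration, and a GPU pipeline would normally rely on a library implementation rather than plain Python.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return sum(squared_errors) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical toy data (NOT taken from the taxi dataset).
fares = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]

mse = mean_squared_error(fares, predicted)
r2 = r_squared(fares, predicted)
print(f"MSE = {mse:.2f}, R-squared = {r2:.3f}")
```

A lower MSE means predictions are closer to the actual fares on average, while an R-squared near 1 means the model explains most of the variance in the target.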

