EWU Institutional Repository

Exploring New Frontiers in Imbalanced Learning: Data Complexity-Based Solutions


dc.contributor.author Newaz, Asif
dc.date.accessioned 2024-10-09T05:00:30Z
dc.date.available 2024-10-09T05:00:30Z
dc.date.issued 2024-09-27
dc.identifier.uri http://dspace.ewubd.edu:8080/handle/123456789/4442
dc.description This thesis was submitted in partial fulfillment of the requirements for the degree of Master of Science in Computer Science and Engineering at East West University, Dhaka, Bangladesh. en_US
dc.description.abstract Class imbalance occurs frequently in classification tasks. Learning from imbalanced data is challenging, which has prompted a great deal of research in this area. Various techniques have been developed over the years to tackle the problem. These approaches are broadly classified into two categories: data-level modification and algorithm-level modification. In data-level modification, the original class distribution of the data is altered through resampling techniques. In algorithm-level modification, traditional classification algorithms are adapted to imbalanced scenarios by changing the cost function to make them cost-sensitive (CS). Many data resampling and CS techniques have been proposed by researchers in the past decade. To understand their strengths and weaknesses, a comprehensive experimental analysis is first conducted. Several limitations that restrict the performance of these approaches have been identified. Most of these techniques do not take into consideration the intrinsic data characteristics that complicate the learning process. Several data difficulty factors identified in previous studies are rarely addressed. Moreover, many of these techniques overfit the data and lose generalization, producing poor performance during testing. They are also unable to provide well-generalized performance across a wide range of imbalanced scenarios. In this study, novel strategies have been developed to address these issues. Solutions have been proposed to limit the effects of different data difficulty factors and enhance prediction performance. Moreover, attempts have been made to overcome the shortcomings of the established approaches and obtain better generalization. Three different methods have been proposed in this study. 
The first is a novel data resampling technique that takes intrinsic data characteristics into consideration to balance the dataset effectively. The second is an instance complexity-based CS technique, an advanced modification of the original CS approach. The third is a hybrid framework combining resampling and cost-sensitive learning (CSL). Rigorous experiments have been conducted on a wide range of imbalanced datasets to validate the performance of the proposed approaches. The results have been evaluated using eight different performance measures and compared with other state-of-the-art techniques used in imbalanced learning. The proposed techniques obtained superior results in different imbalanced scenarios, demonstrating the efficacy of the proposed models in learning from imbalanced data. To conclude, this research delineates new trajectories in the field of imbalanced learning. New approaches have been proposed that introduce fresh perspectives and directions in imbalanced learning. The proposed strategies are remarkably successful, ensuring well-generalized performance when addressing imbalanced data. en_US
dc.language.iso en_US en_US
dc.publisher East West University en_US
dc.relation.ispartofseries ;CSE00220
dc.subject Class imbalance, Class overlap, Cost-sensitive learning, Data difficulty factors, Empirical study, Multiclass classification, SMOTE. en_US
dc.title Exploring New Frontiers in Imbalanced Learning: Data Complexity-Based Solutions en_US
dc.type Thesis en_US

