Federated Active Learning

Federated learning (FL) has been intensively investigated in terms of communication efficiency, privacy, and fairness. However, efficient annotation, a pain point in real-world FL applications, has received far less attention. In this project, we propose to incorporate active learning (AL) and sampling strategies into the FL framework to reduce the annotation workload. We expect AL and FL to complement each other and improve each other's performance. In the proposed federated active learning (F-AL) method, the clients collaboratively perform AL, in a distributed optimization manner, to select the instances considered most informative for FL. We compare the test accuracies of global FL models trained with the conventional random sampling strategy, client-level separate AL (S-AL), and the proposed F-AL, and empirically demonstrate that F-AL outperforms the baseline methods on image classification tasks.

Annotation strategies for federated learning
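To make the procedure concrete, the following is a minimal sketch of one F-AL round under simplifying assumptions that are not taken from the paper: entropy-based acquisition as the informativeness measure, FedAvg aggregation, a toy dictionary structure for each client's labeled and unlabeled pools, and a hypothetical `oracle` callback standing in for the human annotators. The function names and hyperparameters are illustrative and do not reproduce the exact distributed optimization used in the paper.

```python
# Illustrative sketch of one federated active learning (F-AL) round.
# Assumptions (not from the paper): entropy-based acquisition, FedAvg
# aggregation, and an oracle callback that supplies labels on request.
import copy
import torch
import torch.nn.functional as F


def entropy_scores(model, unlabeled_x):
    """Score unlabeled samples by predictive entropy under the global model."""
    model.eval()
    with torch.no_grad():
        probs = F.softmax(model(unlabeled_x), dim=1)
    return -(probs * probs.clamp_min(1e-12).log()).sum(dim=1)


def local_update(model, x, y, epochs=1, lr=0.01):
    """One client's local training on its currently labeled data."""
    model = copy.deepcopy(model)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        opt.zero_grad()
        F.cross_entropy(model(x), y).backward()
        opt.step()
    return model.state_dict()


def fedavg(states, weights):
    """Weighted average of client model parameters (FedAvg)."""
    total = sum(weights)
    avg = copy.deepcopy(states[0])
    for key in avg:
        avg[key] = sum(w * s[key] for s, w in zip(states, weights)) / total
    return avg


def fal_round(global_model, clients, budget_per_client, oracle):
    """Acquire labels on each client, then run one FedAvg round.

    Each client is assumed to be a dict with keys "labeled_x", "labeled_y",
    and "unlabeled_x"; `oracle(x)` returns labels for the selected samples.
    """
    states, sizes = [], []
    for c in clients:
        # Active learning step: pick the most uncertain unlabeled samples.
        scores = entropy_scores(global_model, c["unlabeled_x"])
        picked = scores.topk(budget_per_client).indices
        new_x = c["unlabeled_x"][picked]
        new_y = oracle(new_x)  # annotation request
        c["labeled_x"] = torch.cat([c["labeled_x"], new_x])
        c["labeled_y"] = torch.cat([c["labeled_y"], new_y])
        keep = torch.ones(len(c["unlabeled_x"]), dtype=torch.bool)
        keep[picked] = False
        c["unlabeled_x"] = c["unlabeled_x"][keep]
        # Federated learning step: local update on the enlarged labeled set.
        states.append(local_update(global_model, c["labeled_x"], c["labeled_y"]))
        sizes.append(len(c["labeled_x"]))
    global_model.load_state_dict(fedavg(states, sizes))
    return global_model
```

In this sketch the acquisition and the federated update are interleaved in a single round; other schedules (e.g., several FL rounds between annotation rounds) fit the same structure.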

Related publications:
Federated Active Learning (F-AL): an Efficient Annotation Strategy for Federated Learning, https://arxiv.org/abs/2202.00195

Self-paced Convolutional Neural Network

Developing a robust and reliable deep learning model for medical image analysis is highly challenging due to the combination of high heterogeneity in medical images and the relative scarcity of training samples. In particular, annotating and labeling medical images is far more expensive and time-consuming than in other applications and often requires manual effort from multiple domain experts. In this work, we propose a multi-stage, self-paced learning framework that uses a convolutional neural network (CNN) to classify Computed Tomography (CT) image patches. The key contribution of this approach is that we enlarge the training set by refining unlabeled instances with a self-paced learning CNN. Implementing the framework on high-performance computing servers, we show experimentally that the self-paced boosted network consistently outperforms the original network even with very scarce manual labels. This performance gain indicates that applications with limited training samples, such as medical image analysis, can benefit from the proposed framework.

Left: Illustration of the conventional CNN approach to image classification. Right: Illustration of the spCNN framework, which consists of the main Classification Module, similar to the conventional approach, and the Bootstrapping Module, which provides extra training data through virtual sample selection based on bootstrapping CNNs applied to the new unlabeled data.
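The following is a minimal sketch of the bootstrapping idea, under assumptions that are not taken from the paper: the bootstrapping step pseudo-labels unlabeled patches with the current network and keeps only the most confident ("easy") ones as virtual samples, relaxing the confidence threshold stage by stage. The threshold schedule, the single-network setup, and the user-supplied `train_fn` are placeholders for the paper's multi-stage pipeline with separate Classification and Bootstrapping Modules.

```python
# Illustrative sketch of self-paced virtual sample selection.
# Thresholds, stage count, and the selection rule are assumptions,
# not the exact settings of the spCNN paper.
import torch
import torch.nn.functional as F


def select_virtual_samples(model, unlabeled_x, confidence_threshold):
    """Return (x, pseudo_y) for patches the model predicts confidently."""
    model.eval()
    with torch.no_grad():
        probs = F.softmax(model(unlabeled_x), dim=1)
        conf, pseudo_y = probs.max(dim=1)
    keep = conf >= confidence_threshold
    return unlabeled_x[keep], pseudo_y[keep]


def self_paced_training(model, labeled_x, labeled_y, unlabeled_x,
                        train_fn, thresholds=(0.95, 0.9, 0.8)):
    """Multi-stage training that adds easier-to-harder virtual samples.

    `train_fn(model, x, y)` is a user-supplied training loop; each stage
    lowers the confidence threshold, admitting progressively harder samples.
    """
    # Stage 0: train only on the scarce manually labeled patches.
    train_fn(model, labeled_x, labeled_y)
    for t in thresholds:
        # Bootstrapping step: mine confident pseudo-labeled patches.
        vx, vy = select_virtual_samples(model, unlabeled_x, t)
        # Classification step: retrain on labeled plus virtual samples.
        train_fn(model,
                 torch.cat([labeled_x, vx]),
                 torch.cat([labeled_y, vy]))
    return model
```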

Related publications:
Self-paced convolutional neural network for computer aided detection in medical imaging analysis, 8th International Workshop on Machine Learning in Medical Imaging (MLMI 2017), https://arxiv.org/abs/1707.06145