A Survey on Data Distribution Challenges and Solutions in Vertical and Horizontal Federated Learning
Keywords:
non-IID data, Vertical Federated Learning, Horizontal Federated Learning
Abstract
Federated learning is a way of training machine learning models on data distributed across multiple devices, such as smartphones and IoT sensors, without moving the raw data off those devices, with the aim of preserving privacy, efficiency, and security. However, federated learning faces a significant challenge when the data on the devices is not independent and identically distributed (non-IID), that is, when the local datasets differ in distribution, size, or quality. Non-IID data affects both the accuracy of the learned model and the participation of the local devices. Most existing methods address non-IID data by improving the model, the algorithm, or the framework of federated learning, yet systematic and up-to-date reviews of this topic are lacking. In this paper, we survey approaches to the non-IID challenge in both Vertical Federated Learning (VFL) and Horizontal Federated Learning (HFL). We organize the existing literature by research perspective and by the sub-tasks involved in each approach, with the goal of providing a comprehensive and systematic overview of the problem and its solutions.
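To make the notions of non-IID data, HFL, and VFL concrete, the following is a minimal Python sketch under stated assumptions. It is not a method from the surveyed literature. In the horizontal setting, clients share the same feature space but hold different samples; the sketch simulates label-distribution skew with a Dirichlet partition, a common benchmarking convention in which a smaller concentration parameter `alpha` yields more skewed clients. In the vertical setting, parties hold the same samples but different feature columns. The function names `dirichlet`, `horizontal_label_skew`, and `vertical_split` are hypothetical.

```python
import random
from collections import Counter


def dirichlet(alpha, k, rng):
    """Sample a symmetric Dirichlet(alpha, ..., alpha) vector of length k
    by normalizing independent Gamma(alpha, 1) draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    if total == 0.0:  # guard against extreme underflow for very small alpha
        return [1.0 / k] * k
    return [d / total for d in draws]


def horizontal_label_skew(labels, num_clients, alpha, seed=0):
    """Hypothetical HFL simulator: clients share the feature space but hold
    different samples. Each class's sample indices are split across clients
    in Dirichlet(alpha) proportions; smaller alpha gives stronger label skew."""
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    clients = [[] for _ in range(num_clients)]
    for _, idxs in sorted(by_class.items()):
        rng.shuffle(idxs)
        props = dirichlet(alpha, num_clients, rng)
        # Turn cumulative proportions into cut points over this class's indices.
        bounds, acc = [0], 0.0
        for p in props[:-1]:
            acc += p
            bounds.append(min(len(idxs), int(round(acc * len(idxs)))))
        bounds.append(len(idxs))
        for c in range(num_clients):
            clients[c].extend(idxs[bounds[c]:bounds[c + 1]])
    return clients


def vertical_split(rows, feature_groups):
    """Hypothetical VFL view: every party holds the same samples (rows)
    but only its own disjoint group of feature columns."""
    return [[[row[j] for j in group] for row in rows] for group in feature_groups]


if __name__ == "__main__":
    labels = [i % 3 for i in range(300)]  # 3 perfectly balanced classes
    parts = horizontal_label_skew(labels, num_clients=4, alpha=0.1, seed=42)
    for c, idxs in enumerate(parts):
        # Per-client class counts typically differ sharply at alpha=0.1.
        print(c, len(idxs), dict(Counter(labels[i] for i in idxs)))
    print(vertical_split([[1, 2, 3], [4, 5, 6]], [[0], [1, 2]]))
```

With a large `alpha` (for example 100), each client's class counts approach the global proportions, approximating an IID split. This knob is a simulation convenience, not a property of real deployments, where skew in feature distributions, dataset sizes, and data quality also arises.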