I'm working on supervised classification of object-based satellite imagery and am currently investigating the suitability of different dimensionality reduction methods for this application. As part of my work I'm also analysing the sensitivity to the training set size by visualising the learning curve (as implemented by scikit-learn).
I'm confused by the strange behaviour of Linear Discriminant Analysis (LDA) compared to that of the default Support Vector Machine (SVM), Mutual-Information-based feature selection (MI), and Fisher's-criterion-based feature selection (see image below).
The values are cross-validated using 6 stratified folds. It is a multi-class classification problem with 10 unbalanced classes, ranging from ~100 to ~1000 samples per class. I'd appreciate any hints on what could cause the weird behaviour of the LDA in the train-test ratio range from 0.03 to 0.4.
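For reference, here is roughly how the LDA learning curve is generated. This is a minimal sketch, not my actual pipeline: the real object-based image features are replaced by a synthetic dataset with similar class imbalance, and the feature count is a placeholder.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import StratifiedKFold, learning_curve

# Synthetic stand-in for the real data: 10 unbalanced classes
# (the actual dataset has ~100 to ~1000 samples per class).
X, y = make_classification(
    n_samples=3000, n_features=20, n_informative=10,
    n_classes=10, n_clusters_per_class=1,
    weights=[0.25, 0.15, 0.12, 0.10, 0.10,
             0.08, 0.07, 0.05, 0.04, 0.04],
    random_state=0,
)

# 6 stratified folds, as in the question.
cv = StratifiedKFold(n_splits=6, shuffle=True, random_state=0)

# Evaluate at training fractions from 0.03 up to 1.0, covering the
# 0.03-0.4 range where the odd LDA behaviour shows up.
train_sizes, train_scores, test_scores = learning_curve(
    LinearDiscriminantAnalysis(), X, y,
    train_sizes=np.linspace(0.03, 1.0, 10), cv=cv,
)

# Mean cross-validated accuracy per training-set size.
print(train_sizes)
print(test_scores.mean(axis=1))
```

Plotting `train_scores.mean(axis=1)` and `test_scores.mean(axis=1)` against `train_sizes` gives the curves shown in the image.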