But if you want to test whether a method helps poor readers, wouldn't it be better to eliminate, or at least limit, the variance of the variable 'reading ability'?
If you test one hundred boys (or one hundred girls), you will only learn whether the method works across the different reading abilities within one gender. But I would think it more important to know whether it works for all, or most, poor readers of either gender.

I absolutely don't want to say that doing it like 'pretty much everyone' does is wrong. I have no experience with educational experiments. I just think that from time to time one should take a look at the way things are done and decide whether the way 'it was always done' is the best one for the desired outcome.