Web text corpus extraction system for linguistic tasks