Stereotype Discovery in Vector Representations
Societal stereotypes are reflected in all known text corpora used to train vector representations of words. Thus, even though many words are per se gender-neutral, we associate them with "male" or "female" features. NLP algorithms reproduce these stereotypes and encode them in the vector representations. It is semantically correct to associate the word "sister" with "woman", but doing the same with the word "kitchen" reflects a stereotype. The goal of this work is to develop a method that automatically finds such stereotypes by training vector representations on different corpora and comparing the results with respect to stereotypical attributes.
If words are represented as compact (dense) vectors, we can exploit geometric properties of the vector space to find stereotypes. For example, the vector from "man" to "woman" is approximately equal to the vector from "king" to "queen". Using this property, we can determine whether "secretary" leans more towards "male" or "female" than "entrepreneur" does.
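The geometric idea can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the method to be developed: the three-dimensional vectors in `EMB` are hypothetical toy values (dimension 0 was built by hand to carry gender), and the helper names `analogy`, `gender_direction` and `bias_score` are our own. Real embeddings would be trained on a corpus, and a single man/woman pair gives only a rough estimate of the gender direction.

```python
import numpy as np

# Hypothetical 3-d toy embeddings (illustrative values only, not trained on any corpus).
# Dimension 0 was constructed by hand to encode gender: positive = "male", negative = "female".
EMB = {
    "man": np.array([1.0, 1.0, 0.0]),
    "woman": np.array([-1.0, 1.0, 0.0]),
    "king": np.array([1.0, 1.0, 1.0]),
    "queen": np.array([-1.0, 1.0, 1.0]),
    "secretary": np.array([-0.4, 0.2, 0.8]),
    "entrepreneur": np.array([0.5, 0.3, 0.7]),
}

def cosine(a, b):
    """Cosine similarity of two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def analogy(a, b, c, emb):
    """Word closest to b - a + c (e.g. king - man + woman), excluding the three input words."""
    target = emb[b] - emb[a] + emb[c]
    candidates = (w for w in emb if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(emb[w], target))

def gender_direction(emb, male="man", female="woman"):
    """Unit vector pointing from the female anchor word to the male anchor word."""
    d = emb[male] - emb[female]
    return d / np.linalg.norm(d)

def bias_score(word, emb, direction):
    """Cosine between a word and the gender direction: > 0 leans 'male', < 0 leans 'female'."""
    return cosine(emb[word], direction)

g = gender_direction(EMB)
print(analogy("man", "king", "woman", EMB))  # queen
for w in ("secretary", "entrepreneur"):
    print(w, round(bias_score(w, EMB, g), 3))
```

With these toy vectors the analogy returns "queen", "secretary" receives a negative (female-leaning) score and "entrepreneur" a positive (male-leaning) one. Averaging the direction over several definitional pairs (e.g. he/she, father/mother) would be a more robust choice than a single pair.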
The task is to generate several vector representations from different corpora and to compare the characteristics in which they differ when they encode stereotypes. Once the stereotypes are identified, they can be reduced by moving the affected word vectors in the "opposite" direction in the vector space.
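One possible way to compare corpora and to move vectors in the "opposite" direction is sketched below, assuming NumPy. The reduction step shown is a plain projection removal (similar to the "neutralize" step of hard debiasing), chosen here for illustration rather than prescribed by the task; `compare_corpora` and `neutralize` are hypothetical helper names, and `emb_a` and `emb_b` are toy values standing in for models trained on two corpora A and B.

```python
import numpy as np

def unit(v):
    """Scale a vector to length 1."""
    return v / np.linalg.norm(v)

def bias_score(vec, direction):
    """Cosine between a word vector and a unit-length gender direction (> 0 'male', < 0 'female')."""
    return float(vec @ direction / np.linalg.norm(vec))

def compare_corpora(words, emb_a, emb_b, male="man", female="woman"):
    """Bias score of each word in two embeddings, each measured against its own gender direction.

    Embeddings trained on different corpora live in different coordinate systems,
    so only the scalar scores are compared, never the raw vectors.
    """
    dir_a = unit(emb_a[male] - emb_a[female])
    dir_b = unit(emb_b[male] - emb_b[female])
    return {w: (bias_score(emb_a[w], dir_a), bias_score(emb_b[w], dir_b)) for w in words}

def neutralize(vec, direction):
    """Subtract the component along the unit gender direction, i.e. move the vector
    'opposite' to its bias until its projection on that direction is zero."""
    return vec - (vec @ direction) * direction

# Hypothetical toy embeddings for corpora A and B (illustrative values only).
emb_a = {"man": np.array([1.0, 1.0, 0.0]), "woman": np.array([-1.0, 1.0, 0.0]),
         "secretary": np.array([-0.4, 0.2, 0.8])}
emb_b = {"man": np.array([1.0, 1.0, 0.0]), "woman": np.array([-1.0, 1.0, 0.0]),
         "secretary": np.array([-0.1, 0.3, 0.8])}

print(compare_corpora(["secretary"], emb_a, emb_b))
direction_a = unit(emb_a["man"] - emb_a["woman"])
debiased = neutralize(emb_a["secretary"], direction_a)
print(bias_score(debiased, direction_a))  # 0.0 (up to floating-point rounding)
```

Neutralizing should only be applied to words that ought to be gender-neutral, such as "kitchen"; definitional words like "sister" keep their gender component. Because the projection is subtracted, the vector's norm shrinks, so one may renormalize it afterwards.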
The method is to be evaluated with performance tests on common NLP tasks (such as semantic similarity, synonyms, semantic analysis, ...), showing to what extent the reduction of stereotypes affects performance on these tasks.
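For the semantic-similarity task, a common evaluation is the Spearman rank correlation between a model's cosine similarities and human similarity ratings for word pairs. The sketch below, assuming NumPy only, computes that score and its change between an original and a debiased embedding; `similarity_score` and `performance_change` are hypothetical helpers, and `pairs`/`human_ratings` would come from a human-rated word-pair dataset such as WordSim-353 (loading it is not shown).

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity of two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def rankdata(x):
    """Ranks starting at 1; tied values receive the average of their ranks."""
    x = np.asarray(x, dtype=float)
    ranks = np.empty(len(x))
    ranks[np.argsort(x, kind="mergesort")] = np.arange(1, len(x) + 1)
    for value in np.unique(x):
        tied = x == value
        ranks[tied] = ranks[tied].mean()
    return ranks

def spearman(a, b):
    """Spearman's rho, computed as the Pearson correlation of the rank vectors."""
    return float(np.corrcoef(rankdata(a), rankdata(b))[0, 1])

def similarity_score(pairs, human_ratings, emb):
    """Correlation between model cosine similarities and human similarity ratings."""
    model = [cosine(emb[w1], emb[w2]) for w1, w2 in pairs]
    return spearman(model, human_ratings)

def performance_change(pairs, human_ratings, emb_before, emb_after):
    """Score after debiasing minus score before; negative values mean the benchmark got worse."""
    return (similarity_score(pairs, human_ratings, emb_after)
            - similarity_score(pairs, human_ratings, emb_before))
```

Calling `performance_change` with the original and the debiased embeddings yields one number per benchmark; values close to zero would suggest that the stereotype reduction costs little on that task. Averaging tied ranks in `rankdata` matches the usual definition of Spearman's rho for tied data.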
Requirements: basic knowledge of statistics and natural language processing, version control systems, and Python or a comparable scripting language