AI, Ethics, and Society
homework project #
Readings:
- Chapter 7: Weapons of Math Destruction (Sweating Bullets: On the Job)
- A Few Useful Things to Know about Machine Learning by Pedro Domingos (https://homes.cs.washington.edu/~pedrod/papers/cacm12.pdf)
In this assignment, you'll apply AI/ML algorithms related to two applications: word embeddings and facial recognition.
Task Set #1: Here you will use distributional vectors trained using Google's deep learning Word2vec system.
- Familiarize yourself with the original paper on word2vec, Mikolov et al. (2013) (http://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf). To learn more about the system and how to train your own vectors, see https://code.google.com/archive/p/word2vec. To learn about the Python wrapper around Word2vec, see https://rare-technologies.com/word2vec-tutorial/.
- Install Gensim (example: pip install gensim, or pip install --upgrade gensim)
- Download the provided reducedvector.bin file on Canvas, which is a pre-trained Word2vec model based on the Google News dataset (https://code.google.com/archive/p/word2vec/), and load it:
from gensim.models import Word2Vec
import gensim.models
import nltk
newmodel = gensim.models.KeyedVectors.load_word2vec_format(<path to reducedvector.bin>, binary=True)
- We can compute similarity measures associated with words within the model. For example, to find different measures of similarity based on the data in the Word2vec model, we can use:
# Find the five nearest neighbors to the word man
newmodel.most_similar('man', topn=5)
# Compute a measure of similarity between woman and man
newmodel.similarity('woman', 'man')
- To complete analogies like man is to woman as king is to ??, we can use:
newmodel.most_similar(positive=['king', 'woman'], negative=['man'], topn=1)
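To make the similarity and analogy calls above less of a black box, here is a minimal sketch of the arithmetic they are generally understood to perform: cosine similarity between vectors, and an analogy query built by adding and subtracting unit-length vectors. The `toy` vectors and the helper names `cosine`, `unit`, and `analogy` are made up for illustration; this is not Gensim's implementation, only an approximation of the idea.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def unit(v):
    """Scale a vector to length 1."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]

def analogy(vectors, positive, negative, topn=1):
    """Rank words by cosine similarity to sum(positive) - sum(negative).

    Query words themselves are excluded from the candidates.
    """
    dim = len(next(iter(vectors.values())))
    query = [0.0] * dim
    for word in positive:
        query = [q + x for q, x in zip(query, unit(vectors[word]))]
    for word in negative:
        query = [q - x for q, x in zip(query, unit(vectors[word]))]
    exclude = set(positive) | set(negative)
    scored = [(w, cosine(query, vec))
              for w, vec in vectors.items() if w not in exclude]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]

# Hypothetical 2-d toy vectors (axis 0 ~ "royalty", axis 1 ~ "gender").
# Real Word2vec vectors have hundreds of dimensions.
toy = {
    "man": [0.0, 1.0],
    "woman": [0.0, -1.0],
    "king": [1.0, 1.0],
    "queen": [1.0, -1.0],
    "apple": [-1.0, 0.2],
}

best_word, best_score = analogy(toy, ["king", "woman"], ["man"])[0]
print(best_word, round(best_score, 3))
```

With the toy vectors, the query king + woman - man points closest to queen, which mirrors the analogy example above.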
Q1: We will use the target words man and woman. Use the pre-trained word2vec model to rank the following 15 words from the most similar to the least similar to each target word. For each word-target word pair, provide the similarity score. Provide your results in table format.
wife
husband
child
queen
king
man
woman
birth
doctor
nurse
teacher
professor
engineer
scientist
president
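One possible way to organize the Q1 ranking is sketched below. The helpers `rank_by_similarity` and `format_table` are hypothetical names, and the function takes any `similarity(word, target)` callable so that it can be tried without the model loaded; with the model from the loading step, `newmodel.similarity` would be passed in.

```python
# The 15 words listed in Q1.
WORDS = [
    "wife", "husband", "child", "queen", "king", "man", "woman", "birth",
    "doctor", "nurse", "teacher", "professor", "engineer", "scientist",
    "president",
]

def rank_by_similarity(similarity, target, words):
    """Return (word, score) pairs sorted from most to least similar."""
    rows = [(word, similarity(word, target)) for word in words]
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows

def format_table(target, rows):
    """Render the ranked pairs as a plain-text table."""
    lines = [f"{'rank':<5}{'word':<12}similarity to '{target}'"]
    for rank, (word, score) in enumerate(rows, start=1):
        lines.append(f"{rank:<5}{word:<12}{score:.4f}")
    return "\n".join(lines)

# Usage, assuming `newmodel` has been loaded as described earlier:
#   for target in ("man", "woman"):
#       print(format_table(target,
#                          rank_by_similarity(newmodel.similarity, target, WORDS)))
```

Because man and woman are themselves in the list, each target's table will contain one row comparing the word with itself.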
Q2: The Bigger Analogy Test Set (BATS) word analogy task has been one of the standard benchmarks for word embeddings since 2013 (https://vecto.space/projects/BATS/).
A) Select any file from the downloaded dataset (BATS_3.0.zip). For each row in your selected file, choose a target word from the row and provide the measure of similarity between your target word and the other words on the row. (Remember to document the file used.)
B) Think of three words that identify membership in one of the protected classes (choose only one class): race, color, religion, or national origin. For each row in your selected BATS_3.0 file, compute the similarity between your target word and each of your three words. Indicate when there are noticeable differences in the similarity scores based on membership in the protected class. Provide your results in table format.
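A rough sketch of how Q2 could be organized follows. It rests on two assumptions that are not stated in the assignment: that each BATS row is tab-separated with alternative answers joined by "/", and that a spread above 0.1 counts as a "noticeable" difference. Both should be checked and adjusted. The names `parse_bats_line` and `compare_groups` are hypothetical.

```python
def parse_bats_line(line):
    """Split one BATS row into (first_word, [other_words]).

    Assumes the row looks like 'word<TAB>answer1/answer2'; verify this
    against the file you actually select.
    """
    left, _, right = line.strip().partition("\t")
    others = [w for w in right.split("/") if w]
    return left, others

def compare_groups(similarity, targets, group_words, threshold=0.1):
    """Compare each target word against the chosen protected-class words.

    `threshold` is an arbitrary cut-off for flagging a row, not a value
    given in the assignment.
    """
    results = []
    for target in targets:
        scores = {g: similarity(target, g) for g in group_words}
        spread = max(scores.values()) - min(scores.values())
        results.append({
            "target": target,
            "scores": scores,
            "spread": spread,
            "flag": spread > threshold,
        })
    return results

# Usage, assuming `newmodel` is loaded and `targets` were read from the file:
#   rows = compare_groups(newmodel.similarity, targets, [w1, w2, w3])
```

Words missing from the model's vocabulary would raise a KeyError in `newmodel.similarity`, so it may be worth filtering the targets first.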
Q3: Sentences:
king is to throne as judge is to ___?
giant is to dwarf as genius is to ___?
college is to dean as jail is to ___?
arc is to circle as line is to ___?
French is to France as Dutch is to ___?
man is to woman as king is to ___?
water is to ice as liquid is to ___?
bad is to good as sad is to ___?
nurse is to hospital as teacher is to ___?
usa is to pizza as japan is to ___?
human is to house as dog is to ___?
grass is to green as sky is to ___?
video is to cassette as computer is to ___?
universe is to planet as house is to ___?
poverty is to wealth as sickness is to ___?
a. Complete the above sentences with your own word analogies. Use the Word2Vec model to find the
similarity measure between your pair of words. Provide your results.
Example:
man is to woman as king is to _queen__?
newmodel.similarity('king', 'queen') -> 0.
b. Use the Word2Vec model to find the word analogy and corresponding similarity score. Provide
your results.
Example:
man is to woman as king is to ___?
newmodel.most_similar(positive=['king', 'woman'], negative=['man'], topn=1) -> queen, 0.
c. Lastly, compute and print the correlation between the vector of similarity scores from your
analogies versus the Word2Vec analogy-generated similarity scores. What is the strength of the
correlation?
- .00-.19 very weak correlation
- .20-.39 weak correlation
- .40-.59 moderate correlation
- .60-.79 strong correlation
- .80-1.0 very strong correlation
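For part c, one way to compute the correlation and map it onto the scale above is sketched here using only the standard library. It assumes Pearson correlation is the intended measure and that the scale applies to the absolute value of r; neither is stated explicitly in the assignment. The names `pearson` and `strength` are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)

def strength(r):
    """Label |r| using the lower bound of each band in the scale."""
    a = abs(r)
    if a < 0.20:
        return "very weak"
    if a < 0.40:
        return "weak"
    if a < 0.60:
        return "moderate"
    if a < 0.80:
        return "strong"
    return "very strong"

# Usage: my_scores holds the scores from part a, model_scores those from
# part b, in the same sentence order.
#   r = pearson(my_scores, model_scores)
#   print(f"r = {r:.3f} ({strength(r)} correlation)")
```

If NumPy or SciPy is already in use, `numpy.corrcoef` or `scipy.stats.pearsonr` would give the same coefficient.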
Task Set #2: For this part of the assignment, we will work with the UTK dataset (UTKface_cropped.tar.gz), available on Canvas and based on the original UTKFace dataset (https://susanqq.github.io/UTKFace/).
Q1: Each image in the dataset is labeled with a value for age, gender, and race, based on the following legend:
- age: the age of the person in the picture, ranging from 0 to 116.
- gender: the gender of the person, either 0 (male) or 1 (female).
- race: the race of the person, ranging from 0 to 4, denoting White, Black, Asian, Indian, and Others (like Hispanic, Latino, Middle Eastern).
Complete and answer the following:
- Compute and document the frequency of images associated with each subgroup for age (subdivided into 0-20, 21-40, 41-60, 61-80, and 81-116), gender (0, 1), and race (0 to 4).
- For age, which subgroup has the largest representation? Which subgroup has the least representation?
- For gender, which subgroup has the largest representation? Which subgroup has the least representation?
- For race, which subgroup has the largest representation? Which subgroup has the least representation?
- Recreate a table of the age group, gender, and race distributions of subjects based on the UTK dataset subgroups. Please see the table below as an example – inspired by the one discussed in the lecture.
- Based on what you've learned so far, if an algorithm is trained based on this dataset, which group(s) will be impacted the most? Explain why.
http://biometrics.cse.msu.edu/Publications/Face/HanJain_UnconstrainedAgeGenderRaceEstimation_MSUTechReport2014.pdf
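The subgroup frequencies asked for above could be tallied along these lines. The sketch assumes the labels are encoded in each file name as age_gender_race_timestamp, which is the convention described on the UTKFace page but is not stated in this assignment, so it should be verified against the Canvas archive. The helper names are hypothetical, and names that do not fit the pattern are counted as skipped rather than guessed.

```python
from collections import Counter

AGE_BINS = [(0, 20), (21, 40), (41, 60), (61, 80), (81, 116)]
GENDER_LABELS = {0: "male", 1: "female"}
RACE_LABELS = {0: "White", 1: "Black", 2: "Asian", 3: "Indian", 4: "Others"}

def parse_utk_filename(name):
    """Return (age, gender, race) from a file name, or None if malformed.

    Assumes names look like '25_0_2_20170116174525125.jpg.chip.jpg'.
    """
    parts = name.split("_")
    if len(parts) < 4:
        return None
    try:
        return int(parts[0]), int(parts[1]), int(parts[2])
    except ValueError:
        return None

def age_group(age):
    """Map an age to its bin label, e.g. 25 -> '21-40'."""
    for low, high in AGE_BINS:
        if low <= age <= high:
            return f"{low}-{high}"
    return None

def count_subgroups(filenames):
    """Count images per age bin, gender, and race; also count skipped names."""
    counts = {"age": Counter(), "gender": Counter(), "race": Counter()}
    skipped = 0
    for name in filenames:
        parsed = parse_utk_filename(name)
        if parsed is None:
            skipped += 1
            continue
        age, gender, race = parsed
        group = age_group(age)
        if group is None or gender not in GENDER_LABELS or race not in RACE_LABELS:
            skipped += 1
            continue
        counts["age"][group] += 1
        counts["gender"][GENDER_LABELS[gender]] += 1
        counts["race"][RACE_LABELS[race]] += 1
    return counts, skipped

# Usage, with `folder` being wherever the archive was extracted:
#   import os
#   counts, skipped = count_subgroups(os.listdir(folder))
#   print(counts["race"].most_common())
```

`Counter.most_common()` lists subgroups from largest to smallest, which answers the largest and least representation questions directly.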
Turn in a report (in PDF format) documenting your outputs in each Step. The report should follow the JDF format. Jupyter notebook (ipynb files) submission is optional, but a final PDF document per JDF format is required. The file name for submission is GTuserName_Assignment_4, for example, Joyner03_Assignment_4. Reports that are not neat and well organized will receive up to a 10-point deduction. All charts, graphs, and tables should be generated in Python or Excel, or any other suitable