In this notebook, I'll lead you through using TensorFlow to implement the word2vec algorithm using the skip-gram architecture. By implementing this, you'll learn about embedding words for use in natural language processing. This will come in handy when dealing with things like machine translation.

Here are the resources I used to build this notebook. I suggest reading these either beforehand or while you're working on this material: a really good conceptual overview of word2vec from Chris McCormick; the first word2vec paper from Mikolov et al.; the NIPS paper with improvements for word2vec, also from Mikolov et al.; and an implementation of word2vec from Thushan Ganegedara.

When you're dealing with words in text, you end up with tens of thousands of classes to predict, one for each word. Trying to one-hot encode these words is massively inefficient: you'll have one element set to 1 and the other 50,000 set to 0. In the matrix multiplication going into the first hidden layer, almost all of the products are then multiplications by zero, which is wasted computation.

To solve this problem and greatly increase the efficiency of our networks, we use what are called embeddings. Embeddings are just a fully connected layer like you've seen before. We call this layer the embedding layer, and its weights are the embedding weights. We skip the multiplication into the embedding layer by instead directly grabbing the hidden layer values from the weight matrix. We can do this because multiplying a one-hot encoded vector by a matrix returns the row of the matrix corresponding to the index of the "on" input unit.

So instead of doing the matrix multiplication, we use the weight matrix as a lookup table. We encode the words as integers; for example, "heart" is encoded as 958 and "mind" as 18094. Then, to get the hidden layer values for "heart", you just take row 958 of the embedding matrix.
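To make the lookup-table idea concrete, here is a minimal NumPy sketch (not the notebook's own code) that checks that multiplying a one-hot vector by a weight matrix gives the same result as indexing the corresponding row directly. The vocabulary size, embedding size, random weights, and word index are toy values I picked for illustration.

```python
import numpy as np

# Toy sizes chosen for illustration only; a real vocabulary
# would have tens of thousands of words.
vocab_size = 10
embed_dim = 4

# Stand-in embedding weights: one row per word in the vocabulary.
rng = np.random.default_rng(0)
embedding = rng.normal(size=(vocab_size, embed_dim))

# Hypothetical integer code for some word.
word_idx = 7

# The slow way: build the one-hot vector and do the full multiplication.
one_hot = np.zeros(vocab_size)
one_hot[word_idx] = 1.0
via_matmul = one_hot @ embedding

# The fast way: treat the weight matrix as a lookup table.
via_lookup = embedding[word_idx]

# Both routes give the same hidden layer values.
print(np.allclose(via_matmul, via_lookup))
```

The lookup touches only `embed_dim` numbers, while the multiplication touches every entry of the matrix. In TensorFlow, the same row lookup is what `tf.nn.embedding_lookup` does for a batch of integer word IDs.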