python - Creating embeddings and training on embeddings using a bigram LSTM model in TensorFlow


I'm having trouble figuring out how to create and train on bigram embeddings for LSTMs in TensorFlow.

We are given a train_data tensor of shape (num_unrollings, batch_size, 27), i.e. num_unrollings is the total number of batches, batch_size is the size of each batch, and 27 is the size of the one-hot encoding of the characters "a" to "z", including " ".

The LSTM takes as input a single batch at each time step, i.e. it takes in a tensor of shape (batch_size, 27).
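
To make the shapes concrete, here is how I picture one unrolling step with dummy data (just an illustration, not my real input pipeline):

import numpy as np

num_unrollings, batch_size, vocab_size = 10, 64, 27
train_data = np.zeros((num_unrollings, batch_size, vocab_size), dtype=np.float32)
train_data[:, :, 0] = 1.0        # pretend every character is a one-hot "a"
step = train_data[0]             # one time step for the LSTM: shape (64, 27)
assert step.shape == (batch_size, vocab_size)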

The characters() function takes in a tensor of shape (27,) and returns the character that the one-hot encoding represents.
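
I don't control characters(), but from its description I assume it does something roughly like the following on a one-hot NumPy vector (assuming " " sits at index 26, matching my bigram table below):

import numpy as np

def characters(one_hot):
    idx = int(np.argmax(one_hot))          # position of the 1 in the 27-vector
    return " " if idx == 26 else chr(ord('a') + idx)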

What I have done so far is create an index lookup for each bigram. There are a total of 27 * 27 = 729 bigrams (because I include the " " character). I chose to represent each bigram with a vector of log2(729) ≈ 10 bits.
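
(Equivalently, the whole 729-entry lookup table can be built in one comprehension; this matches the loop version in my code below:)

vocab = [chr(c) for c in range(ord('a'), ord('z') + 1)] + [" "]   # 27 characters
bigram2id = {a + b: i * 27 + j
             for i, a in enumerate(vocab)
             for j, b in enumerate(vocab)}
assert len(bigram2id) == 27 * 27                                  # 729 bigrams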

In the end, I am trying to make the input to the LSTM a tensor of shape (batch_size / 2, 10), i.e. (32, 10), so that I can train on bigrams.

Here is the relevant code:

import tensorflow as tf

batch_size = 64
num_unrollings = 10
num_embeddings = 729
embedding_size = 10
vocabulary_size = 27  # "a" to "z" plus " "

bigram2id = dict()
key = ""

# build a dictionary of bigrams and their respective indices:
for i in range(ord('z') - ord('a') + 2):
    key = chr(97 + i)
    if i == 26:
        key = " "
    for j in range(ord('z') - ord('a') + 2):
        if j == 26:
            bigram2id[key + " "] = i * 27 + j
            continue
        bigram2id[key + chr(97 + j)] = i * 27 + j

graph = tf.Graph()

with graph.as_default():

    # embeddings
    embeddings = tf.Variable(tf.random_uniform([num_embeddings, embedding_size], -1.0, 1.0), trainable=False)

    """
    1) load the training data
    2) get the embeddings of the data for both inputs and labels
    3) train
    """

    # load the training data and labels, both unembedded and embedded
    train_data = list()
    embedded_train_data = list()
    for _ in range(num_unrollings + 1):
        train_data.append(tf.placeholder(tf.float32, shape=[batch_size, vocabulary_size]))
        embedded_train_data.append(tf.placeholder(tf.float32, shape=[batch_size / 2, embedding_size]))

    # get the embeddings of the training data and labels (making sure to set trainable=False)
    for batch_ctr in range(num_unrollings + 1):
        for bigram_ctr in range((batch_size // 2) + 1):
            # get the current bigram
            current_bigram = characters(train_data[batch_ctr][bigram_ctr * 2]) + characters(train_data[batch_ctr][bigram_ctr * 2 + 1])
            # get its id
            current_bigram_id = bigram2id[current_bigram]
            # get the embedding
            embedded_bigram = tf.nn.embedding_lookup(embeddings, embedded_bigram)
            # add it to the current batch
            embedded_train_data[batch_ctr][bigram_ctr].append(embedded_bigram)

But right now I am getting a "Shape (64, 27) must have rank 1" error, and even if I fix that, I am not sure whether I am taking the right approach.
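
For comparison, a bare lookup does run when I feed plain integer ids instead of one-hot rows, which makes me suspect embedding_lookup wants the bigram ids directly (a minimal sketch with made-up ids, TF 1.x style, not my real pipeline):

import numpy as np
import tensorflow as tf

num_embeddings, embedding_size, batch_size = 729, 10, 64

graph = tf.Graph()
with graph.as_default():
    embeddings = tf.Variable(
        tf.random_uniform([num_embeddings, embedding_size], -1.0, 1.0),
        trainable=False)
    # embedding_lookup expects integer ids of rank >= 1, not (64, 27) one-hot floats
    bigram_ids = tf.placeholder(tf.int32, shape=[batch_size // 2])
    embedded_bigrams = tf.nn.embedding_lookup(embeddings, bigram_ids)

with tf.Session(graph=graph) as sess:
    sess.run(tf.global_variables_initializer())
    ids = np.random.randint(0, num_embeddings, size=batch_size // 2).astype(np.int32)
    print(sess.run(embedded_bigrams, feed_dict={bigram_ids: ids}).shape)   # (32, 10)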

