hadoop - Solutions to put different values for a row-key but the same timestamps in HBase?


I'm new to HBase. I'm facing a problem when bulk loading data from a text file into HBase. Assume I have the following table:

key_id | f1:c1 | f2:c2
row1   | 'a'   | 'b'
row1   | 'x'   | 'y'
1. When I parse these 2 records and put them into HBase at the same time (with the same timestamp), only the version {row1 'x' 'y'} is kept. Here is the explanation I found:

When you put data into HBase, a timestamp is required. The timestamp can be generated automatically by the RegionServer or can be supplied by you. The timestamp must be unique per version of a given cell, because the timestamp identifies the version. To modify a previous version of a cell, for instance, you would issue a Put with a different value for the data itself, but the same timestamp.

I'm thinking of specifying the timestamps myself, but I don't know how to set timestamps automatically when bulk loading, and would that affect loading performance? I need the fastest and safest import process for big data.

  1. i tried parse , put each record table, speed very slow...so question is: how many records/size of data should in batch before put hbase. (i write simple java program put. it's slower more use imporrtsv tool commands import. dont know how many size in batch of tool..)

Many thanks for any advice!

Q1: HBase maintains versions using timestamps. If you don't provide one, it will take the default supplied by the HBase system (the RegionServer's current time).

In a Put request you can supply a custom timestamp if you have such a requirement. It does not affect performance.
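To make that concrete, here is a minimal sketch using the classic HTable client (newer clients use Put.addColumn instead of Put.add; the table name "mytable" is a placeholder):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class SameTimestampPut {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "mytable"); // "mytable" is a placeholder

        long ts = 1000L; // a timestamp you control

        // row1 -> f1:c1 = 'a' at timestamp ts
        Put p1 = new Put(Bytes.toBytes("row1"));
        p1.add(Bytes.toBytes("f1"), Bytes.toBytes("c1"), ts, Bytes.toBytes("a"));
        table.put(p1);

        // Same row, same cell, SAME timestamp: this overwrites the version above,
        // so 'a' is replaced by 'x' rather than being kept as a second version.
        Put p2 = new Put(Bytes.toBytes("row1"));
        p2.add(Bytes.toBytes("f1"), Bytes.toBytes("c1"), ts, Bytes.toBytes("x"));
        table.put(p2);

        // Using ts + 1 for the second put would keep both values as separate versions.
        table.close();
    }
}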

Q2: You can do it in 2 ways:

  • A simple Java client with the batching technique shown below.

  • MapReduce with importtsv (a batch client); see the sample invocation at the end of this answer.

Example #1: a simple Java client with the batching technique.

I used HBase Puts batched in List objects of 100000 records each while parsing JSON (similar to a standalone CSV client).

Below is the code snippet through which I achieved this. (The same thing can be done while parsing other formats as well.)

You may need to call this method in 2 places:

1) with each full batch of 100000 records, and

2) when processing the remainder of your records, fewer than 100000 (a sketch of such a driver loop follows the snippet).

public void addRecord(final ArrayList<Put> puts, final String tableName) throws Exception {
    try {
        final HTable table = new HTable(HBaseConnection.getHBaseConfiguration(), getTable(tableName));
        table.put(puts);
        LOG.info("Inserted " + puts.size() + " record[s] into table " + tableName + " OK.");
    } catch (final Throwable e) {
        e.printStackTrace();
    } finally {
        LOG.info("Processed ---> " + puts.size());
        if (puts != null) {
            puts.clear();
        }
    }
}
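For illustration, a driver loop that calls addRecord at both points might look like the sketch below; the Record type, its accessors, and the table name are placeholders for your own parsing code (note that addRecord clears the list in its finally block, so the same list can be reused):

ArrayList<Put> puts = new ArrayList<Put>();
for (Record r : records) { // Record/records: placeholders for your parsed input
    Put put = new Put(Bytes.toBytes(r.getRowKey()));
    put.add(Bytes.toBytes("f1"), Bytes.toBytes("c1"), Bytes.toBytes(r.getValue()));
    puts.add(put);
    if (puts.size() == 100000) {
        addRecord(puts, "mytable"); // 1) flush each full batch of 100000
    }
}
if (!puts.isEmpty()) {
    addRecord(puts, "mytable");     // 2) flush the remainder (< 100000)
}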

Note: the batch size is internally controlled by hbase.client.write.buffer, set as below in one of your config XMLs:

<property>
    <name>hbase.client.write.buffer</name>
    <value>20971520</value> <!-- 20 MB; the default is 2 MB (2097152) -->
</property>

which has a default value of 2 MB. Once the buffer is filled, it flushes all the puts and inserts them into the table.
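If you would rather control this from the client than from the XML, the classic HTable API exposes the same knob programmatically; this is a sketch, and "mytable" is again a placeholder:

Configuration conf = HBaseConfiguration.create();
conf.setLong("hbase.client.write.buffer", 20971520L); // 20 MB client-side write buffer

HTable table = new HTable(conf, "mytable");
table.setAutoFlush(false);   // buffer puts client-side instead of sending each one
// ... add puts; they are flushed automatically whenever the buffer fills ...
table.flushCommits();        // flush whatever is still buffered
table.close();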

Furthermore, whether you use the MapReduce client or the standalone client with the batch technique, batching is controlled by the above buffer property.
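For completeness, a typical importtsv invocation for a table like the one above might look like this (the table name, column mapping, separator, and input path are assumptions to adapt):

hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator=',' -Dimporttsv.columns=HBASE_ROW_KEY,f1:c1,f2:c2 mytable /path/to/input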

