python - Arrange the multi-similar data efficiently

- January 15, 2013

The data file shown here is a measurement record exported from an instrument.

I have uploaded it here; anyone interested can download it.

Background

    sample record-1
    fid1, fid2, front_temperature, laser, laserlow, pressure, mode
    -925    284    1452    315    143    16653    -28500
    -924    281    1462    322    136    16641    -28628
    -920    281    1455    311    139    16649    -28756
    -923    279    1454    312    139    16636    -28884
    ......

    sample record-2
    fid1, fid2, front_temperature, laser, laserlow, pressure, mode
    -925    284    1452    315    143    16653    -28500
    ......
    ......

Generally, the file holds several records of different samples, in the order of the testing routine. The data records of these samples all share the same format.

My attempt

If there is only 1 sample in the data file (in *.txt format), I can arrange the data file into a pandas.DataFrame, so that I can handle the data with further analysis in Python.
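As a rough sketch of that single-sample step (separate from the code of my attempt, which follows), one header line and its data lines can be turned into a DataFrame in a few lines. The helper name `block_to_frame` is hypothetical, and the sketch assumes the header is comma-separated and the data rows are whitespace-separated, as in the sample shown in the background section.

```python
import pandas as pd

def block_to_frame(header_line, data_lines):
    """Build a numeric DataFrame from one header line and its data lines."""
    # Assumption: column names are separated by commas
    names = [n.strip() for n in header_line.split(",") if n.strip()]
    # Assumption: values are separated by whitespace; blank lines are skipped
    rows = [line.split() for line in data_lines if line.strip()]
    # Convert the string fields to numbers, column by column
    return pd.DataFrame(rows, columns=names).apply(pd.to_numeric)

header = "fid1, fid2, front_temperature, laser, laserlow, pressure, mode"
data = [
    "-925    284 1452    315 143 16653    -28500",
    "-924    281 1462    322 136 16641    -28628",
]
frame = block_to_frame(header, data)
```

Unlike appending the raw strings, this converts the values to numbers, which is usually what later analysis needs.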

My code is shown here:

    import pandas as pd

    # The whole data file, with several sample records inside
    with open("record.txt") as f:
        mylist = f.read().splitlines()

    ## The record of each sample is 803 lines long
    lines = mylist[0:803]

    ### The sample name is extracted from the third line
    sample_name = lines[2]

    ### For each sample, the measurements are saved in several aspects,
    ### regarded as the columns here
    columns = lines[22].split()

    ### Generate empty columns for saving the data records later.
    ### [:-1] drops the trailing delimiter from each column name.
    df = {columns[0][:-1]: [], columns[1][:-1]: [], columns[2][:-1]: [], columns[3][:-1]: [],
          columns[4][:-1]: [], columns[5][:-1]: [], columns[6][:-1]: []}  #### Though this is a dumb method

    ## Data extraction
    ### The valid data records of sample 1 begin at line 23
    for i in range(0, len(lines[23:]), 1):
        for j in range(0, len(columns), 1):
            df[columns[j][:-1]].append(lines[23 + i].split()[j])
    pd.DataFrame(df)

The result looks like this:

My target

With the code above, I can deal with a data file that contains 1 sample. But when there are several samples in the record text, I couldn't find a clue for dealing with them efficiently.

Here is an illustration of my target: I want to generate a DataFrame or dict that saves the records of all samples.

Any advice would be appreciated!

I think you are looking for this:

    import pandas as pd

    # The whole data file, with several sample records inside
    with open("record.txt", 'r') as f:
        mylist = f.read().splitlines()

    dataset = []
    while True:
        try:
            ## The record of each sample is 803 lines long
            lines, mylist = mylist[0:803], mylist[803:]  # this splits the list!

            ### The sample name is extracted from the third line
            sample_name = lines[2]

            ### For each sample, the measurements are saved in several aspects,
            ### regarded as the columns here
            columns = lines[22].split()

            ### Generate empty columns for saving the data records later.
            df = {columns[0][:-1]: [], columns[1][:-1]: [], columns[2][:-1]: [], columns[3][:-1]: [],
                  columns[4][:-1]: [], columns[5][:-1]: [], columns[6][:-1]: []}  #### Though this is a dumb method

            ## Data extraction
            ### The valid data records of each sample begin at line 23
            for i in range(0, len(lines[23:]), 1):
                for j in range(0, len(columns), 1):
                    df[columns[j][:-1]].append(lines[23 + i].split()[j])
        except IndexError:
            # Raised once mylist is exhausted and lines[2] no longer exists
            break

        df = pd.DataFrame(df)
        dataset.append(df)

Now dataset[0] should contain the DataFrame of sample 1.
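If a dict keyed by sample name is preferred over a list, the same fixed-length splitting idea could be sketched roughly as below. This is only an illustration under assumptions: `parse_samples` and its parameters are hypothetical names, the header is assumed to be comma-separated and the data rows whitespace-separated, and the toy input uses 4-line blocks instead of the 803-line blocks of the real file.

```python
import pandas as pd

def parse_samples(lines, block_len, name_idx, header_idx):
    """Return {sample_name: DataFrame} from fixed-length blocks of lines."""
    samples = {}
    for start in range(0, len(lines), block_len):
        block = lines[start:start + block_len]
        if len(block) <= header_idx:
            break  # trailing fragment too short to hold a header line
        # Assumption: column names are comma-separated
        names = [n.strip() for n in block[header_idx].split(",") if n.strip()]
        # Assumption: data rows follow the header, whitespace-separated
        rows = [ln.split() for ln in block[header_idx + 1:] if ln.strip()]
        frame = pd.DataFrame(rows, columns=names).apply(pd.to_numeric)
        samples[block[name_idx].strip()] = frame
    return samples

# Toy stand-in for the real file: two samples of 4 lines each
toy = [
    "sample record-1",
    "fid1, fid2",
    "-925 284",
    "-924 281",
    "sample record-2",
    "fid1, fid2",
    "-920 281",
    "-923 279",
]
samples = parse_samples(toy, block_len=4, name_idx=0, header_idx=1)

# Optionally stack everything into one DataFrame indexed by sample name
combined = pd.concat(
    list(samples.values()),
    keys=list(samples.keys()),
    names=["sample", "row"],
)
```

For the file described in the question, the matching call would presumably be `parse_samples(mylist, block_len=803, name_idx=2, header_idx=22)`, taking the numbers from the posted code and assuming every sample really spans exactly 803 lines.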
