python 3.x - Removing strings that match multiple regex patterns from pandas series
I have a pandas DataFrame column containing text that needs to be cleaned of strings that match various regex patterns. My current attempt (given below) loops through each pattern, creating a new column containing the match if one is found, and then loops through the DataFrame, splitting the text column at the found match. Finally, I drop the no-longer-needed matching column 're_match'.
While this works for my current use case, I can't help but think there must be a more efficient, vectorised way of doing this in pandas, without needing to use iterrows() and without creating a new column. My question is: is there a more optimal way of removing strings that match multiple regex patterns from a column?
In my current use case the unwanted strings are always at the end of the text block, hence the use of split(...)[0]. However, it would be great if the unwanted strings could also be extracted from any point in the text.
Also, note that combining the regexes into one long single pattern is not preferable, as there are tens of patterns that change on a regular basis.
import re
import pandas as pd

df = pd.read_csv('data.csv', index_col=0)

patterns = [
    r'( regex1 \d+)',
    r'((?: regex 2)? \d{1,2} )',
    r'( \d{0,2}.?\d{0,2}-?\d{1,2}.?\d{0,2}regex3 )',
]

for p in patterns:
    # Extract the first match of the pattern (if any) into a helper column.
    df['re_match'] = df['text'].str.extract(
        pat=p, flags=re.IGNORECASE, expand=False
    )
    # Placeholder for rows without a match, so the split() below is a no-op there.
    df['re_match'] = df['re_match'].fillna('xxxxxxxxxxxxxxx')
    for index, row in df.iterrows():
        df.loc[index, 'text'] = row['text'].split(row['re_match'])[0]

df = df.drop('re_match', axis=1)
Thanks for your help.
There is indeed: it's called df.applymap(some_function).
Consider the following example:
import re
from pandas import DataFrame

df = DataFrame({'key1': ['1000', '2000'], 'key2': ['3000', 'digits(1234)']})

def cleanitup(val):
    """Multiply purely numeric values by 10, leave everything else alone."""
    rx = re.compile(r'^\d+$')
    if rx.match(val):
        return int(val) * 10
    else:
        return val

# here the magic starts
df.applymap(cleanitup)
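Note that applymap() returns a new DataFrame, so assign the result back if you want to keep it; for illustration, these are the values the example above produces (in recent pandas versions, applymap() has been renamed to DataFrame.map()):

df = df.applymap(cleanitup)
# df['key1'] is now [10000, 20000]
# df['key2'] is now [30000, 'digits(1234)']  -- the non-numeric cell is untouched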
Obviously, this is made up, but every cell that contained only digits before has now been multiplied by 10, and every other value has been left untouched.
With this in mind, you can check and rearrange the values as necessary in the function cleanitup().
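Applied back to the question, a minimal sketch along these lines (assuming a DataFrame with a 'text' column and the patterns list from the question; strip_matches is a hypothetical helper name) would precompile the patterns once and apply one function per cell, with no iterrows() and no temporary 're_match' column:

import re
import pandas as pd

df = pd.read_csv('data.csv', index_col=0)

# Compile once up front; the patterns are the stand-ins from the question.
compiled = [re.compile(p, flags=re.IGNORECASE) for p in [
    r'( regex1 \d+)',
    r'((?: regex 2)? \d{1,2} )',
    r'( \d{0,2}.?\d{0,2}-?\d{1,2}.?\d{0,2}regex3 )',
]]

def strip_matches(text):
    """Truncate the text at the first match of any pattern."""
    for rx in compiled:
        m = rx.search(text)
        if m:
            text = text[:m.start()]
    return text

df['text'] = df['text'].apply(strip_matches)

Here apply() on a single Series plays the role applymap() plays on a whole DataFrame. Truncating at m.start() reproduces split(match)[0] for a match at the end of the text; if the unwanted strings can occur anywhere (as the question mentions), replacing the truncation with text = rx.sub('', text) removes them in place instead.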