Python extracting string

I have a DataFrame in which one of the columns, stored as strings, looks like this:

    filename
 0  Machine02-2022-01-28_00-21-45.blf.424
 1  Machine02-2022-01-28_00-21-45.blf.425
 2  Machine02-2022-01-28_00-21-45.blf.426
 3  Machine02-2022-01-28_00-21-45.blf.427
 4  Machine02-2022-01-28_00-21-45.blf.428

I want my column to look like this

      filename
 0    2022-01-28 00-21-45 424
 1    2022-01-28 00-21-45 425
 2    2022-01-28 00-21-45 426
 3    2022-01-28 00-21-45 427
 4    2022-01-28 00-21-45 428

I tried this code

df['filename'] = df['filename'].str.extract(r"(\d{4}-\d{1,2}-\d{1,2})_(\d{2}-\d{2}-\d{2}).*\.(\d+)", r"\1 \2 \3")

This raises the error: unsupported operand type(s) for &: 'str' and 'int'.
Can anyone please tell me what I am doing wrong?

11 months ago · Santiago Trujillo
4 Answers

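str.extract takes regex flags as its second argument, not a replacement string, which is why passing r"\1 \2 \3" there raises that error. Use str.replace instead, and add .*- at the start of the pattern to remove prefixes like Machine02-. Pass regex=True explicitly, because pandas 2.0 and later treat the pattern as a literal string by default:

df['filename'] = df['filename'].str.replace(r".*-(\d{4}-\d{1,2}-\d{1,2})_(\d{2}-\d{2}-\d{2}).*\.(\d+)", r"\1 \2 \3", regex=True)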
print(df)

# Output
                  filename
0  2022-01-28 00-21-45 424
1  2022-01-28 00-21-45 425
2  2022-01-28 00-21-45 426
3  2022-01-28 00-21-45 427
4  2022-01-28 00-21-45 428
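
If you would rather keep str.extract, as in the question, one possible sketch is to let it return one column per capture group and then join those columns yourself. The sample DataFrame below is rebuilt from the question's data, and the variable names pattern and parts are only illustrative:

```python
import pandas as pd

# Sample data rebuilt from the question (only two rows, for brevity).
df = pd.DataFrame({"filename": [
    "Machine02-2022-01-28_00-21-45.blf.424",
    "Machine02-2022-01-28_00-21-45.blf.425",
]})

# Same pattern as in the question: date, time, and the trailing number.
pattern = r"(\d{4}-\d{1,2}-\d{1,2})_(\d{2}-\d{2}-\d{2}).*\.(\d+)"

# With several unnamed groups, str.extract returns a DataFrame
# whose columns are 0, 1 and 2 (one per capture group).
parts = df["filename"].str.extract(pattern)

# Join the three pieces with single spaces.
df["filename"] = parts[0] + " " + parts[1] + " " + parts[2]
print(df)
```

Rows that do not match the pattern come back as NaN, so this variant also makes unexpected filenames easy to spot.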
11 months ago · Santiago Trujillo

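Please try this, which splits off the prefix at the first hyphen and then replaces the separators (in pandas 2.0 and later, n must be passed by keyword):

df['filename'] = df['filename'].str.split('-', n=1).apply(lambda x: ' '.join(x[1].split('_')).replace('.blf.', ' '))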
11 months ago · Santiago Trujillo

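Regexes are nice, but when the substrings never change, plain literal replacements are often easier and more readable:

df['filename'] = df['filename'].str.replace('Machine02-', '', regex=False)
df['filename'] = df['filename'].str.replace('.blf.', ' ', regex=False)
# Also turn the underscore between date and time into a space, as in the desired output
df['filename'] = df['filename'].str.replace('_', ' ', regex=False)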
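
If the machine prefix is not always Machine02-, the literal replacement stops working. A minimal regex-free sketch for that case, assuming every name has the shape prefix-date_time.ext.number, could use plain string methods. The helper name clean_filename is hypothetical, not part of pandas:

```python
def clean_filename(name: str) -> str:
    """Turn 'Machine02-2022-01-28_00-21-45.blf.424' into '2022-01-28 00-21-45 424'.

    Hypothetical helper; assumes the shape '<prefix>-<date>_<time>.<ext>.<number>'.
    """
    # Drop everything up to and including the first "-" (the machine prefix).
    _, _, rest = name.partition("-")
    # Split off the numeric suffix after the last ".".
    stem, _, suffix = rest.rpartition(".")
    # stem is e.g. "2022-01-28_00-21-45.blf"; keep the part before the first ".".
    timestamp = stem.split(".", 1)[0]
    # Replace the underscore between date and time with a space.
    return f"{timestamp.replace('_', ' ')} {suffix}"


names = [
    "Machine02-2022-01-28_00-21-45.blf.424",
    "Machine17-2022-01-28_00-21-45.blf.425",
]
cleaned = [clean_filename(n) for n in names]
print(cleaned)
```

On the DataFrame from the question, this could be applied with df['filename'].map(clean_filename).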
11 months ago · Santiago Trujillo

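Use str.replace with regular expressions (written as raw strings so the backslashes reach the regex engine unchanged):

df['filename'] = df['filename'].str.replace(r'Machine|\.blf\.', ' ', regex=True).str.strip().str.replace(r'^\d+-', '', regex=True)

# Output (the underscore between date and time is kept)
                  filename
0  2022-01-28_00-21-45 424
1  2022-01-28_00-21-45 425
2  2022-01-28_00-21-45 426
3  2022-01-28_00-21-45 427
4  2022-01-28_00-21-45 428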

Or extract the text between Machine and .blf (this keeps the leading 02- and drops the trailing number):

df['filename'] = df['filename'].str.extract(r'(?<=Machine)([\w-]+)(?=\.blf)', expand=False)

# Output
                 filename
0  02-2022-01-28_00-21-45
1  02-2022-01-28_00-21-45
2  02-2022-01-28_00-21-45
3  02-2022-01-28_00-21-45
4  02-2022-01-28_00-21-45
11 months ago · Santiago Trujillo