I have following table in postgres
col1 col2 col3
NCT00000412 Drug Placebo Calcitriol
NCT00000412 Drug Placebo Alendronate
NCT00000774 Biological Placebo version of rgp120/HIV-1MN
NCT00000774 Biological Placebo version of rgp120/HIV-1SF2
NCT00005648 Drug Gemcitabine with Placebo
NCT00007631 Drug Tretinoin 0.1% cream or placebo
NCT00007735 Drug Doxycycline (200mg/day) or Placebo
NCT00032435 Drug PAL-40 Placebo
NCT00032435 Drug abciximab placebo; reteplase placebo, abciximab
NCT00032435 Drug Valsartan placebo + nateglinide placebo
I am trying to extract those rows where placebo word has a single word before or after it. The expected output is:
col1 col2 col3
NCT00000412 Drug Placebo Calcitriol
NCT00000412 Drug Placebo Alendronate
NCT00032435 Drug PAL-40 Placebo
I have tried following regex:
select * from table
where name ~* '\splacebo\w+' or name ~* '\w+placebo\s'
You may use
select * from table
where name ~* '^(?:placebo\s+\S+|\S+\s+placebo)$'
See the regex demo
Details
^ - start of string(?:placebo\s+\S+|\S+\s+placebo) - a non-capturing group matching either
placebo\s+\S+ - placebo, then 1+ whitespaces and then 1+ non-whitespaces| - or\S+\s+placebo - 1+ non-whitespaces, then 1+ whitespaces and then placebo$ - end of string.I think you want:
where col3 ~* '(^\S+\s+placebo)|(placebo\s+\S+$)'
Breakdown:
(
^ beginning of the string
\S+ 1 to N characters other than a space
\s+ 1 to N spaces
placebo literal string "placebo"
)
| OR
(
placebo literal string "placebo"
\s+ 1 to N spaces
\S+ 1 to N characters other than a space
$ end of the string
)
I used \S instead of \w because you want to match on dashes (which \w doesn't do).
As a note, you can do this just using like:
where (name like '% Placebo%' and name not like '% % Placebo%') or
(name like '%Placebo %' and name not like '%Placebo % %')