I have following table in postgres
col1 col2 col3
NCT00000412 Drug Placebo Calcitriol
NCT00000412 Drug Placebo Alendronate
NCT00000774 Biological Placebo version of rgp120/HIV-1MN
NCT00000774 Biological Placebo version of rgp120/HIV-1SF2
NCT00005648 Drug Gemcitabine with Placebo
NCT00007631 Drug Tretinoin 0.1% cream or placebo
NCT00007735 Drug Doxycycline (200mg/day) or Placebo
NCT00032435 Drug PAL-40 Placebo
NCT00032435 Drug abciximab placebo; reteplase placebo, abciximab
NCT00032435 Drug Valsartan placebo + nateglinide placebo
I am trying to extract those rows where placebo word has a single word before or after it. The expected output is:
col1 col2 col3
NCT00000412 Drug Placebo Calcitriol
NCT00000412 Drug Placebo Alendronate
NCT00032435 Drug PAL-40 Placebo
I have tried following regex:
select * from table
where name ~* '\splacebo\w+' or name ~* '\w+placebo\s'
You may use
select * from table
where name ~* '^(?:placebo\s+\S+|\S+\s+placebo)$'
See the regex demo
Details
^
- start of string(?:placebo\s+\S+|\S+\s+placebo)
- a non-capturing group matching either
placebo\s+\S+
- placebo
, then 1+ whitespaces and then 1+ non-whitespaces|
- or\S+\s+placebo
- 1+ non-whitespaces, then 1+ whitespaces and then placebo
$
- end of string.I think you want:
where col3 ~* '(^\S+\s+placebo)|(placebo\s+\S+$)'
Breakdown:
(
^ beginning of the string
\S+ 1 to N characters other than a space
\s+ 1 to N spaces
placebo literal string "placebo"
)
| OR
(
placebo literal string "placebo"
\s+ 1 to N spaces
\S+ 1 to N characters other than a space
$ end of the string
)
I used \S
instead of \w
because you want to match on dashes (which \w
doesn't do).
As a note, you can do this just using like
:
where (name like '% Placebo%' and name not like '% % Placebo%') or
(name like '%Placebo %' and name not like '%Placebo % %')