unicode - Regex to delete emojis from string
I have a list of Unicode emojis and I want to strip the emojis from them (i.e. I just want the whole first part and the name at the end of the row). Sample rows look like these ones:
1f468 1f3fd 200d 2695 fe0f ; fully-qualified # 👨🏽⚕️ man health worker: medium skin tone
1f469 1f3ff 200d 2695 ; non-fully-qualified # 👩🏿⚕ woman health worker: dark skin tone
(I have deleted some spaces from the source for the sake of simplicity.) I want to match the [non-]fully-qualified part, the # and the emoji, so that I can delete them with sed. I have tried the following regex:
sed -e 's/\<[on-]*fully-qualified\># *.+?(?=[a-zA-Z]) //g'
which tries to match the words [non-]fully-qualified, a space, the # symbol, and then whatever it can find (non-greedy) until the first letter, and replace all of that with an empty string.
I want to have this output:
1f468 1f3fd 200d 2695 fe0f ; man health worker: medium skin tone
1f469 1f3ff 200d 2695 ; woman health worker: dark skin tone
I have tried several posted answers to no avail and, besides, I'm trying to match a pattern between two boundaries, which is where I'm having trouble.
Edit: I'm trying to run the command in the Git Bash shipped with Git for Windows.
I'm still not quite sure, but this might work:
sed 's/;.*fully-qualified\s*#[^a-zA-Z]*/; /'
This matches a semicolon ;, followed by any characters .*, followed by the "fully-qualified" text, followed by any number of spaces, followed by a hash sign, followed by any characters that are not a-z or A-Z [^a-zA-Z]*, and replaces the whole match with a semicolon followed by a space.
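As a quick check, here is a minimal sketch that runs that substitution on the two sample rows from the question. The variable names rows and result are just for illustration, and it assumes GNU sed (the one in Git Bash should be), since \s is a GNU extension:

```shell
# The two sample rows from the question, one per line.
rows='1f468 1f3fd 200d 2695 fe0f ; fully-qualified # 👨🏽⚕️ man health worker: medium skin tone
1f469 1f3ff 200d 2695 ; non-fully-qualified # 👩🏿⚕ woman health worker: dark skin tone'

# Same substitution as above: everything from the semicolon up to (but not
# including) the first ASCII letter after the # is replaced by "; ".
result=$(printf '%s\n' "$rows" | sed 's/;.*fully-qualified\s*#[^a-zA-Z]*/; /')

printf '%s\n' "$result"
```

Because .* is greedy, it also swallows the "non-" prefix on the second row, so both row types are handled by the same expression.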
To be sure that [a-zA-Z] captures only a to z and A to Z, without any other characters (which seems to be the problem), a quick fix is to run the command with LC_ALL=C:
LC_ALL=C sed 's/;.*fully-qualified\s*#[^a-zA-Z]*/; /' file
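If the sed at hand is not GNU sed, \s may not be recognised. A possible variant, offered only as a sketch, swaps in the POSIX bracket classes [[:space:]] and [[:alpha:]]; under LC_ALL=C, [[:alpha:]] should cover just the ASCII letters, so it behaves like [a-zA-Z]. The temporary file and the variable names below are hypothetical stand-ins for the real list:

```shell
# Hypothetical input file holding the two sample rows from the question.
file=$(mktemp)
printf '%s\n' \
  '1f468 1f3fd 200d 2695 fe0f ; fully-qualified # 👨🏽⚕️ man health worker: medium skin tone' \
  '1f469 1f3ff 200d 2695 ; non-fully-qualified # 👩🏿⚕ woman health worker: dark skin tone' > "$file"

# LC_ALL=C makes sed work byte by byte, so [^[:alpha:]] means "anything that
# is not an ASCII letter" and every byte of the emoji gets consumed.
portable=$(LC_ALL=C sed 's/;.*fully-qualified[[:space:]]*#[^[:alpha:]]*/; /' "$file")

printf '%s\n' "$portable"
rm -f "$file"
```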