PHP regex: parsing URLs in a message while ignoring all HTML tags
I am trying to process messages in a small, private ticketing system so that URLs are automatically parsed into clickable links without messing up any HTML that may be posted. Until now, the function that parses URLs has worked well, but one or two users of the system want to be able to post embedded images rather than attachments.
This is the existing code that converts strings into clickable URLs. Please note that I have limited knowledge of regex and have relied on assistance from others to build this:
    $text = preg_replace(
        array(
            '/(^|\s|>)(www.[^<> \n\r]+)/iex',
            '/(^|\s|>)([_a-zA-Z0-9-]+(\\.[a-zA-Z]{2,3})?\\.[a-zA-Z]{2,4}\\/[^<> \n\r]+)/iex',
            '/(?(?=<a[^>]*>.+<\/a>)(?:<a[^>]*>.+<\/a>)|([^="\']?)((?:https?):\/\/([^<> \n\r]+)))/iex'
        ),
        array(
            "stripslashes((strlen('\\2')>0?'\\1<a href=\"http://\\2\" target=\"_blank\">\\2</a> \\3':'\\0'))",
            "stripslashes((strlen('\\2')>0?'\\1<a href=\"http://\\2\" target=\"_blank\">\\2</a> \\4':'\\0'))",
            "stripslashes((strlen('\\2')>0?'\\1<a href=\"\\2\" target=\"_blank\">\\3</a> ':'\\0'))",
        ),
        $text
    );
    return $text;
How would I go about modifying an existing function, such as the one above, to exclude hits wrapped in HTML tags such as `<img>`, without hurting its functionality?
Example:
`<img src="https://example.com/image.jpg">`
turns into
`<img src="<a href="https://example.com/image.jpg" target="_blank">example.com/image.jpg</a>">`
I did some searching before posting and looked at the most popular hits. The common trend in them is "this is the wrong way to do it", which is true. While I agree, I want to keep the function quite light. The system is used privately within the organisation, and we only wish to process img tags and URLs automatically using this. Everything else is left plain: no lists, code tags, quotes, etc.
I would appreciate any assistance here.
Summary: how do I modify an existing set of regular expression rules to exclude matches found within an img or other HTML tag in a block of text?
From what I can gather from the `e` modifier in your patterns, your PHP version can be at most PHP 5.4 (the modifier was deprecated in PHP 5.5). preg_replace_callback() is available in PHP 5.4, though it may be a tight squeeze!
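As a rough sketch of what moving away from the `e` modifier could look like, here is the question's first rule (bare `www.` addresses) rewritten with `preg_replace_callback()`. The function name `linkify_www` is made up for illustration, the dot after `www` is escaped, and the stray `\\3` reference from the original replacement is dropped. Treat it as an example under those assumptions, not a drop-in replacement.

```php
<?php
// Hypothetical helper (name is an assumption): the question's first /e rule,
// expressed as a callback so no code is evaluated from the replacement string.
function linkify_www($text)
{
    // Leading boundary (start, whitespace or ">") followed by a bare www address.
    $pattern = '/(^|\s|>)(www\.[^<> \n\r]+)/i';

    // Anonymous functions are available from PHP 5.3 onwards.
    return preg_replace_callback($pattern, function ($m) {
        // $m[1] is the boundary character, $m[2] is the address without a scheme.
        return $m[1] . '<a href="http://' . $m[2] . '" target="_blank">' . $m[2] . '</a>';
    }, $text);
}

echo linkify_www('visit www.example.com today');
// prints: visit <a href="http://www.example.com" target="_blank">www.example.com</a> today
```

Because the callback receives the captured groups as a plain array, the `stripslashes()` and `strlen()` juggling from the evaluated replacement strings is no longer needed.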
While I would rather not be roped into a big back-and-forth with a multitude of answer edits, I'll give you some traction.
The method that follows is not one I would stake my career on. As stated in the comments under the question, and on many, many other pages, HTML should not be parsed with regex. (disclaimer complete)
The code below was demonstrated on PHP 5.4.34:
    $text='this has img tag <img src="https://example.com/image.jpg"> should igrnored. img needs become tag: https://example.com/image.jpg. <a href="https://www.example.com/image" target="_blank">tagged link</a> target. <a href="https://example.com/image?what=something&when=something">tagged link</a> without target. untagged url http://example.com/image.jpg. (please extend battery of test cases isolate monkeywrenching cases) short url example.com/ short url example.com/index.php?a=b&c=d www.example.com';

    // Left of the pipe: whole <a ...> and <img ...> tags are consumed and discarded.
    // Right of the pipe: anything that looks like a URL (the PCRE verbs must be uppercase).
    $pattern='~<(?:a|img)[^>]+?>(*SKIP)(*FAIL)|(((?:https?:)?(?:/{2})?)(w{3})?\S+(\.\S+)+\b(?:[?#&/]\S*)*)~';

    function taggify($m){
        // $m[4] holds the last dot-separated part, e.g. ".jpg" (add more filetypes as needed)
        if(preg_match('/^\.(?:bmp|gif|png|jpe?g)\b/i',$m[4])){
            return "<img src=\"{$m[0]}\">";
        }else{
            //var_export(parse_url($m[0]));  // if you need to do preparations, consider using parse_url()
            return "<a href=\"{$m[0]}\" target=\"_blank\">{$m[0]}</a>";
        }
    }

    $text=preg_replace_callback($pattern,'taggify',$text);
    echo $text;
Output:
this has img tag <img src="https://example.com/image.jpg"> should igrnored. img needs become tag: <img src="https://example.com/image.jpg">. <a href="https://www.example.com/image" target="_blank">tagged link</a> target. <a href="https://example.com/image?what=something&when=something">tagged link</a> without target. untagged url <img src="http://example.com/image.jpg">. (please extend battery of test cases isolate monkeywrenching cases) short url <a href="example.com/" target="_blank">example.com/</a> short url <a href="example.com/index.php?a=b&c=d" target="_blank">example.com/index.php?a=b&c=d</a> <a href="www.example.com" target="_blank">www.example.com</a>
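In the output above, the short URLs end up with values such as `href="example.com/"`, which a browser treats as a relative path rather than an external address. One possible preparation step, following the `parse_url()` hint in the code comment, is sketched here. The helper name `absolute_href` and the choice of `http://` as the default scheme are assumptions (the question's original rules also prepended `http://`).

```php
<?php
// Hypothetical helper (name and default scheme are assumptions): make sure a
// matched address can be used as an absolute href.
function absolute_href($url)
{
    // Protocol-relative input such as //example.com/x only needs a scheme name.
    if (strpos($url, '//') === 0) {
        return 'http:' . $url;
    }

    // parse_url() returns null when the component is absent and false on a malformed URL.
    $scheme = parse_url($url, PHP_URL_SCHEME);
    if ($scheme === null || $scheme === false) {
        return 'http://' . $url;
    }

    // A scheme is already present, so leave the address untouched.
    return $url;
}

echo absolute_href('example.com/index.php?a=b&c=d'), "\n"; // http://example.com/index.php?a=b&c=d
echo absolute_href('https://example.com/image'), "\n";     // https://example.com/image
```

Inside a callback like `taggify()`, the result would be used for the `href` or `src` attribute while the visible link text stays as the user typed it.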
The skip-fail technique works to "disqualify" unwanted matches. Qualifying matches are expressed by the section of the pattern that follows the pipe (`|`) after `(*SKIP)(*FAIL)`.
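To see the technique in isolation, here is a stripped-down sketch. The pattern is a simplified, hypothetical one (it only skips `<img>` tags and only matches addresses that start with `http://` or `https://`), not the pattern from the answer.

```php
<?php
// Simplified, hypothetical pattern: consume whole <img ...> tags and throw them
// away, otherwise match a bare http(s) URL.
$pattern = '~<img[^>]*>(*SKIP)(*FAIL)|https?://[^\s<>"]+~i';

$input = 'Logo: <img src="https://example.com/a.png"> docs at https://example.com/docs';

preg_match_all($pattern, $input, $matches);

// Only the URL outside the tag is reported:
// $matches[0] is array('https://example.com/docs')
print_r($matches[0]);
```

The left branch matches the entire tag, `(*SKIP)` tells the engine not to retry from any position inside that text, and `(*FAIL)` rejects the match, so the URL in the `src` attribute is never offered to the right branch.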