<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Web Dev Bros &#187; regular expressions</title>
	<atom:link href="http://www.webdevbros.net/category/regular-expressions/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.webdevbros.net</link>
	<description>hot talk about web development</description>
	<lastBuildDate>Thu, 20 Jan 2011 19:55:06 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>lazy stars within regular expressions</title>
		<link>http://www.webdevbros.net/2007/05/10/lazy-stars-within-regular-expressions/</link>
		<comments>http://www.webdevbros.net/2007/05/10/lazy-stars-within-regular-expressions/#comments</comments>
		<pubDate>Thu, 10 May 2007 18:52:24 +0000</pubDate>
		<dc:creator>Michal</dc:creator>
				<category><![CDATA[Javascript]]></category>
		<category><![CDATA[regular expressions]]></category>

		<guid isPermaLink="false">http://www.webdevbros.net/2007/05/10/lazy-stars-within-regular-expressions/</guid>
		<description><![CDATA[Today I stumbled across something really funny called the "lazy star", a term used in regular expressions. I was working on an expression and could not get it to work until I found the lazy "guy", which turned out to be the solution to the problem. The problem briefly: Let's assume the following string [...]]]></description>
			<content:encoded><![CDATA[<p>Today I stumbled across something really funny called the "lazy star", a term used in regular expressions. I was working on an expression and could not get it to work until I found the lazy "guy", which turned out to be the solution to the problem. <span id="more-64"></span></p>
<p>The problem briefly:<br />
Let's assume the following string (some pseudo HTML) is given:</p>
<div class="syntax_hilite"><span class="langName">HTML:</span>
<div id="html-5">
<div class="html">
<ol>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#767676;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color: #009900;"><a href="http://december.com/html/4/element/script.html"><span style="color: #000000; font-weight: bold;">&lt;script&gt;</span></a></span></div>
</li>
<li style="font-weight: bold;color:#767676;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">....</div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#767676;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/script&gt;</span></span></div>
</li>
<li style="font-weight: bold;color:#767676;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">something in between!</div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#767676;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color: #009900;"><a href="http://december.com/html/4/element/script.html"><span style="color: #000000; font-weight: bold;">&lt;script&gt;</span></a></span></div>
</li>
<li style="font-weight: bold;color:#767676;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">...</div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#767676;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/script&gt;</span></span> </div>
</li>
</ol>
</div>
</div>
</div>
<p></p>
<p>We want to strip all script tags, including their contents, out of the string, so that it looks like this afterwards:</p>
<div class="syntax_hilite"><span class="langName">HTML:</span>
<div id="html-6">
<div class="html">
<ol>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#767676;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">something in between! </div>
</li>
</ol>
</div>
</div>
</div>
<p></p>
<p>Yeah, you are right: we need a smart regular expression here to strip those unwanted tags. My first attempt was the following pattern, which looked reasonable to me (I will use JavaScript here, but the same idea applies to most languages with Perl-style regular expressions):</p>
<div class="syntax_hilite"><span class="langName">JAVASCRIPT:</span>
<div id="javascript-7">
<div class="javascript">
<ol>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#767676;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color: #0066FF;">/&lt;script<span style="color: #66cc66;">&#91;</span>\s\S<span style="color: #66cc66;">&#93;</span>*&lt;\/script&gt;/gi</span> </div>
</li>
</ol>
</div>
</div>
</div>
<p></p>
<p>In words: "grab everything which starts with &lt;script, followed by any character ([\s\S]) any number of times (*), and finally ending with &lt;/script&gt;". The pattern runs fine, but it removes my whole content, because it grabs the first &lt;script&gt; tag and strips everything up to the last &lt;/script&gt; tag. The star is greedy by default: it consumes as many characters as it can and only gives characters back until the rest of the pattern matches. Therefore the "something in between!" string unfortunately vanished as well. Then I ended up playing around with expressions (which I really love, I cannot imagine doing anything better) so that it grabs everything within the script tag, but stops before a new script tag starts. It took me at least <strong>2 hours</strong> to manage this! Damn, I was really frustrated :)<br />
But all of a sudden I found the MAGIC question mark (?), which turns stars into lazy stars. This terminology alone saved my day (no more coffee, no drugs, no pills, ...), and as a "nice" side effect the regular expression finally worked the way it should. Here is the working pattern with the lazy star:</p>
<div class="syntax_hilite"><span class="langName">JAVASCRIPT:</span>
<div id="javascript-8">
<div class="javascript">
<ol>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#767676;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color: #0066FF;">/&lt;script<span style="color: #66cc66;">&#91;</span>\s\S<span style="color: #66cc66;">&#93;</span>*?&lt;\/script&gt;/gi</span> </div>
</li>
</ol>
</div>
</div>
</div>
<p></p>
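<p>To see the difference side by side, here is a minimal sketch that runs both patterns against the sample string from the beginning of the post, collapsed onto one line. The variable names (html, greedy, lazy and so on) are only illustrative. The less-than signs are written as \x3C, which is simply the escaped form of that character, so the snippet can be pasted into an HTML page without the browser treating it as a real script tag; the patterns are otherwise the same as the two shown in this post.</p>

```javascript
// Sample input from the post on a single line.
// "\x3C" is the escaped less-than sign, so this is the same pseudo HTML.
const html = "\x3Cscript>....\x3C/script>something in between!\x3Cscript>...\x3C/script>";

// Greedy star: [\s\S]* first runs to the end of the string and then
// backtracks to the LAST closing script tag.
const greedy = /\x3Cscript[\s\S]*\x3C\/script>/gi;

// Lazy star: [\s\S]*? takes as few characters as possible and therefore
// stops at the FIRST closing script tag after each opening tag.
const lazy = /\x3Cscript[\s\S]*?\x3C\/script>/gi;

const greedyResult = html.replace(greedy, "");
const lazyResult = html.replace(lazy, "");

console.log(JSON.stringify(greedyResult)); // ""
console.log(JSON.stringify(lazyResult));   // "something in between!"

// The greedy pattern finds one big match, the lazy one finds both blocks.
console.log(html.match(greedy).length);    // 1
console.log(html.match(lazy).length);      // 2
```

<p>Note that this only illustrates greedy versus lazy matching on a simple string. As the quote further down points out, a pattern like this does not handle tags nested inside themselves.</p>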
<p>Can you feel the laziness? :) I certainly did. As an explanation I refer to an article which describes this behavior best: <a href="http://www.regular-expressions.info/examples.html">http://www.regular-expressions.info/examples.html</a></p>
<blockquote><p>&lt;TAG\b[^&gt;]*&gt;(.*?)&lt;/TAG&gt;  matches the opening and closing pair of a specific HTML tag. Anything between the tags is captured into the first backreference. The question mark in the regex makes the star lazy, to make sure it stops before the first closing tag rather than before the last, like a greedy star would do. This regex will not properly match tags nested inside themselves, like in &lt;TAG&gt;one&lt;TAG&gt;two&lt;/TAG&gt;one&lt;/TAG&gt;.</p></blockquote>
<p>I have the feeling that I somehow understand the usage of the question mark here, but I am not 100% confident about it, because the theory says the following about quantification (<a href="http://en.wikipedia.org/wiki/Regular_expression">http://en.wikipedia.org/wiki/Regular_expression</a>):</p>
<blockquote><p><strong>?</strong> The question mark indicates there is 0 or 1 of the previous expression. For example, "colou?r" matches both color and colour.<br />
<strong>*</strong> The asterisk indicates there are 0, 1 or any number of the previous expression. For example, "go*gle" matches ggle, gogle, google, gooogle, etc.<br />
<strong>+</strong> The plus sign indicates that there is at least 1 of the previous expression. For example, "go+gle" matches gogle, google, gooogle, etc. (but not ggle).
</p></blockquote>
<p>That's kind of strange to me, because it only mentions how often the preceding expression is allowed to occur and not how the matching is done. Maybe someone has a better explanation for me...</p>
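<p>One way to read it, sketched in JavaScript: the question mark seems to play two separate roles depending on what stands in front of it. After a normal character or group it is the "0 or 1" quantifier from the quote above. Directly after another quantifier it does not count anything and only makes that quantifier lazy. The variable names and the test strings below are just picked for illustration.</p>

```javascript
// Role 1: "?" after a normal character is a quantifier: 0 or 1 occurrence.
const optionalU = /^colou?r$/;
const matchesColor = optionalU.test("color");   // true
const matchesColour = optionalU.test("colour"); // true

// Role 2: "?" directly after another quantifier (*, +, ?, {n,m}) only
// switches that quantifier from greedy (as many as possible) to lazy
// (as few as possible).
const text = "gooogle";
const greedyStar = text.match(/go*/)[0];  // "gooo": takes every "o" it can
const lazyStar = text.match(/go*?/)[0];   // "g": zero "o"s already satisfy it
const lazyPlus = text.match(/go+?/)[0];   // "go": one "o" is the minimum for +

console.log(matchesColor, matchesColour); // true true
console.log(greedyStar, lazyStar, lazyPlus); // gooo g go
```

<p>A lazy quantifier only takes more characters when the rest of the pattern forces it to, which is why the lazy star in the script pattern stops at the first closing tag.</p>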
]]></content:encoded>
			<wfw:commentRss>http://www.webdevbros.net/2007/05/10/lazy-stars-within-regular-expressions/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
	</channel>
</rss>

