<?xml version="1.0" encoding="utf-8"?>
<rss xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:pingback="http://madskills.com/public/xml/rss/module/pingback/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0">
  <channel>
    <title>popcyclical - regex</title>
    <link>http://popcyclical.com/</link>
    <description>The software development blog of James "poprhythm" Kolpack</description>
    <language>en-us</language>
    <copyright>James Kolpack</copyright>
    <lastBuildDate>Sun, 12 Sep 2010 02:03:15 GMT</lastBuildDate>
    <generator>newtelligence dasBlog 2.3.12105.0</generator>
    <managingEditor>dasblog@example.com</managingEditor>
    <webMaster>dasblog@example.com</webMaster>
    <item>
      <trackback:ping>http://popcyclical.com/Trackback.aspx?guid=368068e5-b898-46f6-90f6-d34c665d7db4</trackback:ping>
      <pingback:server>http://popcyclical.com/pingback.aspx</pingback:server>
      <pingback:target>http://popcyclical.com/PermaLink,guid,368068e5-b898-46f6-90f6-d34c665d7db4.aspx</pingback:target>
      <dc:creator>James Kolpack</dc:creator>
      <wfw:comment>http://popcyclical.com/CommentView,guid,368068e5-b898-46f6-90f6-d34c665d7db4.aspx</wfw:comment>
      <wfw:commentRss>http://popcyclical.com/SyndicationService.asmx/GetEntryCommentsRss?guid=368068e5-b898-46f6-90f6-d34c665d7db4</wfw:commentRss>
      <body xmlns="http://www.w3.org/1999/xhtml">
        <p>
In <a href="http://weblogs.asp.net/jgalloway/">Jon Galloway’s</a> <a href="http://weblogs.asp.net/jgalloway/archive/2005/09/27/426087.aspx">Splitting
Camel Case with RegEx</a> blog post, he introduced a simple regular expression replacement
which can split “ThisIsInPascalCase” into “This Is In Pascal Case”.  Here’s the
original code:
</p>
        <pre>
          <code>output = System.Text.RegularExpressions.Regex.Replace( input, "([A-Z])",
" $1", System.Text.RegularExpressions.RegexOptions.Compiled).Trim(); </code>
        </pre>
        <p>
Simple and effective: it matches every capital letter and inserts a space before
it.  But there’s room for improvement.  First, the call to <code>String.Trim()</code> is
only there to remove the leading space added when the first letter is uppercase – this can be
handled with a <a href="http://msdn.microsoft.com/en-us/library/az24scfc.aspx#grouping_constructs">“Match
if prefix is absent” group</a> containing the “beginning of line” anchor <code>^</code>,
which matches only at the start of the string unless <code>RegexOptions.Multiline</code> is set. 
This prevents a match from occurring on the first character, which eliminates
the need for the <code>String.Trim()</code> call.  The formal name for this grouping
construct is “zero-width negative lookbehind assertion”, but just think of it as “if
you see what’s in here just before the current position, don’t match the next thing”.
</p>
        <pre>
          <code>(?&lt;!^)([A-Z])</code>
        </pre>
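        <p>
As a quick check, here is a minimal C# sketch that runs the original pattern and the
lookbehind version side by side.  The class name <code>LookbehindDemo</code> and the
square brackets around the output are illustrative assumptions, not part of the original
code; the brackets only make a leading space visible.
</p>
        <pre><code>using System;
using System.Text.RegularExpressions;

class LookbehindDemo
{
    static void Main()
    {
        string input = "ThisIsInPascalCase";

        // Original pattern: a space is also inserted before the leading "T".
        string original = Regex.Replace(input, "([A-Z])", " $1");
        Console.WriteLine("[" + original + "]");   // [ This Is In Pascal Case]

        // Negative lookbehind: position 0 is skipped, so no Trim() is needed.
        string improved = Regex.Replace(input, "(?&lt;!^)([A-Z])", " $1");
        Console.WriteLine("[" + improved + "]");   // [This Is In Pascal Case]
    }
}</code></pre>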
        <p>
Next – there’s a potential issue with how this pattern handles acronyms.  Given
this fictional book title: “WCFForNoobs” – the split will occur on each uppercase
letter, resulting in “W C F For Noobs”.  The fix is simple, though – require that
the uppercase letter be followed by a lowercase letter:
</p>
        <pre>
          <code>(?&lt;!^)([A-Z][a-z])</code>
        </pre>
        <p>
…Now it’ll result in “WCF For Noobs” (aren’t we all!).  But now it won’t add
a space before the acronym – for “LearnWCFInSixEasyMonths”, the result will be “LearnWCF
In Six Easy Months”.  No problem – add an alternate match for an uppercase letter
that comes right after a lowercase letter.  The replacement pattern makes this more difficult
– we don’t want the space to go before the lowercase letter, we want it between the
lowercase letter and the first capital letter of the acronym.  RegEx can handle this
with another lookbehind group – “Match prefix but exclude it” – <code>(?&lt;=)</code>,
formally a zero-width positive lookbehind assertion. 
The lowercase letter has to be there for the match to succeed, but it is not consumed: only the uppercase
letter becomes part of the match, so when it comes time to run the replacement, the space
will get inserted between the two letters.  By itself, that’ll look like this:
</p>
        <pre>
          <code>((?&lt;=[a-z])[A-Z])</code>
        </pre>
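        <p>
To see the “exclude it” part in action, the following C# sketch inspects the first match
and then runs the replacement with this pattern alone.  The class name
<code>PositiveLookbehindDemo</code> is a hypothetical name chosen for illustration.
</p>
        <pre><code>using System;
using System.Text.RegularExpressions;

class PositiveLookbehindDemo
{
    static void Main()
    {
        string input = "LearnWCFInSixEasyMonths";

        // The lookbehind requires a lowercase letter just before the match,
        // but that letter is not part of the match itself.
        Match first = Regex.Match(input, "(?&lt;=[a-z])[A-Z]");
        Console.WriteLine(first.Value + " at index " + first.Index);  // W at index 5

        // On its own, the pattern only splits lowercase-to-uppercase boundaries.
        string result = Regex.Replace(input, "((?&lt;=[a-z])[A-Z])", " $1");
        Console.WriteLine(result);  // Learn WCFIn Six Easy Months
    }
}</code></pre>
        <p>
The “WCFIn” left in the output is the acronym-to-word boundary that only the earlier
uppercase-then-lowercase pattern catches.
</p>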
        <p>
Great!  But this needs to be combined with the previous expression.  That’s easily accomplished
with an either/or match using the vertical bar “or” construct:
</p>
        <pre>
          <code>(?&lt;!^)([A-Z][a-z]|(?&lt;=[a-z])[A-Z])</code>
        </pre>
        <p>
The example “LearnWCFInSixEasyMonths” will now be split into “Learn WCF In Six Easy
Months”.  These same techniques can be used for additional splits – perhaps on
numbers or underscores.  More generally, <a href="http://www.regular-expressions.info/lookaround.html">lookbehind
and lookahead are great tools</a> to have in your RegEx toolbelt.
</p>
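        <p>
Putting it together, here is a minimal C# sketch that wraps the final pattern in a helper. 
The names <code>PascalCaseSplitter</code>, <code>Split</code> and <code>SplitWithDigits</code> are
hypothetical, and the digit-handling pattern is only one possible way to extend the idea to
numbers; treat it as an assumption rather than a tested recipe.
</p>
        <pre><code>using System;
using System.Text.RegularExpressions;

static class PascalCaseSplitter
{
    // The final pattern from this post.
    static readonly Regex WordBoundary =
        new Regex("(?&lt;!^)([A-Z][a-z]|(?&lt;=[a-z])[A-Z])", RegexOptions.Compiled);

    // Assumption: an extra alternative that splits before a digit following a letter.
    static readonly Regex WordOrDigitBoundary =
        new Regex("(?&lt;!^)([A-Z][a-z]|(?&lt;=[a-z])[A-Z]|(?&lt;=[A-Za-z])[0-9])",
                  RegexOptions.Compiled);

    public static string Split(string input)
    {
        return WordBoundary.Replace(input, " $1");
    }

    public static string SplitWithDigits(string input)
    {
        return WordOrDigitBoundary.Replace(input, " $1");
    }

    static void Main()
    {
        Console.WriteLine(Split("LearnWCFInSixEasyMonths"));  // Learn WCF In Six Easy Months
        Console.WriteLine(Split("WCFForNoobs"));              // WCF For Noobs
        Console.WriteLine(SplitWithDigits("Top10Lists"));     // Top 10 Lists
    }
}</code></pre>
        <p>
Note that the digit variant still won’t split between a digit and an uppercase letter that
isn’t followed by a lowercase letter; that would need yet another alternative.
</p>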
      </body>
      <title>Splitting Pascal/Camel Case with RegEx Enhancements</title>
      <guid isPermaLink="false">http://popcyclical.com/PermaLink,guid,368068e5-b898-46f6-90f6-d34c665d7db4.aspx</guid>
      <link>http://popcyclical.com/2010/09/12/SplittingPascalCamelCaseWithRegExEnhancements.aspx</link>
      <pubDate>Sun, 12 Sep 2010 02:03:15 GMT</pubDate>
      <description>&lt;p&gt;
In &lt;a href="http://weblogs.asp.net/jgalloway/"&gt;Jon Galloway’s&lt;/a&gt; &lt;a href="http://weblogs.asp.net/jgalloway/archive/2005/09/27/426087.aspx"&gt;Splitting
Camel Case with RegEx&lt;/a&gt; blog post, he introduced a simple regular expression replacement
which can split “ThisIsInPascalCase” into “This Is In Pascal Case”.&amp;nbsp; Here’s the
original code:
&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;output = System.Text.RegularExpressions.Regex.Replace( input, "([A-Z])",
" $1", System.Text.RegularExpressions.RegexOptions.Compiled).Trim(); &lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;
Simple and effective: it matches every capital letter and inserts a space before
it.&amp;nbsp; But there’s room for improvement.&amp;nbsp; First, the call to &lt;code&gt;String.Trim()&lt;/code&gt; is
only there to remove the leading space added when the first letter is uppercase – this can be
handled with a &lt;a href="http://msdn.microsoft.com/en-us/library/az24scfc.aspx#grouping_constructs"&gt;“Match
if prefix is absent” group&lt;/a&gt; containing the “beginning of line” anchor &lt;code&gt;^&lt;/code&gt;,
which matches only at the start of the string unless &lt;code&gt;RegexOptions.Multiline&lt;/code&gt; is set.&amp;nbsp;
This prevents a match from occurring on the first character, which eliminates
the need for the &lt;code&gt;String.Trim()&lt;/code&gt; call.&amp;nbsp; The formal name for this grouping
construct is “zero-width negative lookbehind assertion”, but just think of it as “if
you see what’s in here just before the current position, don’t match the next thing”.
&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;(?&amp;lt;!^)([A-Z])&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;
Next – there’s a potential issue with how this pattern handles acronyms.&amp;nbsp; Given
this fictional book title: “WCFForNoobs” – the split will occur on each uppercase
letter, resulting in “W C F For Noobs”.&amp;nbsp; The fix is simple, though – require that
the uppercase letter be followed by a lowercase letter:
&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;(?&amp;lt;!^)([A-Z][a-z])&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;
…Now it’ll result in “WCF For Noobs” (aren’t we all!).&amp;nbsp; But now it won’t add
a space before the acronym – for “LearnWCFInSixEasyMonths”, the result will be “LearnWCF
In Six Easy Months”.&amp;nbsp; No problem – add an alternate match for an uppercase letter
that comes right after a lowercase letter.&amp;nbsp; The replacement pattern makes this more difficult
– we don’t want the space to go before the lowercase letter, we want it between the
lowercase letter and the first capital letter of the acronym.&amp;nbsp; RegEx can handle this
with another lookbehind group – “Match prefix but exclude it” – &lt;code&gt;(?&amp;lt;=)&lt;/code&gt;,
formally a zero-width positive lookbehind assertion.&amp;nbsp;
The lowercase letter has to be there for the match to succeed, but it is not consumed: only the uppercase
letter becomes part of the match, so when it comes time to run the replacement, the space
will get inserted between the two letters.&amp;nbsp; By itself, that’ll look like this:
&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;((?&amp;lt;=[a-z])[A-Z])&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;
Great!&amp;nbsp; But this needs to be combined with the previous expression.&amp;nbsp; That’s easily accomplished
with an either/or match using the vertical bar “or” construct:
&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;(?&amp;lt;!^)([A-Z][a-z]|(?&amp;lt;=[a-z])[A-Z])&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;
The example “LearnWCFInSixEasyMonths” will now be split into “Learn WCF In Six Easy
Months”.&amp;nbsp; These same techniques can be used for additional splits – perhaps on
numbers or underscores.&amp;nbsp; More generally, &lt;a href="http://www.regular-expressions.info/lookaround.html"&gt;lookbehind
and lookahead are great tools&lt;/a&gt; to have in your RegEx toolbelt.
&lt;/p&gt;</description>
      <comments>http://popcyclical.com/CommentView,guid,368068e5-b898-46f6-90f6-d34c665d7db4.aspx</comments>
      <category>c#</category>
      <category>regex</category>
    </item>
  </channel>
</rss>