Revenue Source

Scraping 101: Extracting Anchor Text with Regexp
Posted by SEO Blogs, 02-09-2008

There are many ways to skin a cat, but when it comes to scraping websites, I like parsing content with regexp. One of the biggest problems I've bumped into when parsing HTML is matching an opening tag with its own closing tag.
For example:
(<a [^>]+>)(.*)</a>
OK, let’s try that in English:
  1. (<a [^>]+>) matches the opening A tag, e.g. <a href="...">.
  2. (.*) *should* match the anchor text (I’ll elaborate on that).
  3. </a> matches the closing A tag.
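The three parts above can be sketched with Python's re module, using re.VERBOSE so each part carries its own comment. This is an illustration under assumptions: the sample HTML and the example.com URL are made up, and the space after "<a" is written as \s because verbose mode ignores literal whitespace.

```python
import re

# The greedy pattern from the list above, one part per line.
# In re.VERBOSE mode unescaped whitespace is ignored, so the space after
# "<a" is written as \s (which also accepts tabs and newlines).
ANCHOR_GREEDY = re.compile(
    r"""
    (<a\s[^>]+>)   # 1. the opening A tag and its attributes
    (.*)           # 2. the anchor text (greedy)
    </a>           # 3. the closing A tag
    """,
    re.VERBOSE,
)

# Hypothetical single-link input.
html = '<a href="http://example.com/">search engine land</a>'
match = ANCHOR_GREEDY.search(html)
print(match.group(1))  # <a href="http://example.com/">
print(match.group(2))  # search engine land
```

With only one link on the line, the greedy (.*) has a single </a> to stop at, so group 2 is exactly the anchor text.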
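Against a single link such as
<a href="...">search engine land</a>
the pattern will correctly extract the anchor text “search engine land.” BUT because (.*) is greedy (it grabs as much text as it can, then gives back only enough for </a> to match), running it on
<a href="...">search engine land</a> is cool because vanessa fox posts <a href="...">there</a>.
will incorrectly extract:
search engine land</a> is cool because vanessa fox posts <a href="...">there
as anchor text, since the match runs all the way to the last </a> on the line. Hmm...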
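A minimal reproduction of the greedy over-match, assuming Python's re module; the two-link sample line and its example.com URLs are hypothetical stand-ins for the example in the post.

```python
import re

# Greedy version: (.*) runs to the end of the line, then backs off only
# until the LAST "</a>" on the line can match.
ANCHOR_GREEDY = re.compile(r'(<a\s[^>]+>)(.*)</a>')

# Hypothetical line with two links (URLs are placeholders).
html = ('<a href="http://example.com/">search engine land</a> is cool '
        'because vanessa fox posts <a href="http://example.com/vf">there</a>.')

print(ANCHOR_GREEDY.search(html).group(2))
# prints: search engine land</a> is cool because vanessa fox posts <a href="http://example.com/vf">there
```

The "anchor text" swallows the first closing tag, the text between the links, and the second opening tag.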
So how do you fix this? Instead of .*, use .*? or one of the other non-greedy (lazy) quantifiers like +?, ??, or {m,n}? (I haven’t tested the last three, but I assume they work).
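One quick way to check the untested lazy quantifiers is to put each greedy form next to its lazy counterpart on the same string. This sketch assumes Python's re module; other regex engines may differ in details.

```python
import re

# Each greedy quantifier paired with its lazy (non-greedy) counterpart,
# both matched against the start of the same string.
PAIRS = [("a*", "a*?"), ("a+", "a+?"), ("a?", "a??"), ("a{2,3}", "a{2,3}?")]
TEXT = "aaaa"

results = {}
for greedy, lazy in PAIRS:
    results[greedy] = re.match(greedy, TEXT).group()
    results[lazy] = re.match(lazy, TEXT).group()

for pattern, matched in results.items():
    print(f"{pattern:8} -> {matched!r}")
```

In Python's re each lazy form takes the fewest repetitions its quantifier allows: *? and ?? match nothing, +? takes one "a", and {2,3}? takes two.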
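(<a [^>]+>)(.*?)</a> stops at the first </a> after each opening tag, so every match ends at the link's own closing tag and returns just its anchor text.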
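A minimal sketch of the non-greedy pattern pulling every anchor text off a page, assuming Python's re module. Two flags go beyond the post and are assumptions: re.IGNORECASE (for upper-case <A> tags) and re.DOTALL (so anchor text that wraps onto a new line still matches). The helper name extract_anchor_texts and the sample HTML are hypothetical.

```python
import re

# Non-greedy version: (.*?) stops at the FIRST "</a>" after each opening tag.
# IGNORECASE also matches <A HREF=...>...</A>; DOTALL lets "." match newlines.
ANCHOR_LAZY = re.compile(r'(<a\s[^>]+>)(.*?)</a>', re.IGNORECASE | re.DOTALL)

def extract_anchor_texts(html: str) -> list[str]:
    """Return the anchor text of every <a ...>...</a> link in `html`."""
    # findall returns one (opening_tag, anchor_text) tuple per match.
    return [text for _tag, text in ANCHOR_LAZY.findall(html)]

html = ('<a href="http://example.com/">search engine land</a> is cool '
        'because vanessa fox posts <a href="http://example.com/vf">there</a>.')
print(extract_anchor_texts(html))  # ['search engine land', 'there']
```

Anchor text that contains inner tags such as <b> comes back with those tags intact, and an <a> with no attributes is skipped because the pattern requires whitespace plus at least one character before the closing >.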


  
© 2004-6 RevenueSource.com.  All rights reserved.  Do not duplicate or redistribute in any form.
This website and its logos/design are property of RevenueSource.com.  All rights reserved.

