Spidering Hacks

The latest book in the O'Reilly Hacks series, "Spidering Hacks" (written by Kevin "Morbus Iff" Hemenway and Tara "ResearchBuzz" Calishain), is out. It's the site-scraper's bible, with 100 tips and tricks for sucking in data from the Web.

Spidering Hacks takes you to the next level in Internet data retrieval, beyond search engines, by showing you how to create spiders and bots to retrieve information from your favorite sites and data sources. You'll no longer feel constrained by the way host sites think you want to see their data presented: you'll learn how to scrape and repurpose raw data so you can view it in a way that's meaningful to you.

Written for developers, researchers, technical assistants, librarians, and power users, Spidering Hacks provides expert tips on spidering and scraping methodologies. You'll begin with a crash course in spidering concepts, tools (Perl, LWP, out-of-the-box utilities), and ethics (how to know when you've gone too far: what's acceptable and unacceptable). Next, you'll collect media files and data from databases. Then you'll learn how to interpret and understand the data, repurpose it for use in other applications, and even build authorized interfaces to integrate the data into your own content.


(via Ben Hammersley)