Spidering Hacks

Cory Doctorow 5:52 am Sat Nov 1, 2003

The latest book in the O'Reilly Hacks series, "Spidering Hacks," (written by Kevin "Morbus Iff" Hemenway and Tara "ResearchBuzz" Calishain) is out. It's the site-scraper's bible, with 100 tips and tricks for sucking in data from the Web.

Spidering Hacks takes you to the next level in Internet data retrieval, beyond search engines, by showing you how to create spiders and bots to retrieve information from your favorite sites and data sources. You'll no longer feel constrained by the way host sites think you want to see their data presented: you'll learn how to scrape and repurpose raw data so you can view it in a way that's meaningful to you.

Written for developers, researchers, technical assistants, librarians, and power users, Spidering Hacks provides expert tips on spidering and scraping methodologies. You'll begin with a crash course in spidering concepts, tools (Perl, LWP, out-of-the-box utilities), and ethics (how to know when you've gone too far: what's acceptable and unacceptable). Next, you'll collect media files and data from databases. Then you'll learn how to interpret and understand the data, repurpose it for use in other applications, and even build authorized interfaces to integrate the data into your own content.

(via Ben Hammersley)
