Web Scraping with Regex

Link: https://dreamyz.net/webscrape-hpcreatures.php
(The website will be slow to load. Be patient and refresh if it times out.)

Just a few of the 227 entries parsed by the code.

I used regular expressions in PHP to parse the Harry Potter wikia page for monsters. The script automatically visits each linked page and extracts the image and other info from it. It does this every time it runs, because it doesn’t store any information on disk. This means that if a new Harry Potter or Fantastic Beasts movie comes out, the page will update automatically without any extra work from me. Although the script is aimed at the HP wikia, I have a feeling it could work nearly unchanged on many other wiki pages, but that isn’t supported yet.
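
To give a rough idea of the approach, here is a minimal sketch of the two regex steps: pulling entry links out of an index page, then pulling the image URL out of an entry page. It is written in Python rather than PHP and is not the actual script. The sample markup, the patterns, and the names `extract_links` and `extract_image` are all assumptions made for illustration; the real wiki's HTML will differ.

```python
import re

# Hypothetical sample markup standing in for the fetched pages.
# The real wiki's HTML is not shown in this post and will differ.
INDEX_HTML = '''
<ul>
  <li><a href="/wiki/Acromantula" title="Acromantula">Acromantula</a></li>
  <li><a href="/wiki/Basilisk" title="Basilisk">Basilisk</a></li>
</ul>
'''

ENTRY_HTML = '''
<h1 class="page-title">Basilisk</h1>
<img class="thumbimage" src="https://example.org/images/basilisk.png" alt="Basilisk">
'''

# Assumed pattern: anchors whose href starts with /wiki/, capturing path and link text.
LINK_RE = re.compile(r'<a\s+href="(/wiki/[^"#?]+)"[^>]*>([^<]+)</a>')

# Assumed pattern: the src attribute of the first <img> tag on the page.
IMAGE_RE = re.compile(r'<img\b[^>]*\bsrc="([^"]+)"')


def extract_links(index_html):
    """Return a list of (path, name) tuples for every entry link found."""
    return LINK_RE.findall(index_html)


def extract_image(entry_html):
    """Return the first image URL on an entry page, or None if there is none."""
    match = IMAGE_RE.search(entry_html)
    return match.group(1) if match else None


if __name__ == "__main__":
    for path, name in extract_links(INDEX_HTML):
        print(name, path)
    print(extract_image(ENTRY_HTML))
```

In a live version, each path from the first step would be fetched over HTTP and passed to the second step on every request, which is why nothing needs to be stored on disk and also why the page is slow to load. Regex patterns like these are tied to one site's markup, so reusing them on another wiki would likely mean adjusting them.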
