Crawler of Deeds Part 2

Posted August 10, 2016 by Ryan

Building a (small) Toolchain

Soon after writing a few low-level PhantomJS lines, I came across a library that wrapped PhantomJS in a higher-level, cleaner API.
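For reference, raw PhantomJS looks something like this (a minimal sketch with a placeholder URL, not the crawler itself):

    // minimal PhantomJS page load; run with: phantomjs crawl.js
    var page = require('webpage').create();

    page.open('https://example.com/', function (status) {
        if (status !== 'success') {
            console.log('load failed');
            phantom.exit(1);
        } else {
            // evaluate() runs inside the page's own JS context
            var title = page.evaluate(function () {
                return document.title;
            });
            console.log('loaded: ' + title);
            phantom.exit();
        }
    });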

Enter CasperJS, a wrapper for PhantomJS with some test suite capabilities as well. It took some finagling and conversion from my Python solution, but eventually I had a working, headless crawler of this website. It was now limited mostly by the site's speed, and it had fewer dependencies: two JavaScript libraries.
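The Casper version of that skeleton is roughly the following (again a sketch; the URL is a placeholder):

    // minimal CasperJS crawl; run with: casperjs crawl.js
    var casper = require('casper').create();

    casper.start('https://example.com/');

    casper.then(function () {
        // this.evaluate() executes in the page's JS context,
        // so touch the DOM there and return plain data
        var links = this.evaluate(function () {
            return Array.prototype.map.call(
                document.querySelectorAll('a'),
                function (a) { return a.href; });
        });
        this.echo('found ' + links.length + ' links');
    });

    casper.run();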

I still wanted to host this, however! If only there were an easy way to kick off and test JS running server-side. Oh yeah, my friend Node.js. Having worked at a Microsoft web shop for a while, I'd heard Node.js be the butt of plenty of jokes: 'webscale' this, and 'nonblocking' that. Despite all that, I jumped right in during my time at some hipster coffee shop, listening to them play the Mountain Goats; I fully accepted what I had become.

Wrapping my fresh code in some Node, I ran into coupling issues between libraries: Casper and Node didn't play nice. I looked down at my fresh pour-over in disgust and said 'fuck it', why not: enter SpookyJS to drive Casper from Node, Casper itself being a wrapper for Phantom (gasp).

Things got pretty weird here: I had a hell of a time with scope. On top of JavaScript's fun out-of-the-box scoping, I was dealing with three (3!) different execution environments: Node/Spooky, Casper, and the browser's own JavaScript environment. Fully embracing dynamic scope and callbacks (my web development experience to that point had been almost entirely backend-focused), I finished this part of the app.

It was turning into a bit of a Frankenstein, but that made it all the more fun as a side project. In fact, I wanted more. I needed to go deeper. Next post I'll dive into the steps to host this project as a web API, as well as into some more code.
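Before that, a taste of the three-environment dance in a minimal SpookyJS sketch (the URL and event name are placeholders, not the actual crawler): functions handed to spooky.then() get serialized and re-run inside the Casper process, so they can't close over Node variables, and anything inside evaluate() runs in the page itself.

    // Node driving Casper through SpookyJS (npm install spooky)
    var Spooky = require('spooky');

    var spooky = new Spooky({
        child: { transport: 'http' },
        casper: { logLevel: 'error' }
    }, function (err) {
        if (err) { throw err; }

        spooky.start('https://example.com/');
        spooky.then(function () {
            // environment #2: this function is serialized and re-run
            // inside Casper, so it cannot close over Node variables
            var title = this.evaluate(function () {
                // environment #3: the page's own browser context
                return document.title;
            });
            // emitting events is how data gets back out to Node
            this.emit('pageTitle', title);
        });
        spooky.run();
    });

    // environment #1: plain Node, listening across the Spooky bridge
    spooky.on('pageTitle', function (title) {
        console.log('crawled: ' + title);
    });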
