
Recursive – Explore the endless web

Wow, well that took longer than expected! 44 days ago I blogged that I had started work on a second version of my Chrome Crawler extension, and I have only just managed to get it into a state I was happy enough with to release. To be fair, I was on a trip to New York during that period, so perhaps I can be excused. Having said that, I think the time has been well spent and I am fairly proud of the result.

TL;DR

Recursive is an experimental tool for visualising the world wide web. Given a URL, it downloads the page, searches it for links and then recursively downloads those. The information is then displayed in a node-based graph.

The Name

So what’s this all about? Why is it called ‘Recursive’ and not ‘Chrome Crawler 2’?

Although I would have liked to call the spiritual successor to ‘Chrome Crawler’ ‘Chrome Crawler 2’, Chrome’s branding guidelines forbid using the Chrome name or logo (they brought this in after the launch of Chrome Crawler 1).

With that in mind I decided that rather than bend Chrome Crawler’s name and logo to fit the guidelines I would create a whole new logo, name and app. The app is a total rewrite of the previous iteration anyway, so I thought it justified.

According to dictionary.com there is no definition for “Recursive” or “Recurse” but there is one for “Recursion”:

2. the application of a function to its own values to generate an infinite sequence of values.

So “Recursive” seemed like a rather apt name for a tool that downloads pages, then follows the links on those pages to download yet more pages.

Video

Before I go much further, I put together this little video demonstrating some of the extension’s core functionality:

Installing

Installing and upgrading is dead simple thanks to how Google Chrome’s extension system works. Just head over to this link and hit install:

https://chrome.google.com/webstore/detail/recursive/hbgbcmcmpiiciafmolmoapfgegbhbmcc

Then to launch it visit any website and hit the little icon in the Omnibox:

How it works

Recursive works by taking in a starting URL which it uses to download the page it points to:

Once that page is downloaded, Recursive parses it looking for links and files. If it finds things it thinks are files then it records them against that URL. It then proceeds to visit all the links in turn, downloading each page and then parsing it for yet more files and links.

This cycle continues until a certain “depth” is reached which is the maximum number of links away from the starting URL. You can set the maximum depth allowed in the settings:
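As a rough sketch of that cycle (not Recursive’s actual code), here is a depth-limited crawl over a tiny in-memory “web”. The `FakeWeb` type, the `crawl` function and the example URLs are all made up for illustration, and the sketch uses a breadth-first queue rather than literal recursion; a real crawler would download and parse each page where this one just looks up a list of links.

```typescript
// Hypothetical in-memory "web": each URL maps to the links found on that page.
// A real crawler would download and parse each page instead.
type FakeWeb = { [url: string]: string[] };

interface Visit {
  url: string;
  depth: number; // number of links away from the starting URL
}

// Breadth-first crawl that stops following links once maxDepth is reached.
function crawl(web: FakeWeb, startUrl: string, maxDepth: number): Visit[] {
  const seen: { [url: string]: boolean } = {};
  const queue: Visit[] = [{ url: startUrl, depth: 0 }];
  const visits: Visit[] = [];
  seen[startUrl] = true;

  while (queue.length > 0) {
    const current = queue.shift() as Visit;
    visits.push(current);

    // Pages at the maximum depth are recorded, but their links are not followed.
    if (current.depth >= maxDepth) {
      continue;
    }

    const links = web[current.url] || [];
    for (let i = 0; i < links.length; i++) {
      const link = links[i];
      if (!seen[link]) {
        seen[link] = true; // never queue the same page twice
        queue.push({ url: link, depth: current.depth + 1 });
      }
    }
  }
  return visits;
}

const web: FakeWeb = {
  "http://a.example/": ["http://a.example/about", "http://b.example/"],
  "http://a.example/about": ["http://a.example/"],
  "http://b.example/": ["http://c.example/"],
  "http://c.example/": [],
};

// With a depth of 1, c.example is two links away and so is never visited.
const result = crawl(web, "http://a.example/", 1);
console.log(result.map((v) => v.depth + " " + v.url).join("\n"));
```

Raising the depth to 2 would pull in `http://c.example/` as well, which is why the number of pages tends to balloon as the depth setting goes up.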

One of the key improvements of Recursive over Chrome Crawler is the way it visualises the data as it is returned:

Every page is grouped by its domain and is represented by a circular “node”.

So for example “http://mikecann.co.uk/personal-project/tinkering-with-typescript/” would be grouped under the “mikecann.co.uk” domain. Any other pages found while running that match this domain are added as little page icons inside the host node.

Any files that are found on a given page are given an appropriate icon and added to that page’s domain node.
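To make the grouping concrete, here is a minimal sketch of how pages and files could be bucketed by domain. The `DomainNode` shape, the `domainOf` and `addPage` helpers and the file URL are assumptions for illustration only, and the host name is pulled out with a simple regular expression rather than a full URL parser.

```typescript
interface DomainNode {
  domain: string;
  pages: string[]; // page URLs found on this domain
  files: string[]; // file URLs recorded against pages on this domain
}

// Pull the host name out of an absolute http(s) URL; returns "" if it does not match.
function domainOf(url: string): string {
  const match = /^https?:\/\/([^\/?#:]+)/i.exec(url);
  return match ? match[1].toLowerCase() : "";
}

// Add a page (and any files found on it) to the node for the page's domain.
function addPage(
  nodes: { [domain: string]: DomainNode },
  pageUrl: string,
  fileUrls: string[]
): void {
  const domain = domainOf(pageUrl);
  if (!nodes[domain]) {
    nodes[domain] = { domain: domain, pages: [], files: [] };
  }
  const node = nodes[domain];
  if (node.pages.indexOf(pageUrl) === -1) {
    node.pages.push(pageUrl);
  }
  for (let i = 0; i < fileUrls.length; i++) {
    if (node.files.indexOf(fileUrls[i]) === -1) {
      node.files.push(fileUrls[i]);
    }
  }
}

const nodes: { [domain: string]: DomainNode } = {};
addPage(nodes, "http://mikecann.co.uk/personal-project/tinkering-with-typescript/", [
  "http://mikecann.co.uk/files/demo.zip", // hypothetical file, for illustration only
]);
addPage(nodes, "http://mikecann.co.uk/", []);
addPage(nodes, "http://example.com/", []);

console.log(Object.keys(nodes).join(", "));
console.log(nodes["mikecann.co.uk"].pages.length + " pages under mikecann.co.uk");
```

Both mikecann.co.uk pages end up inside one node, while example.com gets a node of its own.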

As Recursive downloads pages and follows links it records the path it takes. It then draws lines between the nodes that are linked:

Interacting

Using the mouse wheel you can zoom in and out to get a better perspective. Click and drag to move about the recursive space. You can also run the app in fullscreen if you so desire.

If you click on a node it tells Recursive to explore that node for one extra level of depth.

Right clicking a node opens a menu that lets you either open all the pages contained in that node or view the files for that node.

Files

By using the context menu for a node you can check out all the files that Recursive found for that node. The files are separated into various categories which you can toggle on or off:

Then if you wish you can download all the files as a zip.
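For the curious, sorting files into toggleable categories could look something like the sketch below. The category names and extension lists are guesses made up for this example (the real lists used by Recursive are not given here), as are the `extensionOf`, `categoryOf` and `visibleFiles` helpers.

```typescript
// Hypothetical categories and extensions; the real lists are not given in the post.
const CATEGORIES: { [category: string]: string[] } = {
  images: ["png", "jpg", "jpeg", "gif"],
  audio: ["mp3", "wav", "ogg"],
  video: ["mp4", "avi", "webm"],
  archives: ["zip", "rar", "gz"],
  documents: ["pdf", "doc", "txt"],
};

// Return the lower-cased extension of a URL's path, ignoring any query string or fragment.
function extensionOf(url: string): string {
  const path = url.split("#")[0].split("?")[0];
  const lastSegment = path.substring(path.lastIndexOf("/") + 1);
  const dot = lastSegment.lastIndexOf(".");
  return dot === -1 ? "" : lastSegment.substring(dot + 1).toLowerCase();
}

// Find the category whose extension list contains this file's extension.
function categoryOf(url: string): string {
  const ext = extensionOf(url);
  const names = Object.keys(CATEGORIES);
  for (let i = 0; i < names.length; i++) {
    if (CATEGORIES[names[i]].indexOf(ext) !== -1) {
      return names[i];
    }
  }
  return "other";
}

// Keep only the files whose category is currently toggled on.
function visibleFiles(files: string[], enabled: { [category: string]: boolean }): string[] {
  return files.filter((f) => enabled[categoryOf(f)] === true);
}

const found = [
  "http://example.com/photos/cat.JPG?size=large",
  "http://example.com/music/song.mp3",
  "http://example.com/downloads/source.zip",
];

// Only images and archives are switched on here, so the mp3 is hidden.
console.log(visibleFiles(found, { images: true, archives: true }).join("\n"));
```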

The tech

I think I’ll leave this section for the next blog post as this one is long enough already.

If you would like specific info on the tech in the meantime however or have some suggestions for features then don’t hesitate to contact me: mike.cann@gmail.com

Well that’s about it, hope you like it. I had a blast making it even if it did take a lot longer than I was expecting. It doesn’t have much purpose really but there is a lot of cool new tech in there, which is reason enough to make it, surely?

Printomi Maps

Well since we have made the decision to discontinue Printomi I have been backing up the databases and downloading the 90GB+ of images that users have uploaded.

Well it wouldn’t be like me if I didn’t start thinking about what cool things I could do with all those pixels. I remember seeing those cool Minecraft maps that use the Google Maps API to explore Minecraft servers, and it got me wondering whether it would be possible to do something like that for the Printomi images.

Well it turns out yes, you can: http://www.mikecann.co.uk/projects/PrintomiMaps/

It’s only 1,024 of the 27,497 images that were uploaded, but even this small fraction results in a total map size of 115,200 x 86,400 pixels. Google Maps obviously can’t handle an image that large, and it would take forever to download, so you must split it up into many tiles. To achieve the zooming and performance you must also provide the map at various sizes.

With a map of 32×32 images, each image at 3600×2700 pixels and each tile at 450×337, this results in 87,000 tiles at 8 zoom levels! Generating and uploading all that data takes several days. But the result is you can pan and zoom around what seems like one huge 115k x 86k image.
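Those figures can be checked with a little arithmetic. The sketch below assumes the usual Google Maps style tile pyramid, where zoom 0 is the whole map in a single tile and each further zoom level doubles the number of tiles per side. The `tilePyramid` function and the variable names are made up for illustration; this is not the code from the PrintomiMaps repository. Note that 2700 / 8 is 337.5, so the 337 pixel tile height is a rounded figure.

```typescript
interface Pyramid {
  maxZoom: number;    // zoom 0 is the whole map in one tile
  totalTiles: number; // tiles summed over every zoom level
}

// Assumes a Google Maps style pyramid: each zoom level doubles the tiles per side.
function tilePyramid(gridSize: number, tilesPerImageSide: number): Pyramid {
  let side = gridSize * tilesPerImageSide; // tiles per side at full zoom
  let maxZoom = 0;
  let totalTiles = side * side;
  while (side > 1) {
    side = Math.ceil(side / 2); // step one zoom level out
    totalTiles += side * side;
    maxZoom++;
  }
  return { maxZoom: maxZoom, totalTiles: totalTiles };
}

const imageWidth = 3600;
const imageHeight = 2700;
const grid = 32; // 32 x 32 images
const tilesPerImageSide = imageWidth / 450; // 8 tiles across each image

const pyramid = tilePyramid(grid, tilesPerImageSide);
console.log(grid * imageWidth, grid * imageHeight); // 115200 86400
console.log(pyramid.maxZoom, pyramid.totalTiles);   // 8 87381
```

The total of 87,381 lines up with the roughly 87,000 tiles quoted, with 8 as the deepest zoom level.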

If you are interested, I have uploaded the source code I wrote to generate the 87k tiles at the various zoom levels from the 1,024 3600×2700 input images. You can find it over on GitHub: https://github.com/mikecann/PrintomiMaps

If I had enough disk space on my web server I would love to do all 27,497 images in this way. If I did 128×128 that would use 16,384 of the images. This would result in 10 zoom levels and a map 460,800 x 345,600 pixels large. I have no idea how long it would take to generate and upload all those tiles :P

Chrome Crawler – A web-crawler written in Javascript

Depending on your level of geekness you may or may not enjoy this one.

I proudly present Chrome Crawler, my latest Google Chrome extension:

The idea is simple really. You just give it a URL and it goes off and finds all the links on that page, then follows them to more pages, then gets all the links on those and follows them, and so on and so on.

Along the way it checks each page to see if there are any ‘interesting’ files linked there. If it finds an interesting link it will flag it for you so you can check it out.
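A very rough sketch of that “find the links, flag the interesting ones” step is below. It is not Chrome Crawler’s actual code: the regular expression is only a crude stand-in for proper HTML parsing, and the `INTERESTING` extension list, the helper names and the sample HTML are all invented for the example.

```typescript
// Collect the href values of anchor tags with a simple regular expression.
// A regex is only a rough stand-in for real HTML parsing.
function extractLinks(html: string): string[] {
  const pattern = /<a\s[^>]*?href\s*=\s*["']([^"']+)["']/gi;
  const links: string[] = [];
  let match = pattern.exec(html);
  while (match !== null) {
    links.push(match[1]);
    match = pattern.exec(html);
  }
  return links;
}

// Hypothetical list of "interesting" extensions; the post does not list the real ones.
const INTERESTING = ["mp3", "zip", "pdf"];

// A link is flagged when its path (minus query string and fragment) ends in one of those extensions.
function isInteresting(link: string): boolean {
  const path = link.split("#")[0].split("?")[0].toLowerCase();
  for (let i = 0; i < INTERESTING.length; i++) {
    const suffix = "." + INTERESTING[i];
    if (
      path.length >= suffix.length &&
      path.lastIndexOf(suffix) === path.length - suffix.length
    ) {
      return true;
    }
  }
  return false;
}

const sample =
  '<p><a href="/about.html">About</a> ' +
  "<a class=\"dl\" href='http://example.com/tune.MP3?ref=home'>Tune</a></p>";

const links = extractLinks(sample);
for (let i = 0; i < links.length; i++) {
  console.log((isInteresting(links[i]) ? "[flag] " : "       ") + links[i]);
}
```

Only the mp3 link gets flagged; the ordinary page link would just be queued up to be followed.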

There’s an options page that lets you customise the way it all works:

If you are still confused check out the video below:

So why did I make this? Well to be frank, I made it mostly “just ’cause I can”!

Also, having learned from my last Chrome extension project, PostToTumblr, I realised the Chrome API allows you to do some things that you wouldn’t normally be allowed to do on a website (namely cross-origin XHR) and I wanted to do something to take advantage of it.

It didn’t take me long to knock out this project: one lazy Saturday for the majority of the code, and today for a quick fix or two and to write this post and make the video. As such I expect there to be many bugs and problems, so if you encounter one drop me an email (my address is on the options page).

Oh, finally, I wouldn’t try using this on a Google page as you will likely end up seeing this quite often:

Anyways you can grab it over on the Chrome extensions gallery here. If you enjoy it please leave me a review / comment, much love!
