The beginning of winter is often the occasion to write programs. This year I wrote, just for fun, w2m.py, answering a question raised by my co-author Charles Bordenave. This Python 2.7 module and program consists of a spider which explores part of the World Wide Web, extracts the adjacency matrix of the link graph, and computes its spectrum. Below are the results for Wikipedia in the Kabyle language. On the technical side, w2m.py has simple object-oriented code, which makes use of threading and numpy + matplotlib. I will hopefully convert it to Python 3.x some day (2to3). On Debian GNU/Linux, you may use apt-get install python-matplotlib python-scipy python-argparse to install all the necessary packages.
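To give an idea of the last two steps (adjacency matrix, then spectrum), here is a minimal sketch with numpy. It is not the actual w2m.py code: the function name adjacency_matrix and the toy edge list are made up for illustration, and I assume that the spider has already produced a list of directed (source, target) links.

```python
import numpy as np


def adjacency_matrix(edges):
    """Build a 0/1 adjacency matrix from directed (source, target) pairs.

    Hypothetical helper for illustration; not taken from w2m.py.
    """
    # Collect every vertex that appears as a source or a target.
    vertices = sorted({v for edge in edges for v in edge})
    index = {v: i for i, v in enumerate(vertices)}
    A = np.zeros((len(vertices), len(vertices)))
    for src, dst in edges:
        # A[i, j] = 1 when page i links to page j.
        A[index[src], index[dst]] = 1.0
    return vertices, A


# Toy link graph standing in for the crawled pages (made-up data).
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("a", "c")]
vertices, A = adjacency_matrix(edges)

# The matrix is not symmetric in general, so the eigenvalues are complex.
eigenvalues = np.linalg.eigvals(A)
```

Plotting eigenvalues.real against eigenvalues.imag with matplotlib then gives a picture of the spectrum in the complex plane. For a few hundred vertices, as in the run shown here, a dense matrix is perfectly fine; a much larger crawl would call for a sparse representation.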
Related: The University of Florida Sparse Matrix Collection (pointed out by Charles Bordenave).
Here is a chunk of the console output of w2m:
INFO:MainThread:_time_stamp():w2m.py 2011-12-28 12:27 http://kab.wikipedia.org/wiki
INFO:MainThread:work():please be patient, this may take some time
INFO:MainThread:work():urls=0(0,0) edges=0 vertices=0 threads=1
...
INFO:MainThread:work():urls=348(334,205) edges=3604 vertices=553 threads=20
...
INFO:MainThread:work():urls=743(713,0) edges=7544 vertices=743 threads=1
INFO:MainThread:_time_stamp():w2m.py 2011-12-28 12:30 http://kab.wikipedia.org/wiki
INFO:MainThread:_time_stamp():w2m.py 2011-12-28 12:30 http://kab.wikipedia.org/wiki
INFO:MainThread:show():number of accepted vertices: 743
INFO:MainThread:show():number of accepted edges: 7547
INFO:MainThread:show():number of explored vertices: 714
INFO:MainThread:show():number of http head errors: 6
INFO:MainThread:show():number of http get errors: 30
INFO:MainThread:show():number of analyzed urls: 744
INFO:MainThread:show():vertices coverage 96%.
INFO:MainThread:show():saving 'adjacency.jpg'
INFO:MainThread:show():computing eigenvalues and eigenvectors
INFO:MainThread:show():saving 'spectrum.jpg'
INFO:MainThread:show():saving 'octave.mat', use load('octave.mat') in octave
INFO:MainThread:show():saving 'vertices.lst'
INFO:MainThread:show():show() finished.
INFO:MainThread:_time_stamp():w2m.py 2011-12-28 12:30 http://kab.wikipedia.org/wiki