
Month: December 2011

Web to matrix Python module

The beginning of winter is often a good occasion to write programs. This year I wrote one just for fun, answering a question raised by my co-author Charles Bordenave. This Python 2.7 module and program consists of a spider which explores part of the World Wide Web, extracts the adjacency matrix, and computes its spectrum. The results for Wikipedia in the Kabyle language are shown below. On the technical side, it has simple object-oriented code, which makes use of threading and numpy + matplotlib. I will hopefully convert it to Python 3.x some day (2to3). On Debian GNU/Linux, you may use apt-get install python-matplotlib python-scipy python-argparse to install all the necessary packages.
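To make the "adjacency matrix, then spectrum" step concrete, here is a minimal sketch in Python 3 with numpy. It is not the code of the module itself: the function name adjacency_matrix and the toy vertices and edges are hypothetical, chosen only for illustration.

```python
import numpy as np

def adjacency_matrix(vertices, edges):
    """Build a dense 0/1 adjacency matrix from directed edges (hypothetical helper)."""
    index = {v: i for i, v in enumerate(vertices)}
    A = np.zeros((len(vertices), len(vertices)))
    for src, dst in edges:
        # A[i, j] = 1 when page i links to page j
        A[index[src], index[dst]] = 1.0
    return A

# Toy data (assumption): three pages linking in a directed cycle.
vertices = ["a", "b", "c"]
edges = [("a", "b"), ("b", "c"), ("c", "a")]

A = adjacency_matrix(vertices, edges)
# The matrix is not symmetric in general, so the spectrum is complex.
eigenvalues, eigenvectors = np.linalg.eig(A)
```

For this directed 3-cycle the eigenvalues are the three cube roots of unity, so they all lie on the unit circle of the complex plane.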

Related: The University of Florida Sparse Matrix Collection (pointed out by Charles Bordenave).





Adjacency matrix spectrum for Wikipedia in Kabyle language (2011-12-27). The darkness of each point corresponds to the delocalization of the eigenvector.
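The caption does not say how delocalization is measured. One common choice, used here purely as an assumption, is the participation ratio: it equals 1 for a vector spread uniformly over all coordinates and 1/n for a vector supported on a single coordinate. The function name participation_ratio is hypothetical.

```python
def participation_ratio(vector):
    """Return a delocalization score in (0, 1] (assumed measure, not necessarily the one used for the plot)."""
    weights = [abs(x) ** 2 for x in vector]
    total = sum(weights)
    # Normalize so the weights form a probability distribution.
    probs = [w / total for w in weights]
    # Inverse of n * sum(p_i^2): 1 when uniform, 1/n when concentrated on one entry.
    return 1.0 / (len(probs) * sum(p * p for p in probs))

uniform_score = participation_ratio([1, 1, 1, 1])
localized_score = participation_ratio([1, 0, 0, 0])
```

With such a score, each eigenvalue could be drawn with a gray level proportional to the score of its eigenvector.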







Adjacency matrix of Wikipedia in Kabyle language (obtained in December 2011)





Here is a chunk of the console output of w2m:

INFO:MainThread:_time_stamp() 2011-12-28 12:27
INFO:MainThread:work():please be patient, this may take some time
INFO:MainThread:work():urls=0(0,0) edges=0 vertices=0 threads=1
INFO:MainThread:work():urls=348(334,205) edges=3604 vertices=553 threads=20
INFO:MainThread:work():urls=743(713,0) edges=7544 vertices=743 threads=1
INFO:MainThread:_time_stamp() 2011-12-28 12:30
INFO:MainThread:_time_stamp() 2011-12-28 12:30
INFO:MainThread:show():number of accepted vertices: 743
INFO:MainThread:show():number of accepted edges: 7547
INFO:MainThread:show():number of explored vertices: 714
INFO:MainThread:show():number of http head errors: 6
INFO:MainThread:show():number of http get errors: 30
INFO:MainThread:show():number of analyzed urls: 744
INFO:MainThread:show():vertices coverage 96%.
INFO:MainThread:show():saving 'adjacency.jpg'
INFO:MainThread:show():computing eigenvalues and eigenvectors
INFO:MainThread:show():saving 'spectrum.jpg'
INFO:MainThread:show():saving 'octave.mat', use load('octave.mat') in octave
INFO:MainThread:show():saving 'vertices.lst'
INFO:MainThread:show():show() finished.
INFO:MainThread:_time_stamp() 2011-12-28 12:30
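
The progress lines above count URLs, edges, vertices, and threads. As a rough illustration of how a threaded exploration can accumulate vertices and edges, here is a minimal Python 3 sketch using only the standard library. It is not the code of w2m: the in-memory FAKE_WEB dictionary stands in for real HTTP requests, and the names crawl and fetch_links are hypothetical.

```python
import queue
import threading

# Hypothetical stand-in for the Web: page -> list of outgoing links.
FAKE_WEB = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
}

def crawl(start, fetch_links, num_threads=4):
    """Explore the graph reachable from start with a pool of worker threads."""
    vertices = {start}
    edges = set()
    lock = threading.Lock()   # protects vertices and edges
    todo = queue.Queue()      # URLs waiting to be explored
    todo.put(start)

    def worker():
        while True:
            url = todo.get()
            if url is None:   # sentinel: no more work
                todo.task_done()
                break
            try:
                for target in fetch_links(url):
                    with lock:
                        edges.add((url, target))
                        if target not in vertices:
                            vertices.add(target)
                            todo.put(target)
            finally:
                todo.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    todo.join()               # wait until every queued URL has been processed
    for _ in threads:
        todo.put(None)        # one sentinel per worker
    for t in threads:
        t.join()
    return vertices, edges

vertices, edges = crawl("a", lambda url: FAKE_WEB.get(url, []))
```

A real spider would replace fetch_links with a function that downloads a page and extracts its links, and would need to handle HTTP errors such as those counted in the log.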