Web to matrix Python module

The beginning of winter is often an occasion to write programs. This year, just for fun, I wrote w2m.py, in answer to a question raised by my co-author Charles Bordenave. This Python 2.7 module and program consists of a spider that explores part of the World Wide Web, extracts the corresponding adjacency matrix, and computes its spectrum. Below are the results for the Kabyle-language Wikipedia. On the technical side, w2m.py is simple object-oriented code that uses threading together with numpy and matplotlib. I hope to convert it to Python 3.x some day (with 2to3). On Debian GNU/Linux, you may run apt-get install python-matplotlib python-scipy python-argparse to install all the necessary packages.
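
To make the "adjacency matrix, then spectrum" step concrete, here is a minimal sketch in Python 3 with numpy. It is not taken from w2m.py: the function name adjacency_matrix, the toy vertex and edge lists, and the convention that entry (i, j) is 1 when page i links to page j are all assumptions chosen for illustration.

```python
import numpy as np

def adjacency_matrix(vertices, edges):
    """Build a dense 0/1 matrix from a list of directed edges (hypothetical helper)."""
    index = {v: i for i, v in enumerate(vertices)}
    A = np.zeros((len(vertices), len(vertices)))
    for src, dst in edges:
        # Assumed convention: row = linking page, column = linked page.
        A[index[src], index[dst]] = 1.0
    return A

# Toy "web" of three pages linking in a cycle: a -> b -> c -> a.
vertices = ["a", "b", "c"]
edges = [("a", "b"), ("b", "c"), ("c", "a")]

A = adjacency_matrix(vertices, edges)

# The matrix is not symmetric in general, so the eigenvalues are complex.
eigenvalues = np.linalg.eigvals(A)
print(eigenvalues)
```

For this directed 3-cycle the eigenvalues are the three cube roots of unity, which is an easy sanity check before trying a real crawl.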

Related: The University of Florida Sparse Matrix Collection (pointed out by Charles Bordenave).

Adjacency matrix spectrum for Wikipedia in Kabyle language (2011-12-27). The darkness of each point corresponds to the delocalization of the eigenvector.
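
One way to quantify how localized an eigenvector is uses the ratio of its sup norm to its Euclidean norm, which lies between 1/sqrt(n) (entries of equal modulus) and 1 (a single nonzero entry). The following Python 3 sketch computes that ratio for every eigenvector of a small matrix. The function name localization_ratios and the two test matrices are assumptions made for illustration, and the mapping from ratio to grey level used in the actual plot is not reproduced here.

```python
import numpy as np

def localization_ratios(A):
    """Return eigenvalues and the ratio ||v||_inf / ||v||_2 of each eigenvector."""
    eigenvalues, eigenvectors = np.linalg.eig(A)   # eigenvectors are the columns
    sup_norms = np.abs(eigenvectors).max(axis=0)   # largest modulus per column
    l2_norms = np.linalg.norm(eigenvectors, axis=0)
    return eigenvalues, sup_norms / l2_norms

# Directed 3-cycle: eigenvectors are Fourier vectors, spread evenly over the vertices.
cycle = np.array([[0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0],
                  [1.0, 0.0, 0.0]])
_, ratio_cycle = localization_ratios(cycle)

# Diagonal matrix: each eigenvector sits on a single coordinate.
diagonal = np.diag([1.0, 2.0, 3.0])
_, ratio_diag = localization_ratios(diagonal)

print(ratio_cycle, ratio_diag)
```

The cycle gives the smallest possible ratio for n = 3, namely 1/sqrt(3), while the diagonal matrix gives the largest, 1.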

Adjacency matrix of Wikipedia in Kabyle language (obtained in December 2011).

Here is a chunk of the console output of w2m:

INFO:MainThread:_time_stamp():w2m.py 2011-12-28 12:27 http://kab.wikipedia.org/wiki
INFO:MainThread:work():please be patient, this may take some time
INFO:MainThread:work():urls=0(0,0) edges=0 vertices=0 threads=1
...
INFO:MainThread:work():urls=348(334,205) edges=3604 vertices=553 threads=20
...
INFO:MainThread:work():urls=743(713,0) edges=7544 vertices=743 threads=1
INFO:MainThread:_time_stamp():w2m.py 2011-12-28 12:30 http://kab.wikipedia.org/wiki
INFO:MainThread:_time_stamp():w2m.py 2011-12-28 12:30 http://kab.wikipedia.org/wiki
INFO:MainThread:show():number of accepted vertices: 743
INFO:MainThread:show():number of accepted edges: 7547
INFO:MainThread:show():number of explored vertices: 714
INFO:MainThread:show():number of http head errors: 6
INFO:MainThread:show():number of http get errors: 30
INFO:MainThread:show():number of analyzed urls: 744
INFO:MainThread:show():vertices coverage 96%.
INFO:MainThread:show():saving 'adjacency.jpg'
INFO:MainThread:show():computing eigenvalues and eigenvectors
INFO:MainThread:show():saving 'spectrum.jpg'
INFO:MainThread:show():saving 'octave.mat', use load('octave.mat') in octave
INFO:MainThread:show():saving 'vertices.lst'
INFO:MainThread:show():show() finished.
INFO:MainThread:_time_stamp():w2m.py 2011-12-28 12:30 http://kab.wikipedia.org/wiki
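
The log above shows the thread count rising to 20 and falling back to 1 as the exploration finishes. As a rough illustration of how a threaded exploration of this kind can be organized, here is a minimal Python 3 sketch using only the standard library. It is not the w2m.py implementation: the names crawl, fetch_links and FAKE_WEB are hypothetical, and an in-memory dictionary stands in for real HTTP requests.

```python
import queue
import threading

# Hypothetical in-memory "web": page -> pages it links to (replaces HTTP fetches).
FAKE_WEB = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a", "d"],
    "d": [],
}

def crawl(start, fetch_links, n_threads=4):
    """Explore every page reachable from `start` with a pool of worker threads."""
    todo = queue.Queue()
    seen = {start}
    found_edges = set()
    lock = threading.Lock()  # protects `seen` and `found_edges`

    def worker():
        while True:
            url = todo.get()
            if url is None:          # sentinel: no more work
                todo.task_done()
                return
            try:
                for target in fetch_links(url):
                    with lock:
                        found_edges.add((url, target))
                        is_new = target not in seen
                        if is_new:
                            seen.add(target)
                    if is_new:
                        todo.put(target)  # queued before this task is marked done
            finally:
                todo.task_done()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    todo.put(start)
    todo.join()                      # returns once every queued page is processed
    for _ in threads:
        todo.put(None)               # one sentinel per worker
    for t in threads:
        t.join()
    return sorted(seen), sorted(found_edges)

vertices, edges = crawl("a", lambda url: FAKE_WEB.get(url, []))
print(len(vertices), "vertices,", len(edges), "edges")
```

New pages are put on the queue before the current page is marked as done, so the count of unfinished tasks cannot reach zero while work remains. The resulting vertex and edge lists are the raw material for an adjacency matrix.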

2 Comments

  1. Florent Benaych-Georges 2011-12-28

    Hi Djalil,

    What do you mean by “each point corresponds to the localization of the associated eigenvector”?

    Besides, a stupid question: in the second figure (the one of the matrix), white is where there is an edge, right?

    See you,
    flo

  2. Djalil Chafaï 2011-12-28

    In the first graphic, the grey level of the point $(x,y)$ is the ratio $\left\Vert\cdot\right\Vert_\infty/\left\Vert\cdot\right\Vert_2$ for the eigenvector associated with the eigenvalue $x+iy$. In the second graphic, yes, white marks an edge. See the w2m.py code 😉 Best.
