{"id":3737,"date":"2011-12-28T13:36:53","date_gmt":"2011-12-28T11:36:53","guid":{"rendered":"http:\/\/djalil.chafai.net\/blog\/?p=3737"},"modified":"2012-03-24T14:52:14","modified_gmt":"2012-03-24T13:52:14","slug":"w2m-python-module","status":"publish","type":"post","link":"https:\/\/djalil.chafai.net\/blog\/2011\/12\/28\/w2m-python-module\/","title":{"rendered":"Web to matrix Python module"},"content":{"rendered":"<p style=\"text-align: justify;\">The beginning of winter is often the occasion to write programs. This year I wrote, just for fun, <strong><a href=\"http:\/\/pypi.python.org\/pypi\/w2m\/\">w2m.py<\/a><\/strong>, answering a question raised by my co-author <a title=\"A friend of mine!\" href=\"\/scripts\/search.php?q=Charles+Bordenave\">Charles Bordenave<\/a>. This <a href=\"http:\/\/en.wikipedia.org\/wiki\/Python_%28programming_language%29\">Python<\/a> 2.7 module+program consists of a spider that explores part of the World Wide Web, extracts the adjacency matrix, and computes its spectrum. Below are the results for <a href=\"http:\/\/kab.wikipedia.org\/wiki\">Wikipedia in the Kabyle language<\/a>. On the technical side, <strong>w2m.py<\/strong> has simple <a href=\"http:\/\/en.wikipedia.org\/wiki\/Object-oriented_programming\">object-oriented<\/a> code, which makes use of <a href=\"http:\/\/en.wikipedia.org\/wiki\/Multithreading_%28computer_architecture%29\">threading<\/a> and <a href=\"http:\/\/en.wikipedia.org\/wiki\/NumPy\">numpy<\/a> + <a href=\"http:\/\/en.wikipedia.org\/wiki\/Matplotlib\">matplotlib<\/a>. I will hopefully convert it to Python 3.x some day (<a href=\"http:\/\/wiki.python.org\/moin\/2to3\">2to3<\/a>). 
On Debian GNU\/Linux, you may use <em>apt-get install python-matplotlib python-scipy python-argparse<\/em> to install all the necessary packages.<\/p>\n<p><strong>Related:<\/strong> <a href=\"http:\/\/www.cise.ufl.edu\/research\/sparse\/matrices\/\">The University of Florida Sparse Matrix Collection<\/a> (pointed out by <a href=\"\/scripts\/search.php?q=charles+bordenave\">Charles Bordenave<\/a>).<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<figure id=\"attachment_3786\" aria-describedby=\"caption-attachment-3786\" style=\"width: 560px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" class=\"size-full wp-image-3786  \" title=\"Adjacency matrix spectrum for Wikipedia in Kabyle language (2011-12-27)\" src=\"\/blog\/wp-content\/uploads\/2011\/12\/spectrum.jpg\" alt=\"Adjacency matrix spectrum for Wikipedia in Kabyle language (2011-12-27)\" width=\"560\" height=\"420\" srcset=\"https:\/\/djalil.chafai.net\/blog\/wp-content\/uploads\/2011\/12\/spectrum.jpg 800w, https:\/\/djalil.chafai.net\/blog\/wp-content\/uploads\/2011\/12\/spectrum-300x225.jpg 300w\" sizes=\"(max-width: 560px) 100vw, 560px\" \/><figcaption id=\"caption-attachment-3786\" class=\"wp-caption-text\">Adjacency matrix spectrum for Wikipedia in Kabyle language (obtained in December 2011). 
The darkness of each point corresponds to the delocalization of the eigenvector.<\/figcaption><\/figure>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<figure id=\"attachment_3787\" aria-describedby=\"caption-attachment-3787\" style=\"width: 560px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" class=\"size-full wp-image-3787 \" title=\"Adjacency matrix of Wikipedia in Kabyle language (2011-12)\" src=\"\/blog\/wp-content\/uploads\/2011\/12\/adjacency.jpg\" alt=\"Adjacency matrix of Wikipedia in Kabyle language (2011-12)\" width=\"560\" height=\"420\" srcset=\"https:\/\/djalil.chafai.net\/blog\/wp-content\/uploads\/2011\/12\/adjacency.jpg 800w, https:\/\/djalil.chafai.net\/blog\/wp-content\/uploads\/2011\/12\/adjacency-300x225.jpg 300w\" sizes=\"(max-width: 560px) 100vw, 560px\" \/><figcaption id=\"caption-attachment-3787\" class=\"wp-caption-text\">Adjacency matrix of Wikipedia in Kabyle language (obtained in December 2011)<\/figcaption><\/figure>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p style=\"text-align: justify;\">Here is a chunk of the console output of w2m:<\/p>\n<pre>INFO:MainThread:_time_stamp():w2m.py 2011-12-28 12:27 http:\/\/kab.wikipedia.org\/wiki\r\nINFO:MainThread:work():please be patient, this may take some time\r\nINFO:MainThread:work():urls=0(0,0) edges=0 vertices=0 threads=1\r\n...\r\nINFO:MainThread:work():urls=348(334,205) edges=3604 vertices=553 threads=20\r\n...\r\nINFO:MainThread:work():urls=743(713,0) edges=7544 vertices=743 threads=1\r\nINFO:MainThread:_time_stamp():w2m.py 2011-12-28 12:30 http:\/\/kab.wikipedia.org\/wiki\r\nINFO:MainThread:_time_stamp():w2m.py 2011-12-28 12:30 http:\/\/kab.wikipedia.org\/wiki\r\nINFO:MainThread:show():number of accepted vertices: 743\r\nINFO:MainThread:show():number of accepted edges: 7547\r\nINFO:MainThread:show():number of explored vertices: 714\r\nINFO:MainThread:show():number of http head errors: 
6\r\nINFO:MainThread:show():number of http get errors: 30\r\nINFO:MainThread:show():number of analyzed urls: 744\r\nINFO:MainThread:show():vertices coverage 96%.\r\nINFO:MainThread:show():saving 'adjacency.jpg'\r\nINFO:MainThread:show():computing eigenvalues and eigenvectors\r\nINFO:MainThread:show():saving 'spectrum.jpg'\r\nINFO:MainThread:show():saving 'octave.mat', use load('octave.mat') in octave\r\nINFO:MainThread:show():saving 'vertices.lst'\r\nINFO:MainThread:show():show() finished.\r\nINFO:MainThread:_time_stamp():w2m.py 2011-12-28 12:30 http:\/\/kab.wikipedia.org\/wiki<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>The beginning of winter is often the occasion to write programs. This year I wrote just for fun w2m.py, answering a question raised by my&#8230;<\/p>\n<div class=\"more-link-wrapper\"><a class=\"more-link\" href=\"https:\/\/djalil.chafai.net\/blog\/2011\/12\/28\/w2m-python-module\/\">Continue reading<span class=\"screen-reader-text\">Web to matrix Python module<\/span><\/a><\/div>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"iawp_total_views":45},"categories":[1],"tags":[],"_links":{"self":[{"href":"https:\/\/djalil.chafai.net\/blog\/wp-json\/wp\/v2\/posts\/3737"}],"collection":[{"href":"https:\/\/djalil.chafai.net\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/djalil.chafai.net\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/djalil.chafai.net\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/djalil.chafai.net\/blog\/wp-json\/wp\/v2\/comments?post=3737"}],"version-history":[{"count":83,"href":"https:\/\/djalil.chafai.net\/blog\/wp-json\/wp\/v2\/posts\/3737\/revisions"}],"predecessor-version":[{"id":4663,"href":"https:\/\/djalil.chafai.net\/blog\/wp-json\/wp\/v2\/posts\/3737\/revisions\/4663"}],"wp:attachment":[{"href":"https:\/\/djalil.chafai.n
et\/blog\/wp-json\/wp\/v2\/media?parent=3737"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/djalil.chafai.net\/blog\/wp-json\/wp\/v2\/categories?post=3737"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/djalil.chafai.net\/blog\/wp-json\/wp\/v2\/tags?post=3737"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}