Crawled websites

Since 21 July 2006, the system has evolved to capture a growing number of websites systematically and effectively. The space required to store the captured content and provide access to it has grown accordingly.

In the statistics presented here, a “web” or “website” is a resource published on the Internet and identified by its own URL. A “capture” is each copy of a website made at a given point in time. A “file” is each of the technical files or archives that make up a captured website. Other technical data that may interest PADICAT users are also included.

Repository contents:

Concept | Total
Number of websites | 57,993
Number of harvests | 229,396
Number of files | 340,279,750
ARC space (TB) | 12
Indexing space (TB) | 1
Total space (TB) | 13
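
The averages implied by these totals are not listed in the table, but they follow from simple division. The Python sketch below computes them; the constant and function names are hypothetical, chosen only for illustration, and the input figures are those of the repository table.

```python
# Figures from the repository table (English digit grouping).
WEBSITES = 57_993
HARVESTS = 229_396
FILES = 340_279_750
ARC_SPACE_TB = 12
INDEXING_SPACE_TB = 1


def derived_ratios() -> dict[str, float]:
    """Illustrative ratios derived from the published totals (names are hypothetical)."""
    return {
        # Total space is the ARC storage plus the index storage.
        "total_space_tb": ARC_SPACE_TB + INDEXING_SPACE_TB,
        # Average number of harvests made of each website.
        "harvests_per_website": HARVESTS / WEBSITES,
        # Average number of files stored per harvest.
        "files_per_harvest": FILES / HARVESTS,
    }


if __name__ == "__main__":
    for name, value in derived_ratios().items():
        print(f"{name}: {value:,.2f}")
```

With these totals, each website has been harvested about four times on average, and a single harvest contains roughly 1,480 files.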

Origin of harvests

The resources deposited in the repository come from four sources: captures of the .cat domain; resources compiled to create monographic collections; websites recommended by PADICAT users; and digital resources from institutions that have signed a cooperation agreement with the Biblioteca de Catalunya.

Origin | Number of websites | Number of harvests
Agreements | 463 | 2,606
Recommended | 704 | 3,257
Monographics | 3,249 | 36,682
.cat | 35,171 | 75,494
Total | 39,587 | 118,039
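
Since the Total row is the sum of the four origins, the figures can be cross-checked and each origin's share of the websites derived directly. A minimal Python sketch, assuming the rows are transcribed as tuples; the names `ORIGINS`, `totals_match` and `website_shares` are hypothetical.

```python
# Rows of the "Origin of harvests" table: (origin, websites, harvests).
ORIGINS = [
    ("Agreements", 463, 2_606),
    ("Recommended", 704, 3_257),
    ("Monographics", 3_249, 36_682),
    (".cat", 35_171, 75_494),
]
PUBLISHED_TOTALS = (39_587, 118_039)  # (websites, harvests) from the Total row


def totals_match(rows) -> bool:
    """Check that the per-origin rows add up to the published Total row."""
    websites = sum(w for _, w, _ in rows)
    harvests = sum(h for _, _, h in rows)
    return (websites, harvests) == PUBLISHED_TOTALS


def website_shares(rows) -> dict[str, float]:
    """Percentage of all websites contributed by each origin."""
    total = sum(w for _, w, _ in rows)
    return {origin: 100 * w / total for origin, w, _ in rows}


if __name__ == "__main__":
    print("totals match:", totals_match(ORIGINS))
    for origin, share in website_shares(ORIGINS).items():
        print(f"{origin:>12}: {share:5.1f}%")
```

On these rows the check passes, and captures of the .cat domain account for roughly 89% of the websites.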

 

Distribution of file types in the PADICAT repository

MIME type | Files | Share of files
text/html | 263,148,453 | 77.33%
image/jpeg | 37,756,722 | 11.09%
image/gif | 8,456,930 | 2.49%
image/png | 5,963,586 | 1.75%
application/pdf | 5,375,942 | 1.58%
application/atom+xml | 3,548,712 | 1.04%
text/xml | 2,302,467 | 0.68%
application/rss+xml | 2,241,257 | 0.66%
text/css | 1,792,974 | 0.53%
application/javascript | 1,388,795 | 0.41%
text/plain | 1,387,596 | 0.41%
text/dns | 976,210 | 0.29%
application/x-shockwave-flash | 903,466 | 0.27%
application/x-javascript | 683,954 | 0.20%
no-type | 522,794 | 0.15%
application/xml | 517,012 | 0.15%
application/octet-stream | 377,471 | 0.11%
application/msword | 307,148 | 0.09%
image/pjpeg | 255,547 | 0.07%
image/jpg | 208,674 | 0.06%
Others | 2,164,040 | 0.64%
Total | 340,279,750 | 100%
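
The share column is each type's file count divided by the repository total. The sketch below recomputes it from the counts; `FILE_COUNTS` and `percentage_shares` are hypothetical names, and small differences in the second decimal place relative to the published column may come from rounding.

```python
# File counts per MIME type, from the file-type table ("Altres" = Others).
FILE_COUNTS = {
    "text/html": 263_148_453,
    "image/jpeg": 37_756_722,
    "image/gif": 8_456_930,
    "image/png": 5_963_586,
    "application/pdf": 5_375_942,
    "application/atom+xml": 3_548_712,
    "text/xml": 2_302_467,
    "application/rss+xml": 2_241_257,
    "text/css": 1_792_974,
    "application/javascript": 1_388_795,
    "text/plain": 1_387_596,
    "text/dns": 976_210,
    "application/x-shockwave-flash": 903_466,
    "application/x-javascript": 683_954,
    "no-type": 522_794,
    "application/xml": 517_012,
    "application/octet-stream": 377_471,
    "application/msword": 307_148,
    "image/pjpeg": 255_547,
    "image/jpg": 208_674,
    "Others": 2_164_040,
}
TOTAL_FILES = 340_279_750


def percentage_shares(counts: dict[str, int], total: int) -> dict[str, float]:
    """Share of each MIME type as a percentage of all files in the repository."""
    if sum(counts.values()) != total:
        raise ValueError("per-type counts do not add up to the published total")
    return {mime: 100 * n / total for mime, n in counts.items()}


if __name__ == "__main__":
    for mime, pct in percentage_shares(FILE_COUNTS, TOTAL_FILES).items():
        print(f"{mime:<30} {pct:6.2f}%")
```

A natural extension would be to merge labels that describe the same format, such as image/jpeg, image/pjpeg and image/jpg, or application/javascript and application/x-javascript, before computing the shares.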

Evolution of the monographic collections: PADICAT topic collections

PADICAT has created eight monographic collections: Catalan museums, folk-rock music in Catalonia, the European Parliament election campaign (2009), elections to the Parliament of Catalonia (2006 and 2010), elections to the Spanish Congress and Senate (2008), and local elections (2007 and 2011).

Collection | Number of websites | Number of harvests | Number of files | Space (GB)
Catalan Elections 2006 | 88 | 775 | 4,953,215 | 175
Local Elections 2007 | 615 | 1,747 | 13,641,991 | 457
Folk-rock music | 50 | 50 | 1,148,312 | 22
Spanish Elections 2008 | 147 | 896 | 3,117,638 | 135.11
European Elections 2009 | 170 | 613 | 5,404,291 | 233.05
Catalan museums | 1,364 | 1,391 | 2,146,133 | 147.49
Catalan Elections 2010 | 806 | 31,210 | 17,202,999 | 707.65
Local Elections 2011 | 1,518 | 47,429 | 17,202,999 | 1,127
Total | 4,758 | 84,111 | 64,817,578 | 3,004.3
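
The Total row of the monographic table can be checked, and per-collection averages derived, in the same way. This Python sketch assumes the rows are transcribed as tuples; `COLLECTIONS`, `totals_match` and `files_per_harvest` are hypothetical names, and space is summed as plain floating-point GB.

```python
import math

# Rows of the monographic-collections table:
# (collection, websites, harvests, files, space in GB).
COLLECTIONS = [
    ("Catalan Elections 2006", 88, 775, 4_953_215, 175.0),
    ("Local Elections 2007", 615, 1_747, 13_641_991, 457.0),
    ("Folk-rock music", 50, 50, 1_148_312, 22.0),
    ("Spanish Elections 2008", 147, 896, 3_117_638, 135.11),
    ("European Elections 2009", 170, 613, 5_404_291, 233.05),
    ("Catalan museums", 1_364, 1_391, 2_146_133, 147.49),
    ("Catalan Elections 2010", 806, 31_210, 17_202_999, 707.65),
    ("Local Elections 2011", 1_518, 47_429, 17_202_999, 1_127.0),
]
PUBLISHED_TOTAL = (4_758, 84_111, 64_817_578, 3_004.3)


def column_totals(rows):
    """Sum the websites, harvests, files and space columns."""
    return tuple(sum(row[i] for row in rows) for i in range(1, 5))


def totals_match(rows) -> bool:
    """Compare computed totals with the Total row (float tolerance for GB)."""
    *counts, space = column_totals(rows)
    *published_counts, published_space = PUBLISHED_TOTAL
    return counts == published_counts and math.isclose(space, published_space)


def files_per_harvest(rows) -> dict[str, float]:
    """Average number of files captured per harvest in each collection."""
    return {name: files / harvests for name, _, harvests, files, _ in rows}


if __name__ == "__main__":
    print("totals match:", totals_match(COLLECTIONS))
    for name, avg in files_per_harvest(COLLECTIONS).items():
        print(f"{name:<24} {avg:>10,.0f} files/harvest")
```

On these figures the totals match, and folk-rock music averages about 23,000 files per harvest, the highest of the eight collections.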

 

Further details on PADICAT's monographic collections of election campaigns are given in the following article (in Spanish):

Ciro Llueca; Daniel Cócera; Natalia Torres; Gerard Suades; Ricard de la Vega (2011). “A ritmo de tweet: archivando elecciones 2.0”. El profesional de la información, vol. 20, nº 3.
http://eprints.rclis.org/handle/10760/15764