Python library to work with WARC files
WARC (Web ARChive) is a file format for storing web crawls.
http://www.scribd.com/doc/4303719/WARC-ISO-28500-final-draft-v018-Zentveld-080618
This warc library makes it very easy to work with WARC files:
    import warc

    f = warc.open("test.warc")
    for record in f:
        print record['WARC-Target-URI'], record['Content-Length']
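To make clearer what fields such as WARC-Target-URI and Content-Length refer to, here is a minimal sketch of reading records from an uncompressed WARC byte stream using only the standard library. It is not how the warc library is implemented; the function name read_warc_records and the in-memory sample record are hypothetical, and the sample omits fields that a fully valid record would carry. It assumes each record is a version line, a block of "Name: value" header lines, a blank line, and then a body of Content-Length bytes.

```python
import io


def read_warc_records(stream):
    """Yield (headers, payload) pairs from an uncompressed WARC byte stream.

    Hypothetical helper for illustration; not part of the warc library.
    """
    while True:
        version = stream.readline()
        if not version:
            return  # end of stream
        if not version.strip():
            continue  # skip the blank lines that separate records
        if not version.startswith(b"WARC/"):
            raise ValueError("expected a WARC version line, got %r" % version)
        headers = {}
        for line in iter(stream.readline, b""):
            if line in (b"\r\n", b"\n"):
                break  # a blank line ends the header block
            name, _, value = line.decode("utf-8").partition(":")
            headers[name.strip()] = value.strip()
        # The body is exactly Content-Length bytes long.
        payload = stream.read(int(headers["Content-Length"]))
        yield headers, payload


# Simplified one-record sample built in memory (an assumption for
# illustration; a real record carries more header fields).
body = b"hello"
sample = (
    b"WARC/1.0\r\n"
    b"WARC-Type: resource\r\n"
    b"WARC-Target-URI: http://example.com/\r\n"
    b"Content-Length: 5\r\n"
    b"\r\n" + body + b"\r\n\r\n"
)

for headers, payload in read_warc_records(io.BytesIO(sample)):
    print("%s %s" % (headers["WARC-Target-URI"], headers["Content-Length"]))
```

The sketch handles neither gzip-compressed files nor malformed input; for real archives, use the library's own reader.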
Documentation
The documentation of the warc library is available at http://readthedocs.org/docs/warc/en/latest/
License
This software is licensed under the BSD 3-clause license. See the LICENSE file for details.
Source distribution: warc-0.1.tar.gz (6.1 kB)