Internals

The following section describes some of the internals of mwlib. Only read this if you plan to extend mwlib’s functionality.

Writers

A writer in mwlib generates output from a collection of MediaWiki articles in some writer-specific format.

The writer function

Essentially a writer is just a Python function with the following signature:

def writer(env, output, status_callback, **kwargs): pass

Note that the function doesn’t necessarily have to be called “writer”.

The env argument is an mwlib.wiki.Environment instance. Its wiki attribute is always set to the configured WikiDB instance, and its metabook attribute is set to a filled-in mwlib.metabook.MetaBook instance. If images are used, the images attribute of the env object is set to the configured ImageDB instance.

The output argument is the name of the file to which the writer should write its output.

The status_callback argument is a callable with the following signature:

def status_callback(status=None, progress=None, article=None): pass

It should be called from time to time to update the status and progress information. status should be a short English description of what is happening (e.g. “parsing”, “rendering”). progress should be an integer between 0 and 100 indicating the percentage of progress; you don’t have to set it to 0 at the start or to 100 at the end, because mw-render does this. article should be the unicode string of the currently processed article. All parameters are optional: you can pass only one or two of them to status_callback(), and the others keep their previous values.
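The "keep their previous value" behaviour can be pictured with a small stand-in callback, which is also handy for exercising a writer outside of mw-render. This is only a sketch: the StatusTracker class is hypothetical and not part of mwlib.

```python
class StatusTracker:
    """Hypothetical stand-in for the status_callback passed in by mw-render."""

    def __init__(self):
        self.status = None
        self.progress = None
        self.article = None

    def __call__(self, status=None, progress=None, article=None):
        # Only overwrite the fields that were actually passed;
        # omitted parameters keep their previous value.
        if status is not None:
            self.status = status
        if progress is not None:
            self.progress = progress
        if article is not None:
            self.article = article


status_callback = StatusTracker()
status_callback(status="parsing", article="Test")
status_callback(progress=40)  # status and article are left unchanged
```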

The return value of the writer function is not used: if the function returns, this is treated as success. To indicate failure, the writer must raise an exception. If you want a message to be written to the error file specified with the --error-file option of mw-render, raise the WriterError exception defined in mwlib.writerbase (or a subclass thereof), instantiated with a human-readable English error message. For all other exceptions, the traceback is written to the error file.

Your writer function can define additional keyword arguments (indicated by the “**kwargs” above) that can be passed to the writer with the --writer-options argument of the mw-render command (see below). If the user specifies a writer option as option=value, the kwarg option is passed the string "value"; if the user specifies just option, the kwarg option is passed the value True. All writer options should be optional and documented using the options attribute on the writer object (see below).
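Putting the pieces above together, a writer might look roughly like the following sketch. It is an illustration under assumptions, not a real mwlib writer: the WriterError class here is a local stand-in for mwlib.writerbase.WriterError so that the example runs without mwlib installed, and the uppercase option and the text_writer name are hypothetical.

```python
class WriterError(RuntimeError):
    """Local stand-in for mwlib.writerbase.WriterError (assumption for this sketch)."""


def text_writer(env, output, status_callback, uppercase=False, **kwargs):
    """Write a placeholder text file; 'uppercase' is a hypothetical writer option."""
    status_callback(status="rendering", progress=10)
    # A real writer would walk env.metabook and fetch articles from env.wiki here.
    text = "placeholder output\n"
    # 'uppercase' is True for a bare "uppercase" option; for "uppercase=value"
    # it is the string "value", which is also truthy when non-empty.
    if uppercase:
        text = text.upper()
    try:
        with open(output, "w") as f:
            f.write(text)
    except OSError as exc:
        # Raising signals failure; simply returning signals success.
        raise WriterError("could not write output file: %s" % exc)
    status_callback(progress=90)
```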

Attributes

Optionally – and preferably – this function object has the following additional attributes:

writer.description = 'Some short description'
writer.content_type = 'Content-Type of the output'
writer.file_extension = 'File extension for documents'
writer.options = {
    'foo': {
        'help': 'help text for "switch" foo',
    },
    'bar': {
        'param': 'PARAM',
        'help': 'help text for option bar with parameter PARAM',
    }
}

For example the writer “odf” (defined in mwlib.odfwriter) sets the attributes to these values:

writer.description = 'OpenDocument Text'
writer.content_type = 'application/vnd.oasis.opendocument.text'
writer.file_extension = 'odt'

and the writer “rl” from mwlib.rl (defined in mwlib.rl.rlwriter) sets the attributes to these values:

writer.description = 'PDF documents (using ReportLab)'
writer.content_type = 'application/pdf'
writer.file_extension = 'pdf'
writer.options = {
    'coverimage': {
        'param': 'FILENAME',
        'help': 'filename of an image for the cover page',
    }
}

The description is used when the list of writers is displayed with mw-render --list-writers; all information is displayed with mw-render --writer-info SOMEWRITER. The content type and file extension are written to a file if one is specified with the --status-file argument of mw-render.
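Because the attributes are ordinary attributes on the function object, any tool can read them back. The sketch below shows one way to turn the options dictionary into help lines; the format_options helper and its output format are hypothetical and do not reproduce what mw-render actually prints.

```python
def writer(env, output, status_callback, **kwargs):
    pass


writer.description = 'Example output'
writer.options = {
    'foo': {
        'help': 'a switch',
    },
    'bar': {
        'param': 'PARAM',
        'help': 'takes a value',
    },
}


def format_options(w):
    """Return one help line per documented option (hypothetical format)."""
    lines = []
    # getattr() with a default copes with writers that define no options.
    for name, spec in sorted(getattr(w, 'options', {}).items()):
        param = spec.get('param')
        label = '%s=%s' % (name, param) if param else name
        lines.append('%s: %s' % (label, spec.get('help', '')))
    return lines
```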

Publishing the writer

Writers are made available as plugins using setuptools entry points. They have a name and must belong to the entry point group “mwlib.writers”. To publish writers in your distribution, add all included writers to this entry point group by passing the entry_points kwarg to the call to setuptools.setup() in your setup.py file:

setup(
    ...
    entry_points = {
        'mwlib.writers': [
            'foo = somepackage.foo:writer',
            'bar = somepackage.barbaz:bar_writer',
            'baz = somepackage.barbaz:baz_writer',
        ],
    },
    ...
)
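To check that the writers were registered after installing the distribution, the entry point group can be inspected from Python. The following is a sketch that uses the standard library's importlib.metadata; this is an assumption for illustration and not necessarily how mw-render itself looks up writers, and find_writers is a hypothetical helper.

```python
from importlib.metadata import entry_points


def find_writers(group="mwlib.writers"):
    """Return a dict mapping writer names to their entry point objects."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Newer Python versions offer a selection interface.
        selected = eps.select(group=group)
    else:
        # Older versions return a plain dict keyed by group name.
        selected = eps.get(group, [])
    return {ep.name: ep for ep in selected}
```

Calling load() on one of the returned entry points imports the module and returns the writer function itself.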

Using writers

From the command line, writers can be used with the mw-render command. Called with just the --list-writers option, mw-render lists the available writers together with their descriptions. The name of an available writer can then be passed with the --writer option to produce output with that writer. For example, this will use the ODF writer (named “odf”) to produce a document in the OpenDocument Text format:

$ mw-render --config :en --writer odf --output test.odt Test

Additional options for the writer can be specified with the --writer-options argument, whose value is a “;”-separated list of keywords or “key=value” pairs.
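The mapping from such an option string to the keyword arguments received by the writer can be sketched as follows. The parse_writer_options function is hypothetical and only mirrors the rules described in this document (a bare keyword becomes True, “key=value” becomes the string value); mw-render's own parsing may differ in details such as whitespace handling.

```python
def parse_writer_options(value):
    """Turn 'a=b;c' into {'a': 'b', 'c': True} (illustrative sketch)."""
    options = {}
    for item in value.split(';'):
        item = item.strip()
        if not item:
            continue  # ignore empty segments such as a trailing ';'
        if '=' in item:
            # Split only on the first '=' so values may contain '='.
            key, val = item.split('=', 1)
            options[key.strip()] = val.strip()
        else:
            options[item] = True
    return options


kwargs = parse_writer_options('coverimage=cover.png;foo')
```

The resulting dictionary could then be passed on as writer(env, output, status_callback, **kwargs).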

Metabooks

A Metabook describes a collection of articles and chapters together with some metadata like title or version. The actual data (e.g. the wikitext of articles) is not contained in the Metabook.

The Metabook is a simple dictionary containing lists, integers, strings (which are Unicode-safe; they are represented as unicode in Python) and other dictionaries. When read from or written to a file, or sent over the network, it’s serialized in JSON format.
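Since the Metabook consists only of JSON-compatible types, it survives a round trip through the standard json module unchanged. A minimal sketch, with made-up field values:

```python
import json

metabook = {
    "type": "collection",
    "version": 1,
    "title": "Example Collection",
    "items": [],
    "licenses": [],
}

serialized = json.dumps(metabook)   # text suitable for a file or the network
restored = json.loads(serialized)   # back to plain dicts, lists and strings
```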

Metabook Types

Every dictionary contained in the Metabook (and the Metabook dictionary itself) has a type. The different types are described below. The Metabook dictionary itself has type “collection”.

Collection

type (string):

Fixed value “collection”

version (integer):

Protocol version, 1 for now

title (string, optional):

Title of the collection

subtitle (string, optional):

Subtitle of the collection

editor (string, optional):

Editor of the collection

items (list of article and/or chapter objects, can be empty):

Chapters and top-level articles contained in the collection

licenses (list of license objects):

List of licenses for articles in this collection

License

type (string)

Fixed value “license”

name (string)

Name of license

mw_license_url (string, optional)

URL to license text in wikitext format

mw_rights_page (string, optional)

Title of article containing license text

mw_rights_icon (string, optional)

URL of license icon

mw_rights_url (string, optional)

URL to license text in any format

mw_rights_text (string, optional)

Name and possibly a short description of the license

Article

type (string):

Fixed value “article”

content_type (string):

Fixed value “text/x-wiki”

title (string):

Title of this article

displaytitle (string, optional):

Title to be used in rendered output instead of the real title

revision (string, optional):

Revision of article, i.e. oldid for MediaWiki. If omitted, the latest revision is used.

timestamp (integer, optional):

UNIX timestamp (seconds since 1970-1-1) of the revision of this article

url (string):

URL to article in source wiki

authors (list of strings):

list of principal authors

source-url (string)

URL of source wiki. This URL is the key to an item in the sources dictionary in the content.json object of the ZIP file.

Chapter

type (string):

Fixed value “chapter”

title (string):

Title of this chapter

items (list of article objects, can be empty):

List of articles contained in this chapter
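Because a collection holds articles either directly or inside chapters, code that needs every article has to look one level down. A sketch of such a traversal, operating on the plain dictionary form described here (iter_articles is a hypothetical helper, not an mwlib function):

```python
def iter_articles(collection):
    """Yield all article dicts of a collection in reading order."""
    for item in collection.get("items", []):
        if item.get("type") == "article":
            # Top-level article
            yield item
        elif item.get("type") == "chapter":
            # Chapters contain only articles, so one level is enough.
            for article in item.get("items", []):
                yield article


collection = {
    "type": "collection",
    "version": 1,
    "items": [
        {"type": "article", "title": "Top-level Article",
         "content_type": "text/x-wiki"},
        {"type": "chapter", "title": "First Chapter", "items": [
            {"type": "article", "title": "First Article in Chapter",
             "content_type": "text/x-wiki"},
        ]},
    ],
}

titles = [article["title"] for article in iter_articles(collection)]
```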

Source

type (string)

Fixed value “source”

system (string):

Fixed value “MediaWiki” for now

url (string, optional):

“home” URL of source, e.g. “http://en.wikipedia.org/wiki/Main_Page” (same as key for this entry)

name (string):

Unique name of source, e.g. “Wikipedia (en)”

language (string)

2-character ISO code of language, e.g. “en”

interwikimap (dictionary mapping prefixes to interwiki objects, optional)

Interwiki

Interwiki entries can describe language links and interwiki links.

type (string)

Fixed value “interwiki”

prefix (string)

Prefix in MediaWiki links, i.e. the part before the “:”. This is the key in the interwikimap attribute of a source object.

url (string)

URL template; the string “$1” gets replaced with the link target (without the prefix)

local (bool, optional)

True if the interwiki link is a “local” one

language (string, optional)

Name of the language, if this interwiki describes language links
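How a prefix and a URL template combine can be sketched as follows. The resolve_interwiki helper and the sample entry are hypothetical, and the sketch deliberately skips details a real implementation would need to consider, such as URL quoting of the target or prefix case handling.

```python
def resolve_interwiki(interwikimap, link):
    """Return the URL for a prefixed link, or None if the prefix is unknown."""
    prefix, sep, target = link.partition(':')
    if not sep:
        return None  # no ":" in the link, so there is no prefix
    entry = interwikimap.get(prefix)
    if entry is None:
        return None
    # "$1" in the template stands for the link target without the prefix.
    return entry['url'].replace('$1', target)


interwikimap = {
    'de': {
        'type': 'interwiki',
        'prefix': 'de',
        'url': 'http://de.wikipedia.org/wiki/$1',
        'language': 'Deutsch',
    },
}

url = resolve_interwiki(interwikimap, 'de:Berlin')
```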

Example

Given in JSON notation:

{
    "type": "collection",
    "version": 1,
    "title": "This is the Collection Title",
    "subtitle": "An optional subtitle",
    "editor": "Jane Doe",
    "items": [
        {
            "type": "article",
            "title": "Top-level Article",
            "content_type": "text/x-wiki"
        },
        {
            "type": "chapter",
            "title": "First Chapter",
            "items": [
                {
                    "type": "article",
                    "title": "First Article in Chapter",
                    "revision": "1234",
                    "timestamp": 122331212312,
                    "content_type": "text/x-wiki",
                    "source-url": "http://en.wikipedia.org/wiki/Main_Page"
                },
                {
                    "type": "article",
                    "title": "Second Article in Chapter",
                    "content_type": "text/x-wiki",
                    "source-url": "http://en.wikipedia.org/wiki/Main_Page"
                }
            ]
        }
    ],
    "licenses": [
        {
            "type": "license",
            "name": "GFDL",
            "mw_license_url": "http://en.wikipedia.org/wiki/Wikipedia:Text_of_the_GNU_Free_Documentation_License"
        }
    ]
}