Merge pull request #2 from thorge/split-large-sitemaps-in-multiple-files
Split sitemaps in multiple files
sagargg committed Sep 19, 2024
2 parents 9db86ef + 9a6f9a0 commit 88fb3b6
Showing 6 changed files with 412 additions and 134 deletions.
3 changes: 3 additions & 0 deletions .gitignore
@@ -40,3 +40,6 @@ coverage.xml

# Sphinx documentation
docs/_build/

# Generated sitemaps (default directory)
ckanext/sitemap/public/sitemap*
81 changes: 72 additions & 9 deletions README.md
@@ -1,34 +1,97 @@
[![Tests](https://github.com//ckanext-sitemap/workflows/Tests/badge.svg?branch=main)](https://github.com//ckanext-sitemap/actions)

# ckanext-sitemap
A CKAN extension that generates a sitemap XML file is designed to create a structured map of a CKAN instance's datasets and resources, making it easier for search engines to discover and index the available data. !

## Installation
A CKAN extension that generates sitemap XML files: a structured map of a CKAN instance's datasets and resources that makes it easier for search engines to discover and index the available data.

**TODO:** Add any additional install steps to the list below.
For example installing any non-Python dependencies or adding any required
config settings.
## Table of Contents

- [Getting Started](#getting-started)
- [Contributing](#contributing)
- [Versioning](#versioning)
- [License](#license)

## Getting Started

### Installation

To install ckanext-sitemap:

1. Activate your CKAN virtual environment, for example:

. /usr/lib/ckan/default/bin/activate
```bash
. /usr/lib/ckan/default/bin/activate
```

2. Clone the source and install it on the virtualenv
2. Clone the source and install it in the virtual environment

```bash
git clone https://github.com//ckanext-sitemap.git
cd ckanext-sitemap
pip install -e .
pip install -r requirements.txt
pip install -r requirements.txt
```

3. Add `sitemap` to the `ckan.plugins` setting in your CKAN
config file (by default the config file is located at
`/etc/ckan/default/ckan.ini`).

4. Restart CKAN. For example if you've deployed CKAN with Apache on Ubuntu:
sudo service apache2 reload
```bash
sudo service apache2 reload
```
### Configuration
You can configure this extension in the `ckan.ini` file of your CKAN instance. Set the following options according to your requirements for sitemap generation and management.
Option | Default Value | Description
-------------------- | ------------- | -----------
`ckanext.sitemap.directory` | [`./ckanext/sitemap/public`](./ckanext/sitemap/public/) | The directory path for storing generated sitemaps.
`ckanext.sitemap.max_items` | `5000` | Maximum number of items per sitemap file. If the total count of resources exceeds this limit, the sitemap is split into multiple files.
`ckanext.sitemap.autorenew` | `True` | If this option is enabled, the sitemaps will be automatically renewed whenever a user requests a sitemap and the existing sitemap is older than the Time-To-Live (TTL) value specified. Set this to False if you prefer a cron job to handle sitemap generation.
`ckanext.sitemap.ttl` | `8 * 3600` (8 hours) | Time-To-Live (TTL) for sitemaps. Sitemaps older than this value (in seconds) are regenerated when a user visits a sitemap route.
`ckanext.sitemap.resources` | `True` | Determines whether package resources (distributions) should be included in the sitemaps.
`ckanext.sitemap.groups` | `True` | Determines whether groups and organizations should be included in the sitemaps.
`ckanext.sitemap.language_alternatives` | `True` | Determines whether language alternatives should be included in the sitemaps.
`ckanext.sitemap.custom_uris` | `Undefined` | A list of additional sitemap URIs separated by whitespace or newlines. These URIs will be included in the sitemap generation process alongside the default CKAN URIs.
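
The effect of `ckanext.sitemap.max_items` can be pictured with a short sketch. This is not the extension's actual implementation: the `split_into_sitemaps` helper and the example URLs are hypothetical, and only the default of 5000 items per file comes from the table above.

```python
from typing import Iterator, List, Sequence


def split_into_sitemaps(urls: Sequence[str], max_items: int = 5000) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most max_items URLs, one chunk per sitemap file.

    Hypothetical helper for illustration; not part of ckanext-sitemap.
    """
    if max_items < 1:
        raise ValueError("max_items must be a positive integer")
    for start in range(0, len(urls), max_items):
        yield list(urls[start:start + max_items])


# 12,000 made-up dataset URLs split with the default limit of 5000.
urls = [f"https://ckan.example.org/dataset/{i}" for i in range(12000)]
chunks = list(split_into_sitemaps(urls))
print([len(chunk) for chunk in chunks])  # [5000, 5000, 2000]
```

Under these assumptions, a lower `max_items` yields more, smaller files, and the last file holds whatever remains.
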
### Using Cron for Regular Sitemap Generation
Using cron to generate sitemaps regularly can be advantageous, especially if the sitemap generation process is time-consuming.
Ensure that the sitemap generation occurs within the time frame specified by `ckanext.sitemap.ttl`, or alternatively, set `ckanext.sitemap.autorenew` to `False` to prevent accidental triggering of sitemap generation by users.
**Example Cron Job:**
To schedule the command to run at 2 AM, 10 AM, and 6 PM:
```bash
0 2,10,18 * * * /usr/lib/ckan/default/bin/ckan -c /etc/ckan/default/ckan.ini ckanext-sitemap generate > /dev/null 2>&1
```
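
How `ckanext.sitemap.ttl` and `ckanext.sitemap.autorenew` interact can be sketched roughly as follows. The function names and the use of the file's modification time are assumptions made for illustration, not the extension's real code; the sketch also assumes that with `autorenew` disabled a request never triggers regeneration.

```python
import os
import time
from typing import Optional

DEFAULT_TTL = 8 * 3600  # seconds; mirrors the documented default of ckanext.sitemap.ttl


def is_stale(path: str, ttl: int = DEFAULT_TTL, now: Optional[float] = None) -> bool:
    """Return True if the file is missing or was last modified more than ttl seconds ago.

    Hypothetical helper; the staleness criterion (mtime) is an assumption.
    """
    if not os.path.exists(path):
        return True
    current = time.time() if now is None else now
    return current - os.path.getmtime(path) > ttl


def should_regenerate(path: str, autorenew: bool = True, ttl: int = DEFAULT_TTL) -> bool:
    """Decide whether a sitemap request should trigger regeneration.

    Assumption: with autorenew off, requests never regenerate and cron is expected to do it.
    """
    return autorenew and is_stale(path, ttl)
```

With the cron schedule above, which runs every 8 hours, a sitemap is regenerated about when it reaches the default TTL, so a request that arrives before a run finishes can still trigger regeneration unless `autorenew` is `False`.
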
## Available Commands
- `generate`
This command triggers the generation of the sitemap.
Usage:
```bash
ckanext-sitemap generate
```
## Contributing
To contribute to this documentation, create a branch or fork this repository, make
your changes and create a merge request.
## Versioning
We use [SemVer](http://semver.org/) for versioning. For the versions available, see
the tags on this repository.
## License
33 changes: 33 additions & 0 deletions ckanext/sitemap/cli.py
@@ -0,0 +1,33 @@
# -*- coding: utf-8 -*-

import click
import ckanext.sitemap.sitemap as sm

def get_commands():
    return [ckanext_sitemap]


@click.group()
def ckanext_sitemap():
    """ckanext-sitemap
    Usage:
        ckanext-sitemap generate
            - (Re)generate sitemap.
    """


@ckanext_sitemap.command()
def generate():
    """
    Command to generate sitemap.
    """
    try:
        click.echo('Starting sitemap generation..')
        sm.generate_sitemap()
        click.echo('Finished sitemap generation.')

    except Exception as e:
        # Report any error raised during sitemap generation
        click.echo(f'Error during sitemap generation: {str(e)}', err=True)
7 changes: 5 additions & 2 deletions ckanext/sitemap/plugin.py
@@ -1,11 +1,13 @@
import ckan.plugins as plugins
import ckan.plugins.toolkit as toolkit
import ckanext.sitemap.view as view
from ckanext.sitemap import cli


class SitemapPlugin(plugins.SingletonPlugin):
    plugins.implements(plugins.IConfigurer)
    plugins.implements(plugins.IBlueprint)
    plugins.implements(plugins.IClick)

    # IConfigurer
    def update_config(self, config_):
@@ -16,6 +18,7 @@ def update_config(self, config_):
    # IBlueprint
    def get_blueprint(self):
        return view.get_blueprints()



    # IClick
    def get_commands(self):
        return cli.get_commands()