Fluent Compatibility #240

Open
wants to merge 18 commits into
base: master

JVickery-TBS
Contributor

Feature/language support

Added translation support to the extensions. Translates:

  • titles
  • descriptions/notes
  • organization title
  • resource titles
  • resource descriptions
  • keywords/tags

- Started locale support with usage of the get_translated helper.
- Fixed keywords.
@smotornyuk
Member

@amercader, can you take a look at this PR? It's a part of PR series from @JVickery-TBS with translation fixes

- Removed unnecessary imports and code for the resource translations.
- Reverted profile.py back to master.
- Modified code to condition on fluent being loaded.
- Added code to the `dcat_dataset_show` method to set the values of the data_dict before the serializer profiles.
@JVickery-TBS changed the title from "Locale Support" to "Fluent Compatibility" Apr 17, 2023
@JVickery-TBS
Contributor Author

@amercader okay I understand this extension a bit better from your comments thanks!

So basically, we do not want to really mess with the Profiles at all because they are just the mappings of all of the different meta formats.

So I found that the logic method dcat_dataset_show here is what is used. And to kind of go off more of your comments, I leaned into making these really just "Fluent Compatibility" because that is what it is really.

So I now check plugin_loaded('fluent').

I have reverted everything from profiles.py and the code in the dcat_dataset_show seems to handle everything well as that is where we can modify the data dict values returned from package_show before the dict is passed through the profiles/serialization.
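
A rough sketch of that idea, not the code from the PR. It assumes translated fields arrive as `{lang: text}` dicts, and `flatten_for_language` is a hypothetical helper name. In CKAN the `fluent_loaded` flag would come from `plugin_loaded('fluent')` and `lang` from the current request; the field list is illustrative only.

```python
def flatten_for_language(data_dict, lang, fluent_loaded,
                         fields=("title", "notes")):
    """Return a copy of data_dict with translated values reduced to one language.

    Hypothetical helper: `fluent_loaded` stands in for plugin_loaded('fluent').
    """
    if not fluent_loaded:
        # Without fluent there is nothing to flatten
        return data_dict
    flattened = dict(data_dict)
    for field in fields:
        value = data_dict.get(field)
        # Only touch values that follow the {lang: text} convention
        if isinstance(value, dict) and value.get(lang):
            flattened[field] = value[lang]
    return flattened
```

The flattened dict would then be handed to the serializer, so the profiles themselves stay untouched.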

I realize that it is not the greatest level/place to put the code, but it is clean and easy.

The only thing that I would really like to do still is figure out the landingPage for the correct language URLs. Because this is an example of the current code for a url such as http://127.0.0.1:5009/fr/dataset/d9865e61-227c-4298-a1d1-620e6669b097.xml:

<rdf:RDF>
  <dcat:Dataset rdf:about="http://127.0.0.1:5009/dataset/d9865e61-227c-4298-a1d1-620e6669b097">
    <dct:title>Feeds Testing FR</dct:title>
    <dct:description>Feeds Testing Description FR</dct:description>

So it would be nice to have the landingPage be http://127.0.0.1:5009/fr/dataset/d9865e61-227c-4298-a1d1-620e6669b097

Let me know if you think there is any way to do that?

- Added code for fluent fields to catalog show method.
- Added replacement of `{{LANG}}` to the `ckanext.dcat.base_uri` config value.
- Moved `{{LANG}}` replacement down.
@JVickery-TBS
Contributor Author

JVickery-TBS commented Apr 18, 2023

@amercader Okay I figured it out now! Firstly, I added the fluent compatibility to the catalog view/blueprint as well now.

As for the landingPage/URL stuff, I figured that out. So it will not override any set uri field values or extra field values. As an example:

 The value will be the first found of:
        1. The value of the `uri` field
        2. The value of an extra with key `uri`
        3. `catalog_uri()` + '/dataset/' + `id` field

The check now lives in catalog_uri: if fluent is loaded and the call happens within a request context, the {{LANG}} tag in the config options ckan.site_url or ckanext.dcat.base_uri is replaced with the current language.
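
Roughly, the precedence and the `{{LANG}}` substitution could look like the sketch below. The function names `resolve_catalog_uri` and `resolve_dataset_uri` are hypothetical stand-ins that take the base URI and language as parameters instead of reading CKAN config or the request, and leaving the tag untouched when no language is known is an assumption.

```python
LANG_TAG = "{{LANG}}"


def resolve_catalog_uri(base_uri, lang=None):
    """Substitute the language tag when a language is known (e.g. in a request)."""
    if lang:
        base_uri = base_uri.replace(LANG_TAG, lang)
    return base_uri.rstrip("/")


def resolve_dataset_uri(dataset_dict, base_uri, lang=None):
    """Return the first URI found, in the precedence order quoted above."""
    # 1. The value of the `uri` field
    uri = dataset_dict.get("uri")
    if not uri:
        # 2. The value of an extra with key `uri`
        for extra in dataset_dict.get("extras", []):
            if extra.get("key") == "uri" and extra.get("value"):
                uri = extra["value"]
                break
    if not uri:
        # 3. Catalog URI + '/dataset/' + `id` field
        uri = resolve_catalog_uri(base_uri, lang) + "/dataset/" + dataset_dict["id"]
    return uri
```

With a base URI such as `http://127.0.0.1:5009/{{LANG}}`, only the third case picks up the language; explicitly set `uri` values are returned unchanged.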

@seitenbau-govdata
Member

Maybe similar to #124

@amercader
Member

Thanks @JVickery-TBS . Let's step back for a second and see what it means for ckanext-dcat to have multi-language support.
We will assume that for a field to be multilingual it needs to use the ckanext-fluent convention:

dataset_dict = {
    "title": {
        "en": "Some title in English",
        "ca": "Un títol en català",
        "es": "Un título en castellano",
    },
    "notes": {
        "en": "A description in English",
        "ca": "Una descripció en català",
        "es": "Una descripción en castellano",
    },
    "resources": [],
    "maintainer": "xx",

}

Multilingual support means:

  1. Supporting importing fields from the RDF graph to the fluent format above (which is the goal of "add support for multilingual RDF" #124 by @stefina)
  2. Using the fluent fields to serialize multilingual RDF files (which is the goal of this PR)

Both don't need to be done at the same time, so it's fine to focus on serialization for now.
Your current approach is to modify the values of multilingual fields in the serialized RDF file (the RDF/XML, jsonld, ttl...) to match the language that the current web user is using. So if they are visiting https://someckan.org/en they get:

<rdf:RDF>
  <dcat:Dataset rdf:about="https://someckan.org/dataset/d9865e61-227c-4298-a1d1-620e6669b097">
    <dct:title>Some title in English</dct:title>
    <dct:description>A description in English</dct:description>

And if they are visiting https://someckan.org/ca (or ckan.locale_default = ca) they will get:

<rdf:RDF>
  <dcat:Dataset rdf:about="https://someckan.org/dataset/d9865e61-227c-4298-a1d1-620e6669b097">
    <dct:title>Un títol en català</dct:title>
    <dct:description>Una descripció en català</dct:description>

This is very limited in that we are not representing all languages, and the serialization does not provide information on which language the displayed values are actually in. Besides, this will only work in the context of a web request, not when used in the CLI, or as a module elsewhere.

The correct and more interoperable approach is to always provide all available languages, and use language codes:

<rdf:RDF>
  <dcat:Dataset rdf:about="https://someckan.org/dataset/d9865e61-227c-4298-a1d1-620e6669b097">
    <dct:title xml:lang="en">Some title in English</dct:title>
    <dct:title xml:lang="ca">Un títol en català</dct:title>
    <dct:title xml:lang="es">Un título en castellano</dct:title>
    <dct:description xml:lang="en">A description in English</dct:description>
    <dct:description xml:lang="ca">Una descripció en català</dct:description>
    <dct:description xml:lang="es">Una descripción en castellano</dct:description>

For this I'm afraid we need to go low level, at the profiles level. But the good news is that by changing it there we will automatically get multilingual serializations regardless of how these are created (API, RDF endpoint, CLI, etc).
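
To make the target output concrete, here is a minimal standard-library sketch that turns fluent-style dicts into language-tagged RDF/XML elements. It is an illustration only: ckanext-dcat builds an rdflib graph rather than writing XML by hand, and `render_multilingual` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DCAT = "http://www.w3.org/ns/dcat#"
DCT = "http://purl.org/dc/terms/"
XML = "http://www.w3.org/XML/1998/namespace"

# Use the conventional prefixes instead of auto-generated ns0, ns1, ...
for prefix, uri in (("rdf", RDF), ("dcat", DCAT), ("dct", DCT)):
    ET.register_namespace(prefix, uri)


def render_multilingual(dataset_uri, fields):
    """Emit one element per (field, language) pair, tagged with xml:lang.

    `fields` maps a DCT property name to a {lang: text} dict.
    """
    root = ET.Element(f"{{{RDF}}}RDF")
    dataset = ET.SubElement(
        root, f"{{{DCAT}}}Dataset", {f"{{{RDF}}}about": dataset_uri}
    )
    for prop, translations in fields.items():
        for lang, text in translations.items():
            element = ET.SubElement(
                dataset, f"{{{DCT}}}{prop}", {f"{{{XML}}}lang": lang}
            )
            element.text = text
    return ET.tostring(root, encoding="unicode")
```

Every available translation is written out regardless of who is asking, which is what makes the result independent of the request context.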

Below is a quick patch I tried to add support for multilingual title and notes fields. Hopefully it's easy to expand to other fields. @JVickery-TBS if you could give it a go I believe that could be a good path forward.

@seitenbau-govdata I'd love to get another pair of eyes on the modified _add_triple_from_dict() logic. I think the assumptions I made are fair but maybe we need to consider other combinations of parameters

diff --git a/ckanext/dcat/profiles.py b/ckanext/dcat/profiles.py
index 9b066ef..f91c79e 100644
--- a/ckanext/dcat/profiles.py
+++ b/ckanext/dcat/profiles.py
@@ -727,14 +727,16 @@ class RDFProfile(object):
 
     def _add_triples_from_dict(self, _dict, subject, items,
                                list_value=False,
-                               date_value=False):
+                               date_value=False,
+                               multilingual=False):
         for item in items:
             key, predicate, fallbacks, _type = item
             self._add_triple_from_dict(_dict, subject, predicate, key,
                                        fallbacks=fallbacks,
                                        list_value=list_value,
                                        date_value=date_value,
-                                       _type=_type)
+                                       _type=_type,
+                                       multilingual=multilingual)
 
     def _add_triple_from_dict(self, _dict, subject, predicate, key,
                               fallbacks=None,
@@ -742,7 +744,8 @@ class RDFProfile(object):
                               date_value=False,
                               _type=Literal,
                               _datatype=None,
-                              value_modifier=None):
+                              value_modifier=None,
+                              multilingual=False):
         '''
         Adds a new triple to the graph with the provided parameters
 
@@ -776,6 +779,11 @@ class RDFProfile(object):
             self._add_date_triple(subject, predicate, value, _type)
         elif value:
             # Normal text value
+            if multilingual and isinstance(value, dict):
+                # We assume that all multilingual field values are Literals
+                for lang, translated_value in value.items():
+                    object = Literal(translated_value, lang=lang)
+                    self.g.add((subject, predicate, object))
             # ensure URIRef items are preprocessed (space removal/url encoding)
             if _type == URIRef:
                 _type = CleanedURIRef
@@ -1207,10 +1215,16 @@ class EuropeanDCATAPProfile(RDFProfile):
 
         g.add((dataset_ref, RDF.type, DCAT.Dataset))
 
-        # Basic fields
+        # Multilingual fields
         items = [
             ('title', DCT.title, None, Literal),
             ('notes', DCT.description, None, Literal),
+        ]
+
+        self._add_triples_from_dict(dataset_dict, dataset_ref, items, multilingual=True)
+
+        # Basic fields
+        items = [
             ('url', DCAT.landingPage, None, URIRef),
             ('identifier', DCT.identifier, ['guid', 'id'], URIRefOrLiteral),
             ('version', OWL.versionInfo, ['dcat_version'], Literal),

- Removed fluent compatibility code in converters.
- Removed fluent compatibility code in logic.
- Removed fluent loaded check in `catalog_uri` util method.
- Removed `LANG` replacement in base uri util method.
- Added multilingual support to graph from dataset.
@JVickery-TBS
Contributor Author

@amercader Okay here we go again hahaha.

I have done your above implementation for multilingual in the _add_triples_from_dict method. And it seems to be working nicely.

There were a couple places in which I had to do some strange-ish things:

  • tags: because the tags are a dict, I had to loop through those in a specific way;
  • organization: the publisher falls back to the organization name, but the org/group translated fields are not included in the dataset dict, so I had to call an org show here.
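
For the tags case, a sketch of the kind of loop involved. It assumes translated tags are stored as a dict of language code to list of keywords, and the field name `keywords` is a placeholder; the PR's actual field handling may differ. Pairs are returned as `(keyword, lang)` tuples, which in the profile would become `Literal(keyword, lang=lang)`.

```python
def keyword_literals(dataset_dict, fluent_field="keywords"):
    """Return (keyword, lang) pairs; lang is None for untranslated core tags."""
    value = dataset_dict.get(fluent_field)
    if isinstance(value, dict):
        # Translated tags: one entry per keyword per language
        return [(keyword, lang)
                for lang, keywords in value.items()
                for keyword in keywords]
    # Fall back to core CKAN tags, which are a list of {"name": ...} dicts
    return [(tag["name"], None) for tag in dataset_dict.get("tags", [])]
```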

I also removed the language from the url for the rdf:about= as that seemed incorrect?

…elper;

- Made the translation keys fallback to the normal core field, like the `get_translated` helper does.
@JVickery-TBS
Contributor Author

JVickery-TBS commented May 9, 2023

@amercader Just added in the fallback for the field keys. Realized that if we put in the _translated keys, we would be assuming that a user is translating all of these fields. E.g. a user could be translating the Resource Title, but not translating the Resource Description.

So just a simple check if the _translated key is in the object dicts, kind of like how the core get_translated works.
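
A minimal sketch of that fallback, using hypothetical helper names (`translated_key`, `literals_for`) and plain tuples in place of rdflib literals:

```python
def translated_key(obj, key):
    """Prefer `<key>_translated` when the object has it, else the core key."""
    candidate = key + "_translated"
    return candidate if candidate in obj else key


def literals_for(obj, key):
    """Return (text, lang) pairs for a field, translated or not."""
    value = obj.get(translated_key(obj, key))
    if isinstance(value, dict):
        # Translated field: one pair per language that has a value
        return [(text, lang) for lang, text in value.items() if text]
    # Untranslated field: a single pair with no language tag
    return [(value, None)] if value else []
```

This way a resource with a translated title but a plain description still serializes both fields.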

- Changed conditions for translated tags.
- Renamed `multilingual` to `all_translated` due to core extension name.
@JVickery-TBS
Contributor Author

@amercader Ian mentioned that I should rename the multilingual parameter because that is the name of the core Multilingual extension.

So I just renamed that param to all_translated.

- Fixed syntax error from refactor.
- Fixed syntax error from refactor.
- Fixed fluent tags output.
- Fixed fluent tags output.
Member

@amercader amercader left a comment

Thanks for bearing with me @JVickery-TBS

I don't like all_translated, what about just translated?

And one final thing about the organization_show call. Besides that, this is good to go.

@inderps

inderps commented Oct 18, 2023

Will this ever get merged or not?

- Renamed `all_translated` to `translated`.
- Added a class cache variable for org dicts.
- Updated org test for serializing.
@JVickery-TBS
Contributor Author

@inderps Hey! Sorry! Have been busy with things over here. But have just done the feedback now, so we shall see!

@amercader
Member

amercader commented May 28, 2024

Quick update here just to say that I've pulled the fluent compatibility work into the wider Scheming / DCAT 3 support effort, so at some point during the next few weeks this will be looked at. I just need to think about how it will integrate with the more general scheming support, but the majority of the work here should get incorporated as is. Thanks for bearing with me @JVickery-TBS

5 participants