_object_value and _object_value_list return BNode identifiers #289

Open
EricSoroos opened this issue Jun 24, 2024 · 1 comment

EricSoroos commented Jun 24, 2024

While reviewing the scheming PR #281, I found a couple of places where the DCAT RDF harvester has trouble with in-the-wild DCAT-AP 2.1.1 feeds in JSON-LD format, specifically an ESRI ArcGIS Online (AGOL) INSPIRE feed: https://opendata-ifigeo.hub.arcgis.com/api/feed/dcat-ap/2.1.1.json. This doesn't appear to be related to the PR, so I'm filing it separately.

_object_value and _object_value_list generally return the string value of the object node. When that node is a typed blank node carrying properties rather than a direct value, the string they return is the internal identifier of the BNode.

For example, with this (not terribly useful, but syntactically representative) provenance:

			"dct:provenance": {
				"@type": "dct:ProvenanceStatement",
				"@label": {
					"@value": ""
				}
			},

We extract:

	'provenance', ('extras', 19, 'value'): 'Nc0c0162afbe140a5afa2736468e1da4c'

Similarly, the theme:

			"dcat:theme": {
				"@type": "skos:Concept",
				"skos:prefLabel": "Geospatial"
			},

also returns an internal node id. This is almost never going to be a useful result, because blank node identifiers are ephemeral: they are only valid while the graph is in memory and change every time the feed is parsed.
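A minimal sketch of why the extracted value looks like this, using only the Python standard library. The classes and functions below are hypothetical stand-ins, not rdflib or ckanext-dcat code. rdflib's `BNode` is a `str` subclass, and its auto-generated ids have the same shape as the one extracted above (`N` followed by 32 hex digits), so calling `str()` on a blank-node object yields that id, and a new id is minted on every parse.

```python
import uuid


class BNode(str):
    """Stand-in for rdflib.BNode: a fresh 'N' + uuid4-hex id on each creation."""

    def __new__(cls):
        return super().__new__(cls, "N" + uuid.uuid4().hex)


RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCT_PROVENANCE = "http://purl.org/dc/terms/provenance"
DCT_PROVENANCE_STATEMENT = "http://purl.org/dc/terms/ProvenanceStatement"


def parse_feed(dataset):
    """Hypothetical stand-in for parsing the provenance snippet above.

    The provenance object is a blank node carrying only its rdf:type.
    """
    provenance = BNode()
    return [
        (dataset, DCT_PROVENANCE, provenance),
        (provenance, RDF_TYPE, DCT_PROVENANCE_STATEMENT),
    ]


def naive_object_value(triples, subject, predicate):
    """Mimics the reported behaviour: str() of the first matching object."""
    for s, p, o in triples:
        if s == subject and p == predicate:
            return str(o)
    return ""


dataset = "https://example.org/dataset/1"  # hypothetical subject URI
first = naive_object_value(parse_feed(dataset), dataset, DCT_PROVENANCE)
second = naive_object_value(parse_feed(dataset), dataset, DCT_PROVENANCE)
print(first)            # an opaque id of the form 'N' + 32 hex digits
print(first == second)  # False: every parse mints a new blank node id
```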

I'm not sure of the best course of action here; I see a few options:

  • Pull out all of the properties whose values are themselves typed nodes (e.g. provenance is a dct:ProvenanceStatement) and handle them one at a time.
  • Have a generic RDF.type == SKOS.Concept handler. In some cases it will need to extract the concept's id, and in others a prefLabel in the appropriate language: sometimes the vocabulary is enforced by the EU, and sometimes it's site-defined (e.g. theme is probably going to be site-dependent, while HVD Category is EU-wide).
  • Have a generic "best string we can get" function and keep adding fallbacks to it.

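The options above are not mutually exclusive. A minimal sketch combining them, assuming the graph is a plain list of triples and using stand-in classes for rdflib's `URIRef`, `BNode` and `Literal` (the handler registry, `best_string` and the other helpers are hypothetical, not part of ckanext-dcat). Handlers keyed on `rdf:type` cover the first two options, and an ordered list of generic label predicates acts as the third option's fallback. Reading the provenance text from `rdfs:label` is an assumption.

```python
class URIRef(str):
    """Minimal stand-in for rdflib.URIRef."""


class BNode(str):
    """Minimal stand-in for rdflib.BNode."""


class Literal(str):
    """Minimal stand-in for rdflib.Literal with an optional language tag."""

    def __new__(cls, value, lang=None):
        obj = super().__new__(cls, value)
        obj.language = lang
        return obj


RDF_TYPE = URIRef("http://www.w3.org/1999/02/22-rdf-syntax-ns#type")
RDFS_LABEL = URIRef("http://www.w3.org/2000/01/rdf-schema#label")
SKOS_CONCEPT = URIRef("http://www.w3.org/2004/02/skos/core#Concept")
SKOS_PREF_LABEL = URIRef("http://www.w3.org/2004/02/skos/core#prefLabel")
DCT_PROVENANCE_STATEMENT = URIRef("http://purl.org/dc/terms/ProvenanceStatement")


def objects(triples, subject, predicate):
    """All objects of (subject, predicate, ?) in a plain list of triples."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def pick_label(triples, node, predicate, lang):
    """Literal in `lang` if present, else an untagged one, else any literal."""
    literals = [o for o in objects(triples, node, predicate) if isinstance(o, Literal)]
    for wanted in (lang, None):
        for lit in literals:
            if lit.language == wanted:
                return str(lit)
    return str(literals[0]) if literals else None


def concept_value(triples, node, lang):
    # Assumption: concepts from a controlled vocabulary (e.g. EU-wide HVD
    # categories) are URIs worth keeping; site-defined ones may be BNodes.
    if isinstance(node, URIRef):
        return str(node)
    return pick_label(triples, node, SKOS_PREF_LABEL, lang)


def provenance_value(triples, node, lang):
    # Assumption: the provenance text lives in rdfs:label.
    return pick_label(triples, node, RDFS_LABEL, lang)


# Options 1 and 2: hypothetical handlers keyed on rdf:type.
TYPE_HANDLERS = {SKOS_CONCEPT: concept_value, DCT_PROVENANCE_STATEMENT: provenance_value}
# Option 3: generic label predicates tried, in order, when no handler succeeds.
FALLBACK_LABELS = [SKOS_PREF_LABEL, RDFS_LABEL]


def best_string(triples, node, lang="en"):
    """Best human-readable string for `node`; None instead of a BNode id."""
    if isinstance(node, Literal):
        return str(node)
    for rdf_type in objects(triples, node, RDF_TYPE):
        handler = TYPE_HANDLERS.get(rdf_type)
        value = handler(triples, node, lang) if handler else None
        if value:
            return value
    if isinstance(node, URIRef):
        return str(node)
    for predicate in FALLBACK_LABELS:
        value = pick_label(triples, node, predicate, lang)
        if value:
            return value
    return None


theme = BNode("b0")
triples = [
    (theme, RDF_TYPE, SKOS_CONCEPT),
    (theme, SKOS_PREF_LABEL, Literal("Géospatial", lang="fr")),
    (theme, SKOS_PREF_LABEL, Literal("Geospatial", lang="en")),
]
print(best_string(triples, theme))  # Geospatial
```

Returning `None` rather than the node id lets the caller skip the field instead of storing an ephemeral identifier; new fallbacks are added by extending `TYPE_HANDLERS` or `FALLBACK_LABELS`.
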
@amercader
Member

I think these cases should be handled in the parser methods, and I agree that it's useless to store the BNode value.

Perhaps, if the object in _object_value (or one of the items in _object_value_list) is a BNode, we inspect that node and extract whatever makes most sense: a Literal if there is one, or the value of skos:prefLabel if it's a node of type skos:Concept. That should hopefully cover the theme case.
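A minimal sketch of that suggestion, assuming a tiny stand-in `Graph` that exposes only `objects(subject, predicate)` and `value(subject, predicate)`, two calls rdflib's `Graph` also accepts positionally. `resolve_bnode`, `object_value` and `object_value_list` are hypothetical helpers, not the actual ckanext-dcat methods, and reading "a Literal if there is one" as an `rdf:value` literal on the blank node is an assumption.

```python
class BNode(str):
    """Minimal stand-in for rdflib.BNode."""


class Literal(str):
    """Minimal stand-in for rdflib.Literal with an optional language tag."""

    def __new__(cls, value, lang=None):
        obj = super().__new__(cls, value)
        obj.language = lang
        return obj


RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDF_VALUE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#value"
SKOS_CONCEPT = "http://www.w3.org/2004/02/skos/core#Concept"
SKOS_PREF_LABEL = "http://www.w3.org/2004/02/skos/core#prefLabel"
DCAT_THEME = "http://www.w3.org/ns/dcat#theme"
DCT_PROVENANCE = "http://purl.org/dc/terms/provenance"
DCT_PROVENANCE_STATEMENT = "http://purl.org/dc/terms/ProvenanceStatement"


class Graph:
    """Tiny stand-in exposing the two rdflib.Graph methods used below."""

    def __init__(self, triples):
        self.triples = list(triples)

    def objects(self, subject, predicate):
        return (o for s, p, o in self.triples if s == subject and p == predicate)

    def value(self, subject, predicate):
        return next(self.objects(subject, predicate), None)


def resolve_bnode(graph, node, lang="en"):
    """Best string for a blank node, or None (never the node id)."""
    literal = graph.value(node, RDF_VALUE)  # assumed reading of "a Literal"
    if isinstance(literal, Literal):
        return str(literal)
    if SKOS_CONCEPT in set(graph.objects(node, RDF_TYPE)):
        labels = [o for o in graph.objects(node, SKOS_PREF_LABEL) if isinstance(o, Literal)]
        preferred = [lbl for lbl in labels if lbl.language == lang] or labels
        if preferred:
            return str(preferred[0])
    return None


def object_value(graph, subject, predicate, lang="en"):
    """_object_value-like helper that resolves blank nodes instead of str()-ing them."""
    obj = graph.value(subject, predicate)
    if obj is None:
        return ""
    if isinstance(obj, BNode):
        return resolve_bnode(graph, obj, lang) or ""
    return str(obj)


def object_value_list(graph, subject, predicate, lang="en"):
    """_object_value_list-like helper; unresolvable blank nodes are dropped."""
    values = []
    for obj in graph.objects(subject, predicate):
        value = resolve_bnode(graph, obj, lang) if isinstance(obj, BNode) else str(obj)
        if value:
            values.append(value)
    return values


dataset = "https://example.org/dataset/1"  # hypothetical subject URI
theme, provenance = BNode("b1"), BNode("b2")
graph = Graph([
    (dataset, DCAT_THEME, theme),
    (theme, RDF_TYPE, SKOS_CONCEPT),
    (theme, SKOS_PREF_LABEL, Literal("Geospatial")),
    (dataset, DCT_PROVENANCE, provenance),
    (provenance, RDF_TYPE, DCT_PROVENANCE_STATEMENT),
])
print(object_value_list(graph, dataset, DCAT_THEME))       # ['Geospatial']
print(repr(object_value(graph, dataset, DCT_PROVENANCE)))  # '' rather than the BNode id
```

Dropping unresolvable blank nodes, rather than keeping their ids, means a harvest never stores a value that changes from one parse to the next.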

BTW, this particular serialization of provenance is not valid JSON-LD: @label is not a JSON-LD keyword. I'm by no means a JSON-LD expert, but I think @value should be used instead (otherwise rdflib cannot extract anything from that node):

			"dct:provenance": {
				"@type": "dct:ProvenanceStatement",
				"@value": "Something actually useful"
			},
