ACDH-CH Linked Open Date Entities (LODE)
ACDH-CH Linked Open Date Entities (LODE) is a data service that provides information about (historical) dates using Linked Open Data (LOD) techniques. Currently, the entities provide lookups for time concepts at the precision of day, month, year, decade, century, and millennium. For practical reasons, the first implementation covers 6000 years, from 3000 BC/BCE to AD/CE 3000, based on ISO8601. The LODE specifications and policies are described below. More details on the project can be found in the paper: Go Sugimoto (2021) Building Linked Open Date Entities for Historical Research. In: Garoufallou E., Ovalle-Perandones MA. (eds) Metadata and Semantic Research. MTSR 2020. Communications in Computer and Information Science, vol 1355. Springer, Cham. https://doi.org/10.1007/978-3-030-71903-6_30
##### Objectives #####
LODE aims to provide an open and shared standard with a clear URI syntax and data modelling policy, in order to serve as guidance or a reference point for Linked Open Data (LOD) implementers. It takes a pragmatic approach to meeting the basic needs of users by providing a simple and explicit reference for absolute time entities. It is designed to foster (Named) Entity Linking (NEL), not only so that cultural heritage resources can be contextualised and integrated, but also so that any resources holding temporal information can be connected to each other.
We started LODE because there are currently very few LOD resources that provide the entities we cover. The idea is to allow users to link heterogeneous LOD on the web through time. If they include LODE URIs in their local datasets, we can traverse LOD through the standardised LODE time entities. As a result, distributed datasets can be integrated and transformed into a giant Knowledge Graph that contains a vast amount of data to provide useful services.
##### ACDH-CH Unit of Time Entities #####
A part of LODE is the ACDH-CH Unit of Time Entities (https://vocabs.acdh.oeaw.ac.at/ut_entities/), which specifies the time unit concepts used for the time instances in LODE. For example, the concept of the 2nd millennium (Date Entity) is an instance of the concept of millennium (Unit of Time Entity).
##### URI syntax #####
The URI syntax consists of:
base URI: https://vocabs.acdh.oeaw.ac.at/date/
temporal system scheme: gregorian_calendar
date entity: 1985-10-12
By default (i.e. without specifying a temporal system scheme), ISO8601:2019-based syntax is used for date representation (see details below), such as YYYY-MM-DD, YYYY-MM, and YYYY, because ISO8601 is used by much software and many programming languages (SQL, PHP, Python) and web schemas (XMLSchema 1.1). The most common syntax would look like https://vocabs.acdh.oeaw.ac.at/date/1985-10-12, https://vocabs.acdh.oeaw.ac.at/date/1985-10, and https://vocabs.acdh.oeaw.ac.at/date/1985.
See the details on ISO8601:2019 convention below. To simplify and standardise the syntax, https://vocabs.acdh.oeaw.ac.at/iso8601/1985-10-12 will be redirected to https://vocabs.acdh.oeaw.ac.at/date/1985-10-12, because our syntax is a subset of the ISO8601 syntax.
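As an illustration of the default syntax, the following minimal sketch builds the day, month, and year URIs for a date. The function name `lode_uris` is hypothetical and not part of LODE; Python's `datetime.date` only supports years 1 to 9999, so BC dates are out of scope for this sketch.

```python
from datetime import date

BASE_URI = "https://vocabs.acdh.oeaw.ac.at/date/"


def lode_uris(d: date) -> dict:
    """Return day, month, and year URIs in the default (ISO8601-based) syntax.

    Hypothetical helper for illustration; handles AD years 1-9999 only.
    """
    year = f"{d.year:04d}"    # zero-padded to 4 digits, e.g. 0192
    month = f"{d.month:02d}"
    day = f"{d.day:02d}"
    return {
        "day": f"{BASE_URI}{year}-{month}-{day}",
        "month": f"{BASE_URI}{year}-{month}",
        "year": f"{BASE_URI}{year}",
    }


uris = lode_uris(date(1985, 10, 12))
print(uris["day"])    # https://vocabs.acdh.oeaw.ac.at/date/1985-10-12
print(uris["month"])  # https://vocabs.acdh.oeaw.ac.at/date/1985-10
print(uris["year"])   # https://vocabs.acdh.oeaw.ac.at/date/1985
```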
##### ISO8601:2019 convention #####
It is possible to represent all temporal entities used (day, month, year, decade, century, and millennium) with the ISO8601:2019 standard.
YYYY/YYYY represents a millennium (/ in between represents duration):
https://vocabs.acdh.oeaw.ac.at/date/1001/2000 (2nd Millennium AD)
https://vocabs.acdh.oeaw.ac.at/date/-0999/0000 (1st Millennium BC)
Exactly 2 digits (YY) represents a century:
https://vocabs.acdh.oeaw.ac.at/date/19 (19th century)
https://vocabs.acdh.oeaw.ac.at/date/-03 (3rd century BC)
(A century can also be encoded as a period/duration in ISO8601:2019. For example, https://vocabs.acdh.oeaw.ac.at/date/1901/2000 represents the 20th century. However, lookups for this form are not provided for the time being, to avoid the confusion of displaying two entities in the hierarchical view.)
Exactly 3 digits (YYY) represents a decade:
https://vocabs.acdh.oeaw.ac.at/date/-000 (0s BC)
https://vocabs.acdh.oeaw.ac.at/date/-001 (10s BC)
https://vocabs.acdh.oeaw.ac.at/date/-019 (190s BC)
https://vocabs.acdh.oeaw.ac.at/date/-192 (1920s BC)
Exactly 4 digits (YYYY) represents a year:
https://vocabs.acdh.oeaw.ac.at/date/1922 (AD 1922)
https://vocabs.acdh.oeaw.ac.at/date/0192 (AD 192)
https://vocabs.acdh.oeaw.ac.at/date/0019 (AD 19)
https://vocabs.acdh.oeaw.ac.at/date/0001 (AD 1)
https://vocabs.acdh.oeaw.ac.at/date/-0000 (1 BC)
https://vocabs.acdh.oeaw.ac.at/date/-0019 (20 BC)
https://vocabs.acdh.oeaw.ac.at/date/-0192 (193 BC)
https://vocabs.acdh.oeaw.ac.at/date/-1922 (1923 BC)
Year followed by "-" and exactly 2 digits (typically YYYY-MM) represents a month:
https://vocabs.acdh.oeaw.ac.at/date/12000-03 (March 12000)
https://vocabs.acdh.oeaw.ac.at/date/0001-03 (March 1)
https://vocabs.acdh.oeaw.ac.at/date/-0000-04 (April 1 BC)
https://vocabs.acdh.oeaw.ac.at/date/-12000-03 (March 12001 BC)
Year followed by "-", exactly 2 digits, "-", and exactly 2 digits (typically YYYY-MM-DD) represents a day:
https://vocabs.acdh.oeaw.ac.at/date/12000-03-12 (12 March 12000)
https://vocabs.acdh.oeaw.ac.at/date/0001-03-01 (1 March 1)
https://vocabs.acdh.oeaw.ac.at/date/-0000-04-02 (2 April 1 BC)
https://vocabs.acdh.oeaw.ac.at/date/-12000-03-11 (11 March 12001 BC)
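The digit-count rules above can be sketched as a small classifier. This is an illustration under stated assumptions, not the service's implementation: the name `classify` and the regular expressions are hypothetical, only the shape of the string is checked (a month such as 13 is not rejected), and a YYYY/YYYY interval is labelled by its length in years.

```python
import re

# An interval of two 4-digit years, each optionally signed.
INTERVAL = re.compile(r"(-?\d{4})/(-?\d{4})")

# Patterns for the remaining units; fullmatch() is used, so each pattern
# must cover the whole string and the order of the list does not matter.
SIMPLE = [
    ("century", re.compile(r"-?\d{2}")),
    ("decade", re.compile(r"-?\d{3}")),
    ("year", re.compile(r"-?\d{4,}")),
    ("month", re.compile(r"-?\d{4,}-\d{2}")),
    ("day", re.compile(r"-?\d{4,}-\d{2}-\d{2}")),
]


def classify(entity: str):
    """Return the time unit suggested by the syntax of a date entity, or None."""
    m = INTERVAL.fullmatch(entity)
    if m:
        # Inclusive length of the interval in years.
        span = int(m.group(2)) - int(m.group(1)) + 1
        return {1000: "millennium", 100: "century"}.get(span)
    for unit, pattern in SIMPLE:
        if pattern.fullmatch(entity):
            return unit
    return None


for entity in ["1001/2000", "19", "-019", "0192", "-12000-03", "-0000-04-02"]:
    print(entity, classify(entity))
```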
For dates before the adoption of the Gregorian calendar, ISO8601:2019 applies the proleptic Gregorian calendar. It is the extension of the Gregorian calendar backward to dates before AD 1582, including time Before Christ.
While Year Zero does not exist in the Julian and Gregorian calendars, ISO8601 uses 0000 for 1 BC, and "-" (minus) for Before Christ (-0001 is 2 BC).
00 (0th century) does not exist.
Hypothetically, centuries with more than 2 digits exist (e.g. 100 may mean the 100th century) and cannot be distinguished from expressions of decades (e.g. 100 is the 1000s). However, they are out of the scope/range of our entities.
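The Year Zero offset described above amounts to a one-line rule: a year string whose numeric value n is 1 or greater is AD n, and any other value is (1 - n) BC. A minimal sketch follows; the function name `year_label` is hypothetical.

```python
def year_label(iso_year: str) -> str:
    """Convert an ISO8601-style year string to an AD/BC label.

    Hypothetical helper: '0000' and '-0000' both evaluate to 0, i.e. 1 BC.
    """
    n = int(iso_year)  # int() accepts a sign and leading zeros
    if n >= 1:
        return f"AD {n}"
    return f"{1 - n} BC"


for value in ["1922", "0001", "0000", "-0001", "-0019", "-1922", "-250000"]:
    print(value, "->", year_label(value))
```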
There are two major reasons why we have adopted the ISO8601 syntax:
a) Non-opaque URI
We maximise the inference capability of human users. Opaque URIs such as the ones in Wikidata (https://www.wikidata.org/wiki/Q2436 means 1982) are extremely hard to infer or guess. In particular, this strategy helps users to enrich data easily (see Data Enrichment below) and to access the entity lookups without much effort.
b) Language independent
LODE aims to serve as a language neutral resource. The numeric date entities in DBpedia contain textual strings in their URIs, which LODE would like to avoid. For example, the 14th century can be found at http://de.dbpedia.org/resource/14._Jahrhundert (in German) and http://nl.dbpedia.org/resource/14e_eeuw (in Dutch).
##### Extending LODE (minting URIs beyond 6000 years) #####
Strings of more than 4 digits are reserved for years, for instance https://vocabs.acdh.oeaw.ac.at/date/12000 (AD 12000) and https://vocabs.acdh.oeaw.ac.at/date/-250000 (250001 BC). However, LODE only covers 3000 BC/BCE to AD/CE 3000 in the first instance. This does not prevent data creators from minting such URIs, so data can still be interlinked as long as the URIs are identical (see above on the motivation for human-inferable URIs).
Extending LODE poses no problem, although we do not offer a lookup service for such extensions due to our technical constraints. For example, if users would like to encode hours, minutes, and seconds for linking and data integration purposes, they can mint new LODE URIs using the ISO syntax YYYY-MM-DDThh:mm:ss. For example, the date of the event "Crew Dragon Resilience docked to ISS at 04:01 UTC on November 17th 2020" can be encoded as https://vocabs.acdh.oeaw.ac.at/date/2020-11-17T04:01Z (which is equivalent to 05:01 CET: https://vocabs.acdh.oeaw.ac.at/date/2020-11-17T05:01+01:00).
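Because interlinking relies on URIs being identical, implementers who mint time-of-day URIs may want to normalise the time zone first. The sketch below converts any timezone-aware timestamp to UTC before building the URI. This normalisation is an assumption made for illustration, not a LODE policy, and `timestamp_uri` is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone

BASE_URI = "https://vocabs.acdh.oeaw.ac.at/date/"


def timestamp_uri(moment: datetime) -> str:
    """Mint a minute-precision URI, normalised to UTC with a 'Z' suffix."""
    if moment.tzinfo is None:
        raise ValueError("a timezone-aware datetime is required")
    utc = moment.astimezone(timezone.utc)
    return BASE_URI + utc.strftime("%Y-%m-%dT%H:%MZ")


# The same instant, expressed in UTC and in CET (UTC+01:00).
docking_utc = datetime(2020, 11, 17, 4, 1, tzinfo=timezone.utc)
docking_cet = datetime(2020, 11, 17, 5, 1, tzinfo=timezone(timedelta(hours=1)))

print(timestamp_uri(docking_utc))
print(timestamp_uri(docking_cet))  # same URI as the line above
```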
##### LODE Data Model #####
The basic structure of the two Entity Models is based on the Wikidata ontology. However, SKOS is primarily adopted 1) to capitalise on its simple, loose semantics, and 2) to avoid debates on a formal ontology of time concepts. We simply replace their isPart properties with skos:broader and skos:narrower (as well as dc:isPartOf/dc:hasPart and time:intervalContains/time:intervalDuring). In addition, we use Time Ontology in OWL to model more precise entity relations. acdhdate (date entities) and acdhut (unit of time entities) are introduced to add semantics for LODE whenever needed, but we keep them to a minimum. A non-exhaustive list of the classes and properties primarily used for LODE entities is:
During the enrichment with the DBpedia and Wikidata datasets, their original properties (dbo, wdt, wdtn, foaf, etc.) are currently fully preserved instead of being converted to more standardised ones (W3C recommended properties).
Please note that, although we try to make LODE as stable as possible, we are aware that the modelling is neither perfect nor complete. Rather we aim to improve the data modelling over time. Please contact us if you have suggestions.
##### LODE data enrichment in the host/our implementation #####
We continuously enrich LODE by adding more information. This includes labels in different languages, the corresponding date in the Julian calendar, additional information such as the upper/lower and following/previous time concepts, the day of the week, and links to other LOD including DBpedia, YAGO, Wikidata, JapanSearch and Semium.org (used by Europeana). More information, such as the date in the Islamic, Mayan, Japanese, and Chinese calendars and festivities (e.g. Easter), could be added in the future.
##### Known issues (data inconsistency for the data model) #####
There are a few known issues with the use of decades and centuries that confuse our users. There are two major perceptions of the concepts of century and decade. As for centuries, in the strict construction of the Gregorian calendar, a century runs from a year ending in 1 to a year ending in 0 (e.g. the 20th century began with 1901 and ended with 2000). In popular usage, it runs from a year ending in 0 to a year ending in 9 (e.g. the 20th century began with 1900 and ended with 1999). The latter implies that the 1st century (both BC and AD) consists of only 99 years. A similar construct occurs for decades. In popular usage, a decade runs from a year ending in 0 to a year ending in 9 (the 1960s began with 1960 and ended with 1969), while in a rarer version a decade runs from a year ending in 1 to a year ending in 0.
Our model follows the construction of Wikidata. It uses the strict version of centuries and the popular version of decades. As a result, there are unfortunately logical inconsistencies: 1) two specific decades (i.e. the 0s and the 0s BC) consist of only 9 years, due to the lack of Year Zero. 2) skos:narrower and skos:broader are not entirely correct at every century boundary. For example, the 11th century consists of the years between 1001 and 1100. The 1000s is a narrower concept of the 11th century, but it consists of the years between 1000 and 1009, so it includes the year 1000, which belongs to the 10th century. Whereas Wikidata uses a formal ontology, our model is based on loose SKOS, which might reduce the impact of the data inconsistency. In addition, mathematical calculation with the XMLSchema data types can be executed on literals, regardless of the inconsistencies in our entities.
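The boundary mismatch can be reproduced with two small functions, one for the strict century and one for the popular decade. This is a sketch for AD years only, and the function names are hypothetical.

```python
def strict_century(year: int) -> int:
    """Ordinal century of an AD year under the strict rule (1001-1100 -> 11)."""
    return (year - 1) // 100 + 1


def popular_decade(year: int) -> int:
    """First year of the decade under the popular rule (1000-1009 -> 1000)."""
    return year // 10 * 10


# The 1000s decade covers the years 1000 to 1009 ...
decade_years = [y for y in range(990, 1020) if popular_decade(y) == 1000]
# ... but those years fall into two different strict centuries.
centuries = sorted({strict_century(y) for y in decade_years})
print(decade_years[0], decade_years[-1], centuries)

# Without a Year Zero, the 0s decade (AD) only contains the years 1 to 9.
print(len([y for y in range(1, 20) if popular_decade(y) == 0]))
```

The first line of output shows that the 1000s straddle the 10th and 11th centuries, which is why a single skos:broader link from the decade to one century cannot be fully correct.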
Hypothetically speaking, some concepts would not have existed at the time of a date entity. For example, there was probably no concept of "22 September" for the people who lived in 1000 BC, because it had not yet been invented. However, this does not mean we are unable to refer to the equivalent time of 22 September (or roughly 22 September) in 1000 BC, if we would like to. This may happen, for instance, if an archaeologist estimates the time by the autumnal (fall) equinox. The same goes for 1581-01-01 in the proleptic Gregorian calendar, which only exists conceptually. Therefore, Date Entity, by default, deliberately includes such concepts of time as reference points for the purpose of data consistency. It follows the construct of ISO8601, which, by adopting the proleptic Gregorian calendar, can indicate time concepts that did not exist historically. It would be more confusing to change the URI syntax described above conditionally (i.e. only before a certain date in BC), for example to indicate the 265th day of a year. Please be aware that it is always possible to extend the Date Entity Model by adding a new scheme (see below) and linking to the dates in other schemes, in order to accommodate imperfect data (missing concepts etc.) and problematic realisations of the data model.
There is uncertainty about the use of "0000". Historically, ISO8601 seems to have switched it on and off. It is not allowed in XML Schema Part 2: Datatypes Second Edition. However, the schema specifications have a note that ISO8601 is likely to include it, and thus it will be allowed in a subsequent version. For the time being, the Date Entity Model includes it as the representation of 1 BC in the sense of the proleptic Gregorian calendar. If changes are needed in the future, they will be announced in this documentation accordingly.
In this context, LODE attempts to play a role in standardising the use of temporal information. LODE could serve as an open and shared reference point which tries to be as consistent as possible, without forcing different implementers to adopt the same approach. By simply creating links to LODE in datasets, it would be possible to harmonise heterogeneous implementations of time encoding, for example, even if 0000 and BC are used differently in different systems.
As LODE fetches data from Wikidata, DBpedia, and YAGO for data enrichment, there could be data quality problems which we cannot solve by ourselves. In particular, be aware that the data for the following properties may contain problems:
For issues regarding the ACDH-CH Unit of Time Entities, please consult its website.
##### Our solution for known issues #####
In general, the known issues are standardisation problems, and they are therefore hard to solve, because they are often out of our control. We try to minimise their impact by making the data modelling explicit and transparent. In particular, this webpage (https://vocabs.acdh.oeaw.ac.at/date_entities/) and the website for Unit of Time Entities (https://vocabs.acdh.oeaw.ac.at/ut_entities/) serve as documentation to explain our modelling principles. In this way, users can understand the issues and develop and adjust their applications to meet their needs.
Bug fixing is a work in progress.
##### URI syntax for other temporal system schemes #####
It is possible to add date entities in other temporal system schemes in the future. For example, we could use the following suffixes for different calendars and dating systems:
gregorian_calendar/0001-12-02 (recommended syntax), or gregorian_calendar/AD0001-12-02
julian_calendar/-0001-12-02 (recommended syntax), or julian_calendar/BC0001-12-02
islamic_calendar/1441-05-25, or islamic_calendar/25_Jumada_I_1441
The suffixes are merely suggestions, to avoid strict specifications. Nevertheless, the Gregorian and the Julian calendars can follow the YYYY-MM-DD convention from ISO8601 for simplicity and convenience. Still, 0000 could be avoided in order not to be confused with ISO8601 (i.e. -1 means 1 BC). If required, BC (Before Christ) and AD (Anno Domini) could be expressed in the syntax before or after the digits (e.g. 189 BC or BC 189). BCE (Before Common Era) and CE (Common Era) could be used instead of BC and AD respectively. Leading zeros could be omitted if a year has fewer than 4 digits (e.g. 645 rather than 0645), but the URIs may become unintuitive and confusing in comparison with LODE (ISO8601). Therefore, it is highly recommended to comply with the YYYY-MM-DD format as much as possible, even if one of the Christian calendar systems is declared. In case a unit name is applied, as in the Japanese calendar (e.g. Heisei) and the Islamic calendar (e.g. Jumada), it can be separated or concatenated, depending on practicality. IRIs could be considered (e.g. 平成/31 rather than Heisei/31). BP (Before Present/Physics) could be used for scientific dating. Please contact us and help with the standardisation by sharing your suggestions and best practices.
##### Client user implementation: Nodification #####
Linked Open Data implementers are encouraged to create or enrich their data by including the Unit of Time and the Date Entities (i.e. URIs), in order to create connections to other LOD resources. Such data enrichment can often be done through "nodification", i.e. by transforming existing text literals into nodes. For instance, "11-10-2020" can generate a new node "https://vocabs.acdh.oeaw.ac.at/date/2020-10-11". In this case, it is important to preserve the original literal, so that one can still use XML-Schema based calculations. It may be a good idea to additionally generate upper level dates: "https://vocabs.acdh.oeaw.ac.at/date/2020-10" and "https://vocabs.acdh.oeaw.ac.at/date/2020".
Major LOD resource providers (Wikidata, DBpedia, etc.) are kindly encouraged to include links to our entities in their datasets. It is also excellent if they can create more time entities and mint their own URIs. In that case, it is recommended to create links to our entities, so that users can access LOD resources that use our entities. We believe that the more users include the LODE URIs in their Linked Open Data datasets, the more they can enjoy LOD.
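A minimal sketch of nodification for the example above follows. It assumes that the literal uses a day-month-year layout (as the mapping of "11-10-2020" to 2020-10-11 suggests) and keeps the original literal alongside the generated URIs. The function name `nodify` and the dictionary keys are hypothetical.

```python
from datetime import datetime

BASE_URI = "https://vocabs.acdh.oeaw.ac.at/date/"


def nodify(literal: str, fmt: str = "%d-%m-%Y") -> dict:
    """Turn a date literal into day, month, and year URIs.

    The original literal is preserved so that it remains available
    for XML-Schema based calculations.
    """
    parsed = datetime.strptime(literal, fmt).date()
    iso_day = parsed.isoformat()  # zero-padded YYYY-MM-DD
    return {
        "literal": literal,
        "day": BASE_URI + iso_day,
        "month": BASE_URI + iso_day[:7],  # YYYY-MM
        "year": BASE_URI + iso_day[:4],   # YYYY
    }


for key, value in nodify("11-10-2020").items():
    print(key, value)
```

In a real dataset the returned URIs would be attached to the resource as additional links, while the literal stays in place with its original property.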
##### Contact #####
LODE is created with agile development by design. As it is a living service, we prefer to develop it together with its users. Any feedback (bugs, suggestions, support, collaborations) is welcome to improve it: firstname.lastname@example.org Thank you!
Monday, February 3, 2020 00:00:00
Monday, October 12, 2020 00:00:00
Wednesday, May 5, 2021 00:00:00