Linked CSV

Many open data sets are essentially tables, or sets of tables, which follow the same regular structure. This document describes a set of conventions for CSV files that enable them to be linked together and to be interpreted as RDF.

Structure

The structure of a CSV file is a header followed by a number of records. The header is the first line of the file, while the remaining lines are the records. Both the header and the records contain fields separated by commas. These terms are used as defined in [[RFC4180]]. Within this document, a column is a set of fields which are at the same index within their respective rows and the column name is the value of the field in the header for that column. For example, the following is a valid CSV file which lists country codes and names:

country,name
AD,Andorra
AF,Afghanistan
AI,Anguilla
AL,Albania

All valid CSV files are valid linked CSV files, so the above example is also a valid linked CSV file. It has four records and two columns, whose names are country and name.

Valid CSV files MUST use CRLF to indicate the ends of lines (and thus the separation of rows). Linked CSV parsers SHOULD provide a warning if CR or LF is used for line endings, and SHOULD recover by parsing the CSV file with those line endings.

Spreadsheet programs such as Excel or OpenOffice Calc typically use the line ending of the platform on which they are deployed (e.g. simply LF on Mac OS X). Allowing other line endings for linked CSV is intended to make it easier to create such documents within spreadsheet programs.
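The recovery behaviour described above could be sketched as follows. This is a minimal illustration, not a normative parser: the function name read_linked_csv is hypothetical, and the line-ending check is deliberately naive (it would also flag a bare LF inside a quoted field).

```python
import csv
import io
import re
import warnings


def read_linked_csv(text):
    """Split CSV text into a header and a list of records.

    Warns when a bare CR or LF appears in the text, then recovers by
    parsing the file with whatever line endings it uses.
    """
    # A bare LF (not preceded by CR) or a bare CR (not followed by LF).
    if re.search(r"(?<!\r)\n|\r(?!\n)", text):
        warnings.warn("line endings other than CRLF found; recovering")
    # newline="" leaves line endings untranslated so that the csv module
    # can handle CR, LF and CRLF itself.
    rows = list(csv.reader(io.StringIO(text, newline="")))
    return rows[0], rows[1:]


# An LF-only file still parses, but a warning is issued.
header, records = read_linked_csv("country,name\nAD,Andorra\nAF,Afghanistan\n")
```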

The aim of processing a linked CSV file is to generate information about a set of entities. An entity may be represented internally by the application as an object or a resource. Each entity has a number of properties, which may have one or more values.

Records within a linked CSV file may be of two different types: prolog lines (see Prolog Lines) and data lines. Data lines can only come after the last prolog line, if there is one. A data line is a line that contains data about an entity. A single entity may be described across multiple data lines. For each data line describing an entity, each value within the line corresponds to a value of a property of that entity (the property being labelled through the corresponding header).

The JSON version of the country codes file shown above is:

[{
  "country": "AD",
  "name": "Andorra"
},{
  "country": "AF",
  "name": "Afghanistan"
},{
  "country": "AI",
  "name": "Anguilla"
},{
  "country": "AL",
  "name": "Albania"
}]
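For a file with no prolog lines and no $id column, the mapping shown above amounts to one object per record, keyed by column name. A minimal sketch, assuming Python's standard csv module and a hypothetical function name csv_to_simple_json; it does no type inference and ignores the rest of the linked CSV conventions:

```python
import csv
import io
import json


def csv_to_simple_json(text):
    """Turn each record into an object keyed by column name."""
    reader = csv.DictReader(io.StringIO(text, newline=""))
    return [dict(row) for row in reader]


example = "country,name\r\nAD,Andorra\r\nAF,Afghanistan\r\n"
print(json.dumps(csv_to_simple_json(example), indent=2))
```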

Linked CSV files must be encoded as UTF-8.

It isn't usually easy to set the encoding of a CSV file when exporting from normal spreadsheet programs. It would be nice if there were a way of detecting the encoding. Perhaps it could be sniffed based on the initial characters #, in the file (with UTF-8 assumed if those aren't the initial characters)?

Identifiers

Linked CSV is built around the concept of using URIs to name things. Every record, column, and even slices of data, in a linked CSV file is addressable using URI Identifiers for the text/csv Media Type. For example, if the linked CSV file is accessed at http://example.org/countries, the first record in the CSV file above, which happens to be the first data line within the linked CSV file (which describes Andorra) is addressable with the URI:

http://example.org/countries#row:0

However, this addressing merely identifies the records within the linked CSV file, not the entities that the record describes. This distinction is important for two reasons:

a single entity may be described by multiple records within the linked CSV file
addressing entities and records separately enables us to make statements about the source of the information within a particular record

By default, each data line describes an entity, each entity is described by a single data line, and there is no way to address the entities. However, adding a $id column enables entities to be given identifiers. These identifiers are always URIs, and they are interpreted relative to the location of the linked CSV file. The $id column may be positioned anywhere but by convention it should be the first column (unless there is a # column, in which case it should be the second). For example:

$id,country,name
#AD,AD,     Andorra
#AD,AD,     Principality of Andorra
#AF,AF,     Afghanistan
#AF,AF,     Islamic Republic of Afghanistan

For the purpose of clarity within this document, whitespace has been added to this and the remainder of the examples so that headers and values line up correctly. Whitespace within linked CSV files is normally significant.

The prefix $ is used because the prefix @ is interpreted as indicating a formula when entered into spreadsheet programs such as Excel.

This linked CSV file contains two entities, which have the identifiers http://example.org/countries#AD and http://example.org/countries#AF. The first is described by the first two data lines and the second by the next two. The JSON generated for this file would be:

[{
  "@id": "http://example.org/countries#AD",
  "country": "AD",
  "name": [ "Andorra", "Principality of Andorra" ]
},{
  "@id": "http://example.org/countries#AF",
  "country": "AF",
  "name": [ "Afghanistan", "Islamic Republic of Afghanistan" ]
}]

and the RDF would be:

@prefix rel: <http://www.iana.org/assignments/relation/> .
@prefix : <http://example.org/countries#> .

<http://example.org/countries#AD>
	rel:describedby
		<http://example.org/countries#row:0> ,
		<http://example.org/countries#row:1> ;
	:country "AD" ;
	:name "Andorra" , "Principality of Andorra" ;
	.

<http://example.org/countries#AF>
	rel:describedby
		<http://example.org/countries#row:2> ,
		<http://example.org/countries#row:3> ;
	:country "AF" ;
	:name "Afghanistan" , "Islamic Republic of Afghanistan" ;
	.

As shown by this example, when multiple data lines describe a single entity, a given property takes only the distinct values within the column for that entity rather than being duplicated. However, the file can be made shorter if it doesn't contain duplicates in the first place; the following CSV is equivalent:

$id,country,name
#AD,AD,     Andorra
#AD,,       Principality of Andorra
#AF,AF,     Afghanistan
#AF,,       Islamic Republic of Afghanistan
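The merging of data lines that share an identifier could be sketched as below. This is an illustration under stated assumptions rather than a definitive algorithm: the function name group_entities is hypothetical, fields are assumed to have already been stripped of the alignment whitespace used in this document, and a property with a single distinct value is assumed to be emitted as a scalar rather than a one-element list, as in the JSON example above.

```python
from urllib.parse import urljoin


def group_entities(header, data_lines, base):
    """Merge data lines that share a $id into one entity each."""
    id_index = header.index("$id")
    entities = {}  # dicts keep insertion order, so entities stay in file order
    for line in data_lines:
        # Identifiers are URIs resolved against the location of the file.
        entity_id = urljoin(base, line[id_index])
        entity = entities.setdefault(entity_id, {"@id": entity_id})
        for name, value in zip(header, line):
            if name == "$id" or value == "":
                continue  # empty fields contribute no value
            values = entity.setdefault(name, [])
            if value not in values:
                values.append(value)  # keep only distinct values
    # Assumption: one value becomes a scalar, several become a list.
    return [
        {k: (v if k == "@id" or len(v) > 1 else v[0]) for k, v in e.items()}
        for e in entities.values()
    ]


header = ["$id", "country", "name"]
lines = [
    ["#AD", "AD", "Andorra"],
    ["#AD", "", "Principality of Andorra"],
    ["#AF", "AF", "Afghanistan"],
    ["#AF", "AF", "Islamic Republic of Afghanistan"],
]
entities = group_entities(header, lines, "http://example.org/countries")
```

The third and fourth lines repeat the country code while the second omits it; both styles yield a single country value per entity.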

Interpreting Identifiers

By default, properties within the linked CSV file are assumed to apply to the thing described by the resource located by the URI identifier. For example, if the file contained identifier URIs that were Wikipedia pages, as in

$id,                                     country,name
http://en.wikipedia.org/wiki/Andorra,    AD,     Andorra
http://en.wikipedia.org/wiki/Andorra,    AD,     Principality of Andorra
http://en.wikipedia.org/wiki/Afghanistan,AF,     Afghanistan
http://en.wikipedia.org/wiki/Afghanistan,AF,     Islamic Republic of Afghanistan

applications should interpret the properties labelled country and name to apply to the countries described by those Wikipedia pages, not the Wikipedia pages themselves. In general this distinction does not matter, but it may do when using linked CSV to describe resources that are available on the web. Individual properties may be used differently, and apply to the content found at the referenced URI; how they are interpreted should be incorporated into the property documentation.

Prolog Lines

A linked CSV file can contain any number of prolog lines. Prolog lines describe additional processing of the linked CSV file, usually related to the file or some portion of the file, or related to some or all of the columns. Prolog lines can only be present if there is a column named #; any record that has a value in that column is a prolog line, and the value for that column indicates how the line should be interpreted:

type: This value indicates that the line provides information about the type of the values in each column
lang: This value indicates that the line provides information about the language of the values in each column
meta: This value indicates that the line provides metadata about the linked CSV file or rows within it
url: This value indicates that the line provides global URIs for the properties in each column
see: This value indicates that the line provides details of additional resources that may provide information about some or all of the entities whose identifiers are given within the column
empty: Having no value in the # column indicates that the line is a data line rather than a prolog line

Prolog lines must all be at the start of a linked CSV file. Any prolog lines that appear after the first data line must be ignored by processors. Prolog lines of different types can appear in any order.

Ignoring prolog lines that appear after the first data line aids streaming processing of linked CSV files, the hiding of prolog information within spreadsheet applications, and ease of reading for humans.
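The classification rules above could be sketched as follows. The function name split_prolog and the representation of records as lists of strings are assumptions made for illustration only.

```python
def split_prolog(header, records):
    """Separate prolog lines from data lines using the # column."""
    if "#" not in header:
        return [], list(records)  # no # column: every record is a data line
    marker = header.index("#")
    prolog, data = [], []
    for record in records:
        kind = record[marker]
        if kind == "":
            data.append(record)  # empty # field: a data line
        elif not data:
            prolog.append((kind, record))  # prolog line before any data
        # Otherwise this is a prolog line after the first data line,
        # which processors must ignore.
    return prolog, data


header = ["#", "country", "year"]
records = [
    ["type", "url", "time"],
    ["", "http://en.wikipedia.org/wiki/Afghanistan", "1960"],
    ["lang", "", ""],  # appears too late, so it is dropped
    ["", "http://en.wikipedia.org/wiki/Afghanistan", "1961"],
]
prolog, data = split_prolog(header, records)
```

Because the decision for each record depends only on whether a data line has already been seen, this works equally well on a stream of records.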

Could add other kinds of prolog lines. The thing to do is probably to have a separate registry of prolog line types that provide for configuration of the processing that should be done on the values in particular columns. For example, you could have prolog lines that enable you to specify a separator used within the values, to enable the creation of list values, or a date-syntax line that enables you to specify the date syntax used in the values in that particular column.

Property Types

In the simple CSV example we have been looking at, all the values are strings, which works fine for country codes and names. We will now introduce a separate file, http://example.org/af-population, which initially looks like:

country,year,population
AF,     1960,9616353
AF,     1961,9799379
AF,     1962,9989846
AF,     1963,10188299

In this example, the property year holds years and the property population holds an integer. To indicate the types of these properties, we can add a type prolog line. The value of a type prolog line indicates the type of the values in the column that it is in. The type must be one of:

string
url
integer
decimal
double
boolean (true or false)
time — values of this type can be any of the date/time syntaxes supported by XML Schema, namely gYear, gMonth, gDay, gYearMonth, gMonthDay, date, time, dateTime

If there is no type indication in the header for the column, the default type for a particular value depends on the syntax of the value, as follows:

values matching XML Schema date/time syntax (aside from xs:gYear) are assumed to be date/time values
values matching [0-9]+ are assumed to be integers
values matching [0-9]+\.[0-9]+ are assumed to be decimal numbers
values matching [0-9]+(\.[0-9]+)?[eE][-+][0-9]+(\.[0-9]+)? are assumed to be floating point numbers
the value true is assumed to be the boolean value true, and the value false the boolean value false
otherwise, the value is assumed to be a string
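The default typing rules listed above could be sketched as follows. The numeric patterns are copied from the list; the date/time patterns are simplified stand-ins for the XML Schema syntaxes (no time zones, no negative years) and are an assumption of this sketch, as is the function name default_type.

```python
import re

# Simplified date/time patterns; gYear is deliberately left out, so a
# value such as 1960 falls through to the integer rule.
TIME_PATTERNS = [
    r"[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}(\.[0-9]+)?",  # dateTime
    r"[0-9]{4}-[0-9]{2}-[0-9]{2}",                                        # date
    r"[0-9]{2}:[0-9]{2}:[0-9]{2}(\.[0-9]+)?",                             # time
    r"[0-9]{4}-[0-9]{2}",                                                 # gYearMonth
    r"--[0-9]{2}-[0-9]{2}",                                               # gMonthDay
    r"--[0-9]{2}",                                                        # gMonth
    r"---[0-9]{2}",                                                       # gDay
]


def default_type(value):
    """Guess the type of an unannotated value from its syntax."""
    if any(re.fullmatch(p, value) for p in TIME_PATTERNS):
        return "time"
    if re.fullmatch(r"[0-9]+", value):
        return "integer"
    if re.fullmatch(r"[0-9]+\.[0-9]+", value):
        return "decimal"
    if re.fullmatch(r"[0-9]+(\.[0-9]+)?[eE][-+][0-9]+(\.[0-9]+)?", value):
        return "double"
    if value in ("true", "false"):
        return "boolean"
    return "string"
```

Note that, with the pattern as given in the list, an exponent without an explicit sign (such as 1e10) is not recognised as a floating point number and is treated as a string.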

Could enable quoting of values using """...""" delimited values within the CSV?

In the example above, we can add a type prolog line to indicate the types of the properties that are created. We can also change the country column to use the Wikipedia URIs that we previously used for the countries, and indicate that this is being done by giving its type as url. Since the population figures are all syntactically integers, there is no need to annotate that column with a type, but such an annotation can be added for clarity:

#,   country,                                 year,population
type,url,                                     time,integer
,    http://en.wikipedia.org/wiki/Afghanistan,1960,9616353
,    http://en.wikipedia.org/wiki/Afghanistan,1961,9799379
,    http://en.wikipedia.org/wiki/Afghanistan,1962,9989846
,    http://en.wikipedia.org/wiki/Afghanistan,1963,10188299

Conversion to JSON cannot preserve all this information as it does not support date/time datatypes. The resulting data would include the years as integers:

[{
  "country": "http://en.wikipedia.org/wiki/Afghanistan",
  "year": 1960,
  "population": 9616353
}, {
  "country": "http://en.wikipedia.org/wiki/Afghanistan",
  "year": 1961,
  "population": 9799379
}, {
  "country": "http://en.wikipedia.org/wiki/Afghanistan",
  "year": 1962,
  "population": 9989846
}, {
  "country": "http://en.wikipedia.org/wiki/Afghanistan",
  "year": 1963,
  "population": 10188299
}]

The mapping to RDF can preserve the datatype information:

@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rel: <http://www.iana.org/assignments/relation/> .
@prefix : <http://example.org/af-population#> .

[ rel:describedby <http://example.org/af-population#row:1> ;
  :country <http://en.wikipedia.org/wiki/Afghanistan> ;
  :year "1960"^^xsd:gYear ;
  :population 9616353 ] .

[ rel:describedby <http://example.org/af-population#row:2> ;
  :country <http://en.wikipedia.org/wiki/Afghanistan> ;
  :year "1961"^^xsd:gYear ;
  :population 9799379 ] .

[ rel:describedby <http://example.org/af-population#row:3> ;
  :country <http://en.wikipedia.org/wiki/Afghanistan> ;
  :year "1962"^^xsd:gYear ;
  :population 9989846 ] .

[ rel:describedby <http://example.org/af-population#row:4> ;
  :country <http://en.wikipedia.org/wiki/Afghanistan> ;
  :year "1963"^^xsd:gYear ;
  :population 10188299 ] .

In generating the Turtle, the syntax of the values in the year column is used to determine what kind of date/time value each value should be mapped on to. Without the time annotation, the values would be mapped to integers.
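Choosing the XML Schema datatype from the syntax of a time value could be sketched as follows. The patterns are simplified (no time zones or negative years) and the function name turtle_time_literal is hypothetical; a real implementation would follow the full XML Schema lexical rules.

```python
import re

# Simplified lexical patterns for the supported XML Schema date/time types.
XSD_TIME_TYPES = [
    ("dateTime", r"[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}(\.[0-9]+)?"),
    ("date", r"[0-9]{4}-[0-9]{2}-[0-9]{2}"),
    ("time", r"[0-9]{2}:[0-9]{2}:[0-9]{2}(\.[0-9]+)?"),
    ("gYearMonth", r"[0-9]{4}-[0-9]{2}"),
    ("gYear", r"[0-9]{4}"),
    ("gMonthDay", r"--[0-9]{2}-[0-9]{2}"),
    ("gMonth", r"--[0-9]{2}"),
    ("gDay", r"---[0-9]{2}"),
]


def turtle_time_literal(value):
    """Return a typed Turtle literal for a value in a time column.

    Returns None when the value matches none of the patterns.
    """
    for name, pattern in XSD_TIME_TYPES:
        if re.fullmatch(pattern, value):
            return '"%s"^^xsd:%s' % (value, name)
    return None


year_literal = turtle_time_literal("1960")
```

Each pattern must match the whole value, so the order of the entries in the table does not affect the result.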

Languages

A lang prolog line indicates the language used within each column. For example, the file that contains the country details can also be expanded to include the names of the countries in other languages:

#,   $id,                                     country,english name,                   french name
lang,,                                        ,       en,                             fr
,    http://en.wikipedia.org/wiki/Andorra,    AD,     Andorra,                        Andorre
,    http://en.wikipedia.org/wiki/Andorra,    ,       Principality of Andorra,
,    http://en.wikipedia.org/wiki/Afghanistan,AF,     Afghanistan,                    Afghanistan
,    http://en.wikipedia.org/wiki/Afghanistan,,       Islamic Republic of Afghanistan,

In this case, the values of the english name column are labelled as being in English while those in the french name column are labelled as being in French. The JSON would look like:

[{
  "@id": "http://en.wikipedia.org/wiki/Andorra",
  "country": "AD",
  "english name": [{
    "value": "Andorra",
    "lang": "en"
  }, {
    "value": "Principality of Andorra",
    "lang": "en"
  }],
  "french name": {
    "value": "Andorre",
    "lang": "fr"
  }
},{
  "@id": "http://en.wikipedia.org/wiki/Afghanistan",
  "country": "AF",
  "english name": [{
    "value": "Afghanistan",
    "lang": "en"
  }, {
    "value": "Islamic Republic of Afghanistan",
    "lang": "en"
  }],
  "french name": {
    "value": "Afghanistan",
    "lang": "fr"
  }
}]

The Turtle would look like:

@prefix rel: <http://www.iana.org/assignments/relation/> .
@prefix : <http://example.org/countries#> .

<http://en.wikipedia.org/wiki/Andorra>
  rel:describedby 
    <http://example.org/countries#row:1>, 
    <http://example.org/countries#row:2> ;
  :country "AD" ;
  :english.name "Andorra"@en, "Principality of Andorra"@en ;
  :french.name "Andorre"@fr ;
  .

<http://en.wikipedia.org/wiki/Afghanistan>
  rel:describedby 
    <http://example.org/countries#row:3>, 
    <http://example.org/countries#row:4> ;
  :country "AF" ;
  :english.name "Afghanistan"@en , "Islamic Republic of Afghanistan"@en ;
  :french.name "Afghanistan"@fr ;
  .

Global Property Identifiers

When there are separate columns providing values in different languages for the same property, or when a large dataset is split across multiple files (as in the example here, where the set of population figures is split across country-specific files such as http://example.org/af-population), it is useful to be able to indicate when the separate labels in the CSV headers refer to the same property of a given entity.

To facilitate this, url prolog lines can indicate global identifiers for the properties. These lines contain URIs which are resolved relative to the location of the file itself. In the previous example, the two headers english name and french name both refer to the same name property. We can use a url line to indicate that these both refer to the same property:

#,   $id,                                     country,english name,                   french name
url, ,                                        ,       #name,                          #name
lang,,                                        ,       en,                             fr
,    http://en.wikipedia.org/wiki/Andorra,    AD,     Andorra,                        Andorre
,    http://en.wikipedia.org/wiki/Andorra,    ,       Principality of Andorra,
,    http://en.wikipedia.org/wiki/Afghanistan,AF,     Afghanistan,                    Afghanistan
,    http://en.wikipedia.org/wiki/Afghanistan,,       Islamic Republic of Afghanistan,

When this is converted to JSON, the URI for the property is processed to give just the property name:

[{
  "@id": "http://en.wikipedia.org/wiki/Andorra",
  "country": "AD",
  "name": [{
    "value": "Andorra",
    "lang": "en"
  }, {
    "value": "Andorre",
    "lang": "fr"
  }, {
    "value": "Principality of Andorra",
    "lang": "en"
  }]
},{
  "@id": "http://en.wikipedia.org/wiki/Afghanistan",
  "country": "AF",
  "name": [{
    "value": "Afghanistan",
    "lang": "en"
  }, {
    "value": "Afghanistan",
    "lang": "fr"
  }, {
    "value": "Islamic Republic of Afghanistan",
    "lang": "en"
  }]
}]

In the conversion to RDF, the RDF includes the labels for the properties:

@prefix rel: <http://www.iana.org/assignments/relation/> .
@prefix rdfs: <...> .
@prefix : <http://example.org/countries#> .

<http://en.wikipedia.org/wiki/Andorra>
  rel:describedby 
    <http://example.org/countries#row:2>, 
    <http://example.org/countries#row:3> ;
  :country "AD" ;
  :name "Andorra"@en, "Andorre"@fr, "Principality of Andorra"@en ;
  .

<http://en.wikipedia.org/wiki/Afghanistan>
  rel:describedby 
    <http://example.org/countries#row:4>, 
    <http://example.org/countries#row:5> ;
  :country "AF" ;
  :name "Afghanistan"@en , "Afghanistan"@fr, "Islamic Republic of Afghanistan"@en ;
  .

:name
  rdfs:label "english name" , "french name" ;
  .

When properties are shared across multiple files, the URIs in the url prolog line should resolve to the same URL. For example, if we wanted to indicate that the country property within the af-population file means the same as the country property within the ad-population file, we could associate them both with the same URI by adding the same url prolog line in both files:

#,   country,                                  year,                population
type,url,                                      time,                integer
url, /def/statistics#country,                  /def/statistics#year,/def/statistics#population
,    http://en.wikipedia.org/wiki/Afghanistan, 1960,                9616353
,    http://en.wikipedia.org/wiki/Afghanistan, 1961,                9799379
,    http://en.wikipedia.org/wiki/Afghanistan, 1962,                9989846
,    http://en.wikipedia.org/wiki/Afghanistan, 1963,                10188299

The resulting RDF would use these URLs for the country, year and population properties:

@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rel: <http://www.iana.org/assignments/relation/> .
@prefix : <http://example.org/def/statistics#> .

[ rel:describedby <http://example.org/af-population#row:2> ;
  :country <http://en.wikipedia.org/wiki/Afghanistan> ;
  :year "1960"^^xsd:gYear ;
  :population 9616353 ] .

[ rel:describedby <http://example.org/af-population#row:3> ;
  :country <http://en.wikipedia.org/wiki/Afghanistan> ;
  :year "1961"^^xsd:gYear ;
  :population 9799379 ] .

[ rel:describedby <http://example.org/af-population#row:4> ;
  :country <http://en.wikipedia.org/wiki/Afghanistan> ;
  :year "1962"^^xsd:gYear ;
  :population 9989846 ] .

[ rel:describedby <http://example.org/af-population#row:5> ;
  :country <http://en.wikipedia.org/wiki/Afghanistan> ;
  :year "1963"^^xsd:gYear ;
  :population 10188299 ] .

Similarly, the resulting XML will use the property URIs to determine the namespace URIs for the child elements of the <csv:item> elements representing each entity:

<csv:collection xml:base="http://example.org/af-population"
  xmlns:csv="http://example.org/linked-csv"
  xmlns:xsd="http://www.w3.org/2001/XMLSchema"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xmlns="http://example.org/def/statistics#">
  <csv:item>
    <country href="http://en.wikipedia.org/wiki/Afghanistan" />
    <year xsi:type="xsd:gYear">1960</year>
    <population xsi:type="xsd:integer">9616353</population>
  </csv:item>
  <csv:item>
    <country href="http://en.wikipedia.org/wiki/Afghanistan" />
    <year xsi:type="xsd:gYear">1961</year>
    <population xsi:type="xsd:integer">9799379</population>
  </csv:item>
  <csv:item>
    <country href="http://en.wikipedia.org/wiki/Afghanistan" />
    <year xsi:type="xsd:gYear">1962</year>
    <population xsi:type="xsd:integer">9989846</population>
  </csv:item>
  <csv:item>
    <country href="http://en.wikipedia.org/wiki/Afghanistan" />
    <year xsi:type="xsd:gYear">1963</year>
    <population xsi:type="xsd:integer">10188299</population>
  </csv:item>
</csv:collection>

Applications may attempt to resolve the URIs in the url prolog lines; if they do so, this should resolve into a linked CSV file that describes the properties. In this example, http://example.org/def/statistics should contain something like:

$id,        label,     description
#country,   country,   "The country for which the population is being provided."
#year,      year,      "The year for which the population is being provided."
#population,population,"The number of people populating the given country in the given year."

To make it easier to use common vocabularies, a field within the url prolog line may contain a CURIE (in the form prefix:name) as a shorthand for a URL. If a field within the url prolog line starts with a recognised prefix, that prefix is expanded to its namespace and prepended to the remainder of the CURIE (after the colon). The recognised prefixes are:

prefix	namespace	description
Generic Vocabularies
`rel`	`http://www.iana.org/assignments/relation/`	IANA Link Relations
`schema`	`http://schema.org/`	schema.org
Metadata Vocabularies
`dc`	`http://purl.org/dc/terms/`	Dublin Core Metadata Terms
`dct`	`http://purl.org/dc/terms/`	Dublin Core Metadata Terms
`cc`	`http://creativecommons.org/ns#`	Creative Commons Rights Expression Language
`void`	`http://rdfs.org/ns/void#`	VoID
`wdrs`	`http://www.w3.org/2007/05/powder-s#`	POWDER-S
Schema Vocabularies
`rdf`	`http://www.w3.org/1999/02/22-rdf-syntax-ns#`	RDF
`rdfs`	`http://www.w3.org/2000/01/rdf-schema#`	RDF Schema
`owl`	`http://www.w3.org/2002/07/owl#`	OWL
`skos`	`http://www.w3.org/2004/02/skos/core#`	SKOS
`skos-xl`	`http://www.w3.org/2008/05/skos-xl#`	SKOS Extensions for Labels

This list is largely based on hunches about which vocabularies are going to be useful in linked CSV documents, coupled with some dogma in pushing schema.org as the vocabulary to rule them all. An alternative would be to define the same prefixes as listed in http://www.w3.org/2011/rdfa-context/rdfa-1.1.

There's no support for declaring your own prefixes or declaring a default prefix/vocabulary.
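Resolving a field from a url prolog line, whether it is a CURIE or a URI reference relative to the file, could be sketched as follows. The prefix table is copied from the list above; the function name property_uri is hypothetical.

```python
from urllib.parse import urljoin

# The recognised prefixes, as listed in the table above.
PREFIXES = {
    "rel": "http://www.iana.org/assignments/relation/",
    "schema": "http://schema.org/",
    "dc": "http://purl.org/dc/terms/",
    "dct": "http://purl.org/dc/terms/",
    "cc": "http://creativecommons.org/ns#",
    "void": "http://rdfs.org/ns/void#",
    "wdrs": "http://www.w3.org/2007/05/powder-s#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "skos-xl": "http://www.w3.org/2008/05/skos-xl#",
}


def property_uri(field, base):
    """Expand a CURIE, or resolve a URI reference against the file location."""
    prefix, colon, name = field.partition(":")
    if colon and prefix in PREFIXES:
        return PREFIXES[prefix] + name
    # Not a recognised CURIE: treat the field as a URI reference.
    return urljoin(base, field)


label = property_uri("rdfs:label", "http://example.org/def/statistics")
country = property_uri("/def/statistics#country", "http://example.org/af-population")
```

An absolute URI such as http://schema.org/name passes through unchanged, because http is not a recognised prefix and resolving an absolute URI against a base returns it as is.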

Linked CSV files that describe the properties used within other linked CSV files SHOULD use the RDFS vocabulary, which contains properties such as rdfs:label and rdfs:comment, to provide details about the properties. For example:

#,  $id,        label,     description
url,,           rdfs:label,rdfs:comment
,   #country,   country,   "The country for which the population is being provided."
,   #year,      year,      "The year for which the population is being provided."
,   #population,population,"The number of people populating the given country in the given year."

Self Description

Linked CSV files should be self-describing. They should include important metadata about the source of the data they contain, their license conditions, and links to other files that contain non-essential supplementary information. Although the file might be described within other files, and metadata might be made available through the HTTP headers, it is safer to embed this metadata within the file as there is no guarantee that metadata stored outside the file will be available as the data is passed around.

To provide metadata about the linked CSV document, the file has to contain a meta prolog line, which provides metadata about the file or records within the file. If there is a $id column, the value within that column indicates what the metadata is about: an empty value (or a missing $id column) indicates the metadata is associated with the file as a whole.

The remainder of each metadata line should hold the following values, in order:

a label for a property of the entity indicated in the $id column
optionally, a type or language annotation for the property, which is interpreted in the same way as the values in a type or lang prolog line
a value, the value of the property for that entity
optionally, a URI that is the global identifier for the property, which is interpreted in the same way as the values in a url prolog line

In our example, the http://example.org/af-population file may be part of a series of files available for different countries, and the metadata provide a pointer to an index document (http://example.org/populations) and to a license for the file:

#,   country,                                 year,population
type,url,                                     time,integer
meta,index,                                   url, /populations
meta,license,                                 url, http://creativecommons.org/publicdomain/mark/1.0/
,    http://en.wikipedia.org/wiki/Afghanistan,1960,9616353
,    http://en.wikipedia.org/wiki/Afghanistan,1961,9799379
,    http://en.wikipedia.org/wiki/Afghanistan,1962,9989846
,    http://en.wikipedia.org/wiki/Afghanistan,1963,10188299

In this example, none of the remaining data lines have identifiers themselves. The corresponding JSON would be:

[{
  "@id": "http://example.org/af-population",
  "index": "http://example.org/populations",
  "license": "http://creativecommons.org/publicdomain/mark/1.0/"
}, {
  "country": "http://en.wikipedia.org/wiki/Afghanistan",
  "year": 1960,
  "population": 9616353
}, {
  "country": "http://en.wikipedia.org/wiki/Afghanistan",
  "year": 1961,
  "population": 9799379
}, {
  "country": "http://en.wikipedia.org/wiki/Afghanistan",
  "year": 1962,
  "population": 9989846
}, {
  "country": "http://en.wikipedia.org/wiki/Afghanistan",
  "year": 1963,
  "population": 10188299
}]

The corresponding RDF would be:

@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rel: <http://www.iana.org/assignments/relation/> .
@prefix : <http://example.org/af-population#> .

<>
  rel:describedby 
    <http://example.org/af-population#row:1>, 
    <http://example.org/af-population#row:2> ;
  :index <populations> ;
  :license <http://creativecommons.org/publicdomain/mark/1.0/> ;
  .

[ rel:describedby <http://example.org/af-population#row:3> ;
  :country <http://en.wikipedia.org/wiki/Afghanistan> ;
  :year "1960"^^xsd:gYear ;
  :population 9616353 ] .

[ rel:describedby <http://example.org/af-population#row:4> ;
  :country <http://en.wikipedia.org/wiki/Afghanistan> ;
  :year "1961"^^xsd:gYear ;
  :population 9799379 ] .

[ rel:describedby <http://example.org/af-population#row:5> ;
  :country <http://en.wikipedia.org/wiki/Afghanistan> ;
  :year "1962"^^xsd:gYear ;
  :population 9989846 ] .

[ rel:describedby <http://example.org/af-population#row:6> ;
  :country <http://en.wikipedia.org/wiki/Afghanistan> ;
  :year "1963"^^xsd:gYear ;
  :population 10188299 ] .

Metadata prolog lines can also be used to provide metadata about other parts of the linked CSV file by using URI Identifiers for the text/csv Media Type. These can be used to refer to rows, columns, and sets of rows that have common value(s) for particular fields. For example:

#,   $id,                                     country,english name,                              french name
url, ,                                        ,       #name,                                     #name
lang,,                                        ,       en,                                        fr
meta,#col:english%20name,                     note,   "contains both official and popular names",
,    http://en.wikipedia.org/wiki/Andorra,    AD,     Andorra,                                   Andorre
,    http://en.wikipedia.org/wiki/Andorra,    ,       Principality of Andorra,
,    http://en.wikipedia.org/wiki/Afghanistan,AF,     Afghanistan,                               Afghanistan
,    http://en.wikipedia.org/wiki/Afghanistan,,       Islamic Republic of Afghanistan,

Additional Data Sources

A prolog line in which the value of the # column is see provides pointers to other linked CSV files that describe the resources in appropriate columns.

Within a see line, columns that hold URI values (having url in the corresponding value of the type prolog line) can reference additional linked CSV files that describe the entities identified by the URIs in that column. For example, the population data within http://example.org/af-population references a country described within http://example.org/countries. The population file would include:

#,   country,                                 year,population
type,url,                                     time,integer
see, /countries,                              ,
,    http://en.wikipedia.org/wiki/Afghanistan,1960,9616353
,    http://en.wikipedia.org/wiki/Afghanistan,1961,9799379
,    http://en.wikipedia.org/wiki/Afghanistan,1962,9989846
,    http://en.wikipedia.org/wiki/Afghanistan,1963,10188299

This indicates that an application can look within http://example.org/countries to find more information about some or all of the URIs within the country column. The URIs within the $id column in that file should match the URIs within the country column in this file.

If there is no type prolog line, a value in a see prolog line indicates that the column holds URIs (as if the type was set to url). If there is a type prolog line but the type of the column has a value other than url, values in the see prolog lines for that column are ignored.
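The interaction between see and type prolog lines could be sketched as follows. The function name see_targets is hypothetical, and the handling of an empty field in the type line (treated here as placing no restriction on the column) is an assumption, since the text above does not cover that case.

```python
from urllib.parse import urljoin


def see_targets(header, see_line, base, type_line=None):
    """Map column names to the files that describe the URIs they hold."""
    targets = {}
    for index, (name, target) in enumerate(zip(header, see_line)):
        if name == "#" or target == "":
            continue  # skip the marker column and columns with no pointer
        if type_line is not None and type_line[index] not in ("url", ""):
            continue  # column explicitly typed as something other than url
        # Pointers are resolved against the location of the file.
        targets[name] = urljoin(base, target)
    return targets


header = ["#", "country", "year", "population"]
type_line = ["type", "url", "time", "integer"]
see_line = ["see", "/countries", "", ""]
targets = see_targets(header, see_line, "http://example.org/af-population", type_line)
```

When type_line is omitted, every column with a value in the see line is included, which corresponds to treating those columns as if their type were url.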

This technique can also be used to point to additional data about the entities described within the linked CSV file itself. For example if another publisher also published a linked CSV file containing information about countries at http://other.example.com/countries (perhaps providing their names in other languages or describing their capital cities), we could reference it from the http://example.org/countries file as follows:

#,   $id,                                     country,english name,                   french name
url, ,                                        ,       #name,                          #name
lang,,                                        ,       en,                             fr
see, http://other.example.com/countries,      ,       ,
,    http://en.wikipedia.org/wiki/Andorra,    AD,     Andorra,                        Andorre
,    http://en.wikipedia.org/wiki/Andorra,    ,       Principality of Andorra,
,    http://en.wikipedia.org/wiki/Afghanistan,AF,     Afghanistan,                    Afghanistan
,    http://en.wikipedia.org/wiki/Afghanistan,,       Islamic Republic of Afghanistan,
