Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
[instructions][m]: updated intructions for python, pandas, js, r, plu…
1 parent
448234c
commit 9bb7a67
Showing 2 changed files with 62 additions and 94 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,140 +1,108 @@ | ||
{% macro r(dataset) -%} | ||
<p>In order to use Data Package in R follow instructions below:</p> | ||
<div class="highlight"> | ||
<pre class="hljs"><code>install.packages(<span class="hljs-string">"rjson"</span>) | ||
<span class="hljs-keyword">library</span>(<span class="hljs-string">"rjson"</span>) | ||
<p>If you are using R, here's how to quickly load the data you want:</p> | ||
<pre class="hljs"><code>install.packages(<span class="hljs-string">"jsonlite"</span>) | ||
<span class="hljs-keyword">library</span>(<span class="hljs-string">"jsonlite"</span>) | ||
|
||
json_file <- <span class="hljs-string">'{{dataset.path}}/datapackage.json'</span> | ||
json_file <- <span class="hljs-string">"http://datahub.io/{{dataset.datahub.owner}}/{{dataset.name}}/datapackage.json"</span> | ||
json_data <- fromJSON(paste(readLines(json_file), collapse=<span class="hljs-string">""</span>)) | ||
|
||
<span class="hljs-comment"># see metadata</span> | ||
print(json_data, row.names = <span class="hljs-literal">FALSE</span>) | ||
|
||
|
||
<span class="hljs-comment"># access csv file by the index starting 1</span> | ||
csv_url = json_data[[<span class="hljs-string">"resources"</span>]][[<span class="hljs-number">1</span>]][[<span class="hljs-string">"path"</span>]] | ||
data <- read.csv(url(csv_url)) | ||
<span class="hljs-comment"># access csv file by the index starting from 1</span> | ||
path_to_file = json_data$resources[[<span class="hljs-number">1</span>]]$path | ||
data <- read.csv(url(path_to_file)) | ||
print(data) | ||
</code> | ||
</pre> | ||
</div> | ||
</code></pre> | ||
{%- endmacro %} | ||
|
||
{% macro pandas(dataset) -%} | ||
<p>Tested with Python 3.5.2</p> | ||
|
||
<p>To generate Pandas data frames based on JSON Table Schema descriptors we have to install <code>jsontableschema-pandas</code> plugin. | ||
To load resources from a data package as Pandas data frames use <code>datapackage.push_datapackage</code> function. <i>Storage</i> works as a container for Pandas data frames.</p> | ||
<p>In order to work with Data Packages in Pandas you need to install the Frictionless Data data package library and the pandas extension:</p> | ||
|
||
<p>In order to work with Data Packages in Pandas you need to install our packages:</p> | ||
|
||
<div class="highlight"> | ||
<pre> | ||
$ pip install datapackage | ||
$ pip install jsontableschema-pandas | ||
</pre> | ||
</div> | ||
<p>To get Data Package run following code:</p> | ||
|
||
<div class="highlight"> | ||
<pre> | ||
<span></span><span class="kn">import</span> <span class="nn">datapackage</span> | ||
<pre class="hljs"><code>pip install datapackage | ||
pip install jsontableschema-pandas | ||
</code></pre> | ||
<p>To get the data, run the following code:</p> | ||
<pre class="hljs"><code><span class="hljs-keyword">import</span> datapackage | ||
|
||
<span class="n">data_url</span> <span class="o">=</span> <span class="s2">"{{dataset.path}}/datapackage.json"</span> | ||
data_url = <span class="hljs-string">"http://datahub.io/{{dataset.datahub.owner}}/{{dataset.name}}/datapackage.json"</span> | ||
|
||
<span class="c1"># to load Data Package into storage</span> | ||
<span class="n">storage</span> <span class="o">=</span> <span class="n">datapackage</span><span class="o">.</span><span class="n">push_datapackage</span><span class="p">(</span><span class="n">data_url</span><span class="p">,</span> <span class="s1">'pandas'</span><span class="p">)</span> | ||
<span class="hljs-comment"># to load Data Package into storage</span> | ||
storage = datapackage.push_datapackage(data_url, <span class="hljs-string">'pandas'</span>) | ||
|
||
<span class="c1"># to see datasets in this package</span> | ||
<span class="n">storage</span><span class="o">.</span><span class="n">buckets</span> | ||
<span class="hljs-comment"># data frames available (corresponding to data files in original dataset)</span> | ||
storage.buckets | ||
|
||
<span class="c1"># you can access datasets inside storage, e.g. the first one:</span> | ||
<span class="n">storage</span><span class="p">[</span><span class="n">storage</span><span class="o">.</span><span class="n">buckets</span><span class="p">[</span><span class="mi">0</span><span class="p">]]</span> | ||
</pre> | ||
</div> | ||
<span class="hljs-comment"># you can access datasets inside storage, e.g. the first one:</span> | ||
storage[storage.buckets[<span class="hljs-number">0</span>]] | ||
</code></pre> | ||
|
||
{%- endmacro %} | ||
|
||
{% macro python(dataset) -%} | ||
<p>In order to work with Data Packages in Python you need to install our packages:</p> | ||
<p>For Python, first install the <code>datapackage</code> library (all the datasets on DataHub are Data Packages):</p> | ||
|
||
<div class="highlight"> | ||
<pre> | ||
$ pip install datapackage | ||
</pre> | ||
</div> | ||
<pre class="hljs"><code>pip install datapackage | ||
</code></pre> | ||
|
||
<p>To get the Data Package into your Python environment, run the following code:</p> | ||
|
||
<pre class="hljs"><code><span class="hljs-keyword">from</span> datapackage <span class="hljs-keyword">import</span> Package, Resource | ||
<pre class="hljs"><code><span class="hljs-keyword">from</span> datapackage <span class="hljs-keyword">import</span> Package | ||
|
||
package = Package(<span class="hljs-string">'{{dataset.path}}/datapackage.json'</span>) | ||
package = Package(<span class="hljs-string">'http://datahub.io/{{dataset.datahub.owner}}/{{dataset.name}}/datapackage.json'</span>) | ||
|
||
<span class="hljs-comment"># see metadata</span> | ||
print(package.descriptor) | ||
|
||
<span class="hljs-comment"># get list of resources</span> | ||
<span class="hljs-comment"># get list of resources:</span> | ||
resources = package.descriptor[<span class="hljs-string">'resources'</span>] | ||
resourceList = [resource[<span class="hljs-string">'name'</span>] <span class="hljs-keyword">for</span> resource <span class="hljs-keyword">in</span> resources] | ||
print(resourceList) <span class="hljs-comment"># ["resource name", ...]</span> | ||
print(resourceList) | ||
|
||
<span class="hljs-comment"># access csv file by the index starting from 0</span> | ||
resource = Resource({<span class="hljs-string">'path'</span>: package.descriptor[<span class="hljs-string">'resources'</span>][<span class="hljs-number">0</span>][<span class="hljs-string">'path'</span>]}) | ||
print(resource.read(keyed=<span class="hljs-keyword">True</span>)) | ||
data = package.resources[<span class="hljs-number">0</span>].read() | ||
print(data) | ||
</code></pre> | ||
{%- endmacro %} | ||
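The resource-listing step in the `python` macro above can be tried without network access. The following is a minimal sketch, assuming a descriptor is a plain dictionary with a `resources` list whose entries carry `name` and `path` keys, as the macro's own code implies. The `resource_names` helper and the `sample_descriptor` values are hypothetical and exist only for illustration.

```python
def resource_names(descriptor):
    """Return the name of every resource listed in a Data Package descriptor."""
    return [resource["name"] for resource in descriptor.get("resources", [])]


# Hypothetical descriptor standing in for the contents of datapackage.json.
sample_descriptor = {
    "name": "example-dataset",
    "resources": [
        {"name": "data", "path": "data/data.csv"},
        {"name": "summary", "path": "data/summary.csv"},
    ],
}

names = resource_names(sample_descriptor)

# Resources are indexed from 0, matching the macro's package.resources[0].
first_path = sample_descriptor["resources"][0]["path"]

print(names)
print(first_path)
```

Iterating over the list directly avoids the index bookkeeping of `range(0, len(resources))` and returns an empty list when the descriptor has no resources.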
|
||
{% macro javascript(dataset) -%} | ||
<p>To use this dataset in JavaScript, please, follow instructions below:</p> | ||
<p>If you are using JavaScript, follow the instructions below:</p> | ||
<p>Install the <code>data.js</code> module using <code>npm</code>:</p> | ||
<p> | ||
<pre class="hljs"> | ||
<code>$ npm install data.js | ||
</code></pre> | ||
</p> | ||
<p>Once the package is installed, use code snippet below:</p> | ||
<p>Once the package is installed, use the following code snippet:</p> | ||
|
||
<pre class="hljs"> | ||
<code><span class="hljs-keyword">const</span> {Dataset} = <span class="hljs-built_in">require</span>(<span class="hljs-string">'data.js'</span>) | ||
<pre class="hljs"><code><span class="hljs-keyword">const</span> {Dataset} = <span class="hljs-built_in">require</span>(<span class="hljs-string">'data.js'</span>) | ||
|
||
<span class="hljs-keyword">const</span> path = <span class="hljs-string">'{{dataset.path}}/datapackage.json'</span> | ||
<span class="hljs-keyword">const</span> path = <span class="hljs-string">'http://datahub.io/{{dataset.datahub.owner}}/{{dataset.name}}/datapackage.json'</span> | ||
|
||
<span class="hljs-keyword">const</span> dataset = Dataset.load(path) | ||
<span class="hljs-comment">// We're using a self-invoking function here because we want to use async/await syntax:</span> | ||
(<span class="hljs-keyword">async</span> () => { | ||
<span class="hljs-keyword">const</span> dataset = <span class="hljs-keyword">await</span> Dataset.load(path) | ||
|
||
<span class="hljs-comment">// get a data file in this dataset</span> | ||
<span class="hljs-comment">// Get the first data file in this dataset</span> | ||
<span class="hljs-keyword">const</span> file = dataset.resources[<span class="hljs-number">0</span>] | ||
<span class="hljs-keyword">const</span> data = file.stream() | ||
<span class="hljs-comment">// Get a raw stream</span> | ||
<span class="hljs-keyword">const</span> stream = <span class="hljs-keyword">await</span> file.stream() | ||
<span class="hljs-comment">// Get the entire file as a buffer (be careful with large files!)</span> | ||
<span class="hljs-keyword">const</span> buffer = <span class="hljs-keyword">await</span> file.buffer | ||
})() | ||
</code></pre> | ||
{%- endmacro %} | ||
|
||
{% macro sql(dataset) -%} | ||
<p>In order to work with Data Packages in SQL you need to install our packages:</p> | ||
<div class="highlight"><pre><span></span>$ pip install datapackage | ||
$ pip install jsontableschema-sql | ||
$ pip install sqlalchemy | ||
</pre></div> | ||
|
||
<p>To import Data Package to your SQLite Database, run following code:</p> | ||
|
||
<div class="highlight"> | ||
<pre> | ||
<span></span><span class="kn">import</span> <span class="nn">datapackage</span> | ||
<span class="kn">from</span> <span class="nn">sqlalchemy</span> <span class="kn">import</span> <span class="n">create_engine</span> | ||
|
||
<span class="n">data_url</span> <span class="o">=</span> <span class="s1">'{{dataset.path}}/datapackage.json'</span> | ||
<span class="n">engine</span> <span class="o">=</span> <span class="n">create_engine</span><span class="p">(</span><span class="s1">'sqlite:///:memory:'</span><span class="p">)</span> | ||
|
||
<span class="c1"># to load Data Package into storage</span> | ||
<span class="n">storage</span> <span class="o">=</span> <span class="n">datapackage</span><span class="o">.</span><span class="n">push_datapackage</span><span class="p">(</span><span class="n">data_url</span><span class="p">,</span> <span class="s1">'sql'</span><span class="p">,</span> <span class="n">engine</span><span class="o">=</span><span class="n">engine</span><span class="p">)</span> | ||
{% macro ruby(dataset) -%} | ||
<p>Install the <code>datapackage</code> library for Ruby using <code>gem</code>:</p> | ||
<pre class="hljs"><code>gem install datapackage | ||
</code></pre> | ||
<p>Now get the dataset and read the data:</p> | ||
<pre class="hljs"><code><span class="hljs-keyword">require</span> <span class="hljs-string">'datapackage'</span> | ||
|
||
<span class="c1"># to see datasets in this package</span> | ||
<span class="n">storage</span><span class="o">.</span><span class="n">buckets</span> | ||
path = <span class="hljs-string">'http://datahub.io/{{dataset.datahub.owner}}/{{dataset.name}}/datapackage.json'</span> | ||
|
||
<span class="c1"># to execute sql command (assuming data is in "data" folder, name of resource is data and file name is data.csv)</span> | ||
<span class="n">storage</span><span class="o">.</span><span class="n">_Storage__connection</span><span class="o">.</span><span class="n">execute</span><span class="p">(</span><span class="s1">'select * from data__data___data limit 1;'</span><span class="p">)</span><span class="o">.</span><span class="n">fetchall</span><span class="p">()</span> | ||
package = DataPackage::Package.new(path) | ||
<span class="hljs-comment"># The package variable now contains the metadata. You can print it:</span> | ||
puts package | ||
|
||
<span class="c1"># description of the table columns</span> | ||
<span class="n">storage</span><span class="o">.</span><span class="n">describe</span><span class="p">(</span><span class="s1">'data__data___data'</span><span class="p">)</span> | ||
</pre> | ||
</div> | ||
<span class="hljs-comment"># Read the data itself:</span> | ||
resource = package.resources[<span class="hljs-number">0</span>] | ||
data = resource.read | ||
puts data | ||
</code></pre> | ||
{%- endmacro %} |
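Every macro in this diff swaps `{{dataset.path}}/datapackage.json` for a URL built from the dataset owner and name. As a rough illustration of that pattern, the sketch below assembles the same URL in plain Python. The `datapackage_url` helper, the `DATAHUB_BASE` constant, and the example owner and dataset names are assumptions made for this example and are not part of the template.

```python
from urllib.parse import quote

# Base URL as it appears in the updated templates.
DATAHUB_BASE = "http://datahub.io"


def datapackage_url(owner, name, base=DATAHUB_BASE):
    """Build the datapackage.json URL for a dataset from its owner and name."""
    # Percent-encode each path segment so unusual characters stay inside it.
    return "{}/{}/{}/datapackage.json".format(
        base.rstrip("/"), quote(owner, safe=""), quote(name, safe="")
    )


# Hypothetical owner and dataset name, used only for illustration.
url = datapackage_url("example-owner", "example-dataset")
print(url)
```

The resulting string is what each snippet passes to its loader, for example `Package(...)` in Python or `Dataset.load(...)` in JavaScript.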