Commit

[instructions][m]: updated intructions for python, pandas, js, r, plus added ruby and removed sql - refs #115 (#123)
Mikanebu authored and anuveyatsu committed Nov 2, 2017
1 parent 448234c commit 9bb7a67
Showing 2 changed files with 62 additions and 94 deletions.
150 changes: 59 additions & 91 deletions views/_instructions.html
@@ -1,140 +1,108 @@
{% macro r(dataset) -%}
<p>In order to use Data Package in R follow instructions below:</p>
<div class="highlight">
<pre class="hljs"><code>install.packages(<span class="hljs-string">"rjson"</span>)
<span class="hljs-keyword">library</span>(<span class="hljs-string">"rjson"</span>)
<p>If you are using R here's how to get the data you want quickly loaded:</p>
<pre class="hljs"><code>install.packages(<span class="hljs-string">"jsonlite"</span>)
<span class="hljs-keyword">library</span>(<span class="hljs-string">"jsonlite"</span>)

json_file &lt;- <span class="hljs-string">'{{dataset.path}}/datapackage.json'</span>
json_file &lt;- <span class="hljs-string">"http://datahub.io/{{dataset.datahub.owner}}/{{dataset.name}}/datapackage.json"</span>
json_data &lt;- fromJSON(paste(readLines(json_file), collapse=<span class="hljs-string">""</span>))

<span class="hljs-comment"># see metadata</span>
print(json_data, row.names = <span class="hljs-literal">FALSE</span>)


<span class="hljs-comment"># access csv file by the index starting 1</span>
csv_url = json_data[[<span class="hljs-string">"resources"</span>]][[<span class="hljs-number">1</span>]][[<span class="hljs-string">"path"</span>]]
data &lt;- read.csv(url(csv_url))
<span class="hljs-comment"># access csv file by the index starting from 1</span>
path_to_file = json_data$resources[[<span class="hljs-number">1</span>]]$path
data &lt;- read.csv(url(path_to_file))
print(data)
</code>
</pre>
</div>
</code></pre>
{%- endmacro %}

{% macro pandas(dataset) -%}
<p>Tested with Python 3.5.2</p>

<p>To generate Pandas data frames based on JSON Table Schema descriptors we have to install <code>jsontableschema-pandas</code> plugin.
To load resources from a data package as Pandas data frames use <code>datapackage.push_datapackage</code> function. <i>Storage</i> works as a container for Pandas data frames.</p>
<p>In order to work with Data Packages in Pandas you need to install the Frictionless Data data package library and the pandas extension:</p>

<p>In order to work with Data Packages in Pandas you need to install our packages:</p>

<div class="highlight">
<pre>
$ pip install datapackage
$ pip install jsontableschema-pandas
</pre>
</div>
<p>To get Data Package run following code:</p>

<div class="highlight">
<pre>
<span></span><span class="kn">import</span> <span class="nn">datapackage</span>
<pre class="hljs"><code>pip install datapackage
pip install jsontableschema-pandas
</code></pre>
<p>To get the data run following code:</p>
<pre class="hljs"><code><span class="hljs-keyword">import</span> datapackage

<span class="n">data_url</span> <span class="o">=</span> <span class="s2">"{{dataset.path}}/datapackage.json"</span>
data_url = <span class="hljs-string">"http://datahub.io/{{dataset.datahub.owner}}/{{dataset.name}}/datapackage.json"</span>

<span class="c1"># to load Data Package into storage</span>
<span class="n">storage</span> <span class="o">=</span> <span class="n">datapackage</span><span class="o">.</span><span class="n">push_datapackage</span><span class="p">(</span><span class="n">data_url</span><span class="p">,</span> <span class="s1">'pandas'</span><span class="p">)</span>
<span class="hljs-comment"># to load Data Package into storage</span>
storage = datapackage.push_datapackage(data_url, <span class="hljs-string">'pandas'</span>)

<span class="c1"># to see datasets in this package</span>
<span class="n">storage</span><span class="o">.</span><span class="n">buckets</span>
<span class="hljs-comment"># data frames available (corresponding to data files in original dataset)</span>
storage.buckets

<span class="c1"># you can access datasets inside storage, e.g. the first one:</span>
<span class="n">storage</span><span class="p">[</span><span class="n">storage</span><span class="o">.</span><span class="n">buckets</span><span class="p">[</span><span class="mi">0</span><span class="p">]]</span>
</pre>
</div>
<span class="hljs-comment"># you can access datasets inside storage, e.g. the first one:</span>
storage[storage.buckets[<span class="hljs-number">0</span>]]
</code></pre>

{%- endmacro %}

{% macro python(dataset) -%}
<p>In order to work with Data Packages in Python you need to install our packages:</p>
<p>For Python, first install the `datapackage` library (all the datasets on DataHub are Data Packages):</p>

<div class="highlight">
<pre>
$ pip install datapackage
</pre>
</div>
<pre class="hljs"><code>pip install datapackage
</code></pre>

<p>To get Data Package into your Python environment, run following code:</p>

<pre class="hljs"><code><span class="hljs-keyword">from</span> datapackage <span class="hljs-keyword">import</span> Package, Resource
<pre class="hljs"><code><span class="hljs-keyword">from</span> datapackage <span class="hljs-keyword">import</span> Package

package = Package(<span class="hljs-string">'{{dataset.path}}/datapackage.json'</span>)
package = Package(<span class="hljs-string">'http://datahub.io/{{dataset.datahub.owner}}/{{dataset.name}}/datapackage.json'</span>)

<span class="hljs-comment"># see metadata</span>
print(package.descriptor)

<span class="hljs-comment"># get list of resources</span>
<span class="hljs-comment"># get list of resources:</span>
resources = package.descriptor[<span class="hljs-string">'resources'</span>]
resourceList = [resources[x][<span class="hljs-string">'name'</span>] <span class="hljs-keyword">for</span> x <span class="hljs-keyword">in</span> range(<span class="hljs-number">0</span>, len(resources))]
print(resourceList) <span class="hljs-comment"># ["resource name", ...]</span>
print(resourceList)

<span class="hljs-comment"># access csv file by the index starting 0</span>
resource = Resource({<span class="hljs-string">'path'</span>: package.descriptor[<span class="hljs-string">'resources'</span>][<span class="hljs-number">0</span>][<span class="hljs-string">'path'</span>]})
print(resource.read(keyed=<span class="hljs-keyword">True</span>))
data = package.resources[<span class="hljs-number">0</span>].read()
print(data)
</code></pre>
{%- endmacro %}

{% macro javascript(dataset) -%}
<p>To use this dataset in JavaScript, please, follow instructions below:</p>
<p>If you are using JavaScript, please, follow instructions below:</p>
<p>Install <code>data.js</code> module using <code>npm</code>:</p>
<p>
<pre class="hljs">
<code>$ npm install data.js
</code></pre>
</p>
<p>Once the package is installed, use code snippet below:</p>
<p>Once the package is installed, use the following code snippet:</p>

<pre class="hljs">
<code><span class="hljs-keyword">const</span> {Dataset} = <span class="hljs-built_in">require</span>(<span class="hljs-string">'data.js'</span>)
<pre class="hljs"><code><span class="hljs-keyword">const</span> {Dataset} = <span class="hljs-built_in">require</span>(<span class="hljs-string">'data.js'</span>)

<span class="hljs-keyword">const</span> path = <span class="hljs-string">'{{dataset.path}}/datapackage.json'</span>
<span class="hljs-keyword">const</span> path = <span class="hljs-string">'http://datahub.io/{{dataset.datahub.owner}}/{{dataset.name}}/datapackage.json'</span>

<span class="hljs-keyword">const</span> dataset = Dataset.load(path)
<span class="hljs-comment">// We're using self-invoking function here as we want to use async-await syntax:</span>
(<span class="hljs-keyword">async</span> () =&gt; {
<span class="hljs-keyword">const</span> dataset = <span class="hljs-keyword">await</span> Dataset.load(path)

<span class="hljs-comment">// get a data file in this dataset</span>
<span class="hljs-comment">// Get the first data file in this dataset</span>
<span class="hljs-keyword">const</span> file = dataset.resources[<span class="hljs-number">0</span>]
<span class="hljs-keyword">const</span> data = file.stream()
<span class="hljs-comment">// Get a raw stream</span>
<span class="hljs-keyword">const</span> stream = <span class="hljs-keyword">await</span> file.stream()
<span class="hljs-comment">// entire file as a buffer (be careful with large files!)</span>
<span class="hljs-keyword">const</span> buffer = <span class="hljs-keyword">await</span> file.buffer
})()
</code></pre>
{%- endmacro %}

{% macro sql(dataset) -%}
<p>In order to work with Data Packages in SQL you need to install our packages:</p>
<div class="highlight"><pre><span></span>$ pip install datapackage
$ pip install jsontableschema-sql
$ pip install sqlalchemy
</pre></div>

<p>To import Data Package to your SQLite Database, run following code:</p>

<div class="highlight">
<pre>
<span></span><span class="kn">import</span> <span class="nn">datapackage</span>
<span class="kn">from</span> <span class="nn">sqlalchemy</span> <span class="kn">import</span> <span class="n">create_engine</span>

<span class="n">data_url</span> <span class="o">=</span> <span class="s1">'{{dataset.path}}/datapackage.json'</span>
<span class="n">engine</span> <span class="o">=</span> <span class="n">create_engine</span><span class="p">(</span><span class="s1">'sqlite:///:memory:'</span><span class="p">)</span>

<span class="c1"># to load Data Package into storage</span>
<span class="n">storage</span> <span class="o">=</span> <span class="n">datapackage</span><span class="o">.</span><span class="n">push_datapackage</span><span class="p">(</span><span class="n">data_url</span><span class="p">,</span> <span class="s1">'sql'</span><span class="p">,</span> <span class="n">engine</span><span class="o">=</span><span class="n">engine</span><span class="p">)</span>
{% macro ruby(dataset) -%}
<p>Install the datapackage library created specially for Ruby language using <code>gem</code>:</p>
<pre class="hljs"><code>gem install datapackage
</code></pre>
<p>Now get the dataset and read the data:</p>
<pre class="hljs"><code><span class="hljs-keyword">require</span> <span class="hljs-string">'datapackage'</span>

<span class="c1"># to see datasets in this package</span>
<span class="n">storage</span><span class="o">.</span><span class="n">buckets</span>
path = <span class="hljs-string">'http://datahub.io/{{dataset.datahub.owner}}/{{dataset.name}}/datapackage.json'</span>

<span class="c1"># to execute sql command (assuming data is in "data" folder, name of resource is data and file name is data.csv)</span>
<span class="n">storage</span><span class="o">.</span><span class="n">_Storage__connection</span><span class="o">.</span><span class="n">execute</span><span class="p">(</span><span class="s1">'select * from data__data___data limit 1;'</span><span class="p">)</span><span class="o">.</span><span class="n">fetchall</span><span class="p">()</span>
package = DataPackage::Package.new(path)
<span class="hljs-comment"># So package variable contains metadata. You can see it:</span>
puts package

<span class="c1"># description of the table columns</span>
<span class="n">storage</span><span class="o">.</span><span class="n">describe</span><span class="p">(</span><span class="s1">'data__data___data'</span><span class="p">)</span>
</pre>
</div>
<span class="hljs-comment"># Read data itself:</span>
resource = package.resources[<span class="hljs-number">0</span>]
data = resource.read
puts data
</code></pre>
{%- endmacro %}
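Note on the snippets above: every macro now renders the same descriptor URL pattern (`http://datahub.io/<owner>/<name>/datapackage.json`), and the Python macro lists resource names and reads the resource at index 0. A minimal, network-free sketch of that logic follows, using only the standard library. The helper names (`descriptor_url`, `resource_names`, `first_resource_path`) and the sample descriptor are hypothetical and are not part of this commit or of the `datapackage` library.

```python
import json

# Base URL as it appears in the templates changed by this commit.
DATAHUB_BASE = "http://datahub.io"


def descriptor_url(owner: str, name: str) -> str:
    """Build the datapackage.json URL in the form the templates render (hypothetical helper)."""
    return f"{DATAHUB_BASE}/{owner}/{name}/datapackage.json"


def resource_names(descriptor: dict) -> list:
    """Return the name of every resource listed in a descriptor (hypothetical helper)."""
    return [resource["name"] for resource in descriptor.get("resources", [])]


def first_resource_path(descriptor: dict) -> str:
    """Return the path of the first resource, index 0 as in the Python macro (hypothetical helper)."""
    return descriptor["resources"][0]["path"]


# Hypothetical descriptor, inlined so the sketch runs without network access.
SAMPLE_DESCRIPTOR = json.loads("""
{
  "name": "example-dataset",
  "resources": [
    {"name": "data", "path": "data/data.csv"},
    {"name": "summary", "path": "data/summary.csv"}
  ]
}
""")

if __name__ == "__main__":
    print(descriptor_url("example-owner", "example-dataset"))
    print(resource_names(SAMPLE_DESCRIPTOR))
    print(first_resource_path(SAMPLE_DESCRIPTOR))
```

In real use the descriptor would come from the URL instead of an inline string; the list comprehension here is equivalent to the `range(0, len(resources))` loop in the Python macro.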
6 changes: 3 additions & 3 deletions views/showcase.html
@@ -208,7 +208,7 @@ <h2 class="section-title">Import into your tool</h2>
<li><a data-toggle="pill" href="#pandas">Pandas</a></li>
<li><a data-toggle="pill" href="#python">Python</a></li>
<li><a data-toggle="pill" href="#javascript">JavaScript</a></li>
<li><a data-toggle="pill" href="#sql">SQL</a></li>
<li><a data-toggle="pill" href="#ruby">Ruby</a></li>
</ul>
<!-- content for instructions -->
<div class="tab-content">
@@ -237,9 +237,9 @@ <h2 class="section-title">Import into your tool</h2>
</div>
</div>

<div id="sql" class="tab-pane fade">
<div id="ruby" class="tab-pane fade">
<div class="part">
{{instructions.sql(dataset)}}
{{instructions.ruby(dataset)}}
</div>
</div>
</div>