[instructions][m]: updated instructions for python, pandas, js, r, plus added ruby and removed sql - refs #115 (#123)
datahub-v2 · Nov 2, 2017 · 9bb7a67
1 parent 448234c
commit 9bb7a67
Showing 2 changed files with 62 additions and 94 deletions.
diff --git a/views/_instructions.html b/views/_instructions.html
@@ -1,140 +1,108 @@
 {% macro r(dataset) -%}
-<p>In order to use Data Package in R follow instructions below:</p>
-<div class="highlight">
-<pre class="hljs"><code>install.packages(<span class="hljs-string">"rjson"</span>)
-<span class="hljs-keyword">library</span>(<span class="hljs-string">"rjson"</span>)
+<p>If you are using R, here's how to quickly load the data:</p>
+<pre class="hljs"><code>install.packages(<span class="hljs-string">"jsonlite"</span>)
+<span class="hljs-keyword">library</span>(<span class="hljs-string">"jsonlite"</span>)
 
-json_file &lt;- <span class="hljs-string">'{{dataset.path}}/datapackage.json'</span>
+json_file &lt;- <span class="hljs-string">"http://datahub.io/{{dataset.datahub.owner}}/{{dataset.name}}/datapackage.json"</span>
 json_data &lt;- fromJSON(paste(readLines(json_file), collapse=<span class="hljs-string">""</span>))
 
-<span class="hljs-comment"># see metadata</span>
-print(json_data, row.names = <span class="hljs-literal">FALSE</span>)
-
-
-<span class="hljs-comment"># access csv file by the index starting 1</span>
-csv_url = json_data[[<span class="hljs-string">"resources"</span>]][[<span class="hljs-number">1</span>]][[<span class="hljs-string">"path"</span>]]
-data &lt;- read.csv(url(csv_url))
+<span class="hljs-comment"># jsonlite simplifies "resources" to a data frame; take the first resource's path (R indexes from 1)</span>
+path_to_file = json_data$resources$path[<span class="hljs-number">1</span>]
+data &lt;- read.csv(url(path_to_file))
 print(data)
-</code>
-</pre>
-</div>
+</code></pre>
 {%- endmacro %}
 
 {% macro pandas(dataset) -%}
-<p>Tested with Python 3.5.2</p>
-
-<p>To generate Pandas data frames based on JSON Table Schema descriptors we have to install <code>jsontableschema-pandas</code> plugin.
-To load resources from a data package as Pandas data frames use <code>datapackage.push_datapackage</code> function. <i>Storage</i> works as a container for Pandas data frames.</p>
+<p>In order to work with Data Packages in Pandas you need to install the Frictionless Data data package library and the pandas extension:</p>
 
-<p>In order to work with Data Packages in Pandas you need to install our packages:</p>
 
-<div class="highlight">
-<pre>
-$ pip install datapackage
-$ pip install jsontableschema-pandas
-</pre>
-</div>
-<p>To get Data Package run following code:</p>
-
-<div class="highlight">
-<pre>
-<span></span><span class="kn">import</span> <span class="nn">datapackage</span>
+<pre class="hljs"><code>pip install datapackage
+pip install jsontableschema-pandas
+</code></pre>
+<p>To get the data, run the following code:</p>
+<pre class="hljs"><code><span class="hljs-keyword">import</span> datapackage
 
-<span class="n">data_url</span> <span class="o">=</span> <span class="s2">"{{dataset.path}}/datapackage.json"</span>
+data_url = <span class="hljs-string">"http://datahub.io/{{dataset.datahub.owner}}/{{dataset.name}}/datapackage.json"</span>
 
-<span class="c1"># to load Data Package into storage</span>
-<span class="n">storage</span> <span class="o">=</span> <span class="n">datapackage</span><span class="o">.</span><span class="n">push_datapackage</span><span class="p">(</span><span class="n">data_url</span><span class="p">,</span> <span class="s1">'pandas'</span><span class="p">)</span>
+<span class="hljs-comment"># to load Data Package into storage</span>
+storage = datapackage.push_datapackage(data_url, <span class="hljs-string">'pandas'</span>)
 
-<span class="c1"># to see datasets in this package</span>
-<span class="n">storage</span><span class="o">.</span><span class="n">buckets</span>
+<span class="hljs-comment"># data frames available (one per data file in the original dataset)</span>
+storage.buckets
 
-<span class="c1"># you can access datasets inside storage, e.g. the first one:</span>
-<span class="n">storage</span><span class="p">[</span><span class="n">storage</span><span class="o">.</span><span class="n">buckets</span><span class="p">[</span><span class="mi">0</span><span class="p">]]</span>
-</pre>
-</div>
+<span class="hljs-comment"># you can access datasets inside storage, e.g. the first one:</span>
+storage[storage.buckets[<span class="hljs-number">0</span>]]
+</code></pre>
 
 {%- endmacro %}
 
 {% macro python(dataset) -%}
-<p>In order to work with Data Packages in Python you need to install our packages:</p>
+<p>For Python, first install the <code>datapackage</code> library (all the datasets on DataHub are Data Packages):</p>
 
-<div class="highlight">
-<pre>
-$ pip install datapackage
-</pre>
-</div>
+<pre class="hljs"><code>pip install datapackage
+</code></pre>
 
 <p>To get the Data Package into your Python environment, run the following code:</p>
 
-<pre class="hljs"><code><span class="hljs-keyword">from</span> datapackage <span class="hljs-keyword">import</span> Package, Resource
+<pre class="hljs"><code><span class="hljs-keyword">from</span> datapackage <span class="hljs-keyword">import</span> Package
 
-package = Package(<span class="hljs-string">'{{dataset.path}}/datapackage.json'</span>)
+package = Package(<span class="hljs-string">'http://datahub.io/{{dataset.datahub.owner}}/{{dataset.name}}/datapackage.json'</span>)
 
-<span class="hljs-comment"># see metadata</span>
-print(package.descriptor)
-
-<span class="hljs-comment"># get list of resources</span>
+<span class="hljs-comment"># get list of resources:</span>
 resources = package.descriptor[<span class="hljs-string">'resources'</span>]
 resourceList = [resource[<span class="hljs-string">'name'</span>] <span class="hljs-keyword">for</span> resource <span class="hljs-keyword">in</span> resources]
-print(resourceList) <span class="hljs-comment"># ["resource name", ...]</span>
+print(resourceList)
 
-<span class="hljs-comment"># access csv file by the index starting 0</span>
-resource = Resource({<span class="hljs-string">'path'</span>: package.descriptor[<span class="hljs-string">'resources'</span>][<span class="hljs-number">0</span>][<span class="hljs-string">'path'</span>]})
-print(resource.read(keyed=<span class="hljs-keyword">True</span>))
+data = package.resources[<span class="hljs-number">0</span>].read()
+print(data)
 </code></pre>
 {%- endmacro %}
 
 {% macro javascript(dataset) -%}
-<p>To use this dataset in JavaScript, please, follow instructions below:</p>
+<p>If you are using JavaScript, follow the instructions below:</p>
 <p>Install <code>data.js</code> module using <code>npm</code>:</p>
 <pre class="hljs"><code>npm install data.js
 </code></pre>
-<p>Once the package is installed, use code snippet below:</p>
+<p>Once the package is installed, use the following code snippet:</p>
 
-<pre class="hljs">
-  <code><span class="hljs-keyword">const</span> {Dataset} = <span class="hljs-built_in">require</span>(<span class="hljs-string">'data.js'</span>)
+<pre class="hljs"><code><span class="hljs-keyword">const</span> {Dataset} = <span class="hljs-built_in">require</span>(<span class="hljs-string">'data.js'</span>)
 
-  <span class="hljs-keyword">const</span> path = <span class="hljs-string">'{{dataset.path}}/datapackage.json'</span>
+<span class="hljs-keyword">const</span> path = <span class="hljs-string">'http://datahub.io/{{dataset.datahub.owner}}/{{dataset.name}}/datapackage.json'</span>
 
-  <span class="hljs-keyword">const</span> dataset = Dataset.load(path)
+<span class="hljs-comment">// Wrap the code in an immediately invoked async function so that async/await can be used:</span>
+(<span class="hljs-keyword">async</span> () =&gt; {
+  <span class="hljs-keyword">const</span> dataset = <span class="hljs-keyword">await</span> Dataset.load(path)
 
-  <span class="hljs-comment">// get a data file in this dataset</span>
+  <span class="hljs-comment">// Get the first data file in this dataset</span>
   <span class="hljs-keyword">const</span> file = dataset.resources[<span class="hljs-number">0</span>]
-  <span class="hljs-keyword">const</span> data = file.stream()
+  <span class="hljs-comment">// Get a raw stream</span>
+  <span class="hljs-keyword">const</span> stream = <span class="hljs-keyword">await</span> file.stream()
+  <span class="hljs-comment">// Get the entire file as a buffer (be careful with large files!)</span>
+  <span class="hljs-keyword">const</span> buffer = <span class="hljs-keyword">await</span> file.buffer
+})()
 </code></pre>
 {%- endmacro %}
 
-{% macro sql(dataset) -%}
-<p>In order to work with Data Packages in SQL you need to install our packages:</p>
-<div class="highlight"><pre><span></span>$ pip install datapackage
-$ pip install jsontableschema-sql
-$ pip install sqlalchemy
-</pre></div>
-
-<p>To import Data Package to your SQLite Database, run following code:</p>
-
-<div class="highlight">
-<pre>
-<span></span><span class="kn">import</span> <span class="nn">datapackage</span>
-<span class="kn">from</span> <span class="nn">sqlalchemy</span> <span class="kn">import</span> <span class="n">create_engine</span>
-
-<span class="n">data_url</span> <span class="o">=</span> <span class="s1">'{{dataset.path}}/datapackage.json'</span>
-<span class="n">engine</span> <span class="o">=</span> <span class="n">create_engine</span><span class="p">(</span><span class="s1">'sqlite:///:memory:'</span><span class="p">)</span>
-
-<span class="c1"># to load Data Package into storage</span>
-<span class="n">storage</span> <span class="o">=</span> <span class="n">datapackage</span><span class="o">.</span><span class="n">push_datapackage</span><span class="p">(</span><span class="n">data_url</span><span class="p">,</span> <span class="s1">'sql'</span><span class="p">,</span> <span class="n">engine</span><span class="o">=</span><span class="n">engine</span><span class="p">)</span>
+{% macro ruby(dataset) -%}
+<p>Install the Ruby <code>datapackage</code> library using <code>gem</code>:</p>
+<pre class="hljs"><code>gem install datapackage
+</code></pre>
+<p>Now get the dataset and read the data:</p>
+<pre class="hljs"><code><span class="hljs-keyword">require</span> <span class="hljs-string">'datapackage'</span>
 
-<span class="c1"># to see datasets in this package</span>
-<span class="n">storage</span><span class="o">.</span><span class="n">buckets</span>
+path = <span class="hljs-string">'http://datahub.io/{{dataset.datahub.owner}}/{{dataset.name}}/datapackage.json'</span>
 
-<span class="c1"># to execute sql command (assuming data is in "data" folder, name of resource is data and file name is data.csv)</span>
-<span class="n">storage</span><span class="o">.</span><span class="n">_Storage__connection</span><span class="o">.</span><span class="n">execute</span><span class="p">(</span><span class="s1">'select * from data__data___data limit 1;'</span><span class="p">)</span><span class="o">.</span><span class="n">fetchall</span><span class="p">()</span>
+package = DataPackage::Package.new(path)
+<span class="hljs-comment"># The package object holds the dataset's metadata; print it:</span>
+puts package
 
-<span class="c1"># description of the table columns</span>
-<span class="n">storage</span><span class="o">.</span><span class="n">describe</span><span class="p">(</span><span class="s1">'data__data___data'</span><span class="p">)</span>
-</pre>
-</div>
+<span class="hljs-comment"># Read data itself:</span>
+resource = package.resources[<span class="hljs-number">0</span>]
+data = resource.read
+puts data
+</code></pre>
 {%- endmacro %}
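The R, Python, JavaScript and Ruby snippets above share one pattern: build the descriptor URL `http://datahub.io/<owner>/<name>/datapackage.json`, read its `resources` list, and load the first resource's `path`. The sketch below illustrates that pattern in plain Python with only the standard library; it is not a replacement for the `datapackage` library. The helper names (`descriptor_url`, `first_resource_path`, `read_csv_text`, `load_first_resource`) are hypothetical, and the sketch assumes the first resource is a UTF-8 CSV whose `path` is an absolute URL.

```python
import csv
import io
import json
from urllib.request import urlopen


def descriptor_url(owner: str, name: str) -> str:
    """Build the datapackage.json URL using the same pattern as the templates."""
    return f"http://datahub.io/{owner}/{name}/datapackage.json"


def first_resource_path(descriptor: dict) -> str:
    """Return the path of the first resource (index 0 here, index 1 in R)."""
    resources = descriptor.get("resources", [])
    if not resources:
        raise ValueError("descriptor has no resources")
    return resources[0]["path"]


def read_csv_text(text: str) -> list:
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def load_first_resource(owner: str, name: str) -> list:
    """Fetch the descriptor, then download and parse its first resource.

    Assumption: the first resource is a UTF-8 CSV with an absolute URL as its path.
    """
    with urlopen(descriptor_url(owner, name)) as resp:
        descriptor = json.load(resp)
    with urlopen(first_resource_path(descriptor)) as resp:
        return read_csv_text(resp.read().decode("utf-8"))
```

Calling `load_first_resource(owner, name)` makes two HTTP requests; the first three helpers are pure functions, so they can be tried offline on a descriptor parsed with `json.loads`.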
diff --git a/views/showcase.html b/views/showcase.html
@@ -208,7 +208,7 @@ <h2 class="section-title">Import into your tool</h2>
         <li><a data-toggle="pill" href="#pandas">Pandas</a></li>
         <li><a data-toggle="pill" href="#python">Python</a></li>
         <li><a data-toggle="pill" href="#javascript">JavaScript</a></li>
-        <li><a data-toggle="pill" href="#sql">SQL</a></li>
+        <li><a data-toggle="pill" href="#ruby">Ruby</a></li>
       </ul>
       <!-- content for instructions -->
       <div class="tab-content">
@@ -237,9 +237,9 @@ <h2 class="section-title">Import into your tool</h2>
           </div>
         </div>
 
-        <div id="sql" class="tab-pane fade">
+        <div id="ruby" class="tab-pane fade">
           <div class="part">
-            {{instructions.sql(dataset)}}
+            {{instructions.ruby(dataset)}}
           </div>
         </div>
       </div>
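The `showcase.html` change renames the pill link (`href="#ruby"`) and the pane (`id="ruby"`) together because, in the Bootstrap tab markup used here, a pill's `href` selects the pane with the matching `id`. A small consistency check built on Python's `html.parser` could catch a pill whose pane was not renamed. `TabChecker` and `unmatched_pills` are illustrative names, not part of the project, and the check assumes pills are marked with `data-toggle="pill"` as in the snippet above.

```python
from html.parser import HTMLParser


class TabChecker(HTMLParser):
    """Collect pill targets (href="#id") and element ids from an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.targets = []
        self.ids = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        href = attrs.get("href") or ""  # valueless attributes come through as None
        if attrs.get("data-toggle") == "pill" and href.startswith("#"):
            self.targets.append(href[1:])
        if attrs.get("id"):
            self.ids.add(attrs["id"])


def unmatched_pills(html: str) -> list:
    """Return pill targets that have no element with a matching id."""
    checker = TabChecker()
    checker.feed(html)
    checker.close()
    return [target for target in checker.targets if target not in checker.ids]
```

For example, a fragment that links `#ruby` but still contains only `<div id="sql">` would report `["ruby"]`.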