(New Office) White House Initiatives: create scraper for these offices (Phase 1) #163

higorspinto · 2020-05-26T23:04:18Z

The White House Initiatives are among the new offices whose datasets need to be ingested into the data portal. To make this possible, we need a new scraper that crawls and parses the available webpages of these offices:

https://sites.ed.gov/hispanic-initiative/
https://sites.ed.gov/whieeaa/
https://sites.ed.gov/whhbcu/

Acceptance Criteria

We have a functional crawler that crawls through the webpages of the offices
We have a functional parser that understands the page structures and generates structured data
Datasets are produced when the scraper is run
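
As a rough illustration of the first criterion, here is a minimal crawler sketch using only the Python standard library. It assumes a breadth-first crawl that stays under the three start URLs listed above; the `fetch` callable, the `crawl` function, the `LinkCollector` class, and the `max_pages` limit are hypothetical names chosen for this example, not part of the existing pipeline.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag

# Start URLs taken from this issue; everything else is illustrative.
START_URLS = [
    "https://sites.ed.gov/hispanic-initiative/",
    "https://sites.ed.gov/whieeaa/",
    "https://sites.ed.gov/whhbcu/",
]


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def crawl(fetch, start_urls=START_URLS, max_pages=100):
    """Breadth-first crawl that never leaves the start URL prefixes.

    `fetch` is a hypothetical callable mapping a URL to an HTML string
    (or None on failure). Returns a dict of visited URL -> HTML.
    """
    queue = deque(start_urls)
    seen = set(start_urls)
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html
        collector = LinkCollector()
        collector.feed(html)
        for href in collector.hrefs:
            # Resolve relative links and drop #fragments.
            absolute, _ = urldefrag(urljoin(url, href))
            in_scope = any(absolute.startswith(p) for p in start_urls)
            if in_scope and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return pages
```

Passing `fetch` in as an argument keeps the traversal logic testable without network access; a real run would supply a function that performs the HTTP request.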

Tasks

Identify the possible page structures in the target site
Write one or multiple parsers that cover as many cases as possible
Test that the scraper runs correctly within the pipeline
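
A possible starting point for the parser task is sketched below: a single generic parser that turns one page into a structured dataset record by collecting links to data files. The set of file extensions, the record fields (`source_url`, `title`, `resources`), and the names `PageScanner` and `parse_page` are assumptions made for this example; the actual page structures still need to be identified, and page-specific parsers may be required on top of this.

```python
from html.parser import HTMLParser
from pathlib import PurePosixPath
from urllib.parse import urljoin, urlparse

# Assumed set of file types that count as dataset resources.
DATA_EXTENSIONS = {".csv", ".xls", ".xlsx", ".pdf", ".zip"}


class PageScanner(HTMLParser):
    """Records the <title> text and every (href, link text) pair."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []  # list of [href, text]
        self._in_title = False
        self._in_link = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append([href, ""])
                self._in_link = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "a":
            self._in_link = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._in_link:
            self.links[-1][1] += data


def parse_page(url, html):
    """Return a dataset dict for the page, or None if it links to no data files."""
    scanner = PageScanner()
    scanner.feed(html)
    resources = []
    for href, text in scanner.links:
        absolute = urljoin(url, href)
        suffix = PurePosixPath(urlparse(absolute).path).suffix.lower()
        if suffix in DATA_EXTENSIONS:
            resources.append({
                "url": absolute,
                "name": text.strip() or absolute,
                "format": suffix.lstrip("."),
            })
    if not resources:
        return None
    return {"source_url": url, "title": scanner.title.strip(), "resources": resources}
```

Returning `None` for pages without data files lets the caller skip navigation-only pages and emit datasets only where resources were found.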

Jira Card

higorspinto changed the title ~~(New Office) White House Initiatives on: create scraper for these offices~~ (New Office) White House Initiatives: create scraper for these offices May 26, 2020

higorspinto changed the title ~~(New Office) White House Initiatives: create scraper for these offices~~ (New Office) White House Initiatives: create scraper for these offices (Phase 1) May 27, 2020


higorspinto commented May 26, 2020 • edited by osahon-okungbowa
