Analysis for Logging system #2

Open
rufuspollock opened this issue Jun 4, 2020 · 1 comment
rufuspollock commented Jun 4, 2020

Logging and reporting are a crucial aspect of a data factory system like this. Questions to cover:

  • What kind of logs
  • Log format
  • Log storage
  • Log access

Job stories

When an Operator initiates a Run, they want to see that it is running and be notified of application and (meta)data errors as soon as possible, especially “halts”, so that they can debug and re-run.

If there are a lot of (data) errors, I want to examine them in a system that lets me analyse and view them easily (i.e. my page should not crash as it tries to load 100k error messages).
I don’t want to receive 100k error emails …
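One way to avoid both the 100k-row page and the 100k emails is to group errors before showing or sending them. A minimal sketch, assuming each error arrives as a `(code, message)` pair; the function name `digest_errors` and the codes used below are hypothetical, not something the harvester defines today:

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def digest_errors(errors: Iterable[Tuple[str, str]], sample_size: int = 3) -> List[str]:
    """Collapse many errors into one line per error code, keeping a few samples."""
    counts: Counter = Counter()
    samples: Dict[str, List[str]] = {}
    for code, message in errors:
        counts[code] += 1
        bucket = samples.setdefault(code, [])
        # Keep only the first few messages per code as examples.
        if len(bucket) < sample_size:
            bucket.append(message)
    lines = []
    # Most frequent error codes first.
    for code, count in counts.most_common():
        lines.append(f"{code}: {count} occurrence(s), e.g. {'; '.join(samples[code])}")
    return lines
```

A single notification could then carry the digest lines, with the full error list left in storage for paged access.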

When a scheduled Run happens, as an Operator (Sysadmin) I want to be notified afterwards (with a report?) if something went wrong, so that I can do something about it …

When I need to report to my colleagues about the Harvesting system, I want to get an overall report of how it is going (how many datasets are harvested etc.) so that I can tell them.

Domain Model

Status info: this Run is running, it is finished, it took this long …

  • If the process takes longer than I expect, we could show a window with live logs (using the Airflow API). We don’t yet have statuses like “running step X”, “running step Y”, “stopped by error”, “finished”. We need to add these to the NG Harvester.
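The statuses mentioned above could be modelled roughly like this. It is only a sketch: `RunState` and `RunStatus` are hypothetical names, and nothing here reflects what the NG Harvester or Airflow actually expose.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class RunState(Enum):
    RUNNING = "running"
    STOPPED_BY_ERROR = "stopped by error"
    FINISHED = "finished"


@dataclass
class RunStatus:
    state: RunState
    step: Optional[str] = None  # only meaningful while the run is RUNNING

    def describe(self) -> str:
        """Return a human-readable status such as 'running step X'."""
        if self.state is RunState.RUNNING and self.step:
            return f"running step {self.step}"
        return self.state.value
```

For example, `RunStatus(RunState.RUNNING, "X").describe()` gives `running step X`, while a finished run simply reports `finished`.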

(Raw) Log information …

  • Logs on run execution (classic INFO, WARN etc logging)
    • Including handled application errors ERROR
  • (Meta)data errors (and warnings) => What do these look like?
  • (Unhandled) Exceptions or errors (caught by parent system)

Reports / Summaries, e.g. 200 records processed, 5 errors, 2 warnings, 8 new datasets, 192 existing records updated
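A summary like that could be a small record that renders both as text and as JSON. This is a sketch under assumptions: the class name `RunSummary` and its field names are made up for illustration and are not an agreed report schema.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class RunSummary:
    records_processed: int
    errors: int
    warnings: int
    new_datasets: int
    records_updated: int

    def to_text(self) -> str:
        """One-line summary suitable for an email or a status page."""
        return (
            f"{self.records_processed} records processed, {self.errors} errors, "
            f"{self.warnings} warnings, {self.new_datasets} new datasets, "
            f"{self.records_updated} existing records updated"
        )

    def to_json(self) -> str:
        """Machine-readable form of the same summary."""
        return json.dumps(asdict(self), sort_keys=True)


# The example numbers from the text above.
summary = RunSummary(200, 5, 2, 8, 192)
```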

4 cases

  • Run Status Info (Live and Historic)
    • Who: Someone running a Job in realtime: When something does not work I want to see history of jobs (e.g. when have jobs stopped running) so that I can debug
    • Provided by: Orchestrator (i.e. Airflow). TODO: does the orchestrator provide historic info (?)
    • Format: Whatever that API gives
  • App Log
    • Who: Someone running a Job (if they want real-time feedback)
      • Someone debugging a failed job (and a specific source)
      • Someone creating a new pipeline and wanting to debug it
    • Provided by: Logging in the code using the standard logging library, with the storage location configured either in code or by the orchestrator
    • Format: Regular logs (text format) and a custom JSON file as a final log report
  • (Meta)Data Quality Warn / Errors
    • Who: “Owner” of a harvest source who wants to get those corrected
      • A Harvest Admin who is overseeing the process and wants to know what happened (and maybe how to fix the pipeline)
    • Provided by: Explicit recording as part of application code and a specific error format e.g. https://github.com/frictionlessdata/data-quality-spec
    • Format: To be defined. Analyze the output of the quality tools we use and define some kind of JSON results or report file.
  • Report: E.g. how many runs happened. How many datasets harvested etc.
    • Who: Mostly non-technical people.
      • Someone Running a Job
      • Harvest Admin
    • Provided by: output from NG Harvester. Displayed via new WUI / SPA embedded in CKAN
    • Format: Basic formatting of the JSON-based log file, then iterate based on feedback
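
For the App Log case (regular text logs plus a JSON file as a final report), a rough sketch using only Python's standard `logging` module might look like the following. The handler class `LevelCounter`, the helper `build_logger`, the logger name and the report shape are all assumptions for illustration; the real harvester may wire this differently.

```python
import io
import json
import logging


class LevelCounter(logging.Handler):
    """Counts records per level so a final report can be produced."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def emit(self, record):
        self.counts[record.levelname] = self.counts.get(record.levelname, 0) + 1


def build_logger(name, stream):
    """Attach a plain-text handler and a counting handler to one logger."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep output out of the root logger
    logger.handlers.clear()   # avoid duplicate handlers on repeated calls
    text = logging.StreamHandler(stream)
    text.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))
    counter = LevelCounter()
    logger.addHandler(text)
    logger.addHandler(counter)
    return logger, counter


# An in-memory stream stands in for whatever storage location is configured.
stream = io.StringIO()
logger, counter = build_logger("harvest.example", stream)
logger.info("fetched source")
logger.warning("dataset has no license")
logger.error("could not parse record")

# Final JSON report built from the counts collected during the run.
final_report = json.dumps({"log_counts": counter.counts}, sort_keys=True)
```

The text stream serves people debugging a run, while the JSON report is the piece a report view could consume.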
@hannelita
Contributor

Google Cloud Composer already provides a lot of logs. We may be able to create a sink on GCP Operations Logging and redirect the created logs to another service.
