Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Data to csv #1900

Merged
merged 25 commits into from
Oct 23, 2020
Merged

Data to csv #1900

merged 25 commits into from
Oct 23, 2020

Conversation

akp2603
Copy link
Contributor

@akp2603 akp2603 commented Oct 10, 2020

Description

The following PR includes backend APIs for the aforementioned issues.

More Details

  • Provided 3 apis. One to generate a request for data to be generated, one to fetch the status of the request being queued(CSV generation happens async, hence this api for polling.), and last one to download the file.
  • The following PR enables a user to generate a csv to get his/her data. With every new csv generated, the old csv get deleted, to prevent disk space.
  • Files, 30 days old, will be deleted through a cron.

Corresponding Issue

#1560


Reviewing this pull request? Check out our Code Review Practices guide if you haven't already!

@welcome
Copy link

welcome bot commented Oct 10, 2020

Thank you for opening this pull request with us! Be sure to follow our Pull Request Practices. Let us know if you have any questions on Slack.

Copy link
Member

@bennpham bennpham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Aashish, this PR looks nice for a proof of concept for a job scheduler or data backup. Good work on it 👍 . If I made a scheduler, I'd technically store it on Amazon S3 or Google Cloud service instead where data storage would be cheaper and less costlier than storing everything on disc. Especially when more users were to register, it'd be difficult to scale.

On the other hand, for an actual csv download itself, I'm wondering but couldn't you just do the following (modify below to restrict it to only users who owned that data):

Model.rb

def self.to_csv
  heading = ["Col1", "Col2", "Col3]

  CSV.generate do |csv|
    csv << heading
      
    Model.all.each do |model|
      csv << [model.data1, model.data2, model.data3]
    end
end

Controller.rb

def index
  ...

  respond_to do |format|
    format.html
    format.csv {
      send_data Model.to_csv, filename: "model-data-#{Date.current.strftime('%Y%m%d')}.csv"
    }
  end
end

index.html.erb

<%= link_to "Export to CSV", { format: :csv }, class: "btn" %>

@akp2603
Copy link
Contributor Author

akp2603 commented Oct 11, 2020

If I made a scheduler, I'd technically store it on Amazon S3 or Google Cloud service instead where data storage would be cheaper and less costlier than storing everything on disc. Especially when more users were to register, it'd be difficult to scale.

Hi @bennpham firstly thanks for taking the time to review my PR. Really appreciated. Secondly, regarding what you said, I second this, but I asked @julianguyen regarding the same before implementing it. I suggested what you suggested. But she preferred it storing it on the disk only. Hence I implemented it this way. Maybe we can switch to S3 as an enhancement over this?

Also, regarding the csv file, what you've suggested is separated files for separate models. But I implemented it all in a single file. (The format for which, was again, confirmed with Julia.). Can share it with you on slack too, if you want.

Please let me know if you still want any changes regarding the same.

Copy link
Member

@julianguyen julianguyen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have deep expertise this kind of data storage, so I'm definitely open to whatever option is free 🙃 Just cause we have limited funds at the moment as an open source project.

Gemfile Outdated
@@ -52,6 +52,14 @@ gem 'selenium-webdriver', '~> 3.142.3'

gem 'rubyzip', '~> 1.3.0'

gem 'sidekiq', '5.0.5'
# Uniqueness in sidekiq jobs
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding comments but I don't think they're absolutely necessary!

@@ -0,0 +1,3 @@
// Place all the styles related to the Users::Reports controller here.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's delete this file since it's unused!

@@ -12,6 +12,14 @@
#

class Allyship < ApplicationRecord
DISPLAY_ATTRIBUTES = %w[
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's rename this to something like USER_DATA_ATTRIBUTES so that it's more obvious that this is being used for that.

@julianguyen
Copy link
Member

In terms of recommendations for learning how to test, I recommend looking at our existing specs for general patterns and ideas. The official RoR docs has a good starter doc on testing.

Feel free to ask for help in our #dev channel on Slack!

@akp2603 akp2603 removed the wip label Oct 17, 2020
@@ -39,3 +39,7 @@ RAISE_DELIVERY_ERRORS="false"
PSQL_HOST=""
PSQL_USERNAME=""
PSQL_PASSWORD=""

# REDIS
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'll want to add this to config/env/test.example.env as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Member

@julianguyen julianguyen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strong work! Thanks for looking into writing tests :D

end

def fetch_request_status_helper(user, request_id)
return 400, { error: 'Request id can be blank.' } if request_id.blank?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be "Request id can't be blank" instead?

end

def download_data_helper(user, request_id)
return 400, { error: 'Request id can be blank.' } if request_id.blank?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question as above.


def perform(request_id)
data_request = Users::DataRequest.find_by(request_id: request_id)
return if data_request.blank?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lines 8 and 9 could be combined with an ||.


it 'is invalid without a status_id' do
data_request = build(:partial_data_request)
expect(data_request).to have(2).error_on(:status_id)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are there two errors?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. presence.
  2. inclusion.
    Both are treated separatedly. There are 2 validations on status_id. nil is blank and also not amongst the included values.

before { sign_in user }

it 'creates a data download request' do
post "/users/data"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's replace this and other occurrences with users_data_path.

before { sign_in user }

it 'fetches the status of data request with a blank request_id' do
get "/users/data/status"
Copy link
Member

@julianguyen julianguyen Oct 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's replace this and other occurrences with users_data_status_path.

before { sign_in user }

it 'fetches the file with a blank request_id' do
get "/users/data/download"
Copy link
Member

@julianguyen julianguyen Oct 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's replace this and other occurrences with users_data_download_path.

Copy link
Member

@julianguyen julianguyen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fantastic work! Thanks so much for taking this on and going through all the review cycles here :)

@julianguyen julianguyen merged commit 1eb6d8b into main Oct 23, 2020
@welcome
Copy link

welcome bot commented Oct 23, 2020

Thank you for merging this pull request with us! If you haven't already, in another pull request, please add yourself to our About page.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants