Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Moving anon user tracking from image element to form submit #433

Open
wants to merge 16 commits into
base: main
Choose a base branch
from

Conversation

iambibhas
Copy link
Contributor

Anytime the site is loaded and there is -

  • a scroll event
  • a mouse pointer move event

the anon user session is set using a form submit.

Fixes #409.

@iambibhas iambibhas added the needs-review Need comment from the reviewers label Apr 4, 2018
@@ -27,4 +28,5 @@
{%- if header_campaign %}{{ campaign_script() }}{% endif %}
{% assets "js_tinymce" %}<script src="{{ ASSET_URL }}" type="text/javascript"></script>{% endassets %}
{% block footerscripts %}{% endblock %}
{{ anon_user_script() }}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be in layout.html.jinja2 only. We have a problem if we’re copy pasting an entire block between templates.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i went through the templates, and they need some reorganization. Some blocks have been redefined in several places. I can fix them, but seems a little off topic for this PR. Maybe a separate PR for it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will change with @vidya-ram's PWA PR (#425). We should revisit this PR once that is merged.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jace alright.

{%- if not g.user and not g.anon_user %}
<script type="text/javascript">
$(function () {
var sniffurl = "{{ url_for('sniffle') }}";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

“Sniff” is the wrong term for this. We’re looking for an interactive session.

}
window.onscroll = function (e) {
if (!sniffed) {
sniff("scolll");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spelling.

@@ -28,14 +28,38 @@
gif1x1 = 'R0lGODlhAQABAJAAAP8AAAAAACH5BAUQAAAALAAAAAABAAEAAAICBAEAOw=='.decode('base64')


@app.route('/_sniffle.gif')
@app.route('/_sniffle', methods=['POST'])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will now be /api/1/anonsession. This should also move into an views/api.py.


1. If there's g.user and session['anon_user'], it loads that anon_user and tags with user=g.user, then removes anon
2. If there's no g.user and no session['anon_user'] and form submitted token matches session['au'], sets session['anon_user'] = 'test'
3. If there's no g.user and there is session['au'] != form['token'], loads g.anon_user
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 and 3 seem to be reversed. Also, you don’t need a test token in session['au'] anymore. A valid form CSRF token is enough. Our entire test token mechanism can be removed.

session['au'] = g.anon_user.id
session.permanent = True

return Response("OK")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don’t reply with text. Return {'status': 'ok'} and wrap the view with render_with(json=True).

if anon_user and g.user:
# we have anon user id in session['au'], set anon_user.user to current user
anon_user.user = g.user
g.db_commit_needed = True
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ugh, what is this? We should be doing automatic database commits if there’s a dirty session and no exception was raised. Better than such hacky flags.

session.permanent = True
if 'impressions' in session:
elif not g.user:
session['au'] = u'test-' + unicode(uuid4())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is no longer needed since we’ll use the CSRF token cookie now.

@@ -313,7 +302,7 @@ def session_jobpost_ab():
Returns the user's B-group assignment (NA, True, False) for all jobs shown to the user
in the current event session (impressions or views) as a dictionary of {id: bgroup}
"""
if not g.esession.persistent:
if g.esession and not g.esession.persistent:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When does g.esession not exist?

@@ -31,4 +32,5 @@
<script type="text/javascript" src="//cdnjs.cloudflare.com/ajax/libs/d3/3.5.17/d3.min.js"></script>
<script type="text/javascript" src="//cdnjs.cloudflare.com/ajax/libs/c3/0.4.14/c3.min.js"></script>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jace should these static files be in tablayout template? Shouldn't they be in individual templates that inherit them and needs c3/d3?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We now use C3 enough that it should be in the top level requirements, and all these direct references should be removed.

@iambibhas iambibhas removed the needs-review Need comment from the reviewers label Apr 10, 2018
Copy link
Member

@jace jace left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few notes. Will need another review after the other PRs are merged.

@@ -27,4 +28,5 @@
{%- if header_campaign %}{{ campaign_script() }}{% endif %}
{% assets "js_tinymce" %}<script src="{{ ASSET_URL }}" type="text/javascript"></script>{% endassets %}
{% block footerscripts %}{% endblock %}
{{ anon_user_script() }}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will change with @vidya-ram's PWA PR (#425). We should revisit this PR once that is merged.

@@ -31,4 +32,5 @@
<script type="text/javascript" src="//cdnjs.cloudflare.com/ajax/libs/d3/3.5.17/d3.min.js"></script>
<script type="text/javascript" src="//cdnjs.cloudflare.com/ajax/libs/c3/0.4.14/c3.min.js"></script>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We now use C3 enough that it should be in the top level requirements, and all these direct references should be removed.

db.session.commit()

if g.anon_user:
session['au'] = g.anon_user.id
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we using session['au'] for two different sorts of content? That doesn't seem clean.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm using session['au'] only to store anon user ID now. if it exists in cookie, it'll contain the anon user id.

from ..models import (db, JobCategory, JobPost, JobType, BoardJobPost, Tag, JobPostTag,
Campaign, CampaignView, CampaignAnonView, EventSessionBase, EventSession, UserEventBase,
UserEvent, JobImpression, JobViewSession, AnonUser, campaign_event_session_table,
JobLocation, PAY_TYPE)
from ..utils import scrubemail, redactemail, cointoss


gif1x1 = 'R0lGODlhAQABAJAAAP8AAAAAACH5BAUQAAAALAAAAAABAAEAAAICBAEAOw=='.decode('base64')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line can go. It's not needed anymore.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jace there is a view_application_email_gif endpoint where it's being used

return gif1x1, 200, {

@@ -224,3 +224,37 @@
</script>
{%- endwith %}
{%- endmacro -%}

{%- macro anon_user_script() -%}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be moved into app.js? It adds bulk to every page otherwise (to pages served to bots).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But if we move it to app.js then it'll load with all page load regardless of bots or humans? Also, will have to hardcode the api endpoint.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will load only once because app.js is cached.

"""
Load anon user:

1. If client returns valid csrf token, create and set g.anon_user
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sends, not returns.

"""
csrf_form = FlaskForm()
if csrf_form.validate_on_submit():
# This client sent us back valid csrf token
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"sent us", not "sent us back"

2. if g.anon_user exists, set session['au'] to anon_user.id
"""
csrf_form = FlaskForm()
if csrf_form.validate_on_submit():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if not g.user and not g.anon_user and csrf_form.validate_on_submit():


if g.anon_user:
session['au'] = g.anon_user.id
g.esession.load_from_cache(session['au'], UserEvent)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this do? session['au'] contains a brand new id. How is it a cache reference?

g.anon_user = None # Could change below
g.event_data = {} # Views can add data to the current pageview event
g.anon_user = None
g.event_data = {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keep the comments.

else:
if g.user:
anon_user.user = g.user
session.pop('au', None)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Under if g.user: add g.anon_user = None, and then an else clause before the following line. You don't want to have both g.user and g.anon_user

if g.esession:
session['es'] = g.esession.uuid

if g.anon_user and 'impressions' in session:
# Run this in the foreground since we need this later in the request for A/B display consistency.
# This is most likely being called from the UI-non-blocking sniffle.gif anyway.
save_impressions(g.esession.id, session.pop('impressions').values(), now)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move this to a function that is called both here and in the the anonsession endpoint after setting g.anon_user. Comment needs to change as well.

@@ -301,7 +257,8 @@ def record_views_and_events(response):
jobpost_id=g.jobpost_viewed[0],
bgroup=g.jobpost_viewed[1])
else:
g.esession.save_to_cache(session['au'])
if 'au' in session:
g.esession.save_to_cache(session['au'])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cache has to be saved before we have an anonymous user. It's not relevant after.

@jace
Copy link
Member

jace commented Apr 27, 2018

@iambibhas Here is how this is supposed to work.

First, we recognise three states for a user's identity:

  1. Unknown: We know nothing about this client. It's new, and it could be a bot or a user, but we don't know yet.
  2. Anon User: The client is behaving like a user using a browser, so we believe this is a human.
  3. User: The user has explicitly logged in.

When the user transitions from stage 2 to stage 3, we want to preserve their browsing history. This is why we do g.anon_user.user = g.user following which we set g.anon_user = None. The session is retained on the anon user account, because the browsing activity was anonymous, but we link the anon user account to the regular user account for future use (it's meant for keeping track of A/B test behaviour: if they were shown A or B before they logged in, we need to remember which one they were shown even after they logged in).

The transition from stage 1 to 2 is trickier. At stage 1, it could be a scraping bot causing a lot of traffic. We don't want to track session history for that, but we also don't want to lose the history of a client that turns out to be an anon user. Therefore:

  1. Assign a probe cookie session['au'] = uuid4() to every incoming request.
  2. Create a non-persistent EventSession and give it the same user id. Save it to Redis cache with a 5 minute timeout.
  3. When the anonsession API endpoint is called (previously, the sniffle endpoint), create an AnonUser account with the same userid, retrieve the session from cache, and now save it to the database.

A bot will also create a Redis EventSession entry, but it will be auto-discarded in five minutes.

if 'impression' in session:
rq_save_impressions(g.esession.id, session.pop('impressions').values(), now, delay=False)

return Response({'status': 'ok'})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be jsonified.

if g.anon_user:
session['au'] = g.anon_user.id
if 'impression' in session:
rq_save_impressions(g.esession.id, session.pop('impressions').values(), now, delay=False)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo. impression or impressions?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use the main save_impressions function.

def rq_save_impressions(session_id, impressions, viewed_time, delay=True):
func = save_impressions.delay if delay else save_impressions
print func, session_id
func(session_id, impressions, viewed_time)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need this function. Call save_impressions or save_impressions.delay directly from wherever you need it.

if 'au' in session and session['au'] is not None:
if unicode(session['au']).startswith('test'):
# old test token that we no longer need
session.pop('au', None)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

session.pop('au') since it's guaranteed present. If it's not, that's definitely an error worth raising for us to investigate.

if g.anon_user and 'impressions' in session:
# Run this in the foreground
# since we need this later in the request for A/B display consistency.
rq_save_impressions(g.esession.id, session.pop('impressions').values(), now, delay=False)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use regular save_impressions.

@@ -192,7 +150,7 @@ def load_user_data(user):

@app.after_request
def record_views_and_events(response):
if hasattr(g, 'db_commit_needed') and g.db_commit_needed:
if len(db.session.dirty) > 0 or len(db.session.new) > 0:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd move this into a generic function in Coaster so that this call can be commit_if_necessary(db.session)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or we could just do db.session.commit() without an if condition. What is the impact of committing an empty transaction?

@CLAassistant
Copy link

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Anon user tracking
3 participants