HackMD
    • Sharing Link copied
    • /edit
    • View mode
      • Edit mode
      • View mode
      • Book mode
      • Slide mode
      Edit mode View mode Book mode Slide mode
    • Note Permission
    • Read
      • Only me
      • Signed-in users
      • Everyone
      Only me Signed-in users Everyone
    • Write
      • Only me
      • Signed-in users
      • Everyone
      Only me Signed-in users Everyone
    • More (Comment, Invitee)
    • Publishing
    • Commenting Enable
      Disabled Forbidden Owners Signed-in users Everyone
    • Permission
      • Forbidden
      • Owners
      • Signed-in users
      • Everyone
    • Invitee
    • No invitee
    • Options
    • Versions
    • Transfer ownership
    • Delete this note
    • Template
    • Save as template
    • Insert from template
    • Export
    • Google Drive Export to Google Drive
    • Gist
    • Import
    • Google Drive Import from Google Drive
    • Gist
    • Clipboard
    • Download
    • Markdown
    • HTML
    • Raw HTML
Menu Sharing Help
Menu
Options
Versions Transfer ownership Delete this note
Export
Google Drive Export to Google Drive Gist
Import
Google Drive Import from Google Drive Gist Clipboard
Download
Markdown HTML Raw HTML
Back
Sharing
Sharing Link copied
/edit
View mode
  • Edit mode
  • View mode
  • Book mode
  • Slide mode
Edit mode View mode Book mode Slide mode
Note Permission
Read
Only me
  • Only me
  • Signed-in users
  • Everyone
Only me Signed-in users Everyone
Write
Only me
  • Only me
  • Signed-in users
  • Everyone
Only me Signed-in users Everyone
More (Comment, Invitee)
Publishing
More (Comment, Invitee)
Commenting Enable
Disabled Forbidden Owners Signed-in users Everyone
Permission
Owners
  • Forbidden
  • Owners
  • Signed-in users
  • Everyone
Invitee
No invitee
   owned this note    owned this note      
Published Linked with
Like BookmarkBookmarked
Subscribed
  • Any changes
    Be notified of any changes
  • Mention me
    Be notified of mention me
  • Unsubscribe
Subscribe
--- tags: cofacts, google analytics, GA, GA4, universal analytics, UA GA: UA-98468513-3 --- # Cofacts GA4 Migration design doc ## :globe_with_meridians: Site Goal: - Migrate from Universal analytics to GA4 (with Google Tag Manager) Tasks: - [x] Simplify GA integration - Remove BE ga tracking RSS endpoint calls, use nginx log instead - Remove GA tracking ID, it can be hard coded on tagmanager side; update README and `.env.sample` - [x] investigate difference in UA and GA4 stats - Within error range (<5%) - [ ] investigate if RSS's `utm_source` still work in GA4 - [x] investigate if article ID in page views can be tracked correctly - Leverage [recommended event](https://developers.google.com/analytics/devguides/collection/ga4/reference/events?hl=en&client_type=gtm#view_item) - Article detail --> `view_item` w/ article ID + each reply IDs. - Reply detail --> `view_item` w/ reply ID only. ### `view_item` event and parameters When viewing article or reply detail page, [`view_item` event](https://developers.google.com/analytics/devguides/collection/ga4/reference/events?hl=en&client_type=gtm#view_item) will be triggered when the datail page's data is loaded. When visiting an article page, `view_item` event is fired with the following event fields: | BigQuery field | Description | Sample | | -------- | -------- | -------- | | `items[].item_id` | Viewed article ID or reply ID in the article detail | `IfvpToYBC7Q3lHuUI-c0` | | `items[].item_category` | type for `item_id`, can be `Article` or `Reply` | `Article` | | `items[].item_list_name` | | `article detail` | | `items[].item_list_id` | The article id of this detail page | `3c8bmf2taopxw` | | `items[].item_list_index` | Only for replies in article page. Records the position of this reply at the moment, start with 0. | `0` | When visiting a reply page, `view_item` event is fired with the following event fields: | BigQuery field | Description | Sample | | -------- | -------- | -------- | | `items[].item_id` | Viewed reply ID of the reply detail | `IfvpToYBC7Q3lHuUI-c0` | | `items[].item_category` | Reply detail only list reply itself | `Reply` | | `items[].item_list_name` | | `reply detail` | | `items[].item_list_id` | The reply id of this detail page | `IfvpToYBC7Q3lHuUI-c0` | ## :robot_face: rumors-line-bot Goals: - LIFF: Migrate from Universal analytics to GA4 (with Google Tag Manager) - Webhook: send events directly to BigQuery Tasks: - [x] integrate [BigQuery insert](https://cloud.google.com/bigquery/docs/streaming-data-into-bigquery) - [x] replace universal-analytics occurrences with own functions - inside these functions we still call ua() so that we send to both BQ and GA in the sake time to verify implementation - [x] implement tag manager for LIFF - send both GA and GA4 using tag manager - Same GA4 property for website - compare GA & GA4 stats - [ ] Monitor cost ### `view_item` event and parameters Identical to website's article detail page. When viewing article LIFF page, [`view_item` event](https://developers.google.com/analytics/devguides/collection/ga4/reference/events?hl=en&client_type=gtm#view_item) will be triggered when the datail page's data is loaded. When visiting an article LIFF, `view_item` event is fired with the following event fields: | BigQuery field | Description | Sample | | -------- | -------- | -------- | | `items[].item_id` | Viewed article ID, or ID of replies shown in the article detail | `IfvpToYBC7Q3lHuUI-c0` | | `items[].item_category` | type for `item_id`, can be `Article` or `Reply` | `Article` | | `items[].item_list_name` | | `article liff` | | `items[].item_list_id` | The article id of this detail page | `3c8bmf2taopxw` | | `items[].item_list_index` | Only for replies in article page. Records the position of this reply at the moment, start with 0. | `0` | ## :electric_plug: API Goal: Retireve analytics from BigQuery - [x] rewrite cron job tp read from BQ instead of GA reporting API ### Details of web In universal analytics, we extract article ID from URL and content grouping. They are actually not reliable and can [cause weird stuff entering `analytics` index](https://github.com/cofacts/rumors-api/issues/298). In GA4, rumors-site now leverages [`view_item` event](#view_item-event-and-parameters) with article ID and exposed reply IDs right from GraphQL. This gives us: 1. Accurate view count for articles 2. Ability to calculate reply view count under each article views :::spoiler Previous attempt #### SQL ```sql SELECT event_timestamp, event_name, items.item_id, items.item_category, items.item_list_id, items.item_list_name, items.item_list_index FROM `..._intraday_...`, UNNEST(items) AS items WHERE event_name = "view_item" LIMIT 1000 ``` #### Result ![](https://s3-ap-northeast-1.amazonaws.com/g0v-hackmd-images/uploads/upload_8a4f257d924bd34db864a3f626f20753.png) ::: #### `webVisit`, `webUser` For analytics, we can just count `item_id` to get the view count of that specific article / reply. In addition, we can also list the view count of a specific reply (in `item_id`, `item_cateogry` being `Reply`), and the view count distribution (can come from reply detail, or from different article details). ##### SQL ```sql SELECT event_date, item_category, item_id, count(*) as webVisit,COUNT(DISTINCT user_pseudo_id) as webUser FROM `${GA4_DATASET}.events_*`, UNNEST (items) WHERE event_name = 'view_item' AND stream_id = '${GA_WEB_STREAM_ID}' AND ${event_date condition} group by event_date, item_category, item_id ``` ##### Verification ![](https://s3-ap-northeast-1.amazonaws.com/g0v-hackmd-images/uploads/upload_1977e83d8ee27a07f4aec6376a432065.png) Web visit & web user of SQL result matches looker studio. Note: we cannot verify reply count because Looker Studio do not allow us to calculate event count under a certain `item_id` (we use "page path" on the screenshot above, which works foraritcle detail page). #### `liff` ##### SQL ```sql WITH t AS ( SELECT event_date, item_category, item_id, struct( collected_traffic_source.manual_source as source, count(*) as visit, COUNT(DISTINCT user_pseudo_id) as user ) as liffObj FROM `${GA4_DATASET}.events_*`, UNNEST (items) WHERE event_name = 'view_item' AND stream_id = '${GA_LIFF_STREAM_ID}' -- LIFF stream GROUP BY event_date, item_category, item_id, collected_traffic_source.manual_source ) SELECT event_date, item_category, item_id, ARRAY_AGG(liffObj) AS liff FROM t GROUP BY event_date, item_category, item_id ``` - We should use `collected_traffic_source.manual_source` to match UA stat - The field `collected_traffic_source` is newly added on 2023/06/05 ![](https://s3-ap-northeast-1.amazonaws.com/g0v-hackmd-images/uploads/upload_b34454736ef95118473a6d06162e438b.png). [Its documentation](https://support.google.com/analytics/answer/7029846?hl=en#zippy=%2Ctraffic-source%2Ccollected-traffic-source) was [not there in 2023 May](https://web.archive.org/web/20230521012849/https://support.google.com/analytics/answer/7029846?hl=en). - If we use `traffic_source.source` (cross-channel source), we will include an extra "(direct)" for `tmcheck` with exactly 1 user & 1 view count. Also, it is not populated in intra-day tables. - To create liff array with counts inside, we must 1. first group by (event_date, item_category, item_id, source) to calculate count 2. then merge liff objects of same (event_date, item_category, item_id) into array `liff` ##### Verification ![](https://s3-ap-northeast-1.amazonaws.com/g0v-hackmd-images/uploads/upload_778cb09e4543218bc9375c4052c45da7.png) ### Details of LINE bot #### `lineUser`, `lineVisit` ##### SQL ```sql SELECT DATE(evt.time, "${TIMEZONE}") as event_date, evt.category AS item_category, evt.label AS item_id, count(*) as lineVisit, COUNT(DISTINCT userId) as lineUser FROM `${LINE_BOT_EVENT_DATASET}.events`, UNNEST (events) as evt WHERE evt.action = 'Selected' GROUP BY event_date, item_category, item_id ``` ##### Verification ![](https://s3-ap-northeast-1.amazonaws.com/g0v-hackmd-images/uploads/upload_39aaf0763d1653ee1f744de1d5f02b7f.png) Note that UA contains JA bot's group usage, thus has more users. But for less popular messages the view count is somewhat consistent. ### Combined SQL Constants - LINE_BOT_EVENT_DATASET - GA4_DATASET - GA_WEB_STREAM_ID - GA_LIFF_STREAM_ID Parameters - startDate - endDate ```sql WITH lineStats AS ( SELECT FORMAT_DATE("%Y%m%d", DATE(createdAt, "+08:00")) AS event_date, evt.category AS item_category, evt.label AS item_id, COUNT(*) AS lineVisit, COUNT(DISTINCT userId) AS lineUser FROM `industrious-eye-145611.line_bot.events`, UNNEST (events) AS evt WHERE evt.action = 'Selected' -- Use createdAt so that BQ can select correct partition AND createdAt >= TIMESTAMP("2023-05-10T00:00:00.000+08:00") AND createdAt <= TIMESTAMP("2023-05-20T23:59:59.999+08:00") GROUP BY event_date, item_category, item_id ), webStats AS ( SELECT event_date, item_category, item_id, COUNT(*) AS webVisit, COUNT(DISTINCT user_pseudo_id) AS webUser FROM `industrious-eye-145611.analytics_315002472.events_*`, UNNEST (items) WHERE event_name = 'view_item' AND stream_id = '3561738274' -- Optimize BQ table selection AND (_table_suffix between '20230510' and '20230520' OR _table_suffix between 'intraday_20230510' and 'intraday_20230520') GROUP BY event_date, item_category, item_id ), liffStats AS ( WITH t AS ( SELECT event_date, item_category, item_id, struct( collected_traffic_source.manual_source as source, count(*) as visit, COUNT(DISTINCT user_pseudo_id) as user ) as liffObj FROM `industrious-eye-145611.analytics_315002472.events_*`, UNNEST (items) WHERE event_name = 'view_item' AND stream_id = '4697477273' -- LIFF stream -- Optimize BQ table selection AND (_table_suffix between '20230510' and '20230520' OR _table_suffix between 'intraday_20230510' and 'intraday_20230520') GROUP BY event_date, item_category, item_id, collected_traffic_source.manual_source ) SELECT event_date, item_category, item_id, ARRAY_AGG(liffObj) AS liff FROM t GROUP BY event_date, item_category, item_id ) SELECT event_date AS dateStr, LOWER(item_category) AS type, item_id AS docId, STRUCT(lineUser, lineVisit, webUser, webVisit, liff) AS stat FROM lineStats FULL JOIN webStats USING (event_date, item_category, item_id) FULL JOIN liffStats USING (event_date, item_category, item_id) ``` #### Notes on `_table_suffix` It is possible that intraday tables contain 2 days. ![](https://s3-ap-northeast-1.amazonaws.com/g0v-hackmd-images/uploads/upload_25d3204b869ae50224ee502b7b3509ad.png) Therefore when we do `_table_suffix` filter, we should handle both the case that has `intraday_` and no `intraday_` together. ## Analysis of LINE bot number drop ### 2203-6-16~22 ![](https://s3-ap-northeast-1.amazonaws.com/g0v-hackmd-images/uploads/upload_a2413b8c758a40f81a7961b441b3b290.png) Event / category that drops: - Article / Search - UserInput / ArticleSearch - UserInput / MessageType - Article / Selected - Reply / Selected - Group / Leave - Group / Join Event / category that does not drop: - Reply / Type - ContextProxy / Forward - Reply / Search - Tutorial / Step - Article / NoReply - Article / Create - UserInput / IsForwarded - UserInput / ChatWithBot - Cronjob / Send Notification Obsolete: UserInput/Feedback-Vote (LIFF via Tag Manager) ### 2203-5-20~6-22 ![](https://s3-ap-northeast-1.amazonaws.com/g0v-hackmd-images/uploads/upload_029a6c5b7b24f06d09fd6c8be65425e4.png) Event / category that drops: - Article / Search (15% drop) - UserInput / ArticleSearch (11% drop) - UserInput / MessageType (1% drop) - Article / Selected (6% drop) - Reply / Selected (2% drop) - Tutorial / Step (Just by 1, omitable) - Article / NoReply (Just by 1, omitable) - Group / Join - Group / Leave Event / category that does not drop - Reply / Search - Reply / Type (Actually BQ has +1 more) - ContentProxy / Forward (Actually BQ has +1 more) - UserInput / IsForwarded (BQ Has 1% more...) - Article / Create (BQ has 1% more...) - UserInput / ChatWithBot - Cronjob / Send Notification Segment by category & message source: ![](https://s3-ap-northeast-1.amazonaws.com/g0v-hackmd-images/uploads/upload_972833217554ccfc92aa0db94247825a.png) Dropping combinations - UserInput / user (just by 0.04%) - Article / group (by 33%) - UserInput / group (by 49%) - Tutorial / user (just by 1) - Reply / group (by 50%) - Group / group (by 62%) - Article / room (by 62%) - UserInput / room (by 72%) Not dropping combinations - Article / user (BQ has 0.04% more) - Reply / user (BQ has +2 more) - Cronjob / user ### Focus on 6/10 Check filter alignment: if message source is user, the numbers are accurate ![](https://s3-ap-northeast-1.amazonaws.com/g0v-hackmd-images/uploads/upload_7e393583c18a77e9e4e5fe299e01f4fb.png) Group: UA > BQ ![](https://s3-ap-northeast-1.amazonaws.com/g0v-hackmd-images/uploads/upload_2269d736bc3e0934d311451725fa07ff.png) ![](https://s3-ap-northeast-1.amazonaws.com/g0v-hackmd-images/uploads/upload_11a476ae0e988ee14b902957e40c60ec.png) Even when category / action / label are the same, the count is still missing. - For the same line of code, sometimes it works, but some times it's not sent. - Only happens for group chat. #### Matching logs One `UserInput / ArticleSearch / ArticleFound` should match to one `console.debug('[GROUP]')` ![](https://s3-ap-northeast-1.amazonaws.com/g0v-hackmd-images/uploads/upload_4de4a5c71f5d3e50d6516e0d7e158ac7.png) zh log stream: 127 records ![](https://s3-ap-northeast-1.amazonaws.com/g0v-hackmd-images/uploads/upload_583ea8a4f236cdda27e7ce37915eeb8d.png) zh, en log stream: 261 records (BQ result) ![](https://s3-ap-northeast-1.amazonaws.com/g0v-hackmd-images/uploads/upload_eb953deb3a43d6661040c3a8129b21e1.png) zh, en, ja log stream: 401 records (UA result) ![](https://s3-ap-northeast-1.amazonaws.com/g0v-hackmd-images/uploads/upload_14b5cce9413871986826a8c60277d3bc.png) :::success #### Conclusion The number difference of UA and BQ is due to JA bots not updated to latest version. - Users of EN & JA bot users almost do not use 1-1; groups are used instead - We do remember to update EN bot to latest, but not JA bots. JA bots are not connected to BQ at all, hence the difference. :::

Import from clipboard

Advanced permission required

Your current role can only read. Ask the system administrator to acquire write and comment permission.

This team is disabled

Sorry, this team is disabled. You can't edit this note.

This note is locked

Sorry, only owner can edit this note.

Reach the limit

Sorry, you've reached the max length this note can be.
Please reduce the content or divide it to more notes, thank you!

Import from Gist

Import from Snippet

or

Export to Snippet

Are you sure?

Do you really want to delete this note?
All users will lost their connection.

Create a note from template

Create a note from template

Oops...
This template has been removed or transferred.


Upgrade

All
  • All
  • Team
No template.

Create a template


Upgrade

Delete template

Do you really want to delete this template?

This page need refresh

You have an incompatible client version.
Refresh to update.
New version available!
See releases notes here
Refresh to enjoy new features.
Your user state has changed.
Refresh to load new user state.

Sign in

Forgot password

or

Sign in via GitHub

New to HackMD? Sign up

Help

  • English
  • 中文
  • 日本語

Documents

Tutorials

Book Mode Tutorial

Slide Example

YAML Metadata

Resources

Releases

Blog

Policy

Terms

Privacy

Cheatsheet

Syntax Example Reference
# Header Header 基本排版
- Unordered List
  • Unordered List
1. Ordered List
  1. Ordered List
- [ ] Todo List
  • Todo List
> Blockquote
Blockquote
**Bold font** Bold font
*Italics font* Italics font
~~Strikethrough~~ Strikethrough
19^th^ 19th
H~2~O H2O
++Inserted text++ Inserted text
==Marked text== Marked text
[link text](https:// "title") Link
![image alt](https:// "title") Image
`Code` Code 在筆記中貼入程式碼
```javascript
var i = 0;
```
var i = 0;
:smile: :smile: Emoji list
{%youtube youtube_id %} Externals
$L^aT_eX$ LaTeX
:::info
This is a alert area.
:::

This is a alert area.

Versions

Versions

Upgrade now

Version named by    

More Less
  • Edit
  • Delete

Note content is identical to the latest version.
Compare with
    Choose a version
    No search result
    Version not found

Feedback

Submission failed, please try again

Thanks for your support.

On a scale of 0-10, how likely is it that you would recommend HackMD to your friends, family or business associates?

Please give us some advice and help us improve HackMD.

 

Thanks for your feedback

Remove version name

Do you want to remove this version name and description?

Transfer ownership

Transfer to
    Warning: is a public team. If you transfer note to this team, everyone on the web can find and read this note.