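---
tags: 0archive, disinfo
---

# 0archive Operation Management

## How do you operate and manage your own 0archive?

> [name=chihao] To be organized

## Goals

- Use cloud storage more efficiently
- Back up data reliably

## Action items

- [x] Delete the 40G of middle2 dump files [name=ronny]
    - 79G of free space is now available
- [x] ArticleSnapshot202002 -> jsonl -> bzip2 -> m2 disinfo db [name=pm5]
- [x] ArticleSnapshot202002 -> jsonl -> bzip2 -> objstorage [name=pm5]
- [x] ArticleSnapshot202003 -> jsonl -> bzip2 -> objstorage [name=pm5]
- [x] Daily: tables other than snapshot -> sql dump -> bzip2 -> objstorage [name=pm5]
- [x] Daily: new snapshots -> jsonl -> bzip2 -> objstorage [name=pm5]
- [x] Provision 1TB of linode objstorage [name=ronny]
- [x] Manually stop snapshotting old Ptt articles [name=wenyi]
    - Following the rule "articles older than 2 months are no longer updated", articles from before 2/2 are no longer updated
    - The publication time can be read from the url
- [x] Implement: add policy: an Article whose first snapshot is more than 2 months old is no longer updated [name=wenyi]
- [x] Implement: update freq: once each on days 1, 2, and 3, and a 4th time on day 7, so 1 Article = 4 ArticleSnapshot [name=wenyi]
- [ ] ~~Implement: store every snapshot after the first as a diff~~
- [ ] ~~Implement: parser has to stitch the snapshots back together~~
- [ ] ~~Implement: parser has to finish parsing an Article within two months of its first snapshot~~

## Backup

- Tables other than snapshot
    - Daily sql dump -> objstorage
- What about snapshots?
    - Daily new ArticleSnapshot -> jsonl -> objstorage

## Implementation

- 320G db
    - ArticleSnapshot202003: 129G
    - ArticleSnapshot202002: 42G
    - ArticleSnapshot: 16G
    - binlog: 49G (only three days are kept; this is the write log, and the figure is presumably the compressed size)
    - taoyuan-chu: 16G
    - Plus various odds and ends
- Status as of 2020/4/2 12:40

| Filesystem | Size | Used | Avail | Use% | Mounted on |
| --- | --- | --- | --- | --- | --- |
| /dev/sdc | 314G | 253G | 46G | 85% | /mnt/m2-disinfo-mysql-320g-1 |

> Size > Used + Avail? That is because df holds back a buffer just in case
> At 80% usage we already need to respond [name=ronny]

- 160G native
- 40G of middle2 dump files
    - Delete!
- 70G of accidentally deleted files
- snapshot archive: once a snapshot table is more than 2 months old, archive it to offline storage
- take fewer snapshots?
    - Current: 7, daily for a week
    - Change: 3? once every 2 days
    - Change: once a day for the first three days, then once more on the seventh day, four in total
- Use a different snapshot freq for each site type?
    - Content farms
    - State media
    - News sites
    - Fb pages
    - Fb public groups
    - Forum boards
    - YouTube channels
    - YouTube accounts
- ~~Skip storing a snapshot when nothing has changed?~~
- Just store a diff for each snapshot
    - Faster than readability
    - But snapshot archive might delete the 1st snapshot
    - Diff against the 1st snapshot or the previous one?
        - Lazy option: always against the first :p [name=chihao]
    - Option 1: we archive once every 2 months and only track updates for 1 week, so there should be no problem under the current scheme
        - Add policy: an Article whose first snapshot is more than 2 months old is no longer updated
    - Option 2: store Snapshot and SnapshotDiff separately; Snapshot holds only the 1st snapshot

### Snapshot archive

- Where to store it?
    - Linode Object Storage
        - 1 TB = $20/mo
        - 1 TB Outbound Transfer
        - Up to 50 Million Objects per Cluster
        - $.02/GB Additional Storage
        - $.01/GB Additional Outbound Transferred
    - NAS
    - B2 Cloud Storage
    - AWS Glacier
    - AWS S3
    - GCP Storage
    - GCP Storage Nearline
- What format to store it in?
    - JSONLines
        - Cons: we have to hand-roll it, and it takes a bit more space
        - Pros: more flexible to work with
        - script done [name=pm5]
    - encoding
        - unicode -> utf8
        - Done

## Discussion

> The more fundamental fix is to move 202002 out, then implement "no snapshot when the content has not changed" and "archive after two months" as soon as possible. With that, the current disk space should be plenty.
>
> - Move 202002 out
> - Space-saving mechanisms
>     - Implement "no snapshot when the content has not changed"
>     - Archive snapshots older than two months
>
> [name=ronny]

> Was the disk sized from an estimate of 5 years of usage? Clearly way off XD [name=pm5]

> Should "no snapshot when the content has not changed" compare the raw HTML or the parsed body text? Readability also takes a while to run, so this could slow down crawling [name=pm5]

> And a parser update would also affect the "content has not changed" check [name=chihao]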
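
The snapshot policy and the `jsonl -> bzip2` archive step from the notes above can be sketched roughly as follows. This is a minimal illustration, not the project's actual script: the function names, the row layout, and the 60-day approximation of "2 months" are all assumptions, and "days 1, 2, 3, and 7" is read here as offsets of 0, 1, 2, and 6 days from the first snapshot.

```python
import bz2
import json
from datetime import datetime, timedelta

# Assumption: day 1 is the day of the first snapshot, so days 1, 2, 3, 7
# become offsets of 0, 1, 2 and 6 days.
SNAPSHOT_OFFSETS_DAYS = (0, 1, 2, 6)
# Assumption: "2 months" approximated as 60 days.
UPDATE_CUTOFF = timedelta(days=60)


def snapshot_due_times(first_snapshot_at):
    """Return the four times at which an Article should be snapshotted."""
    return [first_snapshot_at + timedelta(days=d) for d in SNAPSHOT_OFFSETS_DAYS]


def should_update(first_snapshot_at, now):
    """Policy: stop updating once the first snapshot is 2 months old."""
    return now - first_snapshot_at < UPDATE_CUTOFF


def dump_jsonl_bz2(rows, path):
    """Write an iterable of dicts as UTF-8 JSON Lines, bzip2-compressed.

    Returns the number of rows written.
    """
    count = 0
    with bz2.open(path, "wt", encoding="utf-8") as f:
        for row in rows:
            # ensure_ascii=False keeps non-ASCII text as real UTF-8
            # instead of \uXXXX escapes, which saves space for CJK content.
            f.write(json.dumps(row, ensure_ascii=False) + "\n")
            count += 1
    return count


def load_jsonl_bz2(path):
    """Read a file written by dump_jsonl_bz2 back into a list of dicts."""
    with bz2.open(path, "rt", encoding="utf-8") as f:
        return [json.loads(line) for line in f]
```

Writing one JSON object per line means an archived file can be streamed row by row without loading a whole month of snapshots into memory. The resulting `.jsonl.bz2` file would then be uploaded to object storage by a separate step not shown here.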
