Machine Learning Notes (Washington University) - Clustering Specialization - Week Five

1. Mixed membership model

A mixed membership model aims to discover a set of memberships per data point (e.g., a document can belong to several topics at once).

In contrast, clustering models aim to discover a single membership per data point.

In clustering:

  • one topic indicator zi per document i
  • all words come from (get scored under) the same topic zi
  • distribution on prevalence of topics across the corpus, π = [π1 ... πk]

In LDA:

  • one topic indicator ziw per word in doc i
  • each word gets scored under its topic ziw
  • distribution on prevalence of topics in the document, πi = [πi1 ... πik]

LDA inputs: set of words per doc for each doc in corpus

LDA outputs: corpus-wide topic vocab distributions, topic assignments per word, topic proportions per doc
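To make these inputs and outputs concrete, here is a minimal sketch using scikit-learn (its LatentDirichletAllocation fits LDA by variational inference rather than the Gibbs sampling covered below; the toy corpus and parameter choices are illustrative only):

```python
# Minimal sketch of LDA inputs/outputs with scikit-learn; toy corpus only.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "dynamic systems and control",
    "neural networks and deep learning",
    "learning dynamic models of networks",
]

# Input: bag-of-words counts per doc for each doc in the corpus
X = CountVectorizer().fit_transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

# Output: corpus-wide topic vocab distributions (one row per topic)
topic_vocab = lda.components_ / lda.components_.sum(axis=1, keepdims=True)

# Output: topic proportions per doc
doc_topic = lda.transform(X)
print(topic_vocab.shape, doc_topic.shape)
```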

Typically LDA is specified as a Bayesian model:

  • accounts for uncertainty in parameters when making predictions
  • naturally regularizes parameter estimates, in contrast to MLE

 

2. Gibbs sampling

Iterative random hard assignments

Predictions (two options):

  • make a prediction for each snapshot of randomly assigned variables/parameters, then average the predictions for the final result
  • or, use the single snapshot of randomly assigned variables/parameters that maximizes the joint model probability

Benefits:

  • intuitive updates
  • very straightforward to implement

Procedure (a code sketch follows this list):

  • (1) randomly reassign all ziw based on doc topic proportions and topic vocab distributions
  • (2) randomly reassign doc topic proportions based on the assignments ziw in the current doc
  • (3) repeat for all docs
  • (4) randomly reassign topic vocab distributions based on the assignments ziw in the entire corpus
  • (5) repeat steps (1)-(4) until max iter reached
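The sketch referenced above: a toy numpy implementation of this procedure, assuming the corpus is encoded as lists of word ids; the variable names (alpha, gamma, K, V, phi, pi) and toy data are my own, not the course's reference code.

```python
# Toy (uncollapsed) Gibbs sampler for LDA; data and hyperparameters are illustrative.
import numpy as np

rng = np.random.default_rng(0)

docs = [[0, 1, 2, 1], [2, 3, 3, 0], [1, 1, 4]]         # word ids per doc (toy data)
V, K = 5, 2                                            # vocab size, number of topics
alpha, gamma = 1.0, 1.0                                # Dirichlet smoothing params

phi = rng.dirichlet(np.full(V, gamma), size=K)         # topic vocab distributions (K x V)
pi = rng.dirichlet(np.full(K, alpha), size=len(docs))  # doc topic proportions (docs x K)
z = [np.zeros(len(d), dtype=int) for d in docs]        # topic indicator per word

for it in range(100):                                  # step (5): repeat until max iter
    for i, doc in enumerate(docs):
        # step (1): reassign each z_iw given doc proportions and topic vocab dists
        for w, word in enumerate(doc):
            p = pi[i] * phi[:, word]
            z[i][w] = rng.choice(K, p=p / p.sum())
        # step (2): reassign doc topic proportions given the z_iw in this doc
        counts = np.bincount(z[i], minlength=K)
        pi[i] = rng.dirichlet(alpha + counts)
    # steps (3)-(4): after all docs, reassign topic vocab dists given z_iw corpus-wide
    word_counts = np.zeros((K, V))
    for i, doc in enumerate(docs):
        for w, word in enumerate(doc):
            word_counts[z[i][w], word] += 1
    for k in range(K):
        phi[k] = rng.dirichlet(gamma + word_counts[k])
```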

3. Collapsed Gibbs sampling

Based on the special structure of the LDA model, we can sample just the indicator variables ziw.

no need to sample other parameters

  • corpus-wide topic vocab distributions
  • per-doc topic proportions

Procedure:

randomly reassign each ziw based on the current assignments zjv of all other words in the document and corpus.

How much doc i likes each topic k, based on the other assignments in doc i:

  (nik + α) / (Ni - 1 + Kα)

nik is the current count of words assigned to topic k in doc i

Ni is the number of words in doc i

K is the number of topics

α is the smoothing param from the Bayesian prior

How much each topic k likes the word "dynamic", based on the assignments in other docs in the corpus:

  (mdynamic,k + γ) / (Σw mw,k + Vγ)

mdynamic,k is the corpus-wide count of assignments of the word "dynamic" to topic k

γ is the smoothing param

V is the size of the vocab

probabilities = how much doc likes topic * how much topic likes word (normalize this product of terms over the K possible topics)

Based on these probabilities, draw a new assignment of ziw and increment the counts for that new assignment (a code sketch follows).
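The sketch referenced above: one collapsed-Gibbs update for the word at position w in doc i, assuming the counts are kept in a doc-by-topic array n and a vocab-by-topic array m (the function name, arguments, and array layout are illustrative, not from the course materials):

```python
import numpy as np

def resample_word(rng, z, i, w, word, n, m, alpha, gamma):
    """One collapsed-Gibbs update for word `word` at position w of doc i (illustrative)."""
    K = n.shape[1]          # number of topics
    V = m.shape[0]          # vocab size
    # remove this word's current assignment from the counts
    k_old = z[i][w]
    n[i, k_old] -= 1
    m[word, k_old] -= 1
    # how much doc i likes each topic, given its other assignments: (nik + α)/(Ni - 1 + Kα)
    doc_term = (n[i, :] + alpha) / (n[i, :].sum() + K * alpha)
    # how much each topic likes this word, given the corpus: (mword,k + γ)/(Σw mw,k + Vγ)
    topic_term = (m[word, :] + gamma) / (m.sum(axis=0) + V * gamma)
    # normalize the product over the K possible topics and sample a new topic
    p = doc_term * topic_term
    k_new = rng.choice(K, p=p / p.sum())
    # increment the counts based on the new assignment of z_iw
    z[i][w] = k_new
    n[i, k_new] += 1
    m[word, k_new] += 1
```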

What to do with the collapsed samples?

From the best sample of ziw, we can infer (see the sketch below):

  • topics, from the conditional distribution given the counts
  • a document embedding (per-doc topic proportions)
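A sketch of both quantities, assuming the same doc-by-topic counts n and vocab-by-topic counts m taken from the chosen sample, smoothed with the same α and γ priors (my notation, not the course's reference code):

```python
import numpy as np

def topics_and_embeddings(n, m, alpha, gamma):
    """Recover smoothed topic vocab distributions and doc topic proportions from counts."""
    K = n.shape[1]
    V = m.shape[0]
    # topic vocab distributions: each column of m smoothed by gamma and normalized
    topic_vocab = (m + gamma) / (m.sum(axis=0, keepdims=True) + V * gamma)   # V x K
    # document embedding: smoothed per-doc topic proportions
    doc_topic = (n + alpha) / (n.sum(axis=1, keepdims=True) + K * alpha)     # docs x K
    return topic_vocab, doc_topic
```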

 

Reposted from: https://www.cnblogs.com/climberclimb/p/6931411.html
