Statistical learning for data processing (scikit-learn tutorial)
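Divisive: a top-down approach. All observations start in a single cluster, which is split iteratively as one moves down the hierarchy. When a large number of clusters is expected, this approach is both slow (because all observations start as one cluster, which is split recursively) and statistically ill-posed.

Connectivity-constrained clustering: with agglomerative clustering, you can specify which samples may be clustered together by supplying a connectivity graph. In scikit-learn a graph is represented by its adjacency matrix, usually a sparse one. This is useful, for example, to obtain connected regions when clustering an image:

```python
import time

import numpy as np
import scipy.datasets
from sklearn.cluster import AgglomerativeClustering
from sklearn.feature_extraction.image import grid_to_graph

###############################################################################
# Generate data
# scipy.misc.lena() has been removed from SciPy; the grayscale "face" image is
# used instead (scipy.datasets needs SciPy >= 1.10 and downloads it via pooch).
face = scipy.datasets.face(gray=True).astype(float)  # float avoids uint8 overflow below
# Downsample the image by a factor of 4 (sum each 2x2 block of pixels)
face = face[::2, ::2] + face[1::2, ::2] + face[::2, 1::2] + face[1::2, 1::2]
X = np.reshape(face, (-1, 1))  # one sample per pixel, one feature (intensity)

###############################################################################
# Define the structure A of the data. Pixels are connected to their neighbors.
connectivity = grid_to_graph(*face.shape)

###############################################################################
# Compute clustering
print("Compute structured hierarchical clustering...")
st = time.time()
n_clusters = 15  # number of regions
ward = AgglomerativeClustering(n_clusters=n_clusters, linkage='ward',
                               connectivity=connectivity).fit(X)
label = np.reshape(ward.labels_, face.shape)
print("Elapsed time: ", time.time() - st)
print("Number of pixels: ", label.size)
print("Number of clusters: ", np.unique(label).size)
```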
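The image example above needs a download and runs on a large image, so here is a much smaller sketch of the same idea on a hypothetical 6x8 toy image (the array `img` is invented for illustration). It shows that `grid_to_graph` returns a sparse adjacency matrix over the pixels and that, with `connectivity` set, Ward can only merge clusters that touch in the image, so the two dark regions on either side of a bright band remain separate clusters.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.feature_extraction.image import grid_to_graph

# Hypothetical toy image: two dark regions (left and right) with identical
# intensity, separated by a bright vertical band in columns 3 and 4.
img = np.zeros((6, 8))
img[:, 3:5] = 10.0
X = img.reshape(-1, 1)  # one sample per pixel, one feature (intensity)

# Sparse adjacency matrix of the 6x8 pixel grid (neighbouring pixels are linked)
connectivity = grid_to_graph(*img.shape)
print(connectivity.shape)  # (48, 48)

# Without `connectivity`, Ward looks only at intensities, so the two dark
# regions, which have identical values, cannot be told apart. With it, only
# spatially adjacent clusters may be merged.
ward = AgglomerativeClustering(n_clusters=3, linkage="ward",
                               connectivity=connectivity).fit(X)
labels = ward.labels_.reshape(img.shape)
print(labels)
```

Pixels with identical intensity merge at zero cost, so each contiguous region is built up completely before any merge across a boundary; cutting the tree at three clusters then recovers the left region, the band, and the right region.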
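Feature agglomeration: when there are too few observations compared with the number of features, another remedy is to merge similar features together. This can be implemented by clustering in the feature direction, in other words by clustering the transposed data:

```python
import numpy as np
from sklearn import cluster, datasets
from sklearn.feature_extraction.image import grid_to_graph

digits = datasets.load_digits()
images = digits.images
X = np.reshape(images, (len(images), -1))       # one row of 64 pixels per image
connectivity = grid_to_graph(*images[0].shape)  # adjacency of the 8x8 pixel grid
agglo = cluster.FeatureAgglomeration(connectivity=connectivity,
                                     n_clusters=32)
agglo.fit(X)
X_reduced = agglo.transform(X)                  # 32 pooled features per image
X_approx = agglo.inverse_transform(X_reduced)   # back to 64 pixels
images_approx = np.reshape(X_approx, images.shape)
```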
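To see what `transform` and `inverse_transform` compute in the feature agglomeration above, the sketch below checks them by hand on the same digits data. It assumes the default `pooling_func` (`np.mean`), under which each reduced feature should be the mean of the pixels that `agglo.labels_` assigns to that cluster; `manual` is a name introduced here only for the comparison.

```python
import numpy as np
from sklearn import cluster, datasets
from sklearn.feature_extraction.image import grid_to_graph

digits = datasets.load_digits()
X = digits.images.reshape(len(digits.images), -1)  # (1797, 64)
connectivity = grid_to_graph(*digits.images[0].shape)

agglo = cluster.FeatureAgglomeration(connectivity=connectivity, n_clusters=32)
X_reduced = agglo.fit_transform(X)                 # (1797, 32)

# agglo.labels_[j] is the cluster of pixel j; with the default pooling_func
# (np.mean) each reduced feature is the mean of the pixels in its cluster.
manual = np.column_stack([X[:, agglo.labels_ == k].mean(axis=1)
                          for k in range(agglo.n_clusters)])
print(np.allclose(manual, X_reduced))  # True

# inverse_transform copies each cluster's value back to every pixel in it,
# so the approximation is piecewise constant over the pixel clusters.
X_approx = agglo.inverse_transform(X_reduced)
print(np.allclose(X_approx, X_reduced[:, agglo.labels_]))  # True
```

The reconstruction therefore loses the detail inside each group of neighbouring pixels, which is the price paid for going from 64 to 32 features.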
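(2) Decompositions: from a signal to components and loadings

Components and loadings: if X is our multivariate data, the problem we are trying to solve is to rewrite it on a different observational basis: we want to learn loadings L and a set of components C such that X = L C. Different criteria exist to choose the components.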
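A concrete instance of X = L C can be sketched with PCA as the decomposition (the random matrix `X` below is hypothetical data). Here `components_` plays the role of C and the output of `transform` plays the role of the loadings L; because PCA first subtracts the column means, the data are recovered as L C plus the mean. This is only one choice of criterion; other decompositions choose C differently.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))  # hypothetical data: 50 observations, 4 features

pca = PCA().fit(X)
C = pca.components_           # components: one basis vector per row, shape (4, 4)
L = pca.transform(X)          # loadings: coordinates of each observation, shape (50, 4)

# PCA centres the data before projecting, so X = L @ C + mean
X_rebuilt = L @ C + pca.mean_
print(np.allclose(X, X_rebuilt))  # True
```

All four components are kept here, so the reconstruction is exact; keeping fewer rows of C gives an approximation of X instead.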
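Principal component analysis (PCA) selects the successive components that explain the maximum variance in the signal. In the example below, the point cloud spanned by the observations is very flat in one direction: the third feature can be computed exactly from the other two (x3 = x1 + x2). PCA finds the directions in which the data is not flat. When used to transform data, PCA can reduce the dimensionality of the data by projecting it onto a principal subspace:

```python
import numpy as np
from sklearn import decomposition

# Create a signal with only 2 useful dimensions
x1 = np.random.normal(size=100)
x2 = np.random.normal(size=100)
x3 = x1 + x2
X = np.c_[x1, x2, x3]

pca = decomposition.PCA()
pca.fit(X)
print(pca.explained_variance_)
# As we can see, only the first 2 components are useful:
# the third explained variance is numerically zero.

pca.n_components = 2
X_reduced = pca.fit_transform(X)
print(X_reduced.shape)  # (100, 2)
```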
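The PCA step above can also be written directly with NumPy, which makes the flat direction visible. The sketch assumes that `explained_variance_` equals the squared singular values of the centred data divided by n_samples - 1; it illustrates the computation and does not describe which solver scikit-learn actually picks internally (it chooses one automatically).

```python
import numpy as np
from sklearn import decomposition

rng = np.random.default_rng(42)
x1 = rng.normal(size=100)
x2 = rng.normal(size=100)
X = np.c_[x1, x2, x1 + x2]  # third feature is an exact linear combination

# PCA by hand: centre the columns, then take the SVD of the centred matrix
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
explained_variance = s**2 / (len(X) - 1)  # variance along each principal axis

pca = decomposition.PCA().fit(X)
print(np.allclose(explained_variance, pca.explained_variance_))  # True
print(explained_variance[2] < 1e-10)  # True: the flat direction

# Projecting onto the first two principal axes gives the reduced data
X_reduced = Xc @ Vt[:2].T
print(X_reduced.shape)  # (100, 2)
```

The rows of `Vt` are the principal axes; individual axes may differ in sign from `pca.components_`, which does not change the variances.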

