Goal: learn how to install Python packages from Anaconda Prompt, and use wordcloud + matplotlib to generate a simple English word cloud.
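Before running the rest of the notebook, it can help to confirm that both packages are importable. The snippet below is a minimal sketch using only the standard library; the helper name missing_packages is hypothetical and is not part of wordcloud or matplotlib. If something is missing, it prints the pip install command to run in Anaconda Prompt.

```python
import importlib.util


def missing_packages(names):
    """Return the package names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# The two packages this tutorial relies on.
required = ["wordcloud", "matplotlib"]
missing = missing_packages(required)

if missing:
    # Run the printed command in Anaconda Prompt, not in the Python interpreter.
    print("Run in Anaconda Prompt: pip install " + " ".join(missing))
else:
    print("All required packages are installed.")
```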
import matplotlib.pyplot as plt  # plotting library
from wordcloud import WordCloud, STOPWORDS  # word cloud generator and built-in stop words
The input to a word cloud is usually a block of text. An English example is used here.
The text can also be read from a file, e.g. open('text.txt').read().
data = """Sometimes people come into your life and you know right away that they were meant to be there,
they serve some sort of purpose, to teach you a lesson or help figure out who you are or who you want to become.
You never know who these people may be - your roommate, neighbor, professor, long lost friend, lover or even a complete stranger
who, when you lock eyes with them, you know that very moment that they will affect your life in some profound way."""
# Show the first 10 characters
print(data[:10])
Sometimes
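The file-based alternative mentioned above can be sketched as follows. This is a minimal example under a few assumptions: the helper load_text is hypothetical, the file is assumed to be UTF-8 encoded, and a temporary file stands in for a real text.txt so that the snippet runs on its own.

```python
import os
import tempfile


def load_text(path, encoding="utf-8"):
    """Read a whole text file into one string, closing the file afterwards."""
    with open(path, encoding=encoding) as f:
        return f.read()


# Demo only: create a small sample file in a temporary folder, then read it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "text.txt")
    with open(sample_path, "w", encoding="utf-8") as f:
        f.write("Sometimes people come into your life.")
    data_from_file = load_text(sample_path)

# The result is an ordinary string, so it can be sliced or passed to generate().
print(data_from_file[:10])
```

Passing the encoding explicitly avoids depending on the platform default, which on Windows is often not UTF-8.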
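Stop words are common words that should be ignored in the analysis, such as the, is and and.
STOPWORDS is the set of English stop words that ships with wordcloud; custom stop words can be added to it.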
stopwords = set(STOPWORDS)
print("Sample stop words:", list(stopwords)[:20])  # show some of the stop words
Sample stop words: ['the', 'how', 'is', "you're", 'above', 'have', 'below', 'what', 'while', 'during', 'from', "we've", 'each', "we'll", 'else', "hasn't", 'but', 'further', "they'd", "here's"]
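To make the idea of stop words concrete, here is a plain-Python illustration of filtering them out of a sentence. It is only a sketch: base_stopwords is a tiny hypothetical stand-in for set(STOPWORDS), and the regular-expression tokenizer is an assumption for this example, not the tokenizer wordcloud uses internally.

```python
import re

# Hypothetical mini stop-word set; the real STOPWORDS set is much larger.
base_stopwords = {"the", "is", "and", "you", "to", "a"}

# Copy the set, then extend it with custom stop words.
custom_stopwords = set(base_stopwords)
custom_stopwords.add("people")             # add a single word
custom_stopwords.update(["life", "know"])  # add several words at once

text = "Sometimes people come into your life and you know right away"

# Split into words, then keep only those that are not stop words.
words = re.findall(r"[A-Za-z']+", text)
kept = [w for w in words if w.lower() not in custom_stopwords]
print(kept)
```

The comparison is done on the lowercased word so that "People" and "people" are treated the same.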
Common parameters:
background_color: background color (e.g. 'white')
max_words: maximum number of words to display
stopwords: set of stop words
To generate the word cloud:
wc = WordCloud(...)
wc.generate(data)
wc = WordCloud(
    background_color='white',  # background color
    max_words=200,  # maximum number of words to display
    stopwords=stopwords  # set of stop words
)
wc.generate(data)  # generate the word cloud
<wordcloud.wordcloud.WordCloud at 0x28f7d536d60>
plt.imshow(wc, interpolation="bilinear")
plt.axis("off")  # hide the axes
plt.show()
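The combined effect of stopwords and max_words can be approximated in plain Python: count the words, drop the stop words, keep the most frequent ones, and scale the counts so the top word has weight 1.0. This is a simplified sketch, not wordcloud's implementation; the function top_words and the tokenizer are hypothetical, and features such as two-word collocations and plural normalization are ignored here.

```python
import re
from collections import Counter


def top_words(text, stopwords, max_words):
    """Return up to `max_words` words mapped to counts scaled to the range (0, 1]."""
    words = [w.lower() for w in re.findall(r"[A-Za-z']+", text)]
    counts = Counter(w for w in words if w not in stopwords)
    most_common = counts.most_common(max_words)
    if not most_common:
        return {}
    highest = most_common[0][1]  # count of the most frequent word
    return {word: count / highest for word, count in most_common}


sample = "life life life people people stranger the the the the"
weights = top_words(sample, stopwords={"the"}, max_words=2)
print(weights)
```

Here "the" is the most frequent token but is removed as a stop word, and "stranger" is dropped because only two words are kept.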
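Run pip install in Anaconda Prompt, not inside the Python interactive interpreter.
For Chinese text, segment the words first (for example with jieba) and specify a font file that supports Chinese; otherwise the characters will come out garbled.
Use stopwords.add('word') to add extra stop words.
A custom stop-word list can also be read from a .txt or .csv file, which makes it easier to maintain.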
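Keeping custom stop words in a text file, as suggested in the last note, might look like the sketch below. The assumptions are labeled in the code: the helper load_stopwords and the file name my_stopwords.txt are hypothetical, the file holds one word per line in UTF-8, and a small literal set stands in for set(STOPWORDS) so the snippet runs without wordcloud.

```python
import os
import tempfile


def load_stopwords(path, encoding="utf-8"):
    """Read one stop word per line, skipping blank lines and lowercasing each word."""
    with open(path, encoding=encoding) as f:
        return {line.strip().lower() for line in f if line.strip()}


# Demo only: write a small stop-word file to a temporary folder and load it.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "my_stopwords.txt")  # hypothetical file name
    with open(path, "w", encoding="utf-8") as f:
        f.write("people\nLife\n\nknow\n")
    extra_stopwords = load_stopwords(path)

# Stand-in for set(STOPWORDS); in the notebook, use the real set instead.
stopwords = {"the", "is", "and"}
stopwords.update(extra_stopwords)
print(sorted(stopwords))
```

The merged set can then be passed to WordCloud through the stopwords parameter, as in the cell above.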