A Simpler Way to Do Spell Checking in Python
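A few recurring problems come up when doing spell checking in Python, and the way past them is simply to keep learning the language. While going through some old code over the past few days, I noticed that the comments I had written contained many misspelled words. The mistakes were not far off the correct spellings, so a tool should be able to fix the vast majority of them automatically.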
Writing a spell-checking script in Python is easy, and it gets even simpler if you make good use of small existing tools such as aspell/ispell.
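As a rough illustration of what "using aspell" involves: when started with the -a option, aspell speaks an ispell-compatible pipe protocol and answers a misspelled word with a line of the form "& word count offset: suggestion, suggestion, ...", while a correctly spelled word gets a lone "*". The sketch below only parses such a line. The function name parse_aspell_line and the sample line are assumptions made for this example; the sample was written by hand, not captured from a real aspell run.

```python
def parse_aspell_line(line):
    """Return the suggestions from one line of `aspell -a` output.

    Lines for misspelled words look like
    "& <word> <count> <offset>: <s1>, <s2>, ...". Any other line, such as
    "*" for a correctly spelled word, yields an empty list.
    """
    line = line.strip()
    if not line.startswith("&") or ": " not in line:
        return []
    # Everything after the first ": " is the comma-separated suggestion list.
    _, suggestions = line.split(": ", 1)
    return suggestions.split(", ")


# Hand-written sample line (hypothetical, not real aspell output).
sample = "& speling 3 0: spelling, spewing, spieling"
print(parse_aspell_line(sample))  # ['spelling', 'spewing', 'spieling']
print(parse_aspell_line("*"))     # []
```

Keeping the parsing in a small pure function like this makes it possible to try the logic without aspell installed.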
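Peter Norvig of Google wrote an article, How to Write a Spelling Corrector, that is well worth reading. He solves the spell-checking problem in 21 lines of Python, without any external tool; all it needs is a dictionary file read in beforehand. The edits1 function in this article's program is copied from his article.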
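To make the dictionary-only idea concrete before the aspell-based script, here is a simplified sketch in the spirit of that approach. It is not Norvig's actual code: the tiny WORD_COUNTS table stands in for a real dictionary file and is made-up data, and the name correct_from_dictionary is chosen for this example only.

```python
from collections import Counter

# Toy word counts standing in for a real dictionary file (hypothetical data).
WORD_COUNTS = Counter({"spelling": 50, "spewing": 2, "the": 500, "then": 80})
LETTERS = "abcdefghijklmnopqrstuvwxyz"


def edits1(word):
    """Every string one delete, transpose, replace, or insert away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in LETTERS]
    inserts = [a + c + b for a, b in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def correct_from_dictionary(word):
    """Return the most frequent known word within one edit, else word itself."""
    if word in WORD_COUNTS:
        return word
    known = [w for w in edits1(word) if w in WORD_COUNTS]
    return max(known, key=WORD_COUNTS.get) if known else word


print(correct_from_dictionary("speling"))  # spelling
print(correct_from_dictionary("teh"))      # the
```

The only "model" here is the word-count table: candidates are generated by brute force and the most frequent known one wins, which is why no external tool is needed.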
#!/usr/bin/env python3
# A simple spell checker
# written by http://www.vpsee.com
import signal
import subprocess
import sys

alphabet = 'abcdefghijklmnopqrstuvwxyz'


def found(word, args, cwd=None, shell=True):
    """Run the checker given by args and return its suggestions for word."""
    child = subprocess.Popen(args,
                             shell=shell,
                             stdin=subprocess.PIPE,
                             stdout=subprocess.PIPE,
                             cwd=cwd,
                             universal_newlines=True)
    # skip the version banner that aspell prints on startup
    child.stdout.readline()
    (stdout, stderr) = child.communicate(word)
    if ": " in stdout:
        # remove the trailing "\n\n"
        stdout = stdout.rstrip("\n")
        # drop the left part, up to and including the first ": "
        left, candidates = stdout.split(": ", 1)
        candidates = candidates.split(", ")
        # making an error on the first letter of a word is less
        # probable, so we move those candidates to the tail of the
        # queue to give them lower priority (built as two new lists,
        # because removing items from a list while looping over it
        # skips elements)
        same_first = [c for c in candidates if c[0] == word[0]]
        other_first = [c for c in candidates if c[0] != word[0]]
        return same_first + other_first
    else:
        return None


# copy from http://norvig.com/spell-correct.html
def edits1(word):
    n = len(word)
    return set([word[0:i]+word[i+1:] for i in range(n)] +
               [word[0:i]+word[i+1]+word[i]+word[i+2:] for i in range(n-1)] +
               [word[0:i]+c+word[i+1:] for i in range(n) for c in alphabet] +
               [word[0:i]+c+word[i:] for i in range(n+1) for c in alphabet])


def correct(word):
    candidates1 = found(word, 'aspell -a')
    if not candidates1:
        print("no suggestion")
        return
    candidates2 = edits1(word)
    # keep only the aspell suggestions that are one edit away from word
    candidates = []
    for candidate in candidates1:
        if candidate in candidates2:
            candidates.append(candidate)
    if not candidates:
        print("suggestion: %s" % candidates1[0])
    else:
        # max() picks the alphabetically last of the remaining candidates
        print("suggestion: %s" % max(candidates))


def signal_handler(signum, frame):
    # exit quietly on Ctrl-C
    sys.exit(0)


if __name__ == '__main__':
    signal.signal(signal.SIGINT, signal_handler)
    while True:
        line = input()
        correct(line)
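The selection step inside correct can be tried without aspell installed by feeding it a hand-written suggestion list. The sketch below is only an illustration under that assumption: pick is a hypothetical helper that mirrors the filtering in correct, and the suggestion lists are made up for the example rather than taken from real aspell output.

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def edits1(word):
    """All strings one delete, transpose, replace, or insert away from word."""
    n = len(word)
    return set(
        [word[:i] + word[i + 1:] for i in range(n)]
        + [word[:i] + word[i + 1] + word[i] + word[i + 2:] for i in range(n - 1)]
        + [word[:i] + c + word[i + 1:] for i in range(n) for c in ALPHABET]
        + [word[:i] + c + word[i:] for i in range(n + 1) for c in ALPHABET]
    )


def pick(word, suggestions):
    """Hypothetical helper: choose one suggestion the way correct() does."""
    if not suggestions:
        return None
    one_edit_away = edits1(word)
    close = [s for s in suggestions if s in one_edit_away]
    # Fall back to the checker's first suggestion when none is one edit away.
    return max(close) if close else suggestions[0]


# Made-up suggestion lists standing in for what a checker might return.
print(pick("speling", ["spelling", "spewing", "spieling"]))  # spieling
print(pick("acomodate", ["accommodate", "accommodated"]))    # accommodate
```

Note that max compares strings alphabetically, so when several suggestions are one edit away the first call returns spieling rather than spelling. Taking the first element of the filtered list instead would keep the checker's own ranking; which behavior is preferable is a design choice.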
That covers one way to approach spell checking in Python.