Analyzer
This module contains two classes:
- Analyzer: a generic analyzer. It can be fed from both text strings and files. You can also store a representation of the analyzer's state and retrieve it later with the from_file class method or the load method.
- EnglishAnalyzer: a specialized analyzer for the English language.
- class freqens.analyzer.Analyzer(content=None)
  The class that performs the analysis. You can feed an analyzer from different sources (strings, files, ...) so that it extracts the target frequency distribution, and then ask it to score supplied content based on frequency similarity.
- choose_best(strings, n=1)
  Returns the n strings whose frequency distribution is most similar to the one fed to the analyzer.
  Parameters:
  - strings – an iterator over the strings among which the Analyzer will look for the best matches.
  - n – an integer specifying the number of strings to return.
  Returns: an iterable containing the n best strings, sorted by frequency similarity.
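
The selection described above can be pictured as "keep the n lowest scores". The following is a minimal sketch under stated assumptions, not the library's implementation: `toy_score` and `choose_best_sketch` are hypothetical names, and the scorer assumes relative character frequencies compared with Euclidean distance.

```python
import heapq
import math
from collections import Counter


def toy_score(target, content):
    """Hypothetical scorer: Euclidean distance between relative char frequencies."""
    a, b = Counter(target), Counter(content)
    total_a, total_b = sum(a.values()) or 1, sum(b.values()) or 1
    return math.sqrt(sum((a[k] / total_a - b[k] / total_b) ** 2 for k in set(a) | set(b)))


def choose_best_sketch(target, strings, n=1):
    """Return the n strings with the lowest score, best match first."""
    return heapq.nsmallest(n, strings, key=lambda s: toy_score(target, s))


candidates = ["zzzz", "hello world", "the cat sat"]
best = choose_best_sketch("the rat sat on the mat", candidates, n=2)
print(best)  # ['the cat sat', 'hello world']
```

`heapq.nsmallest` accepts any iterable, so the candidates can come from a generator without being materialized as a list first.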
- discard(chars)
  Removes the characters in chars from the counter.
  Parameters: chars – an iterable consisting of the characters whose frequency will be set to 0
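
As an illustration of what discarding characters means for a frequency counter, here is a small sketch that uses `collections.Counter` as a stand-in for the analyzer's internal state. The helper name `discard_sketch` is hypothetical, and the actual class may store its counts differently.

```python
from collections import Counter


def discard_sketch(counter, chars):
    """Drop every char in `chars` from the counter, i.e. treat its frequency as 0."""
    for char in chars:
        counter.pop(char, None)  # ignore chars that were never counted
    return counter


counts = Counter("hello, world!")
discard_sketch(counts, ",! ")
print(sorted(counts))  # ['d', 'e', 'h', 'l', 'o', 'r', 'w']
```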
- feed(content)
  Feeds the analyzer with a string.
  Parameters: content – the string to be fed to the analyzer
- feed_from_raw_file(filename)
  Feeds the analyzer with the content of a file. Every character is taken into account, including newline characters.
  Parameters: filename – the path of the file that will be fed to the analyzer
- classmethod from_file(filename)
  Reads a frequency distribution from a JSON file, as stored by the store method.
- classmethod from_raw_file(filename)
  Returns an analyzer whose frequency distribution is read from the file content.
- keys()
  Returns the characters whose frequency is greater than 0.
- load(filename)
  Loads a frequency distribution file and adds it to the current distribution.
- score(content)
  Assigns a score to any string. The smaller the score, the more similar the frequency distributions. A score of 0 means that the frequency distributions of the content and of the analyzer are equal.
  Parameters: content – the string to be scored.
  Returns: a float
- serialize()
  Returns a JSON representation of the analyzer.
  Returns: a string containing a JSON representation of the absolute frequencies the analyzer has been fed with.
- store(filename)
  Stores the JSON representation of the analyzer to a file.
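
The serialize, store, from_file and load entries together describe a JSON round trip of absolute frequencies. The sketch below imitates that round trip with the standard library only. It assumes the JSON is a flat object mapping characters to counts, which this page does not confirm, and the file name is a placeholder.

```python
import json
import os
import tempfile
from collections import Counter

counts = Counter("banana")

# Serialize: absolute frequencies as a JSON object (assumed layout).
serialized = json.dumps(counts)

# Store, then read back, using a temporary directory as a stand-in location.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "frequencies.json")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(serialized)
    with open(path, encoding="utf-8") as handle:
        restored = Counter(json.load(handle))

# Loading into a counter that already holds data adds the counts together.
merged = Counter("bandana")
merged.update(restored)
print(merged["a"])  # 6
```

Storing absolute rather than relative frequencies is what makes the additive behaviour of load straightforward: two stored distributions can be summed without knowing how much text produced each one.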
- transform_keys(transformation)
  Maps the keys to new keys to get a new frequency distribution.
  The relative frequencies of keys that map to the same new key are added together to get the new frequency distribution.
  Parameters: transformation – a callable object that maps chars to chars
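
To make the merging of keys concrete, the sketch below applies a transformation to a `collections.Counter`. The helper name `transform_keys_sketch` is hypothetical, and the counter is only a stand-in for the analyzer's state. It sums absolute counts, which yields the same relative frequencies because the total number of characters does not change.

```python
from collections import Counter


def transform_keys_sketch(counter, transformation):
    """Build a new counter keyed by transformation(key); colliding keys are summed."""
    transformed = Counter()
    for key, count in counter.items():
        transformed[transformation(key)] += count
    return transformed


counts = Counter("Anna")  # 'A': 1, 'n': 2, 'a': 1
folded = transform_keys_sketch(counts, str.lower)
print(folded["a"], folded["n"])  # 2 2
```

Here `str.lower` plays the role of the transformation, which is one way a case-insensitive distribution could be derived from a case-sensitive one.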
- class freqens.analyzer.EnglishAnalyzer(blank_spaces=True, case_sensitive=True, just_alpha=False)
  An analyzer for the English language.
- freqens.analyzer.counter_distance(counter1, counter2)
  Euclidean distance on the frequency distribution space.
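
A Euclidean distance between two frequency distributions can be sketched as follows. This is an illustration, not the module's code: the function names are hypothetical, and normalizing to relative frequencies before comparing is an assumption, suggested by the score entry stating that equal distributions give 0.

```python
import math
from collections import Counter


def relative_frequencies(counter):
    """Normalize absolute counts so they sum to 1 (an empty counter gives {})."""
    total = sum(counter.values())
    return {key: count / total for key, count in counter.items()} if total else {}


def euclidean_counter_distance(counter1, counter2):
    """Euclidean distance between two counters, compared as relative frequencies."""
    freq1 = relative_frequencies(counter1)
    freq2 = relative_frequencies(counter2)
    keys = set(freq1) | set(freq2)
    return math.sqrt(sum((freq1.get(k, 0.0) - freq2.get(k, 0.0)) ** 2 for k in keys))


same_shape = euclidean_counter_distance(Counter("aab"), Counter("bbaaaa"))
disjoint = euclidean_counter_distance(Counter("a"), Counter("b"))
```

Under this assumption, two texts of different lengths but identical proportions are at distance 0, and two texts with no characters in common are at distance sqrt(2) when each consists of a single distinct character.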