Core Detector and Recognizer¶
The detector and recognizer classes are the core of the package. They provide wrappers for the underlying Keras models.
(weights='clovaai_general', load_from_torch=False, optimizer='adam', backbone_name='vgg')[source]¶ A text detector using the CRAFT architecture.
- Parameters
weights – The weights to use for the model. Currently, only clovaai_general is supported.
load_from_torch – Whether to load the weights from the original PyTorch weights.
optimizer – The optimizer to use for training the model.
backbone_name – The backbone to use. Currently, only ‘vgg’ is supported.
(images, detection_threshold=0.7, text_threshold=0.4, link_threshold=0.4, size_threshold=10, **kwargs)[source]¶ Detect the text in a set of images.
- Parameters
images – Can be a list of numpy arrays of shape HxWx3 or a list of filepaths.
link_threshold – This is the same as text_threshold, but is applied to the link map instead of the text map.
detection_threshold – We want to avoid including boxes that may have represented large regions of low confidence text predictions. To do this, we do a final check for each word box to make sure the maximum confidence value exceeds some detection threshold. This is the threshold used for this check.
text_threshold – When the text map is processed, it is converted from confidence values (floats from zero to one) to classification values (0 for not text, 1 for text) using binary thresholding. The threshold value determines the breakpoint at which a value is converted to a 1 or a 0. For example, if the threshold is 0.4 and the value for a particular point on the text map is 0.5, that value gets converted to a 1. The higher this value is, the less likely it is that characters will be merged together into a single word. The lower this value is, the more likely it is that non-text will be detected. Therein lies the balance.
size_threshold – The minimum area for a word.
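The thresholding steps described in these parameters can be sketched on a tiny hand-written score map. This is a minimal illustration only, not the package's implementation: the helper names `binarize` and `passes_detection_check`, the representation of a word region as a list of positions, and the use of strict comparisons are all assumptions.

```python
def binarize(score_map, threshold):
    """Convert confidence values (0.0 to 1.0) into 0/1 labels."""
    return [[1 if value > threshold else 0 for value in row] for row in score_map]


def passes_detection_check(score_map, region, detection_threshold):
    """Keep a candidate word only if its peak confidence exceeds the threshold.

    region is a list of (row, col) positions belonging to one candidate word
    (a hypothetical representation chosen for this sketch).
    """
    peak = max(score_map[row][col] for row, col in region)
    return peak > detection_threshold


text_map = [
    [0.10, 0.50, 0.45],
    [0.05, 0.39, 0.80],
]
mask = binarize(text_map, threshold=0.4)

strong_region = [(0, 2), (1, 2)]  # peak confidence 0.80
weak_region = [(0, 1), (0, 2)]    # peak confidence 0.50
keep_strong = passes_detection_check(text_map, strong_region, 0.7)
keep_weak = passes_detection_check(text_map, weak_region, 0.7)
```

With a text threshold of 0.4, the 0.5 entry becomes a 1 and the 0.39 entry becomes a 0. The weak region survives binarization but would be dropped by the final detection check.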
(image_generator, batch_size=8, heatmap_size=512, heatmap_distance_ratio=1.5)[source]¶ Get a generator of X, y batches to train the detector.
- Parameters
image_generator – A generator with the same signature as keras_ocr.tools.get_image_generator. Optionally, a third entry in the tuple (beyond image and lines) can be provided which will be interpreted as the sample weight.
batch_size – The size of batches to generate.
heatmap_size – The size of the heatmap to pass to get_gaussian_heatmap
heatmap_distance_ratio – The distance ratio to pass to get_gaussian_heatmap. The larger the value, the more tightly concentrated the heatmap becomes.
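To illustrate why a larger distance ratio concentrates the heatmap, here is a rough pure-Python sketch of an isotropic Gaussian heatmap. The function name `gaussian_heatmap`, the scaling of distances by half the heatmap size, and the value range are assumptions made for this example and may differ from what get_gaussian_heatmap actually does.

```python
import math


def gaussian_heatmap(size, distance_ratio):
    """Return a size x size grid of values between 0 and 1, peaking at the center."""
    half = size / 2
    center = (size - 1) / 2
    heatmap = []
    for row in range(size):
        values = []
        for col in range(size):
            distance = math.hypot(row - center, col - center)
            # Assumption: distances are expressed relative to half the map size
            # and then multiplied by the distance ratio before the Gaussian.
            scaled = distance * distance_ratio / half
            values.append(math.exp(-0.5 * scaled ** 2))
        heatmap.append(values)
    return heatmap


wide = gaussian_heatmap(size=9, distance_ratio=1.5)
tight = gaussian_heatmap(size=9, distance_ratio=3.0)
```

Both maps peak at the center, but the values of `tight` fall off faster toward the edges than those of `wide`.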
(alphabet=None, weights='kurapan', build_params=None)[source]¶ A text recognizer using the CRNN architecture.
- Parameters
alphabet – The alphabet the model should recognize.
build_params – A dictionary of build parameters for the model. See keras_ocr.recognition.build_model for details.
weights – The starting weight configuration for the model.
include_top – Whether to include the final classification layer in the model (set to False to use a custom alphabet).
(image_generator, batch_size=8, lowercase=False)[source]¶ Generate batches of training data from an image generator. The generator should yield tuples of (image, sentence) where image contains a single line of text and sentence is a string representing the contents of the image. If a sample weight is desired, it can be provided as a third entry in the tuple, making each tuple an (image, sentence, weight) tuple.
- Parameters
image_generator – An image / sentence tuple generator. The images should be in color even if the OCR is set up to handle grayscale, as they will be converted here.
batch_size – How many images to generate at a time.
lowercase – Whether to convert all characters to lowercase before encoding.
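The tuple handling described above (an optional third entry for the sample weight, optional lowercasing, fixed-size batches) can be sketched as follows. This is not the package's batch generator: `batch_samples` is a hypothetical helper, the default weight of 1.0 is an assumption, images are stood in for by placeholder strings, and the real generator also encodes sentences for the model.

```python
def batch_samples(sample_generator, batch_size=8, lowercase=False):
    """Group (image, sentence) or (image, sentence, weight) tuples into batches."""
    images, sentences, weights = [], [], []
    for sample in sample_generator:
        image, sentence = sample[0], sample[1]
        # Assumption: samples without an explicit weight count as 1.0.
        weight = sample[2] if len(sample) > 2 else 1.0
        images.append(image)
        sentences.append(sentence.lower() if lowercase else sentence)
        weights.append(weight)
        if len(images) == batch_size:
            yield images, sentences, weights
            images, sentences, weights = [], [], []
    # Any incomplete final batch is dropped in this sketch.


samples = [
    ("img0", "Hello"),
    ("img1", "World", 0.5),
    ("img2", "OCR"),
    ("img3", "Text", 2.0),
]
batches = list(batch_samples(iter(samples), batch_size=2, lowercase=True))
```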
Data Generation¶
The data_generation module contains the functions for generating synthetic data.
(width, height, fontsize, M, contour, minarea=0.5)[source]¶ Compute the permitted drawing contour on a padded canvas for an image of a given size. We assume the canvas is padded with one full image width and height on left and right, top and bottom respectively.
- Parameters
width – Width of image
height – Height of image
fontsize – Size of characters
M – The transformation matrix
contour – The contour to which we are limited inside the rectangle of size width / height
minarea – The minimum area required for a character slot to qualify as being visible, expressed as a fraction of the untransformed fontsize x fontsize slot.
(image_generator, max_string_length, target_width, target_height, margin=0)[source]¶ Convert an image generator created by get_image_generator to (image, sentence) tuples for training a recognizer.
- Parameters
image_generator – An image generator created by get_image_generator
max_string_length – The maximum string length to allow
target_width – The width to warp lines into
target_height – The height to warp lines into
margin – The margin to apply around a single line.
(lines)[source]¶ Convert a series of lines, each consisting of (box, character) tuples, into a multi-line string.
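As a rough sketch of this conversion, the example below joins the characters of each line and separates lines with newlines. The helper name `lines_to_text` and the choice of a newline separator are assumptions; the boxes are ignored here.

```python
def lines_to_text(lines):
    """Join the characters of each line, then join the lines with newlines."""
    return "\n".join(
        "".join(character for _, character in line) for line in lines
    )


# A placeholder character box: four (x, y) points, clockwise from the top left.
box = [[0, 0], [10, 0], [10, 10], [0, 10]]
lines = [
    [(box, "H"), (box, "i")],
    [(box, "O"), (box, "K")],
]
text = lines_to_text(lines)
```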
(text, fontsize, height, width, fonts, use_ligatures=False, thetaX=0, thetaY=0, thetaZ=0, color=(0, 0, 0), permitted_contour=None, draw_contour=False)[source]¶ Get a transparent image containing text.
- Parameters
text – The text to draw on the image
fontsize – The size of text to show.
height – The height of the output image
width – The width of the output image
fonts – A dictionary of {subalphabet: paths_to_font}
thetaX – Rotation about the X axis
thetaY – Rotation about the Y axis
thetaZ – Rotation about the Z axis
color – The color of drawn text
permitted_contour – A contour defining the part of the image in which text may be placed. If None, the entire canvas is permitted for text.
use_ligatures – Whether to render ligatures. If True, ligatures are always used (with an initial check for support which sometimes yields false positives). If False, ligatures are never used.
- Returns
An (image, lines) tuple where image is the transparent text image and lines is a list of lines where each line itself is a list of (box, character) tuples and box is an array of points with shape (4, 2) providing the coordinates of the character box in clockwise order starting from the top left.
(filepath, alphabet)[source]¶ Verify that a font contains a specific set of characters.
- Parameters
filepath – Path to the font file
alphabet – A string of characters to check for.
(cache_dir=None)[source]¶ Download a set of pre-reviewed backgrounds.
- Parameters
cache_dir – Where to save the dataset. By default, data will be saved to ~/.keras-ocr.
- Returns
A list of background filepaths.
(cache_dir=None, alphabet='abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789', exclude_smallcaps=False)[source]¶ Download a set of pre-reviewed fonts.
- Parameters
cache_dir – Where to save the dataset. By default, data will be saved to ~/.keras-ocr.
alphabet – An alphabet which we will use to exclude fonts that are missing relevant characters. By default, this is set to string.ascii_letters + string.digits.
exclude_smallcaps – If True, fonts that are known to use the same glyph for lowercase and uppercase characters are excluded.
- Returns
A list of font filepaths.
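The default alphabet shown in the signature is exactly string.ascii_letters + string.digits, which the snippet below confirms. The filtering part is only a sketch of the idea of excluding fonts with missing characters: `missing_characters`, `filter_fonts`, and the font paths are hypothetical, and a real check would read the character coverage from the font file itself.

```python
import string

default_alphabet = string.ascii_letters + string.digits


def missing_characters(supported, alphabet=default_alphabet):
    """Return the characters of alphabet that are absent from supported."""
    return sorted(set(alphabet) - set(supported))


def filter_fonts(font_characters, alphabet=default_alphabet):
    """Keep only fonts that cover every character in alphabet.

    font_characters is a hypothetical dict of {font_path: supported_characters}.
    """
    return [
        path
        for path, supported in font_characters.items()
        if not missing_characters(supported, alphabet)
    ]


font_characters = {
    "fonts/complete.ttf": default_alphabet + ".,!?",
    "fonts/digits_only.ttf": string.digits,
}
usable = filter_fonts(font_characters)
```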
(height, width, font_groups, text_generator, font_size=18, backgrounds=None, background_crop_mode='crop', rotationX=0, rotationY=0, rotationZ=0, margin=0, use_ligatures=False, augmenter=None, draw_contour=False, draw_contour_text=False)[source]¶ Create a generator for images containing text.
- Parameters
height – The height of the generated image
width – The width of the generated image.
font_groups – A dict mapping of { subalphabet: [path_to_font1, path_to_font2] }.
text_generator – See get_text_generator
font_size – The font size to use. Alternatively, supply a tuple and the font size will be randomly selected between the two values.
backgrounds – A list of paths to image backgrounds or actual images as numpy arrays with channels in RGB order.
background_crop_mode – One of letterbox or crop, indicates how backgrounds will be resized to fit on the canvas.
rotationX – The X-axis text rotation to use. Alternatively, supply a tuple and the rotation will be randomly selected between the two values.
rotationY – The Y-axis text rotation to use. Alternatively, supply a tuple and the rotation will be randomly selected between the two values.
rotationZ – The Z-axis text rotation to use. Alternatively, supply a tuple and the rotation will be randomly selected between the two values.
margin – The minimum margin around the edge of the image.
use_ligatures – Whether to render ligatures (see draw_text_image)
augmenter – An image augmenter to be applied to backgrounds
draw_contour – Draw the permitted contour onto images (debugging only)
draw_contour_text – Draw the permitted contour inside the text drawing function.
- Yields
Tuples of (image, lines) where image is the transparent text image and lines is a list of lines where each line itself is a list of (box, character) tuples and box is an array of points with shape (4, 2) providing the coordinates of the character box in clockwise order starting from the top left.
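Several parameters above (font_size, rotationX, rotationY, rotationZ) accept either a fixed value or a tuple of two bounds. One way to handle such a parameter is sketched below. The helper name `resolve_value` and the use of uniform sampling are assumptions; the documentation only states that the value is randomly selected between the two bounds.

```python
import random


def resolve_value(value, rng=random):
    """Return value unchanged, or sample between the bounds if it is a tuple."""
    if isinstance(value, tuple):
        low, high = value
        # Assumption: sampling is uniform between the two bounds.
        return rng.uniform(low, high)
    return value


rng = random.Random(0)  # seeded so the example is repeatable
fixed_font_size = resolve_value(18, rng)
sampled_font_size = resolve_value((10, 40), rng)
sampled_rotation = resolve_value((-0.05, 0.05), rng)
```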
(image, fontsize, margin=0)[source]¶ Get the largest possible contour of light or dark area in an image.
- Parameters
image – The image in which to find a contiguous area.
fontsize – The fontsize for text. Will be used for blurring and for determining useful areas.
margin – The minimum margin required around the image.
- Returns
A (contour, isDark) tuple. If no contour is found, both entries will be None.
(width, height, thetaX=0, thetaY=0, thetaZ=0)[source]¶ Provide a rotation matrix about the center of a rectangle with a given width and height.
- Parameters
width – The width of the rectangle
height – The height of the rectangle
thetaX – Rotation about the X axis
thetaY – Rotation about the Y axis
thetaZ – Rotation about the Z axis
- Returns
A 3x3 transformation matrix
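To make the idea of rotating about the center of a rectangle concrete, the sketch below builds a 3x3 homogeneous matrix for a rotation about the Z axis only. It is a simplification: the X and Y rotations are omitted, angles are assumed to be in radians, and the helper names are hypothetical. The matrix is composed as translate-to-origin, rotate, translate-back.

```python
import math


def matmul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
        for i in range(3)
    ]


def rotation_about_center(width, height, theta_z=0.0):
    """3x3 matrix rotating by theta_z (radians, assumed) about the rectangle center."""
    cx, cy = width / 2, height / 2
    cos_t, sin_t = math.cos(theta_z), math.sin(theta_z)
    to_origin = [[1, 0, -cx], [0, 1, -cy], [0, 0, 1]]
    rotate = [[cos_t, -sin_t, 0], [sin_t, cos_t, 0], [0, 0, 1]]
    back = [[1, 0, cx], [0, 1, cy], [0, 0, 1]]
    return matmul(back, matmul(rotate, to_origin))


def apply_matrix(matrix, point):
    """Apply a 3x3 homogeneous transformation to an (x, y) point."""
    vector = (point[0], point[1], 1.0)
    tx, ty, tw = (sum(matrix[i][k] * vector[k] for k in range(3)) for i in range(3))
    return tx / tw, ty / tw


M = rotation_about_center(width=100, height=50, theta_z=math.pi / 2)
center_after = apply_matrix(M, (50, 25))
edge_after = apply_matrix(M, (100, 25))
```

The center of the rectangle maps to itself, while the midpoint of the right edge moves a quarter turn around it.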
(alphabet=None, lowercase=False, max_string_length=None)[source]¶ Generates strings of sentences using only the letters in alphabet.
- Parameters
alphabet – The alphabet of permitted characters
lowercase – Whether to convert all strings to lowercase.
max_string_length – The maximum length of the string
The tools module primarily contains convenience functions for reading images and downloading data.
(boxes, scale=1, boxes_format='boxes')[source]¶ Adjust boxes using a given scale and offset.
- Parameters
boxes – The boxes to adjust
boxes_format – The format for the boxes. See the drawBoxes function for an explanation on the options.
scale – The scale to apply
(boxes, augmenter, image=None, boxes_format='boxes', image_shape=None, area_threshold=0.5, min_area=None)[source]¶ Augment an image and associated boxes together.
- Parameters
image – The image which we wish to apply the augmentation.
boxes – The boxes that will be augmented together with the image
boxes_format – The format for the boxes. See the drawBoxes function for an explanation on the options.
image_shape – The shape of the input image if no image will be provided.
area_threshold – Fraction of bounding box that we require to be in augmented image to include it.
min_area – The minimum area for a character to be included.
(line)[source]¶ Combine a set of boxes in a line into a single bounding box.
- Parameters
line – A list of (box, character) entries
- Returns
A (box, text) tuple
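A simplified version of this combination step is sketched below. It assumes an axis-aligned result, which is enough for unrotated text; a rotated minimum-area rectangle would be needed for slanted lines. The helper name `combine_line_axis_aligned` is hypothetical.

```python
def combine_line_axis_aligned(line):
    """Merge the character boxes of a line into one axis-aligned box and its text."""
    xs = [x for box, _ in line for x, _ in box]
    ys = [y for box, _ in line for _, y in box]
    left, right = min(xs), max(xs)
    top, bottom = min(ys), max(ys)
    # Points are listed clockwise starting from the top left.
    box = [[left, top], [right, top], [right, bottom], [left, bottom]]
    text = "".join(character for _, character in line)
    return box, text


line = [
    ([[0, 0], [8, 0], [8, 12], [0, 12]], "a"),
    ([[10, 1], [18, 1], [18, 13], [10, 13]], "b"),
]
box, text = combine_line_axis_aligned(line)
```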
(url, sha256=None, cache_dir=None, verbose=True, filename=None)[source]¶ Download a file to a cache directory and verify it with a sha256 hash.
- Parameters
url – The file to download
sha256 – The sha256 hash to check. If the file already exists and the hash matches, we don’t download it again.
cache_dir – The directory in which to cache the file. The default is ~/.keras-ocr.
verbose – Whether to log progress
filename – The filename to use for the file. By default, the filename is derived from the URL.
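The hash check that decides whether a cached file can be reused might look like the sketch below, which uses only the standard library. The helper names `sha256sum` and `needs_download` are assumptions for this example, as is the choice to trust an existing file when no hash is supplied; the download itself is left out.

```python
import hashlib
import os
import tempfile


def sha256sum(filepath, chunk_size=65536):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(filepath, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_download(filepath, sha256=None):
    """Return True if the file is missing or does not match the expected hash."""
    if not os.path.isfile(filepath):
        return True
    if sha256 is None:
        # Assumption: without a hash, an existing file is reused as-is.
        return False
    return sha256sum(filepath) != sha256


expected = hashlib.sha256(b"example payload").hexdigest()
with tempfile.TemporaryDirectory() as cache_dir:
    filepath = os.path.join(cache_dir, "weights.bin")  # hypothetical filename
    missing_before = needs_download(filepath, expected)
    with open(filepath, "wb") as handle:
        handle.write(b"example payload")
    cached_ok = needs_download(filepath, expected)
    corrupted = needs_download(filepath, "0" * 64)
```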
(image, predictions, ax=None)[source]¶ Draw text annotations onto image.
- Parameters
image – The image on which to draw
predictions – The predictions as provided by pipeline.recognize.
ax – A matplotlib axis on which to draw.
(image, boxes, color=(255, 0, 0), thickness=5, boxes_format='boxes')[source]¶ Draw boxes onto an image.
- Parameters
image – The image on which to draw the boxes.
boxes – The boxes to draw.
color – The color for each box.
thickness – The thickness for each box.
boxes_format – The format used for providing the boxes. Options are “boxes”, which indicates an array with shape (N, 4, 2), where N is the number of boxes and each box is a list of four points, as provided by keras_ocr.detection.Detector.detect; “lines” (a list of lines where each line itself is a list of (box, character) tuples), as provided by keras_ocr.data_generation.get_image_generator; or “predictions”, where boxes is itself a list of (word, box) tuples, as provided by keras_ocr.pipeline.Pipeline.recognize or keras_ocr.recognition.Recognizer.recognize_from_boxes.
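The three formats differ only in how each box is wrapped. The sketch below reduces each of them to a plain list of boxes, using nested lists in place of arrays. The helper name `to_boxes` is hypothetical and is not part of the package.

```python
def to_boxes(boxes, boxes_format="boxes"):
    """Reduce any of the three documented formats to a plain list of boxes."""
    if boxes_format == "boxes":
        return list(boxes)
    if boxes_format == "lines":
        # Each line is a list of (box, character) tuples.
        return [box for line in boxes for box, _ in line]
    if boxes_format == "predictions":
        # Each prediction is a (word, box) tuple.
        return [box for _, box in boxes]
    raise ValueError(f"Unsupported boxes_format: {boxes_format}")


box_a = [[0, 0], [10, 0], [10, 10], [0, 10]]
box_b = [[12, 0], [22, 0], [22, 10], [12, 10]]

from_boxes = to_boxes([box_a, box_b], "boxes")
from_lines = to_boxes([[(box_a, "h"), (box_b, "i")]], "lines")
from_predictions = to_boxes([("hi", box_a), ("there", box_b)], "predictions")
```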
(image, width, height, cval=255, mode='letterbox', return_scale=False)[source]¶ Obtain a new image, fit to the specified size.
- Parameters
image – The input image
width – The new width
height – The new height
cval – The constant value to use to fill the remaining areas of the image
return_scale – Whether to return the scale used for the image
- Returns
The new image
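The two fitting modes can be understood through the scale they choose. In the sketch below, letterbox is assumed to use the smaller of the two axis scales so the whole image fits with padding, and crop the larger so the target is fully covered. The helper name `fit_geometry` is hypothetical and only the geometry is computed; no pixels are resized.

```python
def fit_geometry(image_width, image_height, width, height, mode="letterbox"):
    """Return the scale and the resized (width, height) before padding or cropping."""
    scale_x = width / image_width
    scale_y = height / image_height
    if mode == "letterbox":
        scale = min(scale_x, scale_y)  # whole image visible, remainder padded
    elif mode == "crop":
        scale = max(scale_x, scale_y)  # target fully covered, excess cropped
    else:
        raise ValueError(f"Unsupported mode: {mode}")
    return scale, (round(image_width * scale), round(image_height * scale))


letterbox = fit_geometry(200, 100, width=100, height=100, mode="letterbox")
crop = fit_geometry(200, 100, width=100, height=100, mode="crop")
```

For a 200x100 image fit to 100x100, letterbox scales by 0.5 and leaves 50 rows to fill with cval, while crop keeps the original scale and discards the excess width.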
(line)[source]¶ Given a list of (box, character) tuples, return a revised line with a consistent ordering of left-to-right or top-to-bottom, with each box provided with (top-left, top-right, bottom-right, bottom-left) ordering.
- Returns
A tuple that is the fixed line as well as a string indicating whether the line is horizontal or vertical.
(points)[source]¶ Obtain the parameters of a rotated box.
- Returns
The vertices of the rotated box in top-left, top-right, bottom-right, bottom-left order along with the angle of rotation about the bottom left corner.
(box)[source]¶ Returns the width and height of a rotated rectangle
- Parameters
box – A list of four points starting in the top left corner and moving clockwise.
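For a box given as four clockwise points, the width and height can be estimated from the lengths of its edges. The sketch below averages opposite edges, which is a choice made for this example rather than something the documentation specifies; `box_size` is a hypothetical helper and `math.dist` requires Python 3.8 or later.

```python
import math


def box_size(box):
    """Return (width, height) of a box given as four clockwise points."""
    top_left, top_right, bottom_right, bottom_left = box
    # Averaging opposite edges tolerates slightly non-rectangular boxes.
    width = (math.dist(top_left, top_right) + math.dist(bottom_left, bottom_right)) / 2
    height = (math.dist(top_left, bottom_left) + math.dist(top_right, bottom_right)) / 2
    return width, height


# A 10 x 5 rectangle rotated about its top-left corner.
rotated_box = [(0, 0), (8, 6), (5, 10), (-3, 4)]
width, height = box_size(rotated_box)
```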
(image, width, height, cval=255)[source]¶ Pad an image to a desired size. Raises an exception if image is larger than desired size.
- Parameters
image – The input image
width – The output width
height – The output height
cval – The value to use for filling the image.
(filepath_or_buffer)[source]¶ Read a file into an image object
- Parameters
filepath_or_buffer – The path to the file, a URL, or any object with a read method (such as io.BytesIO)
(filepath_or_array, width, height, cval=255, mode='letterbox')[source]¶ Read an image from disk and fit to the specified size.
- Parameters
filepath_or_array – The path to the image or a numpy array of shape HxWx3
width – The new width
height – The new height
cval – The constant value to use to fill the remaining areas of the image
mode – The mode to pass to “fit” (crop or letterbox)
- Returns
The new image
(image, max_scale, max_size)[source]¶ Obtain the optimal resized image subject to a maximum scale and maximum size.
- Parameters
image – The input image
max_scale – The maximum scale to apply
max_size – The maximum size to return
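One plausible reading of these two limits is sketched below: the image is scaled up by at most max_scale, and less if that would push its longest side past max_size. Treating max_size as a bound on the longest side is an assumption, and `resize_scale` is a hypothetical helper that only computes the scale.

```python
def resize_scale(image_width, image_height, max_scale, max_size):
    """Return the largest scale that respects both max_scale and max_size."""
    longest_side = max(image_width, image_height)
    # Assumption: max_size limits the longest side of the resized image.
    return min(max_scale, max_size / longest_side)


limited_by_size = resize_scale(1000, 500, max_scale=2, max_size=1500)
limited_by_scale = resize_scale(100, 50, max_scale=2, max_size=1500)
```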
(image, box, target_height=None, target_width=None, margin=0, cval=None, return_transform=False, skip_rotate=False)[source]¶ Warp a boxed region in an image given by a set of four points into a rectangle with a specified width and height. Useful for taking crops of distorted or rotated text.
- Parameters
image – The image from which to take the box
box – A list of four points starting in the top left corner and moving clockwise.
target_height – The height of the output rectangle
target_width – The width of the output rectangle
return_transform – Whether to return the transformation matrix with the image.
The datasets module contains functions for using data from public datasets. See the fine-tuning detector and fine-tuning recognizer examples.
(split='train', cache_dir=None)[source]¶ Get a list of (filepath, box, word) tuples from the BornDigital dataset. This dataset comes pre-cropped so box is always None.
- Parameters
split – Which split to get (train, test, or traintest)
cache_dir – The directory in which to cache the file. The default is ~/.keras-ocr.
- Returns
A recognition dataset as a list of (filepath, box, word) tuples
(split='train', cache_dir=None, limit=None, legible_only=False, english_only=False, return_raw_labels=False)[source]¶ Get a list of (filepath, box, word) tuples from the COCO-Text dataset.
- Parameters
split – Which split to get (train, val, or trainval)
limit – Limit the number of files included in the download
cache_dir – The directory in which to cache the file. The default is ~/.keras-ocr.
return_raw_labels – Whether to return the raw labels object
- Returns
A recognition dataset as a list of (filepath, box, word) tuples. If return_raw_labels is True, you will also get a (labels, images_dir) tuple containing the raw COCO data and the directory in which you can find the images.
(labels, width, height, augmenter=None, area_threshold=0.5, focused=False, min_area=None, shuffle=True)[source]¶ Generate augmented (image, lines) tuples from a list of (filepath, lines, confidence) tuples. Confidence is not used right now but is included for a future release that uses semi-supervised data.
- Parameters
labels – A list of (image, lines, confidence) tuples.
augmenter – An augmenter to apply to the images.
width – The width to use for output images
height – The height to use for output images
area_threshold – The area threshold to use to keep characters in augmented images.
min_area – The minimum area for a character to be included.
focused – Whether to pre-crop images to a width/height region that contains text.
shuffle – Whether to shuffle the data on each iteration.
(cache_dir=None, skip_illegible=False)[source]¶ Get the ICDAR 2013 text segmentation dataset for detector training. Only the training set has the necessary annotations. For the test set, only segmentation maps are provided, which do not provide the necessary information for affinity scores.
- Parameters
cache_dir – The directory in which to store the data.
skip_illegible – Whether to skip illegible characters.
- Returns
Lists of (image_path, lines, confidence) tuples. Confidence is always 1 for this dataset. We record confidence to allow for future support for weakly supervised cases.
(cache_dir=None)[source]¶ Get a list of (filepath, box, word) tuples from the ICDAR 2013 dataset.
- Parameters
cache_dir – The directory in which to cache the file. The default is ~/.keras-ocr.
- Returns
A recognition dataset as a list of (filepath, box, word) tuples
(cache_dir=None)[source]¶ EXPERIMENTAL. Get a semisupervised labeled version of the ICDAR 2019 dataset. Only images with Latin-only scripts are available at this time.
- Parameters
cache_dir – The cache directory to use.
(labels, height, width, alphabet, augmenter=None, shuffle=True)[source]¶ Generate augmented (image, text) tuples from a list of (filepath, box, label) tuples.
- Parameters
labels – A list of (filepath, box, label) tuples
height – The height of the images to return
width – The width of the images to return
alphabet – The alphabet which limits the characters returned
augmenter – The augmenter to apply to images
shuffle – Whether to shuffle the dataset on each iteration
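How the alphabet limits the returned text is not spelled out above. One plausible interpretation, sketched here, is to drop characters outside the alphabet and skip samples left with no text. The helper name `restrict_to_alphabet`, that interpretation, and the file paths are all assumptions; image loading and augmentation are omitted.

```python
def restrict_to_alphabet(labels, alphabet):
    """Yield (filepath, box, text) with text limited to characters in alphabet."""
    for filepath, box, label in labels:
        text = "".join(character for character in label if character in alphabet)
        if text:  # skip samples with nothing left to recognize
            yield filepath, box, text


labels = [
    ("images/word_1.png", None, "hello!"),  # hypothetical paths
    ("images/word_2.png", None, "???"),
    ("images/word_3.png", None, "ocr2"),
]
filtered = list(restrict_to_alphabet(labels, alphabet="abcdefghijklmnopqrstuvwxyz0123456789"))
```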