API

Core Detector and Recognizer

The detector and recognizer classes are the core of the package. They provide wrappers for the underlying Keras models.

class keras_ocr.detection.Detector(weights='clovaai_general', load_from_torch=False, optimizer='adam', backbone_name='vgg')[source]

A text detector using the CRAFT architecture.

Parameters
  • weights – The weights to use for the model. Currently, only clovaai_general is supported.

  • load_from_torch – Whether to load the weights from the original PyTorch weights.

  • optimizer – The optimizer to use for training the model.

  • backbone_name – The backbone to use. Currently, only ‘vgg’ is supported.

detect(images, detection_threshold=0.7, text_threshold=0.4, link_threshold=0.4, size_threshold=10, **kwargs)[source]

Detect text in a set of images.

Parameters
  • images – Can be a list of numpy arrays of shape HxWx3 or a list of filepaths.

  • link_threshold – This is the same as text_threshold, but is applied to the link map instead of the text map.

  • detection_threshold – We want to avoid including boxes that may have represented large regions of low confidence text predictions. To do this, we do a final check for each word box to make sure the maximum confidence value exceeds some detection threshold. This is the threshold used for this check.

  • text_threshold – When the text map is processed, it is converted from confidence values (floats from zero to one) to classification values (0 for not text, 1 for text) using binary thresholding. The threshold determines the breakpoint at which a value is converted to a 1 or a 0. For example, if the threshold is 0.4 and the value at a particular point on the text map is 0.5, that value is converted to a 1. The higher the threshold, the less likely it is that characters will be merged together into a single word. The lower the threshold, the more likely it is that non-text will be detected. Choosing a value means balancing these two effects.

  • size_threshold – The minimum area for a word.
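
The two thresholds described above can be illustrated with a small pure-Python sketch. This is not keras-ocr's implementation, which works on model output arrays. The helper names binarize and passes_detection_check are hypothetical, and treating a value as text only when it is strictly greater than the threshold is an assumption.

```python
def binarize(score_map, threshold):
    """Convert confidences in [0, 1] to 1 (text) or 0 (not text)."""
    # Assumption: only values strictly above the threshold become 1.
    return [[1 if value > threshold else 0 for value in row] for row in score_map]


def passes_detection_check(region_scores, detection_threshold):
    """Keep a word box only if its peak confidence exceeds the threshold."""
    return max(region_scores) > detection_threshold


text_map = [
    [0.1, 0.5, 0.6],
    [0.2, 0.3, 0.9],
]
binary_map = binarize(text_map, threshold=0.4)
keep_box = passes_detection_check([0.5, 0.6, 0.9], detection_threshold=0.7)
weak_box = passes_detection_check([0.45, 0.5], detection_threshold=0.7)
```

Here weak_box survives binarization at 0.4 but is rejected by the final check, which is the case detection_threshold is meant to catch.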

get_batch_generator(image_generator, batch_size=8, heatmap_size=512, heatmap_distance_ratio=1.5)[source]

Get a generator of X, y batches to train the detector.

Parameters
  • image_generator – A generator with the same signature as keras_ocr.tools.get_image_generator. Optionally, a third entry in the tuple (beyond image and lines) can be provided which will be interpreted as the sample weight.

  • batch_size – The size of batches to generate.

  • heatmap_size – The size of the heatmap to pass to get_gaussian_heatmap

  • heatmap_distance_ratio – The distance ratio to pass to get_gaussian_heatmap. The larger the value, the more tightly concentrated the heatmap becomes.
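
To see why a larger heatmap_distance_ratio concentrates the heatmap, consider the following sketch. It assumes the heatmap is an isotropic Gaussian whose spread is controlled by the ratio; the exact formula used by get_gaussian_heatmap may differ, and gaussian_heatmap here is a hypothetical stand-in.

```python
import math


def gaussian_heatmap(size, distance_ratio):
    """Return a size x size grid of values in (0, 1], peaked at the center."""
    center = (size - 1) / 2
    half = size / 2
    heatmap = []
    for row in range(size):
        values = []
        for col in range(size):
            # Scale the pixel distance so a larger ratio makes values fall off faster.
            distance = math.hypot(col - center, row - center) * distance_ratio / half
            values.append(math.exp(-0.5 * distance ** 2))
        heatmap.append(values)
    return heatmap


wide = gaussian_heatmap(size=9, distance_ratio=1.5)
tight = gaussian_heatmap(size=9, distance_ratio=3.0)
```

At the same off-center pixel, tight holds a smaller value than wide, so more of its mass sits near the center.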

class keras_ocr.recognition.Recognizer(alphabet=None, weights='kurapan', build_params=None)[source]

A text recognizer using the CRNN architecture.

Parameters
  • alphabet – The alphabet the model should recognize.

  • build_params – A dictionary of build parameters for the model. See keras_ocr.recognition.build_model for details.

  • weights – The starting weight configuration for the model.

  • include_top – Whether to include the final classification layer in the model (set to False to use a custom alphabet).

compile(*args, **kwargs)[source]

Compile the training model.

get_batch_generator(image_generator, batch_size=8, lowercase=False)[source]

Generate batches of training data from an image generator. The generator should yield tuples of (image, sentence) where image contains a single line of text and sentence is a string representing the contents of the image. If a sample weight is desired, it can be provided as a third entry in the tuple, making each tuple an (image, sentence, weight) tuple.

Parameters
  • image_generator – An image / sentence tuple generator. The images should be in color even if the OCR is set up to handle grayscale, as they will be converted here.

  • batch_size – How many images to generate at a time.

  • lowercase – Whether to convert all characters to lowercase before encoding.
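
The tuple convention described above, (image, sentence) with an optional third weight entry, can be sketched as follows. This is an illustration only: with_weights and batches are hypothetical helpers, the default weight of 1.0 is an assumption, and the real method also encodes sentences and prepares arrays for training.

```python
import itertools


def with_weights(samples):
    """Yield (image, sentence, weight) tuples, defaulting the weight to 1.0."""
    for sample in samples:
        if len(sample) == 3:
            yield sample
        else:
            image, sentence = sample
            yield image, sentence, 1.0


def batches(samples, batch_size, lowercase=False):
    """Group samples into lists of at most batch_size entries."""
    iterator = with_weights(samples)
    while True:
        batch = list(itertools.islice(iterator, batch_size))
        if not batch:
            return
        if lowercase:
            batch = [(image, sentence.lower(), weight) for image, sentence, weight in batch]
        yield batch


# Placeholder strings stand in for image arrays.
samples = [("img0", "Hello"), ("img1", "World", 0.5), ("img2", "OCR")]
result = list(batches(samples, batch_size=2, lowercase=True))
```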

recognize(image)[source]

Recognize text from a single image.

Parameters

image – A pre-cropped image containing characters

recognize_from_boxes(images, box_groups, **kwargs)[source]

Recognize text from images using lists of bounding boxes.

Parameters
  • images – A list of input images, supplied as numpy arrays with shape (H, W, 3).

  • box_groups – A list of groups of boxes, one for each image

Return type

List[List[str]]

Data Generation

The data_generation module contains the functions for generating synthetic data.

keras_ocr.data_generation.compute_transformed_contour(width, height, fontsize, M, contour, minarea=0.5)[source]

Compute the permitted drawing contour on a padded canvas for an image of a given size. We assume the canvas is padded with one full image width and height on left and right, top and bottom respectively.

Parameters
  • width – Width of image

  • height – Height of image

  • fontsize – Size of characters

  • M – The transformation matrix

  • contour – The contour to which we are limited inside the rectangle of size width / height

  • minarea – The minimum area required for a character slot to qualify as being visible, expressed as a fraction of the untransformed fontsize x fontsize slot.

keras_ocr.data_generation.convert_image_generator_to_recognizer_input(image_generator, max_string_length, target_width, target_height, margin=0)[source]

Convert an image generator created by get_image_generator to (image, sentence) tuples for training a recognizer.

Parameters
  • image_generator – An image generator created by get_image_generator

  • max_string_length – The maximum string length to allow

  • target_width – The width to warp lines into

  • target_height – The height to warp lines into

  • margin – The margin to apply around a single line.

keras_ocr.data_generation.convert_lines_to_paragraph(lines)[source]

Convert a series of lines, each consisting of (box, character) tuples, into a multi-line string.
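
A minimal sketch of this conversion, assuming lines are joined with newline characters and that only the character entry of each tuple is used (lines_to_paragraph is a hypothetical name, not the library function):

```python
def lines_to_paragraph(lines):
    """Join the characters of each line, then join lines with newlines."""
    return "\n".join("".join(character for _box, character in line) for line in lines)


# Boxes are placeholders here; only the characters matter for the output.
box = [[0, 0], [1, 0], [1, 1], [0, 1]]
lines = [[(box, "h"), (box, "i")], [(box, "o"), (box, "k")]]
paragraph = lines_to_paragraph(lines)
```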

keras_ocr.data_generation.draw_text_image(text, fontsize, height, width, fonts, use_ligatures=False, thetaX=0, thetaY=0, thetaZ=0, color=(0, 0, 0), permitted_contour=None, draw_contour=False)[source]

Get a transparent image containing text.

Parameters
  • text – The text to draw on the image

  • fontsize – The size of text to show.

  • height – The height of the output image

  • width – The width of the output image

  • fonts – A dictionary of {subalphabet: paths_to_font}

  • thetaX – Rotation about the X axis

  • thetaY – Rotation about the Y axis

  • thetaZ – Rotation about the Z axis

  • color – The color of drawn text

  • permitted_contour – A contour defining which part of the image we can put text. If None, the entire canvas is permitted for text.

  • use_ligatures – Whether to render ligatures. If True, ligatures are always used (with an initial check for support which sometimes yields false positives). If False, ligatures are never used.

Returns

An (image, lines) tuple where image is the transparent text image and lines is a list of lines where each line itself is a list of (box, character) tuples and box is an array of points with shape (4, 2) providing the coordinates of the character box in clockwise order starting from the top left.

keras_ocr.data_generation.font_supports_alphabet(filepath, alphabet)[source]

Verify that a font contains a specific set of characters.

Parameters
  • filepath – Path to the font file

  • alphabet – A string of characters to check for.

keras_ocr.data_generation.get_backgrounds(cache_dir=None)[source]

Download a set of pre-reviewed backgrounds.

Parameters

cache_dir – Where to save the dataset. By default, data will be saved to ~/.keras-ocr.

Returns

A list of background filepaths.

keras_ocr.data_generation.get_fonts(cache_dir=None, alphabet='abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789', exclude_smallcaps=False)[source]

Download a set of pre-reviewed fonts.

Parameters
  • cache_dir – Where to save the dataset. By default, data will be saved to ~/.keras-ocr.

  • alphabet – An alphabet which we will use to exclude fonts that are missing relevant characters. By default, this is set to string.ascii_letters + string.digits.

  • exclude_smallcaps – If True, fonts that are known to use the same glyph for lowercase and uppercase characters are excluded.

Returns

A list of font filepaths.
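
The default alphabet in the signature is the same string that string.ascii_letters + string.digits produces, which the short check below confirms using only the standard library. The lowercase_alphabet variable is a hypothetical example of a custom alphabet.

```python
import string

default_alphabet = string.ascii_letters + string.digits
signature_default = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
same = default_alphabet == signature_default

# A lowercase-only alphabet with digits, e.g. for a case-insensitive recognizer.
lowercase_alphabet = string.ascii_lowercase + string.digits
```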

keras_ocr.data_generation.get_image_generator(height, width, font_groups, text_generator, font_size=18, backgrounds=None, background_crop_mode='crop', rotationX=0, rotationY=0, rotationZ=0, margin=0, use_ligatures=False, augmenter=None, draw_contour=False, draw_contour_text=False)[source]

Create a generator for images containing text.

Parameters
  • height – The height of the generated image

  • width – The width of the generated image.

  • font_groups – A dict mapping of { subalphabet: [path_to_font1, path_to_font2] }.

  • text_generator – See get_text_generator

  • font_size – The font size to use. Alternatively, supply a tuple and the font size will be randomly selected between the two values.

  • backgrounds – A list of paths to image backgrounds or actual images as numpy arrays with channels in RGB order.

  • background_crop_mode – One of letterbox or crop, indicates how backgrounds will be resized to fit on the canvas.

  • rotationX – The X-axis text rotation to use. Alternatively, supply a tuple and the rotation will be randomly selected between the two values.

  • rotationY – The Y-axis text rotation to use. Alternatively, supply a tuple and the rotation will be randomly selected between the two values.

  • rotationZ – The Z-axis text rotation to use. Alternatively, supply a tuple and the rotation will be randomly selected between the two values.

  • margin – The minimum margin around the edge of the image.

  • use_ligatures – Whether to render ligatures (see draw_text_image)

  • augmenter – An image augmenter to be applied to backgrounds

  • draw_contour – Draw the permitted contour onto images (debugging only)

  • draw_contour_text – Draw the permitted contour inside the text drawing function.

Yields

Tuples of (image, lines) where image is the transparent text image and lines is a list of lines where each line itself is a list of (box, character) tuples and box is an array of points with shape (4, 2) providing the coordinates of the character box in clockwise order starting from the top left.

keras_ocr.data_generation.get_maximum_uniform_contour(image, fontsize, margin=0)[source]

Get the largest possible contour of light or dark area in an image.

Parameters
  • image – The image in which to find a contiguous area.

  • fontsize – The fontsize for text. Will be used for blurring and for determining useful areas.

  • margin – The minimum margin required around the image.

Returns

A (contour, isDark) tuple. If no contour is found, both entries will be None.

keras_ocr.data_generation.get_rotation_matrix(width, height, thetaX=0, thetaY=0, thetaZ=0)[source]

Provide a rotation matrix about the center of a rectangle with a given width and height.

Parameters
  • width – The width of the rectangle

  • height – The height of the rectangle

  • thetaX – Rotation about the X axis

  • thetaY – Rotation about the Y axis

  • thetaZ – Rotation about the Z axis

Returns

A 3x3 transformation matrix
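
Rotating about the center of a rectangle can be built from three steps: translate the center to the origin, rotate, and translate back. The sketch below covers only the Z-axis case and assumes angles are given in radians; the library function also handles X and Y rotations, which this sketch does not attempt. The names matmul, rotation_about_center_z, and apply_matrix are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def rotation_about_center_z(width, height, theta_z):
    """3x3 homogeneous matrix rotating by theta_z radians about the center."""
    cx, cy = width / 2, height / 2
    cos, sin = math.cos(theta_z), math.sin(theta_z)
    to_origin = [[1, 0, -cx], [0, 1, -cy], [0, 0, 1]]
    rotate = [[cos, -sin, 0], [sin, cos, 0], [0, 0, 1]]
    back = [[1, 0, cx], [0, 1, cy], [0, 0, 1]]
    return matmul(back, matmul(rotate, to_origin))


def apply_matrix(matrix, x, y):
    """Apply a 3x3 homogeneous matrix to a 2D point."""
    px, py, pw = (row[0] * x + row[1] * y + row[2] for row in matrix)
    return px / pw, py / pw


M = rotation_about_center_z(width=4, height=2, theta_z=math.pi / 2)
center = apply_matrix(M, 2, 1)  # the center is a fixed point of the rotation
moved = apply_matrix(M, 4, 1)   # a point on the right edge moves around the center
```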

keras_ocr.data_generation.get_text_generator(alphabet=None, lowercase=False, max_string_length=None)[source]

Generates strings of sentences using only the letters in alphabet.

Parameters
  • alphabet – The alphabet of permitted characters

  • lowercase – Whether to convert all strings to lowercase.

  • max_string_length – The maximum length of the string
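
The three parameters can be illustrated with a simplified generator. This sketch takes its text from a caller-supplied list (the sentences argument is hypothetical), whereas the library generates its own sentences; only the filtering, lowercasing, and truncation steps are meant to mirror the description above.

```python
def text_generator(sentences, alphabet, lowercase=False, max_string_length=None):
    """Yield sentences restricted to the characters in alphabet."""
    for sentence in sentences:
        if lowercase:
            sentence = sentence.lower()
        # Drop characters that are not in the permitted alphabet.
        sentence = "".join(character for character in sentence if character in alphabet)
        if max_string_length is not None:
            sentence = sentence[:max_string_length]
        yield sentence


source = ["Hello, World!", "OCR 2024"]
generated = list(
    text_generator(
        source,
        alphabet="abcdefghijklmnopqrstuvwxyz0123456789 ",
        lowercase=True,
        max_string_length=8,
    )
)
```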

Tools

The tools module primarily contains convenience functions for reading images and downloading data.

keras_ocr.tools.adjust_boxes(boxes, scale=1, boxes_format='boxes')[source]

Adjust boxes using a given scale and offset.

Parameters
  • boxes – The boxes to adjust

  • boxes_format – The format for the boxes. See the drawBoxes function for an explanation of the options.

  • scale – The scale to apply

Return type

Union[ndarray, List[Tuple[ndarray, str]], List[Tuple[str, ndarray]]]

keras_ocr.tools.augment(boxes, augmenter, image=None, boxes_format='boxes', image_shape=None, area_threshold=0.5, min_area=None)[source]

Augment an image and associated boxes together.

Parameters
  • image – The image which we wish to apply the augmentation.

  • boxes – The boxes that will be augmented together with the image

  • boxes_format – The format for the boxes. See the drawBoxes function for an explanation of the options.

  • image_shape – The shape of the input image if no image will be provided.

  • area_threshold – Fraction of bounding box that we require to be in augmented image to include it.

  • min_area – The minimum area for a character to be included.
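
The role of area_threshold can be sketched for the simple case of axis-aligned boxes. This is an assumption-laden illustration: the library works with four-point boxes, and both the helper names and the use of >= for the comparison are hypothetical.

```python
def visible_fraction(box, image_width, image_height):
    """Fraction of an axis-aligned (x1, y1, x2, y2) box inside the image."""
    x1, y1, x2, y2 = box
    full_area = (x2 - x1) * (y2 - y1)
    clipped_w = max(0, min(x2, image_width) - max(x1, 0))
    clipped_h = max(0, min(y2, image_height) - max(y1, 0))
    return (clipped_w * clipped_h) / full_area


def keep_box(box, image_width, image_height, area_threshold=0.5):
    """Keep a box only if enough of it remains inside the image."""
    return visible_fraction(box, image_width, image_height) >= area_threshold


inside = keep_box((10, 10, 30, 20), image_width=100, image_height=50)
mostly_outside = keep_box((90, 10, 130, 20), image_width=100, image_height=50)
```

The second box keeps only a quarter of its area after augmentation pushes it past the right edge, so it is dropped at the default threshold of 0.5.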

keras_ocr.tools.combine_line(line)[source]

Combine a set of boxes in a line into a single bounding box.

Parameters

line – A list of (box, character) entries

Returns

A (box, text) tuple
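
As a rough illustration of the input and output shapes, the sketch below merges character boxes into an axis-aligned bounding box. This is a simplification chosen for clarity and may not match how the library computes the combined box for rotated text; combine_line_axis_aligned is a hypothetical name.

```python
def combine_line_axis_aligned(line):
    """Return (box, text) where box is the axis-aligned hull of all character boxes."""
    xs = [x for box, _ in line for x, _y in box]
    ys = [y for box, _ in line for _x, y in box]
    text = "".join(character for _, character in line)
    # Points are listed clockwise, starting from the top left.
    box = [[min(xs), min(ys)], [max(xs), min(ys)], [max(xs), max(ys)], [min(xs), max(ys)]]
    return box, text


line = [
    ([[0, 0], [5, 0], [5, 8], [0, 8]], "a"),
    ([[6, 1], [11, 1], [11, 9], [6, 9]], "b"),
]
combined_box, combined_text = combine_line_axis_aligned(line)
```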

keras_ocr.tools.download_and_verify(url, sha256=None, cache_dir=None, verbose=True, filename=None)[source]

Download a file to a cache directory and verify it with a sha256 hash.

Parameters
  • url – The file to download

  • sha256 – The sha256 hash to check. If the file already exists and the hash matches, we don’t download it again.

  • cache_dir – The directory in which to cache the file. The default is ~/.keras-ocr.

  • verbose – Whether to log progress

  • filename – The filename to use for the file. By default, the filename is derived from the URL.
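
The caching rule described for sha256 (skip the download when the file exists and the hash matches) can be sketched with the standard library alone. The names sha256_of_file and needs_download are hypothetical, and trusting an existing file when no hash is supplied is an assumption rather than documented behavior.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=65536):
    """Hash a file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_download(path, expected_sha256=None):
    """Return True when the file is missing or its hash does not match."""
    if not os.path.isfile(path):
        return True
    # Assumption: without a hash to compare, an existing file is trusted.
    if expected_sha256 is None:
        return False
    return sha256_of_file(path) != expected_sha256


with tempfile.TemporaryDirectory() as cache_dir:
    cached = os.path.join(cache_dir, "weights.bin")
    with open(cached, "wb") as handle:
        handle.write(b"example bytes")
    good_hash = hashlib.sha256(b"example bytes").hexdigest()
    hit = needs_download(cached, good_hash)
    miss = needs_download(cached, "0" * 64)
    absent = needs_download(os.path.join(cache_dir, "missing.bin"), good_hash)
```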

keras_ocr.tools.drawAnnotations(image, predictions, ax=None)[source]

Draw text annotations onto image.

Parameters
  • image – The image on which to draw

  • predictions – The predictions as provided by pipeline.recognize.

  • ax – A matplotlib axis on which to draw.

keras_ocr.tools.drawBoxes(image, boxes, color=(255, 0, 0), thickness=5, boxes_format='boxes')[source]

Draw boxes onto an image.

Parameters
  • image – The image on which to draw the boxes.

  • boxes – The boxes to draw.

  • color – The color for each box.

  • thickness – The thickness for each box.

  • boxes_format – The format used for providing the boxes. Options are “boxes”, an array with shape (N, 4, 2), where N is the number of boxes and each box is a list of four points, as provided by keras_ocr.detection.Detector.detect; “lines”, a list of lines where each line itself is a list of (box, character) tuples, as provided by keras_ocr.data_generation.get_image_generator; or “predictions”, where boxes is itself a list of (word, box) tuples, as provided by keras_ocr.pipeline.Pipeline.recognize or keras_ocr.recognition.Recognizer.recognize_from_boxes.
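
The three boxes_format options differ only in how the four-point boxes are nested. The sketch below builds one example of each using plain lists in place of numpy arrays and flattens them to a common form; to_plain_boxes is a hypothetical helper, not part of keras_ocr.tools.

```python
def to_plain_boxes(boxes, boxes_format="boxes"):
    """Flatten any of the three formats into a list of four-point boxes."""
    if boxes_format == "boxes":
        return list(boxes)
    if boxes_format == "lines":
        return [box for line in boxes for box, _character in line]
    if boxes_format == "predictions":
        return [box for _word, box in boxes]
    raise ValueError(f"Unsupported boxes_format: {boxes_format}")


box_a = [[0, 0], [10, 0], [10, 5], [0, 5]]
box_b = [[12, 0], [20, 0], [20, 5], [12, 5]]

as_boxes = [box_a, box_b]                        # N boxes of four (x, y) points
as_lines = [[(box_a, "h"), (box_b, "i")]]        # lines of (box, character) tuples
as_predictions = [("hi", box_a), ("yo", box_b)]  # (word, box) tuples
```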

keras_ocr.tools.fit(image, width, height, cval=255, mode='letterbox', return_scale=False)[source]

Obtain a new image, fit to the specified size.

Parameters
  • image – The input image

  • width – The new width

  • height – The new height

  • cval – The constant value to use to fill the remaining areas of the image

  • return_scale – Whether to return the scale used for the image

  • mode – How to fit the image to the target size (crop or letterbox)

Returns

The new image
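
The difference between the two fitting modes comes down to which scale factor is chosen. The sketch below assumes the usual meaning of the terms (letterbox keeps the whole image and fills the remainder with cval, crop covers the whole canvas and cuts off the overflow); fit_scale is a hypothetical helper and does not resize anything itself.

```python
def fit_scale(image_width, image_height, width, height, mode="letterbox"):
    """Scale factor that fits an image into a width x height canvas."""
    scale_x = width / image_width
    scale_y = height / image_height
    if mode == "letterbox":
        # The whole image stays visible; leftover area is filled with cval.
        return min(scale_x, scale_y)
    if mode == "crop":
        # The canvas is fully covered; overflow is cut away.
        return max(scale_x, scale_y)
    raise ValueError(f"Unsupported mode: {mode}")


letterbox_scale = fit_scale(200, 100, width=100, height=100, mode="letterbox")
crop_scale = fit_scale(200, 100, width=100, height=100, mode="crop")
```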

keras_ocr.tools.fix_line(line)[source]

Given a list of (box, character) tuples, return a revised line with a consistent ordering of left-to-right or top-to-bottom, with each box provided with (top-left, top-right, bottom-right, bottom-left) ordering.

Returns

A tuple that is the fixed line as well as a string indicating whether the line is horizontal or vertical.

keras_ocr.tools.get_rotated_box(points)[source]

Obtain the parameters of a rotated box.

Returns

The vertices of the rotated box in top-left, top-right, bottom-right, bottom-left order along with the angle of rotation about the bottom left corner.

Return type

Tuple[ndarray, float]

keras_ocr.tools.get_rotated_width_height(box)[source]

Returns the width and height of a rotated rectangle

Parameters
  • box – A list of four points starting in the top left corner and moving clockwise.
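
One way to measure a rotated rectangle from its four corners is to average the lengths of opposite edges. The sketch below takes that approach using math.dist (Python 3.8+); it is an assumption about the method, and the library function may round or compute the result differently.

```python
import math


def rotated_width_height(box):
    """Average opposite edge lengths of a four-point box (clockwise from top left)."""
    top_left, top_right, bottom_right, bottom_left = box
    width = (math.dist(top_left, top_right) + math.dist(bottom_left, bottom_right)) / 2
    height = (math.dist(top_left, bottom_left) + math.dist(top_right, bottom_right)) / 2
    return width, height


upright = rotated_width_height([(0, 0), (4, 0), (4, 2), (0, 2)])
rotated = rotated_width_height([(0, 0), (0, 4), (-2, 4), (-2, 0)])
```

Both calls describe the same 4 x 2 rectangle, so the measured size is unchanged by the rotation.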

keras_ocr.tools.pad(image, width, height, cval=255)[source]

Pad an image to a desired size. Raises an exception if image is larger than desired size.

Parameters
  • image – The input image

  • width – The output width

  • height – The output height

  • cval – The value to use for filling the image.
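
The padding behavior can be sketched on a plain list-of-rows grayscale image. Padding on the right and bottom and raising ValueError are both assumptions; the description above only says that an exception is raised when the image is too large.

```python
def pad(image, width, height, cval=255):
    """Pad a 2D image (list of rows) on the right and bottom with cval."""
    image_height = len(image)
    image_width = len(image[0]) if image else 0
    if image_width > width or image_height > height:
        raise ValueError("Image is larger than the desired size.")
    padded = [row + [cval] * (width - image_width) for row in image]
    padded += [[cval] * width for _ in range(height - image_height)]
    return padded


padded = pad([[1, 2], [3, 4]], width=3, height=3, cval=0)
```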

keras_ocr.tools.read(filepath_or_buffer)[source]

Read a file into an image object

Parameters

filepath_or_buffer – The path to the file, a URL, or any object with a read method (such as io.BytesIO)

keras_ocr.tools.read_and_fit(filepath_or_array, width, height, cval=255, mode='letterbox')[source]

Read an image from disk and fit to the specified size.

Parameters
  • filepath_or_array – The path to the image, or a numpy array of shape HxWx3

  • width – The new width

  • height – The new height

  • cval – The constant value to use to fill the remaining areas of the image

  • mode – The mode to pass to “fit” (crop or letterbox)

Returns

The new image

keras_ocr.tools.resize_image(image, max_scale, max_size)[source]

Obtain the optimal resized image subject to a maximum scale and maximum size.

Parameters
  • image – The input image

  • max_scale – The maximum scale to apply

  • max_size – The maximum size to return
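
The two constraints interact as follows: the image is scaled by max_scale unless that would push its longest side past max_size, in which case the scale is reduced to fit. The sketch below computes only the scale and assumes max_size refers to the longest side in pixels; choose_scale is a hypothetical helper.

```python
def choose_scale(image_width, image_height, max_scale, max_size):
    """Largest scale <= max_scale whose longest side stays within max_size."""
    longest_side = max(image_width, image_height)
    return min(max_scale, max_size / longest_side)


small = choose_scale(100, 50, max_scale=2, max_size=1500)    # limited by max_scale
large = choose_scale(1000, 500, max_scale=2, max_size=1500)  # limited by max_size
```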

keras_ocr.tools.sha256sum(filename)[source]

Compute the sha256 hash for a file.

keras_ocr.tools.warpBox(image, box, target_height=None, target_width=None, margin=0, cval=None, return_transform=False, skip_rotate=False)[source]

Warp a boxed region in an image given by a set of four points into a rectangle with a specified width and height. Useful for taking crops of distorted or rotated text.

Parameters
  • image – The image from which to take the box

  • box – A list of four points starting in the top left corner and moving clockwise.

  • target_height – The height of the output rectangle

  • target_width – The width of the output rectangle

  • return_transform – Whether to return the transformation matrix with the image.

Datasets

The datasets module contains functions for using data from public datasets. See the fine-tuning detector and fine-tuning recognizer examples.

keras_ocr.datasets.get_born_digital_recognizer_dataset(split='train', cache_dir=None)[source]

Get a list of (filepath, box, word) tuples from the BornDigital dataset. This dataset comes pre-cropped so box is always None.

Parameters
  • split – Which split to get (train, test, or traintest)

  • cache_dir – The directory in which to cache the file. The default is ~/.keras-ocr.

Returns

A recognition dataset as a list of (filepath, box, word) tuples

keras_ocr.datasets.get_cocotext_recognizer_dataset(split='train', cache_dir=None, limit=None, legible_only=False, english_only=False, return_raw_labels=False)[source]

Get a list of (filepath, box, word) tuples from the COCO-Text dataset.

Parameters
  • split – Which split to get (train, val, or trainval)

  • limit – Limit the number of files included in the download

  • cache_dir – The directory in which to cache the file. The default is ~/.keras-ocr.

  • return_raw_labels – Whether to return the raw labels object

Returns

A recognition dataset as a list of (filepath, box, word) tuples. If return_raw_labels is True, you will also get a (labels, images_dir) tuple containing the raw COCO data and the directory in which you can find the images.

keras_ocr.datasets.get_detector_image_generator(labels, width, height, augmenter=None, area_threshold=0.5, focused=False, min_area=None, shuffle=True)[source]

Generate augmented (image, lines) tuples from a list of (filepath, lines, confidence) tuples. Confidence is not used right now but is included for a future release that uses semi-supervised data.

Parameters
  • labels – A list of (image, lines, confidence) tuples.

  • augmenter – An augmenter to apply to the images.

  • width – The width to use for output images

  • height – The height to use for output images

  • area_threshold – The area threshold to use to keep characters in augmented images.

  • min_area – The minimum area for a character to be included.

  • focused – Whether to pre-crop images to width/height containing a region containing text.

  • shuffle – Whether to shuffle the data on each iteration.

keras_ocr.datasets.get_icdar_2013_detector_dataset(cache_dir=None, skip_illegible=False)[source]

Get the ICDAR 2013 text segmentation dataset for detector training. Only the training set has the necessary annotations. For the test set, only segmentation maps are provided, which do not provide the necessary information for affinity scores.

Parameters
  • cache_dir – The directory in which to store the data.

  • skip_illegible – Whether to skip illegible characters.

Returns

Lists of (image_path, lines, confidence) tuples. Confidence is always 1 for this dataset. We record confidence to allow for future support for weakly supervised cases.

keras_ocr.datasets.get_icdar_2013_recognizer_dataset(cache_dir=None)[source]

Get a list of (filepath, box, word) tuples from the ICDAR 2013 dataset.

Parameters

cache_dir – The directory in which to cache the file. The default is ~/.keras-ocr.

Returns

A recognition dataset as a list of (filepath, box, word) tuples

keras_ocr.datasets.get_icdar_2019_semisupervised_dataset(cache_dir=None)[source]

EXPERIMENTAL. Get a semisupervised labeled version of the ICDAR 2019 dataset. Only images with Latin-only scripts are available at this time.

Parameters

cache_dir – The cache directory to use.

keras_ocr.datasets.get_recognizer_image_generator(labels, height, width, alphabet, augmenter=None, shuffle=True)[source]

Generate augmented (image, text) tuples from a list of (filepath, box, label) tuples.

Parameters
  • labels – A list of (filepath, box, label) tuples

  • height – The height of the images to return

  • width – The width of the images to return

  • alphabet – The alphabet which limits the characters returned

  • augmenter – The augmenter to apply to images

  • shuffle – Whether to shuffle the dataset on each iteration
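
The shuffle option can be pictured as an endless generator that reshuffles the labels at the start of each pass. The sketch below shows only that looping behavior over (filepath, box, label) tuples; it does not read or augment images, and label_generator and its seed parameter are hypothetical.

```python
import itertools
import random


def label_generator(labels, shuffle=True, seed=None):
    """Loop over labels forever, reshuffling at the start of each pass."""
    rng = random.Random(seed)
    labels = list(labels)
    while True:
        if shuffle:
            rng.shuffle(labels)
        yield from labels


# box is None for pre-cropped datasets such as BornDigital.
labels = [("a.png", None, "alpha"), ("b.png", None, "beta"), ("c.png", None, "gamma")]
first_two_passes = list(itertools.islice(label_generator(labels, seed=0), 6))
```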