Pyostie

PYOSTIE (Python Open Source Text Information Extractor)


Project maintained by anirudhpnbb

About The Project

PYOSTIE is short for Python Open Source Text Information Extractor.

It is a simple, elegant library for extracting text from many file formats.

This module can extract text from PDFs, Office files, text files, and image files. It also generates an Excel file that gives deeper insights into the text. Insights are currently extracted only for image and PDF formats (more to come soon).

Installation

  1. Clone the repo
    git clone https://github.com/anirudhpnbb/Pyostie.git

  2. Install using pip or pip3
    pip3 install Pyostie
    (or)
    pip install Pyostie
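After installing, it can help to confirm that the package is visible to the interpreter you plan to use. The snippet below is a minimal sketch using only the standard library (Python 3.8+); `installed_version` is a hypothetical helper written for this README, not part of Pyostie, and it assumes the distribution name is `Pyostie`, as in the pip commands above.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string if the pip install succeeded, otherwise None.
print(installed_version("Pyostie"))
```

If this prints `None`, the install most likely went to a different Python environment than the one running the script.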


<!-- USAGE EXAMPLES -->
## Usage
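
Every example in this section passes the file type explicitly through the `extension` argument. As a convenience, that string can be derived from the file path. The following is a minimal sketch: `extension_of` is a hypothetical helper, not part of Pyostie, and it lowercases the suffix only to match the lowercase values used in these examples.

```python
from pathlib import Path


# Hypothetical helper, not part of Pyostie: turn a file path into the
# string passed as `extension` in the examples in this section.
def extension_of(path):
    suffix = Path(path).suffix  # e.g. ".JPG"; empty when the name has no suffix
    if not suffix:
        raise ValueError(f"cannot infer an extension from {path!r}")
    # Drop the leading dot and normalise the case.
    return suffix.lstrip(".").lower()


print(extension_of("scans/invoice.JPG"))  # jpg
print(extension_of("report.final.pdf"))   # pdf
```

The result can then be passed along as `pyostie.extract(filename, extension=extension_of(filename))`. Raising on a missing suffix keeps a wrong file type from being guessed silently.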


```python
import pyostie

# In every example, `filename` is the path to the file to process.

# For image files with insights.
# The extension can also be "tif" or "png".
output = pyostie.extract(filename, insights=True, extension="jpg")
df, text = output.start()

# For image files without insights.
output = pyostie.extract(filename, insights=False, extension="jpg")
text = output.start()

# For PDF files.
output = pyostie.extract(filename, extension="pdf")
text = output.start()

# For PDF files with insights.
output = pyostie.extract(filename, insights=True, extension="pdf")
text = output.start()

# For Excel files.
output = pyostie.extract(filename, extension="xlsx")
text = output.start()

# For Word files.
# image_folder (optional): path of the folder where images are written.
output = pyostie.extract(filename, image_folder, extension="docx")
text = output.start()

# For audio files.
output = pyostie.extract(filename, extension="mp3")
text = output.start()
```

Future Works

In this version, text can be extracted from image, PDF, Excel, Word (docx), TXT, CSV, and MP3 files. Support for doc, ppt, pptx, and many more formats will be added soon. Watch this space for updates.

Contact

Anirudh Palaparthi - @anirudh8889 - pnbbanirudh - aniruddhapnbb@gmail.com

Balaram Guddanti - Balaram Guddanti - balaram.guddanti6@gmail.com

Project Link: PYOSTIE