Types of Text Extraction

Unstructured Data

Raw Text

  • Extracting the plain text of documents or web pages, discarding markup, layout, and other formatting.

Example: Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
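As a minimal sketch of raw-text extraction from a web page, the code below uses only Python's standard-library `html.parser`. The names `PlainTextExtractor` and `extract_plain_text` are illustrative, not taken from any particular tool. A production pipeline would more likely rely on a dedicated HTML or PDF library.

```python
from html.parser import HTMLParser


class PlainTextExtractor(HTMLParser):
    """Collects the text content of an HTML page, dropping tags and attributes."""

    # Elements whose contents are code or styling rather than readable text.
    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0  # > 0 while inside a <script> or <style> element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def extract_plain_text(html: str) -> str:
    """Hypothetical helper: return the visible text of an HTML string."""
    parser = PlainTextExtractor()
    parser.feed(html)
    parser.close()
    # Joining fragments with a space keeps text from adjacent elements apart
    # (at the cost of splitting a word broken across inline tags); split/join
    # then collapses the leftover runs of whitespace.
    return " ".join(" ".join(parser.parts).split())


page = (
    "<html><head><style>p { color: red; }</style></head>"
    "<body><p>Lorem ipsum dolor sit amet, <b>consectetur</b> adipiscing elit.</p>"
    "</body></html>"
)
print(extract_plain_text(page))  # Lorem ipsum dolor sit amet, consectetur adipiscing elit.
```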

Word or Sentence Extraction

  • Identifying and extracting specific words or sentences from unstructured text.

Example: Extracted phrase: “consectetur adipiscing elit.”
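One simple way to pull out the sentences that contain a given word is with regular expressions, as in the sketch below. The sentence splitter rests on a naive assumption (a sentence ends at `.`, `!`, or `?` followed by whitespace), so it will mis-split abbreviations such as "e.g."; the helper names are hypothetical.

```python
import re

text = (
    "Lorem ipsum dolor sit amet, consectetur adipiscing elit. "
    "Sed do eiusmod tempor incididunt ut labore et dolore magna aliqua."
)


def split_sentences(text: str) -> list[str]:
    # Naive rule (an assumption): split after ., ! or ? when whitespace follows.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def sentences_containing(text: str, keyword: str) -> list[str]:
    # Whole-word, case-insensitive match, so "elit" does not match "elite".
    pattern = re.compile(rf"\b{re.escape(keyword)}\b", re.IGNORECASE)
    return [s for s in split_sentences(text) if pattern.search(s)]


print(sentences_containing(text, "elit"))
```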

Image Data

Extract Text from Images

  • Using OCR (Optical Character Recognition) to extract text from images.

Example:

Input: an image of the Purdue gate

Extracted text: “Purdue University.”
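OCR is usually delegated to an engine such as Tesseract. The sketch below assumes that the third-party `pytesseract` wrapper and Pillow are installed and that the Tesseract binary is on the PATH; `extract_text_from_image` and `clean_ocr_output` are hypothetical helper names. Raw OCR output generally preserves the line breaks of the image, so a small cleanup step joins it into a single string.

```python
def clean_ocr_output(raw: str) -> str:
    """Collapse line breaks and repeated spaces in raw OCR output into one line."""
    return " ".join(raw.split())


def extract_text_from_image(path: str, lang: str = "eng") -> str:
    """Run Tesseract OCR on an image file and return the cleaned text.

    Assumes pytesseract and Pillow are installed and the Tesseract engine
    is available on the PATH.
    """
    # Imported inside the function so the cleanup helper works without the OCR stack.
    from PIL import Image
    import pytesseract

    with Image.open(path) as image:
        raw = pytesseract.image_to_string(image, lang=lang)
    return clean_ocr_output(raw)


# A hypothetical raw OCR result for a sign with the text on two lines:
print(clean_ocr_output("Purdue\nUniversity\n"))  # Purdue University
```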

Structured Data

Tabular Data

  • Extracting structured data from tables, such as banking records or employment records.

Example:

Date         Description         Amount
2023-01-01   Salary Deposit      $5000
2023-01-05   Rent Payment        $1200
2023-01-10   Utilities Payment   $200
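If the table is available as a CSV export (an assumption; tables inside PDFs or scanned images first need a table-extraction or OCR step), the standard-library `csv` module can turn each row into a typed record. The sketch below parses the three rows above, converting dates with `datetime.date` and amounts with `Decimal` to avoid floating-point rounding on currency. `parse_transactions` is a hypothetical name.

```python
import csv
import io
from datetime import date
from decimal import Decimal

# Assumed input: the transaction table above, exported as CSV.
CSV_EXPORT = """Date,Description,Amount
2023-01-01,Salary Deposit,$5000
2023-01-05,Rent Payment,$1200
2023-01-10,Utilities Payment,$200
"""


def parse_transactions(csv_text: str) -> list[dict]:
    """Turn CSV rows into records with typed date and amount fields (hypothetical helper)."""
    records = []
    # DictReader uses the header row as the keys of each row dictionary.
    for row in csv.DictReader(io.StringIO(csv_text)):
        records.append({
            "date": date.fromisoformat(row["Date"]),
            "description": row["Description"].strip(),
            # Drop the currency symbol and any thousands separators before converting.
            "amount": Decimal(row["Amount"].replace("$", "").replace(",", "")),
        })
    return records


for record in parse_transactions(CSV_EXPORT):
    print(record["date"], record["description"], record["amount"])
```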

Forms Data

  • Reading the fields of a form and extracting each label and its filled-in value as structured data.

Example: a form with the following fields:

  • Name: John Doe
  • Age: 30
  • Address: 123 Main Street
  • City: Anytown
  • State: NY

Extracted data: {"Name": "John Doe", "Age": 30, "Address": "123 Main Street, Anytown, NY"}
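The form example can be sketched as follows, assuming the form's text is already available as `Label: value` lines (for instance from a PDF's text layer or an OCR pass). `parse_form` and `to_record` are hypothetical helpers; the second one reproduces the extracted record above by converting Age to an integer and joining the street, city, and state into one Address string.

```python
import json

# Assumed input: the form's text, one "Label: value" pair per line.
FORM_TEXT = """Name: John Doe
Age: 30
Address: 123 Main Street
City: Anytown
State: NY"""


def parse_form(text: str) -> dict[str, str]:
    """Map each label to its value, skipping lines without a colon."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so values may themselves contain colons.
        label, sep, value = line.partition(":")
        if sep:
            fields[label.strip()] = value.strip()
    return fields


def to_record(fields: dict[str, str]) -> dict:
    """Shape raw fields into the extracted record: integer Age, combined Address."""
    address_parts = [fields.get(key) for key in ("Address", "City", "State")]
    return {
        "Name": fields["Name"],
        "Age": int(fields["Age"]),
        "Address": ", ".join(part for part in address_parts if part),
    }


record = to_record(parse_form(FORM_TEXT))
print(json.dumps(record))
# {"Name": "John Doe", "Age": 30, "Address": "123 Main Street, Anytown, NY"}
```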

Conclusion

Effective text extraction turns raw documents, web pages, images, tables, and forms into data that can be searched and analyzed. By understanding the different types of data (unstructured text, images, and structured records) and the extraction method each one calls for, you can use text extraction to improve data analysis and decision-making.