Read a CSV file and parse unstructured data line by line

Hi, I’m trying to read a CSV file that contains unstructured data and get a response line by line. I have two problems:

  1. The text is in Spanish, and OpenAI changes accented (non-ASCII) characters, e.g. Nitrogeno becomes Nitr\u00f3geno.

  2. The text is in Spanish, but OpenAI translates it to English and then parses it.
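On problem 1, a likely explanation worth checking: the model added the accent (NITROGENO → Nitrógeno, the standard Spanish spelling), and `\u00f3` is just the JSON escape for “ó”, not a corrupted character. `print(response)` appears to print the whole object as JSON, and Python's `json.dumps` escapes non-ASCII characters by default (`ensure_ascii=True`). A small offline sketch of that behavior, using only the standard library:

```python
import json

# The model's text, as a Python string, contains a real "ó" (U+00F3).
text = "Nitrógeno total (N) 70,00 g/l"

# json.dumps escapes non-ASCII characters by default (ensure_ascii=True),
# which produces the "\u00f3" sequence seen in the printed response.
escaped = json.dumps({"text": text})
print(escaped)  # {"text": "Nitr\u00f3geno total (N) 70,00 g/l"}

# Decoding the JSON gives back the original characters unchanged.
assert json.loads(escaped)["text"] == text

# ensure_ascii=False keeps the accented characters readable in the output.
readable = json.dumps({"text": text}, ensure_ascii=False)
print(readable)  # {"text": "Nitrógeno total (N) 70,00 g/l"}
```

In other words, nothing is lost: decoding the escaped form gives back “Nitrógeno”.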

I’d really appreciate any help. I need the response to be in Spanish, and I don’t want Nitrogeno changed to Nitr\u00f3geno.
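If the goal is to keep the unaccented spelling exactly as it appears in the CSV, one blunt option (an assumption about what is wanted, not something proposed in this thread) is to strip accents from the model's output afterwards. A sketch with the standard `unicodedata` module; `strip_accents` is a hypothetical helper name:

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Remove combining accent marks, e.g. 'Nitrógeno' -> 'Nitrogeno'."""
    # NFD splits "ó" into "o" followed by a combining acute accent (U+0301).
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks and keep the base characters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_accents("Nitrógeno nítrico, ácidos húmicos"))
# Nitrogeno nitrico, acidos humicos
```

Note that this also turns “ñ” into “n” (“año” becomes “ano”), so it is only safe if such words do not matter for your data.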

This is the code that I’m using and the responses that I get:

```python
import pandas as pd
import numpy as np

import os
import openai

openai.api_key = "sk-"

import csv

# Open file
with open("basex-new-joined.csv") as file_obj:
    # Create reader object by passing the file
    # object to reader method
    reader_obj = csv.reader(file_obj, delimiter="~")
    # Iterate over each row in the csv
    # file using reader object
    for row in reader_obj:
        response = openai.Completion.create(
            model="text-davinci-002",
            prompt=(row),
            temperature=0,
            max_tokens=1000,
            top_p=1,
            frequency_penalty=0,
            presence_penalty=0,
        )
        print(response)
```
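The script above opens the file with the platform's default encoding, sends the raw `row` list as the prompt, and prints the whole response object. A reworked sketch of the loop, assuming the CSV is saved as UTF-8. `iter_responses`, `responses_from_file` and the `complete` callback are hypothetical names; `complete` stands in for the API call so the file handling can be tested offline:

```python
import csv
from typing import Callable, Iterable, Iterator


def iter_responses(lines: Iterable[str], complete: Callable[[str], str]) -> Iterator[str]:
    """Yield one model response per non-empty '~'-delimited CSV row.

    `complete` is a hypothetical callback: it takes a prompt string and
    returns the generated text (for example, by calling the API).
    """
    for row in csv.reader(lines, delimiter="~"):
        if not row:
            continue  # csv.reader yields [] for blank lines
        # Join the fields into a single string rather than sending the list.
        prompt = " ".join(row)
        yield complete(prompt).strip()


def responses_from_file(path: str, complete: Callable[[str], str]) -> Iterator[str]:
    # Decode explicitly as UTF-8 (assumes the CSV was saved as UTF-8), so
    # accented Spanish text does not depend on the platform's default encoding.
    # newline="" is what the csv module documentation recommends for open().
    with open(path, encoding="utf-8", newline="") as file_obj:
        yield from iter_responses(file_obj, complete)
```

With the 0.x `openai` package used in the question, `complete` could be something like `lambda p: openai.Completion.create(model="text-davinci-002", prompt=p, temperature=0, max_tokens=1000)["choices"][0]["text"]`, and each yielded string can be printed directly. Newer SDK versions use a different client interface, so that call is not a drop-in there.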

This is the data in the CSV file:

“A table summarizing NOMBRE COMERCIAL: POLIQUEL CALCIO. PRODUCTO: FERTILIZANTE FOLIAR; ORIGEN: QUIMICO; GRADO: 5-0-0; COMPOSICION: NITROGENO TOTAL (N) 70,00 G/L, NITROGENO NITRICO 70, 00 G/L, CALCIO SOLUBLE EN AGUA (CAO) 187,00 G/L, MAGNESIO SOLUBLE EN AGUA (MGO) 22,00 G/L, BORO (B SOLUBLE EN AGUA) 6,70 G/L, ADITIVOS INERES: AGUA, EDTA, ACIDOS HUMICOS Y LIGNOSULFONATO DE SODIO (FUENTES: OCTABORATO DE SODIO, CLORURO DE CALCIO,NITRATO DE CALCIO Y NITRATO DE MAGNESIO), ELEMENTOS MENORES O SECUNDARIOS: CALCIO, MAGNESIO;TIPO DE ABONO: SIMPLE; DENSIDAD: 1,47G/CM3; TIPO DE EMPAQUE: TAMBOR X 200LITROS; USO ESPECIFICO: FERTILIZANTE SIMPLE PARA APLICACION AL SUELO MEDIANTE SISTEMA DE FERTIRRIEGO, SEGUN RECOMENDACIONES DE UN INGENIERO AGRONOMO, CON BASE EN EL ANALISIS DE SUELOS O DEL TEJIDO FOLIAR; USO AGRONOMICO: APLICAR EN CULTIVOS FRUTALES Y HORTICOLAS; MARCA: POLIQUEL; PRODUCTOR: GRUPO BIOQUIMICO MEXICANO S.A DE C.V, MEXICO, REGISTRO DE VENTA NO.3511, CONCEPTO DE INSUMOS NO.AI064.”
“A table summarizing PRODUCTO: FERTILIZANTE CON ZINC MANNI-PLEX ZN, ORIGEN: QUIMICO, GRADO: 3-0-0, COMPOSICION: NITROGENO TOTAL (N) 43,80 G/L, NITROGENO NITRICO (N) 43,80 G/L, ZINC TOTAL (ZN) 85,0 G/L, CARBONO ORGANICO OXIDABLE TOTAL 45,80 G/L, PH EN SOLUCION AL 10% 6,37, DENSIDAD A 20 GRADOS C 1,229 G/CM3, CONDUCTIVIDAD ELECTRICA 1:200 1,11 DS/M, SOLIDOS INSOLUBLES EN AGUA 5,33 G/L, SALMONELLA AUSENTE/25 ML, ENTEROBACTERIAS <10 UFC/ML, METALES PESADOS POR DEBAJO DE LA NORMA ACTUAL, ELEMENTOS MENORES O SECUNDARIOS: ZINC, TIPO DE ABONO: FERTILIZANTE SIMPLE N, DENSIDAD (PARA DISOLUCIONES): 1,229 G/CM3, TIPO DE EMPAQUE: FRASCO PLASTICO 1L, 2L, 3L, 5L, 10L, 20L, 50L, 100L Y 200L DE CONTENIDO NETO, USO ESPECIFICO: FERTILIZANTE SIMPLE N, PARA APLICACION AL SUELO MEDIANTE SISTEMA DE FERTIIRRIGACION SEGUN RECOMENDACIONES DE UN INGENIERO AGRONOMO, CON BASE EN EL ANALISIS DE SUELOS O DE TEJIDO FOLIAR, USO AGRONOMICO: APLICACION PARA CULTIVOS DE CAFE, BANANO, PALMA, AGUACATE, MARCA: MANNI-PLEX ZN / BRANDT ; REGISTRO DE VENTA ICA NRO (7556) FECHA 08/05/2012 CON VIGENCIA INDEFINIDA, NOS ACOGEMOS AL DECRETO 3733 DEL 2005 EXCLUSION DEL IVA, MERCANCIA NUEVA.”
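About `prompt=(row)` in the script: the parentheses do not turn `row` into a string. Since these lines contain no `~`, each row should come back from `csv.reader` as a one-element list, which the legacy Completions endpoint accepts as a batch containing one prompt (which is probably why it still works). A quick offline check on a shortened, hypothetical sample line:

```python
import csv

# Shortened, hypothetical version of one line from the file above.
line = "“A table summarizing PRODUCTO: FERTILIZANTE CON ZINC MANNI-PLEX ZN, ORIGEN: QUIMICO”"

# The line contains no "~", so csv.reader returns it as a single field:
# row is a one-element list, and the commas are not used as separators.
row = next(csv.reader([line], delimiter="~"))
print(type(row).__name__, len(row))  # list 1

# Curly quotes are not the CSV quote character ("), so they stay in the text.
print(row[0] == line)  # True
```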

Output:

```
{
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null,
      "text": "\n\nNombre Comercial: Poliquel Calcio\n\nProducto: Fertilizante foliar\n\nOrigen: Qu\u00edmico\n\nGrado: 5-0-0\n\nComposici\u00f3n: Nitr\u00f3geno total (N) 70,00 g/l, nitr\u00f3geno nitrico 70, 00 g/l, calcio soluble en agua (CaO) 187,00 g/l, magnesio soluble en agua (MgO) 22,00 g/l, boro (B soluble en agua) 6,70 g/l, aditivos interes: agua, EDTA, \u00e1cidos h\u00famicos y lignosulfonato de sodio (fuentes: octaborato de sodio, cloruro de calcio, nitrato de calcio y nitrato de magnesio), elementos menores o secundarios: calcio, magnesio;\n\nTipo de abono: Simple\n\nDensidad: 1,47 g/cm3\n\nTipo de empaque: Tambor x 200 litros\n\nUso espec\u00edfico: Fertilizante simple para aplicaci\u00f3n al suelo mediante sistema de fertirriego, seg\u00fan recomendaciones de un ingeniero agronomo, con base en el an\u00e1lisis de suelos o del tejido foliar\n\nUso agron\u00f3mico: Aplicar en cultivos frutales y hort\u00edcolas\n\nMarca: Poliquel\n\nProductor: Grupo Bioqu\u00edmico Mexicano S.A de C.V, M\u00e9xico\n\nRegistro de venta no.3511, concepto de insumos no.AI064."
    }
  ],
  "created": 1666802727,
  "id": "cmpl-65dotV2HOoN14KRWo3gHQuylYZHtH",
  "model": "text-davinci-002",
  "object": "text_completion",
  "usage": {
    "completion_tokens": 415,
    "prompt_tokens": 469,
    "total_tokens": 884
  }
}
{
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null,
      "text": "\n\nPRODUCT: FERTILIZER WITH ZINC MANNI-PLEX ZN, ORIGIN: CHEMICAL, GRADE: 3-0-0, COMPOSITION: TOTAL NITROGEN (N) 43.80 G / L, NITRIC NITROGEN (N) 43.80 G / L, TOTAL ZINC (ZN) 85.0 G / L, TOTAL OXIDABLE ORGANIC CARBON 45.80 G / L, PH IN SOLUTION AT 10% 6.37, DENSITY AT 20 DEGREES C 1.229 G / CM3, ELECTRICAL CONDUCTIVITY 1:200 1.11 DS / M, INSOLUBLE SOLIDS IN WATER 5.33 G / L, SALMONELLA ABSENT / 25 ML, ENTEROBACTERIA <10 CFU / ML, HEAVY METALS BELOW THE CURRENT STANDARD, MINOR OR SECONDARY ELEMENTS: ZINC, TYPE OF FERTILIZER: SIMPLE FERTILIZER N, DENSITY (FOR SOLUTIONS): 1.229 G / CM3, TYPE OF PACKAGING: PLASTIC BOTTLE 1L, 2L, 3L, 5L, 10L, 20L, 50L, 100L AND 200L OF NET CONTENT, SPECIFIC USE: SIMPLE FERTILIZER N, FOR APPLICATION TO THE SOIL BY FERTIIRRIGATION SYSTEM AS RECOMMENDED BY AN AGRONOMIST, BASED ON THE ANALYSIS OF SOILS OR FOLIAR TISSUE, AGRONOMIC USE: APPLICATION FOR COFFEE, BANANA, PALM, AVOCADO, BRAND: MANNI-PLEX ZN / BRANDT; SALE REGISTRY ICA NRO (7556) DATE 08/05/2012 WITH INDEFINITE VALIDITY, WE ABIDE BY DECREE 3733 OF 2005 EXCLUSION OF VAT, NEW MERCHANDISE."
    }
  ],
  "created": 1666802740,
  "id": "cmpl-65dp69TRsAT1NI2Ek5XzZl3bTJghl",
  "model": "text-davinci-002",
  "object": "text_completion",
  "usage": {
    "completion_tokens": 446,
    "prompt_tokens": 530,
    "total_tokens": 976
  }
}
```
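Both outputs are full response objects. A sketch of pulling out only the generated text, using a trimmed copy of the first response parsed with `json.loads` so it runs without the API. As far as I know, the object returned by the legacy SDK can be indexed the same way (`response["choices"][0]["text"]`), but that is worth confirming against the installed version:

```python
import json

# A trimmed copy of the first response above, keeping only the fields used here.
raw = r"""
{
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "text": "\n\nNombre Comercial: Poliquel Calcio\n\nOrigen: Qu\u00edmico"
    }
  ],
  "model": "text-davinci-002"
}
"""

response = json.loads(raw)
choice = response["choices"][0]

# "length" would mean the output was cut off by max_tokens;
# "stop" means it ended on its own (or at a stop sequence).
if choice["finish_reason"] == "length":
    print("warning: response truncated by max_tokens")

# Printing the str itself (not the whole object) shows "Químico", not "Qu\u00edmico".
text = choice["text"].strip()
print(text)
```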

Can you clarify what you are trying to achieve?
I can’t see any instructions to the AI in your prompt. What do you want it to do?

The prompt is “A table summarizing TEXT GOES HERE”.
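As the reply above points out, “A table summarizing …” states no task and no output language. One way to add both, sketched here as an assumption rather than a tested fix, is to wrap each record in an explicit instruction written in Spanish. The instruction in the code reads: “Extract the data from the following text and present it as a table. Answer in Spanish and do not translate the text.” `build_prompt` and its wording are hypothetical:

```python
def build_prompt(record: str) -> str:
    """Wrap one CSV record in an explicit instruction, written in Spanish.

    Hypothetical wording; the idea is to state the task and the output
    language instead of relying on "A table summarizing ..." alone.
    """
    # "Extract the data from the following text and present it as a table.
    #  Answer in Spanish and do not translate the text."
    instruction = (
        "Extrae los datos del siguiente texto y preséntalos como una tabla. "
        "Responde en español y no traduzcas el texto.\n\n"
    )
    # "Texto" = "Text", "Tabla" = "Table"; ending on "Tabla:" invites the model to continue.
    return instruction + "Texto: " + record.strip() + "\n\nTabla:"


print(build_prompt("NOMBRE COMERCIAL: POLIQUEL CALCIO. PRODUCTO: FERTILIZANTE FOLIAR"))
```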

I was able to solve it by listing the properties to be parsed. I think GPT-3 zeroes in on the language in this order: first the PROMPT, then the PROPERTIES.

I used these:

| PRODUCTO | NOMBRE COMERCIAL | NO CAS | CALIDAD | ASPECTO FISICO | CONCENTRACION | FARMACOPEA DE REFERENCIA | EMPAQUE/PRESENTACION COMERCIAL | USO | MARCA | NOMBRE QUIMICO | FABRICANTE | PAIS | REGISTRO SANITARIO |
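The fix described above amounts to putting the Spanish property names into the prompt. A sketch of generating that header row from a list, so the same properties can be reused for every CSV line. The post does not show the exact prompt, so the layout in `build_table_prompt` (a hypothetical helper, as is `table_header`) is a guess:

```python
PROPERTIES = [
    "PRODUCTO", "NOMBRE COMERCIAL", "NO CAS", "CALIDAD", "ASPECTO FISICO",
    "CONCENTRACION", "FARMACOPEA DE REFERENCIA", "EMPAQUE/PRESENTACION COMERCIAL",
    "USO", "MARCA", "NOMBRE QUIMICO", "FABRICANTE", "PAIS", "REGISTRO SANITARIO",
]


def table_header(properties: list[str]) -> str:
    """Render property names as a pipe-delimited header row, as in the post above."""
    return "| " + " | ".join(properties) + " |"


def build_table_prompt(record: str, properties: list[str]) -> str:
    # Hypothetical layout: the original "A table summarizing" prefix and record,
    # then the header row, which the model is expected to continue with a data row.
    return f"A table summarizing {record.strip()}\n\n{table_header(properties)}\n"


print(table_header(PROPERTIES))
```

Keeping the property names in Spanish is what, per the post, steered the model toward Spanish output.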

Well done! (I still have no idea what you are trying to achieve :slight_smile: )


Maybe this is helpful. It was part of the NASA Space Apps Challenge in October 2022. We created a corpus of information from NTRS (NASA Technical Reports Server).

“read a CSV file that contains unstructured data and get a response line by line.”

Then we manually pushed the output for one line through OpenAI davinci.

A Colab Jupyter notebook link is included: click and go. Maybe try it with your CSV?

Maybe this helps.

csv to openai

@_j Regular! Where is the brother? The message is too old.