Just curious, why would you say that?
The documentation implied Retrieval would be for FAQ checking. My assumption (from your question, perhaps incorrect) is it’s designed for interpreting a question and retrieving a specific, minimally modified “answer”.
Can you give an example of what the examples look like (is there a lot of overlap?), and what you expect the user input to look like?
Sure. The purpose of the internal libraries is to minimize boilerplate, so the scripts are incredibly repetitive. User input is a JSON object with two keys: the endpoint to parse, and a partial example of the endpoint's GET response (not necessarily valid JSON). So the thread looks something like:
User → {"endpoint": "https://endpoint.com/api/v1", "example": '[{"name": "a name", "address": "an address", "other keys": "values"}, {…}…]'}
System → {"script": "import module\nimport module2\n\ndef extract_entity(…"}
Running the script fetches a GET response from the endpoint, extracts the keys into an object for canonicalization, and then writes the result to a file.
from acmerequests import AcmeRequests
from acme.record import AcmeRecord
from acme.writer import AcmeWriter
from acme.record_deduper import AcmeRecordDeduper
from acme.record_id import RecommendedRecordIds


def fetch_data():
    api = "http://theurl.com/api/v2"
    params = {
        "page[limit]": "500",
        "filter[radius]": "30000",
        "filter[lat]": "55.755826",
        "filter[lng]": "37.6173",
    }
    r = session.get(api, headers=headers, params=params)
    js = r.json()["data"]
    for j in js:
        # entity extraction code, specific to the input record format
        row = AcmeRecord(
            page_url=page_url,
            location_name=location_name,
            location_type=location_type,
            street_address=street_address,
            city=city,
            zip_postal=postal,
            country_code=country_code,
            phone=phone,
            locator_domain=locator_domain,
            hours_of_operation=hours_of_operation,
        )
        sgw.write_row(row)


if __name__ == "__main__":
    page_url = "http://theurl.com/api/v1"
    log = AcmeLogSetup().get_logger(logger_name="sourcename")
    headers = {
        # required headers for requests
    }
    with AcmeRequests() as session:
        with AcmeWriter(AcmeRecordDeduper(RecommendedRecordIds.Id)) as sgw:
            fetch_data()
The scripts are generally short, maybe a few hundred lines.
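For what it's worth, here is a minimal sketch of how one User → / System → pair like the one above could be assembled with Python's standard json module. The function name `build_thread_example`, the `"user"`/`"system"` dictionary keys, and the placeholder script are assumptions made for illustration, not part of the internal libraries. The partial response is kept as a plain string because it may be truncated and therefore not parseable as JSON.

```python
import json


def build_thread_example(endpoint: str, partial_response: str, script: str) -> dict:
    """Pair one user message with one system message, each a JSON-encoded string.

    Hypothetical helper: the name and the returned keys are illustrative only.
    """
    # The partial response is carried as an opaque string, since it may be
    # cut off mid-record and not be valid JSON on its own.
    user = json.dumps({"endpoint": endpoint, "example": partial_response})
    # json.dumps escapes the script's newlines as \n, which gives the
    # single-line form shown in the thread.
    system = json.dumps({"script": script})
    return {"user": user, "system": system}


pair = build_thread_example(
    endpoint="https://endpoint.com/api/v1",
    # Deliberately truncated sample, as a stand-in for a partial GET response.
    partial_response='[{"name": "a name", "address": "an address"}, {',
    # Placeholder script body; a real one would be a full scraper script.
    script="import module\nimport module2\n\nprint('placeholder')\n",
)

# Round trip: decoding the system message restores the script's real newlines.
restored = json.loads(pair["system"])["script"]
print(pair["user"])
print(restored)
```

Encoding both sides with `json.dumps` keeps each message on a single line, so a few-hundred-line script can be stored as one record without worrying about quoting or embedded newlines.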