Decoding Exported Data by Parsing conversations.json and/or chat.html

markanthonykoop · September 29, 2023, 12:46am

After exporting the chatgpt data, I would like to recreate the conversations similar to what is available in chat.html. It is not immediately clear to me how the mapping is structured in conversations.json. Is this documented somewhere?

I also tried to copy/paste chat.html into ChatGPT to see if I could get ChatGPT to figure out what the html code was doing, but I got a violation of terms acknowledgement window. Does anyone know why that might be?

markanthonykoop · September 29, 2023, 1:13am

Nevermind on part 2. It was related to having a really long line of json data apparently.

Finally got it working. Here is the conversation:

https://chat.openai.com/share/761c4e0c-b4a8-4c1b-b4bf-3ce32e9e3602

I was able to eventually get ChatGPT to get it. I didn’t have to understand much of the code at all, but it did take a few hours to navigate the straying, etc.

The code at the bottom seems to work ok.

stevesunypoly · February 6, 2024, 8:23pm

Hi, I’m working on the same thing; I had chatgpt write a python script to convert the json file into something I would work with using tiddlywiki as an importer. Any progress on this? Does anyone know if my conversations.json file is available via API?

mozahleri · February 10, 2024, 11:47pm

If you log into the site, you can export your conversations (via settings). The files are produced and a link/ url is emailed to you. Click on the link and it will download. But be aware that the conversations.json file format has recently changed . The example code here is out of date.

martylamb · February 26, 2024, 11:57am

I’m working on a tool to do the sort of thing described here (in my case, it’s to create local markdown files from an export). It’s working great so far but there’s still a little more to do. If you’d like to try it out, shoot me a message at marty at martiansoftware dot com and I’ll send you a copy as soon as it’s ready.

wijjj · July 4, 2024, 11:19am

Mate, thanks a lot! Works like a charm. I did a little polishing so one can use it easily as CLI. Also changed output name format so chat files are sorted by date when already sorted by name ( Github Gist fd351f01a5d561d433ae852fba8eca0a ):

"""
This script processes conversation data from a JSON file, extracts messages,
and writes them to text files. It also creates a summary JSON file with a summary
of the conversations. The script is designed to be run as a command-line interface (CLI),
allowing the user to specify the input JSON file and output directory.

Usage:
    python script_name.py /path/to/conversations.json /path/to/output_directory
"""

import unicodedata
import json
import re
import argparse
from datetime import datetime
from pathlib import Path


def extract_message_parts(message):
    """
    Extract the text parts from a message content.

    Args:
        message (dict): A message object.

    Returns:
        list: List of text parts.
    """
    content = message.get("content")
    if content and content.get("content_type") == "text":
        return content.get("parts", [])
    return []


def get_author_name(message):
    """
    Get the author name from a message.

    Args:
        message (dict): A message object.

    Returns:
        str: The author's role or a custom label.
    """
    author = message.get("author", {}).get("role", "")
    if author == "assistant":
        return "ChatGPT"
    elif author == "system":
        return "Custom user info"
    return author


def get_conversation_messages(conversation):
    """
    Extract messages from a conversation.

    Args:
        conversation (dict): A conversation object.

    Returns:
        list: List of messages with author and text.
    """
    messages = []
    current_node = conversation.get("current_node")
    mapping = conversation.get("mapping", {})
    while current_node:
        node = mapping.get(current_node, {})
        message = node.get("message") if node else None
        if message:
            parts = extract_message_parts(message)
            author = get_author_name(message)
            if parts and len(parts) > 0 and len(parts[0]) > 0:
                if author != "system" or message.get("metadata", {}).get(
                    "is_user_system_message"
                ):
                    messages.append({"author": author, "text": parts[0]})
        current_node = node.get("parent") if node else None
    return messages[::-1]


def create_directory(base_dir, date):
    """
    Create a directory based on the date.

    Args:
        base_dir (Path): Base output directory.
        date (datetime): The date to base the directory name on.

    Returns:
        Path: The path of the created directory.
    """
    directory_name = date.strftime("%Y_%m")
    directory_path = base_dir / directory_name
    directory_path.mkdir(parents=True, exist_ok=True)
    return directory_path


def sanitize_title(title):
    """
    Sanitize the title to create a valid file name, preserving non-ASCII characters.

    Args:
        title (str): The title of the conversation.

    Returns:
        str: Sanitized title.
    """
    title = unicodedata.normalize("NFKC", title)
    title = re.sub(r'[<>:"/\\|?*\x00-\x1F\s]', '_', title)
    return title[:140]


def create_file_name(directory_path, title, date):
    """
    Create a sanitized file name.

    Args:
        directory_path (Path): The directory where the file will be saved.
        title (str): The title of the conversation.
        date (datetime): The date to base the file name on.

    Returns:
        Path: The path of the created file.
    """
    sanitized_title = sanitize_title(title)
    return (
        directory_path / f"{date.strftime('%Y_%m_%d')}_{sanitized_title}.txt"
    )


def write_messages_to_file(file_path, messages):
    """
    Write messages to a text file.

    Args:
        file_path (Path): The path of the file to write to.
        messages (list): List of messages to write.
    """
    with file_path.open("w", encoding="utf-8") as file:
        for message in messages:
            file.write(f"{message['author']}\n")
            file.write(f"{message['text']}\n")


def update_conversation_summary(summary, directory_name, conversation, date, messages):
    """
    Update the conversation summary dictionary.

    Args:
        summary (dict): The conversation summary dictionary.
        directory_name (str): The name of the directory.
        conversation (dict): The conversation object.
        date (datetime): The updated date of the conversation.
        messages (list): List of messages in the conversation.
    """
    if directory_name not in summary:
        summary[directory_name] = []

    summary[directory_name].append(
        {
            "title": conversation.get("title", "Untitled"),
            "create_time": datetime.fromtimestamp(
                conversation.get("create_time")
            ).strftime("%Y-%m-%d %H:%M:%S"),
            "update_time": date.strftime("%Y-%m-%d %H:%M:%S"),
            "messages": messages,
        }
    )


def write_summary_json(output_dir, summary):
    """
    Write the conversation summary to a JSON file.

    Args:
        output_dir (Path): The output directory.
        summary (dict): The conversation summary to write.
    """
    summary_json_path = output_dir / "conversation_summary.json"
    with summary_json_path.open("w", encoding="utf-8") as json_file:
        json.dump(summary, json_file, ensure_ascii=False, indent=4)


def write_conversations_and_summary(conversations_data, output_dir):
    """
    Write conversation messages to text files and create a conversation summary JSON file.

    Args:
        conversations_data (list): List of conversation objects.
        output_dir (Path): Directory to save the output files.

    Returns:
        list: Information about created directories and files.
    """
    created_directories_info = []
    conversation_summary = {}

    for conversation in conversations_data:
        updated = conversation.get("update_time")
        if not updated:
            continue

        updated_date = datetime.fromtimestamp(updated)
        directory_path = create_directory(output_dir, updated_date)
        title = conversation.get("title", "Untitled")
        file_name = create_file_name(directory_path, title, updated_date)

        messages = get_conversation_messages(conversation)
        write_messages_to_file(file_name, messages)

        update_conversation_summary(
            conversation_summary,
            directory_path.name,
            conversation,
            updated_date,
            messages,
        )

        created_directories_info.append(
            {"directory": str(directory_path), "file": str(file_name)}
        )

    write_summary_json(output_dir, conversation_summary)

    return created_directories_info


def main():
    """
    Main function to parse arguments and process the conversations.
    """
    parser = argparse.ArgumentParser(
        description="Process conversation data from a JSON file."
    )
    parser.add_argument(
        "input_file", type=Path, help="Path to the input conversations JSON file."
    )
    parser.add_argument(
        "output_dir", type=Path, help="Directory to save the output files."
    )

    args = parser.parse_args()

    if not args.input_file.exists():
        print(f"Error: The input file '{args.input_file}' does not exist.")
        return

    with args.input_file.open("r", encoding="utf-8") as file:
        conversations_data = json.load(file)

    created_directories_info = write_conversations_and_summary(
        conversations_data, args.output_dir
    )

    for info in created_directories_info:
        print(f"Created {info['file']} in directory {info['directory']}")


if __name__ == "__main__":
    main()

martylamb · September 30, 2024, 11:36pm

I just released the tool I mentioned above to convert ChatGPT exports into local Markdown files. It’s called ChatKeeper, and I’d love any feedback or suggestions for improvement you might have.

Links are not allowed here but it can be found on my website at martiansoftware dot com.

leolzn619 · October 1, 2024, 3:20am

why wouldn’t you build a chrome extension type of tool?

martylamb · October 1, 2024, 11:15am

Because I wanted to act on the full conversation history, not just the current conversation, and I wanted full access to the local filesystem (which might be possible in a chrome extension but I’m not sure).

Also I don’t use chrome. And didn’t want to make something specific to one browser.

SolarBiscuit · October 11, 2024, 10:43pm

For those who are interested in using PowerShell here is a script that does the same thing:

# Prompt the user for the directory containing the conversations.json file
$inputDirectory = Read-Host "Please enter the directory where conversations.json is located"

# Define the function to get conversation messages
function Get-ConversationMessages {
    param ($conversation)

    $messages = @()
    $currentNode = $conversation.current_node
    $mapping = $conversation.mapping

    while ($currentNode) {
        $node = $mapping.$currentNode
        $message = $node.message
        $content = $message.content
        $author = if ($message.author) { $message.author.role } else { "" }

        if ($content -and $content.content_type -eq "text") {
            $parts = $content.parts
            if ($parts.Count -gt 0 -and $parts[0].Length -gt 0) {
                if ($author -ne "system" -or ($message.metadata).is_user_system_message) {
                    if ($author -eq "assistant") { $author = "ChatGPT" }
                    elseif ($author -eq "system") { $author = "Custom user info" }
                    
                    $messages += [pscustomobject]@{
                        Author = $author
                        Text   = $parts[0]
                    }
                }
            }
        }

        $currentNode = $node.parent
    }

    return $messages | Sort-Object -Descending
}

# Define the function to write conversations and create pruned.json
function Write-ConversationsAndJson {
    param ($conversationsData)

    # Get the directory where the script is saved
    $outputDirectory = $PSScriptRoot

    $createdDirectoriesInfo = @()
    $prunedData = @{}

    foreach ($conversation in $conversationsData) {
        $updated = $conversation.update_time
        if (-not $updated) { continue }

        # Convert Unix timestamp to DateTime
        $updatedDate = Get-Date ([DateTimeOffset]::FromUnixTimeSeconds($updated).DateTime)
        
        $directoryName = $updatedDate.ToString("MMMM_yyyy")
        $directoryPath = Join-Path $outputDirectory $directoryName

        if (-not (Test-Path $directoryPath)) {
            New-Item -Path $directoryPath -ItemType Directory | Out-Null
        }

        $title = if ($conversation.title) { $conversation.title } else { "Untitled" }
        $sanitizedTitle = $title -replace "[^a-zA-Z0-9_]", "_" -replace "^.{120}", "$&"
        $fileName = "$directoryPath/$sanitizedTitle_$($updatedDate.ToString('dd_MM_yyyy_HH_mm_ss')).txt"

        $messages = Get-ConversationMessages $conversation
        $messageContent = $messages | ForEach-Object { "$($_.Author)`n$($_.Text)`n" }

        Set-Content -Path $fileName -Value $messageContent -Encoding UTF8

        if (-not $prunedData[$directoryName]) {
            $prunedData[$directoryName] = @()
        }

        $prunedData[$directoryName] += @{
            Title       = $title
            Create_Time = (Get-Date ([DateTimeOffset]::FromUnixTimeSeconds($conversation.create_time).DateTime)).ToString("yyyy-MM-dd HH:mm:ss")
            Update_Time = $updatedDate.ToString("yyyy-MM-dd HH:mm:ss")
            Messages    = $messages
        }

        $createdDirectoriesInfo += @{
            Directory = $directoryPath
            File      = $fileName
        }
    }

    $jsonPrunedData = $prunedData | ConvertTo-Json -Depth 4 -Compress
    Set-Content -Path (Join-Path $outputDirectory "pruned.json") -Value $jsonPrunedData -Encoding UTF8

    return $createdDirectoriesInfo
}

# Load conversations.json data after prompting the user for the directory
$conversationFilePath = Join-Path $inputDirectory "conversations.json"

if (Test-Path $conversationFilePath) {
    $conversationData = Get-Content $conversationFilePath -Raw | ConvertFrom-Json
    $createdDirectoriesInfo = Write-ConversationsAndJson -conversationsData $conversationData
    Write-Host "Processing complete. Files saved in: $PSScriptRoot"
} else {
    Write-Host "Error: conversations.json not found in the provided directory."
}

martylamb · October 25, 2024, 12:50pm

That’s a cool PowerShell script and although I haven’t tried it, it looks useful for some basic use. Reading through it, however, I can confidently say it doesn’t do the same thing as my software. But it might be a great alternative for some folks using Windows. The more options for backing up and having useful local copies of our conversations, the better.

I actually came here to say that there’s a beta version of ChatKeeper (at martiansoftware dot com) available now for Mac users in case anyone was disappointed not to see Mac support earlier. I hope it’s helpful to folks.

martylamb · November 24, 2024, 11:20pm

I’ve been working on some new features and improvements that I’d love for folks to try out in a new release candidate for ChatKeeper version 1.1.0.

Rendering to Markdown any conversations that use the new Canvas and Search features requires a lot of json processing that even OpenAI’s own export viewer (html and javascript included in the export) doesn’t do.

I’m not aware of any current issues with this version, but I’m still finalizing testing for the official 1.1.0 release… so it might have issues I haven’t found yet. So while I’m testing I’m also very interested to hear how it works for you.

Here’s What’s New in ChatKeeper 1.1.0-rc.1

Support for ChatGPT’s new Canvas feature: You can now save and manage your Canvas sessions with ChatKeeper. Please note that there are some known issues with ChatGPT’s export format for Canvas chats. I’ve reported these to OpenAI and implemented workarounds where possible.
Support for ChatGPT’s new Search feature: ChatKeeper now formats ChatGPT’s search summaries and sources in Markdown.
Native Apple Silicon Support: ChatKeeper now runs natively on Apple Silicon (m1/m2/m3/m4). No more need for Rosetta complications.
Official Homebrew Installer: Mac users now have an easy way to install and update on their systems. The correct binary will be automatically installed for your platform.
Conversation Index by Start Date: A new index document organizes your conversations by their start dates for another navigation option (in addition to the previously existing index by last activity).
Numbered Conversation Turns: Conversation “turns” now have numbered Markdown headings in order to enable linking to specific messages within conversations.
Version Information Display: The ChatKeeper version is now included in both YAML front matter and in more user messages in both Markdown and the CLI.

- Marty

jack.northrup.ph · December 6, 2024, 9:45pm

#!/home/jack/miniconda3/envs/cloned_base/bin/python
import json
import logging
import os
import glob
import subprocess
import os
import string
‘’’
create working directory in it a directory called CHATGPT
in CHATGPT folder unzip the downloaded zip file of chatgpt dataset
run this script from the project folder it will create a three folders called:
directory1 = ‘CHATGPT/JSON’
make_path_exist(directory1)
directory2 = ‘CHATGPT/HTML’
make_path_exist(directory2)
directory3 = ‘CHATGPT/TEXT’
make_path_exist(directory3)
it will convert the data into three forms txt, html and json
it will create a database called CHATGPT_files.db
it will create a table called files
it will insert the data into the table
it will close the connection
‘’’
def clean_title(title):
valid_chars = set(string.ascii_letters + string.digits + string.whitespace)
cleaned_title = ‘’.join(char if char in valid_chars else ‘’ for char in title)
cleaned_title = cleaned_title.replace(’ ', '’) # Replace spaces with underscores
return cleaned_title.strip()

make a function tooocreate folder if it doesn’t exist

‘’’
This code defines a function make_path_exist that takes a directory path as input and creates the directory if it does not already exist. It then calls this function three times with different directory names.
‘’’
def make_path_exist(directory):
path = os.path.join(os.getcwd(), directory)
if not os.path.exists(path):
os.makedirs(path)

def split_and_save_and_convert(conversations_file):
directory1 = ‘CHATGPT/JSON’
make_path_exist(directory1)
directory2 = ‘CHATGPT/HTML’
make_path_exist(directory2)
directory3 = ‘CHATGPT/TEXT’
make_path_exist(directory3)
try:
with open(conversations_file, ‘r’, encoding=‘utf-8’) as file:
data = json.load(file)

        for conversation in data:
            title = conversation.get('title', 'Unknown_Title')
            title_with_underscores = clean_title(title)
            chapter_filename = f"{title_with_underscores}.json"
            chapter_filepath = os.path.join(directory1, chapter_filename)
            
            logging.info(f"Saving data for conversation '{title}' to {chapter_filepath}")
            
            with open(chapter_filepath, 'w', encoding='utf-8') as chapter_file:
                json.dump([conversation], chapter_file, indent=2)

            # Convert JSON to HTML
            html_output_file = os.path.join(directory2, f"{title_with_underscores}.html")
            convert_to_html(chapter_filepath, html_output_file)

            # Convert JSON to TXT
            txt_output_file = os.path.join(directory3, f"{title_with_underscores}.txt")
            convert_to_txt(chapter_filepath, txt_output_file)

except FileNotFoundError:
    logging.error(f"File not found: {conversations_file}")
except json.JSONDecodeError:
    logging.error(f"Error decoding JSON in file: {conversations_file}")
except Exception as e:
    logging.error(f"An unexpected error occurred: {e}")

def convert_to_html(json_file, html_output_file):
with open(json_file, ‘r’, encoding=‘utf-8’) as file:
json_data = json.load(file)

result_str = get_conversation_result(json_data)

with open(html_output_file, "w", encoding='utf-8') as html_output:
    result_html = result_str.replace("/n", "XXXXXXX\n")
    result_html = result_html.replace("<", "&lt;")
    result_html = result_html.replace(">", "&gt;")
    for line in result_html.split("XXXXXXX"):
        line = line.replace("\n", "<br />\n")
        html_output.write(line)

def convert_to_txt(json_file, txt_output_file):
with open(json_file, ‘r’, encoding=‘utf-8’) as file:
json_data = json.load(file)

result_str = get_conversation_result(json_data)

with open(txt_output_file, "w", encoding='utf-8') as txt_output:
    result_txt = result_str.replace("/n", "XXXXXXX\n")
    for line in result_txt.split("XXXXXXX"):
        txt_output.write(line)

def get_conversation_result(json_data):
result_str = “”
for conversation in json_data:
title = conversation.get(‘title’, ‘’)
messages = get_conversation_messages(conversation)

    result_str += title + '\n'
    for message in messages:
        result_str += message['author'] + '\n' + message['text'] + '\n'
    result_str += '\n'

return result_str

def get_conversation_messages(conversation):
messages =
current_node = conversation.get(‘current_node’)
while current_node:
node = conversation[‘mapping’][current_node]
message = node.get(‘message’)
if (message and message.get(‘content’) and message[‘content’].get(‘content_type’) == ‘text’ and
len(message[‘content’].get(‘parts’, )) > 0 and len(message[‘content’][‘parts’][0]) > 0 and
(message[‘author’][‘role’] != ‘system’ or message.get(‘metadata’, {}).get(‘is_user_system_message’))):
author = message[‘author’][‘role’]
if author == ‘assistant’:
author = ‘ChatGPT’
elif author == ‘system’ and message[‘metadata’].get(‘is_user_system_message’):
author = ‘Custom user info’
messages.append({‘author’: author, ‘text’: message[‘content’][‘parts’][0]})
current_node = node.get(‘parent’)
return messages[::-1]

Example usage

conversations_file_path = ‘CHATGPT/conversations.json’
#output_folder = ‘CHATDPT/output_txt_html_json’

Ensure the output folder exists

#os.makedirs(output_folder, exist_ok=True)

Configure logging

logging.basicConfig(level=logging.INFO)

Call the split, save, and convert function

split_and_save_and_convert(conversations_file_path)
import sqlite3
import os
import hashlib

Connect to SQLite database (creates a new database if it doesn’t exist)

db_path2 = ‘CHATGPT_files.db’
conn = sqlite3.connect(db_path2)
cursor = conn.cursor()

Create a table to store file information

cursor.execute(‘’’
CREATE TABLE IF NOT EXISTS files (
id INTEGER PRIMARY KEY,
filename TEXT NOT NULL,
content BLOB NOT NULL,
text_content TEXT NOT NULL,
hash_value TEXT NOT NULL,
format TEXT NOT NULL
)
‘’')

Commit changes and close the connection

conn.commit()
conn.close()

Function to calculate SHA-256 hash of a file

def calculate_hash(file_path):
sha256 = hashlib.sha256()
with open(file_path, ‘rb’) as file:
while chunk := file.read(8192): # Read in 8KB chunks
sha256.update(chunk)
return sha256.hexdigest()

Function to insert a file into the database

def insert_file(filename, content, text_content, hash_value, file_format):
conn = sqlite3.connect(db_path2)
cursor = conn.cursor()
cursor.execute(‘INSERT INTO files (filename, content, text_content, hash_value, format) VALUES (?, ?, ?, ?, ?)’,
(filename, content, text_content, hash_value, file_format))
conn.commit()
conn.close()

Function to insert HTML files recursively

def insert_text_files(directory):
for filename in os.listdir(directory): # Corrected variable name
if filename.endswith(‘.txt’):
file_path = os.path.join(directory, filename) # Construct full file path
with open(file_path, ‘rb’) as file:
print(file_path)
file_content = file.read()

        text_content = file_content.decode('utf-8', errors='ignore')  # Convert bytes to string
        hash_value = calculate_hash(file_path)
        insert_file(filename, file_content, text_content, hash_value, 'txt')  # Corrected insertion
        print(f"Inserted: {filename}")

Example: Insert HTML files recursively from the specified directory

input_folder = ‘CHATGPT/TEXT’
insert_text_files(input_folder)

print(‘Insertion process completed.’)
#---------------------------------------------------
def clean_title(title):
valid_chars = set(string.ascii_letters + string.digits + string.whitespace)
cleaned_title = ‘’.join(char if char in valid_chars else ‘’ for char in title)
cleaned_title = cleaned_title.replace(’ ', '’) # Replace spaces with underscores
return cleaned_title.strip()

make a function tooocreate folder if it doesn’t exist

‘’’
This code defines a function make_path_exist that takes a directory path as input and creates the directory if it does not already exist. It then calls this function three times with different directory names.
‘’’
def make_path_exist(directory):
path = os.path.join(os.getcwd(), directory)
if not os.path.exists(path):
os.makedirs(path)
def split_and_save_and_convert(conversations_file):
directory1 = ‘CHATGPT/JSON’
make_path_exist(directory1)
directory2 = ‘CHATGPT/HTML’
make_path_exist(directory2)
directory3 = ‘CHATGPT/TEXT’
make_path_exist(directory3)
try:
with open(conversations_file, ‘r’, encoding=‘utf-8’) as file:
data = json.load(file)

        for conversation in data:
            title = conversation.get('title', 'Unknown_Title')
            title_with_underscores = clean_title(title)
            chapter_filename = f"{title_with_underscores}.json"
            chapter_filepath = os.path.join(directory1, chapter_filename)
            
            logging.info(f"Saving data for conversation '{title}' to {chapter_filepath}")
            
            with open(chapter_filepath, 'w', encoding='utf-8') as chapter_file:
                json.dump([conversation], chapter_file, indent=2)

            # Convert JSON to HTML
            html_output_file = os.path.join(directory2, f"{title_with_underscores}.html")
            convert_to_html(chapter_filepath, html_output_file)

            # Convert JSON to TXT
            txt_output_file = os.path.join(directory3, f"{title_with_underscores}.txt")
            convert_to_txt(chapter_filepath, txt_output_file)

except FileNotFoundError:
    logging.error(f"File not found: {conversations_file}")
except json.JSONDecodeError:
    logging.error(f"Error decoding JSON in file: {conversations_file}")
except Exception as e:
    logging.error(f"An unexpected error occurred: {e}")

def convert_to_html(json_file, html_output_file):
with open(json_file, ‘r’, encoding=‘utf-8’) as file:
json_data = json.load(file)

result_str = get_conversation_result(json_data)

with open(html_output_file, "w", encoding='utf-8') as html_output:
    result_html = result_str.replace("/n", "XXXXXXX\n")
    result_html = result_html.replace("<", "&lt;")
    result_html = result_html.replace(">", "&gt;")
    for line in result_html.split("XXXXXXX"):
        line = line.replace("\n", "<br />\n")
        html_output.write(line)

def convert_to_txt(json_file, txt_output_file):
with open(json_file, ‘r’, encoding=‘utf-8’) as file:
json_data = json.load(file)

result_str = get_conversation_result(json_data)

with open(txt_output_file, "w", encoding='utf-8') as txt_output:
    result_txt = result_str.replace("/n", "XXXXXXX\n")
    for line in result_txt.split("XXXXXXX"):
        txt_output.write(line)

def get_conversation_result(json_data):
result_str = “”
for conversation in json_data:
title = conversation.get(‘title’, ‘’)
messages = get_conversation_messages(conversation)

    result_str += title + '\n'
    for message in messages:
        result_str += message['author'] + '\n' + message['text'] + '\n'
    result_str += '\n'

return result_str

def get_conversation_messages(conversation):
messages =
current_node = conversation.get(‘current_node’)
while current_node:
node = conversation[‘mapping’][current_node]
message = node.get(‘message’)
if (message and message.get(‘content’) and message[‘content’].get(‘content_type’) == ‘text’ and
len(message[‘content’].get(‘parts’, )) > 0 and len(message[‘content’][‘parts’][0]) > 0 and
(message[‘author’][‘role’] != ‘system’ or message.get(‘metadata’, {}).get(‘is_user_system_message’))):
author = message[‘author’][‘role’]
if author == ‘assistant’:
author = ‘ChatGPT’
elif author == ‘system’ and message[‘metadata’].get(‘is_user_system_message’):
author = ‘Custom user info’
messages.append({‘author’: author, ‘text’: message[‘content’][‘parts’][0]})
current_node = node.get(‘parent’)
return messages[::-1]

Example usage

conversations_file_path = ‘CHATGPT/conversations.json’
#output_folder = ‘CHATDPT/output_txt_html_json’

Ensure the output folder exists

#os.makedirs(output_folder, exist_ok=True)

Configure logging

logging.basicConfig(level=logging.INFO)

Call the split, save, and convert function

#split_and_save_and_convert(conversations_file_path)
json_file=‘CHATGPT/conversations.json’
txt_output_file =“conversations_2_text.txt”
convert_to_txt(json_file, txt_output_file)

Insert = open(“conversations.txt”,“a”)
with open(“conversations_2_text.txt”,“r”) as data:
lines = data.read()
line = lines.replace(“user\n”,“CHAT_DIALOGUEuser\n”)
Insert.write(line)
import sqlite3
import logging

Configure logging

logging.basicConfig(level=logging.DEBUG)

def connect_to_database(database_name):
“”"
Connect to the SQLite database.

Args:
    database_name (str): The name of the SQLite database file.
Returns:
    sqlite3.Connection or None: The database connection or None if connection fails.
"""
try:
    conn = sqlite3.connect(database_name)
    logging.info("Connected to the database successfully.")
    return conn
except Exception as e:
    logging.error(f"Failed to connect to the database: {e}")
    return None

def create_table(conn):
“”"
Create the dialogue table in the database.
Args:
conn (sqlite3.Connection): The SQLite database connection.
“”"
try:
if conn:
c = conn.cursor()
c.execute(‘’‘CREATE TABLE IF NOT EXISTS dialogue (
id INTEGER PRIMARY KEY,
user_ChatGPT_PAIR TEXT,
user_ChatGPT_PAIRb BLOB
)’‘’)
conn.commit()
logging.info(“Table ‘dialogue’ created successfully.”)
except Exception as e:
logging.error(f"Failed to create table ‘dialogue’: {e}“)
def insert_dialogue(conn, dialogue_data):
“””
Insert dialogue data into the database.
Args:
conn (sqlite3.Connection): The SQLite database connection.
dialogue_data (str): The dialogue data to insert into the database.
“”"
try:
if conn:
c = conn.cursor()
c.execute(“INSERT INTO dialogue (user_ChatGPT_PAIR, user_ChatGPT_PAIRb) VALUES (?,?)”, (dialogue_data,dialogue_data.encode(‘utf-8’),))
conn.commit()
logging.info(“Dialogue inserted into the database successfully.”)
except Exception as e:
logging.error(f"Failed to insert dialogue into the database: {e}")

Define the file path

file_path = ‘conversations.txt’

Read the file and insert dialogue into the database

try:
with open(file_path, “r”) as file:
file_contents = file.read()
dialogue_parts = file_contents.split(“CHAT_DIALOGUE”)
conn = connect_to_database(‘dialogueEXP2.db’)
if conn:
create_table(conn)
for dialogue_part in dialogue_parts:
insert_dialogue(conn, dialogue_part.strip())
print(“.”, end=“-”)
conn.close()
except Exception as e:
logging.error(f"An error occurred while reading or processing the file: {e}")

martylamb · December 17, 2024, 12:43pm

@jack.northrup.ph - nice. You’ll find it gets much more involved with conversations that take advantage of newer features, like canvas and search. And I have no idea what all of their new features announced this month will entail.

I’ve done a bunch of reverse engineering of their format and intend to keep up with it, so I have some work ahead of me with all of this month’s announcements. It should be fun to figure out.

In the meantime, ChatKeeper has graduated to its 1.1.0 release for anyone interested in checking it out after a quick download.

If you have a github repo or other location I can link to for what you posted, I am planning to add an “alternatives” list to the ChatKeeper site. If you don’t then I can just link to your message. Shoot me an email if you like (email is on the chatkeeper site). That goes for anyone else with an alternative too.

- Marty

spectrenode · March 5, 2025, 11:24pm

Wow, this works amazingly well! Thank you!

martylamb · April 7, 2025, 12:37pm

Along the lines of converting export data into more useful local files, ChatKeeper 1.2.0 is now available.

ChatKeeper converts your entire conversation history to local markdown files. You can rename or reorganize them however you like, and ChatKeeper will find and update them in place if you run it again later on an updated export.

New in this version:

Exporting Generated Images, both from DALL-E and the new GPT4o image generator. You can also move or rename these and ChatKeeper will identify them automatically on subsequent runs.
Handling Unzipped Exports (since Safari seems to unzip them by default…)
Lots of little things, like supporting Tasks, handling citations in Deep Research results, and other miscellaneous enhancements and smoothed rough edges.

Full details are on the ChatKeeper Download Page.

- Marty

cooliojones · April 30, 2025, 12:45pm

This was very helpful. VERY helpful. Thank you so much, and for me it worked pretty much on the first try! For someone who isn’t a deep coder like some of you may be. Thank you!

martylamb · May 21, 2025, 10:58am

Hello all, I have a quick update for anyone interested in ChatKeeper.

ChatKeeper 1.2.2 is now available for download.

This version adds “Chain of Thought” to the markdown output. This is the text that you might see flashing by while it’s “Reasoning”, followed by something like “Thought for 15 seconds.”

The markdown ends up looking something like this in Obsidian (but it’s just markdown and will work with any renderer):

Handling this chain of thought also eliminates any warning messages about “unhandled message types” for the types “thoughts” and “reasoning_recap”. (Note to self: I really need to make those warning messages friendlier.)

There are some minor bug fixes in this version as well, plus a change for better handling of conversations that use ChatGPT’s Canvas.

I read and respond to all feedback, so if you give it a try, please don’t hesitate to reach out and share your thoughts…

- Marty

capitalmind · June 13, 2025, 5:50am

I found this after creating something with a shell script. I’ll factor this in to the next revision but results are good so far. Check out GitHub - Capitalmind/chatgpt_search: Shell script with html interface to query chat history

58f21ab22ca08302c082 · June 15, 2025, 12:26am

Absolutely awesome! Thank you so very much!

Topic		Replies	Views
Best way to interact with PDF 2025 API chatgpt , api , pdf , assistants-api	47	11958	May 18, 2025
How do you maintain historical context in repeat API calls? API	29	94091	December 23, 2023
GPT-3.5-turbo how to remember previous messages like Chat-GPT website API	34	100869	December 12, 2023
Codex CLI programming game in Godot Coding with ChatGPT codex-cli	20	2229	November 5, 2025
Poor quality response on trained LLM with pdf files Community gpt-4	29	6811	May 1, 2024

Decoding Exported Data by Parsing conversations.json and/or chat.html

Here’s What’s New in ChatKeeper 1.1.0-rc.1

make a function tooocreate folder if it doesn’t exist

Example usage

Ensure the output folder exists

Configure logging

Call the split, save, and convert function

Connect to SQLite database (creates a new database if it doesn’t exist)

Create a table to store file information

Commit changes and close the connection

Function to calculate SHA-256 hash of a file

Function to insert a file into the database

Function to insert HTML files recursively

Example: Insert HTML files recursively from the specified directory

make a function tooocreate folder if it doesn’t exist

Example usage

Ensure the output folder exists

Configure logging

Call the split, save, and convert function

Configure logging

Define the file path

Read the file and insert dialogue into the database

Related topics