Automated Zenodo Publishing via the Legacy Deposit API

This article documents the automated publishing workflow used to deploy updates of the Comprehensive Modular Forms Preprint to Zenodo entirely via API calls — no browser uploads, no manual form filling.

Motivation

The Riemann Project preprint underwent three revisions in a single day: initial publication, a table formatting improvement, and an author line correction. Each update had to be published as a new version under the same concept DOI (via Zenodo's new version system). Doing this through the web UI would have been error-prone and slow. With the automated workflow, each revision cost about 30 seconds of script execution instead of about 5 minutes of clicking through forms.

The DOI Deadlock (or: Why Not Use the Modern API)

Zenodo runs InvenioRDM v14, which ships a REST API under /api/records/{id}/draft. The intended workflow is:

Create a draft via the browser or API
PUT /api/records/{id}/draft with metadata
POST /api/records/{id}/draft/actions/publish

This fails in a subtle but unrecoverable way. When the browser creates a draft, InvenioRDM auto-reserves a DOI in the backend. But PUT /api/records/{id}/draft replaces the entire draft resource — including the metadata — wiping the DOI reference. The PID remains registered in the backend, so re-reserving it fails with "A PID already exists", while publishing fails with "Missing DOI for required field". The draft is corrupted and must be deleted.

The legacy deposit API (/api/deposit/depositions) avoids this entirely: it returns a prereserve_doi on creation that remains stable across subsequent PUT calls.

Authentication

Zenodo offers Bearer token auth, but for quick iteration I used session cookies copied from the browser's developer tools. The cookies are stored in a Netscape-format file:

# Netscape HTTP Cookie File
zenodo.org	FALSE	/	TRUE	1780692402	session	<session_token>
zenodo.org	FALSE	/	TRUE	1780472171	csrftoken	<csrf_token>

Two critical headers are required alongside cookies:

X-CSRFToken: the csrftoken value (CSRF protection)
Referer: https://zenodo.org/ (origin validation)
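
A minimal sketch of turning a cookie file like the one above into the cookie dict and headers that the API calls need, using only the standard library. The helper name load_zenodo_auth and its return shape are assumptions for illustration, not part of the original script.

```python
from http.cookiejar import MozillaCookieJar


def load_zenodo_auth(cookie_file):
    """Return (cookies, headers) built from a Netscape-format cookie file.

    Hypothetical helper: the name and return shape are assumptions.
    """
    jar = MozillaCookieJar(cookie_file)
    # Keep session cookies and ignore expiry so a stale file still parses;
    # the server rejects expired credentials anyway.
    jar.load(ignore_discard=True, ignore_expires=True)
    cookies = {c.name: c.value for c in jar if c.name in ("session", "csrftoken")}
    missing = {"session", "csrftoken"} - cookies.keys()
    if missing:
        raise ValueError(f"cookie file lacks: {sorted(missing)}")
    headers = {
        "X-CSRFToken": cookies["csrftoken"],  # CSRF protection
        "Referer": "https://zenodo.org/",     # origin validation
    }
    return cookies, headers
```

The two return values can be passed straight to requests, for example requests.get(url, cookies=cookies, headers=headers).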

Workflow

The script follows five steps:

[POST] Create new version        → /api/deposit/depositions/{id}/actions/newversion
[DELETE] Remove old PDF          → /api/deposit/depositions/{new_id}/files/{file_id}
[POST] Upload new PDF            → /api/deposit/depositions/{new_id}/files
[PUT] Update metadata            → /api/deposit/depositions/{new_id} (with JSON body)
[POST] Publish                   → /api/deposit/depositions/{new_id}/actions/publish

Step 1: Create New Version

r = session.post(f"{BASE}/deposit/depositions/{DEPOSIT_ID}/actions/newversion")
new_deposit = r.json()
new_id = new_deposit["id"]

The response returns a new draft ID. The old published record remains accessible — the new version will link to it via the shared conceptdoi.
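
Zenodo's developer documentation has described the newversion response as the original deposition, with the new draft reachable through links.latest_draft. The sketch below tolerates both response shapes; new_draft_id is a hypothetical helper, and the fallback to the top-level id reflects the behaviour observed in this article rather than a documented guarantee.

```python
def new_draft_id(newversion_response):
    """Pick the draft ID out of a newversion response body (hypothetical helper)."""
    latest_draft = newversion_response.get("links", {}).get("latest_draft")
    if latest_draft:
        # The URL ends in the draft's deposition ID, e.g. .../depositions/123
        return latest_draft.rstrip("/").rsplit("/", 1)[-1]
    # Fall back to the top-level ID when no latest_draft link is present.
    return str(newversion_response["id"])
```

With this in place, Step 1 would read new_id = new_draft_id(r.json()).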

Step 2: Delete Old File

files = session.get(f"{BASE}/deposit/depositions/{new_id}/files").json()
for f in files:
    # Remove every file inherited from the previous version
    session.delete(f"{BASE}/deposit/depositions/{new_id}/files/{f['id']}")

Zenodo's new version copies all files from the previous version. We must delete them before uploading replacements (there is no file replacement endpoint — only add and delete).

Step 3: Upload New PDF

with open("paper.pdf", "rb") as fh:
    r = requests.post(f"{BASE}/deposit/depositions/{new_id}/files",
        cookies=cookies,
        headers={"X-CSRFToken": csrf, "Referer": "https://zenodo.org/"},
        files={"file": ("paper.pdf", fh)})

Note the use of requests.post (not the session object) for file uploads. The session object's content-type header (application/json) conflicts with multipart form data.

Step 4: Update Metadata

r = session.get(f"{BASE}/deposit/depositions/{new_id}")
current = r.json()
meta = current.get("metadata", {})

meta["title"] = "Your Title"
meta["creators"] = [{"name": "Weiss, Tobias", "affiliation": "Independent"}]
meta["upload_type"] = "publication"
meta["publication_type"] = "preprint"
meta["access_right"] = "open"
meta["license"] = "CC-BY-4.0"

session.put(f"{BASE}/deposit/depositions/{new_id}", json={"metadata": meta})

The PUT call preserves the prereserved DOI — this is why we use the legacy API and not the v14 draft API.
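
The read-modify-write pattern in this step can be isolated into a small pure function, which makes it easy to check that fields such as prereserve_doi survive the merge. This is a sketch under assumptions: build_metadata_body is a hypothetical name, and the sample values in the usage note are invented.

```python
import copy


def build_metadata_body(current, overrides):
    """Merge overrides into the metadata returned by GET (hypothetical helper).

    The deposition dict is copied so the caller's data is left untouched.
    """
    meta = copy.deepcopy(current.get("metadata", {}))
    meta.update(overrides)
    return {"metadata": meta}
```

For example, build_metadata_body(current, {"title": "Your Title"}) returns a body suitable for the json= argument of the PUT call, with every field not overridden carried over unchanged.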

Step 5: Publish

r = session.post(f"{BASE}/deposit/depositions/{new_id}/actions/publish")
# Returns 202 Accepted with state="done" on success

The response includes the final DOI and public URL.

Complete Working Script

import requests

BASE = "https://zenodo.org/api"
DEPOSIT_ID = "20479512"  # initial deposit ID
COOKIES = {
    "session": "<your_session>",
    "csrftoken": "<your_csrftoken>"
}
CSRF = COOKIES["csrftoken"]
HEADERS = {"X-CSRFToken": CSRF, "Referer": "https://zenodo.org/"}

# 1. New version
r = requests.post(f"{BASE}/deposit/depositions/{DEPOSIT_ID}/actions/newversion",
    cookies=COOKIES, headers=HEADERS)
new_id = r.json()["id"]

# 2. Delete old files
for f in requests.get(f"{BASE}/deposit/depositions/{new_id}/files",
                      cookies=COOKIES, headers=HEADERS).json():
    requests.delete(f"{BASE}/deposit/depositions/{new_id}/files/{f['id']}",
                    cookies=COOKIES, headers=HEADERS)

# 3. Upload new file
with open("paper.pdf", "rb") as fh:
    requests.post(f"{BASE}/deposit/depositions/{new_id}/files",
        cookies=COOKIES, headers=HEADERS,
        files={"file": ("paper.pdf", fh)})

# 4. Update metadata
meta = requests.get(f"{BASE}/deposit/depositions/{new_id}",
    cookies=COOKIES, headers=HEADERS).json()["metadata"]
meta.update({
    "title": "Your Title",
    "creators": [{"name": "Weiss, Tobias", "affiliation": "Independent"}],
    "upload_type": "publication",
    "publication_type": "preprint",
    "license": "CC-BY-4.0",
})
requests.put(f"{BASE}/deposit/depositions/{new_id}",
    cookies=COOKIES, headers=HEADERS, json={"metadata": meta})

# 5. Publish
r = requests.post(f"{BASE}/deposit/depositions/{new_id}/actions/publish",
    cookies=COOKIES, headers=HEADERS)
print(f"Published: {r.json()['doi_url']}")
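
The script above does not inspect HTTP status codes, so a failed step only surfaces later as a confusing KeyError. A minimal guard, assuming any 2xx status counts as success, could look like this; ensure_ok is a hypothetical helper, and requests' own Response.raise_for_status() is an alternative that raises on 4xx and 5xx.

```python
def ensure_ok(response, step):
    """Raise if an API call did not return a 2xx status (hypothetical helper)."""
    if not 200 <= response.status_code < 300:
        # Include a slice of the body: Zenodo error responses usually explain the cause.
        raise RuntimeError(
            f"{step} failed: HTTP {response.status_code}: {response.text[:200]}"
        )
    return response
```

Each call can then be wrapped, for example new_id = ensure_ok(r, "new version").json()["id"].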

Results

Revision	DOI	Filesize	What Changed
v1	10.5281/zenodo.20479512	195 KB	Initial publication
v2	10.5281/zenodo.20479919	221 KB	Author line fix, booktabs table formatting
v3	10.5281/zenodo.20480198	221 KB	Final author line cleanup

All three revisions share the concept DOI: 10.5281/zenodo.20479511. The latest version is always served at the concept DOI URL, while individual DOIs remain permanently accessible.

Lessons Learned

Use the legacy API. The /api/deposit/depositions endpoints are stable and the prereserve DOI mechanism works. The v14 /api/records/{id}/draft API has an unrecoverable DOI deadlock.
Session cookies expire. Zenodo session tokens have a limited lifetime. For production CI/CD, use a personal access token instead.
New versions copy all files. Always delete old files before uploading new ones. There is no file replacement endpoint.
Use requests.post for file uploads, not the session object. The Content-Type: application/json header on the session object conflicts with multipart form data.
Delete corrupted drafts via the legacy API. If a draft enters the DOI deadlock state, use DELETE /api/deposit/depositions/{id} with the same cookie auth to clean it up.

Future Work

CI/CD Integration

The most natural next step is wrapping the publish script into a GitHub Actions (or Forgejo Actions) workflow. On every push of a release tag (v*), the workflow would:

Build the PDF from the latest markdown source
Create a new Zenodo version
Upload, update metadata, and publish
Optionally attach the git tag as a Zenodo version identifier

This eliminates the remaining manual step of running a script with session cookies. A Zenodo personal access token stored as a repository secret replaces cookie authentication entirely.
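
As a sketch of the tag-to-version step, the snippet below derives a version string from the release tag and places it in the deposit metadata's version field. GITHUB_REF_NAME is the variable GitHub Actions sets to the short ref name; the helper name version_from_ref and the v0.0.0 fallback are assumptions for illustration.

```python
import os


def version_from_ref(ref_name):
    """Turn a tag such as 'v1.2.0' or 'refs/tags/v1.2.0' into '1.2.0'."""
    tag = ref_name.rsplit("/", 1)[-1]
    return tag[1:] if tag.startswith("v") else tag


# GITHUB_REF_NAME is set by GitHub Actions; the fallback is for local runs.
version = version_from_ref(os.environ.get("GITHUB_REF_NAME", "v0.0.0"))
metadata_overrides = {"version": version}
```

The metadata_overrides dict would be merged into the metadata before the PUT call in Step 4.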

BrowserMCP Cookie Extraction

The current workflow requires manually copying session and csrftoken cookies from the browser developer tools on every run. A BrowserMCP integration could automate this step entirely:

Launch a browser session via MCP (Playwright-based)
Navigate to zenodo.org (using existing login session)
Extract cookies from the browser context via the CDP Network.getCookies method; page.evaluate(() => document.cookie) only returns cookies that are not marked HttpOnly, which may exclude the session cookie
Pass the cookies directly to the Python publishing script

This eliminates the most fragile manual step. The MCP server running on the host machine can also handle the file upload — BrowserMCP has access to local files, so it can read the PDF and POST it to the Zenodo API directly without a separate Python script.

For headless CI environments, BrowserMCP can authenticate via the Zenodo login page (email + password) and extract the resulting session cookies programmatically, making the entire flow unattended.
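
Whatever tool extracts the cookies, the hand-off to the publishing script is a small filtering step. The sketch below assumes the cookies arrive as a list of dicts with name, value, and domain keys, which is the shape Playwright's BrowserContext.cookies() returns; cookies_from_browser is a hypothetical helper.

```python
def cookies_from_browser(cookie_list, domain="zenodo.org"):
    """Reduce a browser cookie dump to the two cookies the API calls need."""
    wanted = {}
    for cookie in cookie_list:
        host = cookie["domain"].lstrip(".")
        on_domain = host == domain or host.endswith("." + domain)
        if on_domain and cookie["name"] in ("session", "csrftoken"):
            wanted[cookie["name"]] = cookie["value"]
    return wanted
```

The returned dict has the same shape as the COOKIES constant in the complete script.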

Bearer Token Migration

Session cookies were expedient but fragile (they expire, and the CSRF token must be refreshed). The Zenodo API supports Bearer token authentication via Authorization: Bearer <token>. A production pipeline should:

Generate a token from the Zenodo settings page
Store it in environment variables or a CI/CD secret
Create a requests.Session() and set session.headers["Authorization"] = f"Bearer {token}" (the Session constructor takes no arguments); no cookies or CSRF headers are needed

The token scopes can be limited to deposit write access, reducing the security surface compared to a full session cookie.
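
A minimal sketch of the token hand-off, assuming the token lives in an environment variable. The variable name ZENODO_TOKEN and the helper bearer_headers are illustrative choices, not something Zenodo prescribes.

```python
import os


def bearer_headers(env=os.environ, var="ZENODO_TOKEN"):
    """Build the Authorization header from an environment variable."""
    token = env.get(var, "").strip()
    if not token:
        # Fail early rather than sending unauthenticated requests.
        raise RuntimeError(f"set {var} to a Zenodo personal access token")
    return {"Authorization": f"Bearer {token}"}
```

In the publishing script this would be applied once with session = requests.Session() followed by session.headers.update(bearer_headers()).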

Community & Collection Integration

Zenodo supports community collections that group related records. A future version of the workflow could:

Auto-submit records to a curated community (e.g., "Mathematics" or "Machine Learning")
Tag records with community-specific metadata fields

This would improve discoverability without manual curation after each upload.

Multi-Asset Deposits

Currently only the PDF is uploaded. Many computational papers warrant supplementary material:

Training datasets (LMFDB-derived Hecke traces)
Model checkpoints (trained GNN weights)
Reproducibility notebooks (Jupyter/Colab)

The Zenodo API supports multiple files per deposit via repeated POST calls to the files endpoint.
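
The repeated POST calls can be sketched as a loop over local paths. The function below takes any object with a requests-style post method, so it works with the requests module itself or with a Session that does not force a Content-Type: application/json header; upload_files is a hypothetical helper, and authentication is assumed to be handled by the object passed in.

```python
from pathlib import Path


def upload_files(session, base, deposit_id, paths):
    """POST each file to the deposit's files endpoint; return the responses."""
    url = f"{base}/deposit/depositions/{deposit_id}/files"
    responses = []
    for path in map(Path, paths):
        with path.open("rb") as fh:
            # One multipart request per file, named after the local file.
            responses.append(session.post(url, files={"file": (path.name, fh)}))
    return responses
```

A call such as upload_files(session, BASE, new_id, ["paper.pdf", "traces.csv"]) would replace the single upload in Step 3; the second filename is only an example.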

Monitoring & Email Notification

The publish endpoint returns 202 Accepted, meaning publication is asynchronous. A future enhancement could poll the record state and send a notification (email, Slack webhook, or Matrix message) once the DOI resolves. This is particularly useful for batch updates across multiple deposits.
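
A polling loop along these lines could back the notification. The sketch takes a callable that returns the deposition's current state string, so the HTTP call stays outside the function; wait_until_published, the attempt count, and the delay are assumptions chosen for illustration.

```python
import time


def wait_until_published(fetch_state, attempts=10, delay=3.0, sleep=time.sleep):
    """Poll fetch_state() until it returns 'done'; return True on success."""
    for attempt in range(attempts):
        if fetch_state() == "done":
            return True
        if attempt < attempts - 1:
            sleep(delay)  # wait before the next poll, but not after the last
    return False
```

In the publishing script, fetch_state could be lambda: session.get(f"{BASE}/deposit/depositions/{new_id}").json()["state"], and the notification would be sent once the function returns True.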

Preprint Server Integration

Zenodo DOIs are increasingly accepted by preprint servers like arXiv, HAL, and OSF. A natural extension is a cross-posting workflow that:

Publishes to Zenodo (DOI generation)
Submits the same PDF and abstract to a preprint server
Links the Zenodo DOI in the preprint metadata
