Automated Zenodo Publishing via the Legacy Deposit API
This article documents the automated publishing workflow used to deploy updates of the Comprehensive Modular Forms Preprint to Zenodo entirely via API calls, with no browser uploads and no manual form filling.
Motivation
The Riemann Project preprint underwent three revisions in a single day: initial publication, a table formatting improvement, and an author line correction. Each revision required a same-DOI update (via Zenodo's new version system). Doing this through the web UI would have been error-prone and slow. An automated workflow meant each revision cost ~30 seconds of script execution instead of 5 minutes of clicking through forms.
The DOI Deadlock (or: Why Not Use the Modern API)
Zenodo runs InvenioRDM v14, which ships a REST API under /api/records/{id}/draft. The intended workflow is:
- Create a draft via the browser or API
- PUT /api/records/{id}/draft with metadata
- POST /api/records/{id}/draft/actions/publish
This fails in a subtle but unrecoverable way. When the browser creates a draft, InvenioRDM auto-reserves a DOI in the backend. But PUT /api/records/{id}/draft replaces the entire draft resource — including the metadata — wiping the DOI reference. The PID remains registered in the backend, so re-reserving it fails with "A PID already exists", while publishing fails with "Missing DOI for required field". The draft is corrupted and must be deleted.
The legacy deposit API (/api/deposit/depositions) avoids this entirely: it returns a prereserve_doi on creation that remains stable across subsequent PUT calls.
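To make the contrast concrete, here is a minimal sketch of reading the reserved DOI out of a legacy deposition payload. It assumes the DOI sits at metadata.prereserve_doi.doi in the JSON response; the helper name reserved_doi and the sample payload are hypothetical and only illustrate the shape.

```python
from typing import Optional


def reserved_doi(deposition: dict) -> Optional[str]:
    """Return the pre-reserved DOI from a legacy deposition payload, if any.

    Assumption: the DOI is stored at metadata.prereserve_doi.doi.
    """
    prereserve = deposition.get("metadata", {}).get("prereserve_doi") or {}
    return prereserve.get("doi")


# Hypothetical payload, trimmed to the fields this sketch reads
sample = {
    "id": 123,
    "metadata": {"prereserve_doi": {"doi": "10.5281/zenodo.123", "recid": 123}},
}
print(reserved_doi(sample))
```

Comparing this value before and after a metadata PUT is a cheap way to confirm that the DOI reference survived the update.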
Authentication
Zenodo offers Bearer token auth, but for quick iteration I used session cookies copied from the browser developer tools. The cookies are stored in a Netscape-format cookie file (fields are tab-separated):
```
# Netscape HTTP Cookie File
zenodo.org	FALSE	/	TRUE	1780692402	session	<session_token>
zenodo.org	FALSE	/	TRUE	1780472171	csrftoken	<csrf_token>
```
Two critical headers are required alongside cookies:
- X-CSRFToken: the csrftoken value (CSRF protection)
- Referer: https://zenodo.org/ (origin validation)
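One way to turn that cookie file into the values the later snippets use is sketched below with the standard library's http.cookiejar. The function name load_zenodo_auth is hypothetical, and the sketch assumes the file contains unexpired session and csrftoken entries.

```python
from http.cookiejar import MozillaCookieJar


def load_zenodo_auth(cookie_path: str):
    """Load a Netscape cookie file and derive the two required headers.

    Returns (cookies, headers); cookies is a plain name -> value dict.
    Expired entries are skipped by MozillaCookieJar.load() by default.
    """
    jar = MozillaCookieJar(cookie_path)
    jar.load()  # needs the "# Netscape HTTP Cookie File" header and tab-separated fields
    cookies = {cookie.name: cookie.value for cookie in jar}
    for required in ("session", "csrftoken"):
        if required not in cookies:
            raise ValueError(f"cookie {required!r} is missing or expired")
    headers = {
        "X-CSRFToken": cookies["csrftoken"],  # must match the csrftoken cookie
        "Referer": "https://zenodo.org/",
    }
    return cookies, headers
```

The returned pair can be passed as cookies=cookies and headers=headers to the requests calls, or copied onto a requests.Session once.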
Workflow
The script follows five steps:
[POST] Create new version → /api/deposit/depositions/{id}/actions/newversion
[DELETE] Remove old PDF → /api/deposit/depositions/{new_id}/files/{file_id}
[POST] Upload new PDF → /api/deposit/depositions/{new_id}/files
[PUT] Update metadata → /api/deposit/depositions/{new_id} (with JSON body)
[POST] Publish → /api/deposit/depositions/{new_id}/actions/publish
Step 1: Create New Version
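```python
# `session` is a requests.Session carrying the cookies and headers from the Authentication section
r = session.post(f"{BASE}/deposit/depositions/{DEPOSIT_ID}/actions/newversion")
r.raise_for_status()  # stop early on an auth or permission error
new_deposit = r.json()
new_id = new_deposit["id"]
```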
The response returns a new draft ID. The old published record remains accessible — the new version will link to it via the shared conceptdoi.
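Zenodo's legacy API documentation has described the newversion response as the original deposition, with the new draft linked under links.latest_draft. The sketch below is a defensive way to pick the draft ID under that assumption, falling back to the top-level id otherwise; the helper name new_version_id is hypothetical.

```python
def new_version_id(payload: dict) -> str:
    """Return the ID of the draft created by actions/newversion.

    Assumption: when links.latest_draft is present, it points at the new
    draft and its last path segment is the draft ID. Otherwise the payload's
    own "id" is taken to be the new draft.
    """
    latest_draft = payload.get("links", {}).get("latest_draft")
    if latest_draft:
        return latest_draft.rstrip("/").rsplit("/", 1)[-1]
    return str(payload["id"])


# Hypothetical payloads showing both shapes
print(new_version_id({"id": 20479919}))
print(new_version_id({
    "id": 1,
    "links": {"latest_draft": "https://zenodo.org/api/deposit/depositions/2"},
}))
```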
Step 2: Delete Old File
```python
files = session.get(f"{BASE}/deposit/depositions/{new_id}/files").json()
for f in files:
    # Each copied file has to be deleted individually by its file ID
    requests.delete(
        f"{BASE}/deposit/depositions/{new_id}/files/{f['id']}",
        cookies=cookies,
        headers={"X-CSRFToken": csrf, "Referer": "https://zenodo.org/"},
    )
```
Zenodo's new version copies all files from the previous version. We must delete them before uploading replacements (there is no file replacement endpoint — only add and delete).
Step 3: Upload New PDF
```python
with open("paper.pdf", "rb") as fh:
    # `files=` makes requests build the multipart/form-data body and boundary
    r = requests.post(
        f"{BASE}/deposit/depositions/{new_id}/files",
        cookies=cookies,
        headers={"X-CSRFToken": csrf, "Referer": "https://zenodo.org/"},
        files={"file": ("paper.pdf", fh)},
    )
```
Note the use of requests.post (not the session object) for file uploads. The session object here carries a default Content-Type: application/json header, which conflicts with the multipart/form-data encoding that file uploads need.
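An alternative that keeps the session is to drop the conflicting header for one request only. In requests, a per-request header set to None is removed when it is merged with the session defaults. The sketch below prepares the upload without sending it, so the resulting header can be inspected; the deposit ID in the URL is a placeholder and the session default is an assumption about how the session was configured.

```python
import io

import requests


def prepare_upload(session: requests.Session, url: str, name: str, fileobj):
    """Prepare (but do not send) a multipart upload through a session.

    Setting Content-Type to None removes the session-level default for this
    request, so the multipart header generated for `files=` is used instead.
    """
    request = requests.Request(
        "POST",
        url,
        files={"file": (name, fileobj)},
        headers={"Content-Type": None},
    )
    return session.prepare_request(request)


session = requests.Session()
session.headers["Content-Type"] = "application/json"  # assumed session default
prepared = prepare_upload(
    session,
    "https://zenodo.org/api/deposit/depositions/123/files",  # placeholder ID
    "paper.pdf",
    io.BytesIO(b"%PDF-1.4 placeholder"),
)
print(prepared.headers["Content-Type"])  # multipart/form-data; boundary=...
```

Calling session.send(prepared) would then perform the upload with the session's cookies and remaining headers intact.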
Step 4: Update Metadata
```python
# Start from the draft's current metadata so untouched fields are sent back unchanged
r = session.get(f"{BASE}/deposit/depositions/{new_id}")
current = r.json()
meta = current.get("metadata", {})
meta["title"] = "Your Title"
meta["creators"] = [{"name": "Weiss, Tobias", "affiliation": "Independent"}]
meta["upload_type"] = "publication"
meta["publication_type"] = "preprint"
meta["access_right"] = "open"
meta["license"] = "CC-BY-4.0"
session.put(f"{BASE}/deposit/depositions/{new_id}", json={"metadata": meta})
```
The PUT call preserves the prereserved DOI — this is why we use the legacy API and not the v14 draft API.
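The read-modify-write pattern above can be wrapped in a small helper that refuses to touch the DOI fields. This is a sketch under the assumption that the reserved DOI travels in the metadata as prereserve_doi; the function name build_metadata_update is hypothetical.

```python
import copy

PROTECTED_FIELDS = {"doi", "prereserve_doi"}  # fields this sketch never overrides


def build_metadata_update(current: dict, overrides: dict) -> dict:
    """Merge overrides into the draft's existing metadata for the PUT body."""
    blocked = PROTECTED_FIELDS & overrides.keys()
    if blocked:
        raise ValueError(f"refusing to override DOI fields: {sorted(blocked)}")
    meta = copy.deepcopy(current.get("metadata", {}))
    meta.update(overrides)
    return {"metadata": meta}


# Hypothetical draft payload, trimmed to the relevant fields
draft = {"metadata": {"title": "Old", "prereserve_doi": {"doi": "10.5281/zenodo.123"}}}
body = build_metadata_update(draft, {"title": "Your Title"})
print(body["metadata"]["prereserve_doi"])
```

The returned dict is what would be passed as json=body to the PUT call.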
Step 5: Publish
```python
r = session.post(f"{BASE}/deposit/depositions/{new_id}/actions/publish")
# Returns 202 Accepted with state="done" on success
```
The response includes the final DOI and public URL.
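A script that runs unattended should check the publish response rather than assume success. The sketch below takes the status code and decoded JSON as inputs; it assumes success means a 2xx status with state equal to "done", and that the payload carries the doi_url field read by the complete script. The helper name published_doi is hypothetical.

```python
def published_doi(status_code: int, payload: dict) -> str:
    """Validate a publish response and return the DOI URL.

    Assumptions: any 2xx status with state == "done" counts as success, and
    the DOI link is available under "doi_url".
    """
    if not 200 <= status_code < 300:
        raise RuntimeError(
            f"publish failed with HTTP {status_code}: {payload.get('message')}"
        )
    if payload.get("state") != "done":
        raise RuntimeError(f"unexpected deposition state: {payload.get('state')!r}")
    return payload["doi_url"]


# Hypothetical response body, trimmed to the fields this sketch reads
print(published_doi(202, {"state": "done", "doi_url": "https://doi.org/10.5281/zenodo.123"}))
```

With requests this would be called as published_doi(r.status_code, r.json()).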
Complete Working Script
```python
import requests

BASE = "https://zenodo.org/api"
DEPOSIT_ID = "20479512"  # initial deposit ID
COOKIES = {
    "session": "<your_session>",
    "csrftoken": "<your_csrftoken>",
}
CSRF = COOKIES["csrftoken"]
HEADERS = {"X-CSRFToken": CSRF, "Referer": "https://zenodo.org/"}

# 1. New version
r = requests.post(f"{BASE}/deposit/depositions/{DEPOSIT_ID}/actions/newversion",
                  cookies=COOKIES, headers=HEADERS)
new_id = r.json()["id"]

# 2. Delete old files
for f in requests.get(f"{BASE}/deposit/depositions/{new_id}/files",
                      cookies=COOKIES, headers=HEADERS).json():
    requests.delete(f"{BASE}/deposit/depositions/{new_id}/files/{f['id']}",
                    cookies=COOKIES, headers=HEADERS)

# 3. Upload new file
with open("paper.pdf", "rb") as fh:
    requests.post(f"{BASE}/deposit/depositions/{new_id}/files",
                  cookies=COOKIES, headers=HEADERS,
                  files={"file": ("paper.pdf", fh)})

# 4. Update metadata
meta = requests.get(f"{BASE}/deposit/depositions/{new_id}",
                    cookies=COOKIES, headers=HEADERS).json()["metadata"]
meta.update({
    "title": "Your Title",
    "creators": [{"name": "Weiss, Tobias", "affiliation": "Independent"}],
    "upload_type": "publication",
    "publication_type": "preprint",
    "license": "CC-BY-4.0",
})
requests.put(f"{BASE}/deposit/depositions/{new_id}",
             cookies=COOKIES, headers=HEADERS, json={"metadata": meta})

# 5. Publish
r = requests.post(f"{BASE}/deposit/depositions/{new_id}/actions/publish",
                  cookies=COOKIES, headers=HEADERS)
print(f"Published: {r.json()['doi_url']}")
```
Results
| Revision | DOI | File size | What Changed |
|---|---|---|---|
| v1 | 10.5281/zenodo.20479512 | 195 KB | Initial publication |
| v2 | 10.5281/zenodo.20479919 | 221 KB | Author line fix, booktabs table formatting |
| v3 | 10.5281/zenodo.20480198 | 221 KB | Final author line cleanup |
All three revisions share the concept DOI: 10.5281/zenodo.20479511. The latest version is always served at the concept DOI URL, while individual DOIs remain permanently accessible.
Lessons Learned
- Use the legacy API. The /api/deposit/depositions endpoints are stable and the prereserve DOI mechanism works. The v14 /api/records/{id}/draft API has an unrecoverable DOI deadlock.
- Session cookies expire. Zenodo session tokens have a limited lifetime. For production CI/CD, use a personal access token instead.
- New versions copy all files. Always delete old files before uploading new ones. There is no file replacement endpoint.
- Use requests.post for file uploads, not the session object. The Content-Type: application/json header on the session object conflicts with multipart form data.
- Delete corrupted drafts via the legacy API. If a draft enters the DOI deadlock state, use DELETE /api/deposit/depositions/{id} with the same cookie auth to clean it up.
Future Work
CI/CD Integration
The most natural next step is wrapping the publish script into a GitHub Actions (or Forgejo Actions) workflow. On every push of a release tag (v*), the workflow would:
- Build the PDF from the latest markdown source
- Create a new Zenodo version
- Upload, update metadata, and publish
- Optionally attach the git tag as a Zenodo version identifier
This eliminates the remaining manual step of running a script with session cookies. A Zenodo personal access token stored as a repository secret replaces cookie authentication entirely.
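For the optional version identifier, a small sketch of mapping a release tag to a version string is given below. It assumes the tag name arrives in the GITHUB_REF_NAME environment variable, which GitHub Actions sets to the short branch or tag name, and that the value is written to the deposition's version metadata field; the helper name version_from_tag is hypothetical.

```python
def version_from_tag(ref_name: str) -> str:
    """Turn a release tag such as "v1.2.0" into a version string "1.2.0"."""
    if not ref_name.startswith("v") or len(ref_name) == 1:
        raise ValueError(f"not a release tag of the form v*: {ref_name!r}")
    return ref_name[1:]


# In a workflow step this would typically be:
#   meta["version"] = version_from_tag(os.environ["GITHUB_REF_NAME"])
print(version_from_tag("v1.2.0"))  # → 1.2.0
```

Failing on anything that is not a v* tag keeps an accidental branch push from publishing a new version.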
BrowserMCP Cookie Extraction
The current workflow requires manually copying session and csrftoken cookies from the browser developer tools on every run. A BrowserMCP integration could automate this step entirely:
- Launch a browser session via MCP (Playwright-based)
- Navigate to zenodo.org (using existing login session)
- Extract cookies from the browser context via page.evaluate(() => document.cookie) or the CDP Network.getCookies method (document.cookie cannot read HttpOnly cookies, so the CDP route is the safer choice if the session cookie is flagged HttpOnly)
- Pass the cookies directly to the Python publishing script
This eliminates the most fragile manual step. The MCP server running on the host machine can also handle the file upload — BrowserMCP has access to local files, so it can read the PDF and POST it to the Zenodo API directly without a separate Python script.
For headless CI environments, BrowserMCP can authenticate via the Zenodo login page (email + password) and extract the resulting session cookies programmatically, making the entire flow unattended.
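Whichever tool drives the browser, the hand-off to the publishing script is a conversion from the browser's cookie list to the two values the script needs. The sketch below assumes the input has the shape of a CDP Network.getCookies result, a dict with a cookies list whose entries have name, value, and domain; the helper name cookies_from_cdp and the sample data are hypothetical.

```python
def cookies_from_cdp(result: dict, domain: str = "zenodo.org") -> dict:
    """Pick the session and csrftoken cookies out of a Network.getCookies result."""
    wanted = {"session", "csrftoken"}
    found = {}
    for cookie in result.get("cookies", []):
        host = cookie.get("domain", "").lstrip(".")
        on_domain = host == domain or host.endswith("." + domain)
        if cookie.get("name") in wanted and on_domain:
            found[cookie["name"]] = cookie["value"]
    missing = wanted - set(found)
    if missing:
        raise ValueError(f"missing cookies: {sorted(missing)}")
    return found


# Hypothetical CDP result, trimmed to the fields this sketch reads
cdp_result = {"cookies": [
    {"name": "session", "value": "abc", "domain": "zenodo.org"},
    {"name": "csrftoken", "value": "xyz", "domain": ".zenodo.org"},
    {"name": "session", "value": "other", "domain": "example.org"},
]}
print(cookies_from_cdp(cdp_result))
```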
Bearer Token Migration
Session cookies were expedient but fragile (they expire, and the CSRF token must be refreshed). The Zenodo API supports Bearer token authentication via Authorization: Bearer <token>. A production pipeline should:
- Generate a token from the Zenodo settings page
- Store it in environment variables or a CI/CD secret
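- Create a requests.Session() and set the header once with session.headers.update({"Authorization": f"Bearer {token}"}); no cookies or CSRF headers are needed (requests.Session() itself takes no headers argument)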
The token scopes can be limited to deposit write access, reducing the security surface compared to a full session cookie.
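A minimal sketch of the token hand-off follows. The environment variable name ZENODO_TOKEN and the helper name bearer_headers_from_env are assumptions for illustration; only the Authorization: Bearer header format comes from the text above.

```python
import os
from typing import Mapping


def bearer_headers_from_env(env: Mapping[str, str] = os.environ,
                            var: str = "ZENODO_TOKEN") -> dict:
    """Build the Authorization header from an environment variable.

    ZENODO_TOKEN is a hypothetical variable name; use whatever name the
    CI secret is exposed under.
    """
    token = (env.get(var) or "").strip()
    if not token:
        raise RuntimeError(f"environment variable {var} is not set")
    return {"Authorization": f"Bearer {token}"}


# Usage with requests (not executed here):
#   session = requests.Session()
#   session.headers.update(bearer_headers_from_env())
print(bearer_headers_from_env({"ZENODO_TOKEN": "example-token"}))
```

Raising when the variable is missing makes a misconfigured secret fail at startup instead of surfacing later as an authentication error from the API.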
Community & Collection Integration
Zenodo supports community collections that group related records. A future version of the workflow could:
- Auto-submit records to a curated community (e.g., "Mathematics" or "Machine Learning")
- Tag records with community-specific metadata fields
This would improve discoverability without manual curation after each upload.
Multi-Asset Deposits
Currently only the PDF is uploaded. Many computational papers warrant supplementary material:
- Training datasets (LMFDB-derived Hecke traces)
- Model checkpoints (trained GNN weights)
- Reproducibility notebooks (Jupyter/Colab)
The Zenodo API supports multiple files per deposit via repeated POST calls to the files endpoint.
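The repeated upload can be sketched as a loop over local paths. To keep the example free of network calls, the actual POST is passed in as a callable; the names upload_assets and upload_one are hypothetical, and upload_one would wrap the same files-endpoint request used for the PDF.

```python
from pathlib import Path
from typing import BinaryIO, Callable, Iterable, List


def upload_assets(paths: Iterable[str],
                  upload_one: Callable[[str, BinaryIO], None]) -> List[str]:
    """Upload each file in turn and return the names that were sent.

    upload_one(name, fileobj) is expected to perform one POST to the
    deposit's files endpoint and to raise on failure.
    """
    uploaded = []
    for path in map(Path, paths):
        with path.open("rb") as fh:
            upload_one(path.name, fh)
        uploaded.append(path.name)
    return uploaded


# Example wiring with requests (not executed here):
#   def upload_one(name, fh):
#       requests.post(f"{BASE}/deposit/depositions/{new_id}/files",
#                     cookies=COOKIES, headers=HEADERS,
#                     files={"file": (name, fh)}).raise_for_status()
```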
Monitoring & Email Notification
The publish endpoint returns 202 Accepted, meaning publication is asynchronous. A future enhancement could poll the record state and send a notification (email, Slack webhook, or Matrix message) once the DOI resolves. This is particularly useful for batch updates across multiple deposits.
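The polling part can be sketched independently of how the check is performed. In the sketch below, check is any zero-argument callable that returns True once the record is ready, for example one that requests the DOI URL and tests for a successful response; the function name wait_until and the default timings are assumptions.

```python
import time
from typing import Callable


def wait_until(check: Callable[[], bool], attempts: int = 10,
               delay: float = 5.0,
               sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call check() up to `attempts` times, pausing `delay` seconds between tries.

    Returns True as soon as check() succeeds, False if every attempt fails.
    """
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            sleep(delay)  # no pause after the final attempt
    return False


# Demonstration with a fake check that succeeds on the third call
calls = {"n": 0}


def fake_check() -> bool:
    calls["n"] += 1
    return calls["n"] >= 3


print(wait_until(fake_check, attempts=5, delay=0.0))  # → True
```

A notification hook (email, Slack, Matrix) would run once wait_until returns True, and an alert path once it returns False.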
Preprint Server Integration
Zenodo DOIs are increasingly accepted by preprint servers like arXiv, HAL, and OSF. A natural extension is a cross-posting workflow that:
- Publishes to Zenodo (DOI generation)
- Submits the same PDF and abstract to a preprint server
- Links the Zenodo DOI in the preprint metadata