ArXiv declares independence from Cornell
- frankling_ - 44107 seconds ago
The recent announcement to reject review articles and position papers already smelled like a shift towards a more "opinionated" stance, and this move smells worse.
The vacuum that arXiv originally filled was one of a glorified PDF hosting service with just enough of a reputation to allow some preprints to be cited in a formally published paper, and with just enough moderation to not devolve into spam and chaos. It has also been instrumental in pushing publishers towards open access (i.e., to finally give up).
Unfortunately, over the years, arXiv has become something like a "venue" in its own right, particularly in ML, with some decently cited papers never formally published and "preprints" being cited left and right. Consider the impression you get when seeing a reference to an arXiv preprint vs. a link to an author's institutional website.
In my view, arXiv fulfills its function better the less power it has as an institution, and I thus have exactly zero trust that the split from Cornell is driven by that function. We've seen the kind of appeasement prose from their statement and FAQ [1] countless times before, and it's now time for the usual routine of snapshotting the site to watch the inevitable amendments to the mission statement.
"What positive changes should users expect to see?" - I guess the negative ones we'll have to see for ourselves.
- swiftcoder - 37532 seconds ago
> raised concerns about the proposed $300,000 salary for arXiv’s new CEO, saying it seemed high
Is a mid-to-high engineering salary outlandish for a CEO of what is likely to be a fairly major non-profit? Even non-profits have to be somewhat competitive when it comes to salary, and the ideal candidate is likely someone who would be balancing this against a tenured position at a major university.
- halperter - 51058 seconds ago
Statement by arXiv: https://tech.cornell.edu/arxiv/
- whiplash451 - 20861 seconds ago
I'm not sure why we're so focused on filtering what gets into arXiv (which is an uphill battle and DOA at this point) vs fixing the indexing, i.e. the page rank of academia.
Google "sorted out" a messy web with PageRank. Academic papers link to each other. What prevents us from building a ranking from there?
I'm conscious I might be over-simplifying things, but curious to see what I am missing.
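As a rough illustration of the ranking idea above, here is a minimal sketch of PageRank by power iteration over a toy citation graph. The paper IDs, the `pagerank` function name, the damping factor of 0.85, and the iteration count are all assumptions chosen for illustration; nothing here reflects how arXiv or any real citation index ranks papers.

```python
def pagerank(citations, damping=0.85, iterations=50):
    """Rank papers by power iteration over a citation graph.

    citations maps each paper ID to the list of paper IDs it cites.
    """
    # Collect every paper that appears as a citer or as a cited work.
    papers = set(citations)
    for cited in citations.values():
        papers.update(cited)
    n = len(papers)
    rank = {p: 1.0 / n for p in papers}

    for _ in range(iterations):
        # Papers with no outgoing citations spread their rank evenly.
        dangling = sum(rank[p] for p in papers if not citations.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in papers}
        for citer, cited in citations.items():
            if not cited:
                continue
            # Each citing paper splits its damped rank among the papers it cites.
            share = damping * rank[citer] / len(cited)
            for target in cited:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy graph: papers B, C, and D all cite paper A; C also cites B.
toy_graph = {"A": [], "B": ["A"], "C": ["A", "B"], "D": ["A"]}
scores = pagerank(toy_graph)
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy graph the most-cited paper ends up first in `ranking`. A plain citation graph tends to favor older papers and can be gamed by citation rings, so this is a starting point at best.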
- krick - 20083 seconds ago
It's not that hard to make a mirror of arXiv. Basically, anybody who can pay for hosting can do it (which, I suppose, isn't very cheap now that the whole world uses it). The problem is making users switch, because academia seems to have this weird tradition of resisting all practices that, god forbid, might improve global research capabilities and move scientific progress forward. But then, if arXiv actually becomes unusable, I suppose they won't really have much choice but to switch?
And, FWIW, I do think that arXiv truly has vast potential to be improved. It is currently in a position to change the whole process of how research results are shared, yet it is still, as others have said, only a PDF hosting service. And since the universities couldn't break out of the whole Elsevier & co. scam despite the internet existing for 30 years, to me, breaking free from the university affiliation sounds like a good thing.
But, of course, I am talking only about the possibilities being out there. I know nothing about the people in charge of the whole endeavor, and ultimately it depends on them alone whether it sails or sinks.
- psalminen - 50391 seconds ago
I might be missing something, but I still don't get the why. I don't see any "problem" that needs to be solved.
- lifeisstillgood - 13143 seconds ago
I am sure it’s a dumb idea, but why is it a problem for, say, the National Science Foundation or something to run a website that replicates arXiv? If you are from an accredited university or whatever, you can publish papers, fulfilling the “PDF store” function.
Then getting peer reviewed is a harder process, but one can see some form of credit on the site coming from doing a decent reviewer's job.
I suspect I am missing a lot of nuance …
- MetaMonk - 3859 seconds ago
- taormina - 23570 seconds ago
Given that Cornell charges what, $50k a year as an Ivy League school, $300k feels like almost nothing.
- hereme888 - 21088 seconds ago
From my limited experience, arXiv appears to include many low-quality, unreproducible papers, and some are straight-up self-marketing rather than serious scientific work.
- jeremie_strand - 2384 seconds ago
arXiv provides such an easy interface for navigating scientific papers; most are from computer science, of course. Hope they can grow bigger and solve the paywall pain in open research. Any implications for bioRxiv?
- dataflow - 51730 seconds ago
This sounds terrible. Of course there's a huge risk of it being made for-profit. It almost makes you wonder if the academic publishers are behind this push somehow.
Could they not have made it into some legal structure that puts universities at the top? Say, with a bunch of universities owning shares that comprise the entirety of the ownership of arXiv, but that would allow arXiv to independently raise funds?
- asimpleusecase - 42936 seconds ago
I wonder if there are plans to license the content for AI training.
- contubernio - 30070 seconds ago
What is worrisome about this development, and corollary actions like the hiring of a CEO with a $300,000/year salary, is that the essentially independent and community-based platform will disappear. The arXiv exists because mathematicians and physicists, and later computer scientists and engineers, posted their work there, freely, with minimal attention to licensing and other commercial aspects. It has thrived because it required no peer review and made interesting things accessible quickly to whoever cared to read them.
A setup as a US-based "non-profit" is worrisome, if only because 300K is an obscene salary even in a for-profit setting. That the US-based posters can't see this is evidence of the basic problem which is that the US, both left and right, has been taken over by a neoliberal feudal antidemocratic nativist mindset that is anathema to the sort of free interchange of ideas that underlay the ArXiv's development in the hands of mathematicians and physicists now swept aside and ignored by machine learning grifters and technicians who program computers.
- bonoboTP - 36755 seconds ago
I fear their Mozilla-ification and Wikipedia-ification. Scope creep, various outreach feel-good programs, ballooning costs, lost focus etc. And other types of enshittification.
Any change to the basic premise will be a negative step.
They should just be boring, quiet, unopinionated, neutral background infrastructure.
- hirako2000 - 20573 seconds ago
Do research papers published in Elsevier-style outlets remain more prestigious?
I read a dozen papers a month, typically on arxiv, never from paywalled journals. I find the quality on par. But maybe I'm missing something.
- Aerolfos - 40968 seconds ago
And they hired a LinkedIn business idiot to run the new organization - so the aim is for an infinite growth tech startup in terms of governance, despite the technical legal status of non-profit. It shows in the language they use in the announcement, too ("improved financial viability in the long run").
OpenAI shows exactly how well that works and what that kind of governance does to a company and to its support of science and the commons.
TL;DR, it's fucked.
- Garlef - 44281 seconds ago
Maybe they should implement a graph-based trust system:
You need your favourite academic gatekeeper (= thesis advisor) to vouch for you in order to be allowed to upload.
Then AI slop gets flagged and the shame spreads through the graph. And flaggings need to have evidence attached that can again be flagged.
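One possible reading of the vouching scheme above, as a sketch only. The `TrustGraph` class, its method names, and the rule that each voucher up the chain receives half the penalty of the person below are all hypothetical choices made for illustration; the comment does not specify how "shame" should propagate, and flagging of the evidence itself is not modeled here.

```python
class TrustGraph:
    """Toy vouching graph: each uploader is vouched for by at most one person."""

    def __init__(self, decay=0.5):
        self.voucher = {}   # user -> the user who vouched for them (None for roots)
        self.penalty = {}   # user -> accumulated penalty from flags
        self.decay = decay  # assumed fraction of a penalty passed up per hop

    def add_root(self, user):
        # Roots stand in for already-trusted gatekeepers.
        self.voucher[user] = None
        self.penalty[user] = 0.0

    def vouch(self, voucher, newcomer):
        if voucher not in self.voucher:
            raise ValueError(f"{voucher} is not in the graph and cannot vouch")
        if newcomer in self.voucher:
            raise ValueError(f"{newcomer} is already in the graph")
        self.voucher[newcomer] = voucher
        self.penalty[newcomer] = 0.0

    def can_upload(self, user):
        # Only users who have been vouched for (or are roots) may upload.
        return user in self.voucher

    def flag(self, user, evidence):
        if user not in self.voucher:
            raise ValueError(f"{user} is not in the graph")
        if not evidence:
            raise ValueError("a flag needs evidence attached")
        # Walk up the chain of vouchers, shrinking the penalty at each hop.
        weight = 1.0
        current = user
        while current is not None:
            self.penalty[current] += weight
            weight *= self.decay
            current = self.voucher[current]


graph = TrustGraph()
graph.add_root("advisor")
graph.vouch("advisor", "student")
graph.vouch("student", "coauthor")
graph.flag("coauthor", evidence="fabricated citations in section 2")
```

After the flag, the flagged uploader carries the full penalty and each voucher further up the chain carries a smaller share, which is one way to make vouching costly without punishing the gatekeeper as hard as the offender.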
- vedantxn - 31526 seconds ago
we got this before gta 6
- tokai - 19319 seconds ago
This is exactly what happened last time when scientific publishing got cornered. Journals run by departments and research groups were spun out or sold off to publishers and independent orgs. And they continued to slowly boil the frog over 50 years with fees and gatekeeping.
It's especially problematic because while arXiv loves to claim to be working for open science, they don't default to open licensing. Many of the publications they host are not Open Access, and are read-access only. So there is definitely the potential to close things off at some point in the future, when some CEO needs to increase value.
- tornikeo - 51046 seconds ago
Now the question is, will arXiv wage a decade-long bloody war with Cornell, using heavy infantry (PhD students), archers (reviewers) and field artillery (AI slop papers), or will the independence be mostly peaceful? Only time will tell.
- losvedir - 21841 seconds ago
arXiv is great. It's just a problem that there's so much slop. What if arXiv offered a subscription service that people in different fields could use to just see a curated selection of the top papers in their field each month? Established researchers in each field could then review some of the preprints for inclusion in the curated monthly list.
Oh, wait.
- OutOfHere - 46014 seconds ago
With 300K for the CEO, its enshittification will commence imminently. It will now serve to maximize revenue. Just wait and watch while they issue a premium membership, payment requirements for authors, and other revenue generators to please their investors.
- Peteragain - 45656 seconds ago
.. and soon to be dependent on US military funding? Controlled by someone who has run-ins with universities? This'll end in tears.
- juped - 28599 seconds ago
> Cornell, for example, had a limited capacity to pay software developers to maintain and upgrade the site, which still has a very no-frills look and feel.
arXiv is doomed. It was nice while it lasted.
- shevy-java - 41365 seconds ago
"Recently arXiv’s growth has accelerated. Since 2022, it has expanded its staff to 27, in large part to deal with a 50% increase in submitted manuscripts."
I am wary of that. IMO the business model is damaged therein. You can say in 2022 we had 27; bankrupt in 2030.
- adamnemecek - 52185 seconds ago
Good call, arXiv seems like one of the most important institutions out there right now.
- Drblessing - 20609 seconds ago
arXiv is dead. Expect a paywall within three years, or other enshittification and slop added.
- ACCount37 - 32506 seconds ago
Frankly, the only beef I have with arXiv as is: its insistence on blocking AI access.
I had to tell my AI to set up an MCP for "fetch while bypassing arXiv's rate limit" so that it doesn't burn 40k tokens looking for workarounds every time it wants to look at a paper and gets hit with a "sorry, meatbags only" wall.
Very annoying, given how relevant arXiv papers are for ML specifically, and how many papers there are. You can't "human flesh search" through all of them to pick the relevant ones for your work, and they just had to insist on making it harder for AIs to do it too.
- davnicwil - 48509 seconds ago
Very unrelated to the article, but I think 'arXiv' as a brand is bad, and really detrimental to what the institution aims to accomplish.
That is, it's not readily parseable, and it really gives off an insider-term vibe - like this isn't for you if you don't already know what it means or how you should read or say it. It sort of reminds me of the overuse of Latin and Latinate terms generally in the old professions and, well, the academy.
Just always struck me as being somewhat at odds with the goal.