STM brainstorming session - 2013


Just attended the STM brainstorming session. I'll update these notes in due course, and fix spelling issues, but I wanted to get the post live first.

These are rough notes; I'll just mention the things that I found interesting.

## Round 1

Science Gists get a mention, yay!!

Google Scholar library gets a mention.

Visualising data as maps is mentioned; someone notes that there are no standards.

Howard mentions much richer tagging within the article, and upfront semantic tagging. Someone mentions PeerJ as a submission system that can do this.

New payment methods.

A growing trend in peer review experiments. John Sack mentions prescore, a service that evaluates the people participating in the peer review process, and gives those people an impact rating.

Of course research data is getting a bunch of mentions.

Detection of scientific fraud needs to be dealt with. This is related to reproducibility, but not always. Fraud happens.

The importance of communicating science through images and graphics: not visualisation, but rather explaining the contents of an article through an image, sometimes called a graphical abstract. Readers start reading articles through the images, and writers start writing through the images; the role of a publisher could be to provide image creation and curation services.

The notion of virtual laboratories is mentioned, such as ChemCollective out of Carnegie Mellon. This is going to require different types of content than we are used to.

A machine-processable format to enable text mining, perhaps a format that all publishers can make available, normalising all the formats. Pharma have been asking for this too.
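A toy sketch of what "normalise all the formats" could mean in practice; the element names and the target shape here are invented, not any real publisher schema.

```python
# Toy sketch: mapping two invented publisher XML shapes onto one
# common, machine-processable structure for text mining.
import xml.etree.ElementTree as ET

def normalise(xml_string):
    """Return a common dict whichever (invented) schema we were given."""
    root = ET.fromstring(xml_string)
    return {
        "doi": root.findtext(".//doi") or root.findtext(".//DOI"),
        "title": root.findtext("article-title") or root.findtext("Title"),
        "body": root.findtext("body") or root.findtext("FullText") or "",
    }

print(normalise("<article><article-title>T</article-title>"
                "<doi>10.1234/x</doi><body>text</body></article>"))
```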

Smart articles are mentioned: the article that finds you, rather than you finding the article. (This matches my point #2.)

## Round 2

Following on from standards for text mining: standard formats for research data (interestingly, I'll be at a meeting about this next week). This also touches on reproducibility. The Crossref guy mentions the virtual machine environment issue.

The cloud is mentioned, for running all of the computing and the hosting; things will run on several clouds.

I mention the eclipse of the authoring tool.

eBooks are mentioned, specifically the development of good display of ebooks using ePub3; it might not happen, but it would be good if it did.

The importance of Google and Google Scholar, and the control that they exert over our sites. We just jump and do whatever they say. ACS is not alone in that upwards of 40-50% of usage can originate from Google. (I do mention that OA can help.)

Unnoticeable search, like Google Now.

The notion of text mining and the digital humanities. In connection with this, the issue of data privacy is mentioned.

Metadata: this is the future. Tim O'Reilly has mentioned the notion of digital smog. Project MESUR from Johan Bollen is mentioned: watching the behaviour of scholars via DOI resolvers, and finding where users sit at the intersection of new and emerging fields of study. There is something very interesting there.
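A sketch of that idea as I understand it, with an entirely invented log format (I don't know what the resolver logs actually look like): treat journals accessed in the same user session as weak evidence that two fields are being read together.

```python
# Invented log format: "<session-id> <doi>". Count pairs of journals
# (crudely proxied here by DOI prefix) accessed in the same session;
# heavily co-accessed pairs hint at intersections between fields.
from collections import Counter, defaultdict
from itertools import combinations

log_lines = [
    "sess1 10.1000/phys.1",
    "sess1 10.2000/bio.7",
    "sess2 10.1000/phys.2",
    "sess2 10.2000/bio.3",
]

sessions = defaultdict(set)
for line in log_lines:
    session_id, doi = line.split()
    sessions[session_id].add(doi.split("/")[0])

co_access = Counter()
for prefixes in sessions.values():
    for pair in combinations(sorted(prefixes), 2):
        co_access[pair] += 1

print(co_access.most_common())  # [(('10.1000', '10.2000'), 2)]
```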

Licensing at the component level, below the traditional level of the article; this is related to the metadata issue.

Standards for supplementary data for articles (this sounds like standard formats for research data), though one might say is that …

Crossref mentions hosted journals as a platform. I mention Scholastica, based out of Chicago.

Jeremy mentions services for individuals. (Eefke mentions that it could be related to predictive analytics; Jeremy mentions that the important thing is thinking about turning this into services for individuals.)

The work around altmetrics is mentioned as something that has really exploded. Becker Medical Library have mentioned that they are collecting 350 measures of impact (this was mentioned last year too). Crossref is taking on the PLOS ALM tool, and providing that as a service for publishers. They have started to look at the metrics that PLOS have. They have the dx.doi.org resolution log files, and they are considering whether they can make that data available.

Machine-readable article reuse rights are mentioned by Howard. He says that CC is fine, but there are a lot of publishers who don't want to use those licenses. Eefke mentions the NISO groups.
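To make the idea concrete, a minimal sketch, assuming nothing more than a licence URL carried in the article metadata that a text-mining crawler can check; the field name is my invention, not anything the NISO groups have specified.

```python
# Minimal sketch of machine-readable reuse rights: the metadata
# carries a licence URL and a crawler decides before mining.
# "license_ref" is an invented field name here, not a standard.
MINEABLE_LICENSES = {
    "https://creativecommons.org/licenses/by/3.0/",
    "https://creativecommons.org/licenses/by-nc/3.0/",
}

def may_mine(record):
    """Mine only when the record's licence URL is one we recognise."""
    return record.get("license_ref") in MINEABLE_LICENSES

article = {"doi": "10.1234/example",
           "license_ref": "https://creativecommons.org/licenses/by/3.0/"}
print(may_mine(article))                    # True
print(may_mine({"doi": "10.1234/closed"}))  # False: no licence, no mining
```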

John Sack mentions the effects of dealing with industrial espionage. He says that their servers are getting hit hard by usage patterns that look like this, seemingly coming from China. This is counter-compliant usage; you can't just filter it out, it's the …

Someone mentions that somebody tried to log in to their peer review system 10k times within the same hour from the same IP address.
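That pattern is exactly what a simple sliding-window counter catches. A toy sketch, with the threshold picked out of thin air, not anything the platform in question actually runs:

```python
# Toy brute-force detector: flag an IP once it exceeds a threshold of
# login attempts inside a one-hour sliding window. Numbers are
# illustrative, not a recommendation.
from collections import defaultdict, deque

WINDOW_SECONDS = 3600
THRESHOLD = 100

attempts = defaultdict(deque)

def record_attempt(ip, now):
    """Record one login attempt; return True if the IP should be flagged."""
    window = attempts[ip]
    window.append(now)
    while window and window[0] <= now - WINDOW_SECONDS:
        window.popleft()
    return len(window) > THRESHOLD

flagged = any(record_attempt("203.0.113.7", now=t) for t in range(150))
print(flagged)  # True: 150 attempts in 150 seconds, well over the limit
```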

The infrastructure and business models coming from funder mandates.

Sharing sites: how can publishers work with the usage on sharing sites? (From Elsevier.) They want to make it a win-win situation, and do not want to obstruct scientific collaboration. (Really interesting.)

Richard from OUP - an increasing value in privacy and secrecy in a world that is open.

Wearable computers: content creation and content consumption on these devices, for example geotagging. This touches on the privacy issue.

The last word: social networking for scientists, and whether their needs are filled by LinkedIn or Google Plus, or whether they should build those sites.

The IBM guy comments. Many of these ideas impact on cognitive computing. He mentions that doctors don't have the time to read all of the articles; the machines are going to read those articles for the doctor. This gets into the question of computers that can also interpret images, like X-rays. Now they are starting to get into rich media; everything is being recorded on YouTube channels (each executive seems to have their own YouTube channel). Can Watson interpret the graphs, the spreadsheets? There is that whole question. Then there is the other aspect: these computers can theoretically come up with new knowledge. How does that get disseminated? Can the computer publish to the world what it is creating? (What would it look like for a computer to apply for grant funding?)

John mentioned the research at Stanford that showed that AI can generate hypotheses. The Colossus experiment is mentioned.

## Some general discussion

A question comes up on what the metrics coming out of the text and data mining world will be, and what success will look like. (I suggest that we indeed don't have the answer to that question.)

## Now we are going to group these themes into common themes.

We are looking for people to stand up and talk about what they think the main themes are. Here are some:

Machine-to-machine, and machine-enabled.

Many items came up under this, from machine-readable data to the general issue of standards. Computational linguistics as an enabling technology.

New authoring mechanisms

From new authoring tools to experiential data, and this leads into big data. There is some discussion that ALMs and enrichment might fall under this category, but enrichment and metadata might be …

Enrichment and metadata is its own topic.

A better way to describe this is artefact enrichment.

Helping the human

Helping the human at both ends: submission systems that could help get the content in, the peer review process, the publication process; these are all pain points for authors and readers. Can this publisher-to-human interaction be made more efficient?

B2B vs B2C is an interesting tension: the author as the new customer, and where is the librarian?

The individual is becoming more important (I mention APCs and the rise of social media as something that can make these complaints louder). Looking after the author in the way we have to gear up for is a different way of thinking about how to manage the business. Someone mentions that the cost of gearing up for this is a hidden cost.

## Howard asks about media

Is this an area that we just do? Has it become rote? I suggest that it has.

## Headaches

I mention that 150 GB is a headache for me. Someone mentions converting small chunks of traditional written English or Spanish content into local audio or video files that can be disseminated via smartphones, especially in Africa, across languages, and in BRIC countries.

Someone mentions that the Google Translate widget was put on their platform. The usage is phenomenal; that's really interesting to me. John Sack mentioned that the editors of the journals complained a few years ago as the quality was so bad. It was put on places like the aims and scope, and it has helped drive submissions. It is also used on the abstracts. (Could one do a translation of the gist of the article, like the digest?) It was not marketed as pure translation. IBM for the first time passed the point where they have more employees in India than in the US, and soon this will also be true for China. Nairobi is the latest place where IBM have a research lab; you go where the customers are, you go where the world is. If you had done that 5 years ago, you might have thought that South Africa would have been the place to be.

A comment on where the growth in submissions is coming from: obviously Asia.

## Google

Eefke mentioned that this is the first time for a while that Google has been mentioned. Howard mentions that Google is changing. Knowledge Graph is mentioned. The death of the UI is mentioned. Is Google the publisher UI now? It's mentioned that the UI can now speak to you, or drop a …

## The Steam gaming platform is mentioned.

This allows games to compete on a common platform.

## Google Scholar

What will happen to Google Scholar? The sentiment is that this product is going to be dead in the water.

Hmmm, my notes become a bit distracted at this point. There is a conversation about the death of the UI, being replaced by Google, hooking into Siri, into Google Now, selling that as a service.

The IBM person mentions the development of an ecosystem; they have just opened a Watson developer platform that helps move towards an ecosystem.

There is a discussion of APIs. I'm going to stop talking now; I've talked a little bit too much today. There is the mention of STM being a place where these kinds of standards can emerge.

The hybrid reading experience was a trend last year. John mentions eLife Lens as an experiment in the reading experience. The Elsevier person mentions that the reading experience is changing because of the rise of the PDF. He thinks that the tablet might kill the PDF. Howard thinks that the PDF has had a resurgence because of the …

Another trend from last year was the movement from the institution to the individual, and this year we are talking about the author. Someone raises the question of what you get for your money if you are an author. If you were to sign up to Kudos, for example, who is going to pay for that? It's said that publishers are currently paying for it; would funders pay for it in the future? The transactional stuff of watching how money flows is difficult. For a hybrid journal this is really complicated. There is also the issue of the various pots of money.

Another trend from last year: OA is still a hot topic. We are beginning to understand it better. Is there a place in the world for a common payment system for paying for OA? There was a hope that OAK could have done something, but it didn't happen. The EBSCOs and Swets of this world are possibly looking to move into that market. It might be the university that decides what system is going to be used. One mechanism would be institutional payment accounts. This also comes back to the comment about authoring tools.

I was unable to keep quiet any longer, and I mentioned my idea around vendors having a bigger capacity to do digital product development. The rise of more startups is also mentioned.

(As a comment, looking over the discussion to date, it's hard to see a few key actionables that will actually affect the industry. What do we do?)

John Sack mentions that a lot of what we have up here is evolution from last year. One trend that John sees is that there are more startups. They are driving the conversation; they might not be making money, but they are driving the conversation. Open access is also driving this.

There is a lot of change that has to come in education, and that is driving money into the system too. Talking of education, have there been discussions around the difference in the role of scholarly material in education, rather than in the research context, for example K-12, or a professor who is teaching? There is huge growth in education budgets in India and China.

Research data is mentioned again. We might think of the new ego-system, rather than an eco-system, in terms of our relationships with our authors. There is also the data citation component.

We saw a bunch of startups around citation management, then about the authoring process, and increasingly about the management of the research data component. There is also the open data pressure from funding agencies. There is a growing need to handle data. Libraries are morphing into the experts on how to do grants, how to manage data, how to support the research process, and how to support researchers.

## We are moving to the final summary.

We are that STM ecosystem. As an industry trend we can see ourselves collaborating more for the common good. The issue is raised: why can't we standardise around submission systems?

My one thought is that some of these issues are easier to deal with if you are an OA publisher. (As Howard says, we still have not solved the machine-to-machine metadata issue, and I agree with that.)

Someone mentions that most of the revenue still comes from institutional subscriptions; the "transition" is going to take multiple years.

What are the trends and standards that are either dead now, or the predictions that have not matured? It would make for an interesting article or white paper. Eefke says that many of the startups that were mentioned in previous years have now been acquired, or are dead.

How do you do innovation within the establishment? Startups have it easy. For larger publishers, you always have to make some kind of justification to a set of people in whose eyes your idea can never succeed.

There is the issue of data sharing, which used to be unknown in scholarly research. Holding on to your data is no longer a culturally viable option (I think we are still behind on this, in that data sharing remains the exception rather than the rule).

Archives and the grey literature, and content being mopped up from these archives, is an interesting trend. The feeling is that for publishers to stuff this content into dark archives will no longer cut it. We run the risk of getting excited by exciting stuff, but we have to think about this. There is some disagreement on this point.

The increase in the amount of semantic information that is becoming available is interesting. With the growth of DOIs, the data is ending up in organisations that are not siloed, like Crossref and ORCID. It is going to be interesting to see what else comes out of this. John mentions that Crossref becoming a hub is interesting.

Jeremy says that we have a lot to learn; it's early days, and we don't know where these trends will take us. For publishers who are in transition, it is going to be a very interesting project.

Standards are mentioned. The 70 million articles that have been managed at Portico show that there are wildly different levels of quality, even within a single publisher. It's exciting, but there is a lot of work.

The transition to multiple devices is exciting; some preparatory work has happened, but there is a lot of work left. We can never move as fast as we would like.

Howard says that the validation of all of this metadata is really important. There is a difference between good enough for a human and good enough for a machine.
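A small sketch of that distinction, with invented field names and rules: a human shrugs off a stray "doi:" prefix or a fuzzy date, while a pipeline chokes on them.

```python
# Hypothetical metadata validator: rules and field names are my own,
# chosen only to show "good enough for a machine" being stricter
# than "good enough for a human".
import re

DOI_RE = re.compile(r"^10\.\d{4,9}/\S+$")
ISO_DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")

def validate(record):
    errors = []
    if not DOI_RE.match(record.get("doi", "")):
        errors.append("doi missing or malformed")
    if not record.get("title", "").strip():
        errors.append("title missing")
    if not ISO_DATE_RE.match(record.get("published", "")):
        errors.append("published date not ISO 8601")
    return errors

# A human reads this record fine; a machine should reject two fields.
print(validate({"doi": "doi:10.1234/x", "title": "OK", "published": "Dec 2013"}))
```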

John mentions that social media has not been mentioned that much. Sharing tools were mentioned by the Elsevier guy. He says that it's not a part of most platforms, but it's a part of the life of most users. Someone mentions that the traffic they get from social media has been really good. It's driving full-text downloads. You can run marketing campaigns on top of this.

The biggest opportunities for catastrophic disintermediation are right at the beginning and right at the end: at one end Google, at the other end authoring tools.

Collective responsibility from us as publishers to do fraud detection is really important. We have to go after the abuse of science. He does not know how to handle that. As an industry-wide group we need to go after that, for example when maliciously incorrect articles are being submitted. Pulling out missing clinical trials has been mentioned. Can you do analysis on anomalous-looking data? There could be a question that could be tackled as an industry standard, like doing automatic checking of figures. It is mentioned that this is difficult, and that COPE is a good way to tackle this after the fact, but it is reiterated that we want to capture these instances before the fact.
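On the anomalous-data point: one classic screen (my example, not something proposed in the room) is checking reported numbers against Benford's first-digit law; a crude smell test, never proof of misconduct.

```python
# Benford's-law screen: compare first-digit frequencies of reported
# values against the expected log distribution. A large deviation is
# a prompt to look closer, not evidence of fraud by itself.
import math
from collections import Counter

def benford_deviation(values):
    """Chi-square-style distance from Benford's first-digit law."""
    digits = [int(str(v).lstrip("0.")[0]) for v in values if v > 0]
    counts, n = Counter(digits), len(digits)
    return sum(
        (counts.get(d, 0) - n * math.log10(1 + 1 / d)) ** 2
        / (n * math.log10(1 + 1 / d))
        for d in range(1, 10)
    )

natural = [2 ** k for k in range(1, 60)]  # growth data: near-Benford, low score
flat = list(range(100, 1000))             # uniform first digits: high score
print(benford_deviation(natural), benford_deviation(flat))
```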

What happens with societies? OA is going to affect societies more in the next few years.

The opportunity for linked data is mentioned. Could this be used as a fraud detection tool, along the lines of "this particular author does not seem to have enough connections to have done this research"? An example is mentioned where an institution discovered that its math faculty's strengths were not what they thought they were.

We get back to the chair and his final comments. The key phrases he sees are "moving from B2B to B2C" and the label of moving to an "ego-driven system". Many of the things we discussed last year we discussed today. One thing we didn't really talk about this year was Mendeley and ReadCube. We didn't have a conversation about these sites this year; why is that? Do we see them as becoming part of our own platforms, or are they still a threat? Perhaps authoring mechanisms is one such area. It's mentioned that these are more people to collaborate with, and to work with. If they can build technology that a publisher can use, then that's great. Are they community driven? Does one of them appeal to specific communities? It's mentioned that EndNote also belongs in that space. And that concludes this afternoon.