02 March 2011

Sharing research code

The more I work on my research code, PyTRT (a research code with several dozen transport methods, such as diffusion, Monte Carlo, P1, SN, and IMC, implemented in 1D and 2D), the more convinced I am that it could be a useful tool for many people in methods development. That forces me to ask what I should do with it after I graduate. It's entirely likely that I will continue to use it in my next career, but should I release it to the public, to my department, or to someone else? Should I try to sell it?

It was developed using funds from an NSF GRF, which "encourages Fellows to share software and inventions, once appropriate protection for them has been secured, and otherwise act to make the innovations they embody widely useful and usable," and from an NEUP fellowship, under which the "DOE claims no rights to any inventions or writings that might result from its fellowship awards." That seems to leave me all the flexibility I need.

Does anyone have an opinion to offer?

8 comments:

Anthony Scopatz said...

Put it in SciPy ;).

Actually, I was just having a discussion yesterday about the future direction of SciPy. Others and I would like to see successful projects from the outside world start filling in the (large) holes in the current project.

I have often thought that general transport methods would be good to have available.

So, despite the above emoticon, I am dead serious here. I'll help you with the migration process if you decide to go this direction...

Mike Graham said...

Anthony,

I think I disagree. scipy just isn't the right place for cutting-edge research-level methods and for the most part probably isn't the place for specific-physics-oriented methods at all.

I believe last night you made the suggestion that some Python software from another domain--biology--should be incorporated in scipy; from what I observed, several of the scientific developers present thought this was a bad idea and no one else that I recall seemed to jump to its defense.

scipy is already large and bordering on unmaintainable, and I don't really see the utility in dumping everything we can think of into it. Incorporating code into scipy has ramifications for deployment, release cycles, portability, etc., which we wouldn't want for many projects.

If we did want to enhance scipy to address the problems faced in nuclear transport modeling, I think we should aim for something of broader use. For example, I don't think Python has a good story for mesh representation in the computational sciences. It would be great to see an architecture for this, one that could support the different grid topologies and features used in various methods, something that we could easily pass to TVTK/mayavi but that could also be used by pymeshgeneration, pypolymersimulation, pyturbulentflow, pynuke, pyelectromagnetics, and so forth. The more domain-specific things get, the more I see the cost outweighing the benefit.


It would be interesting to know what the community at large really wants out of these tools and whether making scipy a one-stop shop for scientific computing software in Python is the right way to provide it. To this end, there are a couple of recent efforts that you are aware of but that bear mentioning:

- A convore group was created at convore.com/python-scientific-computing to serve as a multidisciplinary discussion forum for scientific computing in Python and to bootstrap more appropriate fora, and

- A bit less directly, your new podcast has begun to be one such forum for scientific programmers.

Anthony Scopatz said...

I think Mike made a comment but then may have deleted it, because I am not seeing it here anymore.

Can anyone confirm or deny?

Mike said...

I didn't delete my post (at least not on purpose).

Seth said...

[Reposting Mike's comment, which somehow disappeared but was emailed to me with the other comments.]

Anthony Scopatz said...

[Uuugh, I just wrote a huge comment and then Blogger lost it. I'll summarize here.]

I was, likely poorly, using the term SciPy to mean both SciPy and SciKits. There have been a couple of very successful SciKits: learn and statsmodels.

If you are concerned about having both visibility AND more direct ownership, this is exactly the problem that scikits were meant to solve.

I have been meaning to start a nuke scikit for a while and would be ecstatic to contribute to a project like this.

I also agree that a de facto mesh or geometry spec in python is a fantastic idea.

On a side note: I largely see the problems in scientific computing as cultural, so I am trying to facilitate social solutions to them. I hope this explains my drive to improve things like code sharing, awareness, testing, etc.

Please let me know what you think!

Mike said...

Being in the scikit namespace certainly makes more sense to me. I still have a somewhat hard time seeing that it would be a net gain, though I understand the desire to improve visibility and foster community.

The repaste of my post left out the links that were in the original post. As a reminder, those were:
- The convore thread: convore.com/python-scientific-computing
- The inSCIght podcast: feeds.feedburner.com/Inscight

Anthony Scopatz said...

The net gain is just what you stated; there is a larger community and greater visibility around this code.

It would be giving this research effort back to the community. Other people can work on it and improve it. If the project is successful, a tangible benefit is that Seth would have received a lot of development on his project that he did not have to implement himself. All of the standard open-source rules apply.

As I stated before, I would be very excited about working on a scikits.nuke project. My code would benefit, as would everyone else who worked on it.

Thanks for posting those links Mike. Additionally, I have started a thread on convore on this topic: https://convore.com/python-scientific-computing/scikitsnuke/
