Abstract
Background
Psychedelic drugs facilitate profound changes in consciousness and have the potential to provide insights into the nature of human mental processes and their relation to brain physiology. Yet the published scientific literature reflects a very limited understanding of the effects of these drugs, especially for newer synthetic compounds. The number of clinical trials and the range of drugs formally studied are dwarfed by the number of written descriptions of the many drugs taken by people. Analysis of these descriptions using machine-learning techniques can provide a framework for learning about these drug use experiences.
Methods
We collected 1000 reports of 10 drugs from the drug information website Erowid.org and formed a term-document frequency matrix. Using variable selection and a random-forest classifier, we identified a subset of words that differentiated between drugs.
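As an illustration of the first step, a term-document frequency matrix can be sketched in a few lines of Python. This is a minimal sketch, not the procedure used in the study: the tokenizer, the function names, and the two toy reports are all hypothetical, and the abstract does not say how the text was preprocessed.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens.

    Assumption: simple regex tokenization; the study's actual
    preprocessing is not described in the abstract.
    """
    return re.findall(r"[a-z]+", text.lower())


def term_document_matrix(documents):
    """Return (vocabulary, matrix).

    matrix[i][j] is the number of times vocabulary[j] occurs
    in documents[i]; the vocabulary is sorted alphabetically.
    """
    counts = [Counter(tokenize(doc)) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, matrix


# Hypothetical toy reports, invented for illustration only.
reports = [
    "Colors pulsed and the music felt warm.",
    "The room dissolved; colors and colors everywhere.",
]
vocab, tdm = term_document_matrix(reports)
```

Each row of `tdm` is one report and each column one word, so the matrix can be passed to a classifier with the drug name as the label. Variable selection then amounts to keeping only the columns that help most in telling drugs apart.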
Results
A random forest using a subset of 110 predictor variables classified reports with accuracy comparable to that of a random forest using the full set of 3934 predictors. Our estimated accuracy was 51.1%, which compares favorably to the 10% expected by chance. Reports of MDMA were classified with the highest accuracy, at 86.9%; those describing DPT had the lowest, at 20.1%. Hierarchical clustering suggested similarities between certain drugs, such as DMT and Salvia divinorum.
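The 10% chance figure corresponds to guessing uniformly among 10 drugs. The sketch below shows that baseline alongside a per-drug accuracy calculation. It assumes that per-drug accuracy means the fraction of a drug's reports that were assigned to the correct drug, which the abstract does not state explicitly. The function names and toy labels are hypothetical and are not data from the study.

```python
def chance_accuracy(n_classes):
    """Expected accuracy of uniform random guessing among n_classes."""
    return 1.0 / n_classes


def per_class_accuracy(true_labels, predicted_labels):
    """Return {label: fraction of that label's items predicted correctly}."""
    totals = {}
    correct = {}
    for truth, guess in zip(true_labels, predicted_labels):
        totals[truth] = totals.get(truth, 0) + 1
        if truth == guess:
            correct[truth] = correct.get(truth, 0) + 1
    return {label: correct.get(label, 0) / totals[label] for label in totals}


# Hypothetical toy labels, invented for illustration only.
truth = ["MDMA", "MDMA", "DPT", "DPT"]
guesses = ["MDMA", "MDMA", "DMT", "DPT"]

baseline = chance_accuracy(10)
by_drug = per_class_accuracy(truth, guesses)
```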
Conclusion
Machine-learning techniques can reveal consistencies in descriptions of drug use experiences that vary by drug class. This may be useful for developing hypotheses about the pharmacology and toxicity of new and poorly characterized drugs.
Coyle, J. R., Presti, D. E., Baggott, M. J. (2012). Quantitative Analysis of Narrative Reports of Psychedelic Drugs.