50 reasons to learn the shell for doing data science
"Anyone who does not have the command line at their beck and call is really missing something," tweeted Tim O'Reilly when Jeroen Janssens's Data Science at the Command Line was recently made available online for free. Join Jeroen to learn what you're missing out on if you're not applying the command line and many of its power tools to typical data science problems.
Talk Title | 50 reasons to learn the shell for doing data science |
Speakers | Jeroen Janssens (Data Science Workshops) |
Conference | Strata Data Conference |
Conf Tag | Making Data Work |
Location | London, United Kingdom |
Date | May 22-24, 2018 |
URL | Talk Page |
Slides | Talk Slides |
“Anyone who does not have the command line at their beck and call is really missing something,” tweeted Tim O’Reilly when Jeroen Janssens’s Data Science at the Command Line was recently made available online for free. As Tim’s tweet suggests, the command line (and its ecosystem of power tools) is not just standing the test of time; it’s more popular than ever. Join Jeroen to learn what you’re missing out on if you’re not applying the command line and many of its power tools to typical data science problems.

The Unix command line isn’t just available on web servers, wireless routers, and supercomputers. It can also be found on macOS, the Raspberry Pi, and, most recently, Windows 10. Although invented decades ago, it turns out to be an amazing environment for efficiently performing tedious but essential data science tasks, and in some situations it even outperforms new technologies. By combining small, powerful command-line tools like grep, sort, awk, parallel, jq, and csvsql, you can quickly obtain, scrub, explore, and even model your data.

If you’ve ever wondered what the command line is or what it can do for you, this session is for you. Jeroen walks you through applying the command line to some typical data science problems and covers the core concepts of the command line. You’ll learn how to break a data science problem into smaller problems, choose the appropriate command-line tools, and chain them together. You’ll also learn how to integrate the command line with your existing data science workflow, whether it consists of the Jupyter Notebook, R, or Excel.

You’ll leave ready to get started with the command line and will probably want to learn more about this exciting piece of technology. And why not? It’s been around for almost 50 years. It’s not like it’s going anywhere soon.
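
To make the idea of chaining small tools concrete, here is a minimal sketch of a pipeline that breaks one question ("which region has the highest total?") into small steps. It is not taken from the talk: the sample data is invented, the function name `total_by_region` is hypothetical, and only tools found on most Unix-like systems (tail, grep, awk, sort) are used. Tools such as jq, csvsql, and parallel are left out because they usually need to be installed separately.

```shell
#!/bin/sh
# Hypothetical sample data, invented for this sketch (not from the talk).
sales_csv='region,product,amount
north,widget,10
south,gadget,25
north,gadget,5
east,widget,40
south,widget,15'

# total_by_region (hypothetical helper): reads CSV on stdin and prints
# "region total" lines, highest total first, ties ordered by region name.
total_by_region() {
  tail -n +2 |        # obtain: skip the header row
    grep -v '^$' |    # scrub: drop empty lines
    awk -F, '{ sum[$1] += $3 } END { for (r in sum) print r, sum[r] }' |  # explore: sum amount per region
    sort -k2,2nr -k1,1  # rank: numeric total descending, then region name
}

printf '%s\n' "$sales_csv" | total_by_region
```

With the sample data above, this prints `east 40`, `south 40`, and `north 15`, one per line. Each stage does one small job and passes plain text to the next, so any stage can be swapped out (for example, replacing the awk step with a different aggregation) without touching the others.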