
Data Scientist


A data scientist is a professional responsible for collecting, analyzing and interpreting extremely large amounts of data. The data scientist role is an offshoot of several traditional technical roles, including mathematician, scientist, statistician and computer science professional.

In business, data scientists typically work in teams to mine big data for information that can be used to predict customer behavior and identify new revenue opportunities. In many organizations, data scientists are also responsible for setting best practices for collecting data, using analysis tools and interpreting data.

The demand for data science skills has grown significantly over the years as companies look to glean useful information from big data: the voluminous amounts of structured, unstructured and semi-structured data that a large enterprise or an internet of things (IoT) deployment produces and collects.

In job postings, necessary skills typically include the following:
  • Advanced degree, with a specialization in statistics, computer science, data science, economics, mathematics, operations research or another quantitative field.
  • Expertise in all phases of data science, from initial discovery through cleaning, model selection, validation and deployment.
  • Knowledge and understanding of common data warehouse structures.
  • Experience with using statistical approaches to solve analytical problems.
  • Proficiency in common machine learning frameworks.
  • Experience with public cloud platforms and services.
  • Familiarity with a wide variety of data sources, including databases, public or private APIs and standard data formats, like JSON, YAML and XML.
  • Ability to identify new opportunities to apply machine learning to business processes to improve their efficiency and effectiveness.
  • Ability to design and implement reporting dashboards that can track key business metrics and provide actionable insights.
  • Experience with techniques for both qualitative and quantitative analysis.
  • Ability to share qualitative and quantitative analysis in a way the audience will understand.
  • Familiarity with machine learning techniques, such as K-nearest neighbors, Naive Bayes, random forests and support vector machines.
  • Ability to design and implement validation tests.
  • Experience with data visualization tools, such as Tableau and Power BI.
  • Coding skills in languages such as R, Python or Scala.
  • Ability to aggregate data from disparate sources.
  • Ability to conduct ad hoc analysis and present results in a clear manner.
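
To make two of the listed skills concrete, here is a minimal sketch of a K-nearest neighbors classifier together with a simple holdout validation test, written in plain Python. It is an illustration only, not a production approach: the function names (knn_predict, holdout_accuracy) and the toy data are assumptions made up for this example, and a real project would more likely rely on an established machine learning framework.

```python
from collections import Counter
from math import dist


def knn_predict(train, query, k=3):
    """Return the majority label among the k training points nearest to query.

    train is a list of (features, label) pairs; features are numeric tuples.
    """
    # Sort training points by Euclidean distance to the query and keep k.
    nearest = sorted(train, key=lambda pair: dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def holdout_accuracy(train, test, k=3):
    """Fraction of held-out points whose predicted label matches the true one."""
    correct = sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / len(test)


# Hypothetical toy data: two well-separated clusters labelled "low" and "high".
train = [
    ((1.0, 1.0), "low"), ((1.2, 0.8), "low"), ((0.9, 1.1), "low"),
    ((5.0, 5.0), "high"), ((5.2, 4.9), "high"), ((4.8, 5.1), "high"),
]
# Held-out points that the classifier never saw during "training".
test = [((1.1, 1.0), "low"), ((5.1, 5.0), "high")]

accuracy = holdout_accuracy(train, test, k=3)
print(f"holdout accuracy: {accuracy:.2f}")
```

Keeping the test points separate from the training points is the core of the validation step: accuracy measured on data the model has already seen would overstate how well it generalizes.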

