To learn more about the development of coding and the emergence of low-code and no-code technologies, we asked Mamdouh Refaat, chief data scientist, to describe how he has seen the field and its technology change, and what users can do now compared to when he was first starting out. Here’s what he had to say:
In the mid-1990s, when I shifted my focus from developing finite element analysis (FEA) algorithms (the field in which I hold my Ph.D. in engineering) to developing machine learning algorithms, the only practical option was the SAS coding language. At that time, I worked at a bank in Switzerland, and we called our field “data mining.” How “data mining” evolved into “data science” and machine learning is a different story. But in those days we didn’t have many options: users could either program in lower-level languages like C++ or Java or use the only high-level data modeling language available at the time, SAS. The only other language used to manipulate data was SQL, provided by database engines. However, SQL didn’t offer any modeling features.
As the technology progressed, machine learning software vendors began developing tools for data scientists that didn’t require coding – instead, users could simply point and click. Users expected these tools to augment the SAS and SQL languages for data manipulation.
When R and then Python came on the scene, it felt almost like driving in reverse: back to more coding and less point and click. Nowadays, coding in Python has become synonymous with data science. However, that equivalence is misleading and shouldn’t be taken for granted.
Coding – whether for data preparation or for developing machine learning models – offers data scientists maximum flexibility and is limited only by the availability of libraries or repositories of ready-to-use code for the tasks at hand. That’s where Python has been successful. Today, we can find a good Python implementation for almost any task a data scientist can think of. With that in mind, it’s easy to conclude that Python is the end of the story. But that’s not the case.
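To illustrate the point about ready-to-use libraries, here is a minimal sketch of a typical modeling task done entirely with off-the-shelf Python code (scikit-learn on a synthetic dataset; the dataset and parameters are illustrative, not anything from the author's own work):

```python
# Illustrative only: a typical data-science task handled end to end by
# ready-made Python libraries (scikit-learn), with a synthetic dataset.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Generate a small synthetic binary-classification dataset.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Split into training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit an off-the-shelf model -- no algorithm code written by hand.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Evaluate on held-out data.
acc = accuracy_score(y_test, model.predict(X_test))
print(f"test accuracy: {acc:.2f}")
```

The flexibility comes at a price: every step above – splitting, fitting, evaluating – is code the user must write, understand, and maintain, which is exactly the burden no-code tools aim to remove.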
Because data scientists have done a good job showing that machine learning can be successfully applied to a multitude of problems, the demand for machine learning models has skyrocketed over the last decade; today, almost all large organizations either use machine learning or plan to do so very soon. But with this wider adoption came persistent reports of a shortage of qualified data scientists to develop and implement models. Responding to this challenge, machine learning vendors – like Altair – have invested in modeling tools that can be used by what Gartner calls the “citizen data scientist.” Low-code/no-code software tools are better suited to these users, who may not have full theoretical knowledge of data science and machine learning techniques but have a good understanding of their organization’s data and business processes. However, advanced data scientists who are comfortable with coding can also use no-code software to increase productivity, developing a model without code and then fine-tuning it with code if needed.
This has been one of Altair’s main strategic directions over the last three years: supporting both coding experts and coding novices. With recent acquisitions, Altair now offers low-code/no-code development across the entire modeling lifecycle, from data acquisition to deployment. In addition, Altair’s tools let users code in SQL, SAS, R, and Python. These two modes of working with data and developing models are seamlessly integrated into one workflow-based environment within our tools.
So, any data scientist or analyst – citizen or otherwise – can now work in both modes, and in multiple languages, within the same environment while cooperating seamlessly with other users.
Another challenge data scientists have long faced when working with more than one language or software package is incompatible data storage formats, which force tedious, error-prone import and export operations. For example, users who wanted to prepare data using the SAS language and then use no-code modeling software such as Altair Knowledge Studio used to store the results in a SAS dataset, then import the data into Knowledge Studio to develop, say, interactive decision trees. However, the market is now moving toward automatic data conversion. For example, Altair products like Knowledge Studio and SmartWorks Analytics now automatically convert data from one format to another without burdening the user with the details. This is enabled by the ability to program natively within these products in Python, R, SQL, and SAS.
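To see what this manual hand-off looks like, here is a minimal, standard-library-only sketch of the kind of export step that integrated tools now automate: data prepared in a SQL engine (SQLite standing in for a database; the table and column names are hypothetical) is exported to CSV so another tool could consume it.

```python
# Illustrative only: the manual export/import step between tools that
# integrated environments now automate. SQLite (stdlib) stands in for a
# database engine; CSV is the interchange format.
import csv
import io
import sqlite3

# Prepare data in SQL -- the "data manipulation" side of the workflow.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, balance REAL)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, 120.0), (2, 75.5), (3, 310.25)])
rows = conn.execute(
    "SELECT id, balance FROM customers WHERE balance > 100"
).fetchall()

# Export to CSV -- the tedious hand-off step between tools.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["id", "balance"])
writer.writerows(rows)
print(buf.getvalue())
```

Every such hand-off is a place where column types, encodings, or missing values can silently change, which is why automating the conversion removes a real source of errors, not just an inconvenience.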
In conclusion, Altair answers the question in the title of this article by doing two things:
- Enabling users to use both no-code/low-code and code-intensive workflows (in Python, R, SQL, and SAS) in a streamlined, workflow-based set of products, and
- Automating data conversion from one format to another, enabling the use of different languages – and both code and no-code operations – for data manipulation and model development.