[{"id":1008,"title":"A simple form of Rayleigh pitot tube formula","sha":"b4c8cb632b9f24281ce59fe80f0c2f33","user_id":null,"state":"published","body":"The Rayleigh pitot tube formula is a basic and commonly used formula in aerodynamics. It gives the ratio of the total pressure behind a normal shock wave to the freestream pressure at a given Mach number. Although the relation is rigorous in theory, its complexity, as shown in the above formula, makes it difficult to see how these variables vary with one another. A numerically equivalent formula has been discovered. It is very simple yet retains high precision. The coefficient of determination between the two formulas is greater than 99.9% in a domain sufficient for practical applications. The new formula was found by a special machine learning method, functional learning. The function blocks are limited to rational functions. A divide-and-conquer scheme is also applied during the learning process.","created_at":"2022-09-23T00:19:09.139Z","updated_at":"2022-10-29T11:46:28.042Z","zenodo_id":7232620,"doi":"https://doi.org/10.5281/zenodo.7232620","subject":null,"tags":["aerodynamics","machine learning","rayleigh pitot tube"],"vote_count":1,"view_count":1,"score":1.1,"deleted":false,"muted":false,"tweeted":true,"attachment_file_name":"A_simple_form_of_Rayleigh_pitot_tube_formula.pdf","attachment_content_type":"application/pdf","attachment_file_size":91484,"attachment_updated_at":"2022-09-23T00:57:24.279Z"},{"id":1163,"title":"Synthetic Dataset Generation for Concept Drift Adaptation","sha":"9bbaa226b588ed5949a658feeb282467","user_id":null,"state":"published","body":"Concept drift, a phenomenon where the statistical properties of the target variable change over time, poses a significant challenge in data stream mining. The scarcity of real-world datasets with concept drift makes this challenge harder for many researchers. 
This brief proposes an approach to generating synthetic datasets that incorporate concept drift, aiding in the training and testing of machine learning models for detecting and adapting to such drifts. The process involves defining the concept drift, generating synthetic data reflecting this drift, splitting the data into training and testing sets, and iteratively training, testing, and improving the model based on its performance. This approach aims to enhance the model’s adaptability to concept drift, thereby improving its predictive accuracy over time.","created_at":"2023-10-01T22:10:45.497Z","updated_at":"2023-11-16T08:11:43.267Z","zenodo_id":8423520,"doi":"https://doi.org/10.5281/zenodo.8423520","subject":null,"tags":["concept drift","incremental learning","data minning","machine learning","online learning"],"vote_count":1,"view_count":1,"score":1.1,"deleted":false,"muted":false,"tweeted":false,"attachment_file_name":null,"attachment_content_type":null,"attachment_file_size":null,"attachment_updated_at":null},{"id":1399,"title":"Capitalist capital allocation as meta-reward function for AI training model","sha":"cecf7e1646a5f412b2c60014d226ef87","user_id":null,"state":"published","body":"Machine learning protocols use reward functions during training as a means of tuning parameters toward obtaining desirable outputs from a model. One challenge for the current AI industry is the difficulty of translating real-world utility into reward functions for individual models that are trained primarily on training sets of online content, which seldom has an accurate valuation of its real-world utility attached to it. However, computing resources are allocated to different models based on their value to the companies producing them and, at a higher layer, capital in the form of investment or revenue is preferentially allocated to those companies whose models are perceived as having higher potential for real-world utility. 
In essence, this process functions as a meta-reward function for the AI ecosystem overall.  ","created_at":"2024-07-24T10:18:55.003Z","updated_at":"2024-07-30T06:00:36.428Z","zenodo_id":13129560,"doi":"https://doi.org/10.5281/zenodo.13129560","subject":null,"tags":["ai","machine learning","economics"],"vote_count":0,"view_count":1,"score":0.1,"deleted":false,"muted":false,"tweeted":false,"attachment_file_name":null,"attachment_content_type":null,"attachment_file_size":null,"attachment_updated_at":null},{"id":797,"title":"Applying Machine Learning to Detect Code Quality Issues","sha":"b6bdcb6121c98e1d1a0bc0cc5870135d","user_id":null,"state":"published","body":"Detecting potential problems in the code before the product is released can prevent problems in production and lower the cost of system operation. Automated code review tools rely on detecting code patterns that are known to cause problems. These methods are unable to find new types of problems.\r\nWe can apply machine learning to the problem. The question would be how to find a good training set. The answer to this question would be: the public open source repositories (GitHub, Bitbucket, etc.). With each commit, the developer provides the code fragment (the modifications of the original file) and the information about the change (fix, new feature, merge, etc.). \r\nWe can train a model that will use the defect fixes to learn the patterns in which the code can be broken. We can extract the features from the code based on NLP; however, we can go a step further and enhance the features. Since the code follows strict syntactical rules, we can generate features based on the parsed code tree (e.g., level of nesting, recursion, etc.).\r\nThe model would return sections of the code that have a high probability of needing defect fixes in the future. 
\r\n","created_at":"2021-01-28T15:48:08.006Z","updated_at":"2021-05-21T06:01:19.679Z","zenodo_id":4775046,"doi":"https://doi.org/10.5281/zenodo.4775046","subject":null,"tags":["ml","ai","machine learning","code quality"],"vote_count":0,"view_count":1,"score":0.1,"deleted":false,"muted":false,"tweeted":false,"attachment_file_name":null,"attachment_content_type":null,"attachment_file_size":null,"attachment_updated_at":null},{"id":722,"title":"Vein Dynamics to predict the immune response and susceptibility to disorders","sha":"665ce75e67bd2011f08c961a4f1978f4","user_id":null,"state":"published","body":"The immune system acts as a protective framework against pathogenic microbes. Understanding the immune system can give us insights into how vulnerable someone is to foreign invaders. Amid recent events such as virus outbreaks, it would be highly advantageous to pinpoint the individuals who are more prone to being affected.\r\n\r\nVein patterns, such as finger veins or palm veins, are among the most robust features that are [unique](https://www.mdpi.com/2078-2489/9/9/213) to individuals. These vascular networks, which are determined by genetic and chromosomal patterns, can give us more insight into a person’s capacity to fight foreign microbes. To understand how these vascular patterns can predict the immune response and susceptibility to foreign microbes, we can make use of artificial intelligence and cutting-edge machine learning algorithms, which can be trained to correlate vein patterns with immune markers such as C-reactive proteins, differential cell count, lymphocyte proliferation and subset distribution, and red blood cell distribution width. A designated score sheet that categorizes the vein patterns by immune marker can help us identify the groups that are more immunocompromised and thereby keep us prepared during pandemics. 
\r\n","created_at":"2020-03-25T03:00:11.873Z","updated_at":"2020-04-08T06:00:21.745Z","zenodo_id":3742668,"doi":"https://doi.org/10.5281/zenodo.3742668","subject":null,"tags":["vein patterns","immune response","machine learning","artificial intelligence"],"vote_count":0,"view_count":1,"score":0.1,"deleted":false,"muted":false,"tweeted":false,"attachment_file_name":null,"attachment_content_type":null,"attachment_file_size":null,"attachment_updated_at":null},{"id":437,"title":"Phylogenetic tree representation in n-dimensional space","sha":"5c1a60d0fbd361655eefbdf6592ebdf9","user_id":null,"state":"published","body":"Traditional phylogenetic trees are represented as bifurcating trees, where the leaf nodes represent taxa and the internal nodes represent common ancestors. Bifurcating trees offer the advantages of making common ancestors easy to interpret and of being widely accepted; however, this representation could obscure latent evolutionary relationships that may be present but are impossible to represent in a two-dimensional bifurcating tree.\r\n\r\nI propose that higher dimensional representations of phylogenetic trees be explored in order to show the complex evolutionary relationships that are present between taxa. Given the substantial amount of genetic data that could be used for phylogenetic studies, one could feasibly create methods that use these data and represent them in three or more dimensions. This higher dimensional representation of phylogenetic trees could offer new insights into trees that are traditionally difficult to reconstruct because it would offer more precise representations of the relationship between taxa than a bifurcating tree.\r\n\r\nLastly, this high dimensional representation of the tree could be reduced down to a traditional bifurcating tree representation through processes such as agglomerative clustering, which would allow the high dimensional representation to be compared to existing traditional trees. 
","created_at":"2018-01-29T20:07:14.010Z","updated_at":"2018-01-30T06:01:09.781Z","zenodo_id":1162366,"doi":"https://doi.org/10.5281/zenodo.1162366","subject":null,"tags":["phylogenetics","machine learning","data analysis"],"vote_count":0,"view_count":1,"score":0.1,"deleted":false,"muted":false,"tweeted":true,"attachment_file_name":null,"attachment_content_type":null,"attachment_file_size":null,"attachment_updated_at":null},{"id":449,"title":"A service to assess the quality of documentation of open source software","sha":"4baaee8f0f211a3c98c3d2be18156fee","user_id":null,"state":"published","body":"Could one build a service that checks the completeness and quality of documentation of an open source repository?\r\n\r\nPotentially a group could build up a set of repositories with documentation ratings, which could then be used to train an ML/DL model, which could then be used to provide the service?\r\n\r\nThe service could be used by a developer to iteratively improve documentation, then check the quality of the changes.\r\n\r\nIt could also be used by a potential user of the software as a factor in deciding whether to use the software or not.\r\n","created_at":"2018-03-27T11:18:15.933Z","updated_at":"2021-01-28T15:47:00.031Z","zenodo_id":1208715,"doi":"https://doi.org/10.5281/zenodo.1208715","subject":null,"tags":["open source","machine learning","documentation"],"vote_count":3,"view_count":1,"score":3.1,"deleted":false,"muted":false,"tweeted":true,"attachment_file_name":null,"attachment_content_type":null,"attachment_file_size":null,"attachment_updated_at":null},{"id":849,"title":"Backward stepwise elimination: A model-based method for nonlinear dimension reduction","sha":"538e3235e64732b3eff07455a6c1a11d","user_id":null,"state":"published","body":"In multidimensional data modeling, dimension reduction is not intuitive. Forward feature selection is usually deceptive. 
That is, a strongly related feature may have a small correlation coefficient (near zero) with the objective, especially when the target model is nonlinear. Therefore, we suggest using backward stepwise elimination (BSE) for dimension reduction. BSE is an iterative model-based method. It starts from an all-feature model and iteratively eliminates the feature whose removal does not significantly affect the goodness of the surrogate model, until no more features can be eliminated without degrading the model goodness. The surrogate model can be constructed with an extreme learning machine, Kriging, or another machine learning technique. The goodness of a model can be evaluated with the coefficient of determination $R^2$, the Nash-Sutcliffe coefficient E, etc. ","created_at":"2021-05-18T05:09:52.227Z","updated_at":"2021-08-08T21:13:04.914Z","zenodo_id":4775143,"doi":"https://doi.org/10.5281/zenodo.4775143","subject":null,"tags":["machine learning","dimension reduction","feature selection","backward stepwise elimination"],"vote_count":1,"view_count":1,"score":1.1,"deleted":false,"muted":false,"tweeted":false,"attachment_file_name":"An_illustrative_example.jpg","attachment_content_type":"image/jpeg","attachment_file_size":143726,"attachment_updated_at":"2021-05-18T05:11:34.159Z"},{"id":347,"title":"Applying Supervised Machine Learning to the Fight against Online Child Sexual Abuse","sha":"bdee9d1d92c75852cab4ff1c9e71368d","user_id":null,"state":"published","body":"Some NGO reports and anecdotal evidence suggest that child abuse materials (CAMs) widely share certain characteristics. They are mostly set indoors, the victim's face or genitalia is visible, and there are few visual clues about the abuser(s).\r\n\r\nFor known CAMs, there are methods and projects that detect and remove them automatically, such as Microsoft's PhotoDNA and Interpol's Baseline project. 
However, detection of new CAMs still relies heavily on outdated and inefficient methods such as reports from ordinary users.\r\n\r\nBy thoroughly coding specific attributes of every known CAM, such as indoor/outdoor setting, on bed/couch/table, and visibility of face/breast/genitalia, supervised machine learning can be used in the detection of new CAMs. As the number of coded known CAMs increases, the algorithm would become better at determining the features of known CAMs, owing to the similarities mentioned above. Since the databases of Interpol and NCMEC hold millions of known CAMs, the need for supervisor intervention would decrease. Naturally, the algorithm might also help detect new CAMs to some extent after the inclusion of all known CAMs. Besides faster victim identification, this would, if successful, also markedly decrease the rate at which new CAMs spread across the internet.","created_at":"2016-10-30T11:31:09.484Z","updated_at":"2016-11-22T06:00:29.042Z","zenodo_id":167646,"doi":"https://doi.org/10.5281/zenodo.167646","subject":null,"tags":["online child sexual abuse","machine learning","crime prevention","child abuse materials"],"vote_count":0,"view_count":1,"score":0.1,"deleted":false,"muted":false,"tweeted":false,"attachment_file_name":null,"attachment_content_type":null,"attachment_file_size":null,"attachment_updated_at":null},{"id":633,"title":"Dermatoglyphics to predict genetic disorders","sha":"b4ed72f1220f9d126a3b96f13b862820","user_id":null,"state":"published","body":"In recent times, genetic disorders have become one of the significant causes of mortality. As we improve our understanding of the human genome, we see that nearly all diseases have a genetic component linked to them. 
Early diagnosis of these genetic diseases is crucial for successful treatment.\r\n\r\nThe dermatoglyphics or the fingerprints of a human individual are among the very few features that can uniquely identify an individual and are determined by various phenomena, including genetic and chromosomal factors. Thus the analysis of fingerprint patterns might reveal insights into the genetic makeup of an individual. Fingerprints are proven to act as reliable biological markers for several medical conditions such as hypertension and other coronary diseases (https://doi.org/10.1016/j.ihj.2018.07.007). Fingerprint patterns can be exploited using artificial intelligence and cutting-edge machine learning algorithms, which can be trained to detect markers that correspond to several genetic disorders. \r\n","created_at":"2019-12-26T19:17:31.082Z","updated_at":"2020-02-09T06:00:38.357Z","zenodo_id":3660060,"doi":"https://doi.org/10.5281/zenodo.3660060","subject":null,"tags":["fingerprints","genetic disorders","machine learning","artificial intelligence"],"vote_count":0,"view_count":1,"score":0.1,"deleted":false,"muted":false,"tweeted":false,"attachment_file_name":null,"attachment_content_type":null,"attachment_file_size":null,"attachment_updated_at":null}]